Time Series Forecasting

Siddharth Gupta
8 min read · Nov 30, 2022


What is Time Series?

In layman's terms: A time series is a sequence of observations taken at successive, equally spaced points in time. It is, therefore, a sequence of discrete-time data. Some examples are given below.

  • The daily closing price of a stock.
  • Daily product sales for a store.
  • Each quarter’s unemployment for a state.
  • The everyday average price of gasoline.

Formal Definition: A time series is a sequence of data points recorded one after another, in time order, over a certain period. A time series is frequently plotted with a run chart (a temporal line chart).

Note: Data collected irregularly or only once are not time series.
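
To make the "equally spaced" requirement concrete, here is a minimal sketch using pandas. The sales figures and the helper name is_equally_spaced are made up for illustration; they are not part of any dataset used later in this post.

```python
import pandas as pd

# Hypothetical daily sales, one observation per calendar day.
dates = pd.date_range(start="2022-01-01", periods=7, freq="D")
sales = pd.Series([120, 135, 128, 150, 160, 155, 170], index=dates, name="sales")


def is_equally_spaced(index: pd.DatetimeIndex) -> bool:
    """Return True when every gap between consecutive timestamps is identical."""
    gaps = index.to_series().diff().dropna()
    return gaps.nunique() == 1


# The full series has a constant one-day gap.
regular = is_equally_spaced(sales.index)

# Dropping one day leaves a two-day gap, so the spacing is no longer equal.
irregular = is_equally_spaced(sales.drop(sales.index[3]).index)

print(regular, irregular)
```

A check like this is a cheap first step before any modelling, because most of the methods below assume regular spacing.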

Time Series Analysis

Methods for analysing time series data to extract meaningful statistics and other characteristics of the data are called “time series analysis”. Time series analysis can be helpful to see how a given asset, security, or economic variable changes over time. It can also be used to compare changes in the selected data point with changes in other variables during the same period.
Measuring population change over time is one non-financial application of time series analysis.

Time series forecasting employs a model to project future values based on previously observed values. Regression analysis is frequently used to examine relationships between two or more different time series, but this is not typically called “time series analysis”, a term that refers specifically to relationships between different points in time within a single series.
Time series analysis can be applied to real-valued continuous data, discrete numeric data, and discrete symbolic data (i.e. sequences of characters, such as letters and words in the English language).

How should we Analyse Time Series?

Some quick steps for reference:

  • Gathering and cleaning the data.
  • Creating a visualisation of time vs the key feature.
  • Checking the series for stationarity.
  • Creating graphs to understand its nature.
  • Building models with AR, MA, ARMA, and ARIMA.
  • Drawing insights from the predictions.

Elements of Time Series Analysis

  • Trend: A long-term upward or downward movement in the data with no fixed interval. A trend can be positive, negative, or null (flat).
  • Seasonality: Shifts that repeat at a regular, fixed interval along the timeline. Plotted, they often look like a sawtooth or a repeating bell curve.
  • Cyclical: Rises and falls with no fixed interval, so their timing and pattern are uncertain.
  • Irregularity: Unexpected situations, events, and spikes that last only a short while.

Limitations of Time Series Analysis

The limitations of time series analysis are listed below, and we must account for them when conducting our study.

  • Classical TSA models do not handle missing data; gaps must be filled or removed first.
  • The relationships between the data points are assumed to be linear.
  • Data transformations are often mandatory, which makes them an added cost.
  • Models primarily work with univariate data.
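
Because gaps have to be dealt with before modelling, here is a small sketch of two common ways to fill them with pandas. The temperatures are invented for illustration, and which method is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical daily temperatures with two missing readings.
index = pd.date_range("2022-01-01", periods=6, freq="D")
temps = pd.Series([20.0, 21.0, np.nan, 23.0, np.nan, 25.0], index=index)

# Option 1: draw a straight line between the neighbouring observations.
filled_linear = temps.interpolate(method="linear")

# Option 2: carry the last known value forward.
filled_ffill = temps.ffill()

print(filled_linear.tolist())
print(filled_ffill.tolist())
```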

Time Series Data Types:

While studying time series, there are two primary data types.

  • Stationary
  • Non-Stationary

Stationary: A stationary dataset has statistical properties that do not change over time, so it should be free of trend and seasonality.

  • The MEAN should be constant over the whole period.
  • The VARIANCE should be constant over the whole period.
  • The COVARIANCE between two observations should depend only on the lag between them, not on when they were observed.
  • No Seasonality.

Assumption: The one and only assumption is stationarity, which means that the origin of time does not affect the statistical properties of the process.

Non-Stationary: The opposite of stationary; the mean, variance, or covariance changes over time.

How to check for Stationarity?

We must determine whether the provided dataset is stationary during the TSA model preparation phase. Two quick checks are:

  • Global vs Local check (compute the mean over the whole series and over sub-periods, then compare)
  • Visual Inspection

These checks are not always reliable, so we also use statistical tests and plots.
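
The global vs local check can be sketched in a few lines of NumPy. The helper chunk_stats and both series are illustrative assumptions: one series is random noise (stationary by construction), the other is a plain upward trend.

```python
import numpy as np


def chunk_stats(x, n_chunks=4):
    """Split x into consecutive chunks and return (mean, variance) per chunk."""
    chunks = np.array_split(np.asarray(x, dtype=float), n_chunks)
    return [(float(c.mean()), float(c.var())) for c in chunks]


rng = np.random.default_rng(42)
noise = rng.normal(size=1000)   # simulated stationary series
trending = np.arange(100.0)     # simulated series with a positive trend

# For the noise, local means stay close to the global mean.
print("global mean:", noise.mean())
print("local stats:", chunk_stats(noise))

# For the trending series, the local means drift upwards chunk by chunk.
trend_means = [m for m, _ in chunk_stats(trending)]
print(trend_means)
```

If the local means or variances differ a lot from each other, that is a hint (not proof) that the series is non-stationary.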

Statistical Test: Two tests are available to test the dataset for stationarity.

  • Augmented Dickey-Fuller (ADF) Test
  • Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test

Augmented Dickey-Fuller (ADF) Test or Unit Root Test: The ADF test is the most popular statistical test. Its hypotheses and decision rule are:

  • Null Hypothesis (H0): Series is non-stationary (it has a unit root)
  • Alternate Hypothesis (HA): Series is stationary
  • p-value > 0.05: Fail to reject H0
  • p-value <= 0.05: Reject H0 in favour of HA

Kwiatkowski–Phillips–Schmidt–Shin (KPSS):

The KPSS test reverses the roles of the hypotheses used in the ADF test: the null hypothesis (H0) is that the series is stationary around a deterministic trend (or a constant level), and the alternative is that a unit root exists. A p-value <= 0.05 therefore points to non-stationarity. We must ensure the dataset is stationary because TSA needs stationary data for its further analysis.

How to convert from Non-Stationary to Stationary?

Detrending: This removes the trend effects from the dataset so that only the deviations from the trend remain. It makes cyclical and other patterns easier to identify.
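
One simple way to detrend, sketched below, is to fit a straight line with NumPy and subtract it. The series is simulated, and a linear trend is an assumption; real data may need a different trend model.

```python
import numpy as np

# Simulated series: linear trend plus a repeating pattern.
t = np.arange(100, dtype=float)
series = 0.3 * t + 5.0 + 2.0 * np.sin(2 * np.pi * t / 10)

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(t, series, deg=1)
trend = slope * t + intercept

# What remains after removing the fitted line is the detrended series.
detrended = series - trend

print("estimated slope:", slope)
print("mean of detrended series:", detrended.mean())
```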

Differencing: This simple transformation converts the series into a new time series of differences between consecutive values. It reduces trend and seasonality, removes the series’ dependence on time, and stabilises its mean.

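Y'_t = Y_t - Y_{t-1}
where Y_t is the value at time t and Y'_t is the differenced value.

Seasonal Differencing: Y'_t = Y_t - Y_{t-n}, where n is the length of the season.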
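
Both kinds of differencing map directly onto pandas' diff method. The numbers below are made up, and a season length of 2 is used only to keep the example tiny.

```python
import pandas as pd

y = pd.Series([1.0, 3.0, 6.0, 10.0, 15.0, 21.0])

# First-order differencing: value at t minus value at t-1.
first_diff = y.diff().dropna()

# Seasonal differencing with a hypothetical season length n = 2:
# value at t minus value at t-2.
seasonal_diff = y.diff(periods=2).dropna()

# Differencing is reversible: a cumulative sum plus the first value
# recovers the original series from the second observation onward.
restored = first_diff.cumsum() + y.iloc[0]

print(first_diff.tolist())
print(seasonal_diff.tolist())
print(restored.tolist())
```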

Transformation: The three common techniques are the Power Transform, the Square Root, and the Log Transform. The Log Transform is the one most frequently used.
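
A short sketch of why the log transform is popular: it turns multiplicative growth into constant steps, and it is easy to undo. The values are invented for illustration.

```python
import numpy as np

# Hypothetical series that grows tenfold at every step.
y = np.array([1.0, 10.0, 100.0, 1000.0])

log_y = np.log(y)     # log transform (requires strictly positive values)
sqrt_y = np.sqrt(y)   # square-root transform, a milder alternative

# After the log transform the steps between observations are all equal.
log_steps = np.diff(log_y)

# The inverse transform recovers the original values.
back = np.exp(log_y)

print(log_steps)
print(back)
```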

Now, let us do time series forecasting on the Daily Minimum Temperatures dataset using an Auto-Regressive model.

Dataset available at: https://github.com/jbrownlee/Datasets/blob/master/daily-min-temperatures.csv

Implementation of Auto-Regressive Model

When the values in a time series are correlated with the values that precede them, a basic model called autoregression can estimate future values from past values.

  • Regression: A method for predicting a continuous value from input factors.
  • Auto: The inputs are the series’ own past values.

An AR model is a linear regression model that uses lagged values of the series as inputs. The scikit-learn package can build such a linear regression model once you supply the lagged data. The statsmodels library offers classes specific to autoregression models, where you train the model and supply the lag value.

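AR model equation: Y_t = C + b_1 Y_{t-1} + b_2 Y_{t-2} + ... + b_p Y_{t-p} + Er_t

  • Y_t = C + b_1 Y_{t-1} + Er_t (1st Order Auto Regression)
  • Y_t = C + b_1 Y_{t-1} + b_2 Y_{t-2} + Er_t (2nd Order Auto Regression)

(Compare with Y = mX + c: the lagged values play the role of X, the coefficients b play the role of m, and C is the intercept.)

Key Parameters

  • p = number of past values (lags) used
  • Y_t = value at time t, a function of the past values
  • b_1 ... b_p = coefficients of the lagged values
  • Er_t = error at time t
  • C = intercept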
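
To show that a first-order AR model really is a linear regression on the lagged value, the sketch below simulates a series with a known intercept and coefficient and recovers them by ordinary least squares. NumPy's lstsq is used here as a stand-in for any linear regression routine; the true values 1.0 and 0.6 are arbitrary choices for the simulation.

```python
import numpy as np

# Simulate Y_t = C + b_1 * Y_{t-1} + Er_t with known parameters.
rng = np.random.default_rng(7)
n, c_true, b1_true = 2000, 1.0, 0.6
y = np.empty(n)
y[0] = c_true / (1 - b1_true)   # start at the long-run mean
for t in range(1, n):
    y[t] = c_true + b1_true * y[t - 1] + rng.normal()

# Regression inputs: a column of ones (for C) and the lagged series (for b_1).
X_lag = np.column_stack([np.ones(n - 1), y[:-1]])
target = y[1:]

# Ordinary least squares, exactly as in Y = mX + c.
coef, *_ = np.linalg.lstsq(X_lag, target, rcond=None)
c_hat, b1_hat = coef

print("estimated intercept C:", c_hat)
print("estimated coefficient b_1:", b1_hat)
```

The estimates will not match the true values exactly because of the random error term, but they should land close to them on a series of this length.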

Importing all the necessary libraries

import numpy as np
import pandas as pd
from matplotlib import pyplot
from statsmodels.tsa.ar_model import AutoReg

Reading and Visualising the Dataset

# Load the dataset (assumes the CSV was saved locally under this name)
df = pd.read_csv('DailyMinTemperatures.csv', index_col=0, parse_dates=True)
X = df['Temp'].values  # 1-D array of temperatures
print('Shape of data \t', df.shape)
print('Original Dataset:\n', df.head())
print('After Extracting only temperature:\n', X)

# Plot the first 300 observations
df[:300].plot()
pyplot.show()

# Lag plot: each observation against the previous one
from pandas.plotting import lag_plot
lag_plot(df['Temp'])
pyplot.show()
# We can see a large ball of observations along a diagonal line of the plot.
# It clearly shows a relationship or some correlation.

# Check for Stationarity
from statsmodels.tsa.stattools import adfuller

dftest = adfuller(df['Temp'], autolag='AIC')

print("1 . ADF : ", dftest[0])
print("2 . P-Value : ", dftest[1])
print("3 . Num of Lags : ", dftest[2])
print("4 . Num Of Observations Used For ADF Regression and Critical values Calculation : ", dftest[3])
print("5 . Critical Values : ")
for key, val in dftest[4].items():
    print("\t", key, ": ", val)

We performed the ADF test to check for stationarity; the p-value comes out to be less than 0.05, so we reject the null hypothesis and conclude that the series is stationary.

Auto-Correlation Function (ACF): ACF shows how similar a value in a time series is to the values that came before it. In other words, it gauges the degree of similarity between a time series and a lagged copy of itself at various lags.

Autocorrelation can be calculated with Python’s statsmodels package. It is used to identify patterns in the dataset and the influence of previously observed values on current observations.

Partial Auto-Correlation (PACF): PACF is similar to the ACF but a little harder to understand. It shows the correlation between the series and itself at a given lag after the indirect effects of the shorter, intermediate lags have been removed.

Note: Both ACF and PACF require stationary time series for analysis.

from statsmodels.graphics.tsaplots import plot_pacf, plot_acf
pacf = plot_pacf(df['Temp'], lags = 25)
acf = plot_acf(df['Temp'], lags = 25)

Splitting the Dataset and Training Auto Regression

# Split Dataset into Training and Test: Last 7 Days
train = X[:len(X)-7]
test = X[len(X)-7:]
model = AutoReg(train, lags = 10).fit()
print(model.summary())

Making Predictions on Test Set and Plotting the Graph for Comparison

# Make Predictions on Test Set and Compare
pred = model.predict(start=len(train), end=len(X)-1, dynamic=False)
pyplot.plot(pred, label='Predicted')
pyplot.plot(test, color='red', label='Actual')
pyplot.legend()
pyplot.show()
print(pred)

Calculating Error

# Calculate Error
from math import sqrt
from sklearn.metrics import mean_squared_error
rmse = sqrt(mean_squared_error(test, pred))
print(rmse)
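
RMSE is simply the square root of the average squared difference between actual and predicted values. Here is the same calculation written out with NumPy on made-up numbers; the function name rmse_manual is hypothetical.

```python
import numpy as np


def rmse_manual(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Errors are 0, 0 and 2, so the mean squared error is 4/3.
example_rmse = rmse_manual([1, 2, 3], [1, 2, 5])
print(example_rmse)
```

The result is in the same unit as the data, so for the temperature dataset it reads as an average error in degrees.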

Making Future Predictions for the next 7 Days

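# Make Future Predictions
# Refit on the full series so the forecast starts right after the last observation
model_full = AutoReg(X, lags=10).fit()
# Index len(X) is the first day after the data; len(X)+6 is the seventh
pred_future = model_full.predict(start=len(X), end=len(X)+6, dynamic=False)
print("The future prediction for the next week")
print(pred_future)
print('Number of Predictions Made: \t', len(pred_future))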

Thank you for reading; I hope you gained something from this post. If you liked it, give it a clap and share it with your friends.
Please share your views in the comments.
We can connect on LinkedIn: https://www.linkedin.com/in/stardawgpower/
