Time Series Analysis

Time series analysis is a way of analyzing a sequence of data points collected at successive points in time. It requires a large number of data points to ensure consistency and reliability (Tableau.com). It takes into account that observations collected over time may be autocorrelated or show seasonal variation. The purpose of time series analysis is to understand the structure of time series data and to fit a model for forecasting. It is used in economic forecasting, stock market analysis, sales forecasting and return projections. Methods such as Autoregression (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA) and Seasonal Autoregressive Integrated Moving Average (SARIMA) can be used for forecasting. Here, I will be using the ARIMA model to forecast the NEPSE index and the GDP of Nepal.

ARIMA Model

ARIMA stands for Autoregressive Integrated Moving Average. It is a model for forecasting time series data that first makes the series stationary. The model requires a stationary series, and a non-stationary series can usually be made stationary by differencing. A non-seasonal ARIMA model is written as ARIMA(p,d,q), where p is the number of autoregressive terms, d is the number of non-seasonal differences needed for stationarity and q is the number of lagged forecast errors in the prediction equation. The ARIMA model is:

Predicted Yt = Constant + Linear Combination of Lags of Y (up to p lags) + Linear Combination of Lagged Forecast Errors (up to q lags) \[Y_t = \alpha + \beta_1Y_{t-1}+\beta_2Y_{t-2}+...+\beta_pY_{t-p}+\epsilon_t+\phi_1\epsilon_{t-1}+\phi_2\epsilon_{t-2}+...+\phi_q\epsilon_{t-q}\] where,
\(Y_{t-1}\) is the first lag of the series,
\(\beta_1\) is the coefficient of the autoregressive term for lag 1 that the model estimates, and \(\alpha\) is the constant.
Similarly, \(\epsilon_{t-1}\) is the error term for lag 1 and \(\phi_1\) is the coefficient of the moving average term for lag 1.
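
To make the prediction equation concrete, here is a minimal sketch of a one-step prediction with p = 1 and q = 1. The coefficient values, the previous observation and the previous error are all made up for illustration and are not taken from the GDP or NEPSE models.

```r
# Hypothetical coefficients (assumptions for illustration only)
alpha <- 2      # constant
beta1 <- 0.5    # AR coefficient for lag 1
phi1  <- 0.3    # MA coefficient for lag 1

Y_prev <- 10    # Y_{t-1}, the previous value of the series
e_prev <- 1     # epsilon_{t-1}, the previous forecast error

# Predicted Y_t = constant + AR part + MA part
# (the current error epsilon_t is unknown at forecast time, so it is taken as 0)
Y_hat <- alpha + beta1 * Y_prev + phi1 * e_prev
print(Y_hat)
```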

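The value of d sets how the series is differenced before the model is fitted. Writing \(Y_t\) for the original series and \(y_t\) for the differenced series that the model is fitted to:
If \(d = 0: y_t = Y_t\)
If \(d = 1: y_t = Y_t - Y_{t-1}\)
If \(d = 2: y_t = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2})\)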
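
The differencing rules above can be tried out with the base R function `diff()`. This is a small sketch on a made-up series, not on the GDP or NEPSE data.

```r
# Hypothetical toy series (an assumption for illustration)
Y <- c(10, 12, 15, 19, 24)

# d = 1: first difference, Y_t - Y_{t-1}
d1 <- diff(Y, differences = 1)

# d = 2: difference of the first differences
d2 <- diff(Y, differences = 2)

# The same second difference written out as in the formula for d = 2
d2_manual <- (Y[3:5] - Y[2:4]) - (Y[2:4] - Y[1:3])

print(d1)
print(d2)
```

Each round of differencing shortens the series by one observation, which is why `d1` has four values and `d2` has three.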

NEPSE index and GDP data

For the time series analysis, I have used the GDP data from 1960 to 2021. The data can be obtained from the World Bank. For forecasting the NEPSE index, I have used the daily NEPSE index from February 2014 to mid-September 2022. The daily NEPSE indices have been aggregated to determine the monthly average NEPSE index for the time period. The daily NEPSE index data is obtained from Merolagani. My focus was to perform the time series analysis of the NEPSE index and GDP data of Nepal. However, I also wanted to look at how the GDP growth rate and the NEPSE annual return are related. Hence, I have also looked at the correlation coefficient between these two variables.
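
The aggregation of daily values into a monthly average can be sketched in base R as follows. The data frame, its column names and the index values are made up for illustration. The real daily data came from Merolagani and may be laid out differently.

```r
# Hypothetical daily index values (assumed column names: date, index)
daily <- data.frame(
  date  = as.Date(c("2014-02-03", "2014-02-04", "2014-03-03", "2014-03-04")),
  index = c(780, 790, 800, 820)
)

# Label each row with its year and month, then average within each month
daily$month <- format(daily$date, "%Y-%m")
monthly <- aggregate(index ~ month, data = daily, FUN = mean)

# Turn the monthly averages into a monthly time series starting in February 2014
monthly_ts <- ts(monthly$index, start = c(2014, 2), frequency = 12)

print(monthly)
```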

GDP of Nepal

Plot

Table

NEPSE Index

Plot

Table

GDP growth and NEPSE return

Plot

Table

Correlation Coefficient between GDP growth rate and NEPSE annual return

The correlation coefficient measures the strength and direction of the linear relationship between two variables. Its value ranges from -1.0 to +1.0. A correlation coefficient of -1.0 is a perfect negative correlation, i.e. the two variables lie exactly on a straight line and one decreases as the other increases. Similarly, a correlation coefficient of +1.0 is a perfect positive correlation, i.e. the two variables lie exactly on a straight line and increase together. A value close to 0 means there is little or no linear relationship. The Karl Pearson correlation coefficient is given by: \[r = \frac{\sum(X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum(X-\bar{X})^2}\sqrt{\sum(Y-\bar{Y})^2}}\] Using the Karl Pearson correlation coefficient on the data from 1998 to 2022, the correlation coefficient between the NEPSE annual return and the GDP growth rate is found to be -0.0077. This is so close to zero that there is practically no linear relationship between the two variables.
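
As a quick check of the formula, the sketch below computes the Karl Pearson coefficient by hand and compares it with the base R function `cor()`. The two vectors are made-up numbers, not the GDP growth rate or the NEPSE annual return.

```r
# Hypothetical data (assumptions for illustration only)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Pearson correlation written out term by term, as in the formula
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  (sqrt(sum((x - mean(x))^2)) * sqrt(sum((y - mean(y))^2)))

# The same quantity from base R
r_builtin <- cor(x, y, method = "pearson")

print(r_manual)
print(r_builtin)
```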

Forecasting

Forecasting GDP

I have used the ARIMA model to forecast the GDP of Nepal. For forecasting, I have used the GDP data from 1960 to 2021.

Test of Stationarity

A stationary time series is one whose mean and variance do not change with time. Thus, time series with trends or seasonality are not stationary, whereas a white noise series is stationary (Towardsdatascience).

## Warning in adf.test(gdp_nepal_ts, k = 15): p-value greater than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  gdp_nepal_ts
## Dickey-Fuller = 1.0306, Lag order = 15, p-value = 0.99
## alternative hypothesis: stationary

At lag order 15, the p-value from the Augmented Dickey-Fuller test is 0.99, which is greater than \(\alpha\) = 0.05 (the 5% significance level). Hence we fail to reject the null hypothesis that the series has a unit root, i.e. the time series data is not stationary.
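
The sketch below shows how such a test can be run on a simulated random walk, which is non-stationary by construction. It assumes the `adf.test()` call in the output above comes from the `tseries` package, and it only runs the test if that package is installed. The series is simulated and is not the GDP data.

```r
set.seed(1)

# Simulated random walk with 62 points (an assumption for illustration)
random_walk <- ts(cumsum(rnorm(62)))

if (requireNamespace("tseries", quietly = TRUE)) {
  # Null hypothesis: unit root (not stationary); alternative: stationary.
  # The k argument sets the lag order; here the package default is used.
  adf_result <- tseries::adf.test(random_walk, alternative = "stationary")
  print(adf_result)

  # Decision at the 5% significance level
  if (adf_result$p.value > 0.05) {
    message("Fail to reject the null hypothesis: treat the series as not stationary")
  } else {
    message("Reject the null hypothesis: treat the series as stationary")
  }
}
```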

In the ACF and PACF plots, a substantial number of vertical lines cross the blue horizontal lines, which means the autocorrelation at those lags is significant. This supports the result of the ADF test.

ARIMA Model

Using the Akaike Information Criterion (AIC) to compare the candidate ARIMA models, the best model is ARIMA(2,2,1). AIC measures how well a model fits the data while penalizing the number of parameters, and a lower value indicates a better model. Among the ARIMA models tried, ARIMA(2,2,1) has the lowest AIC value, i.e. 2662.796. In this model, p = 2 is the number of AR terms, d = 2 is the number of differences needed and q = 1 is the number of MA terms. The coefficients of AR1, AR2 and MA1 are 0.0781, -0.3977 and -0.7538 respectively.

## 
##  ARIMA(2,2,2)                    : 2664.789
##  ARIMA(0,2,0)                    : 2689.463
##  ARIMA(1,2,0)                    : 2686.87
##  ARIMA(0,2,1)                    : 2666.652
##  ARIMA(1,2,2)                    : 2667.914
##  ARIMA(2,2,1)                    : 2662.796
##  ARIMA(1,2,1)                    : 2668.024
##  ARIMA(2,2,0)                    : 2672.262
##  ARIMA(3,2,1)                    : 2664.79
##  ARIMA(3,2,0)                    : 2668.664
##  ARIMA(3,2,2)                    : Inf
## 
##  Best model: ARIMA(2,2,1)
## Series: gdp_nepal_ts 
## ARIMA(2,2,1) 
## 
## Coefficients:
##          ar1      ar2      ma1
##       0.0781  -0.3977  -0.7538
## s.e.  0.1344   0.1370   0.0898
## 
## sigma^2 = 9.886e+17:  log likelihood = -1327.4
## AIC=2662.8   AICc=2663.52   BIC=2671.17
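
The idea of picking the model with the lowest AIC can be imitated with a plain grid search in base R. This is only a sketch on a simulated series: it tries every combination of p and q in a small range with d fixed at 1, whereas the output above comes from a function that chooses which models to try on its own. The series, the range of p and q and the choice d = 1 are all assumptions for illustration.

```r
set.seed(42)

# Simulated series that needs one difference (not the GDP data)
y <- ts(cumsum(as.numeric(arima.sim(model = list(ar = 0.5), n = 120))))

# Candidate orders: p and q from 0 to 2, d fixed at 1
candidates <- expand.grid(p = 0:2, q = 0:2)
candidates$aic <- NA_real_

for (i in seq_len(nrow(candidates))) {
  fit <- tryCatch(
    arima(y, order = c(candidates$p[i], 1, candidates$q[i]), method = "ML"),
    error = function(e) NULL   # skip models that fail to fit
  )
  if (!is.null(fit)) candidates$aic[i] <- AIC(fit)
}

# The best model is the one with the lowest AIC
best <- candidates[which.min(candidates$aic), ]
print(candidates)
print(best)
```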

Test of Stationarity of the model

The ACF and PACF plots of the model's residuals show that the number of vertical lines crossing the blue horizontal lines has reduced significantly, indicating that the residuals are stationary and that the model has captured most of the autocorrelation.
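
The visual check of the residuals can also be done numerically. The sketch below counts how many residual autocorrelations fall outside the approximate 95% bounds that R draws as blue lines, and adds a Ljung-Box test, which is an extra check not used in this post. The series and the ARIMA(1,1,0) order are assumptions for illustration.

```r
set.seed(7)

# Simulated series and a simple fitted model (not the GDP model)
y   <- ts(cumsum(as.numeric(arima.sim(model = list(ar = 0.5), n = 120))))
fit <- arima(y, order = c(1, 1, 0), method = "ML")
res <- residuals(fit)

# Autocorrelations of the residuals up to lag 20, without drawing the plot
acf_vals <- acf(res, lag.max = 20, plot = FALSE)

# Approximate 95% bounds, the blue lines in the ACF plot
bound <- qnorm(0.975) / sqrt(length(res))

# Number of lags (excluding lag 0) whose autocorrelation crosses the bounds
n_outside <- sum(abs(acf_vals$acf[-1]) > bound)
print(n_outside)

# Ljung-Box test: a large p-value is consistent with white noise residuals.
# fitdf = 1 accounts for the one AR coefficient that was estimated.
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)
print(lb)
```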

Forecasting

Using the ARIMA(2,2,1) model, I have forecast the GDP for the next 10 years with a 95% prediction interval.
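
A forecast with a 95% interval can be sketched in base R with `predict()`. The series and the ARIMA(1,1,0) order below are assumptions for illustration and are not the GDP model. The interval is built from the forecast standard errors under the usual assumption of normally distributed errors.

```r
set.seed(7)

# Simulated series and a simple fitted model (not the GDP model)
y   <- ts(cumsum(as.numeric(arima.sim(model = list(ar = 0.5), n = 120))))
fit <- arima(y, order = c(1, 1, 0), method = "ML")

# Point forecasts and standard errors for the next 10 periods
fc <- predict(fit, n.ahead = 10)

# 95% interval: forecast plus or minus about 1.96 standard errors
z     <- qnorm(0.975)
lower <- fc$pred - z * fc$se
upper <- fc$pred + z * fc$se

print(cbind(forecast = fc$pred, lower = lower, upper = upper))
```

The interval widens as the forecast horizon grows, because the standard error increases with every step ahead.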

Forecasting NEPSE Index

For forecasting the NEPSE index, I have used the daily NEPSE index data from February 2014 to September 2022, aggregated into the average monthly index for the period. The NEPSE index has been forecast using the ARIMA model in the same way as above.

Test of Stationarity

The Augmented Dickey-Fuller test at lag order 15 gives a p-value of 0.4249, which is much greater than \(\alpha\) = 0.05 (the 5% significance level). Hence we fail to reject the null hypothesis, i.e. the time series data is not stationary.
Similarly, the ACF and PACF plots have a significant number of vertical lines crossing the blue horizontal lines. This also supports the result of the ADF test.

## 
##  Augmented Dickey-Fuller Test
## 
## data:  nepse_2014_2022_monthly
## Dickey-Fuller = -2.366, Lag order = 15, p-value = 0.4249
## alternative hypothesis: stationary

Model

Using the auto.arima() function, the best model is ARIMA(2,1,2), for which the AIC is 1255.602. For this model, p = 2 is the number of AR terms, d = 1 is the number of differences required and q = 2 is the number of MA terms. The coefficients of AR1, AR2, MA1 and MA2 are -0.0734, -0.6040, 0.1496 and 0.8870 respectively.

## 
##  ARIMA(2,1,2) with drift         : 1256.861
##  ARIMA(0,1,0) with drift         : 1260.57
##  ARIMA(1,1,0) with drift         : 1258.491
##  ARIMA(0,1,1) with drift         : 1259.345
##  ARIMA(0,1,0)                    : 1259.542
##  ARIMA(1,1,2) with drift         : 1260.689
##  ARIMA(2,1,1) with drift         : 1260.703
##  ARIMA(3,1,2) with drift         : Inf
##  ARIMA(2,1,3) with drift         : Inf
##  ARIMA(1,1,1) with drift         : 1258.716
##  ARIMA(1,1,3) with drift         : 1257.68
##  ARIMA(3,1,1) with drift         : 1258.822
##  ARIMA(3,1,3) with drift         : Inf
##  ARIMA(2,1,2)                    : 1255.602
##  ARIMA(1,1,2)                    : 1259.152
##  ARIMA(2,1,1)                    : 1258.981
##  ARIMA(3,1,2)                    : Inf
##  ARIMA(2,1,3)                    : Inf
##  ARIMA(1,1,1)                    : 1257.009
##  ARIMA(1,1,3)                    : 1256.135
##  ARIMA(3,1,1)                    : 1257.256
##  ARIMA(3,1,3)                    : Inf
## 
##  Best model: ARIMA(2,1,2)
## Series: nepse_2014_2022_monthly 
## ARIMA(2,1,2) 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2
##       -0.0734  -0.6040  0.1496  0.8870
## s.e.   0.1811   0.1201  0.1187  0.0771
## 
## sigma^2 = 12144:  log likelihood = -622.8
## AIC=1255.6   AICc=1256.23   BIC=1268.73

Test of Stationarity of the model’s residuals

The ACF and PACF plots of the residuals have very few vertical lines crossing the blue horizontal lines, indicating that the residuals of the model are stationary.

Forecasting

For forecasting, I have used the average monthly NEPSE index calculated from the daily NEPSE index from February 2014 to September 2022. Similar to the GDP forecasting, I have used the ARIMA model to forecast the NEPSE index for the next 12 months.

Conclusion

ARIMA is found to be effective, especially for short-term time series forecasting (Box 1970; Jarrett 1991). The ARIMA model can give effective forecasts even with a small number of parameters. However, its long-term forecasts eventually flatten into a straight line, and it is poor at forecasting series with turning points.
This blog explains the basics of the ARIMA model and presents the process of performing time series forecasting with it. For more information regarding ARIMA model, visit here. My code for time series analysis can be found here.
Lastly, all thanks to Code For Nepal for providing me with the opportunity to learn R.