A data set containing observations on a single phenomenon observed over multiple time periods is called time-series. In time-series data, both the values and the ordering of the data points have meaning. Some methods have also been developed for its analysis to suit the distinct features of time series data, which differ both from cross section and panel or pooled data. Various approaches are available for time series modeling (Paul, 2020).

This independent project #2 will cover about time series analysis using R as well as differences in time series analysis using stationary and non-stationary data. This independent project will explain step-by-step and structured as follow:

  1. Overview
  2. Checking and Modelling Process
  3. Running the Analysis
  4. Conclusion

Overview

This independent project will use daily stock information of PT Telkom Indonesia Tbk with a ticker code TLKM, from 2007 to 2023 with over 4000 information as time-series observation. First we will need to load some required packages for this project.

library(quantmod)   # to load the dataset we will be using
library(ggplot2)    # to create charts
library(gridExtra)  # to arrange multiple plots into one
library(forecast)   # to forecast our models
library(tseries)    # to perform multiple testing of our models later
library(zoo)        # to convert dataset format
library(kableExtra) # to create tidy tables
library(dplyr)      # to clean the dataset

To download our TLKM daily stock information, the following command will be executed.

test1=getSymbols("TLKM.JK",auto.assign=F)

The command above will get us TLKM daily stock information from 2007 until 2023 in the form of an xts object and store it in test1 as follow:

Limited to 5 rows only
Index TLKM.JK.Open TLKM.JK.High TLKM.JK.Low TLKM.JK.Close TLKM.JK.Volume TLKM.JK.Adjusted
2007-01-02 2070 2070 2040 2070 115235000 1298.511
2007-01-03 2070 2080 2050 2070 72295000 1298.511
2007-01-04 2030 2090 2030 2030 80587500 1273.419
2007-01-05 2030 2030 2010 2030 54030000 1273.419
2007-01-08 2000 2020 2000 2000 32652500 1254.600
Note:
More on the formatting of table using kable, see here.

As we can see, the stock information we downloaded provide us with information such as date, daily opening price, highest price as of date, lowest price as of date, daily closing price, transaction volume, and adjusted price.

However in this project, we will be using only daily closing price and daily return. To do that, we will be execute this following command:

a=test1%>%Cl()
b=dailyReturn(a)
test3=merge(a,b,all=F)|>
  fortify.zoo()|>
  rename(close="TLKM.JK.Close")|>
  rename(dr="daily.returns")|>
  rename(date="Index")

And we have a stock information consists of date, daily closing price, and daily return as follow:

Limited to 5 rows only
date close dr
2007-01-02 2070 0.0000000
2007-01-03 2070 0.0000000
2007-01-04 2030 -0.0193237
2007-01-05 2030 0.0000000
2007-01-08 2000 -0.0147783

Next we will visualize both daily stock information to see the differences and spot a specific pattern if any, using following commands:

fig1=test3|>
  ggplot(aes(x=date,y=close))+
  geom_line()+
  labs(x="Time",y="TLKM Closing Price")+
  ggtitle("Nonstationary")
fig2=test3|>
  ggplot(aes(x=date,y=dr))+
  geom_line()+
  labs(x="Time",y="TLKM Daily Return")+
  ggtitle("Stationary")
grid.arrange(fig1,fig2)

As we can see, fig1 is TLKM daily closing price over time with larger variance, inconstant mean, and positive-upward linear trend. Therefore it can be said that fig1 is the example of non-stationary movement. Non-stationary data, as a rule, are unpredictable and cannot be modeled or forecasted. On the other hand, fig2 is TLKM daily return, calculated as follow: \[ \frac{(Closing\text{ }Price)_t-(Closing\text{ }Price)_\text{t-1}}{(Closing\text{ }Price)_\text{t-1}} \] It can be visually visible in fig2 that the movement is never far from zero, which indicates that the daily return has smaller and constant variance, constant mean, and no trend at all. Therefore, fig2 is the example of stationary movement.

Checking and Modelling Process

In this section we will cover two processes, the first one is the time series analysis using non-stationary movement, while the second one is the time series analysis using stationary movement. The checking process begin from checking for stationary, running the Autocorrelation Function (ACF) plot, fit the model, and make a forecast.

Non-stationary Movements

As seen in fig1, the TLKM daily closing stock price is the example of non stationary movements. First we will check for the stationary using Augmented Dickey Fuller test as follow:

adf.test(test3$close)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  test3$close
## Dickey-Fuller = -2.3168, Lag order = 15, p-value = 0.4442
## alternative hypothesis: stationary

In statistics, an augmented Dickey–Fuller test (ADF) tests the null hypothesis that a unit root is present in a time series sample. The complete hypothesis is as follow: \[ H_0:\text{ }\gamma\text{ }\geqslant 0\\ H_a:\text{ }\gamma\text{ }< 0 \] The gamma symbol indicate that a unit root is present in the time series sample. The ADF test resulting in a p-value = 0.4442. This result showing that we failed to reject the null hypothesis and indicates that the time series object is non-stationary. Second we will plot the AC function to check for the stationarity of the time series object using the following commands:

ggAcf(test3$close,type="correlation")+ggtitle("ACF PLot: Nonstationary")+theme_classic()

In the figure above, we can see that ACF is geometrically declining with lags. Next we will fit four different time series model, Autoregressive (AR) model, Moving Average (MA) model, Autoregressive Moving Average (ARMA) model, and Autoregressive Integrated Moving Average (ARIMA) model as follow:

ns_ar=auto.arima(test3$close,max.d=0,max.q=0,allowdrift=T)
ns_ma=auto.arima(test3$close,max.d=0,max.p=0,allowdrift=T)
ns_arma=auto.arima(test3$close,max.d=0,allowdrift=T)
ns_arima=auto.arima(test3$close,allowdrift=T)

The auto.arima() function returns best ARIMA model according to either AIC, AICc or BIC value. The function conducts a search over possible model within the order constraints provided. Follow by check residuals of each model fit and plot the PAC function for each residuals using following commands:

res_ns_ar=resid(ns_ar)
res_ns_ma=resid(ns_ma)
res_ns_arma=resid(ns_arma)
res_ns_arima=resid(ns_arima)
fig3=ggAcf(res_ns_ar,type="partial")
fig4=ggAcf(res_ns_ma,type="partial")
fig5=ggAcf(res_ns_arma,type="partial")
fig6=ggAcf(res_ns_arima,type="partial")
grid.arrange(fig3,fig4,fig5,fig6,nrow=2,ncol=2)

Stationary Movements

In fig2, the TLKM daily return is the example of stationary movements. First we will check for the stationary using Augmented Dickey Fuller test as follow:

adf.test(test3$dr)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  test3$dr
## Dickey-Fuller = -17.082, Lag order = 15, p-value = 0.01
## alternative hypothesis: stationary

The ADF test resulting in a p-value < 0.01. This result suggest to reject the null hypothesis and indicates that the time series object is stationary. Second we will plot the AC function to check for the stationarity of the time series object using the following commands:

ggAcf(test3$dr,type="correlation")+ggtitle("ACF PLot: Stationary")+theme_classic()

Next we will fit four different time series model, Autoregressive (AR) model, Moving Average (MA) model, Autoregressive Moving Average (ARMA) model, and Autoregressive Integrated Moving Average (ARIMA) model as follow:

s_ar=auto.arima(test3$dr,max.d=0,max.q=0,allowdrift=T)
s_ma=auto.arima(test3$dr,max.d=0,max.p=0,allowdrift=T)
s_arma=auto.arima(test3$dr,max.d=0,allowdrift=T)
s_arima=auto.arima(test3$dr,allowdrift=T)

Follow by check residuals of each model fit and plot the PAC function for each residuals using following commands:

res_s_ar=resid(s_ar)
res_s_ma=resid(s_ma)
res_s_arma=resid(s_arma)
res_s_arima=resid(s_arima)
fig7=ggAcf(res_s_ar,type="partial")
fig8=ggAcf(res_s_ma,type="partial")
fig9=ggAcf(res_s_arma,type="partial")
fig10=ggAcf(res_s_arima,type="partial")
grid.arrange(fig7,fig8,fig9,fig10,nrow=2,ncol=2)

Findings

The Ljung-Box test is a hypothesis test that checks if a time series contains an autocorrelation. The null hypothesis is that the residuals are independently distributed. The alternative hypothesis is that the residuals are not independently distributed and exhibit a serial correlation. In this section, we will perform Ljung-Box test and forecast both non-stationary and stationary data

Non-stationary Movements

The following commands are use to perform Ljung-Box test on non-stationary time series movement.

Box.test(res_ns_ar,type="Ljung-Box",lag=1)
## 
##  Box-Ljung test
## 
## data:  res_ns_ar
## X-squared = 4001.9, df = 1, p-value < 2.2e-16
Box.test(res_ns_ma,type="Ljung-Box",lag=1)
## 
##  Box-Ljung test
## 
## data:  res_ns_ma
## X-squared = 483.16, df = 1, p-value < 2.2e-16
Box.test(res_ns_arma,type="Ljung-Box",lag=1)
## 
##  Box-Ljung test
## 
## data:  res_ns_arma
## X-squared = 483.16, df = 1, p-value < 2.2e-16
Box.test(res_ns_arima,type="Ljung-Box",lag=1)
## 
##  Box-Ljung test
## 
## data:  res_ns_arima
## X-squared = 0.038878, df = 1, p-value = 0.8437

From the Ljung-Box tests above, we can clearly see, only ARIMA model generate p-value = 0.8437 while the other model generate way smaller p-values. Thus indicating that the AR, MA, and ARMA model for non-stationary data, contains an autocorrelation.

However, even if three out of 4 non-stationary models contain autocorrelation, in this project we will keep using all of the models to make forecasts, and plot the forecasts. In reality, models that have indication of autocorrelation are exclude from forecasting process.

fc_ns_ar=forecast(ns_ar,h=360,level=80)
fc_ns_ma=forecast(ns_ma,h=360,level=80)
fc_ns_arma=forecast(ns_arma,h=360,level=80)
fc_ns_arima=forecast(ns_arima,h=360,level=80)
fig11=autoplot(fc_ns_ar)
fig12=autoplot(fc_ns_ma)
fig13=autoplot(fc_ns_arma)
fig14=autoplot(fc_ns_arima)
grid.arrange(fig11,fig12,fig13,fig14,nrow=4,ncol=1)

The forecast that we create in this project is for the next 360 days period with 80% interval confidence level.

Stationary Movements

The following commands are use to perform Ljung-Box test on stationary time series movement.

Box.test(res_s_ar,type="Ljung-Box",lag=1)
## 
##  Box-Ljung test
## 
## data:  res_s_ar
## X-squared = 0.0037895, df = 1, p-value = 0.9509
Box.test(res_s_ma,type="Ljung-Box",lag=1)
## 
##  Box-Ljung test
## 
## data:  res_s_ma
## X-squared = 0.0016149, df = 1, p-value = 0.9679
Box.test(res_s_arma,type="Ljung-Box",lag=1)
## 
##  Box-Ljung test
## 
## data:  res_s_arma
## X-squared = 0.00080955, df = 1, p-value = 0.9773
Box.test(res_s_arima,type="Ljung-Box",lag=1)
## 
##  Box-Ljung test
## 
## data:  res_s_arima
## X-squared = 0.00080955, df = 1, p-value = 0.9773

From the Ljung-Box tests above, we can clearly see, all model generate p-values over 0.05. Thus indicating that the AR, MA, ARMA, and ARIMA model for stationary data, free from any autocorrelation indication.

Since all models free from autocorrelation indication, we will use all of the models to make forecasts, and plot the forecasts using following commands:

fc_s_ar=forecast(s_ar,h=360,level=80)
fc_s_ma=forecast(s_ma,h=360,level=80)
fc_s_arma=forecast(s_arma,h=360,level=80)
fc_s_arima=forecast(s_arima,h=360,level=80)
fig15=autoplot(fc_s_ar)
fig16=autoplot(fc_s_ma)
fig17=autoplot(fc_s_arma)
fig18=autoplot(fc_s_arima)
grid.arrange(fig15,fig16,fig17,fig18,nrow=4,ncol=1)

Conclusion

The goal of this independent project is to compare the results of time series analysis and forecast using non-stationary and stationary object. From Ljung-Box tests, it can clearly visible that stationary objects are more stable to use for forecasting instead of non-stationary movements.

The problem with non-stationary data is that for most of the time series models, the model assumptions are violated when non-stationary data is used. This leads to the estimators no longer having the nice properties such as asymptotic normality and sometimes even consistency. So if we apply a model that requires a stationary series to a non-stationary series, you will likely get poor estimates of the model parameters and hence poor forecasts.

Moreover, the auto.arima() function can automatically identify the best ARIMA model fit, so we don’t have to perform trial-and-error to determine the best fit.