1. Introduction

In this report we will do stock forecasting using ARIMA. We will use EMTK(Elang Mahkota Teknologi Tbk PT) stock for this report. EMTK is an Indonesian technology, telecommunication and media conglomerate and headquartered in Jakarta. We are going to use stock price from January 1, 2016 until February 28, 2022.

2. Library and Data Input

library(dplyr)
library(ggplot2)
library(quantmod)
library(forecast)
library(TTR) 
library(MLmetrics)
library(tseries)
library(highcharter)
library(caTools)
library(rsconnect)

We are going to download stock price information from yahoo finance using getSymbols from quantmod library.

emtk6 <- getSymbols("EMTK.JK", src="yahoo", from="2016-01-01", to="2022-02-28", auto.assign = F)
head(emtk6)
##            EMTK.JK.Open EMTK.JK.High EMTK.JK.Low EMTK.JK.Close EMTK.JK.Volume
## 2016-01-04         1030         1030        1030          1030              0
## 2016-01-05         1000         1000         995           995          25000
## 2016-01-06          995          995         995           995              0
## 2016-01-07         1000         1000        1000          1000          10000
## 2016-01-08         1000         1000        1000          1000           2000
## 2016-01-11          980         1000         950          1000         138000
##            EMTK.JK.Adjusted
## 2016-01-04        1015.6310
## 2016-01-05         981.1193
## 2016-01-06         981.1193
## 2016-01-07         986.0496
## 2016-01-08         986.0496
## 2016-01-11         986.0496

The data that we obtain from yahoo finance is is OHLC data. We will use Adjusted Close Price.

3. Stock Price Chart

In this section we will visualize the stock price movement from 2016 to 2022 using chartSeries from quantmod library.

chartSeries(emtk6, name = "EMTK 2016-2022")

We are taking monthly adjusted price and convert to time series object.

ad_emtk <- Ad(to.monthly(emtk6))
ts_ad_emtk <- as.ts(ad_emtk, start=c(2016,1))

Decomposing monthly adjusted price using additive type due to no multiplication pattern in stock price chart.

dc_ad_emtk <- decompose(ts_ad_emtk,type = "additive")
plot(dc_ad_emtk)

Looking at decomposition above we can see that:

  • the trend tends to rise but is preceded by a slight dip before huge increase in the beginning of 2021.
  • for seasonal, EMTK price increase in January, and May - July. In August - November where the price dip and regain the position from December-January.
dc_ad_emtk$seasonal
##             Jan        Feb        Mar        Apr        May        Jun
## 2016   66.51741   29.07267   24.41173   57.94734   60.49764   76.63931
## 2017   66.51741   29.07267   24.41173   57.94734   60.49764   76.63931
## 2018   66.51741   29.07267   24.41173   57.94734   60.49764   76.63931
## 2019   66.51741   29.07267   24.41173   57.94734   60.49764   76.63931
## 2020   66.51741   29.07267   24.41173   57.94734   60.49764   76.63931
## 2021   66.51741   29.07267   24.41173   57.94734   60.49764   76.63931
## 2022   66.51741   29.07267                                            
##             Jul        Aug        Sep        Oct        Nov        Dec
## 2016   81.01844  -67.96790  -58.43586 -122.03846 -135.74797  -11.91436
## 2017   81.01844  -67.96790  -58.43586 -122.03846 -135.74797  -11.91436
## 2018   81.01844  -67.96790  -58.43586 -122.03846 -135.74797  -11.91436
## 2019   81.01844  -67.96790  -58.43586 -122.03846 -135.74797  -11.91436
## 2020   81.01844  -67.96790  -58.43586 -122.03846 -135.74797  -11.91436
## 2021   81.01844  -67.96790  -58.43586 -122.03846 -135.74797  -11.91436
## 2022

Below are interactive chart using highchart using highcharter library. We are going to compare in this interactive chart the price movement with Single Moving Average plot. We can use this plot to highlight some important technical analysis that affect our decision, such as golden or death cross. In other word, the golden cross is considered as a good time to buy stock and the death cross is considered as a good time to sell stock.

highchart(type="stock") %>% 
  hc_add_series(emtk6) %>% 
  hc_add_series(SMA(na.omit(Ad(emtk6)),n=50),name="SMA(50)") %>% 
  hc_add_series(SMA(na.omit(Ad(emtk6)),n=200),name="SMA(200)") %>% 
  hc_title(text="<b>EMTK Price Candle Stick Chart 2016-2022</b>")

We also want to compare EMTK with other similar companies in the same industries also comparing with JSKE as benchmark:

  • DMMX (Digital Mediatama Maxima PT)
  • MCAS (M Cash Integrasi PT)
  • JKSE (Jakarta Composite Index)

We are using getSymbols to download price data. We are going to use data from November 1, 2019 until 2022 due to difference in availability in stock price for DMMX and MCAS.

emtk_r <- getSymbols("EMTK.JK", src="yahoo", from="2019-11-01", to="2022-02-28", auto.assign = F)
dmmx <- getSymbols("DMMX.JK", src="yahoo", from="2019-11-01", to="2022-02-28", auto.assign = F)
mcas <- getSymbols("MCAS.JK", src="yahoo", from="2019-11-01", to="2022-02-28", auto.assign = F)
jkse <- getSymbols("^JKSE", src="yahoo", from="2019-11-01", to="2022-02-28", auto.assign = F)

We plot those stocks with highchart.

highchart(type="stock") %>% 
  hc_add_series(Ad(emtk_r), name="EMTK") %>% 
  hc_add_series(na.omit(Ad(dmmx)),name="DMMX") %>% 
  hc_add_series(na.omit(Ad(mcas)),name="MCAS") %>%
  hc_add_series(na.omit(Ad(jkse)),name="JKSE") %>%
  hc_title(text="<b>EMTK vs DMMX vs MCAS vs IHSG 2016-2022</b>")

As we can see from the comparison chart above, that MCAS and DMMX performed better than EMTK. Even MCAS cab outperform JKSE. Also we can see that MCAS and EMTK has upward trend after August 2021.

Beside stock price, we also want to compare the return between three stocks daily using adjusted closed price, with dailyReturn from quantmod library.

ret_emtk_r <- dailyReturn(Ad(emtk_r))
ret_dmmx <- dailyReturn(Ad(dmmx))
ret_mcas <- dailyReturn(Ad(mcas))

Combining return of three stocks above.

returns <- data.frame(ret_emtk_r,ret_dmmx,ret_mcas)
names(returns) <- c("ret_emtk","ret_dmmx","ret_mcas")
returns <- as.xts(returns)

We compare return using chart below.

library(PerformanceAnalytics)
## Warning: package 'PerformanceAnalytics' was built under R version 4.1.3
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend
charts.PerformanceSummary(returns,main="Daily Return EMTK vs DMMX vs MCAS 2019-2022")

From chart above, we can see that EMTK performs better than MCAS. And DMMX outperforms other two stocks.

4 ARIMA Forecasting

In this part, we are going to use ARIMA to forecast EMTK stock price.

4.1 Splitting Data for ARIMA

Before we split the data, we are going to change the price to log so it will be equally scaled (in case of multivariate time series) and the analysis will be easier.

emtk6_log_ad <- log(Ad(emtk6))

Below are plot of log price.

plot(emtk6_log_ad, main="EMTK 2016-2022 (Log)")

We will check the autocorrelation using acf and partial autocorrelation with pacf.

acf(emtk6_log_ad ,lag.max=380)

pacf(emtk6_log_ad ,lag.max=380)

Given by the ACF and PACF above, we can see that the data shows strong and significant autocorrelation up to lag 210 and then the autocorrelation starts reversing. For the PACF, significant autocorrelation appear in lag 2 and 3 and then the autocorrelation starts oscillating around 0. This is the sign of certain trend, but we are unsure whether the data has seasonality or not, given that the PACF does not have any significant seasonal pattern. Therefore we conclude that EMTK stock price is non-stationary.

Next thing we are going check autocorrelation using ADF Test.

adf.test(emtk6_log_ad)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  emtk6_log_ad
## Dickey-Fuller = -0.85894, Lag order = 11, p-value = 0.9564
## alternative hypothesis: stationary

We can see that p-value is 0.9564 which indicates autocorrelation, so we need to difference our data as shown by code below.

emtk_lgdiff <- diff(emtk6_log_ad,lag=1)
adf.test(emtk_lgdiff[-1,])
## Warning in adf.test(emtk_lgdiff[-1, ]): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  emtk_lgdiff[-1, ]
## Dickey-Fuller = -12.278, Lag order = 11, p-value = 0.01
## alternative hypothesis: stationary

After applying difference of 1, we can see p-value is below alpha (0.05), which indicates no autocorrelation.

emtk_lgdiff <- emtk_lgdiff[-1,]

We are going to split training and test data.

n=100
tsdiff_emtk <- tail(emtk_lgdiff,n)
trdiff_emtk <- head(emtk_lgdiff,-n)
ts_emtk <- tail(emtk6_log_ad,n)
tr_emtk <- head(emtk6_log_ad,-n)

4.2 ARIMA Model

4.2.1 Auto Process

Making ARIMA model using auto.

ar_emtk <- auto.arima(trdiff_emtk, stationary = T, trace=T)
## 
##  Fitting models using approximations to speed things up...
## 
##  ARIMA(2,0,2) with non-zero mean : -5864.683
##  ARIMA(0,0,0) with non-zero mean : -5854.343
##  ARIMA(1,0,0) with non-zero mean : -5860.727
##  ARIMA(0,0,1) with non-zero mean : -5862.13
##  ARIMA(0,0,0) with zero mean     : -5856.206
##  ARIMA(1,0,2) with non-zero mean : -5867.573
##  ARIMA(0,0,2) with non-zero mean : -5869.386
##  ARIMA(0,0,3) with non-zero mean : -5867.373
##  ARIMA(1,0,1) with non-zero mean : -5867.129
##  ARIMA(1,0,3) with non-zero mean : -5865.557
##  ARIMA(0,0,2) with zero mean     : -5871.181
##  ARIMA(0,0,1) with zero mean     : -5863.962
##  ARIMA(1,0,2) with zero mean     : -5869.328
##  ARIMA(0,0,3) with zero mean     : -5869.17
##  ARIMA(1,0,1) with zero mean     : -5868.844
##  ARIMA(1,0,3) with zero mean     : -5867.33
## 
##  Now re-fitting the best model(s) without approximations...
## 
##  ARIMA(0,0,2) with zero mean     : -5871.177
## 
##  Best model: ARIMA(0,0,2) with zero mean
summary(ar_emtk)
## Series: trdiff_emtk 
## ARIMA(0,0,2) with zero mean 
## 
## Coefficients:
##           ma1      ma2
##       -0.0828  -0.0807
## s.e.   0.0262   0.0265
## 
## sigma^2 = 0.001007:  log likelihood = 2938.6
## AIC=-5871.19   AICc=-5871.18   BIC=-5855.36
## 
## Training set error measures:
##                        ME      RMSE        MAE MPE MAPE      MASE          ACF1
## Training set 0.0003842756 0.0317085 0.01642078 NaN  Inf 0.6374323 -0.0001137144

With auto.arima the solution with difference data is (0,0,2) for (p,d,q).

We are going to fit the model to training dataset.

forecast_0 <- forecast(ar_emtk,n)
plot(forecast_0)

checkresiduals(ar_emtk)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,2) with zero mean
## Q* = 13.998, df = 8, p-value = 0.08181
## 
## Model df: 2.   Total lags used: 10
Box.test(ar_emtk$residuals, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  ar_emtk$residuals
## X-squared = 1.8737e-05, df = 1, p-value = 0.9965

Here our forecast for 100 days ahead shows straight line. The p-value from Ljung Box test shows that the model residuals are non-autocorrelated, which indicates no heterocedasticity. The residuals of the model should follow normal distribution and stationary, this is the indication that the arima model fits the data well.

In this section we will generate arima from data that hasn’t been differenced and compare result with auto.arima. We are going to p,d,q for (0,1,2).

arima_emtk <- arima(tr_emtk, order=c(0,1,2))
summary(arima_emtk)
## 
## Call:
## arima(x = tr_emtk, order = c(0, 1, 2))
## 
## Coefficients:
##           ma1      ma2
##       -0.0828  -0.0807
## s.e.   0.0262   0.0265
## 
## sigma^2 estimated as 0.001005:  log likelihood = 2938.6,  aic = -5871.19
## 
## Training set error measures:
##                        ME       RMSE        MAE         MPE      MAPE     MASE
## Training set 0.0003887955 0.03169807 0.01641421 0.004147573 0.2411852 1.038112
##                       ACF1
## Training set -0.0002698326
forecast_1 <- forecast(arima_emtk, n)
checkresiduals(arima_emtk)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,2)
## Q* = 14.008, df = 8, p-value = 0.08156
## 
## Model df: 2.   Total lags used: 10
Box.test(arima_emtk$residuals, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  arima_emtk$residuals
## X-squared = 0.00010557, df = 1, p-value = 0.9918

Here our forecast for 100 days using ARIMA(0,1,2) generate almost same result. The p-value from Ljung Box test shows that the model residuals are non-autocorrelated, which indicates no heterocedasticity. The residuals of the model still in normal distribution and stationary. So the model is as good as auto.arima.

forecast_1 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")

4.2.2 Manual ARIMA

After using auto.arima we want to try generate ARIMA with ACF and PACF of differenced data as shown by plot below.

diff(emtk6_log_ad,lag=1) %>% tsdisplay()

From plot above we can extract the following information for (p,d,q) as follow:

  • P: 0/1/2
  • D: 1
  • Q: 0/1/2

Generating arima using above information

arima_emtk_010 <- arima(tr_emtk, order=c(0,1,0))
arima_emtk_011 <- arima(tr_emtk, order=c(0,1,1))
arima_emtk_012 <- arima(tr_emtk, order=c(0,1,2))
arima_emtk_110 <- arima(tr_emtk, order=c(1,1,0))
arima_emtk_111 <- arima(tr_emtk, order=c(1,1,1))
arima_emtk_210 <- arima(tr_emtk, order=c(2,1,0))
arima_emtk_211 <- arima(tr_emtk, order=c(2,1,1))
arima_emtk_212 <- arima(tr_emtk, order=c(2,1,2))

Here we compare aic of each models to determine best model using manual arima and compared to auto arima model.

arima_emtk_010$aic
## [1] -5856.209
arima_emtk_011$aic
## [1] -5863.972
arima_emtk_012$aic
## [1] -5871.193
arima_emtk_110$aic
## [1] -5862.375
arima_emtk_111$aic
## [1] -5869.425
arima_emtk_210$aic
## [1] -5870.63
arima_emtk_211$aic
## [1] -5869.069
arima_emtk_212$aic
## [1] -5867.35

We can see from aic above that arima_emtk_012 generate same aic with arima_emtk model and also the lowest aic. So the best model for this analysis is still ARIMA with p,d,q of (0,1,2).

In this section will visualize forecast using manual arima models that we made above.

forecast_2 <- forecast(arima_emtk_110, n)
forecast_3 <- forecast(arima_emtk_111, n)
forecast_4 <- forecast(arima_emtk_210, n)
forecast_5 <- forecast(arima_emtk_211, n)
forecast_6 <- forecast(arima_emtk_212, n)
forecast_7 <- forecast(arima_emtk_010, n)
forecast_8 <- forecast(arima_emtk_011, n)
forecast_9 <- forecast(arima_emtk_012, n)
forecast_2 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")

forecast_3 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")

forecast_4 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")

forecast_5 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")

forecast_6 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")

forecast_7 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")

forecast_8 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")

forecast_9 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")

5. Conclusion

After building model above, the ARIMA(0,1,2) is till the best compared to other manually generated ARIMA model. Eventhough the model is descent, as you can see the prediction is a flat line, it is because ARIMA cannot capture random data movement. If we want to improve the mode, we need to try other models like VAR, GARCH (volatiliry clustering). This model building should be seen as interactive process until we can find the perfect model that could predict the movement of financial data, instead of seen it as final step.