In this report we will do stock forecasting using ARIMA. We will use EMTK(Elang Mahkota Teknologi Tbk PT) stock for this report. EMTK is an Indonesian technology, telecommunication and media conglomerate and headquartered in Jakarta. We are going to use stock price from January 1, 2016 until February 28, 2022.
library(dplyr)
library(ggplot2)
library(quantmod)
library(forecast)
library(TTR)
library(MLmetrics)
library(tseries)
library(highcharter)
library(caTools)
library(rsconnect)We are going to download stock price information from yahoo finance using getSymbols from quantmod library.
emtk6 <- getSymbols("EMTK.JK", src="yahoo", from="2016-01-01", to="2022-02-28", auto.assign = F)
head(emtk6)## EMTK.JK.Open EMTK.JK.High EMTK.JK.Low EMTK.JK.Close EMTK.JK.Volume
## 2016-01-04 1030 1030 1030 1030 0
## 2016-01-05 1000 1000 995 995 25000
## 2016-01-06 995 995 995 995 0
## 2016-01-07 1000 1000 1000 1000 10000
## 2016-01-08 1000 1000 1000 1000 2000
## 2016-01-11 980 1000 950 1000 138000
## EMTK.JK.Adjusted
## 2016-01-04 1015.6310
## 2016-01-05 981.1193
## 2016-01-06 981.1193
## 2016-01-07 986.0496
## 2016-01-08 986.0496
## 2016-01-11 986.0496
The data that we obtain from yahoo finance is is OHLC data. We will use Adjusted Close Price.
In this section we will visualize the stock price movement from 2016 to 2022 using chartSeries from quantmod library.
chartSeries(emtk6, name = "EMTK 2016-2022")We are taking monthly adjusted price and convert to time series object.
ad_emtk <- Ad(to.monthly(emtk6))ts_ad_emtk <- as.ts(ad_emtk, start=c(2016,1))Decomposing monthly adjusted price using additive type due to no multiplication pattern in stock price chart.
dc_ad_emtk <- decompose(ts_ad_emtk,type = "additive")
plot(dc_ad_emtk)Looking at decomposition above we can see that:
dc_ad_emtk$seasonal## Jan Feb Mar Apr May Jun
## 2016 66.51741 29.07267 24.41173 57.94734 60.49764 76.63931
## 2017 66.51741 29.07267 24.41173 57.94734 60.49764 76.63931
## 2018 66.51741 29.07267 24.41173 57.94734 60.49764 76.63931
## 2019 66.51741 29.07267 24.41173 57.94734 60.49764 76.63931
## 2020 66.51741 29.07267 24.41173 57.94734 60.49764 76.63931
## 2021 66.51741 29.07267 24.41173 57.94734 60.49764 76.63931
## 2022 66.51741 29.07267
## Jul Aug Sep Oct Nov Dec
## 2016 81.01844 -67.96790 -58.43586 -122.03846 -135.74797 -11.91436
## 2017 81.01844 -67.96790 -58.43586 -122.03846 -135.74797 -11.91436
## 2018 81.01844 -67.96790 -58.43586 -122.03846 -135.74797 -11.91436
## 2019 81.01844 -67.96790 -58.43586 -122.03846 -135.74797 -11.91436
## 2020 81.01844 -67.96790 -58.43586 -122.03846 -135.74797 -11.91436
## 2021 81.01844 -67.96790 -58.43586 -122.03846 -135.74797 -11.91436
## 2022
Below are interactive chart using highchart using highcharter library. We are going to compare in this interactive chart the price movement with Single Moving Average plot. We can use this plot to highlight some important technical analysis that affect our decision, such as golden or death cross. In other word, the golden cross is considered as a good time to buy stock and the death cross is considered as a good time to sell stock.
highchart(type="stock") %>%
hc_add_series(emtk6) %>%
hc_add_series(SMA(na.omit(Ad(emtk6)),n=50),name="SMA(50)") %>%
hc_add_series(SMA(na.omit(Ad(emtk6)),n=200),name="SMA(200)") %>%
hc_title(text="<b>EMTK Price Candle Stick Chart 2016-2022</b>")We also want to compare EMTK with other similar companies in the same industries also comparing with JSKE as benchmark:
We are using getSymbols to download price data. We are going to use data from November 1, 2019 until 2022 due to difference in availability in stock price for DMMX and MCAS.
emtk_r <- getSymbols("EMTK.JK", src="yahoo", from="2019-11-01", to="2022-02-28", auto.assign = F)
dmmx <- getSymbols("DMMX.JK", src="yahoo", from="2019-11-01", to="2022-02-28", auto.assign = F)
mcas <- getSymbols("MCAS.JK", src="yahoo", from="2019-11-01", to="2022-02-28", auto.assign = F)
jkse <- getSymbols("^JKSE", src="yahoo", from="2019-11-01", to="2022-02-28", auto.assign = F)We plot those stocks with highchart.
highchart(type="stock") %>%
hc_add_series(Ad(emtk_r), name="EMTK") %>%
hc_add_series(na.omit(Ad(dmmx)),name="DMMX") %>%
hc_add_series(na.omit(Ad(mcas)),name="MCAS") %>%
hc_add_series(na.omit(Ad(jkse)),name="JKSE") %>%
hc_title(text="<b>EMTK vs DMMX vs MCAS vs IHSG 2016-2022</b>")As we can see from the comparison chart above, that MCAS and DMMX performed better than EMTK. Even MCAS cab outperform JKSE. Also we can see that MCAS and EMTK has upward trend after August 2021.
Beside stock price, we also want to compare the return between three stocks daily using adjusted closed price, with dailyReturn from quantmod library.
ret_emtk_r <- dailyReturn(Ad(emtk_r))
ret_dmmx <- dailyReturn(Ad(dmmx))
ret_mcas <- dailyReturn(Ad(mcas))Combining return of three stocks above.
returns <- data.frame(ret_emtk_r,ret_dmmx,ret_mcas)
names(returns) <- c("ret_emtk","ret_dmmx","ret_mcas")
returns <- as.xts(returns)We compare return using chart below.
library(PerformanceAnalytics)## Warning: package 'PerformanceAnalytics' was built under R version 4.1.3
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
charts.PerformanceSummary(returns,main="Daily Return EMTK vs DMMX vs MCAS 2019-2022")From chart above, we can see that EMTK performs better than MCAS. And DMMX outperforms other two stocks.
In this part, we are going to use ARIMA to forecast EMTK stock price.
Before we split the data, we are going to change the price to log so it will be equally scaled (in case of multivariate time series) and the analysis will be easier.
emtk6_log_ad <- log(Ad(emtk6))Below are plot of log price.
plot(emtk6_log_ad, main="EMTK 2016-2022 (Log)")We will check the autocorrelation using acf and partial autocorrelation with pacf.
acf(emtk6_log_ad ,lag.max=380)pacf(emtk6_log_ad ,lag.max=380)Given by the ACF and PACF above, we can see that the data shows strong and significant autocorrelation up to lag 210 and then the autocorrelation starts reversing. For the PACF, significant autocorrelation appear in lag 2 and 3 and then the autocorrelation starts oscillating around 0. This is the sign of certain trend, but we are unsure whether the data has seasonality or not, given that the PACF does not have any significant seasonal pattern. Therefore we conclude that EMTK stock price is non-stationary.
Next thing we are going check autocorrelation using ADF Test.
adf.test(emtk6_log_ad)##
## Augmented Dickey-Fuller Test
##
## data: emtk6_log_ad
## Dickey-Fuller = -0.85894, Lag order = 11, p-value = 0.9564
## alternative hypothesis: stationary
We can see that p-value is 0.9564 which indicates autocorrelation, so we need to difference our data as shown by code below.
emtk_lgdiff <- diff(emtk6_log_ad,lag=1)adf.test(emtk_lgdiff[-1,])## Warning in adf.test(emtk_lgdiff[-1, ]): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: emtk_lgdiff[-1, ]
## Dickey-Fuller = -12.278, Lag order = 11, p-value = 0.01
## alternative hypothesis: stationary
After applying difference of 1, we can see p-value is below alpha (0.05), which indicates no autocorrelation.
emtk_lgdiff <- emtk_lgdiff[-1,]We are going to split training and test data.
n=100
tsdiff_emtk <- tail(emtk_lgdiff,n)
trdiff_emtk <- head(emtk_lgdiff,-n)
ts_emtk <- tail(emtk6_log_ad,n)
tr_emtk <- head(emtk6_log_ad,-n)Making ARIMA model using auto.
ar_emtk <- auto.arima(trdiff_emtk, stationary = T, trace=T)##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,0,2) with non-zero mean : -5864.683
## ARIMA(0,0,0) with non-zero mean : -5854.343
## ARIMA(1,0,0) with non-zero mean : -5860.727
## ARIMA(0,0,1) with non-zero mean : -5862.13
## ARIMA(0,0,0) with zero mean : -5856.206
## ARIMA(1,0,2) with non-zero mean : -5867.573
## ARIMA(0,0,2) with non-zero mean : -5869.386
## ARIMA(0,0,3) with non-zero mean : -5867.373
## ARIMA(1,0,1) with non-zero mean : -5867.129
## ARIMA(1,0,3) with non-zero mean : -5865.557
## ARIMA(0,0,2) with zero mean : -5871.181
## ARIMA(0,0,1) with zero mean : -5863.962
## ARIMA(1,0,2) with zero mean : -5869.328
## ARIMA(0,0,3) with zero mean : -5869.17
## ARIMA(1,0,1) with zero mean : -5868.844
## ARIMA(1,0,3) with zero mean : -5867.33
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(0,0,2) with zero mean : -5871.177
##
## Best model: ARIMA(0,0,2) with zero mean
summary(ar_emtk)## Series: trdiff_emtk
## ARIMA(0,0,2) with zero mean
##
## Coefficients:
## ma1 ma2
## -0.0828 -0.0807
## s.e. 0.0262 0.0265
##
## sigma^2 = 0.001007: log likelihood = 2938.6
## AIC=-5871.19 AICc=-5871.18 BIC=-5855.36
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.0003842756 0.0317085 0.01642078 NaN Inf 0.6374323 -0.0001137144
With auto.arima the solution with difference data is (0,0,2) for (p,d,q).
We are going to fit the model to training dataset.
forecast_0 <- forecast(ar_emtk,n)
plot(forecast_0)checkresiduals(ar_emtk)##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,2) with zero mean
## Q* = 13.998, df = 8, p-value = 0.08181
##
## Model df: 2. Total lags used: 10
Box.test(ar_emtk$residuals, type = "Ljung-Box")##
## Box-Ljung test
##
## data: ar_emtk$residuals
## X-squared = 1.8737e-05, df = 1, p-value = 0.9965
Here our forecast for 100 days ahead shows straight line. The p-value from Ljung Box test shows that the model residuals are non-autocorrelated, which indicates no heterocedasticity. The residuals of the model should follow normal distribution and stationary, this is the indication that the arima model fits the data well.
In this section we will generate arima from data that hasn’t been differenced and compare result with auto.arima. We are going to p,d,q for (0,1,2).
arima_emtk <- arima(tr_emtk, order=c(0,1,2))
summary(arima_emtk)##
## Call:
## arima(x = tr_emtk, order = c(0, 1, 2))
##
## Coefficients:
## ma1 ma2
## -0.0828 -0.0807
## s.e. 0.0262 0.0265
##
## sigma^2 estimated as 0.001005: log likelihood = 2938.6, aic = -5871.19
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0003887955 0.03169807 0.01641421 0.004147573 0.2411852 1.038112
## ACF1
## Training set -0.0002698326
forecast_1 <- forecast(arima_emtk, n)checkresiduals(arima_emtk)##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,2)
## Q* = 14.008, df = 8, p-value = 0.08156
##
## Model df: 2. Total lags used: 10
Box.test(arima_emtk$residuals, type = "Ljung-Box")##
## Box-Ljung test
##
## data: arima_emtk$residuals
## X-squared = 0.00010557, df = 1, p-value = 0.9918
Here our forecast for 100 days using ARIMA(0,1,2) generate almost same result. The p-value from Ljung Box test shows that the model residuals are non-autocorrelated, which indicates no heterocedasticity. The residuals of the model still in normal distribution and stationary. So the model is as good as auto.arima.
forecast_1 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")After using auto.arima we want to try generate ARIMA with ACF and PACF of differenced data as shown by plot below.
diff(emtk6_log_ad,lag=1) %>% tsdisplay()From plot above we can extract the following information for (p,d,q) as follow:
Generating arima using above information
arima_emtk_010 <- arima(tr_emtk, order=c(0,1,0))
arima_emtk_011 <- arima(tr_emtk, order=c(0,1,1))
arima_emtk_012 <- arima(tr_emtk, order=c(0,1,2))
arima_emtk_110 <- arima(tr_emtk, order=c(1,1,0))
arima_emtk_111 <- arima(tr_emtk, order=c(1,1,1))
arima_emtk_210 <- arima(tr_emtk, order=c(2,1,0))
arima_emtk_211 <- arima(tr_emtk, order=c(2,1,1))
arima_emtk_212 <- arima(tr_emtk, order=c(2,1,2))Here we compare aic of each models to determine best model using manual arima and compared to auto arima model.
arima_emtk_010$aic## [1] -5856.209
arima_emtk_011$aic## [1] -5863.972
arima_emtk_012$aic## [1] -5871.193
arima_emtk_110$aic## [1] -5862.375
arima_emtk_111$aic## [1] -5869.425
arima_emtk_210$aic## [1] -5870.63
arima_emtk_211$aic## [1] -5869.069
arima_emtk_212$aic## [1] -5867.35
We can see from aic above that arima_emtk_012 generate same aic with arima_emtk model and also the lowest aic. So the best model for this analysis is still ARIMA with p,d,q of (0,1,2).
In this section will visualize forecast using manual arima models that we made above.
forecast_2 <- forecast(arima_emtk_110, n)
forecast_3 <- forecast(arima_emtk_111, n)
forecast_4 <- forecast(arima_emtk_210, n)
forecast_5 <- forecast(arima_emtk_211, n)
forecast_6 <- forecast(arima_emtk_212, n)
forecast_7 <- forecast(arima_emtk_010, n)
forecast_8 <- forecast(arima_emtk_011, n)
forecast_9 <- forecast(arima_emtk_012, n)forecast_2 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")forecast_3 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")forecast_4 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")forecast_5 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")forecast_6 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")forecast_7 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")forecast_8 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")forecast_9 %>% autoplot() + autolayer(ts(ts_emtk,start=(length(tr_emtk))), series="test")After building model above, the ARIMA(0,1,2) is till the best compared to other manually generated ARIMA model. Eventhough the model is descent, as you can see the prediction is a flat line, it is because ARIMA cannot capture random data movement. If we want to improve the mode, we need to try other models like VAR, GARCH (volatiliry clustering). This model building should be seen as interactive process until we can find the perfect model that could predict the movement of financial data, instead of seen it as final step.