From the initial data obtained as many as 131 stock data from the id.investing website. We will divide these data into training data and testing data, where training data is used to create a time series analysis model, while testing data is useful for measuring the accuracy of the prediction model at the end with the MAPE size. 15% of the total data will be taken for testing data, which is approximately 20 data, so that the training data used is the first 111 data. So, the training data used for analysis is as follows:
harga.saham = ts(read.csv("C:/CAMPUS/PROFESSIONAL PREPARATION/Portfolio/RPubs/ARIMA/Copy of Untuk analisis Mastercard(1).csv")[1:111,3])
harga.saham
## Time Series:
## Start = 1
## End = 111
## Frequency = 1
## [1] 284.34 294.97 288.69 302.37 329.47 318.37 339.29 343.69 351.29 360.06
## [11] 348.83 345.84 343.60 347.73 367.67 375.52 376.28 374.03 373.91 366.83
## [21] 361.13 353.12 361.50 347.11 349.66 351.63 363.41 361.47 372.43 375.24
## [31] 380.03 384.66 381.92 385.57 374.37 373.73 369.22 376.53 379.81 393.30
## [41] 388.68 402.51 397.49 392.96 391.35 394.98 392.17 402.89 415.57 414.84
## [51] 414.31 402.22 395.91 397.97 398.03 384.41 364.08 386.05 394.38 400.30
## [61] 412.50 414.36 412.16 418.57 424.10 426.51 419.42 429.10 436.78 438.53
## [71] 460.58 457.88 468.13 473.42 476.63 469.26 475.83 481.67 481.57 477.15
## [81] 465.38 455.39 462.42 443.58 456.98 460.27 451.18 447.07 449.79 444.63
## [91] 454.85 441.16 449.49 440.11 443.69 438.18 462.02 456.78 468.88 466.44
## [101] 483.34 476.12 493.36 492.74 493.64 497.70 502.26 516.34 507.36 508.08
## [111] 524.76
testing = ts(ts(read.csv("C:/CAMPUS/PROFESSIONAL PREPARATION/Portfolio/RPubs/ARIMA/Copy of Untuk analisis Mastercard(1).csv")[112:131,3]),start=112)
Here is the time series plot
ts.plot(harga.saham)
It
can be seen that the data has an upward trend so it is not yet
stationary in the mean and there is a suspicion that it is not yet
stationary in variance. To stationary the data in variance the box-cox
transformation is used as follows.
lambda = BoxCox.lambda(harga.saham,)
lambda
## [1] 1.464438
data.boxcox = BoxCox(harga.saham,lambda)
data.boxcox
## Time Series:
## Start = 1
## End = 111
## Frequency = 1
## [1] 2677.385 2825.268 2737.597 2929.692 3322.187 3159.535 3468.223 3534.299
## [9] 3649.358 3783.574 3611.988 3566.730 3532.944 3595.317 3901.274 4023.878
## [17] 4035.812 4000.515 3998.635 3888.226 3800.054 3677.237 3805.758 3585.931
## [25] 3624.583 3654.533 3835.246 3805.295 3975.474 4019.485 4094.859 4168.136
## [33] 4124.722 4182.587 4005.842 3995.816 3925.387 4039.740 4091.387 4305.975
## [41] 4232.093 4454.463 4373.330 4300.524 4274.741 4332.942 4287.867 4460.624
## [49] 4667.739 4655.734 4647.025 4449.764 4347.893 4381.068 4382.035 4164.169
## [57] 3845.607 4190.216 4323.305 4418.687 4617.320 4647.846 4611.747 4717.175
## [65] 4808.734 4848.810 4731.212 4891.996 5020.767 5050.257 5426.484 5379.956
## [73] 5557.261 5649.477 5705.669 5576.919 5691.649 5794.250 5792.489 5714.789
## [81] 5509.512 5337.160 5458.264 5135.664 5364.475 5421.135 5265.049 5194.952
## [89] 5241.309 5153.479 5327.893 5094.679 5236.190 5076.929 5137.529 5044.354
## [97] 5451.350 5361.037 5570.306 5527.902 5823.697 5696.730 6001.367 5990.324
## [105] 6006.356 6078.845 6160.590 6415.167 6252.424 6265.424 6568.961
## attr(,"lambda")
## [1] 1.464438
ts.plot(data.boxcox)
After our data is boxcox transformed, then the next step is differencing. Because there is no seasonal influence on the time series plot, we do not need seasonal differencing and we immediately differencing 1 and then display the time series plot.
diff.1 = diff(data.boxcox)
ts.plot(diff.1)
After the time series plot appears, the data appears to have no trend. Then we test adf to prove whether the data is stationary in the mean.
adf.test(diff.1)
## Warning in adf.test(diff.1): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff.1
## Dickey-Fuller = -4.6361, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
After testing adf, it turns out that the data is stationary in the mean. Next, let’s check the ACF and PACF plots of the data from differencing 1 to determine the ARIMA order.
acf(diff.1)
pacf(diff.1)
After visualizing the ACF and PACF plots, it is found that the lags in the ACF that come out on the significance limit line are at lags 1 and 2. While in the PACF, only 1. This indicates the possibility of order p = 0,1 (from the PACF lag), d = 1 (from the number of differencing), q = 0,1,2 That is, the approximate model that fits is ARIMA(0,1,1) ARIMA(0,1,2) ARIMA(1,1,0) ARIMA(1,1,1) ARIMA(1,1,2) masing masing model with constant dan without constant Maka kita cek satu per satu model
arima_0.1.1 = Arima(data.boxcox,order=c(0,1,1),include.constant = T)
coeftest(arima_0.1.1)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.145378 0.081078 -1.7931 0.072963 .
## drift 34.900549 12.255308 2.8478 0.004402 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
arima_0.1.2 = Arima(data.boxcox,order=c(0,1,2),include.constant = T)
coeftest(arima_0.1.2)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.172490 0.094605 -1.8233 0.06826 .
## ma2 0.201808 0.096329 2.0950 0.03617 *
## drift 35.033834 14.418223 2.4298 0.01511 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
arima_1.1.0 = Arima(data.boxcox,order=c(1,1,0),include.constant = T)
coeftest(arima_1.1.0)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.19759 0.09465 -2.0876 0.036838 *
## drift 34.80529 11.90956 2.9225 0.003473 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
arima_1.1.1 = Arima(data.boxcox,order=c(1,1,1),include.constant = T)
coeftest(arima_1.1.1)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.91499 0.10862 -8.4235 < 2.2e-16 ***
## ma1 0.80952 0.16495 4.9076 9.219e-07 ***
## drift 35.28742 13.24247 2.6647 0.007705 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
arima_1.1.2 = Arima(data.boxcox,order=c(1,1,2),include.constant = T)
coeftest(arima_1.1.2)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.055789 0.500871 -0.1114 0.91131
## ma1 -0.118641 0.493660 -0.2403 0.81008
## ma2 0.193461 0.126768 1.5261 0.12698
## drift 35.018065 14.263761 2.4550 0.01409 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
arima_0.1.1.noc = Arima(data.boxcox,order=c(0,1,1),include.constant = F)
coeftest(arima_0.1.1.noc)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.096234 0.078753 -1.222 0.2217
arima_0.1.2.noc = Arima(data.boxcox,order=c(0,1,2),include.constant = F)
coeftest(arima_0.1.2.noc)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.138942 0.095639 -1.4528 0.14629
## ma2 0.239682 0.094108 2.5469 0.01087 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
arima_1.1.0.noc = Arima(data.boxcox,order=c(1,1,0),include.constant = F)
coeftest(arima_1.1.0.noc)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.141867 0.096013 -1.4776 0.1395
arima_1.1.1.noc = Arima(data.boxcox,order=c(1,1,1),include.constant = F)
coeftest(arima_0.1.1.noc)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.096234 0.078753 -1.222 0.2217
arima_1.1.2.noc = Arima(data.boxcox,order=c(1,1,2),include.constant = F)
coeftest(arima_0.1.2.noc)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.138942 0.095639 -1.4528 0.14629
## ma2 0.239682 0.094108 2.5469 0.01087 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So from the above results, it is found that the model with all parameters that meet the p-value < 5% is: ARIMA(1,1,0) with constant ARIMA(1,1,1) with constant
Next, we will test the white noise of the two models using the Ljung Box test.
Box.test(arima_1.1.0$residuals,type="Ljung");Box.test(arima_1.1.1$residuals,type="Ljung")
##
## Box-Ljung test
##
## data: arima_1.1.0$residuals
## X-squared = 0.11534, df = 1, p-value = 0.7341
##
## Box-Ljung test
##
## data: arima_1.1.1$residuals
## X-squared = 0.19162, df = 1, p-value = 0.6616
The model meets the Ljung-Box white noise assumption if the p-value> 5%. Then the two models have met the white noise assumption.
Furthermore, the normality of the residuals of the two models will be tested using the Kolmogorov-Smirnov test. Data is normally distributed if the p-value> 5%
ks.test(arima_1.1.0$residuals,"pnorm",mean(arima_1.1.0$residuals),sd(arima_1.1.0$residuals));ks.test(arima_1.1.1$residuals,"pnorm",mean(arima_1.1.1$residuals),sd(arima_1.1.1$residuals))
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: arima_1.1.0$residuals
## D = 0.031806, p-value = 0.9999
## alternative hypothesis: two-sided
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: arima_1.1.1$residuals
## D = 0.048961, p-value = 0.9529
## alternative hypothesis: two-sided
It turns out that the residuals of the two models are normal. Then the best model will be selected with the smallest MSE criteria.
mean(arima_1.1.0$residuals^2)
## [1] 22100.5
mean(arima_1.1.1$residuals^2)
## [1] 21397.65
Judging from the MSE in sample criteria, the ARIMA (1,1,1) with constant model has a smaller MSE so that the model is the best model.
Here is the forecast for the next 20 weeks with the ARIMA (1,1,1) model.
fore.arima = forecast(arima_1.1.1,h=20,lambda=lambda,biasadj=T)$mean
fore.arima
## Time Series:
## Start = 112
## End = 131
## Frequency = 1
## [1] 525.3268 528.4385 529.2198 532.1245 533.0840 535.8150 536.9223 539.5076
## [9] 540.7372 543.2000 544.5310 546.8907 548.3054 550.5782 552.0618 554.2613
## [17] 555.8017 557.9392 559.5260 561.6109
The forecast results are used to obtain the MAPE value as follows
testing
## Time Series:
## Start = 112
## End = 131
## Frequency = 1
## [1] 521.89 520.86 532.94 528.57 529.00 528.03 532.20 521.36 504.67 524.70
## [11] 533.49 555.43 562.75 564.76 557.51 576.31 546.77 527.64 535.69 545.16
fore.arima
## Time Series:
## Start = 112
## End = 131
## Frequency = 1
## [1] 525.3268 528.4385 529.2198 532.1245 533.0840 535.8150 536.9223 539.5076
## [9] 540.7372 543.2000 544.5310 546.8907 548.3054 550.5782 552.0618 554.2613
## [17] 555.8017 557.9392 559.5260 561.6109
accuracy(fore.arima,testing)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -6.307598 16.01024 13.14586 -1.24438 2.456023 0.5979294 1.245837
The MAPE value of 2.456 means that the model is very good for prediction.
summary(arima_1.1.1)
## Series: data.boxcox
## ARIMA(1,1,1) with drift
##
## Coefficients:
## ar1 ma1 drift
## -0.9150 0.8095 35.2874
## s.e. 0.1086 0.1650 13.2425
##
## sigma^2 = 22198: log likelihood = -705.08
## AIC=1418.16 AICc=1418.54 BIC=1428.96
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.009209942 146.2794 115.4179 -0.03541098 2.588829 0.9263809
## ACF1
## Training set -0.0409933
dari rangkuman didapat bahwa nilai ar1 = -0,9150 ma1 = 0,8095 constanta = 35,2874
Thus, the best ARIMA model is ARIMA(1,1,1) with constant