Shares are investment instruments that signify a person’s or entity’s ownership in a company. By purchasing a stock, the owner of the stock (often referred to as an investor or shareholder) is entitled to a portion of the company’s assets and profits according to the number of shares owned, the problem we will solve is to forecast/predict the stock price for the next 3 months. The library we need includes:
library(dplyr) # data wrangling
library(lubridate) # date manipulation
library(padr) # complete data frame
library(zoo) # Missing value imputation
library(forecast) # time series library
library(TTR) # for Simple moving average function
library(MLmetrics) # calculate error
library(tseries) # adf.test
library(fpp) # data for forecasting: principles and practice
library(TSstudio) # mempercantik visualisasi timeseries
library(ggplot2)
library(tidyr)
library(quantmod)
library(xts)With the help of the library (quantmod) we will do scraping first, the stock data that we will scrape with code BBCA on yfinance:
getSymbols("BBCA.JK", src = "yahoo", from = "2020-10-01", to = Sys.Date())#> [1] "BBCA.JK"
BBCA.JK <- data.frame(Date = index(BBCA.JK), coredata(BBCA.JK))
head(BBCA.JK)We will save it into a data frame
write.csv(BBCA.JK, "data_saham_BBCA.csv", row.names = FALSE)We will call the file we saved again
saham_bbca <- read.csv("data_saham_BBCA.csv")
head(saham_bbca)At this stage we will check and adjust whether the data we have has
met the characteristics of time series data. At this stage we will
select data for Date and BBCA.JK.Close. some
columns that we delete such as BBCA.JK.Open,
BBCA.JK.High, BBCA.JK.Low,
BBCA.JK.Volume and BBCA.JK.Adjusted.
bbca_clean <- saham_bbca %>% select(Date,BBCA.JK.Close) %>% mutate(Date = ymd(Date))
all(seq.Date(from = min(bbca_clean$Date), to = max(bbca_clean$Date), by = "day")==bbca_clean$Date)#> [1] FALSE
Because there is a missed day then we will do padding
bbca_clean <- bbca_clean %>% pad()Check Missing values
anyNA(bbca_clean)#> [1] TRUE
There are missing values because we did padding, we will fill the missing values with the previous value.
bbca_clean <- bbca_clean %>% mutate(BBCA.JK.Close = na.locf(BBCA.JK.Close))
anyNA(bbca_clean)#> [1] FALSE
head(bbca_clean)names(bbca_clean) <- c("ds", "y")From the above steps, we can conclude that there is no missing time series data and no missing values.
With the data above, we will create a time series for the next 1 year.
bbca_ts <- ts(data = bbca_clean$y,
start = c(2020,10),
frequency = 365)
bbca_ts#> Time Series:
#> Start = c(2020, 10)
#> End = c(2024, 35)
#> Frequency = 365
#> [1] 5570 5505 5505 5505 5520 5700 5755 5780 5775 5775 5775 5855
#> [13] 5855 5900 5785 5760 5760 5760 5900 5805 5780 5800 5770 5770
#> [25] 5770 5815 5790 5790 5790 5790 5790 5790 5820 5890 5820 6150
#> [37] 6300 6300 6300 6285 6480 6540 6420 6390 6390 6390 6480 6550
#> [49] 6570 6615 6600 6600 6600 6600 6565 6410 6480 6385 6385 6385
#> [61] 6205 6395 6450 6460 6390 6390 6390 6520 6490 6490 6575 6735
#> [73] 6735 6735 6820 6790 6950 6935 6800 6800 6800 6830 6715 6725
#> [85] 6725 6725 6725 6725 6780 6765 6770 6770 6770 6770 6770 6835
#> [97] 7090 6945 6965 7050 7050 7050 7345 7160 7120 7020 6955 6955
#> [109] 6955 7120 7095 7095 7075 7080 7080 7080 7035 6820 6790 6900
#> [121] 6760 6760 6760 6820 6800 6825 6855 6915 6915 6915 6920 6980
#> [133] 6920 6880 6880 6880 6880 6800 6940 6900 6735 6825 6825 6825
#> [145] 6790 6825 6725 6705 6710 6710 6710 7045 7015 7000 6720 6800
#> [157] 6800 6800 6720 6605 6705 6705 6765 6765 6765 6665 6625 6610
#> [169] 6705 6760 6760 6760 6620 6565 6440 6370 6415 6415 6415 6360
#> [181] 6395 6215 6225 6225 6225 6225 6155 6165 6250 6135 6200 6200
#> [193] 6200 6065 6000 6305 6280 6275 6275 6275 6270 6235 6165 6215
#> [205] 6390 6390 6390 6285 6405 6320 6410 6405 6405 6405 6390 6400
#> [217] 6425 6425 6400 6400 6400 6420 6480 6480 6480 6480 6480 6480
#> [229] 6500 6390 6345 6380 6360 6360 6360 6325 6355 6355 6270 6340
#> [241] 6340 6340 6375 6375 6465 6600 6580 6580 6580 6530 6430 6530
#> [253] 6620 6470 6470 6470 6410 6470 6385 6330 6325 6325 6325 6250
#> [265] 6340 6230 6210 6190 6190 6190 6055 6045 6025 6025 6100 6100
#> [277] 6100 6110 6055 6065 6015 6020 6020 6020 6170 6045 5990 6115
#> [289] 6110 6110 6110 6005 6005 6010 6155 6035 6035 6035 6005 6005
#> [301] 5980 6040 5970 5970 5970 5960 6145 6120 6300 6160 6160 6160
#> [313] 6200 6300 6300 6300 6410 6410 6410 6420 6420 6600 6600 6600
#> [325] 6600 6600 6590 6600 6600 6560 6510 6510 6510 6565 6550 6565
#> [337] 6540 6600 6600 6600 6575 6570 6440 6570 6520 6520 6520 6555
#> [349] 6545 6495 6500 6520 6520 6520 6585 6490 6555 6580 6585 6585
#> [361] 6585 6580 6520 6580 7000 6760 6760 6760 6960 6945 7180 7160
#> [373] 7290 7290 7290 7255 7320 7525 7750 7650 7650 7650 7525 7500
#> [385] 7500 7400 7525 7525 7525 7525 7525 7450 7375 7475 7475 7475
#> [397] 7400 7300 7350 7375 7450 7450 7450 7575 7675 7650 7675 7525
#> [409] 7525 7525 7500 7475 7575 7400 7425 7425 7425 7475 7475 7475
#> [421] 7425 7275 7275 7275 7400 7275 7300 7500 7375 7375 7375 7350
#> [433] 7350 7425 7350 7375 7375 7375 7300 7300 7300 7275 7500 7500
#> [445] 7500 7375 7375 7325 7300 7300 7300 7300 7350 7350 7300 7300
#> [457] 7300 7300 7300 7325 7400 7450 7475 7650 7650 7650 7600 7700
#> [469] 7700 7700 7850 7850 7850 7750 7675 7675 7775 7950 7950 7950
#> [481] 7800 7775 7700 7800 7775 7775 7775 7625 7625 7800 7725 7725
#> [493] 7725 7725 7800 7725 7950 7750 7825 7825 7825 7700 7875 7975
#> [505] 7900 7925 7925 7925 7950 7900 8050 8000 8050 8050 8050 8050
#> [517] 8050 7975 7975 7900 7900 7900 7700 7650 7850 7925 7950 7950
#> [529] 7950 8075 8150 8200 8000 7900 7900 7900 7900 7925 7900 7925
#> [541] 7950 7950 7950 7900 7850 7875 7975 7925 7925 7925 7900 7900
#> [553] 7750 7750 7850 7850 7850 7725 7800 7800 7700 7700 7700 7700
#> [565] 7700 7625 7650 7925 7875 7875 7875 8000 8125 8200 8125 8125
#> [577] 8125 8125 8125 8125 8125 8125 8125 8125 8125 7600 7525 7650
#> [589] 7275 7325 7325 7325 7325 7400 7575 7450 7400 7400 7400 7375
#> [601] 7350 7375 7375 7575 7575 7575 7575 7750 7750 7575 7600 7600
#> [613] 7600 7450 7375 7600 7500 7350 7350 7350 7350 7400 7325 7575
#> [625] 7500 7500 7500 7625 7650 7500 7525 7475 7475 7475 7350 7300
#> [637] 7275 7250 7250 7250 7250 7050 7250 7300 7100 7150 7150 7150
#> [649] 7125 7175 7000 7025 7000 7000 7000 7150 7175 7400 7400 7325
#> [661] 7325 7325 7300 7300 7325 7350 7350 7350 7350 7500 7600 7625
#> [673] 7800 7875 7875 7875 7875 7900 7900 7950 7925 7925 7925 7950
#> [685] 7975 7975 8000 7900 7900 7900 8000 7900 7950 8075 8000 8000
#> [697] 8000 8150 8175 8200 8150 8225 8225 8225 8275 8275 8375 8350
#> [709] 8375 8375 8375 8375 8525 8500 8750 8450 8450 8450 8650 8550
#> [721] 8475 8475 8375 8375 8375 8425 8300 8325 8375 8550 8550 8550
#> [733] 8500 8550 8450 8425 8200 8200 8200 8300 8250 8325 8275 8250
#> [745] 8250 8250 8250 8300 8275 8500 8650 8650 8650 8900 8700 8550
#> [757] 8700 8750 8750 8750 8800 8800 8750 8800 8775 8775 8775 8850
#> [769] 8750 8875 8800 8850 8850 8850 8750 8800 8600 8725 8825 8825
#> [781] 8825 8725 8900 8875 9000 8975 8975 8975 9025 8975 9300 9000
#> [793] 8900 8900 8900 8775 8675 8450 8500 8575 8575 8575 8700 8700
#> [805] 8625 8500 8600 8600 8600 8650 8575 8675 8575 8500 8500 8500
#> [817] 8575 8600 8650 8575 8550 8550 8550 8550 8550 8350 8250 8300
#> [829] 8300 8300 8450 8175 8125 8175 8050 8050 8050 8150 8325 8300
#> [841] 8325 8300 8300 8300 8300 8225 8200 8475 8700 8700 8700 8700
#> [853] 8475 8500 8450 8700 8700 8700 8725 8850 8825 8900 8825 8825
#> [865] 8825 8875 8950 8875 8700 8725 8725 8725 8750 8700 8675 8725
#> [877] 8675 8675 8675 8775 8750 8600 8625 8475 8475 8475 8400 8425
#> [889] 8575 8575 8450 8450 8450 8550 8325 8325 8300 8375 8375 8375
#> [901] 8400 8500 8500 8500 8825 8825 8825 8700 8675 8800 8825 8750
#> [913] 8750 8750 8800 8775 8725 8750 8750 8750 8750 8800 8825 8900
#> [925] 8925 9000 9000 9000 9025 9125 9125 9125 9125 9125 9125 9125
#> [937] 9125 9200 9150 9050 9050 9050 9050 9050 8925 9000 9000 9000
#> [949] 9000 9000 8925 8925 8825 8800 8800 8800 8775 8700 8775 8775
#> [961] 9000 9000 9000 9000 9125 9025 9050 9150 9150 9150 9150 9250
#> [973] 9050 9050 9050 9050 9050 9200 9150 9100 9125 9100 9100 9100
#> [985] 9150 9150 9075 9050 9050 9050 9050 9000 9050 9125 9050 9050
#> [997] 9050 9050 9075 9150 9150 9150 9150 9150 9150 9075 9050 9050
#> [1009] 9075 9025 9025 9025 9050 9025 9175 9125 9200 9200 9200 9175
#> [1021] 9150 9150 9150 9150 9150 9150 9100 9150 9350 9225 9125 9125
#> [1033] 9125 9125 9125 9200 9250 9150 9150 9150 9275 9200 9400 9400
#> [1045] 9400 9400 9400 9350 9300 9300 9300 9250 9250 9250 9175 9300
#> [1057] 9300 9200 9275 9275 9275 9200 9250 9200 9175 9225 9225 9225
#> [1069] 9225 9225 9150 9175 9125 9125 9125 9125 9100 9075 9100 9000
#> [1081] 9000 9000 9000 9075 9150 9125 9075 9075 9075 9000 8950 8875
#> [1093] 8875 8825 8825 8825 9075 9200 9200 9075 9025 9025 9025 9050
#> [1105] 8925 8925 9050 9075 9075 9075 9100 8950 8850 8750 8975 8975
#> [1117] 8975 8850 8775 8875 8725 8700 8700 8700 8850 8750 8600 8850
#> [1129] 8900 8900 8900 9050 8975 9000 9000 8825 8825 8825 8875 8925
#> [1141] 9050 9075 9075 9075 9075 8875 8775 8875 8925 8925 8925 8925
#> [1153] 8875 8875 8900 8975 8950 8950 8950 8925 8900 8800 8825 8750
#> [1165] 8750 8750 8750 8700 8675 9050 9225 9225 9225 9200 9250 9300
#> [1177] 9325 9325 9325 9325 9325 9325 9375 9400 9400 9400 9400 9400
#> [1189] 9425 9350 9475 9575 9575 9575 9575 9625 9550 9575 9700 9700
#> [1201] 9700 9725 9700 9750 9675 9625 9625 9625 9625 9600 9525 9500
#> [1213] 9350 9350 9350 9550 9650 9550 9700 9700 9700 9700 9575 9625
#> [1225] 9700 9700 9700 9700 9700 9800 9725 9725 9850 9950 9950 9950
#> [1237] 9875 10025 9975 9875 9825 9825 9825 9800 9875 10000 9875 9825
#> [1249] 9825 9825 9750 9800 9950 10125 10150 10150 10150 10150 10150 10000
#> [1261] 10325 10150 10150 10150 10150 10175 10125 10125 10100 10100 10100 10075
#> [1273] 10050 10075 10075 10075 10075 10075 9850 9900 9525 9850 9825 9825
#> [1285] 9825 9825 9825 9825 9825 9825 9825 9825 9825 9475 9525 9475
#> [1297] 9475 9475 9475 9350 9725 9950 9775 9625 9625 9625 9800 9800
#> [1309] 9800 9550 9850 9850 9850 9800 9700 9375 9375 9375 9375 9375
#> [1321] 9525 9550 9500 9600 9750 9750 9750 9475 9375 9425 9425 9425
#> [1333] 9425 9425 9300 9300 9150 9000 9250 9250 9250 9275 9350 9450
#> [1345] 9475 9325 9325 9325 9525 9300 9250 9200 9200 9200 9200 9200
#> [1357] 9200 9050 9425 9600 9600 9600 9600 9600 9500 9750 9925 9925
#> [1369] 9925 9875 9900 10000 9825 9950 9950 9950 10050 10075 10100 10075
#> [1381] 10075 10075 10075 10050 9950 9800 10100 10125 10125 10125 10100 10175
#> [1393] 10075 10300 10325 10325 10325 10250 10175 10275 10375 10200 10200 10200
#> [1405] 9875 10000 10100 10225 10150 10150 10150 10200 10300 10200 10225 10325
#> [1417] 10325 10325 10375 10425 10425 10325 10325 10325 10325 10325 10200 10350
#> [1429] 10225 10325 10325 10325 10275 10175 10300 10250 10300 10300 10300 10275
#> [1441] 10350 10425 10475 10425 10425 10425 10425 10500 10625 10900 10775 10775
#> [1453] 10775 10950 10800 10850 10700 10650 10650 10650 10325 10550 10500 10450
#> [1465] 10475 10475 10475 10300 10400 10425 10500 10375 10375 10375 10500 10625
#> [1477] 10475 10725 10750 10750 10750 10675 10500 10650 10700 10750
After we create the bbca stock time series, we can see the time series plot below:
autoplot(bbca_ts)Insight: From the data above, we can see that the data fluctuates, the trend tends to increase.
Decomposition is a stage in time series analysis that is used to decompose several components in time series data. Components/elements in time series:
To be able to decompose our time series object into these 3
components, we can use the decompose() function.
bbca_decom <- decompose(bbca_ts)Visualize the decompose result using autoplot()
bbca_decom %>% autoplot()In the decompose result we get visualization information:
A good cross validation stage will always be done before model building, the data will be divided into train data and test data. Especially for time series data, the division of data should not be taken randomly but divided by splitting sequentially. In the migration_ts data, we know that the observed pattern (seasonal) is daily, so we will take test data for one season, namely the last 365 days
bbca_test <- tail(bbca_ts,365)
#membuat data train
bbca_train <- head(bbca_ts,-365)ARIMA is a combination of two methods, namely Auto Regressive (AR) and Moving Average (MA). The I explains Integrated.
The requirement for data to be processed using ARIMA is that the data
must be stationary. Stationary in time series means that the time series
data we have does not have a trend or seasonal and fluctuates around its
mean. In knowing whether our data is stationary or not, we can test the
assumptions with the adf.test() function.
bbca_train %>% adf.test()#>
#> Augmented Dickey-Fuller Test
#>
#> data: .
#> Dickey-Fuller = -2.9951, Lag order = 10, p-value = 0.1571
#> alternative hypothesis: stationary
bbca_train %>%
diff(lag = 365) %>%
adf.test()#>
#> Augmented Dickey-Fuller Test
#>
#> data: .
#> Dickey-Fuller = -2.8689, Lag order = 9, p-value = 0.2105
#> alternative hypothesis: stationary
From the data above, the p-value is greater than 0.05, so it is stationary.
from the differencing results above we get the order d for model I in the SARIMA(p,d,q)(P,D,Q)[seasonal] model is:
diff()diff(lag = 365)To find the value of p and q, we can see with the plot below:
bbca_train %>%
diff(lag = 365) %>%
diff() %>%
tsdisplay()From the plot above we can see to look for the value of p then we look at the plot in the PACF that crosses the blue line. and to look for the value of q then we look at the plot in the ACF that crosses the blue line. the maximum value of p and q that we take is 5
model candidate:
model_010 <- Arima(y = bbca_train,order = c(0,1,0))
model_013 <- Arima(y = bbca_train,order = c(0,1,3))
model_310 <- Arima(y = bbca_train,order = c(3,1,0))
model_313 <- Arima(y = bbca_train,order = c(3,1,3))
model_autoarima <- auto.arima(bbca_train)For selecting the best model, we can use the AIC value. Remember: AIC
value is information loss -> so the smallest value is desired. can be
done by way of name_model$aic
model_010$aic#> [1] 13107.29
model_013$aic#> [1] 13090.24
model_310$aic#> [1] 13089.3
model_313$aic#> [1] 13093.82
model_autoarima$aic#> [1] 13088.88
the best model of arima is:
### Forecasting ARIMA
forecast_autoarima <- forecast(model_autoarima, h = 365)
forecast_310 <- forecast(model_310, h = 365)
forecast_013 <- forecast(model_013, h = 365)we will evaluate the model using the accuracy() function of the best model.
accuracy(forecast_autoarima$mean, bbca_test)#> ME RMSE MAE MPE MAPE ACF1 Theil's U
#> Test set 1018.037 1148.011 1020.241 10.15693 10.18238 0.9705849 10.53864
accuracy(forecast_310$mean, bbca_test)#> ME RMSE MAE MPE MAPE ACF1 Theil's U
#> Test set 1016.768 1146.883 1019.007 10.1439 10.16975 0.9705785 10.52798
accuracy(forecast_013$mean, bbca_test)#> ME RMSE MAE MPE MAPE ACF1 Theil's U
#> Test set 1014.565 1144.921 1016.88 10.12129 10.14801 0.970595 10.50944
From the Arima evaluation above model_013 has the
smallest MAPE of our best models
Assumptions on time series are tested to measure whether the residuals obtained from the modeling results are good enough to describe and capture the information in the data. Below are the assumptions to test:
Normality of residual
hist(model_013$residuals)From the histogram above that the plot forms a normal looking curve.
to make sure we can use the shapiro.test
shapiro.test(model_013$residuals)#>
#> Shapiro-Wilk normality test
#>
#> data: model_013$residuals
#> W = 0.913, p-value < 0.00000000000000022
after doing the shapiro test, the p-value
obtained is smaller than 0.05, so the residuals of the 111 model are not
normally distributed (if the residuals are changed normally, then do it
like regression).
No-autocorrelation residual
To check the presence or absence of autocorrelation in time series forecasting results, we can use the Ljung-box test using the Box.test(residual model, type = “Ljung-Box) function which will produce a p-value. The hypothesis used is:
H0: residual has no-autocorrelation
H1: residual has autocorrelation
desired p-value > 0.05 (alpha), no-autocorrelation
Box.test(model_013$residuals, type = "Ljung-Box")#>
#> Box-Ljung test
#>
#> data: model_013$residuals
#> X-squared = 0.010191, df = 1, p-value = 0.9196
Insight:
model 013 do not occur autocorrelation.It is one way to decompose the data but still retain the information from all the data we have. Using the train data information, we will try to create a model using stlm(). Some model parameters:
y : object time series
s.window : seasonal pattern to be captured (frequency value in ts() function)
method: “arima”
model_stlm_arima <- stlm(y = bbca_train,
s.window = 365,
method = "arima")Forecast for the next 365 days
forecast_stlm_arima <- forecast(model_stlm_arima, h = 365)Visualization of forecast results
bbca_ts %>%
autoplot()+
autolayer(forecast_autoarima$mean, series = " model autoarima")+
autolayer(forecast_310$mean, series = "model 310")+
autolayer(forecast_013$mean, series = "model 013")+
autolayer(forecast_stlm_arima$mean, series = "model stlm arima")Evaluate between models using the MAPE() function from the MLmetrics package.
accuracy(forecast_013$mean, bbca_test)#> ME RMSE MAE MPE MAPE ACF1 Theil's U
#> Test set 1014.565 1144.921 1016.88 10.12129 10.14801 0.970595 10.50944
accuracy(forecast_stlm_arima$mean, bbca_test)#> ME RMSE MAE MPE MAPE ACF1 Theil's U
#> Test set 466.1979 622.3011 528.5009 4.56954 5.271799 0.9584233 5.709883
From the above evaluation with the percentage of MAPE between the
test data, model_stlm_arima is our best model.
With a MAPE of 5.271799, model_stlm is the best model to predict stock price for 365 days from our test data.