| Variable | Description |
|---|---|
| Open | Opening price of the stock on the day |
| High | Highest price of the stock on the day |
| Low | Lowest price of the stock on the day |
| Close | Closing price of the stock on the day |
| Volume | Total Volume Traded |
| Adjusted | Adjusted price of the stock including any risks or strategies |
library(quantmod)
library(xts)
library(dygraphs)
library(dplyr)
library(forecast)
library(prophet)
library(lubridate)
library(TSA)
library(lmtest)
Reading data
getSymbols("AMZN", src = "yahoo", from = "2006-01-01", to = "2020-12-31",resetIndex = TRUE)
amz_stock <- as.data.frame(AMZN)
colnames(amz_stock) <- c("Open","High","Low","Close","Volume","Adjusted")
Studying data
head(amz_stock)
## Open High Low Close Volume Adjusted
## 2006-01-03 47.47 47.85 46.25 47.58 7582200 47.58
## 2006-01-04 47.49 47.73 46.69 47.25 7440900 47.25
## 2006-01-05 47.16 48.20 47.11 47.65 5417200 47.65
## 2006-01-06 47.97 48.58 47.32 47.87 6152900 47.87
## 2006-01-09 46.55 47.10 46.40 47.08 8943100 47.08
## 2006-01-10 46.41 46.75 45.36 45.65 9686100 45.65
Plotting data
Statistical summary of the data
summary(amz_stock)
## Open High Low Close
## Min. : 26.09 Min. : 26.30 Min. : 25.76 Min. : 26.07
## 1st Qu.: 95.00 1st Qu.: 95.88 1st Qu.: 93.09 1st Qu.: 94.43
## Median : 282.00 Median : 284.87 Median : 279.74 Median : 282.10
## Mean : 646.96 Mean : 653.97 Mean : 639.08 Mean : 646.86
## 3rd Qu.: 882.25 3rd Qu.: 891.92 3rd Qu.: 880.57 3rd Qu.: 885.61
## Max. :3547.00 Max. :3552.25 Max. :3486.69 Max. :3531.45
## Volume Adjusted
## Min. : 881300 Min. : 26.07
## 1st Qu.: 3156600 1st Qu.: 94.43
## Median : 4614600 Median : 282.10
## Mean : 5706870 Mean : 646.86
## 3rd Qu.: 6810300 3rd Qu.: 885.61
## Max. :104329200 Max. :3531.45
Plot the data and examine for non constant variance or trend or stationary in the series
When analyzing a time series plot it is important to consider the presence of trend, seasonality and non constant variance. These components play a major role in determining the techniques used for transformation and model fitting for time series data.
plot.ts(amz_stock[,"Close"], ylab ="Closing Price")
Using Box Cox for stabilizing the variance
library(forecast)
(lambda <- BoxCox.lambda(amz_stock[,"Close"]))
## [1] 0.1335584
Plot of time series after transformation
tclose_price <- log(amz_stock[,"Close"])
plot(ts(tclose_price), ylab =" Transformed closing price")
Taking difference to check for seasonality
ts(tclose_price) %>% diff() %>% ggtsdisplay()
Checking for stationary using ADF test
library(tseries)
ts(tclose_price) %>% adf.test()
##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -3.4717, Lag order = 15, p-value = 0.04499
## alternative hypothesis: stationary
Differencing to remove non-stationary
ts(tclose_price) %>% diff() %>% ggtsdisplay()
ts(tclose_price) %>% diff() %>% adf.test()
##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -16.107, Lag order = 15, p-value = 0.01
## alternative hypothesis: stationary
Autocorrelation function (ACF)
For a stationary time series model, autocorrelation of Xt and Xt+k (or k-th order autocorrelation) is \(\rho_k = \gamma_k/\gamma_0\)
for all t = 1, …, n. As a function of k, \(\rho_k\) is called autocorrelation function (ACF) and its graph has been called the correlogram. Importance of \(\rho_k\) is to identify a suitable model for a time series data by knowing its autocorrelation function or an estimate of this function
Partial Autocorrelation function (PACF)
Partial correlation is the autocorrelation between Xt and Xt+k after their dependency on the intervening variables \(X_{t+1}\), …,\(X_{t+k−1}\) has been removed. In other words, it is the conditional correlation between \(X_{t}\) and \(X_{t+k}\) conditional on \(X_{t+1}\), …,\(X_{t+k-1}\), that is, \(\phi_{kk}\) = cor(\(X_{t}\),\(X_{t+k}\)|\(X_{t+1}\), …,\(X_{t+k-1}\)) for any t = 1, …, n.
tclose_price %>% diff() %>% acf()
tclose_price %>% diff() %>% pacf()
Using extended sample ACF (ESACF) for testing the order and finding potential order of model
eacf(diff(ts(tclose_price)))
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o x o o o o o o o o x o o o
## 1 x x o o o o o o o o o o o o
## 2 x x o o o o o o o o x o o o
## 3 x x x o o o o o o o o o o o
## 4 x x o x o o o o o o o o o o
## 5 x o x x x o o o o o o o o o
## 6 x x x x x x o o o o o o o o
## 7 x x x x x x o o o o o o o o
Overview of ARIMA models
The parameters of the ARIMA model are defined as follows:
Fitting ARIMA(0,1,2)
fit1= Arima(ts(close_price), order = c(0,1,2), lambda=0)
Checking residuals
Ljung Box test
The Ljung-Box test is a diagnostic tool used to test the lack of fit of a time series model. The test is applied to the residuals of a time series after fitting an ARMA(p,q) model to the data. The test examines autocorrelation between the residuals.
H0: Null hypothesis: There is no auto-correlation between residuals.
HA: Alternative hypothesis : There is auto-correlation between residuals.
Box.test (resid(fit1), lag = 30, type = "Ljung",fitdf = 2)
##
## Box-Ljung test
##
## data: resid(fit1)
## X-squared = 42.426, df = 28, p-value = 0.03951
Fitting ARIMA(2,1,2)
fit2= Arima(ts(close_price), order = c(2,1,2),lambda = 0)
Box.test (resid(fit2), lag = 30, type = "Ljung", fitdf = 4)
##
## Box-Ljung test
##
## data: resid(fit2)
## X-squared = 42.03, df = 26, p-value = 0.02437
Fitting ARIMA(5,1,5)
fit3= Arima(ts(close_price), order = c(5,1,5),lambda = 0)
Box.test (resid(fit3), lag = 30, type = "Ljung", fitdf = 10)
##
## Box-Ljung test
##
## data: resid(fit3)
## X-squared = 27.127, df = 20, p-value = 0.1317
Fitting ARIMA(5,1,6)
fit4= Arima(ts(close_price), order = c(5,1,6), lambda = 0)
Box.test (resid(fit4), lag = 30, type = "Ljung", fitdf = 11)
##
## Box-Ljung test
##
## data: resid(fit4)
## X-squared = 27.009, df = 19, p-value = 0.1044
Fitting ARIMA(6,1,6)
fit5= Arima(ts(close_price), order = c(6,1,6),lambda = 0)
coeftest(fit5)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.033699 0.650313 -0.0518 0.95867
## ar2 -0.017453 0.116998 -0.1492 0.88142
## ar3 0.026187 0.054029 0.4847 0.62790
## ar4 -0.100001 0.051676 -1.9352 0.05297 .
## ar5 -0.890664 0.068194 -13.0608 < 2e-16 ***
## ar6 0.127646 0.603800 0.2114 0.83257
## ma1 0.014840 0.661096 0.0224 0.98209
## ma2 -0.009416 0.109770 -0.0858 0.93164
## ma3 -0.030883 0.055413 -0.5573 0.57731
## ma4 0.078929 0.051956 1.5192 0.12872
## ma5 0.882031 0.060370 14.6104 < 2e-16 ***
## ma6 -0.134723 0.610777 -0.2206 0.82542
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Box.test (resid(fit5), lag = 30, type = "Ljung", fitdf = 12)
##
## Box-Ljung test
##
## data: resid(fit5)
## X-squared = 26.88, df = 18, p-value = 0.08128
Summary of the ARIMA models
| Model | Residuals | AIC |
|---|---|---|
| ARIMA(0,1,2) | Violate assumptions | -17390.95 |
| ARIMA(2,1,2) | Violate assumptions | -17387.25 |
| ARIMA(5,1,5) | Do not violate assumptions | -17399.22 |
| ARIMA(5,1,6) | Do not violate assumptions | -17397.71 |
| ARIMA(6,1,6) | Do not violate assumptions | -17395.97 |
Overview of AIC
Summary of the selected model ARIMA(5,1,6)
## Series: ts(close_price)
## ARIMA(5,1,6)
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ma1 ma2 ma3
## -0.1589 -0.0272 0.0168 -0.1095 -0.9206 0.1426 0.0008 -0.0218
## s.e. 0.0457 0.0560 0.0510 0.0406 0.0353 0.0484 0.0567 0.0497
## ma4 ma5 ma6
## 0.0897 0.9093 -0.0056
## s.e. 0.0391 0.0376 0.0184
##
## sigma^2 estimated as 0.0005807: log likelihood=8710.85
## AIC=-17397.71 AICc=-17397.63 BIC=-17322.88
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.8947686 20.88799 9.21193 0.08785396 1.594305 1.004396
## ACF1
## Training set -0.04903058
Forecasting the closing price of Amazon stock
Forecast plot
Forecasted values
forecasted_values <- final_model %>% forecast(h = 102)
RMSE of predicted data
accuracy(amz_stock_curr$Close,forecasted_values$mean)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 53.0394 129.9629 105.3485 1.618909 3.216088 0.9127146 51.20192
Overview of Prophet model
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
Prophet is open source software released by Facebook Core Data Science team
The prophet function that performs fitting and returns a model object.
The function requires dataframe with columns names as ds and y, containing the date and numeric value respectively.
Preprocesing the data
data <- read.csv("AMZN.csv")
amz_data <- data[,c("Date","Close")]
colnames(amz_data) <- c("ds","y")
amz_data <- add_country_holidays(amz_data, country_name = 'US')
Fitting prophet on time series data
pmodel <- prophet(amz_data)
Predicting the closing price
future <- make_future_dataframe(pmodel, periods = 151) %>% filter(!wday(ds) %in% c(1,7))
forecast <- predict(pmodel, future)
tail(forecast[c('ds', 'yhat', 'yhat_lower', 'yhat_upper')])
## ds yhat yhat_lower yhat_upper
## 3878 2021-05-24 3063.254 2891.191 3257.671
## 3879 2021-05-25 3067.107 2879.869 3258.662
## 3880 2021-05-26 3070.103 2886.742 3249.940
## 3881 2021-05-27 3071.602 2883.461 3250.283
## 3882 2021-05-28 3070.218 2877.223 3262.024
## 3883 2021-05-31 3077.940 2895.322 3247.888
Plotting the forecast
dyplot.prophet(pmodel, forecast)
## Warning: `select_()` is deprecated as of dplyr 0.7.0.
## Please use `select()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
Plotting the forecast components
prophet_plot_components(pmodel,forecast)
Evaluating accuracy
forecast_insample <- forecast %>% select(ds, yhat) %>% filter(ds <= as.Date("2021-01-01"))
forecast_outofsample <- forecast %>% select(ds, yhat) %>% filter(ds >= as.Date("2021-01-01"))
accuracy(forecast_insample$yhat,amz_data$y)
## ME RMSE MAE MPE MAPE
## Test set -0.001355992 143.9533 78.69301 -2.441521 21.66887
accuracy(forecast_outofsample$yhat,amz_stock_curr$Close)
## ME RMSE MAE MPE MAPE
## Test set 286.955 310.8159 286.955 8.805399 8.805399
In this project we used stock price data of Amazon from last 14 years and build models to forecast the daily stock price for period of Jan to May 2021.
The variance of the data is stabilized using log differencing is used to achieve stationarity in the time series data.
ARIMA models are fitted over stabilized data and residual diagnostic is performed to examine whether the assumptions are violated. AIC of the models is checked to finalize the model having lower AIC value.
Summary of the models meeting residual assumptions
| Model | AIC |
|---|---|
| ARIMA(5,1,5) | -17399.22 |
| ARIMA(5,1,6) | -17397.71 |
| ARIMA(6,1,6) | -17395.97 |
Out of the five ARIMA models, ARIMA (5,1,6) has good ACF, PACF residual plots which resemble white noise and has lower AIC value hence it is selected as the final model for forecasting the closing price.
Sample forecasted values using ARIMA model
| Date | Actual | Forecasted |
|---|---|---|
| 1/19/2021 | 3120.76001 | 3278.193 |
| 1/20/2021 | 3263.379883 | 3274.846 |
| 1/21/2021 | 3306.98999 | 3273.525 |
| 1/22/2021 | 3292.22998 | 3267.755 |
| 1/25/2021 | 3294 | 3270.031 |
| 1/26/2021 | 3326.129883 | 3274.589 |
| 1/27/2021 | 3232.580078 | 3276.931 |
| 1/28/2021 | 3237.620117 | 3278.323 |
| 1/29/2021 | 3206.199951 | 3283.193 |