Discussion #5
Download 5-years of historical daily data from any stock of your choosing. Finance.Yahoo.Com is a good place to start. Forecast the daily adjusted closing price of your stock using time series components and at least one external regressor (e.g., transaction volume at t-1).
I picked - Weekly closings of the Dow-Jones industrial average, July 1971 – August 1974.
# Loading libraries
library(forecast)
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(TTR)
library(tseries)
# LOADING DATASET
require(quantmod)
## Loading required package: quantmod
## Version 0.4-0 included new data defaults. See ?getSymbols.
G <- new.env()
getSymbols("GOOG", env = G, src = "yahoo",
from = as.Date("2013-04-09"), to = as.Date("2018-04-10"),return.class='ts')
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
##
## WARNING: There have been significant changes to Yahoo Finance data.
## Please see the Warning section of '?getSymbols.yahoo' for details.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.yahoo.warning"=FALSE).
## [1] "GOOG"
g<-G$GOOG
head(g)
## Time Series:
## Start = 1
## End = 6
## Frequency = 1
## GOOG.Open GOOG.High GOOG.Low GOOG.Close GOOG.Volume GOOG.Adjusted
## 1 385.2444 389.3427 384.0571 386.3124 4342600 386.3124
## 2 388.9304 393.6149 385.4927 392.5369 3982800 392.5369
## 3 393.8782 393.9875 389.4967 392.6412 4083700 392.6412
## 4 393.4361 393.4907 388.9354 392.4724 3294500 392.4724
## 5 390.4356 395.9249 385.9995 388.4386 4938000 388.4386
## 6 390.7535 395.4281 389.4272 394.1216 3506500 394.1216
colnames(g)
## [1] "GOOG.Open" "GOOG.High" "GOOG.Low" "GOOG.Close"
## [5] "GOOG.Volume" "GOOG.Adjusted"
Plotting data
# Plot original series
plot(g)
# g.ts<-ts(g$GOOG.Adjusted,frequency = 252) # no. of days its traded in 1 yr.
There is clear trend in the data with slight bit of seasonality. But we will see those details below by decomposing it.
Decomposing the time series
# boxplot(g[,"GOOG.Adjusted"]) ; boxplot(g[,"GOOG.Volume"])
plot(decompose(ts(g[,"GOOG.Adjusted"],frequency = 252)))
Forecasting data
g.adj.train<-g[2:1250,c(5)]
g.adj.test<-g[1251:1260,c(5)]
vol.train<-g[1:1249,6]
vol.test<-g[1251:1260,6]
model1<-auto.arima(g.adj.train)
model2<-auto.arima(g.adj.train,xreg = vol.train)
summary(model1)
## Series: g.adj.train
## ARIMA(1,1,3)
##
## Coefficients:
## ar1 ma1 ma2 ma3
## 0.7087 -1.1395 -0.0050 0.1638
## s.e. 0.0869 0.0944 0.0555 0.0545
##
## sigma^2 estimated as 1.115e+12: log likelihood=-19079.52
## AIC=38169.04 AICc=38169.09 BIC=38194.69
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -32734.89 1053976 591687.2 -78.57762 93.51828 0.9086218
## ACF1
## Training set -0.0001106508
summary(model2)
## Series: g.adj.train
## Regression with ARIMA(1,1,3) errors
##
## Coefficients:
## ar1 ma1 ma2 ma3 xreg
## 0.7084 -1.1379 -0.0053 0.1630 -1526.717
## s.e. 0.0885 0.0959 0.0559 0.0549 1723.061
##
## sigma^2 estimated as 1.116e+12: log likelihood=-19079.12
## AIC=38170.24 AICc=38170.31 BIC=38201.01
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -19491.29 1053645 588271.4 -78.03165 93.46546 0.9033765
## ACF1
## Training set 0.0005508406
Plotting forecasts
f1<-forecast(model1, h=10)
f2<-forecast(model2, h=10, xreg = vol.test)
Results
Which ones better ?
accuracy(f1, g.adj.test)
## ME RMSE MAE MPE MAPE MASE
## Training set -32734.89 1053976 591687.2 -78.57762 93.51828 0.9086218
## Test set 534316.21 782428 667163.5 16.57041 25.72889 1.0245267
## ACF1
## Training set -0.0001106508
## Test set NA
accuracy(f2, g.adj.test)
## ME RMSE MAE MPE MAPE MASE
## Training set -19491.29 1053645 588271.4 -78.03165 93.46546 0.9033765
## Test set 426721.20 714271 630167.3 11.74609 25.26629 0.9677137
## ACF1
## Training set 0.0005508406
## Test set NA
Conclusion
With this market-driven prices of Google we see that arima(2,1,3) with a regressor(volume at t-1) performs better in terms of accuracy with lower test errors from RMSE, MASE point of view.