Discussion #5
Download 5 years of historical daily data for any stock of your choosing. Finance.Yahoo.Com is a good place to start. Forecast the daily adjusted closing price of your stock using time series components and at least one external regressor (e.g., transaction volume at t-1).

I picked Google (GOOG): daily prices from Yahoo Finance, 2013-04-09 to 2018-04-10.

# Loading libraries
library(forecast)
library(xts)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(TTR)
library(tseries)

# LOADING DATASET
require(quantmod)
## Loading required package: quantmod
## Version 0.4-0 included new data defaults. See ?getSymbols.
G <- new.env()
getSymbols("GOOG", env = G, src = "yahoo",
          from = as.Date("2013-04-09"), to = as.Date("2018-04-10"),return.class='ts')
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## 
## WARNING: There have been significant changes to Yahoo Finance data.
## Please see the Warning section of '?getSymbols.yahoo' for details.
## 
## This message is shown once per session and may be disabled by setting
## options("getSymbols.yahoo.warning"=FALSE).
## [1] "GOOG"
g<-G$GOOG
head(g)
## Time Series:
## Start = 1 
## End = 6 
## Frequency = 1 
##   GOOG.Open GOOG.High GOOG.Low GOOG.Close GOOG.Volume GOOG.Adjusted
## 1  385.2444  389.3427 384.0571   386.3124     4342600      386.3124
## 2  388.9304  393.6149 385.4927   392.5369     3982800      392.5369
## 3  393.8782  393.9875 389.4967   392.6412     4083700      392.6412
## 4  393.4361  393.4907 388.9354   392.4724     3294500      392.4724
## 5  390.4356  395.9249 385.9995   388.4386     4938000      388.4386
## 6  390.7535  395.4281 389.4272   394.1216     3506500      394.1216
colnames(g)
## [1] "GOOG.Open"     "GOOG.High"     "GOOG.Low"      "GOOG.Close"   
## [5] "GOOG.Volume"   "GOOG.Adjusted"

Plotting data

# Plot original series
plot(g)

# g.ts<-ts(g[,"GOOG.Adjusted"],frequency = 252) # about 252 trading days per year

There is a clear trend in the data and possibly a slight seasonal pattern. Decomposing the series makes both components easier to see.

Decomposing the time series

# boxplot(g[,"GOOG.Adjusted"]) ; boxplot(g[,"GOOG.Volume"])
plot(decompose(ts(g[,"GOOG.Adjusted"],frequency = 252)))
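
To make the pieces returned by decompose() more concrete, here is a minimal sketch on a simulated daily series rather than the GOOG data. The 252-day period matches the assumption used above; the series itself and the names `sim_price`, `sim_ts` and `dec` are made up for illustration.

```r
# Simulated stand-in for five years of daily prices
# (assumption: 252 trading days per year, as in the call above)
set.seed(1)
n <- 252 * 5
sim_price <- 400 + 0.3 * seq_len(n) +        # linear trend
  10 * sin(2 * pi * seq_len(n) / 252) +      # yearly cycle
  rnorm(n, sd = 5)                           # noise

sim_ts <- ts(sim_price, frequency = 252)
dec <- decompose(sim_ts)                     # additive decomposition by default

# The seasonal component repeats every 252 observations;
# the trend is a centered moving average, so it is NA near both ends
seasonal_range <- diff(range(dec$seasonal))
trend_na <- sum(is.na(dec$trend))
print(seasonal_range)
print(trend_na)
```

A small `seasonal_range` relative to the movement in `dec$trend` would suggest the seasonal pattern matters little compared with the trend.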

Forecasting data

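# Select by name: column 5 is GOOG.Volume and column 6 is GOOG.Adjusted
g.adj.train<-g[2:1250,"GOOG.Adjusted"]    # response at time t
g.adj.test<-g[1251:1260,"GOOG.Adjusted"]

vol.train<-g[1:1249,"GOOG.Volume"]        # regressor at time t-1
vol.test<-g[1250:1259,"GOOG.Volume"]      # lagged the same way for the test period
# The output printed below is on the volume scale (RMSE near 1e6) because it was produced with positional columns 5 and 6.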
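
The indexing behind a "volume at t-1" regressor is easy to get wrong by one, so here is a small sketch on toy vectors. The values and the names `price`, `volume`, `y_train` and so on are hypothetical and only illustrate the alignment.

```r
# Toy data standing in for the adjusted close and volume columns (made-up values)
price  <- c(10, 11, 12, 13, 14, 15, 16, 17)
volume <- c(100, 110, 120, 130, 140, 150, 160, 170)

n       <- length(price)
h       <- 2                      # hold out the last h observations
n_train <- n - h

# Response at time t
y_train <- price[2:n_train]
y_test  <- price[(n_train + 1):n]

# Regressor at time t-1: every index is shifted back by one
x_train <- volume[1:(n_train - 1)]
x_test  <- volume[n_train:(n - 1)]

# Each row pairs a price with the previous period's volume
print(cbind(y_train, x_train))
print(cbind(y_test, x_test))
```

The key point is that the test regressor starts at the last training index, not at the first test index, so it never uses information from the day being forecast.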

model1<-auto.arima(g.adj.train)
model2<-auto.arima(g.adj.train,xreg = vol.train)
summary(model1)
## Series: g.adj.train 
## ARIMA(1,1,3) 
## 
## Coefficients:
##          ar1      ma1      ma2     ma3
##       0.7087  -1.1395  -0.0050  0.1638
## s.e.  0.0869   0.0944   0.0555  0.0545
## 
## sigma^2 estimated as 1.115e+12:  log likelihood=-19079.52
## AIC=38169.04   AICc=38169.09   BIC=38194.69
## 
## Training set error measures:
##                     ME    RMSE      MAE       MPE     MAPE      MASE
## Training set -32734.89 1053976 591687.2 -78.57762 93.51828 0.9086218
##                       ACF1
## Training set -0.0001106508
summary(model2)
## Series: g.adj.train 
## Regression with ARIMA(1,1,3) errors 
## 
## Coefficients:
##          ar1      ma1      ma2     ma3       xreg
##       0.7084  -1.1379  -0.0053  0.1630  -1526.717
## s.e.  0.0885   0.0959   0.0559  0.0549   1723.061
## 
## sigma^2 estimated as 1.116e+12:  log likelihood=-19079.12
## AIC=38170.24   AICc=38170.31   BIC=38201.01
## 
## Training set error measures:
##                     ME    RMSE      MAE       MPE     MAPE      MASE
## Training set -19491.29 1053645 588271.4 -78.03165 93.46546 0.9033765
##                      ACF1
## Training set 0.0005508406
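
One quick way to read the two summaries is to check whether the xreg coefficient is distinguishable from zero and how the AIC values compare. The sketch below only does arithmetic on numbers copied from the printed output; the normal approximation for the p-value is an assumption, not something the summary reports.

```r
# Values copied from the model2 summary printed above
xreg_coef <- -1526.717
xreg_se   <- 1723.061

z_ratio <- xreg_coef / xreg_se          # approximate z statistic
p_value <- 2 * pnorm(-abs(z_ratio))     # two-sided, normal approximation (assumption)

# AIC values copied from the two summaries
aic_model1 <- 38169.04
aic_model2 <- 38170.24
aic_diff   <- aic_model2 - aic_model1   # positive means model1 has the lower AIC

print(round(z_ratio, 3))
print(round(p_value, 3))
print(aic_diff)
```

A z-ratio well inside plus or minus 1.96 together with a higher AIC would suggest the regressor adds little in-sample, whatever the hold-out errors show.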

Plotting forecasts

f1<-forecast(model1, h=10)
# When xreg is supplied, the horizon is the number of rows in xreg (h is ignored)
f2<-forecast(model2, h=10, xreg = vol.test)
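
For readers without the forecast package, the same idea (a regression with ARIMA errors, forecast with future regressor values) can be sketched with base R's arima() and predict(). The data are simulated, and the ARIMA(0,1,1) order is picked by hand for this example rather than selected automatically as auto.arima() does.

```r
# Simulated data: a random walk plus the effect of a hypothetical regressor x
set.seed(42)
n <- 200
x <- rnorm(n)
y <- cumsum(rnorm(n)) + 2 * x

h     <- 10
train <- 1:(n - h)
test  <- (n - h + 1):n

# Regression with ARIMA(0,1,1) errors (order chosen by hand for this sketch)
fit <- arima(y[train], order = c(0, 1, 1), xreg = x[train])

# Future regressor values must be supplied for every forecast step
fc <- predict(fit, n.ahead = h, newxreg = x[test])

point_fc <- as.numeric(fc$pred)
rmse <- sqrt(mean((y[test] - point_fc)^2))
print(coef(fit))
print(rmse)
```

In practice the future values of a regressor are not known at forecast time, which is why a lagged regressor such as volume at t-1 is convenient for one-step-ahead forecasts.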

Results
Which one is better?

accuracy(f1, g.adj.test)
##                     ME    RMSE      MAE       MPE     MAPE      MASE
## Training set -32734.89 1053976 591687.2 -78.57762 93.51828 0.9086218
## Test set     534316.21  782428 667163.5  16.57041 25.72889 1.0245267
##                       ACF1
## Training set -0.0001106508
## Test set                NA
accuracy(f2, g.adj.test)
##                     ME    RMSE      MAE       MPE     MAPE      MASE
## Training set -19491.29 1053645 588271.4 -78.03165 93.46546 0.9033765
## Test set     426721.20  714271 630167.3  11.74609 25.26629 0.9677137
##                      ACF1
## Training set 0.0005508406
## Test set               NA
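
To show what the columns in the accuracy() tables measure, here is a minimal sketch that computes ME, RMSE, MAE and MAPE by hand on made-up numbers. MASE is left out because it also needs the in-sample errors of a naive forecast for scaling; the vectors `actual` and `fcst` are hypothetical.

```r
actual <- c(100, 102, 101, 105, 107)   # made-up hold-out values
fcst   <- c( 99, 103, 103, 104, 110)   # made-up point forecasts

err  <- actual - fcst
me   <- mean(err)                      # mean error (bias)
rmse <- sqrt(mean(err^2))              # root mean squared error
mae  <- mean(abs(err))                 # mean absolute error
mape <- mean(abs(err / actual)) * 100  # mean absolute percentage error

print(round(c(ME = me, RMSE = rmse, MAE = mae, MAPE = mape), 4))
```

RMSE penalizes large misses more heavily than MAE, so comparing both gives a sense of whether a model's errors are dominated by a few large deviations.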

Conclusion
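For the GOOG series, the regression with ARIMA(1,1,3) errors and a lagged external regressor (model2) has lower test-set errors than the plain ARIMA(1,1,3) (model1): RMSE 714271 vs. 782428 and MASE 0.968 vs. 1.025. The evidence is weak, though: the test set has only 10 observations, the xreg coefficient (-1526.7, s.e. 1723.1) is smaller than its standard error, and model2's AIC (38170.24) is slightly higher than model1's (38169.04).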