Download five years of historical daily data for any stock of your choosing. Finance.Yahoo.Com is a good place to start. Forecast the daily adjusted closing price of your stock using time series components and at least one external regressor (e.g., transaction volume at t-1).
I used Tesla (TSLA) for this analysis and built three models: two ARIMA models and one ETS model. One of the ARIMA models included lagged volume (volume at t-1) as an external regressor.
#Import Data
library(forecast)
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(tseries)
library(readr)
mydata <- read_csv("~/Desktop/TSLA-3.csv")
## Parsed with column specification:
## cols(
## Date = col_date(format = ""),
## Open = col_double(),
## High = col_double(),
## Low = col_double(),
## Close = col_double(),
## `Adj Close` = col_double(),
## Volume = col_integer()
## )
#Create Time Series
myts<-ts(mydata$`Adj Close`, frequency=252, start=c(2013,4))
plot(myts, ylab= "Adjusted Closing Price", main= "Tesla, Inc, (TSLA)")
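#Create Training and Test Set
#Training is the first 80% (1006 observations)
#Test set is the last 20% (253 observations)
#Prices start at row 2 so that each price at t lines up with volume at t-1
#Note: subsetting with [ ] drops the ts attributes, so these are plain numeric vectors
train <- myts[2:1007]
test <- myts[1008:1260]
#Lagged volume: rows 1:1259 sit one trading day behind the prices above
volume.train <- as.numeric(mydata$Volume[1:1006])
volume.test <- as.numeric(mydata$Volume[1007:1259])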
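The index offsets above pair each price at day t with the volume from day t-1. The following is a minimal sketch of that alignment on made-up data, using base R only. The vectors `price` and `volume`, the helper `lag_align`, and the `toy_*` names are hypothetical and are not part of the TSLA analysis.

```r
# Hypothetical toy data: 10 days of prices and volumes (NOT the TSLA file).
price  <- c(10, 11, 12, 11, 13, 14, 13, 15, 16, 15)
volume <- c(100, 120, 90, 110, 130, 95, 105, 140, 125, 115)

# Pair the response at day t with the regressor at day t-1.
# The first price has no previous-day volume, so it is dropped;
# the last volume has no next-day price, so it is dropped as well.
lag_align <- function(y, x) {
  stopifnot(length(y) == length(x))
  n <- length(y)
  data.frame(y = y[2:n], x_lag1 = x[1:(n - 1)])
}

aligned <- lag_align(price, volume)

# Split 80/20 in time order. Both columns are cut at the same row,
# so the regressor always has exactly as many rows as the response.
n_train   <- floor(0.8 * nrow(aligned))
toy_train <- aligned[seq_len(n_train), ]
toy_test  <- aligned[(n_train + 1):nrow(aligned), ]

print(head(aligned, 3))
```

Building the lag in one place and splitting afterwards avoids maintaining two sets of hand-written index ranges, which is where off-by-one mismatches between the price and volume vectors tend to creep in.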
#ARIMA
#Select ARIMA(0,1,0)
fit1 <- auto.arima(train)
fit1
## Series: train
## ARIMA(0,1,0)
##
## sigma^2 estimated as 32.6: log likelihood=-3176.89
## AIC=6355.77 AICc=6355.78 BIC=6360.68
fcast1 <- forecast(fit1, h=253)
accuracy(fcast1, test)
##                      ME     RMSE       MAE       MPE      MAPE      MASE     ACF1
## Training set  0.2536318  5.70672  4.047787 0.1448252  2.077068 0.9990167 0.032286
## Test set     33.6910950 41.12391 35.242875 9.6701336 10.255148 8.6981408       NA
plot(fcast1)
#ARIMA with external regressor
#Regression with ARIMA(0,1,0) errors
fit2<-auto.arima(train, xreg=volume.train)
fit2
## Series: train
## Regression with ARIMA(0,1,0) errors
##
## Coefficients:
## xreg
## 0
## s.e. 0
##
## sigma^2 estimated as 32.62: log likelihood=-3176.76
## AIC=6357.52 AICc=6357.53 BIC=6367.35
fcast2 <- forecast(fit2, xreg=volume.test, h=253)
accuracy(fcast2, test)
##                      ME      RMSE       MAE       MPE      MAPE      MASE       ACF1
## Training set  0.2534863  5.706008  4.048442 0.1445037  2.076736 0.9991785 0.03288603
## Test set     33.7186646 41.158317 35.280248 9.6779454 10.266610 8.7073647         NA
plot(fcast2)
#ETS
#ETS(A,N,N)
fit3=ets(train,model="ZZZ")
fit3
## ETS(A,N,N)
##
## Call:
## ets(y = train, model = "ZZZ")
##
## Smoothing parameters:
## alpha = 0.9999
##
## Initial states:
## l = 43.585
##
## sigma: 5.7067
##
## AIC AICc BIC
## 10465.42 10465.44 10480.16
fcast3 <- forecast(fit3, h=253)
accuracy(fcast3, test)
##                      ME      RMSE       MAE       MPE      MAPE      MASE       ACF1
## Training set  0.2536188  5.706739  4.047732 0.1447515  2.076971 0.9990032 0.03238688
## Test set     33.6914690 41.124215 35.243222 9.6702467 10.255251 8.6982266         NA
plot(fcast3)
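On the test set the three models are nearly indistinguishable, with ARIMA(0,1,0) ahead by a negligible margin. ARIMA(0,1,0) has a test RMSE of 41.1239 and MAPE of 10.2551%; ETS(A,N,N) follows at 41.1242 and 10.2553%; the regression with ARIMA(0,1,0) errors comes last at 41.1583 and 10.2666%. The near tie between the first two is expected: ARIMA(0,1,0) is a random walk, and ETS(A,N,N) with alpha = 0.9999 is effectively the same naive forecast that carries the last observed price forward. The test-set ME of about 33.7 for every model shows that all three under-forecast the price over the test period. Adding lagged volume does not help: its estimated coefficient is 0 at the displayed precision, AIC rises from 6355.77 to 6357.52, and the test errors are slightly worse.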
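The ARIMA(0,1,0) and ETS(A,N,N) accuracy tables above are almost identical. One way to see why is that a random walk without drift and simple exponential smoothing with alpha close to 1 both forecast roughly the last observed value at every horizon. The sketch below illustrates this on simulated data in base R. All names are hypothetical, the smoothing is initialised at the first observation as a simplification (ets() estimates the initial level), and the error measures are computed by hand rather than with accuracy().

```r
# Simulated random-walk series (an assumption for illustration, not TSLA data).
set.seed(1)
y       <- cumsum(rnorm(120)) + 50
y_train <- y[1:100]
y_test  <- y[101:120]
h       <- length(y_test)

# Random walk without drift: every forecast equals the last training value.
rw_fc <- rep(y_train[length(y_train)], h)

# Simple exponential smoothing: update the level one observation at a time.
# Simplification: the level starts at the first observation.
ses_level <- function(x, alpha) {
  level <- x[1]
  for (obs in x[-1]) level <- alpha * obs + (1 - alpha) * level
  level
}
ses_fc <- rep(ses_level(y_train, alpha = 0.9999), h)

# Error measures with error defined as actual minus forecast,
# so a positive ME means the forecasts were too low on average.
err_metrics <- function(actual, fc) {
  e <- actual - fc
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MAPE = mean(abs(e / actual)) * 100)
}

comparison <- rbind(rw  = err_metrics(y_test, rw_fc),
                    ses = err_metrics(y_test, ses_fc))
print(comparison)
```

The two rows of `comparison` should agree to several decimal places, which mirrors the gap between the ARIMA(0,1,0) and ETS(A,N,N) results reported above.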