The data are monthly adjusted closing prices for Home Depot and Lowe's, taken from Yahoo Finance.

Setup

library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(ggplot2)
library(tseries)
library(vars)
## Loading required package: MASS
## Loading required package: strucchange
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: urca
## Loading required package: lmtest

Data

ts = ts(HD, frequency = 12, start = c(2015,8))
ts = ts[,-1]

train = as.ts(head(ts, n = 48))
test = as.ts(tail(ts, n = 12))
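
A possible alternative for the split, sketched on simulated prices because the HD data frame is not reproduced here: window() cuts by calendar date and keeps the time index on both pieces, whereas head() and tail() followed by as.ts() may, depending on the R version, restart the index at 1. The names prices, train_w, and test_w are hypothetical.

```r
# Simulated stand-in for the two price series (not the real data)
set.seed(1)
prices <- ts(
  cbind(HD = 100 + cumsum(rnorm(60)), Lowe = 70 + cumsum(rnorm(60))),
  frequency = 12, start = c(2015, 8)
)

# First 48 months for training, last 12 for testing, split by calendar date
train_w <- window(prices, end = c(2019, 7))
test_w  <- window(prices, start = c(2019, 8))

start(test_w)  # the test set keeps its own calendar start
```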

EDA

autoplot(ts, main = "Adj. Closing Stock Prices", ylab = "Stock Price ($)")

cor(HD$HD, HD$Lowe)
## [1] 0.955375
adf.test(ts[,1])
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts[, 1]
## Dickey-Fuller = -3.2708, Lag order = 3, p-value = 0.0848
## alternative hypothesis: stationary
adf.test(ts[,2])
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts[, 2]
## Dickey-Fuller = -3.6078, Lag order = 3, p-value = 0.04015
## alternative hypothesis: stationary
hist(ts[,1])

hist(ts[,2])

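The plot shows the two stocks moving together, and the correlation between the two price series is 0.955. The ADF tests give p-values of 0.085 for Home Depot and 0.040 for Lowe's, so at the 5% level the unit-root null is rejected for Lowe's but not for Home Depot.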
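
As a rough illustration of why a high correlation between price levels should be read with care, the sketch below simulates two independent random walks with drift (hypothetical data, not the real prices) and compares the correlation of the levels with the correlation of the month-to-month changes. Applying the same comparison to the HD and Lowe columns would be a natural follow-up check.

```r
set.seed(42)
# Two independent simulated random walks with drift (hypothetical data)
a <- 100 + cumsum(rnorm(60, mean = 1))
b <- 70 + cumsum(rnorm(60, mean = 1))

cor(a, b)              # correlation of levels, inflated by the shared upward trend
cor(diff(a), diff(b))  # correlation of month-to-month changes
```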

VAR select

select  = VARselect(train[,1:2], lag.max=8,
  type="const")[["selection"]]
select
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      8      8      1      8

Based on these results, I will fit a VAR(1), the lag chosen by SC, and a VAR(8), the lag chosen by AIC, HQ, and FPE.
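
The criteria disagree because SC penalizes extra parameters more heavily than AIC does. A minimal sketch, on simulated data standing in for the training set (sim_train is a hypothetical name), of how to look at the full criteria table rather than only the selected lags, together with the number of coefficients each candidate model estimates per equation:

```r
library(vars)

set.seed(1)
# Simulated stand-in for the 48-month training set (not the real prices)
sim_train <- ts(
  cbind(HD = 100 + cumsum(rnorm(48)), Lowe = 70 + cumsum(rnorm(48))),
  frequency = 12, start = c(2015, 8)
)

sel <- VARselect(sim_train, lag.max = 8, type = "const")
sel$selection           # lag chosen by each criterion
round(sel$criteria, 2)  # full table: one row per criterion, one column per lag

# Coefficients per equation in a K-variable VAR(p) with a constant: K * p + 1
K <- ncol(sim_train)
c(var1 = K * 1 + 1, var8 = K * 8 + 1)
```

With two series, a VAR(8) estimates 17 coefficients per equation from the 40 observations left after losing 8 to lags, which is worth keeping in mind when comparing it with the 3-coefficient VAR(1).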

VAR

var1 <- VAR(train[,1:2], p=1, type="const")
serial.test(var1, lags.pt=10, type="PT.asymptotic")
## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object var1
## Chi-squared = 49.16, df = 36, p-value = 0.07068
var2 <- VAR(train[,1:2], p=8, type="const")
serial.test(var2, lags.pt=10, type="PT.asymptotic")
## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object var2
## Chi-squared = 20.614, df = 8, p-value = 0.008246

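By the Portmanteau test, the VAR(1) residuals show no significant serial correlation at the 5% level (p = 0.071), whereas the VAR(8) residuals do (p = 0.008). The VAR(8), which three of the four selection criteria chose, is nonetheless the model used for the forecasts that follow.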
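
The degrees of freedom in the two test outputs follow from the lag settings. A small sketch of the arithmetic (portmanteau_df is a hypothetical helper, not a function from the vars package):

```r
# Degrees of freedom of the asymptotic Portmanteau statistic:
# K^2 * (h - p), with K series, h test lags (lags.pt), and p VAR lags
portmanteau_df <- function(K, h, p) K^2 * (h - p)

portmanteau_df(K = 2, h = 10, p = 1)  # 36, matching the VAR(1) output
portmanteau_df(K = 2, h = 10, p = 8)  # 8, matching the VAR(8) output
```

With lags.pt = 10 the VAR(8) test covers only two lags beyond the model order, so a larger lags.pt may give a more informative check for that model.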

Forecast

fc = forecast(var2,h=12)
autoplot(fc)

checkresiduals(fc$forecast$HD)

checkresiduals(fc$forecast$Lowe)

# test[, 1] is the full HD test column; the output below came from test[1], a single value
accuracy(fc$forecast$HD, test[, 1])
##                         ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -3.553147e-16  4.134252  3.339148 -0.0760513 2.251943 0.4957663
## Test set     -1.384163e+01 13.841631 13.841631 -6.2275640 6.227564 2.0550794
##                     ACF1
## Training set -0.04051015
## Test set              NA
# test[, 2] is the full Lowe test column; the output below came from test[2], a single value
accuracy(fc$forecast$Lowe, test[, 2])
##                        ME       RMSE        MAE        MPE      MAPE       MASE
## Training set 1.775923e-16   3.710486   3.120389 -0.1949789  3.817529  0.6471301
## Test set     1.182114e+02 118.211410 118.211410 52.2430139 52.243014 24.5155843
##                     ACF1
## Training set -0.02426916
## Test set              NA

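The residual checks show no troublesome serial correlation for either series. The test-set rows, however, each rest on a single observation: ME, RMSE, and MAE coincide and ACF1 is NA, which is what results when the test matrix is indexed as test[1] or test[2] instead of test[, 1] or test[, 2]. For Home Depot, the first-month forecast is about 13.84 above the actual price (MPE of -6.2%). The Lowe's row does not measure the quality of the Lowe's forecast: an error of 118.21 at an MPE of 52.2% implies an actual value near 226, which is a Home Depot price level, because test[2] is the second Home Depot observation. Whether the two forecasts really perform as differently as these rows suggest, despite the 0.955 correlation, can only be judged against the full 12-month test columns.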
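
A single index on a matrix picks one element in column-major order, while a comma selects a whole column. The sketch below shows the difference on a made-up 12 x 2 matrix and computes an RMSE by hand against a full column; test_m and fc_lowe are hypothetical values, not the real prices or forecasts.

```r
# Hypothetical 12 x 2 test matrix: column 1 = HD, column 2 = Lowe
test_m <- cbind(HD = 201:212, Lowe = 101:112)

test_m[1]    # 201: first element of the HD column
test_m[2]    # 202: second element of the HD column, not a Lowe value
test_m[, 2]  # all 12 Lowe values

# RMSE of a hypothetical 12-step forecast against the full Lowe column
fc_lowe <- rep(105, 12)
rmse <- sqrt(mean((test_m[, 2] - fc_lowe)^2))
rmse
```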