Data Preparation

I use 5 years of historical stock prices for Netflix and AT&T. My hypothesis is that streaming services and cable subscription services are negatively correlated.

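library(data.table)  # `:=` syntax (netflix and att are assumed to be data.tables)
library(forecast)    # ggAcf, ggPacf, Arima
library(vars)        # VARselect, VAR, serial.test

# parse the dates and sort both tables chronologically
netflix <- netflix[ , date := as.Date(date, format = "%m/%d/%y")]
netflix <- netflix[order(date),]
att <- att[ , date := as.Date(date, format = "%m/%d/%y")]
att <- att[order(date),]

# create time series objects (monthly data starting January 2013)
netflix.ts <- ts(netflix$adj_close, frequency = 12, start = c(2013,1))
att.ts <- ts(att$adj_close, frequency = 12, start = c(2013,1))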

Plots & Diagnoses

Both series are non-stationary.

plot.ts(netflix.ts)

plot.ts(att.ts)

ggAcf(netflix.ts)

ggPacf(netflix.ts)

ggAcf(att.ts)

ggPacf(att.ts)
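As a reference for reading the ACF plots above, here is a minimal sketch on simulated data (not the stock data; the names `walk` and `steps` are made up for illustration). A random walk is a textbook non-stationary series, and its ACF decays very slowly, while the ACF of its first difference drops to roughly zero right away.

```r
set.seed(1)
n <- 500
walk  <- ts(cumsum(rnorm(n)))   # random walk: non-stationary
steps <- diff(walk)             # first difference: white noise

# sample autocorrelations, dropping lag 0 (which is always 1)
acf.walk  <- acf(walk,  lag.max = 20, plot = FALSE)$acf[-1]
acf.steps <- acf(steps, lag.max = 20, plot = FALSE)$acf[-1]

round(acf.walk[c(1, 5, 10)], 2)   # stays close to 1 for many lags
round(acf.steps[c(1, 5, 10)], 2)  # close to 0
```

A slowly decaying ACF like the first one is the pattern I read as a sign of non-stationarity.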

Next I run the CCF (cross-correlation function) to determine which lags of AT&T or Netflix should be included in the model. However, the data are non-stationary: they contain trends, seasonal effects, and other factors. As a result, this CCF plot says little about which lag of each stock is highly correlated with the other.

ccf(att.ts, netflix.ts, type = "correlation")
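To illustrate why a CCF on trending data is hard to read, here is a small sketch with two simulated series that share nothing except a linear trend (all names and slopes are assumptions chosen for illustration). The CCF of the levels is large at every lag, and it only becomes informative after differencing.

```r
set.seed(2)
n  <- 1000
tt <- seq_len(n)
x <- ts( 0.5 * tt + rnorm(n))   # upward trend plus independent noise
y <- ts(-0.3 * tt + rnorm(n))   # downward trend plus independent noise

cc.levels <- ccf(x, y, lag.max = 10, plot = FALSE)
cc.diffs  <- ccf(diff(x), diff(y), lag.max = 10, plot = FALSE)

range(cc.levels$acf)      # strongly negative at every lag, driven by the trends
max(abs(cc.diffs$acf))    # small once the trends are differenced away
```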

CCF Interpretation - 2 approaches

My goal is to figure out which lags of x may predict y based on the CCF. Here I try 2 methods:

  1. Difference the data, then look at the CCF plot
  2. Prewhiten the data, then look at the CCF plot of the residuals
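Before reading any CCF plot, it helps to pin down the sign of the lag. According to the R documentation, lag k of `ccf(x, y)` estimates the correlation between x[t+k] and y[t], so a spike at a negative lag means x leads y. The sketch below checks this on simulated data where y is, by construction, x delayed by 2 steps (the series are made up for illustration).

```r
set.seed(3)
n <- 500
x <- rnorm(n)
# y[t] = x[t-2] + noise, so x leads y by 2 steps
y <- c(rep(0, 2), x[1:(n - 2)]) + rnorm(n, sd = 0.5)

cc <- ccf(x, y, lag.max = 6, plot = FALSE)
peak.lag <- cc$lag[which.max(abs(cc$acf))]
peak.lag   # the largest spike sits at a negative lag
```

In my calls, x is AT&T and y is Netflix, so spikes at negative lags are the ones where AT&T leads Netflix.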

Differencing (1)

The CCF of the differenced data shows spikes at lags -0.25 and 0.25. The lag axis is scaled in seasonal periods. Here I have monthly data with a seasonal period of 12, so a lag of 0.25 means 0.25 of a year, or equivalently 0.25*12 = 3 months. Therefore, I include 3 lags (p = 3) when running the VAR.

# first-difference both series to remove the trend
netflix.diff1 <- diff(netflix.ts, 1)
att.diff1 <- diff(att.ts, 1)

ccf(att.diff1, netflix.diff1, type = "correlation")
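The conversion from the fractional lag axis to months can be checked directly. This is a minimal sketch on two made-up monthly series (`m` and `q` are hypothetical, not the stock data): for a `ts` object with `frequency = 12`, the lags returned by `ccf` are expressed in years, and multiplying by the frequency gives lags in months.

```r
set.seed(4)
m <- ts(rnorm(72), frequency = 12, start = c(2013, 1))  # hypothetical monthly series
q <- ts(rnorm(72), frequency = 12, start = c(2013, 1))

cc <- ccf(m, q, lag.max = 6, plot = FALSE)

lags.years  <- drop(cc$lag)               # -0.5, ..., -0.25, ..., 0.25, ..., 0.5
lags.months <- lags.years * frequency(m)  # -6, ..., -3, ..., 3, ..., 6

round(lags.years, 3)
lags.months
```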

Prewhitening (2)

The CCF plot looks similar to the previous one, indicating that AT&T’s stock price over the previous 3 months should be included as a predictor of Netflix’s stock price.

# Fit an ARIMA(1,1,0) model to the Netflix series and store its residuals
netflix.arima <- arima(netflix.ts, order = c(1,1,0))
error.netflix <- residuals(netflix.arima)

# Apply the same model (coefficients are not re-estimated) to filter AT&T
att.filtered <- residuals(Arima(att.ts, model = netflix.arima))

# plot the CCF of the two residual series
ccf(att.filtered, error.netflix)
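For readers who want to see what the filtering step does, here is a rough base-R sketch of prewhitening on simulated data. It is not the code I ran on the stocks: the series, the AR(1) model and the delay of 3 steps are all assumptions made for illustration. The idea is the same, though: fit a model to one series, apply the same coefficients to the other, and take the CCF of the two filtered series.

```r
set.seed(5)
n <- 400
x <- arima.sim(list(ar = 0.7), n = n)            # autocorrelated input series
y <- 2 * c(rep(0, 3), x[1:(n - 3)]) + rnorm(n)   # y[t] = 2 * x[t-3] + noise

# fit an AR(1) model to x and keep its coefficient
fit.x <- arima(x, order = c(1, 0, 0), include.mean = FALSE)
phi   <- unname(coef(fit.x)["ar1"])

x.white    <- residuals(fit.x)                                  # whitened x
y.filtered <- stats::filter(y, filter = c(1, -phi), sides = 1)  # y[t] - phi * y[t-1]

# drop the first value, which is NA after filtering
cc <- ccf(x.white[-1], y.filtered[-1], lag.max = 8, plot = FALSE)
peak.lag <- cc$lag[which.max(abs(cc$acf))]
peak.lag   # negative: x leads y
```

After filtering, the autocorrelation in x no longer smears the cross-correlation across neighbouring lags, so the delay built into the simulation shows up as a single dominant spike.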

VAR model

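I also run VARselect: BIC (reported as SC) suggests p = 1, while HQ suggests p = 3 and AIC suggests p = 4. Finally, I fit 2 VAR models, with p = 1 and p = 3. Both models pass the Portmanteau test for serial correlation in the residuals (p-values 0.45 and 0.82). However, I lean towards p = 3: in the Netflix equation, the adjusted R-squared rises from 0.08 to 0.20 and the F-test p-value falls from 0.044 to 0.006.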

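It seems that the change in AT&T’s stock price 3 months earlier is a good predictor of the change in Netflix’s stock price (att.diff1.l3, p = 0.018). The reverse direction is less clear: netflix.diff1.l3 is individually significant in the AT&T equation (p = 0.004), but that equation as a whole is not (F-test p-value 0.118).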

netflix.diff1.(t) = att.diff1.(t-1) + netflix.diff1.(t-1) + att.diff1.(t-2) + netflix.diff1.(t-2) + att.diff1.(t-3) + netflix.diff1.(t-3) + const + trend

netflix.diff1.(t) = 0.29 x att.diff1.(t-1) + 0.04 x netflix.diff1.(t-1) + 1.28 x att.diff1.(t-2) - 0.10 x netflix.diff1.(t-2) - 3.00 x att.diff1.(t-3) + 0.15 x netflix.diff1.(t-3) - 3.52 + 0.26 x trend

The significant negative coefficient (-3.00379) means that, holding everything else constant, a 1-unit increase in att.diff1.(t-3) is associated with a 3-unit decrease in netflix.diff1.(t).
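Each equation of a VAR estimated with the vars package is an ordinary least-squares regression on the lagged values of all variables. The sketch below rebuilds one such equation by hand with `embed()` and `lm()` on simulated data. The series `a` and `nf` are made-up stand-ins for att.diff1 and netflix.diff1, and the true coefficient of -3 on the third lag is an assumption chosen to mirror the result above.

```r
set.seed(6)
n  <- 300
a  <- rnorm(n)                                     # stand-in for att.diff1
nf <- -3 * c(rep(0, 3), a[1:(n - 3)]) + rnorm(n)   # stand-in for netflix.diff1

# each row of Z holds the current values followed by lags 1 to 3 of both series
Z <- embed(cbind(a, nf), 4)
colnames(Z) <- c("a", "nf", "a.l1", "nf.l1", "a.l2", "nf.l2", "a.l3", "nf.l3")
d <- data.frame(Z, trend = seq_len(nrow(Z)))

# the "nf" equation of a VAR(3) with constant and trend
eq.nf <- lm(nf ~ a.l1 + nf.l1 + a.l2 + nf.l2 + a.l3 + nf.l3 + trend, data = d)
round(coef(eq.nf), 2)
```

The coefficient on `a.l3` is read the same way as att.diff1.l3 in the fitted model: the expected change in the response for a 1-unit change in that lag, with the other regressors held fixed.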

VARselect(cbind(att.diff1,netflix.diff1), lag.max=8,
          type="const")[["selection"]]
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      4      3      1      4
# VAR model with p = 1
fitvar1=VAR(cbind(att.diff1,netflix.diff1), p=1, type="both")
summary(fitvar1)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: att.diff1, netflix.diff1 
## Deterministic variables: both 
## Sample size: 65 
## Log Likelihood: -376.855 
## Roots of the characteristic polynomial:
## 0.1522 0.05469
## Call:
## VAR(y = cbind(att.diff1, netflix.diff1), p = 1, type = "both")
## 
## 
## Estimation results for equation att.diff1: 
## ========================================== 
## att.diff1 = att.diff1.l1 + netflix.diff1.l1 + const + trend 
## 
##                   Estimate Std. Error t value Pr(>|t|)
## att.diff1.l1     -0.166838   0.129529  -1.288    0.203
## netflix.diff1.l1 -0.011294   0.013619  -0.829    0.410
## const             0.295143   0.384847   0.767    0.446
## trend            -0.004341   0.010447  -0.416    0.679
## 
## 
## Residual standard error: 1.482 on 61 degrees of freedom
## Multiple R-Squared: 0.03704, Adjusted R-squared: -0.01031 
## F-statistic: 0.7822 on 3 and 61 DF,  p-value: 0.5084 
## 
## 
## Estimation results for equation netflix.diff1: 
## ============================================== 
## netflix.diff1 = att.diff1.l1 + netflix.diff1.l1 + const + trend 
## 
##                  Estimate Std. Error t value Pr(>|t|)   
## att.diff1.l1      0.14547    1.25294   0.116  0.90795   
## netflix.diff1.l1 -0.04004    0.13174  -0.304  0.76220   
## const            -3.82547    3.72262  -1.028  0.30818   
## trend             0.28785    0.10105   2.849  0.00598 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 14.33 on 61 degrees of freedom
## Multiple R-Squared: 0.1236,  Adjusted R-squared: 0.08047 
## F-statistic: 2.867 on 3 and 61 DF,  p-value: 0.04378 
## 
## 
## 
## Covariance matrix of residuals:
##               att.diff1 netflix.diff1
## att.diff1         2.196        -5.328
## netflix.diff1    -5.328       205.458
## 
## Correlation matrix of residuals:
##               att.diff1 netflix.diff1
## att.diff1        1.0000       -0.2509
## netflix.diff1   -0.2509        1.0000
serial.test(fitvar1, type="PT.asymptotic")
## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object fitvar1
## Chi-squared = 60.66, df = 60, p-value = 0.4519
# VAR model with p = 3
fitvar3=VAR(cbind(att.diff1,netflix.diff1), p=3, type="both")
summary(fitvar3)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: att.diff1, netflix.diff1 
## Deterministic variables: both 
## Sample size: 63 
## Log Likelihood: -355.045 
## Roots of the characteristic polynomial:
## 0.7614 0.7611 0.7611 0.7364 0.7364 0.6567
## Call:
## VAR(y = cbind(att.diff1, netflix.diff1), p = 3, type = "both")
## 
## 
## Estimation results for equation att.diff1: 
## ========================================== 
## att.diff1 = att.diff1.l1 + netflix.diff1.l1 + att.diff1.l2 + netflix.diff1.l2 + att.diff1.l3 + netflix.diff1.l3 + const + trend 
## 
##                   Estimate Std. Error t value Pr(>|t|)   
## att.diff1.l1     -0.206807   0.128582  -1.608  0.11348   
## netflix.diff1.l1 -0.014998   0.013606  -1.102  0.27513   
## att.diff1.l2     -0.134079   0.130770  -1.025  0.30971   
## netflix.diff1.l2 -0.008693   0.013867  -0.627  0.53331   
## att.diff1.l3     -0.173318   0.129962  -1.334  0.18783   
## netflix.diff1.l3 -0.043394   0.014383  -3.017  0.00386 **
## const             0.196501   0.405985   0.484  0.63030   
## trend             0.006473   0.011767   0.550  0.58446   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 1.436 on 55 degrees of freedom
## Multiple R-Squared: 0.1815,  Adjusted R-squared: 0.07737 
## F-statistic: 1.743 on 7 and 55 DF,  p-value: 0.1181 
## 
## 
## Estimation results for equation netflix.diff1: 
## ============================================== 
## netflix.diff1 = att.diff1.l1 + netflix.diff1.l1 + att.diff1.l2 + netflix.diff1.l2 + att.diff1.l3 + netflix.diff1.l3 + const + trend 
## 
##                  Estimate Std. Error t value Pr(>|t|)  
## att.diff1.l1      0.29100    1.21546   0.239   0.8117  
## netflix.diff1.l1  0.03626    0.12861   0.282   0.7791  
## att.diff1.l2      1.28411    1.23614   1.039   0.3034  
## netflix.diff1.l2 -0.09940    0.13108  -0.758   0.4515  
## att.diff1.l3     -3.00379    1.22850  -2.445   0.0177 *
## netflix.diff1.l3  0.15443    0.13596   1.136   0.2610  
## const            -3.51525    3.83769  -0.916   0.3637  
## trend             0.26123    0.11123   2.349   0.0225 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 13.57 on 55 degrees of freedom
## Multiple R-Squared: 0.2896,  Adjusted R-squared: 0.1992 
## F-statistic: 3.203 on 7 and 55 DF,  p-value: 0.006436 
## 
## 
## 
## Covariance matrix of residuals:
##               att.diff1 netflix.diff1
## att.diff1         2.062        -5.162
## netflix.diff1    -5.162       184.240
## 
## Correlation matrix of residuals:
##               att.diff1 netflix.diff1
## att.diff1        1.0000       -0.2649
## netflix.diff1   -0.2649        1.0000
serial.test(fitvar3, type="PT.asymptotic")
## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object fitvar3
## Chi-squared = 42.494, df = 52, p-value = 0.8237
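
To show how I read these p-values, here is a univariate analogue using base R's `Box.test` on simulated series. It is not the same test as the multivariate `serial.test` used above, and the series are made up for illustration. The null hypothesis is that there is no serial correlation, so a large p-value means the residuals look like white noise, while a small one signals leftover autocorrelation.

```r
set.seed(7)
white <- rnorm(300)                           # no serial correlation
ar1   <- arima.sim(list(ar = 0.8), n = 300)   # strong serial correlation

p.white <- Box.test(white, lag = 12, type = "Ljung-Box")$p.value
p.ar1   <- Box.test(ar1,   lag = 12, type = "Ljung-Box")$p.value

p.white   # typically well above 0.05 for white noise
p.ar1     # essentially 0: the null of no serial correlation is rejected
```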

References

https://stats.stackexchange.com/questions/221072/why-is-prewhitening-important

https://rpubs.com/antonio78/282689

https://stats.stackexchange.com/questions/177781/what-does-decimal-lag-point-mean-in-pacf-graph-produced-by-r

https://stats.stackexchange.com/questions/252903/how-to-prewhiten-univariate-time-series

https://onlinecourses.science.psu.edu/stat510/node/79/

https://stats.stackexchange.com/questions/43370/filtering-using-arma-model-in-r