I picked five years of monthly historical stock data for Netflix and AT&T. My hypothesis is that streaming services and cable subscription services may be negatively correlated.
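library(data.table)  # := syntax and ordering of data.tables
library(forecast)    # ggAcf(), ggPacf(), Arima()
library(vars)        # VARselect(), VAR(), serial.test()
# netflix and att are data.tables; parse dates in place, then sort both by date
netflix[ , date := as.Date(date, format = "%m/%d/%y")]
netflix <- netflix[order(date)]
att[ , date := as.Date(date, format = "%m/%d/%y")]
att <- att[order(date)]
# create monthly time series objects from the adjusted closing prices
netflix.ts <- ts(netflix$adj_close, frequency = 12, start = c(2013, 1))
att.ts <- ts(att$adj_close, frequency = 12, start = c(2013, 1))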
Both series are non-stationary, as the time plots and ACF/PACF plots below indicate.
plot.ts(netflix.ts)
plot.ts(att.ts)
ggAcf(netflix.ts)
ggPacf(netflix.ts)
ggAcf(att.ts)
ggPacf(att.ts)
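The visual diagnosis can also be checked numerically. The sketch below uses simulated data, not the stock series, and base R's acf(..., plot = FALSE) to show the signature of non-stationarity that the ACF plots display: a random walk with drift has a lag-1 autocorrelation close to 1, while its first difference looks like white noise. The names rw and d1 are hypothetical.

```r
# Simulated stand-in for a trending price series: a random walk with drift
set.seed(123)
rw <- ts(cumsum(rnorm(200, mean = 0.5)), frequency = 12, start = c(2013, 1))
d1 <- diff(rw, 1)  # first difference of the simulated series

# acf() with plot = FALSE returns the estimates instead of drawing them;
# element [1] is lag 0 (always 1) and element [2] is lag 1
r_level <- acf(rw, plot = FALSE)$acf[2]
r_diff  <- acf(d1, plot = FALSE)$acf[2]
c(level = r_level, differenced = r_diff)
```

For a formal check on the real data, forecast::ndiffs(netflix.ts) returns the number of differences suggested by a unit-root test (KPSS by default).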
Next, I compute the CCF (cross-correlation function) to determine which lags of AT&T or Netflix should be included in the model. However, because both series are non-stationary (they contain trends, seasonal effects and other structure), the CCF of the raw series says little about which lag of one stock is genuinely correlated with the other.
ccf(att.ts, netflix.ts, type = "correlation")
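Why the raw CCF is unreliable can be seen on simulated data: two independent random walks that both drift upward show large cross-correlations at every lag simply because they share a trend, and the effect disappears after first differencing. This is an illustrative sketch; x and y are hypothetical stand-ins, not the stock series.

```r
# Two independent simulated random walks, both with upward drift
set.seed(2013)
x <- ts(cumsum(rnorm(200, mean = 0.5)))
y <- ts(cumsum(rnorm(200, mean = 0.5)))

# CCF of the levels: the common trend produces large correlations
cc_level <- ccf(x, y, lag.max = 12, plot = FALSE)
# CCF of the first differences: the spurious structure disappears
cc_diff <- ccf(diff(x), diff(y), lag.max = 12, plot = FALSE)

max(abs(cc_level$acf))  # large, although x and y are unrelated
max(abs(cc_diff$acf))   # small, consistent with no relationship
```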
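My goal is to use the CCF to find which lags of x (AT&T) may help predict y (Netflix). Because the raw series are non-stationary, I try two ways of removing that structure first: (1) differencing both series, and (2) prewhitening, i.e. fitting an ARIMA model to one series and filtering both series with it.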
Method 1: differencing. The CCF of the differenced series shows spikes at lags -0.25 and 0.25. The lag axis is measured in complete seasonal periods: with monthly data and a seasonal period of 12, a lag of 0.25 is a quarter of a year, i.e. 0.25*12 = 3 months. Negative lags correspond to AT&T leading Netflix, since ccf(x, y) at lag k estimates the correlation between x[t+k] and y[t]. Therefore, I include 3 lags (p = 3) when fitting the VAR.
# first-difference both series
netflix.diff1 <- diff(netflix.ts, 1)
att.diff1 <- diff(att.ts, 1)
ccf(att.diff1, netflix.diff1, type = "correlation")
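The lag units can be confirmed directly: for a series with frequency 12, ccf() reports lags as fractions of the seasonal period, so multiplying by frequency() converts them to months. The sketch uses two hypothetical simulated monthly series.

```r
# Two hypothetical monthly series (frequency = 12), simulated for illustration
set.seed(7)
x <- ts(rnorm(60), frequency = 12, start = c(2013, 1))
y <- ts(rnorm(60), frequency = 12, start = c(2013, 1))

cc <- ccf(x, y, lag.max = 3, plot = FALSE)
# ccf() reports lags in units of the seasonal period (years here)
lag_years <- as.vector(cc$lag)
# multiply by the frequency to convert to months
lag_months <- lag_years * frequency(x)
rbind(lag_years, lag_months)
```

A spike labelled 0.25 on the plot therefore sits at 3 months.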
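Method 2: prewhitening. I fit an ARIMA(1,1,0) model to Netflix, filter AT&T with the same fitted model, and compute the CCF of the two filtered series. This CCF looks similar to the one from differencing, indicating that AT&T's stock price over the previous 3 months (lags 1 to 3) should be included as a predictor of Netflix's stock price.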
# Prewhitening: fit an ARIMA(1,1,0) model to Netflix and keep its residuals
netflix.arima <- Arima(netflix.ts, order = c(1, 1, 0))
error.netflix <- residuals(netflix.arima)
# Apply the same fitted filter to AT&T without re-estimating the coefficients
att.filtered <- residuals(Arima(att.ts, model = netflix.arima))
# CCF of the two filtered series
ccf(att.filtered, error.netflix)
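The prewhitening steps above (fit a model to one series, pass both series through the same filter, then compute the CCF) can be sketched in base R on simulated data. This is an illustrative variant, not the code used above: it assumes an AR(1) filter, uses stats::arima() and stats::filter() instead of forecast::Arima(model = ...), fits the filter to the input series x as in the usual textbook recipe, and plants a lag-3 effect so the result can be checked. x, y and prewhiten() are hypothetical names.

```r
set.seed(510)
n <- 500
# input series: AR(1) with phi = 0.7 (hypothetical stand-in for the predictor)
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
# output series: depends on x three periods earlier, plus noise
y <- numeric(n)
y[1:3] <- rnorm(3)
y[4:n] <- 2 * x[1:(n - 3)] + rnorm(n - 3)

# 1. fit an AR(1) model to the input series
fit <- arima(x, order = c(1, 0, 0), include.mean = FALSE)
phi <- coef(fit)[["ar1"]]

# 2. apply the same filter, z[t] - phi * z[t-1], to both series
prewhiten <- function(z) stats::filter(z, c(1, -phi), method = "convolution", sides = 1)
xw <- prewhiten(x)
yw <- prewhiten(y)

# 3. CCF of the filtered series; the first value of each is NA from the filter
cc <- ccf(xw, yw, lag.max = 10, na.action = na.omit, plot = FALSE)
best_lag <- cc$lag[which.max(abs(cc$acf))]
best_lag  # -3: x leads y by three periods
```

Without prewhitening, the autocorrelation of x would smear the lag-3 spike over neighbouring lags; after filtering, the peak is isolated at the planted lag.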
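I also tried VARselect(); the BIC (reported as SC) suggests p = 1. Finally, I ran two VAR models, with p = 1 and p = 3. Both models pass the Portmanteau test for residual serial correlation (p = 0.4519 and p = 0.8237). However, I am leaning towards p = 3 because its Netflix equation has the higher adjusted R-squared (0.1992 vs 0.08047) and the smaller F-test p-value (0.006436 vs 0.04378).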
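It seems that the change in AT&T's stock price three months earlier is a good predictor of the change in Netflix's stock price: att.diff1.l3 is the only significant lag coefficient in the Netflix equation (p = 0.0177). The reverse direction is weaker: netflix.diff1.l3 is significant in the AT&T equation (p = 0.00386), but that equation is not jointly significant (F-test p = 0.1181).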
The Netflix equation of the p = 3 model, writing $A_t$ for att.diff1 and $N_t$ for netflix.diff1, is

$$N_t = a_1 A_{t-1} + b_1 N_{t-1} + a_2 A_{t-2} + b_2 N_{t-2} + a_3 A_{t-3} + b_3 N_{t-3} + c + \delta t + u_t$$

and the estimated version is

$$\hat{N}_t = 0.291\,A_{t-1} + 0.036\,N_{t-1} + 1.284\,A_{t-2} - 0.099\,N_{t-2} - 3.004\,A_{t-3} + 0.154\,N_{t-3} - 3.515 + 0.261\,t$$
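The coefficient on att.diff1.l3 is negative and significant (-3.00379): holding the other regressors fixed, a one-dollar larger monthly change in AT&T's adjusted close three months earlier is associated with a monthly change in Netflix's adjusted close that is about 3 dollars smaller.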
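Each VAR equation is estimated by ordinary least squares on the same set of regressors, which is why the coefficient can be read as a ceteris-paribus regression slope. The sketch below, on simulated data with a planted coefficient of -3 on the third lag of the predictor, builds the lagged regressors with base R's embed() and fits a Netflix-style equation with lm(). a, y and the column names are hypothetical stand-ins for att.diff1 and netflix.diff1.

```r
set.seed(1)
n <- 400
a <- rnorm(n)  # hypothetical stand-in for att.diff1 (white noise)
y <- numeric(n)
for (t in 4:n) {
  # planted effect: y depends on a three periods earlier with coefficient -3
  y[t] <- 0.2 * y[t - 1] - 3 * a[t - 3] + rnorm(1)
}

p <- 3
# embed() puts the current value in column 1, lag 1 in column 2, and so on
ea <- embed(a, p + 1)
ey <- embed(y, p + 1)
dat <- data.frame(
  y = ey[, 1],
  a.l1 = ea[, 2], y.l1 = ey[, 2],
  a.l2 = ea[, 3], y.l2 = ey[, 3],
  a.l3 = ea[, 4], y.l3 = ey[, 4],
  trend = seq_len(nrow(ey))
)
# lags of both series plus a constant (lm default) and a linear trend,
# mirroring the regressors of VAR(..., p = 3, type = "both")
fit <- lm(y ~ a.l1 + y.l1 + a.l2 + y.l2 + a.l3 + y.l3 + trend, data = dat)
coef(fit)[["a.l3"]]  # close to the planted -3
```

With the vars package, VAR(cbind(a, y), p = 3, type = "both") should report the same lag and trend coefficients in its y equation; the constant can differ if vars starts the trend index at a different value.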
VARselect(cbind(att.diff1,netflix.diff1), lag.max=8,
type="const")[["selection"]]
## AIC(n) HQ(n) SC(n) FPE(n)
## 4 3 1 4
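The four criteria differ only in how heavily they penalize extra lags. Following the formulas given in the vars documentation for VARselect(), with $K = 2$ variables, $T$ observations, $n$ lags, $n^*$ parameters per equation and $\tilde{\Sigma}_u(n)$ the residual covariance matrix of the VAR(n):

$$
\begin{aligned}
\mathrm{AIC}(n) &= \ln\det\big(\tilde{\Sigma}_u(n)\big) + \frac{2}{T}\, n K^2 \\
\mathrm{HQ}(n) &= \ln\det\big(\tilde{\Sigma}_u(n)\big) + \frac{2\ln(\ln T)}{T}\, n K^2 \\
\mathrm{SC}(n) &= \ln\det\big(\tilde{\Sigma}_u(n)\big) + \frac{\ln T}{T}\, n K^2 \\
\mathrm{FPE}(n) &= \left(\frac{T + n^*}{T - n^*}\right)^{K} \det\big(\tilde{\Sigma}_u(n)\big)
\end{aligned}
$$

For $T$ around 60, $\ln T \approx 4.1$ exceeds $2\ln(\ln T) \approx 2.8$, which exceeds 2, so SC (the BIC) penalizes additional lags the most and picks the shortest model (n = 1), HQ lands in between (n = 3), and AIC and FPE both pick n = 4, matching the selection output above.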
# VAR model with p = 1
fitvar1=VAR(cbind(att.diff1,netflix.diff1), p=1, type="both")
summary(fitvar1)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: att.diff1, netflix.diff1
## Deterministic variables: both
## Sample size: 65
## Log Likelihood: -376.855
## Roots of the characteristic polynomial:
## 0.1522 0.05469
## Call:
## VAR(y = cbind(att.diff1, netflix.diff1), p = 1, type = "both")
##
##
## Estimation results for equation att.diff1:
## ==========================================
## att.diff1 = att.diff1.l1 + netflix.diff1.l1 + const + trend
##
## Estimate Std. Error t value Pr(>|t|)
## att.diff1.l1 -0.166838 0.129529 -1.288 0.203
## netflix.diff1.l1 -0.011294 0.013619 -0.829 0.410
## const 0.295143 0.384847 0.767 0.446
## trend -0.004341 0.010447 -0.416 0.679
##
##
## Residual standard error: 1.482 on 61 degrees of freedom
## Multiple R-Squared: 0.03704, Adjusted R-squared: -0.01031
## F-statistic: 0.7822 on 3 and 61 DF, p-value: 0.5084
##
##
## Estimation results for equation netflix.diff1:
## ==============================================
## netflix.diff1 = att.diff1.l1 + netflix.diff1.l1 + const + trend
##
## Estimate Std. Error t value Pr(>|t|)
## att.diff1.l1 0.14547 1.25294 0.116 0.90795
## netflix.diff1.l1 -0.04004 0.13174 -0.304 0.76220
## const -3.82547 3.72262 -1.028 0.30818
## trend 0.28785 0.10105 2.849 0.00598 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 14.33 on 61 degrees of freedom
## Multiple R-Squared: 0.1236, Adjusted R-squared: 0.08047
## F-statistic: 2.867 on 3 and 61 DF, p-value: 0.04378
##
##
##
## Covariance matrix of residuals:
## att.diff1 netflix.diff1
## att.diff1 2.196 -5.328
## netflix.diff1 -5.328 205.458
##
## Correlation matrix of residuals:
## att.diff1 netflix.diff1
## att.diff1 1.0000 -0.2509
## netflix.diff1 -0.2509 1.0000
serial.test(fitvar1, type="PT.asymptotic")
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object fitvar1
## Chi-squared = 60.66, df = 60, p-value = 0.4519
# VAR model with p = 3
fitvar3=VAR(cbind(att.diff1,netflix.diff1), p=3, type="both")
summary(fitvar3)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: att.diff1, netflix.diff1
## Deterministic variables: both
## Sample size: 63
## Log Likelihood: -355.045
## Roots of the characteristic polynomial:
## 0.7614 0.7611 0.7611 0.7364 0.7364 0.6567
## Call:
## VAR(y = cbind(att.diff1, netflix.diff1), p = 3, type = "both")
##
##
## Estimation results for equation att.diff1:
## ==========================================
## att.diff1 = att.diff1.l1 + netflix.diff1.l1 + att.diff1.l2 + netflix.diff1.l2 + att.diff1.l3 + netflix.diff1.l3 + const + trend
##
## Estimate Std. Error t value Pr(>|t|)
## att.diff1.l1 -0.206807 0.128582 -1.608 0.11348
## netflix.diff1.l1 -0.014998 0.013606 -1.102 0.27513
## att.diff1.l2 -0.134079 0.130770 -1.025 0.30971
## netflix.diff1.l2 -0.008693 0.013867 -0.627 0.53331
## att.diff1.l3 -0.173318 0.129962 -1.334 0.18783
## netflix.diff1.l3 -0.043394 0.014383 -3.017 0.00386 **
## const 0.196501 0.405985 0.484 0.63030
## trend 0.006473 0.011767 0.550 0.58446
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 1.436 on 55 degrees of freedom
## Multiple R-Squared: 0.1815, Adjusted R-squared: 0.07737
## F-statistic: 1.743 on 7 and 55 DF, p-value: 0.1181
##
##
## Estimation results for equation netflix.diff1:
## ==============================================
## netflix.diff1 = att.diff1.l1 + netflix.diff1.l1 + att.diff1.l2 + netflix.diff1.l2 + att.diff1.l3 + netflix.diff1.l3 + const + trend
##
## Estimate Std. Error t value Pr(>|t|)
## att.diff1.l1 0.29100 1.21546 0.239 0.8117
## netflix.diff1.l1 0.03626 0.12861 0.282 0.7791
## att.diff1.l2 1.28411 1.23614 1.039 0.3034
## netflix.diff1.l2 -0.09940 0.13108 -0.758 0.4515
## att.diff1.l3 -3.00379 1.22850 -2.445 0.0177 *
## netflix.diff1.l3 0.15443 0.13596 1.136 0.2610
## const -3.51525 3.83769 -0.916 0.3637
## trend 0.26123 0.11123 2.349 0.0225 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 13.57 on 55 degrees of freedom
## Multiple R-Squared: 0.2896, Adjusted R-squared: 0.1992
## F-statistic: 3.203 on 7 and 55 DF, p-value: 0.006436
##
##
##
## Covariance matrix of residuals:
## att.diff1 netflix.diff1
## att.diff1 2.062 -5.162
## netflix.diff1 -5.162 184.240
##
## Correlation matrix of residuals:
## att.diff1 netflix.diff1
## att.diff1 1.0000 -0.2649
## netflix.diff1 -0.2649 1.0000
serial.test(fitvar3, type="PT.asymptotic")
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object fitvar3
## Chi-squared = 42.494, df = 52, p-value = 0.8237
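The degrees of freedom in the two Portmanteau outputs follow from the test's construction: with K = 2 variables, h residual autocorrelation lags (the default lags.pt = 16 in vars::serial.test()) and a VAR of order p, the asymptotic statistic is compared with a chi-squared distribution on K^2 (h - p) degrees of freedom. The short base-R check below reproduces the reported df and p-values; a large p-value means no evidence of remaining residual serial correlation.

```r
# Portmanteau degrees of freedom: K^2 * (h - p)
K <- 2   # number of variables in the VAR
h <- 16  # lags.pt, the serial.test() default
df_var1 <- K^2 * (h - 1)  # 60, matching the p = 1 output
df_var3 <- K^2 * (h - 3)  # 52, matching the p = 3 output

# upper-tail chi-squared probabilities reproduce the reported p-values
p_var1 <- pchisq(60.66, df = df_var1, lower.tail = FALSE)   # about 0.452
p_var3 <- pchisq(42.494, df = df_var3, lower.tail = FALSE)  # about 0.824
```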