library(fpp2)
library(vars)
library(knitr)
LOW_data <- read.csv("C:/ECON 4210/as4_data-Shane.csv")
df = LOW_data
# define as time series
df = ts(df, start=2000, frequency=4)
# extract sales
y = df[,"LOW"]
tsdisplay(y)
Before we get into forecasting, I think it is worth discussing the dataset provided, since it helps explain the forecasting choices that follow. The first (top) plot shows the level of sales over the period, measured in millions of dollars. It shows an increasing trend, then a sudden large drop in 2009, followed by a gradual recovery in sales. The second plot shows the ACF decaying slowly, which means future values of the series are strongly correlated with, i.e. heavily affected by, past values. The third plot (the partial autocorrelation) shows an anomaly that needs further investigation.
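Since the VAR below includes quarterly seasonal dummies, it can also help to look at the within-year pattern of sales directly. This is a small optional sketch, not part of the original analysis, using ggseasonplot() from the forecast package (loaded via fpp2):
# season plot of quarterly LOW sales, one line per year
ggseasonplot(y, year.labels = TRUE)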
cor(df[,3:5])
## LOW CS PERMIT
## LOW 1.0000000 -0.2442245 -0.4673471
## CS -0.2442245 1.0000000 0.6733693
## PERMIT -0.4673471 0.6733693 1.0000000
The table shows the correlations among the variables: building permits (PERMIT), consumer sentiment (CS) and Lowe's sales (LOW). There is a fairly strong positive correlation between building permits and consumer sentiment (around 0.67). Note, however, that correlation does not imply causation. LOW sales are negatively correlated with both CS and PERMIT, with the stronger correlation being with PERMIT (about -0.47). We will keep this in mind when building the VAR model.
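As a quick visual check on these correlations, a pairwise scatterplot of the three series can be drawn. This is only an illustrative sketch using base R's pairs() on the same columns as above:
# pairwise scatterplots of LOW, CS and PERMIT (columns 3 to 5)
pairs(df[, 3:5], main = "Scatterplot matrix: LOW, CS, PERMIT")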
# take logs of PERMIT, CS and LOW (columns 5, 4 and 3 of df)
vardata = log(df[,c(5,4,3)])
plot(vardata, main = "VAR data", xlab = "")
In order to make the data stationary it would be necessary to difference it; before we create the equations, though, it is useful to look at the logged data to understand the general pattern of each variable. The graph is in line with what we expected from the correlation between CS and PERMIT, i.e. their general patterns look similar. LOW sales show an increasing trend overall, with only a very slow increase between 2007 and 2018.
Looking at the plot of LOW sales, I believe adding a time trend as well as an intercept in each equation of the model will be helpful.
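As a quick supporting check (not in the original output), one could ask how many differences each logged series would need for stationarity, using ndiffs() from the forecast package, which applies a KPSS test by default:
# number of differences suggested for each logged series (KPSS-based)
sapply(colnames(vardata), function(v) ndiffs(vardata[, v]))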
VARselect(vardata, lag.max = 9, type = "both", season=4)
## $selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 2 1 1 2
##
## $criteria
## 1 2 3 4
## AIC(n) -1.709368e+01 -1.714677e+01 -1.701895e+01 -1.704070e+01
## HQ(n) -1.676819e+01 -1.669923e+01 -1.644935e+01 -1.634905e+01
## SC(n) -1.626317e+01 -1.600482e+01 -1.556556e+01 -1.527587e+01
## FPE(n) 3.786951e-08 3.617822e-08 4.164941e-08 4.159549e-08
## 5 6 7 8
## AIC(n) -1.686606e+01 -1.692950e+01 -1.671578e+01 -1.659662e+01
## HQ(n) -1.605235e+01 -1.599374e+01 -1.565795e+01 -1.541674e+01
## SC(n) -1.478979e+01 -1.454179e+01 -1.401663e+01 -1.358603e+01
## FPE(n) 5.104057e-08 4.995692e-08 6.548313e-08 7.955198e-08
## 9
## AIC(n) -1.656029e+01
## HQ(n) -1.525835e+01
## SC(n) -1.323826e+01
## FPE(n) 9.104461e-08
The R output shows the lag length selected by each of the information criteria available in the vars package. There is a large discrepancy between the VAR(2) selected by the AIC and the VAR(1) selected by the BIC (i.e. SC in the table). This is not unusual. As a result we first fit a VAR(1), as selected by the BIC.
var1 <- VAR(vardata, p=1, type="both", season=4)
serial.test(var1, lags.pt=16, type="PT.asymptotic")
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var1
## Chi-squared = 153.08, df = 135, p-value = 0.1367
var2 <- VAR(vardata, p=2, type="both", season=4)
serial.test(var2, lags.pt=16, type="PT.asymptotic")
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var2
## Chi-squared = 120.22, df = 126, p-value = 0.6285
The residuals of both models pass the test for no serial correlation, since both p-values are above 0.05. I will go forward with p = 1, since it also passes the test and is the more parsimonious choice (the lag length selected by the BIC).
var.1 = VAR(vardata, p=1, type = "both", season =4)
summary(var.1)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: PERMIT, CS, LOW
## Deterministic variables: both
## Sample size: 69
## Log Likelihood: 319.397
## Roots of the characteristic polynomial:
## 0.9621 0.9621 0.3707
## Call:
## VAR(y = vardata, p = 1, type = "both", season = 4L)
##
##
## Estimation results for equation PERMIT:
## =======================================
## PERMIT = PERMIT.l1 + CS.l1 + LOW.l1 + const + trend + sd1 + sd2 + sd3
##
## Estimate Std. Error t value Pr(>|t|)
## PERMIT.l1 0.978415 0.044034 22.220 <2e-16 ***
## CS.l1 0.068461 0.115189 0.594 0.5545
## LOW.l1 -0.161913 0.074744 -2.166 0.0342 *
## const 1.242012 0.823903 1.507 0.1369
## trend 0.002866 0.001453 1.973 0.0531 .
## sd1 -0.021229 0.025269 -0.840 0.4041
## sd2 0.006454 0.024848 0.260 0.7959
## sd3 0.012001 0.029652 0.405 0.6871
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.07222 on 61 degrees of freedom
## Multiple R-Squared: 0.9747, Adjusted R-squared: 0.9718
## F-statistic: 336.2 on 7 and 61 DF, p-value: < 2.2e-16
##
##
## Estimation results for equation CS:
## ===================================
## CS = PERMIT.l1 + CS.l1 + LOW.l1 + const + trend + sd1 + sd2 + sd3
##
## Estimate Std. Error t value Pr(>|t|)
## PERMIT.l1 0.167839 0.046034 3.646 0.000553 ***
## CS.l1 0.435460 0.120420 3.616 0.000608 ***
## LOW.l1 -0.232102 0.078139 -2.970 0.004249 **
## const 3.270004 0.861320 3.797 0.000340 ***
## trend 0.005094 0.001519 3.354 0.001375 **
## sd1 -0.052429 0.026417 -1.985 0.051682 .
## sd2 0.003493 0.025976 0.134 0.893477
## sd3 -0.006115 0.030998 -0.197 0.844266
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.0755 on 61 degrees of freedom
## Multiple R-Squared: 0.7525, Adjusted R-squared: 0.7241
## F-statistic: 26.5 on 7 and 61 DF, p-value: 2.854e-16
##
##
## Estimation results for equation LOW:
## ====================================
## LOW = PERMIT.l1 + CS.l1 + LOW.l1 + const + trend + sd1 + sd2 + sd3
##
## Estimate Std. Error t value Pr(>|t|)
## PERMIT.l1 0.0591408 0.0211142 2.801 0.00682 **
## CS.l1 -0.1215438 0.0552329 -2.201 0.03156 *
## LOW.l1 0.8785107 0.0358398 24.512 < 2e-16 ***
## const 1.1948574 0.3950613 3.024 0.00364 **
## trend 0.0018767 0.0006967 2.694 0.00912 **
## sd1 0.2079249 0.0121165 17.160 < 2e-16 ***
## sd2 0.2362321 0.0119144 19.827 < 2e-16 ***
## sd3 -0.0766388 0.0142180 -5.390 1.2e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.03463 on 61 degrees of freedom
## Multiple R-Squared: 0.9912, Adjusted R-squared: 0.9902
## F-statistic: 985.7 on 7 and 61 DF, p-value: < 2.2e-16
##
##
##
## Covariance matrix of residuals:
## PERMIT CS LOW
## PERMIT 0.005215 0.0014767 0.0010047
## CS 0.001477 0.0056999 0.0002842
## LOW 0.001005 0.0002842 0.0011991
##
## Correlation matrix of residuals:
## PERMIT CS LOW
## PERMIT 1.0000 0.2708 0.4018
## CS 0.2708 1.0000 0.1087
## LOW 0.4018 0.1087 1.0000
We are given one equation per variable, and looking at the adjusted R-squared helps us judge how each equation performs. For the first equation (PERMIT), the adjusted R-squared is very high (around 0.97), so the equation fits well. The CS equation performs noticeably worse, with an adjusted R-squared of only about 0.72. The final equation (LOW) gives the best result, capturing roughly 99% of the variability in sales.
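The adjusted R-squared of each equation can also be pulled out programmatically rather than read off the summary. A small sketch, using the list of fitted equations stored in the varest object:
# adjusted R-squared for each equation of the VAR(1)
sapply(var.1$varresult, function(eq) summary(eq)$adj.r.squared)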
roots(var.1)
## [1] 0.9621418 0.9621418 0.3706701
All the roots are less than 1 in modulus, implying the estimated VAR is stable.
plot(var.1, names = "CS")
The first plot shows a reasonable fit, with the blue dotted line tracking the data points. The second plot shows the residuals scattered around a mean of zero, with larger residuals in the middle of the sample (between observations 30 and 40). The residual ACF shows significance at the first and last lags. Overall it seems reasonable.
plot(var.1, names = "PERMIT")
The first plot shows a very good fit, with the blue dotted line tracking the data points closely, probably because there is little variation. The second plot shows the residuals around a mean of zero, with a spike in the middle of the sample (around observation 35). The residual ACF shows significance at the first and last lags. Overall it seems reasonable.
plot(var.1, names = "LOW")
The first plot shows a very good fit, with the equation effectively combining the explanatory variables. The second plot shows the residuals around a mean of zero, with considerable variation throughout. The residual ACF shows significance at the first, second and last lags. Overall it seems reasonable.
# plot the full 3x3 grid of residual autocorrelations and cross-correlations
acf(residuals(var.1))
##### Detailed breakdown of the residuals
Overall, the PERMIT (graph 1), CS (graph 5) and LOW (graph 9) autocorrelations stay within the blue confidence bounds, except at a few lags where they slightly cross the border, indicating some correlation. Among the cross-correlation panels, graphs 3 and 6 give a strong indication of significant correlation; the other panels show little to no correlation among the residuals. The LOW-CS and LOW-PERMIT panels are interesting because they are almost flat: the variables themselves are strongly correlated, yet the opposite holds for their residuals.
Acf(residuals(var.1)[,"CS"], main="ACF of CS")
Acf(residuals(var.1)[,"PERMIT"], main="ACF of PERMIT")
Acf(residuals(var.1)[,"LOW"], main="ACF of LOW")
Plot 1 shows some residual correlation at lags 11 and 12. Plot 2 shows residual correlations that increase somewhat with the lag; after lag 10 we see significant correlations.
serial.test(var.1, lags.pt = 16, type = "PT.adjusted")
##
## Portmanteau Test (adjusted)
##
## data: Residuals of VAR object var.1
## Chi-squared = 176.28, df = 135, p-value = 0.009812
The adjusted Portmanteau test rejects the null of no serial correlation (p-value = 0.0098 < 0.05), so some autocorrelation remains in the VAR(1) residuals at longer lags; this should be kept in mind when interpreting the model.
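For comparison, one could run the same adjusted test on the VAR(2) fitted earlier; this check is not part of the original output, but uses the var2 object already estimated above:
# adjusted Portmanteau test for the VAR(2) residuals, for comparison
serial.test(var2, lags.pt = 16, type = "PT.adjusted")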
The Granger causality test helps us check whether one group of variables is useful in forecasting another.
causality(var.1, cause= c("PERMIT", "CS" ))
## $Granger
##
## Granger causality H0: PERMIT CS do not Granger-cause LOW
##
## data: VAR object var.1
## F-Test = 3.9394, df1 = 2, df2 = 183, p-value = 0.02113
##
##
## $Instant
##
## H0: No instantaneous causality between: PERMIT CS and LOW
##
## data: VAR object var.1
## Chi-squared = 9.5893, df = 2, p-value = 0.008274
The above results are quite interesting. The first test, for Granger causality, yields a low p-value (0.02113), so we can reject the null hypothesis at the 5% level: PERMIT and CS jointly Granger-cause LOW. The second test produces an even smaller p-value (0.008274), so we can also reject its null with a high degree of confidence: PERMIT, CS and LOW show signs of instantaneous causality.
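As a complementary check (not in the original analysis), the reverse direction can also be tested, i.e. whether LOW Granger-causes the other two variables:
# does LOW Granger-cause PERMIT and CS?
causality(var.1, cause = "LOW")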
One of the more useful techniques here is impulse-response (IR) analysis based on the vector autoregression (VAR) model. It is a powerful tool for answering a recurring question: what could happen to y if x changes by z percent?
var1a.irf <- irf(var.1, n.ahead = 16, boot = TRUE, runs=500, seed=99, cumulative=FALSE)
par(mfrow=c(3,3))
plot(var1a.irf, plot.type = "single")
par(mfrow=c(1,1))
par(mfrow=c(2,2))
plot(irf(var.1, response = "LOW", n.ahead = 24, boot = TRUE, runs = 500),
     plot.type = "single")
par(mfrow=c(1,1))
The first plot shows a relatively large reaction of Lowe's sales to a shock to PERMIT, with a fairly wide confidence interval. It indicates that building permits have some influence on sales.
The second plot, the orthogonal impulse response of LOW to CS, looks like a tick mark and shows more reactivity: the response initially drops sharply over the first four periods, then gradually increases, effectively reversing sign around horizon 17 and remaining positive afterwards. This suggests that consumer sentiment affects Lowe's sales to a greater degree than building permit numbers.
The third graph is not really informative because it shows the response of LOW to its own shock.
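Since these responses are to one-off shocks, a cumulative impulse response can also be useful for gauging the longer-run effect on sales. A sketch along those lines, reusing the same VAR(1) over a 24-step horizon:
# cumulative orthogonal impulse responses of LOW to PERMIT and CS shocks
plot(irf(var.1, impulse = c("PERMIT", "CS"), response = "LOW",
         n.ahead = 24, boot = TRUE, runs = 500, cumulative = TRUE))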
Forecast error variance decompositions allow us to analyse the dynamic interactions among all the variables, giving further insight into the VAR model.
fevd(var.1, n.ahead = 16)
## $PERMIT
## PERMIT CS LOW
## [1,] 1.0000000 0.000000000 0.000000000
## [2,] 0.9949579 0.002442164 0.002599955
## [3,] 0.9849293 0.006501082 0.008569622
## [4,] 0.9708761 0.011534252 0.017589625
## [5,] 0.9535031 0.017259906 0.029237024
## [6,] 0.9334464 0.023498307 0.043055330
## [7,] 0.9113062 0.030102659 0.058591138
## [8,] 0.8876420 0.036942649 0.075415374
## [9,] 0.8629620 0.043902121 0.093135884
## [10,] 0.8377160 0.050879705 0.111404279
## [11,] 0.8122920 0.057789331 0.129918663
## [12,] 0.7870166 0.064560008 0.148423377
## [13,] 0.7621584 0.071134935 0.166706712
## [14,] 0.7379325 0.077470182 0.184597325
## [15,] 0.7145069 0.083533146 0.201959963
## [16,] 0.6920081 0.089300946 0.218690925
##
## $CS
## PERMIT CS LOW
## [1,] 0.07335515 0.9266449 0.000000000
## [2,] 0.10389970 0.8884406 0.007659671
## [3,] 0.12917160 0.8494521 0.021376354
## [4,] 0.14880623 0.8133469 0.037846899
## [5,] 0.16363700 0.7810713 0.055291681
## [6,] 0.17455101 0.7525614 0.072887593
## [7,] 0.18230508 0.7274529 0.090242001
## [8,] 0.18752696 0.7053269 0.107146190
## [9,] 0.19073492 0.6857902 0.123474895
## [10,] 0.19235574 0.6684982 0.139146100
## [11,] 0.19274000 0.6531557 0.154104293
## [12,] 0.19217522 0.6395120 0.168312757
## [13,] 0.19089666 0.6273540 0.181749352
## [14,] 0.18909645 0.6164998 0.194403714
## [15,] 0.18693091 0.6067940 0.206275114
## [16,] 0.18452678 0.5981025 0.217370692
##
## $LOW
## PERMIT CS LOW
## [1,] 0.1614079 1.474071e-08 0.8385921
## [2,] 0.1732654 3.471504e-02 0.7920196
## [3,] 0.1849004 6.422795e-02 0.7508717
## [4,] 0.1978736 8.299099e-02 0.7191354
## [5,] 0.2120483 9.406506e-02 0.6938866
## [6,] 0.2270809 1.002697e-01 0.6726494
## [7,] 0.2426693 1.034150e-01 0.6539157
## [8,] 0.2585682 1.046042e-01 0.6368276
## [9,] 0.2745718 1.045207e-01 0.6209074
## [10,] 0.2904996 1.036048e-01 0.6058956
## [11,] 0.3061881 1.021530e-01 0.5916588
## [12,] 0.3214877 1.003741e-01 0.5781383
## [13,] 0.3362610 9.842018e-02 0.5653188
## [14,] 0.3503842 9.640556e-02 0.5532103
## [15,] 0.3637477 9.441717e-02 0.5418351
## [16,] 0.3762580 9.252153e-02 0.5312204
It is evident that PERMIT accounts for a larger and growing share of the forecast error variance of LOW than CS does, which is in line with our findings from the IRF plots discussed above.
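The same decomposition can also be displayed graphically, which makes PERMIT's growing share easier to see. A minimal sketch using the plot method provided by the vars package:
# stacked bar charts of the forecast error variance decompositions
plot(fevd(var.1, n.ahead = 16))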
#### Forecast
var.fc = predict(var.1, n.ahead= 4)
plot(var.fc)
From 2017 Q3 to Q4, permit numbers are forecast to remain roughly stable, consumer sentiment to decline overall, and Lowe's sales to dip at first before rising again. The sales forecast appears to do a decent job of capturing the trend and seasonality of the data, and has relatively narrow confidence bands.
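Because the VAR was estimated on logged data, these forecasts are on the log scale. As an illustrative final step (not in the original output), the LOW forecasts and their interval bounds can be converted back to millions of dollars with exp():
# back-transform the LOW point forecasts and interval bounds from logs to levels
exp(var.fc$fcst$LOW[, c("fcst", "lower", "upper")])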