VAR Models

VAR stands for vector autoregression. The benefit of using a VAR model compared to what we have learned so far is that one can look at how variables affect each other, instead of just a unidirectional relationship. A VAR model fits one linear regression per variable, each on lagged values of every variable in the system, so that each variable is used to forecast the other, and vice versa.
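To make that concrete, here is how a two-variable VAR with two lags can be written out. The symbols are my own notation rather than anything from the packages used below: y_1 and y_2 are the two series, c_i is a constant, \phi_{ij,k} is the effect of lag k of variable j on variable i, and \varepsilon_{i,t} is the error term.

```latex
\begin{aligned}
y_{1,t} &= c_1 + \phi_{11,1}\, y_{1,t-1} + \phi_{12,1}\, y_{2,t-1}
               + \phi_{11,2}\, y_{1,t-2} + \phi_{12,2}\, y_{2,t-2} + \varepsilon_{1,t} \\
y_{2,t} &= c_2 + \phi_{21,1}\, y_{1,t-1} + \phi_{22,1}\, y_{2,t-1}
               + \phi_{21,2}\, y_{1,t-2} + \phi_{22,2}\, y_{2,t-2} + \varepsilon_{2,t}
\end{aligned}
```

Both equations share the same right-hand-side variables; only the coefficients differ. A linear trend term can be added to each equation in the same way as the constant.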

I will be using the insurance data set from the fpp2 package. It provides the monthly quotations and monthly television advertising expenditure for a US insurance company from January 2002 to April 2005. Again, this is different from other data sets that we have looked at in the previous weeks because there are two different variables that we want to forecast.

library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(fpp2)
## Loading required package: ggplot2
## Loading required package: fma
## Loading required package: expsmooth
library(vars)
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following objects are masked from 'package:fma':
## 
##     cement, housing, petrol
## Loading required package: strucchange
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: urca
## Loading required package: lmtest
head(insurance)
##            Quotes TV.advert
## Jan 2002 12.97065  7.212725
## Feb 2002 15.38714  9.443570
## Mar 2002 13.22957  7.534250
## Apr 2002 12.97065  7.212725
## May 2002 15.38714  9.443570
## Jun 2002 11.72288  6.415215
autoplot(insurance)

First I want to choose the lag order for my VAR model using information criteria, which I can do with the VARselect function.

VARselect(insurance)
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      2      2      2      2 
## 
## $criteria
##                 1          2          3          4          5          6
## AIC(n) -1.0712004 -1.3757416 -1.2626328 -1.1081294 -0.9165509 -0.7178200
## HQ(n)  -0.9815494 -1.2263232 -1.0534471 -0.8391764 -0.5878305 -0.3293322
## SC(n)  -0.7909609 -0.9086758 -0.6087407 -0.2674110  0.1109938  0.4965511
## FPE(n)  0.3430569  0.2542433  0.2879067  0.3429910  0.4295782  0.5514634
##                  7          8          9         10
## AIC(n) -0.52526979 -0.6342723 -1.1190649 -1.3302006
## HQ(n)  -0.07701471 -0.1262498 -0.5512751 -0.7026435
## SC(n)   0.87592759  0.9537514  0.6557851  0.6314757
## FPE(n)  0.72033045  0.7185339  0.5145059  0.5163355

This also gives me the final prediction error (FPE) for the different lag orders. Interestingly, all four criteria have picked a lag order of 2. I can pass this lag order to the next function to fit the VAR model.

The next part of the VAR function that I need to fill out is the type argument, which is the type of deterministic regressors to include. It is one of "const", "trend", "both" or "none". The ic argument sets which information criterion VAR uses if we ask it to choose the number of lags for us through lag.max. However, since I already chose the lag order with VARselect, we don’t need to worry about that aspect of the function documentation.
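One detail worth a quick check: VARselect takes a type argument too, and its default is "const", so the table above was computed without a trend term. Here is a minimal sketch of re-running the selection with the same deterministic terms I plan to use in VAR. The lag.max = 6 cap is my own assumption for illustration (the series only has 40 observations), not something taken from the documentation.

```r
library(vars)
library(fpp2)

# Assumption: cap the search at 6 lags because the series is short (40 months).
# type = "both" matches the constant + trend specification used in VAR().
sel <- VARselect(insurance, lag.max = 6, type = "both")

# Lag order chosen by each criterion (AIC, HQ, SC, FPE)
sel$selection
```

If the criteria still agree with each other here, that is some reassurance that the lag choice does not depend on whether the trend is included.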

var1 <- VAR(insurance, p = 2, type = "both")
var1
## 
## VAR Estimation Results:
## ======================= 
## 
## Estimated coefficients for equation Quotes: 
## =========================================== 
## Call:
## Quotes = Quotes.l1 + TV.advert.l1 + Quotes.l2 + TV.advert.l2 + const + trend 
## 
##    Quotes.l1 TV.advert.l1    Quotes.l2 TV.advert.l2        const        trend 
##   2.53906293  -3.02646958  -0.41711186  -0.13626972   8.96878449   0.07354499 
## 
## 
## Estimated coefficients for equation TV.advert: 
## ============================================== 
## Call:
## TV.advert = Quotes.l1 + TV.advert.l1 + Quotes.l2 + TV.advert.l2 + const + trend 
## 
##    Quotes.l1 TV.advert.l1    Quotes.l2 TV.advert.l2        const        trend 
##    1.0875428   -1.2489913   -0.1153345   -0.2064523    5.5713260    0.0585788

This output now gives me the coefficients for the Quotes equation and for the TV.advert equation. Each equation contains two lags of both variables, plus the constant and the trend.

And then a visual of the fit and residuals for each of these equations:

plot(var1)

As well as a forecast:

fc_var1 <- forecast(var1, h = 12)
autoplot(fc_var1)

What I would have liked to do next is check my residuals and then check the accuracy of my forecast. With the code checkresiduals(fc_var1), I received the error "no residuals found". I also received an error for accuracy. When using the function accuracy(fc_var1), I got the error: Error in accuracy(fcast, test = test, d = d, D = D) : argument "D" is missing, with no default. So, I will continue working on how to determine residual error and error metrics so that I can compare this type of modelling to the others that we have learned thus far.
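One possible route, sketched here under assumptions rather than as a finished answer, is to work with the fitted VAR object directly instead of the forecast object. The vars package has a residuals method and a portmanteau test (serial.test) for fitted VAR models, and predict returns point forecasts that can be compared with held-out data by hand. The 6-month holdout and lags.pt = 10 are my own illustrative choices.

```r
library(vars)
library(fpp2)

# --- Residual check on the full-sample model ---
var1 <- VAR(insurance, p = 2, type = "both")

# One column of residuals per equation (Quotes, TV.advert)
res <- residuals(var1)
head(res)

# Portmanteau test for leftover serial correlation in the residuals.
# Assumption: lags.pt = 10 is an illustrative choice.
serial.test(var1, lags.pt = 10, type = "PT.asymptotic")

# --- Out-of-sample accuracy with a holdout ---
# Assumption: hold out the last 6 months (Nov 2004 to Apr 2005).
train <- window(insurance, end = c(2004, 10))
test  <- window(insurance, start = c(2004, 11))

var_train <- VAR(train, p = 2, type = "both")
pred <- predict(var_train, n.ahead = nrow(test))

# RMSE per series, computed by hand from the point forecasts
rmse <- sapply(colnames(test), function(v) {
  sqrt(mean((test[, v] - pred$fcst[[v]][, "fcst"])^2))
})
rmse
```

With only 34 months to train on, these RMSE values will be noisy, so I would treat them as a rough comparison against the other models rather than a firm ranking.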