Today in class we began our discussion of time series regression, covering sections 6.1 and 6.2. We started with a general overview, then moved on to autocorrelation and the Durbin-Watson statistic.
We can write a time series model with a trend as
\[y_t = TR_t + \epsilon_t\]
where \(y_t\) is the value at time \(t\), \(TR_t\) is the trend, and \(\epsilon_t\) is the error term.
We begin, as always, by plotting the data and making an informed visual guess as to which kind of trend is present. If the data look like a horizontal line, there may be no trend, with \(TR_t = \beta_0\). If the data look like a line with a positive or negative slope, there may be a linear trend, with \(TR_t = \beta_0 + \beta_1t\). The data could also have a quadratic trend, with \(TR_t = \beta_0 + \beta_1t + \beta_2t^2\). We can keep adding terms to create a \(p\)th-order polynomial trend, \(TR_t = \beta_0 + \beta_1t + \dots + \beta_pt^p\).
In later examples I will show how to test the significance of the polynomial terms to decide whether or not they are needed in the model.
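As a minimal sketch of how these trend models can be fitted and compared in R, the following uses simulated data rather than a real series. The names `time_index`, `y`, `fit_none`, `fit_lin`, and `fit_quad`, along with the chosen coefficients and noise level, are assumptions made only for illustration.

```r
# Simulated series with a known quadratic trend (illustrative values only)
set.seed(1)
time_index <- 1:100
y <- 50 + 2 * time_index + 0.05 * time_index^2 + rnorm(100, sd = 10)

# Candidate trend models of increasing polynomial order
fit_none <- lm(y ~ 1)                                # TR_t = b0
fit_lin  <- lm(y ~ time_index)                       # TR_t = b0 + b1*t
fit_quad <- lm(y ~ time_index + I(time_index^2))     # TR_t = b0 + b1*t + b2*t^2

# t-test for the highest-order term: a small p-value suggests the term is needed
p_quad <- summary(fit_quad)$coefficients["I(time_index^2)", "Pr(>|t|)"]
print(p_quad)

# Nested-model F-test comparing the linear and quadratic fits
print(anova(fit_lin, fit_quad))
```

The usual approach is to look at the highest-order term first: if it is not significant, drop it and refit the lower-order model.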
If the residuals of our model are correlated with one another through time, we have autocorrelation. We do not want autocorrelation in the residuals: it violates the independence assumption of regression, so our standard errors can be wrong, and we may reject the null hypothesis when we should not, or fail to reject it when we should.
Positive autocorrelation occurs when a positive residual is likely to be followed by another positive residual, and a negative residual by another negative residual.
Negative autocorrelation occurs when a positive residual is likely to be followed by a negative residual, and a negative residual by a positive residual.
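To make the two cases concrete, here is a small simulation sketch. It assumes a simple first-order scheme in which each residual is a multiple `phi` of the previous one plus fresh noise. The helper names `simulate_resid` and `lag1_cor` and the values of `phi` are hypothetical choices for illustration, not something from the class material.

```r
# Simulate a first-order autocorrelated series: e[t] = phi * e[t-1] + noise
simulate_resid <- function(n, phi) {
  noise <- rnorm(n)
  e <- numeric(n)
  e[1] <- noise[1]
  for (i in 2:n) {
    e[i] <- phi * e[i - 1] + noise[i]
  }
  e
}

# Sample correlation between each residual and the one just before it
lag1_cor <- function(e) cor(e[-1], e[-length(e)])

set.seed(42)
e_pos <- simulate_resid(500, phi = 0.8)    # positive autocorrelation
e_neg <- simulate_resid(500, phi = -0.8)   # negative autocorrelation

r_pos <- lag1_cor(e_pos)
r_neg <- lag1_cor(e_neg)
print(c(positive = r_pos, negative = r_neg))
```

Plotting `e_pos` against its index should show long runs on one side of zero, while `e_neg` should flip sign from one point to the next.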
To detect autocorrelation, we can first plot the residuals against time and look for patterns. We can also test formally with the Durbin-Watson statistic.
I will focus only on first-order autocorrelation, which is the correlation between residuals that are adjacent in time.
The Durbin-Watson statistic tests for first-order autocorrelation in the residuals of a fitted model. We can run either a two-sided test for any autocorrelation, or a one-sided test specifically for positive or negative autocorrelation.
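For reference, the statistic is usually written in terms of the residuals \(e_1, \dots, e_n\) of the fitted model. The symbols \(e_t\), \(n\), \(d\), and \(r\) are notation introduced here for illustration.
\[d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n}e_t^2}\]
Expanding the numerator gives roughly \(d \approx 2(1 - r)\), where \(r\) is the sample correlation between adjacent residuals. So \(d\) lies between 0 and 4: a value near 2 corresponds to \(r\) near 0, a value near 0 to strong positive autocorrelation, and a value near 4 to strong negative autocorrelation.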
For an example I will use data on the number of airline passengers over time.
library(faraway)
data(airpass)
head(airpass)
## pass year
## 1 112 49.08333
## 2 118 49.16667
## 3 132 49.25000
## 4 129 49.33333
## 5 121 49.41667
## 6 135 49.50000
attach(airpass)
summary(airpass)
## pass year
## Min. :104.0 Min. :49.08
## 1st Qu.:180.0 1st Qu.:52.06
## Median :265.5 Median :55.04
## Mean :280.3 Mean :55.04
## 3rd Qu.:360.5 3rd Qu.:58.02
## Max. :622.0 Max. :61.00
plot(pass ~ year)
We can see that there is an obvious upward trend, and there also appears to be some heteroscedasticity, with the spread of the data growing over time.
It appears to me that there is more than just a linear trend here, so I will start with a quadratic trend model.
mod1 <- lm(pass ~ year + I(year^2))
summary(mod1)
##
## Call:
## lm(formula = pass ~ year + I(year^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -100.353 -27.339 -7.442 21.603 146.116
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1570.5174 1053.9017 1.490 0.13841
## year -79.2078 38.4007 -2.063 0.04098 *
## I(year^2) 1.0092 0.3487 2.894 0.00441 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.91 on 141 degrees of freedom
## Multiple R-squared: 0.8618, Adjusted R-squared: 0.8599
## F-statistic: 439.8 on 2 and 141 DF, p-value: < 2.2e-16
plot(mod1)
The p-value for the quadratic term (0.00441) is well below 0.05, so the quadratic term is needed! The diagnostic plots seem to fit our requirements as well.
If I were to go on and add a cubic term, we would find that it is not significant.
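The data only run through the end of 1960 (the last value of year is 61.00), so we can use our model to predict one year ahead, at year = 62.
coef(mod1)%*%c(1,62,62^2)
## [,1]
## [1,] 538.9268
Since pass is recorded in thousands, we predict about 539,000 passengers for that month. In this dataset's coding, year = 62 corresponds to the last month of 1961, because 49.08333 is the first month of 1949.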
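The matrix product above can also be written with `predict()`, which has the advantage of returning a prediction interval. The sketch below uses simulated data so that it runs on its own. The data frame `sim`, its coefficients, and the object names `fit`, `manual`, and `pred` are assumptions for illustration only.

```r
# Simulated data standing in for a real series (illustrative values only)
set.seed(7)
sim <- data.frame(year = seq(49, 61, length.out = 144))
sim$pass <- 100 + 3 * (sim$year - 49)^2 + rnorm(144, sd = 20)

fit <- lm(pass ~ year + I(year^2), data = sim)

# Manual prediction: intercept + b1 * 62 + b2 * 62^2
manual <- drop(coef(fit) %*% c(1, 62, 62^2))

# Same point prediction via predict(), plus a 95% prediction interval
pred <- predict(fit, newdata = data.frame(year = 62), interval = "prediction")
print(manual)
print(pred)
```

For the airline model, the analogous call would be `predict(mod1, newdata = data.frame(year = 62))`, which should give the same point prediction as the matrix product.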
To do a Durbin-Watson test, we use the dwtest function from the lmtest package.
library(lmtest)
dwtest(mod1, alternative = "two.sided")
##
## Durbin-Watson test
##
## data: mod1
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is not 0
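From this two-sided test we can conclude that there is autocorrelation. The two-sided alternative does not say which kind, although a DW statistic this far below 2 points toward positive autocorrelation.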
dwtest(mod1)
##
## Durbin-Watson test
##
## data: mod1
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0
From this second test, which uses dwtest's default one-sided alternative of positive autocorrelation, we can conclude that there is positive autocorrelation. Not so good!
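As a rough check on what the DW value measures, the statistic can be computed by hand: it is the sum of squared differences between adjacent residuals, divided by the sum of squared residuals. The sketch below does this on a simulated regression with positively autocorrelated errors. The helper `dw_stat`, the simulated variables, and the value 0.8 are all hypothetical choices for illustration.

```r
# Durbin-Watson statistic computed directly from a vector of residuals
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Simulated regression whose errors follow err[t] = 0.8 * err[t-1] + noise
set.seed(123)
n <- 200
x <- seq_len(n)
noise <- rnorm(n)
err <- numeric(n)
err[1] <- noise[1]
for (i in 2:n) err[i] <- 0.8 * err[i - 1] + noise[i]
y <- 10 + 0.5 * x + err

fit <- lm(y ~ x)
d <- dw_stat(residuals(fit))
print(d)   # expected to be well below 2 for positively autocorrelated residuals
```

Applied to the airline model, `dw_stat(residuals(mod1))` should reproduce the DW value reported by `dwtest`. The p-value still requires `dwtest` itself.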