6.1 Modeling Trends and 6.2 Detecting Autocorrelation

In yesterday’s class, we started Chapter 6, which is about time series regression. We began Section 6.1, which covers modeling trends with polynomial functions, and moved on to Section 6.2, which is about detecting autocorrelation.

Example

In class, we did an example to show how we can model our data. We will use the airpass data from the faraway package. We first plot pass (our response variable) against year (our predictor variable) and connect the points with a line, so we specify the plot type as “l”.

library(faraway)
## Warning: package 'faraway' was built under R version 3.4.3
data(airpass)
attach(airpass)
plot(pass ~ year, type = "l")

We can see that there is an upward trend in this plot, but we want to examine the data more formally. We now fit a linear model and examine the summary and diagnostic plots as we’ve done in the past.

mod <- lm(pass ~ year)
summary(mod)
## 
## Call:
## lm(formula = pass ~ year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -93.858 -30.727  -5.757  24.489 164.999 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1474.771     61.106  -24.14   <2e-16 ***
## year           31.886      1.108   28.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared:  0.8536, Adjusted R-squared:  0.8526 
## F-statistic: 828.2 on 1 and 142 DF,  p-value: < 2.2e-16
plot(mod)

The summary shows that there is a significant relationship between the two variables, but in the first diagnostic plot (residuals vs. fitted values), we see evidence of a trend in the residuals, which we do not want. So we now fit a model that adds a quadratic term.

mod2 <- lm(pass ~ year + I(year^2))
summary(mod2)
## 
## Call:
## lm(formula = pass ~ year + I(year^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -100.353  -27.339   -7.442   21.603  146.116 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 1570.5174  1053.9017   1.490  0.13841   
## year         -79.2078    38.4007  -2.063  0.04098 * 
## I(year^2)      1.0092     0.3487   2.894  0.00441 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.91 on 141 degrees of freedom
## Multiple R-squared:  0.8618, Adjusted R-squared:  0.8599 
## F-statistic: 439.8 on 2 and 141 DF,  p-value: < 2.2e-16
plot(mod2)

As we can see from the summary, both the year and year^2 terms are significant, and the residual plot looks a little better than before. There is still some evidence of a trend in the residuals, though, so we will add a cubic term:

mod3 <- lm(pass ~ year + I(year^2) + I(year^3))
summary(mod3)
## 
## Call:
## lm(formula = pass ~ year + I(year^2) + I(year^3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -98.202 -27.753  -6.369  20.411 148.690 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.001e+04  1.906e+04   0.525    0.600
## year        -5.414e+02  1.043e+03  -0.519    0.605
## I(year^2)    9.426e+00  1.899e+01   0.496    0.620
## I(year^3)   -5.097e-02  1.150e-01  -0.443    0.658
## 
## Residual standard error: 45.03 on 140 degrees of freedom
## Multiple R-squared:  0.862,  Adjusted R-squared:  0.8591 
## F-statistic: 291.6 on 3 and 140 DF,  p-value: < 2.2e-16

At this point, none of the terms has a significant relationship with the response (likely because the polynomial terms are highly correlated with one another, which inflates the standard errors), so we do not want to keep the cubic term. Our best model was the quadratic one.
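As an extra check (not something we did in class), we could compare the quadratic and cubic models directly with a partial F-test; a minimal sketch:

# Partial F-test comparing the nested quadratic and cubic models
anova(mod2, mod3)

A large p-value here would confirm that the cubic term adds nothing beyond the quadratic fit.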

6.2 Detecting Autocorrelation

In 6.2, we began talking about autocorrelation. Essentially, if your residuals are correlated through time, you have autocorrelation.

There are two types of autocorrelation: 1) positive autocorrelation, meaning a positive residual is likely to be followed by another positive residual (and a negative residual by another negative one); and 2) negative autocorrelation, meaning a positive residual is likely to be followed by a negative residual (and vice versa).

We also learned about first-order autocorrelation, meaning that the error term at time t is only related to the error terms at times t-1 and t+1.
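One simple way to see this visually (an extra illustration, not from class) is to plot the residuals of our quadratic model in time order; long runs of residuals with the same sign suggest positive autocorrelation.

# Plot the residuals of the quadratic model in time order
res <- residuals(mod2)
plot(res, type = "l")
abline(h = 0, lty = 2)  # reference line at zero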

Durbin-Watson Statistic

We learned that you can run a Durbin-Watson test to see if you have autocorrelation. Whether testing for positive or negative autocorrelation, we have the same form of hypotheses: our null hypothesis is that the error terms are not autocorrelated, and our alternative hypothesis is that they have either positive or negative autocorrelation (depending on which you are testing for). We determine whether or not to reject the null hypothesis in different ways for the two cases, though.

We define d as our Durbin-Watson test statistic: d = Σ (e_t − e_{t−1})² / Σ e_t², where the e_t are the residuals (the numerator sums over t = 2, …, n and the denominator over t = 1, …, n). When testing for positive autocorrelation, we reject our null hypothesis if d is small. When testing for negative autocorrelation, we reject our null hypothesis if 4 − d is small.
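As a sanity check (not part of the class example), we can compute d by hand from the residuals of our quadratic model; it should match the DW value that dwtest() reports below.

# Durbin-Watson statistic computed by hand from the residuals
e <- residuals(mod2)
sum(diff(e)^2) / sum(e^2)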

We will run the Durbin-Watson test on our quadratic model from earlier:

library("lmtest")
## Warning: package 'lmtest' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.4
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
dwtest(mod2)
## 
##  Durbin-Watson test
## 
## data:  mod2
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0

Based on our very small p-value, we have strong evidence of positive autocorrelation in the error terms (note that the alternative hypothesis here is that the true autocorrelation is greater than 0).
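By default, dwtest() tests for positive autocorrelation. If we instead wanted to test for negative autocorrelation, we could set the alternative argument to "less" (or use "two.sided" to test for autocorrelation in either direction):

# Test for negative autocorrelation instead
dwtest(mod2, alternative = "less")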