Today in class we talked about Modeling Trends and Autocorrelation.

Autocorrelation

Autocorrelation occurs when the residuals of a model are correlated through time.

Types of Autocorrelation

  1. Positive autocorrelation: a positive [or negative] residual is likely to be followed by another positive [negative] residual.
  2. Negative autocorrelation: a positive [or negative] residual is likely to be followed by a negative [positive] residual.

First Order Autocorrelation

We will be working with first-order autocorrelation today. "First order" means we only look one time step away: each residual is correlated with the residual immediately before it (a lag of 1). For example, today's temperature is correlated with yesterday's temperature, and tomorrow's temperature will be correlated with today's.
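The idea of a lag of 1 can be checked with a short base-R sketch. It simulates residuals that follow e_t = rho * e_{t-1} + u_t, using an arbitrary rho = 0.6 and made-up data (not aatemp), and then pairs each residual with the one just before it to estimate the lag-1 correlation.

```r
# Simulated residuals only; rho = 0.6 and n = 500 are arbitrary illustration values
set.seed(1)
n   <- 500
rho <- 0.6
u   <- rnorm(n)                  # independent random shocks
e   <- numeric(n)
e[1] <- u[1]
for (i in 2:n) {
  e[i] <- rho * e[i - 1] + u[i]  # each residual depends on the previous one (lag 1)
}

# Lag-1 correlation: residual at time t paired with residual at time t - 1
r1 <- cor(e[-1], e[-n])
print(r1)                        # should land close to rho

# acf() gives a slightly different estimate of the same lag-1 correlation
acf(e, lag.max = 1, plot = FALSE)
```

If rho were set to a negative value, the simulated residuals would tend to flip sign from one step to the next, which is the negative autocorrelation case described above.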

How To Detect

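  1. Plot: plot the residuals against time, plot(residuals ~ time), and look for patterns. Long runs of residuals with the same sign suggest positive autocorrelation; residuals that switch sign almost every step suggest negative autocorrelation.
  2. Durbin-Watson test: tests whether the residuals have first-order autocorrelation (positive or negative) and gives a test statistic and p-value for deciding whether to reject the null hypothesis of no autocorrelation. The statistic ranges from 0 to 4: values near 2 indicate no first-order autocorrelation, values well below 2 indicate positive autocorrelation, and values well above 2 indicate negative autocorrelation.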
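The Durbin-Watson statistic is DW = (sum over t = 2..n of (e_t - e_{t-1})^2) / (sum over t = 1..n of e_t^2), which is approximately 2(1 - r), where r is the lag-1 residual correlation. The base-R sketch below computes it by hand on simulated data (an assumed linear trend plus positively autocorrelated errors, not aatemp); dwtest() from lmtest reports this same statistic along with a p-value.

```r
# Simulated data (not aatemp): linear trend plus AR(1) errors with an assumed rho = 0.5
set.seed(2)
n          <- 200
time_index <- 1:n
err        <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
y          <- 10 + 0.05 * time_index + err

fit <- lm(y ~ time_index)
e   <- residuals(fit)

# Durbin-Watson: squared successive differences over squared residuals
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 residual correlation and the approximation DW ~ 2 * (1 - r1)
r1 <- cor(e[-1], e[-n])
print(c(DW = dw, approx = 2 * (1 - r1)))

# Rough reading: near 2 = no first-order autocorrelation,
# well below 2 = positive, well above 2 = negative
```

With the class model, the same hand computation would be sum(diff(residuals(linmod))^2) / sum(residuals(linmod)^2), which should match the DW value that dwtest(linmod) reports, since aatemp is already ordered by year.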

Example!

We will use the data set aatemp to practice these methods.

library(faraway)
## Warning: package 'faraway' was built under R version 3.4.4
data("aatemp")
names(aatemp)
## [1] "year" "temp"

We can plot the data to see which type of trend is likely to fit best.

plot(temp ~ year, data = aatemp, type = "l")

The data appear to increase slightly over time, so we will fit a linear model first.

linmod <- lm(temp ~ year, data = aatemp)
summary(linmod)
## 
## Call:
## lm(formula = temp ~ year, data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9843 -0.9113 -0.0820  0.9946  3.5343 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 24.005510   7.310781   3.284  0.00136 **
## year         0.012237   0.003768   3.247  0.00153 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared:  0.08536,    Adjusted R-squared:  0.07727 
## F-statistic: 10.55 on 1 and 113 DF,  p-value: 0.001533

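Looking at the summary output, year is significant (p = 0.00153), so there is evidence of an increasing linear trend. However, the R-squared is only 0.085, so year explains only a small part of the variation in temperature. We will check the diagnostic plots to see whether the model assumptions look reasonable.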

plot(linmod)

Looking at the first plot (Residuals vs Fitted), the smoothed line is slightly curved. It is not too bad, but we can fit a quadratic model to check whether it fits better.

quadmod <- lm(temp ~ year + I(year^2), data = aatemp)
summary(quadmod)
## 
## Call:
## lm(formula = temp ~ year + I(year^2), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0412 -0.9538 -0.0624  0.9959  3.5820 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127e+02  3.837e+02  -0.554    0.580
## year         2.567e-01  3.962e-01   0.648    0.518
## I(year^2)   -6.307e-05  1.022e-04  -0.617    0.539
## 
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared:  0.08846,    Adjusted R-squared:  0.07218 
## F-statistic: 5.434 on 2 and 112 DF,  p-value: 0.005591

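The quadratic term I(year^2) is not significant (p = 0.539), and the adjusted R-squared drops from 0.07727 to 0.07218, so the quadratic model is not a better fit and we will stick with the linear model. (year and I(year^2) are highly correlated, which is why neither term looks significant on its own even though the overall F-test is significant.)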
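Another way to make this comparison is a nested-model F test with anova(). The sketch below uses simulated data (made-up numbers with a truly linear trend, not aatemp). For the class models the call would be anova(linmod, quadmod); because quadmod adds only one term, the F statistic is the square of the t statistic for I(year^2), so its p-value would match the 0.539 reported above.

```r
# Simulated data (not aatemp): the true trend here is linear
set.seed(3)
year <- 1901:2000
temp <- 45 + 0.01 * (year - 1900) + rnorm(length(year))

lin  <- lm(temp ~ year)
quad <- lm(temp ~ year + I(year^2))

# F test of H0: the extra I(year^2) term is not needed
cmp <- anova(lin, quad)
print(cmp)

# With one added term, F equals the squared t statistic for I(year^2)
t_quad <- summary(quad)$coefficients["I(year^2)", "t value"]
all.equal(cmp$F[2], t_quad^2, tolerance = 1e-6)
```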

Now that we have chosen a model, we can check it for autocorrelation.

NOTE: the Durbin-Watson test can be run on any of the models; to save time, we will only run it on the model we chose.

library(lmtest)
## Warning: package 'lmtest' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
dwtest(linmod)
## 
##  Durbin-Watson test
## 
## data:  linmod
## DW = 1.6177, p-value = 0.01524
## alternative hypothesis: true autocorrelation is greater than 0

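Unfortunately, our p-value (0.01524) is small, so at the 5% level we reject the null hypothesis (no autocorrelation) in favor of the alternative hypothesis (positive autocorrelation, which is what dwtest() tests by default). This agrees with DW = 1.6177 being below 2. Since the standard errors and p-values in summary(linmod) assume independent errors, they may be too optimistic for these data.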