Concepts Covered

In class we started covering time series analysis, which studies a variable tracked across time, and autocorrelation, both positive and negative. Positive autocorrelation means that a positive residual is likely to be followed by another positive residual, and a negative residual by another negative residual. Negative autocorrelation is the opposite: a positive residual is likely to be followed by a negative residual, or vice versa.
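To make the two definitions concrete, here is a minimal sketch on simulated data (not the class data). It assumes an AR(1) process with coefficients of 0.7 and -0.7, which are arbitrary values chosen only for illustration, and uses base R's arima.sim() and acf().

```r
# Simulated illustration (not the class data): two AR(1) series,
# one with a positive and one with a negative coefficient.
set.seed(1)  # arbitrary seed, for reproducibility only

pos <- arima.sim(model = list(ar = 0.7), n = 500)   # positive autocorrelation
neg <- arima.sim(model = list(ar = -0.7), n = 500)  # negative autocorrelation

# acf() returns lag 0 first, so element 2 is the lag-1 autocorrelation.
r_pos <- acf(pos, plot = FALSE)$acf[2]
r_neg <- acf(neg, plot = FALSE)$acf[2]

r_pos  # expected to be positive
r_neg  # expected to be negative
```

In the first series, values tend to stay on the same side of the mean from one step to the next; in the second, they tend to flip sides.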

Time Series Analysis

In this example I will be using the aatemp data from the library faraway. The data set aatemp gives the annual mean temperature in Ann Arbor from 1854 to 2000.

library(faraway)
data(aatemp)
attach(aatemp)

First, I plotted temp against year to see the time series of the data. I used type = "l" so that the points would be connected with lines, making the trend easier to detect.

plot(temp~year, data = aatemp, type = "l")

The trend appears to be slightly upward, but to learn more we can fit a simple linear model of temp on year.

mod=lm(temp~year)
summary(mod)
## 
## Call:
## lm(formula = temp ~ year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9843 -0.9113 -0.0820  0.9946  3.5343 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 24.005510   7.310781   3.284  0.00136 **
## year         0.012237   0.003768   3.247  0.00153 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared:  0.08536,    Adjusted R-squared:  0.07727 
## F-statistic: 10.55 on 1 and 113 DF,  p-value: 0.001533
plot(mod)

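The p-value for year (0.00153) is small enough that we can conclude temp and year have a significant linear relationship, although the R-squared of about 0.085 shows that year explains only a small share of the variation in temp. There does appear to be a slight trend in the residuals, which we can try to fix by adding a quadratic term.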

mod=lm(temp~year+I(year^2))
summary(mod)
## 
## Call:
## lm(formula = temp ~ year + I(year^2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0412 -0.9538 -0.0624  0.9959  3.5820 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127e+02  3.837e+02  -0.554    0.580
## year         2.567e-01  3.962e-01   0.648    0.518
## I(year^2)   -6.307e-05  1.022e-04  -0.617    0.539
## 
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared:  0.08846,    Adjusted R-squared:  0.07218 
## F-statistic: 5.434 on 2 and 112 DF,  p-value: 0.005591
plot(mod)

Looking at the summary, none of the coefficients has a p-value below 0.05 (all are above 0.5), and the trend in the residuals is still there. We can try adding a cubic term to see if that helps; if not, we will return to our original model.
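A side note on why the individual p-values may be so large in the quadratic fit: year and year^2 are almost perfectly correlated over this range, which inflates the standard errors of both coefficients. The sketch below illustrates this on a grid of consecutive years from 1854 to 2000. That grid is an assumption made for illustration; the actual aatemp data has 115 observations, so it does not include every year.

```r
# Illustrative grid of consecutive years (an assumption; the aatemp
# data has 115 observations and so does not contain every year).
year <- 1854:2000

# Raw year and year^2 are almost perfectly correlated.
cor_raw <- cor(year, year^2)

# Centering year before squaring removes that correlation on this grid.
year_c <- year - mean(year)
cor_centered <- cor(year_c, year_c^2)

cor_raw       # very close to 1
cor_centered  # essentially 0
```

Centering year, or using orthogonal polynomials such as poly(year, 2), would be one way to reduce this collinearity without changing the fitted values. This does not change the plan above of trying a cubic term next.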

mod=lm(temp~year+I(year^2)+I(year^3))
summary(mod)
## 
## Call:
## lm(formula = temp ~ year + I(year^2) + I(year^3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8557 -0.9646 -0.1552  1.0485  4.1538 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  3.959e+04  1.734e+04   2.283   0.0243 *
## year        -6.159e+01  2.694e+01  -2.286   0.0241 *
## I(year^2)    3.197e-02  1.395e-02   2.291   0.0238 *
## I(year^3)   -5.527e-06  2.407e-06  -2.296   0.0236 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.443 on 111 degrees of freedom
## Multiple R-squared:  0.1298, Adjusted R-squared:  0.1063 
## F-statistic: 5.518 on 3 and 111 DF,  p-value: 0.001436
plot(mod)

As you can see, all of our terms now have p-values below 0.05, the adjusted R-squared has risen from 0.077 in the linear model to 0.106, and the trend in the residuals is gone, so this is our best model so far.

Autocorrelation

Now we can use a Durbin-Watson test to check for autocorrelation in the residuals. We can test for positive autocorrelation with alternative="greater" and for negative autocorrelation with alternative="less".

library(lmtest)
dwtest(mod, alternative="greater")
## 
##  Durbin-Watson test
## 
## data:  mod
## DW = 1.7171, p-value = 0.03464
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(mod, alternative="less")
## 
##  Durbin-Watson test
## 
## data:  mod
## DW = 1.7171, p-value = 0.9654
## alternative hypothesis: true autocorrelation is less than 0

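Because the p-value for the positive alternative (0.03464) is below 0.05, we have evidence of positive autocorrelation: a positive residual is likely to be followed by another positive residual, and a negative residual by another negative one. The test for negative autocorrelation is not significant (p-value = 0.9654).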
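For reference, the DW statistic reported by dwtest() is the sum of squared successive differences of the residuals divided by the residual sum of squares, which is roughly 2 * (1 - r), where r is the lag-1 autocorrelation of the residuals. The sketch below computes it by hand on simulated data (not aatemp). The trend, the AR(1) coefficient of 0.6 and the variable names are all assumptions chosen for illustration.

```r
# Simulated illustration (not the aatemp data): a linear trend plus
# positively autocorrelated AR(1) errors.
set.seed(2)  # arbitrary seed, for reproducibility only
n <- 200
x <- seq_len(n)
y <- 1 + 0.05 * x + as.numeric(arima.sim(model = list(ar = 0.6), n = n))

e <- unname(residuals(lm(y ~ x)))

# Durbin-Watson statistic: squared successive differences over the
# residual sum of squares.
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 autocorrelation of the residuals (uncentered; OLS residuals
# from a model with an intercept already have mean zero).
r1 <- sum(e[-1] * e[-n]) / sum(e^2)

dw            # below 2 when residuals are positively autocorrelated
2 * (1 - r1)  # close to dw; differs only by an end-point term
```

By the same approximation, the DW = 1.7171 from our cubic model corresponds to a lag-1 residual autocorrelation of roughly 1 - 1.7171/2, or about 0.14, which is positive but fairly weak.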