In class we introduced time series and autocorrelation. We started with the different types of time series models we can fit, such as linear, quadratic, or nth-degree polynomial models. These models use the same regression concepts we have learned before, but with time as the predictor variable.
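For reference, the polynomial trend models described above can be written in one general form. The symbols here (y_t for the response at time t, the coefficients beta_j, the error term epsilon_t, and the degree p) are notation chosen for this sketch and not necessarily the notation used in class.

```latex
y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p + \varepsilon_t
```

Setting p = 1 gives the linear model, p = 2 the quadratic model, and p = 3 the cubic model.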

I will use the data on Ann Arbor’s temperature to demonstrate what we learned.

Modeling

First I want to make a plot to see how my data looks.

library(faraway)
data(aatemp)
attach(aatemp)
plot(temp~year)

The points are widely scattered, but there appears to be a slight upward trend that a linear or perhaps a quadratic model might capture. I will try the linear model first.

model<-lm(temp~year)
summary(model)
## 
## Call:
## lm(formula = temp ~ year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9843 -0.9113 -0.0820  0.9946  3.5343 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 24.005510   7.310781   3.284  0.00136 **
## year         0.012237   0.003768   3.247  0.00153 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared:  0.08536,    Adjusted R-squared:  0.07727 
## F-statistic: 10.55 on 1 and 113 DF,  p-value: 0.001533

This model is not very strong: R^2 is only about 0.085, so year explains less than 9% of the variation in temperature. The slope is statistically significant, though. I will see whether the quadratic form is better.

model2<-lm(temp~year+I(year^2))
summary(model2)
## 
## Call:
## lm(formula = temp ~ year + I(year^2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0412 -0.9538 -0.0624  0.9959  3.5820 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127e+02  3.837e+02  -0.554    0.580
## year         2.567e-01  3.962e-01   0.648    0.518
## I(year^2)   -6.307e-05  1.022e-04  -0.617    0.539
## 
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared:  0.08846,    Adjusted R-squared:  0.07218 
## F-statistic: 5.434 on 2 and 112 DF,  p-value: 0.005591

As we can see, this is worse. Adjusted R^2 dropped from 0.077 to 0.072, and none of the coefficients is significant anymore. I will try the cubic model and see if that helps.

model3<-lm(temp~year+I(year^2)+I(year^3))
summary(model3)
## 
## Call:
## lm(formula = temp ~ year + I(year^2) + I(year^3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8557 -0.9646 -0.1552  1.0485  4.1538 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  3.959e+04  1.734e+04   2.283   0.0243 *
## year        -6.159e+01  2.694e+01  -2.286   0.0241 *
## I(year^2)    3.197e-02  1.395e-02   2.291   0.0238 *
## I(year^3)   -5.527e-06  2.407e-06  -2.296   0.0236 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.443 on 111 degrees of freedom
## Multiple R-squared:  0.1298, Adjusted R-squared:  0.1063 
## F-statistic: 5.518 on 3 and 111 DF,  p-value: 0.001436

This is the best of the three models. Its adjusted R^2 (0.106) is the highest so far, and all of its coefficients are significant at the 5% level, although it still explains only about 13% of the variation in temperature. I would choose the cubic model in this situation.
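The comparison above can also be made with nested F-tests instead of reading each summary separately. The following is a minimal sketch, not part of the original analysis: the names fit1, fit2, fit3, and fit3_orth are made up here, and the models are refit with an explicit data argument so the block runs on its own.

```r
library(faraway)
data(aatemp)

# Refit the three trend models (hypothetical names, same formulas as above)
fit1 <- lm(temp ~ year, data = aatemp)
fit2 <- lm(temp ~ year + I(year^2), data = aatemp)
fit3 <- lm(temp ~ year + I(year^2) + I(year^3), data = aatemp)

# Sequential F-tests: does each added power of year
# reduce the residual sum of squares by a meaningful amount?
anova(fit1, fit2, fit3)

# Adjusted R-squared for the three models side by side
sapply(list(linear = fit1, quadratic = fit2, cubic = fit3),
       function(m) summary(m)$adj.r.squared)

# Orthogonal polynomials describe the same cubic curve, but the terms
# are uncorrelated, so each t-test is easier to read on its own
fit3_orth <- lm(temp ~ poly(year, 3), data = aatemp)
coef(summary(fit3_orth))
```

One reason to look at the orthogonal version is that year, year^2, and year^3 are very highly correlated with each other, which may be part of why the individual coefficients in the quadratic model look insignificant.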

Forecasting

Time series models are also useful for forecasting. If we have data for a range of years, we can use the fitted model to predict values for the coming years.

For this data set I will forecast out to 2005 using the cubic model and see what it looks like.

# cubic model coefficients: intercept, year, year^2, year^3
b<-unname(coef(model3))
forecast2<-b[1]+b[2]*2005+b[3]*2005^2+b[4]*2005^3
forecast2
## [1] 47.30557

The cubic model predicts a mean temperature of about 47.31 degrees Fahrenheit for the year 2005. Forecasts like this are most trustworthy close to the observed years, because a cubic curve can move away from the data quickly once we extrapolate.
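The same forecast can be obtained with predict(), which also reports how uncertain the forecast is. This is a sketch under my own assumptions: cubic_fit and new_year are names chosen here, and the model is refit with a data argument so the block is self-contained.

```r
library(faraway)
data(aatemp)

# Same cubic trend model as above, refit under a hypothetical name
cubic_fit <- lm(temp ~ year + I(year^2) + I(year^3), data = aatemp)

# Point forecast and 95% prediction interval for a single new year
new_year <- data.frame(year = 2005)
pred <- predict(cubic_fit, newdata = new_year,
                interval = "prediction", level = 0.95)
pred
```

The fit column should match the hand-computed forecast, while lwr and upr give the range in which a single new yearly value would be expected to fall if the cubic trend continued to hold.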

Residual Plots

The residual plots of the three models were another factor in the decision to use the cubic model.

plot(model)

plot(residuals(model)~year)
abline(0,0)

plot(model2)

plot(residuals(model2)~year)
abline(0,0)

plot(model3)

plot(residuals(model3)~year)
abline(0,0)

We can see that the plots for model 3, our cubic model, look the best: the residuals show no clear pattern and no sign of heteroscedasticity. This is another reason why we chose this model.
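The visual check for constant variance can be backed up with a formal test. The sketch below uses the Breusch-Pagan test from the lmtest package, which is a technique I am adding here and not one covered in the analysis above. The names cubic_fit and bp are hypothetical.

```r
library(faraway)
library(lmtest)
data(aatemp)

# Same cubic trend model as above, refit under a hypothetical name
cubic_fit <- lm(temp ~ year + I(year^2) + I(year^3), data = aatemp)

# Breusch-Pagan test: the null hypothesis is constant error variance,
# so a large p-value is consistent with what the residual plots suggest
bp <- bptest(cubic_fit)
bp
```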

Autocorrelation

I will test for autocorrelation, that is, whether the error term at time period t is related to the error term at the previous time period, t-1.

Positive autocorrelation means that a positive error term at period t-1 is likely to be followed by a positive error term at period t. It also means that a negative error term at period t-1 is likely to be followed by a negative error term at period t.

Negative autocorrelation means that a positive error term will likely be followed by a negative one, and vice versa.
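To make the two cases concrete, the sketch below simulates one series with positive and one with negative autocorrelation and reports the lag-1 correlation of each. The simulated series, the coefficients 0.7 and -0.7, and all variable names are assumptions for illustration and have nothing to do with the temperature data.

```r
set.seed(1)

# First-order autoregressive series: each value depends on the previous one
pos <- arima.sim(model = list(ar = 0.7), n = 500)   # positive autocorrelation
neg <- arima.sim(model = list(ar = -0.7), n = 500)  # negative autocorrelation

# Lag-1 sample autocorrelation (element 1 is lag 0, element 2 is lag 1)
r_pos <- acf(pos, plot = FALSE)$acf[2]
r_neg <- acf(neg, plot = FALSE)$acf[2]
c(positive = r_pos, negative = r_neg)
```

In the first series, neighboring values tend to sit on the same side of the mean. In the second, they tend to alternate sides.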

A standard way to test for this is the Durbin-Watson test.
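For reference, the Durbin-Watson statistic is computed from the residuals of the fitted model. In the sketch below, e_t denotes the residual at time t, n the number of observations, and r the lag-1 sample correlation of the residuals. This notation is chosen here for illustration.

```latex
d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx 2(1 - r)
```

Values of d near 2 suggest no first-order autocorrelation, values below 2 suggest positive autocorrelation, and values above 2 suggest negative autocorrelation.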

library(lmtest)
dwtest(model3)
## 
##  Durbin-Watson test
## 
## data:  model3
## DW = 1.7171, p-value = 0.03464
## alternative hypothesis: true autocorrelation is greater than 0

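The p-value (0.035) is significant at the 5% level, so we reject the null hypothesis of no autocorrelation in favor of positive autocorrelation in the residuals of the cubic model. The DW statistic of 1.72 is below 2, which points the same way, though it is not far below 2, so the autocorrelation appears to be mild.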
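As a check on what dwtest() reports, the statistic can be computed directly from the residuals. This is a minimal sketch with hypothetical names (cubic_fit, e, dw, r1), and it refits the cubic model so that it runs on its own.

```r
library(faraway)
data(aatemp)

# Same cubic trend model as above, refit under a hypothetical name
cubic_fit <- lm(temp ~ year + I(year^2) + I(year^3), data = aatemp)
e <- residuals(cubic_fit)

# Durbin-Watson statistic from its definition:
# squared differences of successive residuals over squared residuals
dw <- sum(diff(e)^2) / sum(e^2)
dw

# Lag-1 sample correlation of the residuals; dw is roughly 2 * (1 - r1)
r1 <- sum(e[-1] * e[-length(e)]) / sum(e^2)
r1

# Autocorrelation of the residuals at several lags
acf(e, main = "ACF of cubic model residuals")
```

Note that "previous time period" here means the previous row of the data. If any years are missing from aatemp, successive residuals are not always exactly one year apart, which is worth keeping in mind when interpreting the result.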