Topics

Today we talked about 6.1 and 6.2. This is the first time we talked about time-series data.

Today we are focusing on 1st order autocorrelation (only looking at the value to the left and right or each error).

There are 2 ways we will use to detect Auto correlation - 1. plot residuals throught time - 2. Durbin Watson Statistic – Durbin Watson Test Ho: no auto correlation Ha: there is auto correlation, OR Ha: there is neg. auto correaltion, OR Ha: there is pos. auto correlation

Example

library(faraway)
## Warning: package 'faraway' was built under R version 3.4.4
data(airpass)
head(airpass)
##   pass     year
## 1  112 49.08333
## 2  118 49.16667
## 3  132 49.25000
## 4  129 49.33333
## 5  121 49.41667
## 6  135 49.50000
attach(airpass)
plot(pass ~ year, type="l")

#can see much less variability earlier in time
#type "l" to be connected by lines.

#after plotting, lets make some models
mod<-lm(pass~year)
summary(mod)
## 
## Call:
## lm(formula = pass ~ year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -93.858 -30.727  -5.757  24.489 164.999 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1474.771     61.106  -24.14   <2e-16 ***
## year           31.886      1.108   28.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared:  0.8536, Adjusted R-squared:  0.8526 
## F-statistic: 828.2 on 1 and 142 DF,  p-value: < 2.2e-16
plot(mod)

#definitley evidence of heteroscedasdicity. This corresponds to what we see in the previous plot.
mod2<-lm(pass~ year+I(year^2))
summary(mod2)
## 
## Call:
## lm(formula = pass ~ year + I(year^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -100.353  -27.339   -7.442   21.603  146.116 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 1570.5174  1053.9017   1.490  0.13841   
## year         -79.2078    38.4007  -2.063  0.04098 * 
## I(year^2)      1.0092     0.3487   2.894  0.00441 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.91 on 141 degrees of freedom
## Multiple R-squared:  0.8618, Adjusted R-squared:  0.8599 
## F-statistic: 439.8 on 2 and 141 DF,  p-value: < 2.2e-16
#year becomes less sig. but still keep because year squared is significant!!
#we will use quadratic moving foward because squared term is significant.
#base decision off of test stat and p-val before residual plots
plot(mod2)

#Using the code below we were able to graph our models oveer our orininal plot.
plot(pass ~ year, type="l")
lines(mod$fitted.values ~ year, col="blue", data=airpass)
lines(mod2$fitted.values ~ year, col="red", data = airpass)

#Durbin Watson test
library(lmtest)
## Warning: package 'lmtest' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
dwtest(mod2, alternative = "two.sided")
## 
##  Durbin-Watson test
## 
## data:  mod2
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is not 0
#"two.sided" for the Ha: there is autocorrelation
#there is auocorr.

Forcasting (predictions through time)

Forcasting for 1962.

coef(mod) %*% c(1,62)
##          [,1]
## [1,] 502.1735
coef(mod2) %*% c(1,62, 62^2)
##          [,1]
## [1,] 538.9268

Example

Steps: - plot data - guess at trend - make some polynomial models - do tests - look at plots - forcast

data("aatemp")
attach(aatemp)
## The following object is masked from airpass:
## 
##     year
head(aatemp)
##   year  temp
## 1 1854 49.15
## 2 1855 46.52
## 3 1871 48.80
## 4 1881 47.95
## 5 1882 47.31
## 6 1883 44.64
#plot
plot(temp ~ year, type="l")

#make models
mod3<-lm(temp~year)
summary(mod3)
## 
## Call:
## lm(formula = temp ~ year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9843 -0.9113 -0.0820  0.9946  3.5343 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 24.005510   7.310781   3.284  0.00136 **
## year         0.012237   0.003768   3.247  0.00153 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared:  0.08536,    Adjusted R-squared:  0.07727 
## F-statistic: 10.55 on 1 and 113 DF,  p-value: 0.001533
mod4<-lm(temp~ year + I(year^2))
summary(mod4)
## 
## Call:
## lm(formula = temp ~ year + I(year^2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0412 -0.9538 -0.0624  0.9959  3.5820 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127e+02  3.837e+02  -0.554    0.580
## year         2.567e-01  3.962e-01   0.648    0.518
## I(year^2)   -6.307e-05  1.022e-04  -0.617    0.539
## 
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared:  0.08846,    Adjusted R-squared:  0.07218 
## F-statistic: 5.434 on 2 and 112 DF,  p-value: 0.005591
mod5<-lm(temp~ year+I(year^2)+I(year^3))
summary(mod5)
## 
## Call:
## lm(formula = temp ~ year + I(year^2) + I(year^3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8557 -0.9646 -0.1552  1.0485  4.1538 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  3.959e+04  1.734e+04   2.283   0.0243 *
## year        -6.159e+01  2.694e+01  -2.286   0.0241 *
## I(year^2)    3.197e-02  1.395e-02   2.291   0.0238 *
## I(year^3)   -5.527e-06  2.407e-06  -2.296   0.0236 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.443 on 111 degrees of freedom
## Multiple R-squared:  0.1298, Adjusted R-squared:  0.1063 
## F-statistic: 5.518 on 3 and 111 DF,  p-value: 0.001436
par(mfrow=c(2,2))
plot(mod3)

plot(mod4)

plot(mod5)

##cubic model looks great! I will use that one.

#forcasting
coef(mod5) %*% c(1,2002, 2002^2, 2002^3)
##          [,1]
## [1,] 47.50484
new.data<-data.frame(year=2002)
predict(mod5,newdata=new.data)
##        1 
## 47.50484
#way easier when you start getting a big poly model

##CI but dont worry about right now.
new.data<-data.frame(year=2002)
predict(mod5,newdata=new.data,interval = "confidence")
##        fit      lwr      upr
## 1 47.50484 46.45565 48.55402

Check Durbin Watson

library(lmtest)
dwtest(mod5, alternative = "two.sided")
## 
##  Durbin-Watson test
## 
## data:  mod5
## DW = 1.7171, p-value = 0.06928
## alternative hypothesis: true autocorrelation is not 0
#bigger than .05 so no autocorrelation

Conclusion

Looking at 6.1 and 6.2 was a good intro to time series data. We got to use time as a vairable in our models, explored finding the best model using polynomials, and took a look at autocorrelation. Though we did not do anything too different today, it will help us to do more with time series data in the future, opening up lots of different opportunities for us. It will be interesting to learn how to apply some of the things we have already learned in a time series setting.