Today we talked about 6.1 and 6.2. This is the first time we have worked with time-series data.
6.1 deals with polynomial trends. We use the same polynomial-regression methods as before, but now time (year) is the explanatory variable and the series we are tracking is the response.
6.2 is all about autocorrelation. Autocorrelation is a problem because it violates the independence condition on the errors.
Today we are focusing on first-order autocorrelation (each error is related only to the error immediately before and after it, i.e. lag 1).
There are 2 ways we will use to detect autocorrelation:
1. Plot the residuals through time.
2. Durbin-Watson statistic / Durbin-Watson test. H0: no autocorrelation. Ha: there is autocorrelation (two-sided), OR Ha: there is negative autocorrelation, OR Ha: there is positive autocorrelation.
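For intuition about what the Durbin-Watson statistic measures, here is a minimal sketch on made-up data (the AR(1) errors from arima.sim are just fabricated for illustration; this is not the airline data or anything we ran in class):
#sketch: DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), roughly 2*(1 - lag-1 correlation)
#DW near 2 = no autocorrelation, near 0 = positive, near 4 = negative
set.seed(1)
e <- arima.sim(model=list(ar=0.7), n=100)   # simulated errors with positive lag-1 correlation
cor(e[-1], e[-length(e)])                   # first-order (lag-1) correlation of the errors
sum(diff(e)^2) / sum(e^2)                   # hand-computed DW, well below 2 here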
library(faraway)
## Warning: package 'faraway' was built under R version 3.4.4
data(airpass)
head(airpass)
## pass year
## 1 112 49.08333
## 2 118 49.16667
## 3 132 49.25000
## 4 129 49.33333
## 5 121 49.41667
## 6 135 49.50000
attach(airpass)
plot(pass ~ year, type="l")
#can see much less variability earlier in time
#type="l" connects the points with lines
#after plotting, let's make some models
mod<-lm(pass~year)
summary(mod)
##
## Call:
## lm(formula = pass ~ year)
##
## Residuals:
## Min 1Q Median 3Q Max
## -93.858 -30.727 -5.757 24.489 164.999
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1474.771 61.106 -24.14 <2e-16 ***
## year 31.886 1.108 28.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared: 0.8536, Adjusted R-squared: 0.8526
## F-statistic: 828.2 on 1 and 142 DF, p-value: < 2.2e-16
plot(mod)
#definitely evidence of heteroscedasticity, which corresponds to what we saw in the previous plot
mod2<-lm(pass~ year+I(year^2))
summary(mod2)
##
## Call:
## lm(formula = pass ~ year + I(year^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -100.353 -27.339 -7.442 21.603 146.116
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1570.5174 1053.9017 1.490 0.13841
## year -79.2078 38.4007 -2.063 0.04098 *
## I(year^2) 1.0092 0.3487 2.894 0.00441 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.91 on 141 degrees of freedom
## Multiple R-squared: 0.8618, Adjusted R-squared: 0.8599
## F-statistic: 439.8 on 2 and 141 DF, p-value: < 2.2e-16
#year becomes less significant, but we still keep it because year squared is significant!!
#we will use the quadratic model moving forward because the squared term is significant
#base the decision on the test statistic and p-value before looking at residual plots
plot(mod2)
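As a quick side check (a sketch, not something we did in class), the same decision can be framed as a nested-model F test comparing the linear and quadratic fits; with a single added term it gives the same p-value as the t test on I(year^2):
#sketch: nested F test of mod (linear) vs mod2 (quadratic)
anova(mod, mod2)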
#Using the code below we can graph our fitted models over the original plot.
plot(pass ~ year, type="l")
lines(mod$fitted.values ~ year, col="blue", data=airpass)
lines(mod2$fitted.values ~ year, col="red", data = airpass)
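The first detection method listed above is plotting the residuals through time. A minimal sketch for mod2 (long runs of residuals on the same side of zero hint at positive autocorrelation):
#detection method 1: plot residuals through time and look for runs above/below zero
plot(mod2$residuals ~ year, type="l")
abline(h=0, lty=2)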
#Durbin Watson test
library(lmtest)
## Warning: package 'lmtest' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
dwtest(mod2, alternative = "two.sided")
##
## Durbin-Watson test
##
## data: mod2
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is not 0
#"two.sided" for the Ha: there is autocorrelation
#there is auocorr.
Forecasting for 1962 (year in this data set is coded as years since 1900, so 1962 corresponds to 62).
coef(mod) %*% c(1,62)
## [,1]
## [1,] 502.1735
coef(mod2) %*% c(1,62, 62^2)
## [,1]
## [1,] 538.9268
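The same forecast could also be done with predict(), which is the approach used for the temperature data below; a sketch for the quadratic model (new.data is just a one-row data frame with year = 62, i.e. 1962):
#sketch: the same 1962 forecast via predict() instead of matrix multiplication
new.data <- data.frame(year=62)
predict(mod2, newdata=new.data)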
Steps:
- plot the data
- guess at the trend
- make some polynomial models
- do tests
- look at plots
- forecast
data("aatemp")
attach(aatemp)
## The following object is masked from airpass:
##
## year
head(aatemp)
## year temp
## 1 1854 49.15
## 2 1855 46.52
## 3 1871 48.80
## 4 1881 47.95
## 5 1882 47.31
## 6 1883 44.64
#plot
plot(temp ~ year, type="l")
#make models
mod3<-lm(temp~year)
summary(mod3)
##
## Call:
## lm(formula = temp ~ year)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9843 -0.9113 -0.0820 0.9946 3.5343
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.005510 7.310781 3.284 0.00136 **
## year 0.012237 0.003768 3.247 0.00153 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared: 0.08536, Adjusted R-squared: 0.07727
## F-statistic: 10.55 on 1 and 113 DF, p-value: 0.001533
mod4<-lm(temp~ year + I(year^2))
summary(mod4)
##
## Call:
## lm(formula = temp ~ year + I(year^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0412 -0.9538 -0.0624 0.9959 3.5820
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127e+02 3.837e+02 -0.554 0.580
## year 2.567e-01 3.962e-01 0.648 0.518
## I(year^2) -6.307e-05 1.022e-04 -0.617 0.539
##
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared: 0.08846, Adjusted R-squared: 0.07218
## F-statistic: 5.434 on 2 and 112 DF, p-value: 0.005591
mod5<-lm(temp~ year+I(year^2)+I(year^3))
summary(mod5)
##
## Call:
## lm(formula = temp ~ year + I(year^2) + I(year^3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8557 -0.9646 -0.1552 1.0485 4.1538
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.959e+04 1.734e+04 2.283 0.0243 *
## year -6.159e+01 2.694e+01 -2.286 0.0241 *
## I(year^2) 3.197e-02 1.395e-02 2.291 0.0238 *
## I(year^3) -5.527e-06 2.407e-06 -2.296 0.0236 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.443 on 111 degrees of freedom
## Multiple R-squared: 0.1298, Adjusted R-squared: 0.1063
## F-statistic: 5.518 on 3 and 111 DF, p-value: 0.001436
par(mfrow=c(2,2))
plot(mod3)
plot(mod4)
plot(mod5)
##cubic model looks great! I will use that one.
#forecasting
coef(mod5) %*% c(1,2002, 2002^2, 2002^3)
## [,1]
## [1,] 47.50484
new.data<-data.frame(year=2002)
predict(mod5,newdata=new.data)
## 1
## 47.50484
#predict() is way easier once you have a big polynomial model
##CI (confidence interval), but don't worry about it right now.
new.data<-data.frame(year=2002)
predict(mod5,newdata=new.data,interval = "confidence")
## fit lwr upr
## 1 47.50484 46.45565 48.55402
Check Durbin-Watson for mod5.
library(lmtest)
dwtest(mod5, alternative = "two.sided")
##
## Durbin-Watson test
##
## data: mod5
## DW = 1.7171, p-value = 0.06928
## alternative hypothesis: true autocorrelation is not 0
#p-value is bigger than .05, so we fail to reject H0: no significant evidence of autocorrelation
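As a cross-check with the first detection method, we could also look at the mod5 residuals through time (just a sketch):
#sketch: residuals through time for mod5; no strong runs would agree with the DW result
plot(mod5$residuals ~ year, type="l")
abline(h=0, lty=2)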
Looking at 6.1 and 6.2 was a good intro to time-series data. We got to use time as a variable in our models, explored finding the best model using polynomials, and took a look at autocorrelation. Though we did not do anything too different today, it will help us do more with time-series data in the future, opening up lots of different opportunities. It will be interesting to apply some of the things we have already learned in a time-series setting.