Last class we started looking at time series. First we looked at the trends that a time series can have, and then we built up to autocorrelation.
In class we used the airpass dataset from the faraway package to practice.
library(faraway)
## Warning: package 'faraway' was built under R version 3.4.4
data(airpass)
attach(airpass)
A small thing we learned was how to plot a time series with lines connecting all the data points.
plot(pass ~ year, type = "l") #type = "l" plots them with a line
We used this plot to decide what degree of polynomial to use in our model. If the graph bends (curves up or down) instead of following a straight line, you should use at least a second-degree model.
The graph above suggests that we may find a good model of degree lower than two, so we fit a straight line first and then a second-degree model to compare against it.
mod <- lm(pass ~ year)
summary(mod)
##
## Call:
## lm(formula = pass ~ year)
##
## Residuals:
## Min 1Q Median 3Q Max
## -93.858 -30.727 -5.757 24.489 164.999
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1474.771 61.106 -24.14 <2e-16 ***
## year 31.886 1.108 28.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared: 0.8536, Adjusted R-squared: 0.8526
## F-statistic: 828.2 on 1 and 142 DF, p-value: < 2.2e-16
plot(mod)
mod2 <- lm(pass ~ year + I(year^2))
summary(mod2)
##
## Call:
## lm(formula = pass ~ year + I(year^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -100.353 -27.339 -7.442 21.603 146.116
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1570.5174 1053.9017 1.490 0.13841
## year -79.2078 38.4007 -2.063 0.04098 *
## I(year^2) 1.0092 0.3487 2.894 0.00441 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.91 on 141 degrees of freedom
## Multiple R-squared: 0.8618, Adjusted R-squared: 0.8599
## F-statistic: 439.8 on 2 and 141 DF, p-value: < 2.2e-16
plot(mod2)
anova(mod,mod2)
## Analysis of Variance Table
##
## Model 1: pass ~ year
## Model 2: pass ~ year + I(year^2)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 142 301219
## 2 141 284328 1 16891 8.3762 0.004407 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here we use summary to check whether each model is significant at all, and plot to get the diagnostic plots for it.
anova is used to decide whether we should use the smaller model or the larger model.
The small p-value from the anova test (0.004407) shows us that we should use the larger, more complicated model.
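To see where the F value in the anova table comes from, here is a minimal sketch that recomputes it by hand. The numbers are copied from the table above; the variable names (rss_small, rss_large, and so on) are my own and not part of the class code.

```r
# RSS and residual degrees of freedom, copied from the anova table above
rss_small <- 301219; df_small <- 142   # Model 1: pass ~ year
rss_large <- 284328; df_large <- 141   # Model 2: pass ~ year + I(year^2)

# F = (drop in RSS per added term) / (residual variance of the larger model)
f_stat <- ((rss_small - rss_large) / (df_small - df_large)) /
  (rss_large / df_large)

# Upper-tail probability of an F distribution with (1, 141) degrees of freedom
p_value <- pf(f_stat, df_small - df_large, df_large, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

The result should agree with the F and Pr(>F) columns of the table, up to the rounding of the RSS values.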
If we wanted to use this model to predict something, for example the value of pass at year = 62, we would use the following line.
coef(mod2) %*% c(1,62,62^2)
## [,1]
## [1,] 538.9268
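The matrix product above is just the fitted polynomial evaluated at year = 62. As a rough sketch, the same arithmetic can be written out with the coefficients copied from summary(mod2). These are rounded values, so the result is only close to the one above; the names b, new_year and pred are my own.

```r
# Coefficients copied (rounded) from the summary(mod2) output above
b <- c(intercept = 1570.5174, year = -79.2078, year_sq = 1.0092)

new_year <- 62
x <- c(1, new_year, new_year^2)   # 1 for the intercept, then year and year^2

# intercept + slope * year + curvature * year^2
pred <- sum(b * x)
print(pred)
```

With the fitted model object available, predict(mod2, newdata = data.frame(year = 62)) should give the same prediction without typing the coefficients by hand.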
The last thing we covered in class was autocorrelation. Autocorrelation is the correlation of the residuals with the residuals that come before them in time. Positive autocorrelation means that the residuals don't swap sign often (a positive residual tends to be followed by another positive one, and a negative one by another negative one), while negative autocorrelation means that the residuals swap back and forth between positive and negative often.
One way of detecting this is the Durbin-Watson statistic. To compute it we use the following lines of code.
library(lmtest)
## Warning: package 'lmtest' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.4
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
dwtest(mod2,alternative="two.sided")
##
## Durbin-Watson test
##
## data: mod2
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is not 0
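The DW statistic is close to 2 when there is no autocorrelation. Here it is 0.569, well below 2, and the p-value is very small, so this shows that the residuals are positively autocorrelated.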
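To get a feel for what the Durbin-Watson statistic measures, here is a small sketch that computes it by hand as the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The function dw_stat and the two made-up residual vectors are my own illustration, not something from class.

```r
# Durbin-Watson statistic: squared successive differences over squared residuals
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Made-up residuals that rarely change sign (positive autocorrelation)
smooth_resid <- c(1, 2, 3, 2, 1, -1, -2, -3, -2, -1)

# Made-up residuals that change sign every step (negative autocorrelation)
flipping_resid <- rep(c(1, -1), times = 5)

dw_stat(smooth_resid)    # 12/38, about 0.32, well below 2
dw_stat(flipping_resid)  # 36/10 = 3.6, well above 2
```

For the model above, dw_stat(resid(mod2)) should reproduce the DW value that dwtest reports.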