day 15 ll

Last class we started looking at time series. First we looked at the trends a time series can have, and then built up to autocorrelation.

In class we practiced with the airpass dataset from the faraway package.

library(faraway)

## Warning: package 'faraway' was built under R version 3.4.4

data(airpass)
attach(airpass)

One small thing we learned was how to plot a time series with lines connecting all the data points.

plot(pass ~ year, type = "l") #type = "l" plots them with a line
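A related option (not covered in class) is to store the series as a ts object. plot() draws a ts object as a connected line by default.

This is a minimal sketch on simulated monthly data. The values, start date and frequency are illustrative assumptions and are not taken from airpass.

```r
# Simulated monthly series (illustrative data, not airpass)
set.seed(1)
counts <- 100 + cumsum(rnorm(36))

# Store as a monthly time series starting January 1949
series <- ts(counts, start = c(1949, 1), frequency = 12)

# plot() on a ts object connects consecutive points with lines automatically,
# like plot(..., type = "l") against a numeric time index
plot(series, ylab = "value")

# time() gives the decimal time index used on the x-axis
head(time(series))
```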

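We used this plot to decide what degree of polynomial our model should be. If the graph bends (changes direction) rather than following a straight line, we should use at least a second-degree model. In general, a curve with k bends calls for a polynomial of degree at least k + 1.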

The graph above suggests that a model of lower than second degree (a straight line) might fit well, so we start with a linear model and then try a quadratic one.
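The plot only gives a first guess at the degree. A complementary check is to fit polynomials of increasing degree and compare them with F-tests. Each row of the resulting table tests whether the newly added term improves the fit.

This sketch uses simulated data rather than airpass. The true curve is quadratic, and the coefficients and noise level are made up.

```r
set.seed(42)
x <- seq(1, 12, length.out = 144)
# Simulated data with a genuine quadratic trend plus noise (assumed values)
y <- 50 + 2 * x + 0.8 * x^2 + rnorm(144, sd = 5)

fit1 <- lm(y ~ x)                    # straight line
fit2 <- lm(y ~ x + I(x^2))           # quadratic
fit3 <- lm(y ~ x + I(x^2) + I(x^3))  # cubic

# Row 2 tests the quadratic term, row 3 tests the cubic term
tab <- anova(fit1, fit2, fit3)
tab
```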

mod <- lm(pass ~ year)
summary(mod)

## 
## Call:
## lm(formula = pass ~ year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -93.858 -30.727  -5.757  24.489 164.999 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1474.771     61.106  -24.14   <2e-16 ***
## year           31.886      1.108   28.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared:  0.8536, Adjusted R-squared:  0.8526 
## F-statistic: 828.2 on 1 and 142 DF,  p-value: < 2.2e-16

plot(mod)

mod2 <- lm(pass ~ year + I(year^2))  # I() makes year^2 an arithmetic square, not a formula term
summary(mod2)

## 
## Call:
## lm(formula = pass ~ year + I(year^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -100.353  -27.339   -7.442   21.603  146.116 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 1570.5174  1053.9017   1.490  0.13841   
## year         -79.2078    38.4007  -2.063  0.04098 * 
## I(year^2)      1.0092     0.3487   2.894  0.00441 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.91 on 141 degrees of freedom
## Multiple R-squared:  0.8618, Adjusted R-squared:  0.8599 
## F-statistic: 439.8 on 2 and 141 DF,  p-value: < 2.2e-16

plot(mod2)
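The I() wrapper matters. Inside a model formula, ^ means crossing of terms, so x^2 with a single variable reduces to just x. I() tells R to compute the square arithmetically instead.

This is a small check on simulated data (made-up values). The last line confirms that poly(x, 2, raw = TRUE) gives the same fit as the I() version.

```r
set.seed(7)
x <- 1:20
y <- 3 + x + 0.5 * x^2 + rnorm(20)

# Without I(): the formula operator ^ expands x^2 to just x
no_I <- lm(y ~ x + x^2)
# With I(): x^2 is computed arithmetically and added as a new column
with_I <- lm(y ~ x + I(x^2))

names(coef(no_I))    # "(Intercept)" "x"
names(coef(with_I))  # "(Intercept)" "x" "I(x^2)"

# Raw polynomial basis spans the same columns, so the fitted values agree
same_fit <- all.equal(fitted(with_I), fitted(lm(y ~ poly(x, 2, raw = TRUE))))
same_fit
```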

anova(mod, mod2)  # F-test: does adding the quadratic term improve on the linear model?

## Analysis of Variance Table
## 
## Model 1: pass ~ year
## Model 2: pass ~ year + I(year^2)
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1    142 301219                                
## 2    141 284328  1     16891 8.3762 0.004407 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

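Here we use summary() to check whether the model is significant at all (the overall F-test and the t-tests on each coefficient). We use plot() to look at the model's diagnostic plots (residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage).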
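Two small conveniences go with this, neither of them from class:

- The overall F-test that summary() prints can be pulled out of the summary object, and its p-value recomputed with pf().
- par(mfrow = c(2, 2)) shows all four diagnostic plots on one page.

This is a sketch on simulated straight-line data (made-up values).

```r
set.seed(3)
x <- 1:50
y <- 10 + 2 * x + rnorm(50, sd = 4)
fit <- lm(y ~ x)

# Overall F-test stored in the summary: statistic, numerator df, denominator df
fs <- summary(fit)$fstatistic
p_overall <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)
p_overall

# Four standard diagnostic plots on one page, then restore the old layout
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```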

anova() compares the two nested models with an F-test, to decide whether the smaller model is adequate or the larger model is needed.

The small p-value (0.0044) from the anova F-test shows that the quadratic term significantly improves the fit, so we should use the larger, more complicated model.
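As a worked check using the RSS values from the anova table above, the F statistic compares the drop in residual sum of squares (per extra parameter) to the larger model's residual variance:

```latex
F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(\mathrm{df}_1 - \mathrm{df}_2)}{\mathrm{RSS}_2/\mathrm{df}_2}
  = \frac{(301219 - 284328)/(142 - 141)}{284328/141}
  = \frac{16891}{2016.5}
  \approx 8.38
```

This matches the F = 8.3762 reported by anova(); the small difference comes from the rounded RSS values.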

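If we wanted to use this model to predict something, for example the number of passengers at year = 62 (beyond the range of the data, so this is an extrapolation), we would use the following line.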

coef(mod2) %*% c(1, 62, 62^2)  # b0*1 + b1*62 + b2*62^2 = predicted passengers at year = 62

##          [,1]
## [1,] 538.9268
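The same number can be obtained with predict(), which builds the (1, year, year^2) row from the model formula. predict() can also attach a prediction interval.

This is a sketch on simulated quadratic data. The variable names x and y and all values are hypothetical, chosen so the example does not clash with the attached airpass columns.

```r
set.seed(11)
x <- seq(49, 61, length.out = 144)              # hypothetical time index
y <- 100 + 5 * (x - 49)^2 + rnorm(144, sd = 20)
fit <- lm(y ~ x + I(x^2))

# Manual prediction: coefficients times the design row (1, x, x^2)
manual <- drop(coef(fit) %*% c(1, 62, 62^2))

# Same prediction through predict(), which builds the design row itself
auto <- predict(fit, newdata = data.frame(x = 62))

# Point prediction plus a 95% prediction interval for a new observation
predict(fit, newdata = data.frame(x = 62), interval = "prediction")
```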

The last thing we covered in class was autocorrelation. Autocorrelation is correlation between the residuals at successive time points.

- Positive autocorrelation means the residuals don't switch from positive to negative (or from negative to positive) very often, so they come in long runs of the same sign.
- Negative autocorrelation means the residuals switch between positive and negative frequently.
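To make the sign-switching description concrete, here is a sketch on simulated AR(1) series standing in for residuals. The use of arima.sim() and the ±0.8 coefficients are illustrative choices, not from class.

With positive lag-1 autocorrelation we expect few sign changes; with negative autocorrelation we expect many.

```r
set.seed(5)
n <- 200
# Simulated AR(1) series: ar > 0 gives positive, ar < 0 negative autocorrelation
e_pos <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))
e_neg <- as.numeric(arima.sim(model = list(ar = -0.8), n = n))

# Lag-1 autocorrelation: correlation of each value with the one before it
lag1 <- function(e) cor(e[-1], e[-length(e)])

# Number of times consecutive values switch sign
sign_changes <- function(e) sum(diff(sign(e)) != 0)

c(positive = lag1(e_pos), negative = lag1(e_neg))
c(positive = sign_changes(e_pos), negative = sign_changes(e_neg))
```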

One way of detecting autocorrelation is the Durbin-Watson statistic. To compute it we use the following lines of code.

library(lmtest)

## Warning: package 'lmtest' was built under R version 3.4.4

## Loading required package: zoo

## Warning: package 'zoo' was built under R version 3.4.4

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

dwtest(mod2, alternative = "two.sided")  # test for lag-1 autocorrelation in either direction

## 
##  Durbin-Watson test
## 
## data:  mod2
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is not 0

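The very small p-value shows that there is significant autocorrelation in the residuals. Because DW = 0.569 is well below 2, the autocorrelation is positive.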
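The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It is roughly 2(1 - r), where r is the lag-1 autocorrelation of the residuals:

- Values near 2 suggest no autocorrelation.
- Values toward 0 suggest positive autocorrelation.
- Values toward 4 suggest negative autocorrelation.

This sketch computes it by hand on a simulated trend with AR(1) errors. The data and the 0.7 coefficient are assumptions for illustration. If lmtest happens to be installed, the script also prints dwtest()'s statistic for comparison.

```r
set.seed(9)
n <- 120
x <- 1:n
# Positively autocorrelated errors (AR(1), ar = 0.7), assumed for illustration
err <- as.numeric(arima.sim(model = list(ar = 0.7), n = n, sd = 3))
y <- 5 + 0.5 * x + err
fit <- lm(y ~ x)
e <- residuals(fit)

# Durbin-Watson: sum of squared successive differences over sum of squares
dw <- sum(diff(e)^2) / sum(e^2)

# Rough link to the lag-1 autocorrelation r: DW is about 2 * (1 - r)
r1 <- cor(e[-1], e[-n])
c(DW = dw, approx = 2 * (1 - r1))

# If lmtest is installed, dwtest() should report the same statistic
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::dwtest(fit)$statistic)
}
```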


Nate Persing

April 3, 2018