In class Tuesday, we discussed the first part of Chapter 6: Time Series. We went over 6.1: Modeling Time Series with a Polynomial Trend, and 6.2: Detecting Autocorrelation.

6.1 Polynomial Trend

To model a time series with a polynomial trend, we use the following equation: \[y_t=TR_t+E_t\] where \(y_t\) is the value at time \(t\), \(TR_t\) is the trend, and \(E_t\) is the error term.

We plot \(y_t\) against \(t\) to see whether there is a trend. If the data follow a horizontal line, there is no trend, and therefore \(TR_t=\beta_0\); this no-trend model serves as the null hypothesis. For a linear trend, \(TR_t=\beta_0 + \beta_1t\). A positive slope in the data indicates a positive trend, and thus \(\beta_1>0\), whereas a negative slope indicates a negative trend, and therefore \(\beta_1<0\).

These models describe linear trends, but we can build on them by using polynomials. For example, the quadratic trend equation is \(TR_t=\beta_0 + \beta_1t + \beta_2t^2\). We can keep increasing the order of the polynomial, which gives the general equation \(TR_t=\beta_0 + \beta_1t +...+ \beta_pt^p\), where the polynomial is of order \(p\). If there appears to be a reversal in curvature, then we want to use a trend model where \(p\geq 3\).
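To make the trend equation concrete, here is a minimal sketch that simulates a series of the form \(y_t=TR_t+E_t\) with a quadratic trend and then fits linear and quadratic trend models with lm. The data are simulated, and the coefficient values, noise level, and object names (t, y, fit_lin, fit_quad) are assumptions chosen only for illustration; they are not from the class example.

```r
# Simulated example (not the class data): quadratic trend plus random error
set.seed(1)
t <- 1:100
trend <- 5 + 0.5 * t + 0.02 * t^2        # TR_t with assumed beta_0, beta_1, beta_2
y <- trend + rnorm(length(t), sd = 2)    # y_t = TR_t + E_t

# Fit a linear trend (p = 1) and a quadratic trend (p = 2)
fit_lin  <- lm(y ~ t)
fit_quad <- lm(y ~ t + I(t^2))

# Estimated quadratic-trend coefficients
print(coef(fit_quad))

# F test of the linear trend against the quadratic trend
print(anova(fit_lin, fit_quad))
```

I() is needed around t^2 because ^ has a special meaning inside an R formula.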

Example

For this learning log, I’m going to use the airpass dataset from the faraway package.

library(faraway)
data(airpass)
head(airpass)

##   pass     year
## 1  112 49.08333
## 2  118 49.16667
## 3  132 49.25000
## 4  129 49.33333
## 5  121 49.41667
## 6  135 49.50000

attach(airpass)

We want to plot the number of passengers against time.

plot(pass~year,data=airpass)

There appears to be a very clear positive trend in the data. Additionally, with an increase in year, the vertical spread seems to be increasing, which indicates heteroscedasticity.

mod<- lm(pass~year)
summary(mod)

## 
## Call:
## lm(formula = pass ~ year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -93.858 -30.727  -5.757  24.489 164.999 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1474.771     61.106  -24.14   <2e-16 ***
## year           31.886      1.108   28.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared:  0.8536, Adjusted R-squared:  0.8526 
## F-statistic: 828.2 on 1 and 142 DF,  p-value: < 2.2e-16

plot(mod)

The slope coefficient is significant, but the data do not seem to follow a straight line, so we try increasing the order of the polynomial to see if that helps. We start with a quadratic model.

mod2<-lm(pass~year+ I(year^2))
summary(mod2)

## 
## Call:
## lm(formula = pass ~ year + I(year^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -100.353  -27.339   -7.442   21.603  146.116 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 1570.5174  1053.9017   1.490  0.13841   
## year         -79.2078    38.4007  -2.063  0.04098 * 
## I(year^2)      1.0092     0.3487   2.894  0.00441 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.91 on 141 degrees of freedom
## Multiple R-squared:  0.8618, Adjusted R-squared:  0.8599 
## F-statistic: 439.8 on 2 and 141 DF,  p-value: < 2.2e-16

plot(mod2)

Both year and \(year^2\) are significant at the 0.05 level, so the quadratic model appears to be an improvement over the linear one. Even though year is only barely significant, \(year^2\) is significant, so all lower-order terms should be retained. Adding a cubic term, however, would not be significant, and thus we stick with the quadratic model.
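One way to check whether a higher-order term is worth adding is an F test between nested models. The sketch below is an illustration rather than the code used in class: it swaps in poly(), which builds orthogonal polynomial terms, in place of raw powers of year. The fitted values are the same as with raw powers, but the fit is numerically more stable. The object names fit2, fit3, and cmp are hypothetical.

```r
library(faraway)   # provides the airpass data used in this log
data(airpass)

# Quadratic and cubic trend models using orthogonal polynomials
fit2 <- lm(pass ~ poly(year, 2), data = airpass)
fit3 <- lm(pass ~ poly(year, 3), data = airpass)

# F test of the quadratic model against the cubic model;
# a large p-value would suggest the cubic term is not needed
cmp <- anova(fit2, fit3)
print(cmp)
```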

With this model, we can predict the value at a time outside the dataset. We chose the year 1962 for our prediction; because year is recorded as years since 1900, this corresponds to year = 62.

coef(mod2) %*% c(1,62,62^2)

##          [,1]
## [1,] 538.9268
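As a cross-check, the same point prediction can be obtained with predict(). The sketch below refits the quadratic model with data = airpass so that it runs on its own; the names fit, new_point, by_hand, and by_predict are illustrative. It also requests a 95% prediction interval, which is an addition that was not part of the class example.

```r
library(faraway)   # provides the airpass data used in this log
data(airpass)

# Same quadratic trend model as mod2, refit with an explicit data argument
fit <- lm(pass ~ year + I(year^2), data = airpass)

# Prediction by hand: beta_0 + beta_1 * 62 + beta_2 * 62^2
by_hand <- sum(coef(fit) * c(1, 62, 62^2))

# Prediction with predict(), including a 95% prediction interval
new_point  <- data.frame(year = 62)
by_predict <- predict(fit, newdata = new_point, interval = "prediction")

print(by_hand)
print(by_predict)
```

Since year = 62 lies beyond the last observation in the data, this is an extrapolation and should be read with caution.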

We predict that there will be about 538,927 monthly passengers in 1962 (pass is recorded in thousands, so the predicted value of 538.9268 is about 538.9 thousand). Now, we can use the Durbin-Watson test to check the model for autocorrelation.

library(lmtest)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

dwtest(mod2,alternative="two.sided")

## 
##  Durbin-Watson test
## 
## data:  mod2
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is not 0

Since the p-value of the Durbin-Watson test is less than 2.2e-16, we reject the null hypothesis of no autocorrelation in favor of the alternative, which states that there is autocorrelation. We can then run a one-sided test to see specifically what kind of autocorrelation this is.

dwtest(mod2)

## 
##  Durbin-Watson test
## 
## data:  mod2
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0

It appears that there is positive autocorrelation among the error terms, which is a problem because it violates the independence assumption of regression.
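The Durbin-Watson statistic can also be computed directly from the residuals \(e_t\) as \[DW=\frac{\sum_{t=2}^{n}(e_t-e_{t-1})^2}{\sum_{t=1}^{n}e_t^2}\] Values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. The sketch below is a by-hand illustration, not a replacement for dwtest; it refits the quadratic model so that it runs on its own, and the names fit, e, dw, and r1 are illustrative. The result should agree with the DW value that dwtest reports for mod2.

```r
library(faraway)   # provides the airpass data used in this log
data(airpass)

# Same quadratic trend model as mod2, refit with an explicit data argument
fit <- lm(pass ~ year + I(year^2), data = airpass)
e <- residuals(fit)

# Durbin-Watson statistic: squared successive differences over squared residuals
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 sample autocorrelation of the residuals; DW is roughly 2 * (1 - r1)
r1 <- sum(e[-1] * e[-length(e)]) / sum(e^2)

print(c(DW = dw, approx = 2 * (1 - r1), r1 = r1))
```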

Learning Log 16

Kristen Rutschke

4/5/2018
