In class on Tuesday, we discussed the first part of Chapter 6: Time Series. We went over 6.1: Modeling Time Series with a Polynomial Trend, and 6.2: Detecting Autocorrelation.

# 6.1 Polynomial Trend

To model a time series with a polynomial trend, we use the following equation: \[y_t=TR_t+\epsilon_t\] where \(y_t\) is the value at time \(t\), \(TR_t\) is the trend, and \(\epsilon_t\) is the error term. We plot \(y_t\) against \(t\) to see whether there is a trend. If the data follow a horizontal line, there is no trend, so \(TR_t=\beta_0\) (which serves as the null hypothesis). A linear trend is modeled as \(TR_t=\beta_0+\beta_1t\): an upward slope in the data indicates a positive trend, \(\beta_1>0\), whereas a downward slope indicates a negative trend, \(\beta_1<0\). We can build on the linear trend by using higher-order polynomials. For example, the quadratic trend equation is \(TR_t=\beta_0 + \beta_1t + \beta_2t^2\). We can keep increasing the order of the polynomial, which gives the general equation \(TR_t=\beta_0 + \beta_1t +\dots+ \beta_pt^p\) for a polynomial of order \(p\). If there appears to be a reversal in curvature, we want a trend model with \(p\geq 3\).
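To make the polynomial trend models concrete, here is a minimal sketch on simulated data. The series, its coefficients, and the names `t_idx`, `fit_linear`, `fit_quad`, and `fit_cubic` are all made up for illustration and are not from class.

```r
# Simulated series with a known quadratic trend (illustrative values only)
set.seed(42)
t_idx <- 1:100
y <- 5 + 0.8 * t_idx - 0.01 * t_idx^2 + rnorm(100, sd = 2)

# Trend models of order p = 1, 2, 3
fit_linear <- lm(y ~ t_idx)
fit_quad   <- lm(y ~ t_idx + I(t_idx^2))
fit_cubic  <- lm(y ~ t_idx + I(t_idx^2) + I(t_idx^3))

# Coefficient table for the quadratic trend
summary(fit_quad)$coefficients

# Sequential F tests: does each added order improve the fit?
anova(fit_linear, fit_quad, fit_cubic)
```

Because the simulated trend is quadratic, the quadratic term should come out clearly significant, while the cubic term is not expected to add much.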
If the residuals of a model are correlated through time, this is known as autocorrelation. Positive autocorrelation covers two cases: a positive residual is likely to be followed by another positive residual, and a negative residual is likely to be followed by another negative residual. Negative autocorrelation also covers two cases: a positive residual is likely to be followed by a negative residual, and a negative residual is likely to be followed by a positive residual. Autocorrelation is problematic because it violates the regression assumption that the error terms are independent of one another.

## Detecting Autocorrelation

We can detect autocorrelation by plotting the residuals against time. We can inspect the pattern of the residuals on the graph by eye, but we can also use the Durbin-Watson statistic. In Tuesday’s class, we focused only on first-order autocorrelation, so we want to see whether \(\epsilon_t\) is related to the error terms adjacent to it in time, \(\epsilon_{t-1}\) and \(\epsilon_{t+1}\). We use the following hypotheses for the Durbin-Watson test:
\(H_0\): There is no autocorrelation
\(H_A\): There is autocorrelation (alternatively, you can test specifically for either positive or negative autocorrelation)
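For reference, the Durbin-Watson statistic is computed from the residuals \(e_1,\dots,e_n\) as \[d=\frac{\sum_{t=2}^{n}(e_t-e_{t-1})^2}{\sum_{t=1}^{n}e_t^2}\] Values near 2 are consistent with no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The sketch below illustrates this on simulated error series; the helper `dw_stat` and the AR(1) coefficients of 0.7 and -0.7 are assumptions chosen for illustration, and the statistic is applied directly to the simulated errors rather than to regression residuals.

```r
# Hypothetical helper: Durbin-Watson statistic for a series of residuals
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

set.seed(1)
n <- 500

# Simulated error series (illustrative only)
e_pos <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))   # positive autocorrelation
e_neg <- as.numeric(arima.sim(model = list(ar = -0.7), n = n))  # negative autocorrelation
e_ind <- rnorm(n)                                               # independent errors

# Compare the three statistics against the benchmark value of 2
c(positive = dw_stat(e_pos),
  independent = dw_stat(e_ind),
  negative = dw_stat(e_neg))
```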
For this learning log, I’m going to use the airpass dataset from the faraway package.
library(faraway)
data(airpass)
head(airpass)
## pass year
## 1 112 49.08333
## 2 118 49.16667
## 3 132 49.25000
## 4 129 49.33333
## 5 121 49.41667
## 6 135 49.50000
attach(airpass)
We want to plot the number of passengers against time.
plot(pass~year,data=airpass)
There appears to be a very clear positive trend in the data. Additionally, as year increases, the vertical spread of the points also increases, which indicates heteroscedasticity.
mod <- lm(pass ~ year)
summary(mod)
##
## Call:
## lm(formula = pass ~ year)
##
## Residuals:
## Min 1Q Median 3Q Max
## -93.858 -30.727 -5.757 24.489 164.999
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1474.771 61.106 -24.14 <2e-16 ***
## year 31.886 1.108 28.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared: 0.8536, Adjusted R-squared: 0.8526
## F-statistic: 828.2 on 1 and 142 DF, p-value: < 2.2e-16
plot(mod)
The coefficient on year is significant, but the data do not seem to follow a straight line, so we try increasing the order of the polynomial to see if that helps. Next, we look at a quadratic model.
mod2 <- lm(pass ~ year + I(year^2))
summary(mod2)
##
## Call:
## lm(formula = pass ~ year + I(year^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -100.353 -27.339 -7.442 21.603 146.116
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1570.5174 1053.9017 1.490 0.13841
## year -79.2078 38.4007 -2.063 0.04098 *
## I(year^2) 1.0092 0.3487 2.894 0.00441 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.91 on 141 degrees of freedom
## Multiple R-squared: 0.8618, Adjusted R-squared: 0.8599
## F-statistic: 439.8 on 2 and 141 DF, p-value: < 2.2e-16
plot(mod2)
Both year and \(year^2\) appear to be significant, so this model appears to be better. Even though year is only barely significant, \(year^2\) is significant, so all lower-order terms should be retained. Adding a third-order term, however, would not be significant, so we stick with the quadratic model.
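One way to check the third-order term is sketched below. It uses `poly()` to build orthogonal polynomial terms, which is my own choice rather than what we did in class; the t-test on the highest-order term is the same as with raw powers, and the names `mod2_orth` and `mod3_orth` are made up for this example.

```r
library(faraway)
data(airpass)

# Quadratic and cubic trend models using orthogonal polynomial terms
mod2_orth <- lm(pass ~ poly(year, 2), data = airpass)
mod3_orth <- lm(pass ~ poly(year, 3), data = airpass)

# The last row is the test of the third-order term
summary(mod3_orth)$coefficients

# Equivalent partial F test of cubic against quadratic
anova(mod2_orth, mod3_orth)
```

The orthogonal terms give the same fitted values as the raw polynomial, but the individual coefficients are not directly comparable to those of `mod2`.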
With this model, we can predict the value at a time outside the dataset. We chose the year 1962 (coded as \(year=62\)) for our prediction.
coef(mod2) %*% c(1,62,62^2)
## [,1]
## [1,] 538.9268
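The same prediction can also be obtained with `predict()`, which avoids building the vector of powers by hand. This is a minimal sketch that refits the quadratic model with an explicit `data` argument; `new_point` is a name I made up, and the prediction interval relies on the usual regression assumptions, including independent errors.

```r
library(faraway)
data(airpass)

# Refit the quadratic trend model with an explicit data argument
mod2 <- lm(pass ~ year + I(year^2), data = airpass)

# New time point at year = 62
new_point <- data.frame(year = 62)

# Point prediction with a 95% prediction interval
predict(mod2, newdata = new_point, interval = "prediction")
```

The `fit` column should match the matrix calculation above.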
We predict about 538,927 monthly passengers (pass is recorded in thousands) around the start of 1962. Now we can use the Durbin-Watson test to check the model's residuals for autocorrelation.
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
dwtest(mod2,alternative="two.sided")
##
## Durbin-Watson test
##
## data: mod2
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is not 0
Since the p-value of the Durbin-Watson test is less than 2.2e-16, we reject the null hypothesis in favor of the alternative, which states that there is autocorrelation. We can then test specifically what kind of autocorrelation this is.
dwtest(mod2)
##
## Durbin-Watson test
##
## data: mod2
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0
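There appears to be positive autocorrelation in the residuals, which is a problem because it violates the independence assumption of regression. The statistic DW = 0.569 is well below 2, the value expected with no autocorrelation, which points the same way.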
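To see the autocorrelation directly, we can plot the residuals against time and against their own lagged values. This is a minimal sketch that refits the quadratic model so it runs on its own; the names `e`, `n`, `r1`, and `dw` are made up for this example.

```r
library(faraway)
data(airpass)

# Refit the quadratic trend model and extract residuals in time order
mod2 <- lm(pass ~ year + I(year^2), data = airpass)
e <- resid(mod2)
n <- length(e)

# Residuals through time: long runs on one side of zero suggest
# positive autocorrelation
plot(airpass$year, e, type = "b", xlab = "year", ylab = "residual")
abline(h = 0, lty = 2)

# Each residual against the one before it
plot(e[-n], e[-1], xlab = "residual at t-1", ylab = "residual at t")

# Lag-1 correlation of the residuals
r1 <- cor(e[-1], e[-n])
r1

# Durbin-Watson statistic computed directly from the residuals
dw <- sum(diff(e)^2) / sum(e^2)
dw
```

The value of `dw` should reproduce the DW = 0.56939 reported by `dwtest()`, and `r1` should be positive, roughly in line with the approximation \(1 - DW/2 \approx 0.72\).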