Learning Log 16

Jeff Courneya

April 3, 2018

Time Series Trends

Intro

We learned that a time series can be described as \(y_t = TR_t + e_t\), where

\(y_t\) is the value of our time series at time t

\(TR_t\) is the trend

\(e_t\) is our error term, assumed to be iid \(N(0,\sigma^{2})\)

If there is no trend, \(TR_t = {\beta_0}\)

If there is a linear trend, \(TR_t = {\beta_0}+{\beta_1}t\)

All the way up to the pth order polynomial, \(TR_t = {\beta_0}+{\beta_1}t + {\beta_2}t^{2} + ... + {\beta_p}t^{p}\)
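
As a quick illustration (a minimal sketch on a simulated series, not part of the data sets used below), these trend models are fit in R by entering the time index as a regressor and wrapping higher powers in I():

set.seed(1)
t <- 1:100                              # time index
y <- 5 + 0.3 * t + rnorm(100, sd = 2)   # simulated series with a linear trend
fit0 <- lm(y ~ 1)                       # no trend: TR_t = beta_0
fit1 <- lm(y ~ t)                       # linear trend: TR_t = beta_0 + beta_1*t
fit2 <- lm(y ~ t + I(t^2))              # quadratic trend adds beta_2*t^2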

Practice 1

We will choose which trend best models the data in the airpass data set.

library(faraway)
## Warning: package 'faraway' was built under R version 3.4.4
data("airpass")
names(airpass)
## [1] "pass" "year"
attach(airpass)
plot(pass ~ year, type = "l")

We can rule out no trend, so that leaves linear and quadratic trends to choose from. We will use a linear model to help decide.

mymod <- lm(pass ~ year)
summary(mymod)
## 
## Call:
## lm(formula = pass ~ year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -93.858 -30.727  -5.757  24.489 164.999 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1474.771     61.106  -24.14   <2e-16 ***
## year           31.886      1.108   28.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared:  0.8536, Adjusted R-squared:  0.8526 
## F-statistic: 828.2 on 1 and 142 DF,  p-value: < 2.2e-16
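
As a small supplement (not in the original log), a 95% confidence interval for the year coefficient quantifies the estimated yearly increase in passengers:

confint(mymod, "year", level = 0.95)   # 95% CI for the yearly increase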

The small p-value for year indicates that there is indeed a linear relationship between year and the number of air passengers. We can also look at the quadratic model to see what p-value the squared term produces.

quadmod <- lm(pass ~ year + I(year^2))
summary(quadmod)
## 
## Call:
## lm(formula = pass ~ year + I(year^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -100.353  -27.339   -7.442   21.603  146.116 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 1570.5174  1053.9017   1.490  0.13841   
## year         -79.2078    38.4007  -2.063  0.04098 * 
## I(year^2)      1.0092     0.3487   2.894  0.00441 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.91 on 141 degrees of freedom
## Multiple R-squared:  0.8618, Adjusted R-squared:  0.8599 
## F-statistic: 439.8 on 2 and 141 DF,  p-value: < 2.2e-16
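
As a supplementary check (not in the original log), the nested linear and quadratic fits can also be compared with a partial F-test; since the two models differ only in the I(year^2) term, this test is equivalent to the t-test on that coefficient in the summary above.

anova(mymod, quadmod)   # partial F-test for the added squared term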

Here we see that the squared term also has a significant p-value, so it will be important to incorporate that term into our model as well. We can also look at the diagnostic plots of each model to visualize the differences between them.

plot(mymod)

plot(quadmod)

Comparing the residuals-vs-fitted plot of each model, we can see that the quadratic model's residuals stay closer to zero across the range of fitted values. It should be noted, however, that both residual plots show heavy heteroscedasticity.
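
For a more direct comparison (a sketch added here using plot.lm's which argument, not part of the original log), the residuals-vs-fitted panels can be drawn side by side:

par(mfrow = c(1, 2))                            # two panels side by side
plot(mymod, which = 1, main = "Linear trend")   # residuals vs fitted only
plot(quadmod, which = 1, main = "Quadratic trend")
par(mfrow = c(1, 1))                            # reset the plotting layout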

Practice 2

Now we will look at the aatemp data set within the faraway package.

library(faraway)
data("aatemp")
names(aatemp)
## [1] "year" "temp"

Let’s look at the plot of the two variables.

plot(temp ~ year, data = aatemp, type = "l")

There is certainly a trend present, but it is unclear at this point which form is best. We will look at linear, quadratic, and cubic trends to see which produces favorable p-values.

linmod <- lm(temp ~ year, data = aatemp)
summary(linmod)
## 
## Call:
## lm(formula = temp ~ year, data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9843 -0.9113 -0.0820  0.9946  3.5343 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 24.005510   7.310781   3.284  0.00136 **
## year         0.012237   0.003768   3.247  0.00153 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared:  0.08536,    Adjusted R-squared:  0.07727 
## F-statistic: 10.55 on 1 and 113 DF,  p-value: 0.001533

A p-value of 0.00153 is small, but let’s see if the other models can improve upon this.

qmod <- lm(temp ~ year + I(year^2), data = aatemp)
summary(qmod)
## 
## Call:
## lm(formula = temp ~ year + I(year^2), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0412 -0.9538 -0.0624  0.9959  3.5820 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127e+02  3.837e+02  -0.554    0.580
## year         2.567e-01  3.962e-01   0.648    0.518
## I(year^2)   -6.307e-05  1.022e-04  -0.617    0.539
## 
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared:  0.08846,    Adjusted R-squared:  0.07218 
## F-statistic: 5.434 on 2 and 112 DF,  p-value: 0.005591

The quadratic was not an improvement, so we would not use it instead of the linear trend. We will now look at the cubic trend to see whether it helps.

tmod <- lm(temp ~ year + I(year^2) + I(year^3), data = aatemp)
summary(tmod)
## 
## Call:
## lm(formula = temp ~ year + I(year^2) + I(year^3), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8557 -0.9646 -0.1552  1.0485  4.1538 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  3.959e+04  1.734e+04   2.283   0.0243 *
## year        -6.159e+01  2.694e+01  -2.286   0.0241 *
## I(year^2)    3.197e-02  1.395e-02   2.291   0.0238 *
## I(year^3)   -5.527e-06  2.407e-06  -2.296   0.0236 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.443 on 111 degrees of freedom
## Multiple R-squared:  0.1298, Adjusted R-squared:  0.1063 
## F-statistic: 5.518 on 3 and 111 DF,  p-value: 0.001436
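
A numerical comparison of the three nested fits (a supplementary sketch, not in the original log) tells the same story as the individual p-values:

anova(linmod, qmod, tmod)   # sequential F-tests for the added terms
AIC(linmod, qmod, tmod)     # lower AIC indicates a better fit/complexity trade-off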

The cubic appears to provide favorable results compared to the quadratic. We will now look at the residual plots of the linear and cubic trends to see which is better.

plot(linmod)

plot(tmod)

Comparing the two residual plots, it becomes apparent that the residual trend line for the cubic model is flatter than that for the linear model. We will conclude that in this case, the cubic trend appears to be the most accurate.
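
To visualize this conclusion (an illustrative sketch added here, not part of the original log), the fitted linear and cubic trends can be overlaid on the temperature series:

plot(temp ~ year, data = aatemp, type = "l")
lines(aatemp$year, fitted(linmod), col = "blue", lwd = 2)   # linear trend
lines(aatemp$year, fitted(tmod), col = "red", lwd = 2)      # cubic trend
legend("topleft", legend = c("linear trend", "cubic trend"),
       col = c("blue", "red"), lwd = 2)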

Durbin Watson Test

We can now go back and look at the autocorrelation of our previously made models. Our null hypothesis is that the error term is not autocorrelated, and the alternative hypothesis is that there is either positive or negative autocorrelation. With positive autocorrelation, a positive residual is likely to be followed by another positive residual, and a negative residual is likely to be followed by another negative residual. With negative autocorrelation, a positive residual is likely to be followed by a negative residual, and a negative residual is likely to be followed by a positive residual. First we will look at the linear model from the airpass data.
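
For reference, the Durbin-Watson statistic is \(d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^{2}}{\sum_{t=1}^{n}e_t^{2}}\), where the \(e_t\) are the residuals. Values near 2 indicate little autocorrelation, values well below 2 indicate positive autocorrelation, and values well above 2 indicate negative autocorrelation.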

library(lmtest)
## Warning: package 'lmtest' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
dwtest(mymod, alternative = "two.sided")
## 
##  Durbin-Watson test
## 
## data:  mymod
## DW = 0.53719, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is not 0

With a small p-value, we can reject the null hypothesis in favor of the alternative; since the DW statistic (0.54) is well below 2, we conclude that the error terms are positively autocorrelated. Now let's look at the quadratic.

dwtest(quadmod)
## 
##  Durbin-Watson test
## 
## data:  quadmod
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0

Comparing the quadratic model's p-value to that of the linear model, we see that they are essentially the same (both reported as < 2.2e-16), meaning that the error terms in this model are positively autocorrelated as well. Notice that in the first example we used alternative = "two.sided" to test for both negative and positive autocorrelation; the default of dwtest is to test only for positive autocorrelation.
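
For reference, the alternative argument of dwtest() controls which direction is tested; a quick sketch of the three options on the quadratic model (output omitted here):

dwtest(quadmod, alternative = "greater")    # positive autocorrelation (the default)
dwtest(quadmod, alternative = "less")       # negative autocorrelation
dwtest(quadmod, alternative = "two.sided")  # autocorrelation in either direction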