We learned that time series can be described as: \(y_t = TR_t + e_t\) where
\(y_t\) is the value of our time series at time t
\(TR_t\) is the trend
\(e_t\) is our error term, and it is iid with N(0,\(\sigma^{2}\))
If there is no trend, \(TR_t = {\beta_0}\)
If there is a linear trend, \(TR_t = {\beta_0}+{\beta_1}t\)
All the way up to the pth order polynomial, \(TR_t = {\beta_0}+{\beta_1}t + {\beta_2}t^{2} + ... + {\beta_p}t^{p}\)
We will choose which trend will be the best to model the data in the airpass data set.
library(faraway)
## Warning: package 'faraway' was built under R version 3.4.4
data("airpass")
names(airpass)
## [1] "pass" "year"
attach(airpass)
plot(pass ~ year, type = "l")
We can rule out no trend, so that leaves linear and quadratic trends to choose from. We will use a linear model to help decide.
mymod <- lm(pass ~ year)
summary(mymod)
##
## Call:
## lm(formula = pass ~ year)
##
## Residuals:
## Min 1Q Median 3Q Max
## -93.858 -30.727 -5.757 24.489 164.999
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1474.771 61.106 -24.14 <2e-16 ***
## year 31.886 1.108 28.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared: 0.8536, Adjusted R-squared: 0.8526
## F-statistic: 828.2 on 1 and 142 DF, p-value: < 2.2e-16
With the small p-value that year has, that means there is indeed a linear relationship between the year and number of air passengers. We can also look at the quadratic model to see what p-value the squared term produces.
quadmod <- lm(pass ~ year + I(year^2))
summary(quadmod)
##
## Call:
## lm(formula = pass ~ year + I(year^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -100.353 -27.339 -7.442 21.603 146.116
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1570.5174 1053.9017 1.490 0.13841
## year -79.2078 38.4007 -2.063 0.04098 *
## I(year^2) 1.0092 0.3487 2.894 0.00441 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.91 on 141 degrees of freedom
## Multiple R-squared: 0.8618, Adjusted R-squared: 0.8599
## F-statistic: 439.8 on 2 and 141 DF, p-value: < 2.2e-16
Here we see that the square term also has a significant p-value, therefore it will be important to incorporate that term into our model as well. We can also look at the plots of each model to visualize the differences between them.
plot(mymod)
plot(quadmod)
By comparing the residual plot of each model, we can see that the quadratic trend line is closer to zero throughout the fitted values. But it should be noted that both residual plots show heavy heteroscedasticity.
Now we will look at the aatemp data set within the faraway package.
library(faraway)
data("aatemp")
names(aatemp)
## [1] "year" "temp"
Let’s look at the plot of the two variables.
plot(temp ~ year, data = aatemp, type = "l")
There is certainly a trend present, but it’s unclear at this point which is best. We will look at linear quadratic, and cubic trend to see which produces favorable p-values.
linmod <- lm(temp ~ year, data = aatemp)
summary(linmod)
##
## Call:
## lm(formula = temp ~ year, data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9843 -0.9113 -0.0820 0.9946 3.5343
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.005510 7.310781 3.284 0.00136 **
## year 0.012237 0.003768 3.247 0.00153 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared: 0.08536, Adjusted R-squared: 0.07727
## F-statistic: 10.55 on 1 and 113 DF, p-value: 0.001533
A p-value of 0.00153 is small, but let’s see if the other models can improve upon this.
qmod <- lm(temp ~ year + I(year^2), data = aatemp)
summary(qmod)
##
## Call:
## lm(formula = temp ~ year + I(year^2), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0412 -0.9538 -0.0624 0.9959 3.5820
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127e+02 3.837e+02 -0.554 0.580
## year 2.567e-01 3.962e-01 0.648 0.518
## I(year^2) -6.307e-05 1.022e-04 -0.617 0.539
##
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared: 0.08846, Adjusted R-squared: 0.07218
## F-statistic: 5.434 on 2 and 112 DF, p-value: 0.005591
The quadratic was not an improvement, we would not use this instead of the linear trend. We will now look at the cubic trend to see whether or not that helps.
tmod <- lm (temp ~ year + I(year^2) + I(year^3), data = aatemp)
summary(tmod)
##
## Call:
## lm(formula = temp ~ year + I(year^2) + I(year^3), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8557 -0.9646 -0.1552 1.0485 4.1538
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.959e+04 1.734e+04 2.283 0.0243 *
## year -6.159e+01 2.694e+01 -2.286 0.0241 *
## I(year^2) 3.197e-02 1.395e-02 2.291 0.0238 *
## I(year^3) -5.527e-06 2.407e-06 -2.296 0.0236 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.443 on 111 degrees of freedom
## Multiple R-squared: 0.1298, Adjusted R-squared: 0.1063
## F-statistic: 5.518 on 3 and 111 DF, p-value: 0.001436
The cubic appears to provide favorable results compared to the quadratic. We will now look at the residual plots of the linear and cubic trends to see which is better.
plot(linmod)
plot(tmod)
Comparing the two residual plots, it becomes apparent that the cubic trendline is flatter than that of the linear trend. We will conclude that is this case, the cubic trend appears to be tho most accurate.
We can now go back and look at the auto correlation of our previously made models. Our null hypothesis is that the error term is not auto correlated, and the alternative hypothesis is that there is either positive or negative auto correlation. If there is positive auto correlation, positive residuals are likely to be followed by another positive residual, and negative residuals are likely to be followed by negative residuals. Then with negative auto correlation, a positive residual is likely to be followed by a negative residual, and a negative residual is likely to be followed be a positive residual. First we will look at the linear model from the airpass data.
library(lmtest)
## Warning: package 'lmtest' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
dwtest(mymod, alternative = "two.sided")
##
## Durbin-Watson test
##
## data: mymod
## DW = 0.53719, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is not 0
With a small p-value, we can reject the null hypothesis in favor the the alternative, concluding that the error terms are positively auto correlated. Now let’s look at the quadratic.
dwtest(quadmod)
##
## Durbin-Watson test
##
## data: quadmod
## DW = 0.56939, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0
Comparing the quadratic model’s p-value to that of the linear we see that they are the same, meaning that the error terms in this model are positively auto correlated as well. Notice that in the first example we used the two sided command to test for both negative and positive auto correlation. The default of dwtest is to test only for positive auto correlation.