6.1 Modeling the Trend using Polynomials

Today in class we got started on sections 6.1 and 6.2, which dealt with Time Series data and how we model it, as well as autocorrelation. Though we learned what a time series was early in the semester, this was our first time in this course actually learning how to do analysis on it.

One of the first things we learned in class today was how we can describe a time series. The general formula for it is:

yt=TRt+Et where:

yt:The value of our time series at time t

TRt:Trend

Et:Error term

Additionally, we learned about the different trend possibilities for a time series.

6.2 Detecting Auto Cor. Using Resid. Plots or DurbinWatson

After going through section 6.1, we jumped to the next section and learned about autocorrelation. If our residuals are correlated through time, that means we have autocorrelation. There are two different types of autocorrelation.

First Order Correlation

Our error term is only related to the time period before it and the time period after it.

Durbin Watson Stat

We can use the Durbin Watson stat to check our model for autocorrelation. First Install package(“lmtest”) and then use the command dwtest(mod). A small p-value indicates autocorrelation.

Testing for positive autocor:

  • H0: error terms do not have autocor.

  • Ha: errors terms do ahve autocor.

Therefore if the t-stat is small we will reject the null hypothesis and determine that they do ahve autocor. In other words, a small p-value indicates autocor.

Testing for negative autocor:

  • H0: error terms do not have autocor.

  • Ha: errors terms do ahve autocor.

Therefore if the 4 - d is small we will reject the null hypothesis and determine that they do have autocor. If 4 - d is large we do not reject, meaning there is autocor.

The command will always just give us the p-value so we don’t have to worry too much about how we test for it. Let’s look at an example to see what we learned.

Example

To start, we will plot our data using our usual plot command. We will also specify type = “l” to connect the data points a produce a trend line.

data(aatemp, package = "faraway")
plot(temp~year, type = "l", data = aatemp)

After this we will create our usual linear model.

temp.mod1 <- lm(temp~year, data = aatemp)
summary(temp.mod1)
## 
## Call:
## lm(formula = temp ~ year, data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9843 -0.9113 -0.0820  0.9946  3.5343 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 24.005510   7.310781   3.284  0.00136 **
## year         0.012237   0.003768   3.247  0.00153 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared:  0.08536,    Adjusted R-squared:  0.07727 
## F-statistic: 10.55 on 1 and 113 DF,  p-value: 0.001533

Here the t-test indicates that the variable yaer is a significant predictor. Next, it is important to check the residual plots to check for trends.

plot(temp.mod1)

As can be seen in the first graph, there is a slight trend to our model that we will try to fix by adding another term.

temp.mod2 <- lm(temp ~ year + I(year^2), data = aatemp)
summary(temp.mod2)
## 
## Call:
## lm(formula = temp ~ year + I(year^2), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0412 -0.9538 -0.0624  0.9959  3.5820 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127e+02  3.837e+02  -0.554    0.580
## year         2.567e-01  3.962e-01   0.648    0.518
## I(year^2)   -6.307e-05  1.022e-04  -0.617    0.539
## 
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared:  0.08846,    Adjusted R-squared:  0.07218 
## F-statistic: 5.434 on 2 and 112 DF,  p-value: 0.005591

Here we added a quadratic term, however, neither term for this model is significant. Looking at our residual plots, we can see that the trend has not been corrected.

plot(temp.mod2)

Since this didn’t work, we can try adding one more cubic term to the model.

temp.mod3 <- lm(temp ~ year + I(year^2) + I(year^3), data = aatemp)
summary(temp.mod3)
## 
## Call:
## lm(formula = temp ~ year + I(year^2) + I(year^3), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8557 -0.9646 -0.1552  1.0485  4.1538 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  3.959e+04  1.734e+04   2.283   0.0243 *
## year        -6.159e+01  2.694e+01  -2.286   0.0241 *
## I(year^2)    3.197e-02  1.395e-02   2.291   0.0238 *
## I(year^3)   -5.527e-06  2.407e-06  -2.296   0.0236 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.443 on 111 degrees of freedom
## Multiple R-squared:  0.1298, Adjusted R-squared:  0.1063 
## F-statistic: 5.518 on 3 and 111 DF,  p-value: 0.001436

Adding the third term in shows that now all three terms are significant based on the t-test. Let’s take a look at the residuals to see if they look any better.

plot(temp.mod3)

Looking at our first plot here we can see that the trend line is almost perfectly flat and the errors are randomly distributed. Based on this I think this model looks great and we have produced a model that accurately models our time series data.

One last thing we need to check is autocorrelation. We do this using the dwtest command, and if our p-value is less than .05 that will tell us that we have autocorrelation.

library("lmtest")
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
dwtest(temp.mod3)
## 
##  Durbin-Watson test
## 
## data:  temp.mod3
## DW = 1.7171, p-value = 0.03464
## alternative hypothesis: true autocorrelation is greater than 0

Our dwtest gave us a p-value of .03464, indicating that this model does in fact have autocorrelation.