Today in class we got started on sections 6.1 and 6.2, which dealt with Time Series data and how we model it, as well as autocorrelation. Though we learned what a time series was early in the semester, this was our first time in this course actually learning how to do analysis on it.
One of the first things we learned in class today was how we can describe a time series. The general formula for it is:
yt=TRt+Et where:
yt:The value of our time series at time t
TRt:Trend
Et:Error term
Additionally, we learned about the different trend possibilities for a time series.
No Trend –> TRt = B0 (Flat)
Linear Trend –> TRt=B0+B1(t)
Quadratic Trend –> TRt=B0+B1(t)+B1(t2)
After going through section 6.1, we jumped to the next section and learned about autocorrelation. If our residuals are correlated through time, that means we have autocorrelation. There are two different types of autocorrelation.
Positive Autocor: Pos. residual is likely to be followed by another pos. residual (and vice versa)
Negative Autocor: Pos. residual is likely to be followed by a neg. residual (and vice versa)
Our error term is only related to the time period before it and the time period after it.
We can use the Durbin Watson stat to check our model for autocorrelation. First Install package(“lmtest”) and then use the command dwtest(mod). A small p-value indicates autocorrelation.
Testing for positive autocor:
H0: error terms do not have autocor.
Ha: errors terms do ahve autocor.
Therefore if the t-stat is small we will reject the null hypothesis and determine that they do ahve autocor. In other words, a small p-value indicates autocor.
Testing for negative autocor:
H0: error terms do not have autocor.
Ha: errors terms do ahve autocor.
Therefore if the 4 - d is small we will reject the null hypothesis and determine that they do have autocor. If 4 - d is large we do not reject, meaning there is autocor.
The command will always just give us the p-value so we don’t have to worry too much about how we test for it. Let’s look at an example to see what we learned.
To start, we will plot our data using our usual plot command. We will also specify type = “l” to connect the data points a produce a trend line.
data(aatemp, package = "faraway")
plot(temp~year, type = "l", data = aatemp)
After this we will create our usual linear model.
temp.mod1 <- lm(temp~year, data = aatemp)
summary(temp.mod1)
##
## Call:
## lm(formula = temp ~ year, data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9843 -0.9113 -0.0820 0.9946 3.5343
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.005510 7.310781 3.284 0.00136 **
## year 0.012237 0.003768 3.247 0.00153 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared: 0.08536, Adjusted R-squared: 0.07727
## F-statistic: 10.55 on 1 and 113 DF, p-value: 0.001533
Here the t-test indicates that the variable yaer is a significant predictor. Next, it is important to check the residual plots to check for trends.
plot(temp.mod1)
As can be seen in the first graph, there is a slight trend to our model that we will try to fix by adding another term.
temp.mod2 <- lm(temp ~ year + I(year^2), data = aatemp)
summary(temp.mod2)
##
## Call:
## lm(formula = temp ~ year + I(year^2), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0412 -0.9538 -0.0624 0.9959 3.5820
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127e+02 3.837e+02 -0.554 0.580
## year 2.567e-01 3.962e-01 0.648 0.518
## I(year^2) -6.307e-05 1.022e-04 -0.617 0.539
##
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared: 0.08846, Adjusted R-squared: 0.07218
## F-statistic: 5.434 on 2 and 112 DF, p-value: 0.005591
Here we added a quadratic term, however, neither term for this model is significant. Looking at our residual plots, we can see that the trend has not been corrected.
plot(temp.mod2)
Since this didn’t work, we can try adding one more cubic term to the model.
temp.mod3 <- lm(temp ~ year + I(year^2) + I(year^3), data = aatemp)
summary(temp.mod3)
##
## Call:
## lm(formula = temp ~ year + I(year^2) + I(year^3), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8557 -0.9646 -0.1552 1.0485 4.1538
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.959e+04 1.734e+04 2.283 0.0243 *
## year -6.159e+01 2.694e+01 -2.286 0.0241 *
## I(year^2) 3.197e-02 1.395e-02 2.291 0.0238 *
## I(year^3) -5.527e-06 2.407e-06 -2.296 0.0236 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.443 on 111 degrees of freedom
## Multiple R-squared: 0.1298, Adjusted R-squared: 0.1063
## F-statistic: 5.518 on 3 and 111 DF, p-value: 0.001436
Adding the third term in shows that now all three terms are significant based on the t-test. Let’s take a look at the residuals to see if they look any better.
plot(temp.mod3)
Looking at our first plot here we can see that the trend line is almost perfectly flat and the errors are randomly distributed. Based on this I think this model looks great and we have produced a model that accurately models our time series data.
One last thing we need to check is autocorrelation. We do this using the dwtest command, and if our p-value is less than .05 that will tell us that we have autocorrelation.
library("lmtest")
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
dwtest(temp.mod3)
##
## Durbin-Watson test
##
## data: temp.mod3
## DW = 1.7171, p-value = 0.03464
## alternative hypothesis: true autocorrelation is greater than 0
Our dwtest gave us a p-value of .03464, indicating that this model does in fact have autocorrelation.