Today, in class, we discussed the very early concepts of time series analysis and autocorrelation. Generally, a time series is any variable graphed against time, and autocorrelation is the value at one point in time being correlated with the value at another point, either earlier or later.
We use a time series analysis when we want to know how some variable changes with time, and we need to check for autocorrelation in any time series analysis we create.
To illustrate the commands used for time series analyses and testing for autocorrelation, I will go through an example using the data set aatemp from the R package faraway. This data set gives the annual mean temperature in Ann Arbor (in F) for the years 1854 to 2000.
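Before running any of the commands below, we need to load the two packages used in this analysis:
# faraway provides the aatemp data set; lmtest provides dwtest()
library(faraway)
library(lmtest)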
We can start by plotting the temperature against time to develop reasonable expectations for the relationship between the two variables. We want to connect the data points with lines to follow the trend over time, so we'll add the argument type = "l" to our plot() command.
plot(temp~year, data = aatemp, type = "l")
We see that the data has a slight upward trend, but it is hard to make out much more than that from the plot alone.
We can then start creating models. We'll begin with a very simple linear model.
mod1 <- lm(temp~year, data = aatemp)
We next need to look at the summary of the model to determine whether the relationship between the two variables is significant, and then we can look at the diagnostic plots to check whether there is a trend in our residuals.
summary(mod1)
##
## Call:
## lm(formula = temp ~ year, data = aatemp)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.9843 -0.9113 -0.0820  0.9946  3.5343
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.005510   7.310781   3.284  0.00136 **
## year         0.012237   0.003768   3.247  0.00153 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared: 0.08536, Adjusted R-squared: 0.07727
## F-statistic: 10.55 on 1 and 113 DF, p-value: 0.001533
plot(mod1)
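Calling plot() on an lm object cycles through several diagnostic plots; if we only want the residuals-versus-fitted plot, we can request it by itself with the which argument:
# show only the first diagnostic plot (residuals vs. fitted values)
plot(mod1, which = 1)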
We can see that the relationship between the two variables is significant, but there is a trend in the residuals on our residual plot. We can try to improve this model by adding a quadratic term, and then check the summary and diagnostic plots for that model.
mod2 <- lm(temp~year + I(year^2), data = aatemp)
summary(mod2)
##
## Call:
## lm(formula = temp ~ year + I(year^2), data = aatemp)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.0412 -0.9538 -0.0624  0.9959  3.5820
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127e+02  3.837e+02  -0.554    0.580
## year         2.567e-01  3.962e-01   0.648    0.518
## I(year^2)   -6.307e-05  1.022e-04  -0.617    0.539
##
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared: 0.08846, Adjusted R-squared: 0.07218
## F-statistic: 5.434 on 2 and 112 DF, p-value: 0.005591
plot(mod2)
We see from our summary that none of our predictors is individually significant any longer, which is likely a symptom of the strong collinearity between year and year^2 when raw polynomial terms are used, and our residual plot is still not adequate. We will take this analysis one step further by adding a cubic term to our model.
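Before we do, we can confirm that collinearity directly (a quick check of my own, not from class):
# over 1854-2000, year and year^2 are almost perfectly correlated,
# which inflates the standard errors of the raw polynomial terms
with(aatemp, cor(year, year^2))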
mod3 <- lm(temp~year + I(year^2) + I(year^3), data = aatemp)
summary(mod3)
##
## Call:
## lm(formula = temp ~ year + I(year^2) + I(year^3), data = aatemp)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.8557 -0.9646 -0.1552  1.0485  4.1538
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.959e+04  1.734e+04   2.283   0.0243 *
## year        -6.159e+01  2.694e+01  -2.286   0.0241 *
## I(year^2)    3.197e-02  1.395e-02   2.291   0.0238 *
## I(year^3)   -5.527e-06  2.407e-06  -2.296   0.0236 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.443 on 111 degrees of freedom
## Multiple R-squared: 0.1298, Adjusted R-squared: 0.1063
## F-statistic: 5.518 on 3 and 111 DF, p-value: 0.001436
plot(mod3)
Now, all three of our predictors are significant and our residual plot looks relatively good. It seems that we have found the trend that our data follows.
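As an aside, base R's poly() fits orthogonal polynomials that avoid the collinearity among the raw powers of year. Here is a quick sketch (the name mod3_poly is mine, not from class):
# same cubic trend fit with orthogonal polynomial terms;
# the fitted values and R-squared are identical to mod3
mod3_poly <- lm(temp ~ poly(year, 3), data = aatemp)
summary(mod3_poly)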
Now, we want to check whether our model has autocorrelation. We will perform a Durbin-Watson test, a formal test for first-order autocorrelation in the residuals (autocorrelation can sometimes be detected visually, but the test gives us a p-value to work with). To do this, we'll use dwtest(model), found in the R package lmtest. If we add the argument alternative = "greater", the test checks for positive autocorrelation; if alternative = "less", it checks for negative autocorrelation.
dwtest(mod3, alternative = "greater")
##
## Durbin-Watson test
##
## data: mod3
## DW = 1.7171, p-value = 0.03464
## alternative hypothesis: true autocorrelation is greater than 0
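For reference, the Durbin-Watson statistic is computed from the residuals e_1, ..., e_n as DW = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2. Values near 2 indicate no first-order autocorrelation, while values below 2 point toward positive autocorrelation, so our DW = 1.7171 agrees with the small p-value.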
Since our p-value is less than 0.05, we conclude that our model has positive autocorrelation, meaning a positive residual is more likely to be followed by another positive residual and a negative residual by another negative one.
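Since autocorrelation can sometimes be spotted visually, a complementary check (a quick sketch of my own, not part of the class example) is to plot the autocorrelation function of the residuals with base R's acf():
# spikes outside the dashed confidence bands at low lags
# suggest autocorrelation in the residuals
acf(residuals(mod3))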
This topic begins our exploration into time series analysis, a large part of our course. Time series analysis is similar to regression in that it uses many of the same commands in R, but it differs in the type of data we are looking at (time series rather than cross-sectional) and in the questions we are trying to answer. This is just the beginning, though, and as we develop a deeper understanding, the relationship between the two types of modeling may become more evident.
Regardless of its relationship to other modeling techniques, there is no doubt that time series analysis is important in modeling the world's data.