Today in class we talked about Modeling Trends and Autocorrelation.
When modeling trends, we are working with time series data. The model is $Y_t = TR_t + \varepsilon_t$, where $Y_t$ is the value of the time series at time $t$, $TR_t$ is the trend, and $\varepsilon_t$ is the error term. The error terms are independent and identically distributed with mean 0 and variance $\sigma^2$.
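As a minimal sketch of this model, the R code below simulates a series with an assumed linear trend and i.i.d. normal errors. The parameter values (`beta0`, `beta1`, `sigma`) and the normality of the errors are illustrative assumptions, not estimates from any data.

```r
# Simulate Y_t = TR_t + e_t (illustrative values only).
# Assumptions: linear trend TR_t = beta0 + beta1 * t and i.i.d. N(0, sigma^2) errors.
set.seed(1)
n          <- 100
time_index <- 1:n     # t = 1, 2, ..., n
beta0      <- 10      # hypothetical intercept
beta1      <- 0.05    # hypothetical change in the trend per time period
sigma      <- 1.5     # hypothetical error standard deviation

trend  <- beta0 + beta1 * time_index          # TR_t
errors <- rnorm(n, mean = 0, sd = sigma)      # e_t: independent, mean 0, variance sigma^2
y      <- trend + errors                      # Y_t

plot(time_index, y, type = "l", xlab = "t", ylab = "Y_t")
lines(time_index, trend, lty = 2)             # dashed line: the true trend TR_t
```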
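There are four options for the trend:
No trend: $TR_t = \beta_0$
Linear trend: $TR_t = \beta_0 + \beta_1 t$
Quadratic trend: $TR_t = \beta_0 + \beta_1 t + \beta_2 t^2$
p-th order polynomial trend: $TR_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p$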
We can decide which trend to use by plotting the data. If the points scatter around a horizontal line, there is no trend. If they scatter around an increasing or decreasing straight line, the trend is linear. If they follow a curve that bends in one direction, try a quadratic trend. Lastly, if the curvature reverses (the curve bends one way and then the other), try a p-th order polynomial.
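A hedged sketch of how each trend option translates into an `lm()` formula, using simulated data (the true trend here is linear by construction, and all values are made up). `poly(..., raw = TRUE)` is one way to write a p-th order polynomial; p = 3 is an arbitrary choice for illustration.

```r
# Fit each candidate trend form to simulated data (hypothetical values).
set.seed(2)
time_index <- 1:80
y <- 5 + 0.1 * time_index + rnorm(80, sd = 1)    # linear trend plus noise

fit_none   <- lm(y ~ 1)                                 # TR_t = b0
fit_linear <- lm(y ~ time_index)                        # TR_t = b0 + b1 t
fit_quad   <- lm(y ~ time_index + I(time_index^2))      # TR_t = b0 + b1 t + b2 t^2
fit_cubic  <- lm(y ~ poly(time_index, 3, raw = TRUE))   # p-th order with p = 3

# Overlay the fitted trends on the data to compare their shapes visually
plot(time_index, y, xlab = "t", ylab = "Y_t")
lines(time_index, fitted(fit_none),   lty = 3)   # horizontal line
lines(time_index, fitted(fit_linear), lty = 1)   # straight line
lines(time_index, fitted(fit_quad),   lty = 2)   # one bend
lines(time_index, fitted(fit_cubic),  lty = 4)   # can reverse curvature
```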
Autocorrelation occurs when the error terms (residuals) are correlated through time.
We will work with first-order autocorrelation today. "First order" means we look only one time period away: each error term is related to the error term in the period immediately before it. For example, today's temperature is correlated with yesterday's temperature, and tomorrow's temperature will be correlated with today's.
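One common way to write first-order autocorrelation is $\varepsilon_t = \phi\,\varepsilon_{t-1} + a_t$, where the $a_t$ are independent with mean 0. The sketch below simulates such errors and checks that the lag-1 sample correlation comes out near $\phi$. The value phi = 0.6 and the normality of $a_t$ are arbitrary illustrative assumptions.

```r
# Simulate first-order autocorrelated errors: e_t = phi * e_{t-1} + a_t,
# with a_t i.i.d. N(0, 1). phi = 0.6 is an arbitrary illustrative value.
set.seed(3)
n    <- 500
phi  <- 0.6
a    <- rnorm(n)
e    <- numeric(n)
e[1] <- a[1]
for (i in 2:n) {
  e[i] <- phi * e[i - 1] + a[i]   # each error depends on the one just before it
}

# Lag-1 sample correlation: pair each e_t with e_{t-1}
r1 <- cor(e[-1], e[-n])
r1   # should be close to phi
```

Base R's `arima.sim(model = list(ar = 0.6), n = 500)` generates the same kind of series without the explicit loop, and `acf(e)` plots the sample autocorrelation at each lag.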
We will use the aatemp data set from the faraway package (annual mean temperatures in Ann Arbor, Michigan) to practice these methods.
library(faraway)
data("aatemp")
names(aatemp)
## [1] "year" "temp"
We can plot the data to see which type of trend will work best.
plot(temp ~ year, data = aatemp, type = "l")
The data appear to increase slightly over time, so we will fit a linear trend model first.
linmod <- lm(temp ~ year, data = aatemp)
summary(linmod)
##
## Call:
## lm(formula = temp ~ year, data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9843 -0.9113 -0.0820 0.9946 3.5343
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.005510 7.310781 3.284 0.00136 **
## year 0.012237 0.003768 3.247 0.00153 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared: 0.08536, Adjusted R-squared: 0.07727
## F-statistic: 10.55 on 1 and 113 DF, p-value: 0.001533
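In the summary output, the year coefficient is significant (p = 0.00153), so there is evidence of an increasing linear trend. However, R-squared is only 0.085, so the trend explains only a small share of the year-to-year variation. We will check the diagnostic plots before relying on the model.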
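To read these numbers programmatically rather than off the printout, here is a minimal sketch using `coef(summary(...))`. The data are simulated stand-ins (hypothetical values chosen only to resemble a slowly increasing temperature series), not aatemp.

```r
# Extract the trend slope, its p-value, and R-squared from a fitted lm.
# Simulated stand-in data; all values are hypothetical.
set.seed(4)
year <- 1900:1999
temp <- 45 + 0.01 * (year - 1900) + rnorm(length(year), sd = 1.5)
fit  <- lm(temp ~ year)

coef_table <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
slope      <- coef_table["year", "Estimate"]
p_slope    <- coef_table["year", "Pr(>|t|)"]
r_squared  <- summary(fit)$r.squared   # share of variation explained by the trend

c(slope = slope, p_value = p_slope, r_squared = r_squared)
```

Applied to `linmod`, the same indexing would return the year estimate 0.012237 and p-value 0.00153 shown above; that slope means the fitted trend rises about 0.012 degrees per year, or roughly 1.2 degrees per century.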
par(mfrow = c(2, 2))  # arrange the four diagnostic plots in a 2 x 2 grid
plot(linmod)
In the first plot (Residuals vs Fitted), the smoothed line through the residuals is slightly curved. The curvature is mild, but we can fit a quadratic model to check whether it fits better.
quadmod <- lm(temp ~ year + I(year^2), data = aatemp)
summary(quadmod)
##
## Call:
## lm(formula = temp ~ year + I(year^2), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0412 -0.9538 -0.0624 0.9959 3.5820
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127e+02 3.837e+02 -0.554 0.580
## year 2.567e-01 3.962e-01 0.648 0.518
## I(year^2) -6.307e-05 1.022e-04 -0.617 0.539
##
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared: 0.08846, Adjusted R-squared: 0.07218
## F-statistic: 5.434 on 2 and 112 DF, p-value: 0.005591
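The p-value for the quadratic term I(year^2) is 0.539, so adding a quadratic term does not significantly improve the fit, and we should stick with the linear model. (The large p-value for year in this model is largely a side effect of year and year^2 being very highly correlated; the test for the highest-order term is the one that matters for this decision.)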
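An equivalent way to ask whether the quadratic term is worth adding is an F-test comparing the two nested models with `anova()`. The sketch below uses simulated data with a purely linear trend (hypothetical values) and centers year before squaring to avoid numerical trouble with very large values of year^2; centering changes the coefficients but not the fitted values or the F-test. For a single added term, the F statistic equals the square of that term's t statistic, so this reaches the same conclusion as the I(year^2) p-value.

```r
# Compare nested linear vs quadratic trend models with an F-test.
# Simulated data with a purely linear trend (hypothetical values).
set.seed(5)
year <- 1880:1994
temp <- 24 + 0.012 * year + rnorm(length(year), sd = 1.5)

yc   <- year - mean(year)               # centered year: same fits, better numerics
lin  <- lm(temp ~ yc)
quad <- lm(temp ~ yc + I(yc^2))

cmp <- anova(lin, quad)                 # H0: the coefficient on I(yc^2) is 0
print(cmp)

# With one extra term, F equals the squared t statistic of that term
t_quad <- coef(summary(quad))["I(yc^2)", "t value"]
c(F = cmp$F[2], t_squared = t_quad^2)
```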
Now that we have chosen a model, we can check its residuals for autocorrelation with the Durbin-Watson test.
NOTE: The Durbin-Watson test can be run on any of the fitted models; to save time, we will run it only on the model we chose.
library(lmtest)
dwtest(linmod)
##
## Durbin-Watson test
##
## data: linmod
## DW = 1.6177, p-value = 0.01524
## alternative hypothesis: true autocorrelation is greater than 0
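Unfortunately, our p-value (0.01524) is small, so we reject the null hypothesis (no autocorrelation) in favor of the alternative hypothesis (positive first-order autocorrelation). This means the errors are not independent, which violates one of the model assumptions, so the standard errors and p-values from summary(linmod) may not be reliable.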
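To see what `dwtest()` measures, the Durbin-Watson statistic can be computed directly from the residuals $e_t$: $DW = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. Values near 2 suggest no first-order autocorrelation, and values well below 2 suggest positive autocorrelation; approximately, $DW \approx 2(1 - r_1)$, where $r_1$ is the lag-1 residual autocorrelation. The sketch below uses simulated data with AR(1) errors (phi = 0.3 is a hypothetical value) in place of aatemp, and checks the exact identity behind that approximation.

```r
# Durbin-Watson statistic computed by hand (simulated stand-in data).
# Assumption: AR(1) errors with phi = 0.3, chosen only for illustration.
set.seed(6)
n          <- 115
time_index <- seq_len(n)
errors     <- as.numeric(arima.sim(model = list(ar = 0.3), n = n))
y          <- 47 + 0.012 * time_index + errors
fit        <- lm(y ~ time_index)

res <- as.numeric(residuals(fit))             # drop names so comparisons are clean
dw  <- sum(diff(res)^2) / sum(res^2)          # sum_{t=2}^n (e_t - e_{t-1})^2 / sum e_t^2
r1  <- sum(res[-1] * res[-n]) / sum(res^2)    # lag-1 residual autocorrelation

# Exact identity: DW = 2(1 - r1) - (e_1^2 + e_n^2) / sum e_t^2, hence DW ~ 2(1 - r1)
c(DW = dw, approx = 2 * (1 - r1))
```

By that approximation, the DW = 1.6177 reported for linmod corresponds to a lag-1 residual autocorrelation of roughly 1 - 1.6177/2 ≈ 0.19.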