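Today in class we introduced time series analysis. A time series is a sequence of observations recorded over time, and in the models below a unit of time is the predictor. We also talked about autocorrelation and some ways to compute it.
We start by plotting the response against time, using the aatemp data set from the faraway package with year as the predictor.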
library(faraway)
attach(aatemp)
plot(temp~year, type = "l")
The series appears to trend upward slightly, and its variability does not look constant over time. We fit a linear model in year and look at the diagnostic plots.
tempmod<- lm(temp~year)
summary(tempmod)
##
## Call:
## lm(formula = temp ~ year)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.9843 -0.9113 -0.0820  0.9946  3.5343
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.005510   7.310781   3.284  0.00136 **
## year         0.012237   0.003768   3.247  0.00153 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared: 0.08536, Adjusted R-squared: 0.07727
## F-statistic: 10.55 on 1 and 113 DF, p-value: 0.001533
plot(tempmod)
Let's try adding a quadratic term to see whether it improves the residuals.
tempmod2<- lm(temp~ poly(year,2))
summary(tempmod2)
##
## Call:
## lm(formula = temp ~ poly(year, 2))
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.0412 -0.9538 -0.0624  0.9959  3.5820
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)     47.7426     0.1371 348.218  < 2e-16 ***
## poly(year, 2)1   4.7616     1.4703   3.239  0.00158 **
## poly(year, 2)2  -0.9071     1.4703  -0.617  0.53851
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared: 0.08846, Adjusted R-squared: 0.07218
## F-statistic: 5.434 on 2 and 112 DF, p-value: 0.005591
plot(tempmod2)
The quadratic term is not significant (p = 0.539), so we try a cubic model.
tempmod3<- lm(temp~ poly(year,3))
summary(tempmod3)
##
## Call:
## lm(formula = temp ~ poly(year, 3))
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.8557 -0.9646 -0.1552  1.0485  4.1538
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)     47.7426     0.1346 354.796   <2e-16 ***
## poly(year, 3)1   4.7616     1.4430   3.300   0.0013 **
## poly(year, 3)2  -0.9071     1.4430  -0.629   0.5309
## poly(year, 3)3  -3.3132     1.4430  -2.296   0.0236 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.443 on 111 degrees of freedom
## Multiple R-squared: 0.1298, Adjusted R-squared: 0.1063
## F-statistic: 5.518 on 3 and 111 DF, p-value: 0.001436
plot(tempmod3)
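A side note on the output above: the intercept, linear, and quadratic estimates (47.7426, 4.7616, -0.9071) are the same in the quadratic and cubic fits. This is because poly() builds orthogonal polynomial terms by default, so adding a higher-order term does not change the lower-order estimates. The sketch below illustrates this on simulated data (the x and y values are made up for illustration and are not the aatemp series), and checks that orthogonal and raw polynomials give the same fitted curve.

```r
# Simulated data (NOT the aatemp series), used only for illustration
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x - 0.02 * x^3 + rnorm(50)

quad  <- lm(y ~ poly(x, 2))              # orthogonal quadratic
cubic <- lm(y ~ poly(x, 3))              # orthogonal cubic
raw   <- lm(y ~ poly(x, 3, raw = TRUE))  # ordinary powers x, x^2, x^3

# Lower-order estimates do not change when the cubic term is added
unname(coef(quad)[1:3])
unname(coef(cubic)[1:3])

# Both parameterizations describe the same fitted curve
all.equal(unname(fitted(cubic)), unname(fitted(raw)))
```

Because the terms are orthogonal, the t-test on each coefficient can be read as a test of whether that degree adds anything beyond the lower degrees.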
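The cubic term is significant (p = 0.024), so we are happier with this model, even though the quadratic term is still not significant on its own. Next we check for autocorrelation, that is, whether the residuals are independent over time or related to their neighbors. We will use the Durbin-Watson test.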
library(lmtest)
dwtest(tempmod3,alternative = "greater")
##
## Durbin-Watson test
##
## data: tempmod3
## DW = 1.7171, p-value = 0.03464
## alternative hypothesis: true autocorrelation is greater than 0
Since the p-value (0.035) is below 0.05, we reject the null hypothesis of no autocorrelation. There is evidence of positive autocorrelation in the residuals.
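To see what the test statistic measures, here is a minimal sketch of computing it by hand. The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals, and it is approximately 2(1 - r1), where r1 is the lag-1 autocorrelation. Values below 2 point toward positive autocorrelation. By that approximation, the DW of 1.7171 above corresponds to a lag-1 autocorrelation of roughly 0.14. The residuals here are simulated (an AR(1) series chosen for illustration), not the ones from tempmod3.

```r
# Simulated positively autocorrelated "residuals" (AR(1), coefficient 0.4).
# These are hypothetical values, not the residuals of tempmod3.
set.seed(42)
e <- as.numeric(arima.sim(model = list(ar = 0.4), n = 115))
e <- e - mean(e)   # center, as least-squares residuals would be
n <- length(e)

# Durbin-Watson statistic: squared successive differences over squared residuals
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 autocorrelation of the residuals
r1 <- sum(e[-1] * e[-n]) / sum(e^2)

# DW is close to 2 * (1 - r1); the gap comes only from the two end residuals
c(DW = dw, approx = 2 * (1 - r1))

# The same lag-1 value is available from acf()
acf(e, plot = FALSE)$acf[2]
```

For real model residuals, replacing e with residuals(tempmod3) should reproduce the DW value reported by dwtest().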
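This topic is similar to ordinary linear regression, but our predictor is time instead of another variable. We can use these models to forecast over time, but our predictions may not be completely accurate, particularly when the residuals are autocorrelated, since that violates the independence assumption behind the usual standard errors.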
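As a rough sketch of what forecasting with such a model could look like, predict() can be given future years along with a request for prediction intervals. The data below are simulated (the sim data frame and its values are hypothetical, not the aatemp data), and the intervals assume independent errors, so with autocorrelated residuals they are likely too optimistic.

```r
# Simulated yearly series (hypothetical values, NOT the aatemp data)
set.seed(7)
sim <- data.frame(year = 1900:1999)
sim$temp <- 45 + 0.01 * (sim$year - 1900) + rnorm(100, sd = 1.5)

# Linear trend in time, as in the first model above
fit <- lm(temp ~ year, data = sim)

# Forecast the next five years with 95% prediction intervals
future <- data.frame(year = 2000:2004)
fc <- predict(fit, newdata = future, interval = "prediction", level = 0.95)
cbind(future, fc)
```

The lwr and upr columns give the interval for a single new observation in each year, which is wider than a confidence interval for the mean.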