For this class period we discussed 6.1 and 6.2 on modeling trend using polynomials and detecting auto correlation using residual plots.

With in time series data, no trend is modeled by \(\hat{\beta_0}\), linear trend is modeled by \(\hat{\beta_0} + \hat{\beta_1} x_i\) and quadradic trend is modeled by \(\hat{\beta_0} + \hat{\beta_1} x_i + \hat{\beta_2} x_i^2\) and so on until the pth term (which is used when the data shows a reversal in curvature) would be modeled by \(\hat{\beta_0} + \hat{\beta_1} x_i + \hat{\beta_2} x_i^2 + ... + \hat{\beta_p} x_i^p\) We now do an example fitting the correct model for the data.

install.packages(faraway)

library(faraway)
## Warning: package 'faraway' was built under R version 3.4.4
data(divusa)
attach(divusa)
head(divusa)
##   year divorce unemployed femlab marriage birth military
## 1 1920     8.0        5.2  22.70     92.0 117.9   3.2247
## 2 1921     7.2       11.7  22.79     83.0 119.8   3.5614
## 3 1922     6.6        6.7  22.88     79.7 111.2   2.4553
## 4 1923     7.1        2.4  22.97     85.2 110.5   2.2065
## 5 1924     7.2        5.0  23.06     80.3 110.9   2.2889
## 6 1925     7.2        3.2  23.15     79.2 106.6   2.1735
names(divusa)
## [1] "year"       "divorce"    "unemployed" "femlab"     "marriage"  
## [6] "birth"      "military"
plot(birth ~ year, type = "l")

mymod <- lm(birth ~ year)
plot(mymod)

summary(mymod)
## 
## Call:
## lm(formula = birth ~ year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -25.159 -11.129  -4.086  11.232  33.509 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1073.45274  161.22254   6.658 4.03e-09 ***
## year          -0.50284    0.08234  -6.107 4.14e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.06 on 75 degrees of freedom
## Multiple R-squared:  0.3321, Adjusted R-squared:  0.3232 
## F-statistic:  37.3 on 1 and 75 DF,  p-value: 4.143e-08

Obviously, a linera model does not fit, so now we try a quadratic model.

year2 <- (year^2)
mymod2 <- lm(birth ~ year+ year2, data = divusa)
summary(mymod2)
## 
## Call:
## lm(formula = birth ~ year + year2, data = divusa)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.2952 -11.8939  -0.7406  12.5154  26.3901 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.428e+04  1.464e+04  -3.709 0.000400 ***
## year         5.604e+01  1.495e+01   3.748 0.000351 ***
## year2       -1.444e-02  3.818e-03  -3.782 0.000313 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.8 on 74 degrees of freedom
## Multiple R-squared:  0.4403, Adjusted R-squared:  0.4252 
## F-statistic: 29.11 on 2 and 74 DF,  p-value: 4.718e-10
plot(mymod2)

While this is a little better, moving on to a cubic model might continue to increase the fit of the model.

year3 <- (year^3)
mymod3 <- lm(birth ~ year+ year2+ year3, data = divusa)
summary(mymod3)
## 
## Call:
## lm(formula = birth ~ year + year2 + year3, data = divusa)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.382 -12.208  -1.928  12.042  26.581 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.557e+06  1.467e+06   1.062    0.292
## year        -2.414e+03  2.248e+03  -1.074    0.286
## year2        1.247e+00  1.148e+00   1.086    0.281
## year3       -2.148e-04  1.954e-04  -1.099    0.275
## 
## Residual standard error: 14.78 on 73 degrees of freedom
## Multiple R-squared:  0.4494, Adjusted R-squared:  0.4268 
## F-statistic: 19.86 on 3 and 73 DF,  p-value: 1.625e-09
plot(mymod3)

Since moving to a cubic model makes all the Betas insignificant, I would probably stick to the quadratic model in this case.

The next topic of discussion was on autocorrelation, which means that the errors are correlated through time. Positive autocorrelation is when a positive error term is more likely to be followed by another positive residual. (Also a negative to a negative) Negative autocorrelation is when a positive residual is more likely to be followed by a negative. (Also negative to a positive). We look at first order correlation in this situation using a Durbin Watson test.

library(lmtest)
## Warning: package 'lmtest' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.4
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
dwtest(mymod2)
## 
##  Durbin-Watson test
## 
## data:  mymod2
## DW = 0.07405, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0

If the value of d is small, we have found positive autocorrelation and if 4-d is small, we have found negative autocorrelation. At an alpha level of .05, we have failed to find any autocorrelation in this case.