Using the divusa data, fit a regression model with divorce as the response and unemployed, femlab, marriage, birth, and military as predictors.

require(faraway)
## Loading required package: faraway
data(divusa, package="faraway")
head(divusa)
##   year divorce unemployed femlab marriage birth military
## 1 1920     8.0        5.2  22.70     92.0 117.9   3.2247
## 2 1921     7.2       11.7  22.79     83.0 119.8   3.5614
## 3 1922     6.6        6.7  22.88     79.7 111.2   2.4553
## 4 1923     7.1        2.4  22.97     85.2 110.5   2.2065
## 5 1924     7.2        5.0  23.06     80.3 110.9   2.2889
## 6 1925     7.2        3.2  23.15     79.2 106.6   2.1735
lmod <- lm(divorce ~ unemployed + femlab + marriage + birth + military, data=divusa)
summary(lmod)
## 
## Call:
## lm(formula = divorce ~ unemployed + femlab + marriage + birth + 
##     military, data = divusa)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8611 -0.8916 -0.0496  0.8650  3.8300 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.48784    3.39378   0.733   0.4659    
## unemployed  -0.11125    0.05592  -1.989   0.0505 .  
## femlab       0.38365    0.03059  12.543  < 2e-16 ***
## marriage     0.11867    0.02441   4.861 6.77e-06 ***
## birth       -0.12996    0.01560  -8.333 4.03e-12 ***
## military    -0.02673    0.01425  -1.876   0.0647 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.65 on 71 degrees of freedom
## Multiple R-squared:  0.9208, Adjusted R-squared:  0.9152 
## F-statistic: 165.1 on 5 and 71 DF,  p-value: < 2.2e-16

(a) Make two graphical checks for correlated errors. What do you conclude?

# plot residuals against time to look for runs of same-signed residuals
plot(residuals(lmod) ~ year, divusa, ylab="Residuals")
abline(h=0)

# plot successive pairs of residuals against each other
n <- length(residuals(lmod))
plot(tail(residuals(lmod), n-1) ~ head(residuals(lmod), n-1),
     xlab=expression(hat(epsilon)[i]), ylab=expression(hat(epsilon)[i+1]))
abline(h=0, v=0, col=grey(0.75))

# correlation between successive residuals
cor(residuals(lmod)[-1], residuals(lmod)[-length(residuals(lmod))])
## [1] 0.8469792

There are long sequences of points above and below the zero line in the plot of residuals against year, and the plot of successive residuals shows a clear increasing linear trend. Both graphs suggest positive serial correlation, and the correlation between successive residuals is 0.847.
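
As a further check (not required by the question), the autocorrelation function of the residuals can be examined; large spikes at low lags would reinforce the same conclusion.

acf(residuals(lmod), main="ACF of OLS residuals")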

(b) Allow for serial correlation with an AR(1) model for the errors (hint: use maximum likelihood to estimate the parameters in the GLS fit by using the method="ML" argument). What is the estimated correlation and is it significant? Does the model change which variables are found to be significant?

require(nlme)
## Loading required package: nlme
# GLS fit with AR(1) correlation in the errors, indexed by year, estimated by ML
glmod <- gls(divorce ~ unemployed + femlab + marriage + birth + military,
             data=divusa, correlation=corAR1(form=~year), method="ML")
summary(glmod)
## Generalized least squares fit by maximum likelihood
##   Model: divorce ~ unemployed + femlab + marriage + birth + military 
##   Data: divusa 
##        AIC      BIC    logLik
##   179.9523 198.7027 -81.97613
## 
## Correlation Structure: AR(1)
##  Formula: ~year 
##  Parameter estimate(s):
##       Phi 
## 0.9715486 
## 
## Coefficients:
##                 Value Std.Error   t-value p-value
## (Intercept) -7.059682  5.547193 -1.272658  0.2073
## unemployed   0.107643  0.045915  2.344395  0.0219
## femlab       0.312085  0.095151  3.279878  0.0016
## marriage     0.164326  0.022897  7.176766  0.0000
## birth       -0.049909  0.022012 -2.267345  0.0264
## military     0.017946  0.014271  1.257544  0.2127
## 
##  Correlation: 
##            (Intr) unmply femlab marrig birth 
## unemployed -0.420                            
## femlab     -0.802  0.240                     
## marriage   -0.516  0.607  0.307              
## birth      -0.379  0.041  0.066 -0.094       
## military   -0.036  0.436 -0.311  0.530  0.128
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.4509327 -0.9760939 -0.6164694  1.1375377  2.1593261 
## 
## Residual standard error: 2.907665 
## Degrees of freedom: 77 total; 71 residual
intervals(glmod, which="var-cov")
## Approximate 95% confidence intervals
## 
##  Correlation structure:
##         lower      est.     upper
## Phi 0.6528097 0.9715486 0.9980192
## attr(,"label")
## [1] "Correlation structure:"
## 
##  Residual standard error:
##      lower       est.      upper 
##  0.7974404  2.9076645 10.6020628

The estimated correlation (Phi) is 0.972, which is very high, and its approximate 95% confidence interval (0.653, 0.998) does not contain 0, so the AR(1) correlation is significant. The model also changes which variables are found to be significant: in the GLS fit, unemployed becomes significant at the 5% level (p = 0.022) and its coefficient changes sign from negative to positive, femlab, marriage, and birth remain significant, and military remains non-significant (p = 0.213).
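
For a quick side-by-side view of how the estimates change between the two fits (a sketch, not part of the original output), the coefficient vectors can be combined:

cbind(OLS = coef(lmod), GLS = coef(glmod))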

(c) Speculate why there might be correlation in the errors.

Divorce rates change slowly: the social and economic conditions that drive divorce in one year largely persist into the next, so the influences left in the errors are similar in adjacent years. This produces positive autocorrelation in the errors, which can be verified formally with the Durbin–Watson test below, where a statistic well below 2 indicates positive serial correlation.

require(lmtest)
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
# Durbin–Watson test for serial correlation in the errors
dwtest(divorce ~ unemployed + femlab + marriage + birth + military, data=divusa)
## 
##  Durbin-Watson test
## 
## data:  divorce ~ unemployed + femlab + marriage + birth + military
## DW = 0.29988, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0