Using the divusa data, fit a regression model with divorce as the response and unemployed, femlab, marriage, birth, and military as predictors.
require(faraway)
## Loading required package: faraway
data(divusa, package="faraway")
head(divusa)
## year divorce unemployed femlab marriage birth military
## 1 1920 8.0 5.2 22.70 92.0 117.9 3.2247
## 2 1921 7.2 11.7 22.79 83.0 119.8 3.5614
## 3 1922 6.6 6.7 22.88 79.7 111.2 2.4553
## 4 1923 7.1 2.4 22.97 85.2 110.5 2.2065
## 5 1924 7.2 5.0 23.06 80.3 110.9 2.2889
## 6 1925 7.2 3.2 23.15 79.2 106.6 2.1735
lmod<-lm(divorce~unemployed+femlab+marriage+birth+military, data=divusa)
summary(lmod)
##
## Call:
## lm(formula = divorce ~ unemployed + femlab + marriage + birth +
## military, data = divusa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8611 -0.8916 -0.0496 0.8650 3.8300
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.48784 3.39378 0.733 0.4659
## unemployed -0.11125 0.05592 -1.989 0.0505 .
## femlab 0.38365 0.03059 12.543 < 2e-16 ***
## marriage 0.11867 0.02441 4.861 6.77e-06 ***
## birth -0.12996 0.01560 -8.333 4.03e-12 ***
## military -0.02673 0.01425 -1.876 0.0647 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.65 on 71 degrees of freedom
## Multiple R-squared: 0.9208, Adjusted R-squared: 0.9152
## F-statistic: 165.1 on 5 and 71 DF, p-value: < 2.2e-16
plot(residuals(lmod) ~ year, na.omit(divusa), ylab="Residuals")
abline(h=0)
n <- length(residuals(lmod))
plot(tail(residuals(lmod),n-1) ~ head(residuals(lmod),n-1), xlab=
expression(hat(epsilon)[i]),ylab=expression(hat(epsilon)[i+1]))
abline(h=0,v=0,col=grey(0.75))
cor(residuals(lmod)[-1],residuals(lmod)[-length(residuals(lmod))])
## [1] 0.8469792
There is a long seuence of points above and below the line. The graphs suggest a positive serial correlation and a linear trend between successive residuals. There is a correlation of 0.847 between successive residuals.
(b)Allow for serial correlation with an AR(1) model for the errors (Hint: use maximum likelihood to estimate the parameters in the GLS fit by using the method=“ML” argument). What is the estimated correlation and is it significant? Does the model change which variables are found to be significant?
require(nlme)
## Loading required package: nlme
glmod <- gls(divorce~unemployed+femlab+marriage+birth+military, data=divusa, correlation=corAR1(form=~year), method="ML")
summary(glmod)
## Generalized least squares fit by maximum likelihood
## Model: divorce ~ unemployed + femlab + marriage + birth + military
## Data: divusa
## AIC BIC logLik
## 179.9523 198.7027 -81.97613
##
## Correlation Structure: AR(1)
## Formula: ~year
## Parameter estimate(s):
## Phi
## 0.9715486
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -7.059682 5.547193 -1.272658 0.2073
## unemployed 0.107643 0.045915 2.344395 0.0219
## femlab 0.312085 0.095151 3.279878 0.0016
## marriage 0.164326 0.022897 7.176766 0.0000
## birth -0.049909 0.022012 -2.267345 0.0264
## military 0.017946 0.014271 1.257544 0.2127
##
## Correlation:
## (Intr) unmply femlab marrig birth
## unemployed -0.420
## femlab -0.802 0.240
## marriage -0.516 0.607 0.307
## birth -0.379 0.041 0.066 -0.094
## military -0.036 0.436 -0.311 0.530 0.128
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.4509327 -0.9760939 -0.6164694 1.1375377 2.1593261
##
## Residual standard error: 2.907665
## Degrees of freedom: 77 total; 71 residual
intervals(glmod, which="var-cov")
## Approximate 95% confidence intervals
##
## Correlation structure:
## lower est. upper
## Phi 0.6528097 0.9715486 0.9980192
## attr(,"label")
## [1] "Correlation structure:"
##
## Residual standard error:
## lower est. upper
## 0.7974404 2.9076645 10.6020628
The estimatted correlation (phi) is 0.97154. This is very high. Also the confidence interval is (0.6528097, 0.9980192) which does not contain 0. therefore AR(1) correlation is significant.
As the divorce rates can be similar to the next year’s divorce rate, the trend in the error also can be same. This can cause the auto correlation which can be further verified from the Durbin–Watson statistic.
require(lmtest)
## Loading required package: lmtest
## Warning: package 'lmtest' was built under R version 3.6.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.6.2
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
dwtest(divorce~unemployed+femlab+marriage+birth+military, data=divusa)
##
## Durbin-Watson test
##
## data: divorce ~ unemployed + femlab + marriage + birth + military
## DW = 0.29988, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0