I will be using Linear Models with R (Faraway), to explore solving a problem where the errors have an issue and robust regression techniques are needed.
The data looks at divorce rates in the USA from 1920 - 1996.
## 'data.frame': 77 obs. of 7 variables:
## $ year : int 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 ...
## $ divorce : num 8 7.2 6.6 7.1 7.2 7.2 7.5 7.8 7.8 8 ...
## $ unemployed: num 5.2 11.7 6.7 2.4 5 3.2 1.8 3.3 4.2 3.2 ...
## $ femlab : num 22.7 22.8 22.9 23 23.1 ...
## $ marriage : num 92 83 79.7 85.2 80.3 79.2 78.7 77 74.1 75.5 ...
## $ birth : num 118 120 111 110 111 ...
## $ military : num 3.22 3.56 2.46 2.21 2.29 ...
## year divorce unemployed femlab
## Min. :1920 Min. : 6.10 Min. : 1.200 Min. :22.70
## 1st Qu.:1939 1st Qu.: 8.70 1st Qu.: 4.200 1st Qu.:27.47
## Median :1958 Median :10.60 Median : 5.600 Median :37.10
## Mean :1958 Mean :13.27 Mean : 7.173 Mean :38.58
## 3rd Qu.:1977 3rd Qu.:20.30 3rd Qu.: 7.500 3rd Qu.:47.80
## Max. :1996 Max. :22.80 Max. :24.900 Max. :59.30
## marriage birth military
## Min. : 49.70 Min. : 65.30 Min. : 1.940
## 1st Qu.: 61.90 1st Qu.: 68.90 1st Qu.: 3.469
## Median : 74.10 Median : 85.90 Median : 9.102
## Mean : 72.97 Mean : 88.89 Mean :12.365
## 3rd Qu.: 80.00 3rd Qu.:107.30 3rd Qu.:14.266
## Max. :118.10 Max. :122.90 Max. :86.641
**The year shows values from 1920 - 1996, that makes up the 77 observations that are part of the dataset.
divorce per 1,000 women aged 15 or more. or more.unemployed tells us the unemployment rate for the year.femlab is the percent of female participation in the labor force 16 and over.marriage marriages per 1,000 unmarried women aged 16.birth another “per 1,000” women aged 15-44.military another “per 1,000” military personnel.The time series also has very clean data, as we can see from the summary there are no NA values and all types are of numeric.
Using the divusa data, fit a regression model with divorce as the response and unemployed, femlab, marriage, birth and military as predictors.
##
## Call:
## lm(formula = divorce ~ unemployed + femlab + marriage + birth +
## military, data = divusa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8611 -0.8916 -0.0496 0.8650 3.8300
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.48784 3.39378 0.733 0.4659
## unemployed -0.11125 0.05592 -1.989 0.0505 .
## femlab 0.38365 0.03059 12.543 < 2e-16 ***
## marriage 0.11867 0.02441 4.861 6.77e-06 ***
## birth -0.12996 0.01560 -8.333 4.03e-12 ***
## military -0.02673 0.01425 -1.876 0.0647 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.65 on 71 degrees of freedom
## Multiple R-squared: 0.9208, Adjusted R-squared: 0.9152
## F-statistic: 165.1 on 5 and 71 DF, p-value: < 2.2e-16
## unemployed femlab marriage birth military
## 2.252888 3.613276 2.864864 2.585485 1.249596
The R-squared is very high and the multi-collinearity checks out as well.
The next thing we will look at is to see if there are any problems with the errors (specifically if they are correlated).
## Generalized least squares fit by maximum likelihood
## Model: divorce ~ unemployed + femlab + marriage + birth + military
## Data: divusa
## AIC BIC logLik
## 179.9523 198.7027 -81.97613
##
## Correlation Structure: AR(1)
## Formula: ~year
## Parameter estimate(s):
## Phi
## 0.9715486
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -7.059682 5.547193 -1.272658 0.2073
## unemployed 0.107643 0.045915 2.344395 0.0219
## femlab 0.312085 0.095151 3.279878 0.0016
## marriage 0.164326 0.022897 7.176766 0.0000
## birth -0.049909 0.022012 -2.267345 0.0264
## military 0.017946 0.014271 1.257544 0.2127
##
## Correlation:
## (Intr) unmply femlab marrig birth
## unemployed -0.420
## femlab -0.802 0.240
## marriage -0.516 0.607 0.307
## birth -0.379 0.041 0.066 -0.094
## military -0.036 0.436 -0.311 0.530 0.128
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.4509327 -0.9760939 -0.6164694 1.1375377 2.1593261
##
## Residual standard error: 2.907665
## Degrees of freedom: 77 total; 71 residual
## Approximate 95% confidence intervals
##
## Correlation structure:
## lower est. upper
## Phi 0.6528097 0.9715486 0.9980192
## attr(,"label")
## [1] "Correlation structure:"
##
## Residual standard error:
## lower est. upper
## 0.7974404 2.9076645 10.6020628
We can see above that the Phi value almost outputs a complete positive relationship at .97! Our confidence interval check also checks that at 95% the AR(1) correlations is significant.
The GLS model also did have a different result of what variables it found significant. The GLS model only found marriage and femlab to be statistically significant.
##
## Durbin-Watson test
##
## data: divorce ~ unemployed + femlab + marriage + birth + military
## DW = 0.29988, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0
We can see further above that the Dubin-Watson statistic confirms the auto correlation.
I would say that the correlation of errors exist because our time series is using year as a period. The errors for divorce rate therefore would be quite similar from year to year with a positive trend, which all in all if you think about it is unfortunately common knowledge and makes sense.