Dataset divusa is part of faraway package. It contains divorce rates data collected in the USA from 1920-1996. Dataset has 77 observations on seven variables.

(a) The regression model using unemployed, femlab, marriage, birth and military as predictors and divorce as a response variable.

library(faraway)
library(dplyr)
library(ggplot2)
library(GGally)

#Load data
divusa.df <- divusa %>% select(divorce,unemployed,femlab,marriage,birth,military)

#Linear model
divusa.lm <- lm(divorce ~ unemployed + femlab + marriage + birth + military, data = divusa.df)

sumary(divusa.lm)
##              Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  2.487845   3.393779  0.7331   0.46594
## unemployed  -0.111252   0.055925 -1.9893   0.05052
## femlab       0.383649   0.030587 12.5430 < 2.2e-16
## marriage     0.118674   0.024414  4.8609 6.772e-06
## birth       -0.129959   0.015595 -8.3334 4.027e-12
## military    -0.026734   0.014247 -1.8764   0.06471
## 
## n = 77, p = 6, Residual SE = 1.65042, R-Squared = 0.92
summary(divusa.lm)
## 
## Call:
## lm(formula = divorce ~ unemployed + femlab + marriage + birth + 
##     military, data = divusa.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8611 -0.8916 -0.0496  0.8650  3.8300 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.48784    3.39378   0.733   0.4659    
## unemployed  -0.11125    0.05592  -1.989   0.0505 .  
## femlab       0.38365    0.03059  12.543  < 2e-16 ***
## marriage     0.11867    0.02441   4.861 6.77e-06 ***
## birth       -0.12996    0.01560  -8.333 4.03e-12 ***
## military    -0.02673    0.01425  -1.876   0.0647 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.65 on 71 degrees of freedom
## Multiple R-squared:  0.9208, Adjusted R-squared:  0.9152 
## F-statistic: 165.1 on 5 and 71 DF,  p-value: < 2.2e-16

I used two functions summary and sumary to display output of the linear model. sumary function is part of faraway package and displays minimal information. The difference is summary function indicates if the variable is contributing to the model with * or . next to p-value.

Output interpretation,

(b) Variance inflation factors (VIFs), explains estimated coefficients are inflated due to collinearity between variables.

#ggpairs(divusa.df)

my_fn <- function(data, mapping){
  p <- ggplot(data = data, mapping = mapping) + 
    geom_point() + 
    geom_smooth(method=lm, color="blue")
  p
}

g = ggpairs(divusa.df, lower = list(continuous = my_fn))
g

vif(divusa.lm)
## unemployed     femlab   marriage      birth   military 
##   2.252888   3.613276   2.864864   2.585485   1.249596

(c) Model without marriage variable.

#Load data
divusa.df1 <- divusa %>% select(divorce,unemployed,femlab,birth,military)

#Linear model
divusa.lm1 <- lm(divorce ~ ., data = divusa.df1)

summary(divusa.lm1)
## 
## Call:
## lm(formula = divorce ~ ., data = divusa.df1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6312 -0.9141 -0.3110  0.5808  7.8938 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.82713    2.82584   4.893 5.87e-06 ***
## unemployed  -0.21265    0.05949  -3.575 0.000631 ***
## femlab       0.29859    0.02876  10.382 5.91e-16 ***
## birth       -0.11698    0.01761  -6.641 5.04e-09 ***
## military    -0.01249    0.01598  -0.782 0.437048    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.892 on 72 degrees of freedom
## Multiple R-squared:  0.8945, Adjusted R-squared:  0.8886 
## F-statistic: 152.6 on 4 and 72 DF,  p-value: < 2.2e-16
round(cor(divusa.df1),2)
##            divorce unemployed femlab birth military
## divorce       1.00      -0.21   0.91 -0.72     0.02
## unemployed   -0.21       1.00  -0.26 -0.31    -0.40
## femlab        0.91      -0.26   1.00 -0.60     0.05
## birth        -0.72      -0.31  -0.60  1.00     0.14
## military      0.02      -0.40   0.05  0.14     1.00
vif(divusa.lm1)
## unemployed     femlab      birth   military 
##   1.939450   2.430627   2.509711   1.196751

Model without marriage variable.

#Load data
divusa.df2 <- divusa %>% select(divorce,unemployed,femlab,marriage,military)

#Linear model
divusa.lm2 <- lm(divorce ~ ., data = divusa.df2)

summary(divusa.lm2)
## 
## Call:
## lm(formula = divorce ~ ., data = divusa.df2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5080 -1.8613 -0.2785  1.5483  5.3526 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -13.17511    3.94663  -3.338  0.00134 ** 
## unemployed    0.10809    0.06892   1.568  0.12115    
## femlab        0.51252    0.03686  13.906  < 2e-16 ***
## marriage      0.08384    0.03359   2.496  0.01486 *  
## military     -0.01805    0.01985  -0.909  0.36612    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.305 on 72 degrees of freedom
## Multiple R-squared:  0.8434, Adjusted R-squared:  0.8347 
## F-statistic: 96.92 on 4 and 72 DF,  p-value: < 2.2e-16
round(cor(divusa.df2),2)
##            divorce unemployed femlab marriage military
## divorce       1.00      -0.21   0.91    -0.53     0.02
## unemployed   -0.21       1.00  -0.26    -0.27    -0.40
## femlab        0.91      -0.26   1.00    -0.65     0.05
## marriage     -0.53      -0.27  -0.65     1.00     0.26
## military      0.02      -0.40   0.05     0.26     1.00
vif(divusa.lm2)
## unemployed     femlab   marriage   military 
##   1.753836   2.689578   2.780902   1.242909

Conclusion: Removal of variables has not improved the model. In fact, it has an adverse effect on parameters \(R^2\), Residual standard error(\(S\)) and intercept(\(\beta_0\)).