w7_discussion.knit

Part 1: Multiple Linear Regression for Fertility

Model Strategy

Given the small sample size (n = 47), I start with the full model and use AIC-based stepwise selection to balance model fit and parsimony.

data(swiss)

m_full <- lm(Fertility ~ ., data = swiss)
m_best <- stepAIC(m_full, direction = "both", trace = FALSE)

formula(m_best)

#> Fertility ~ Agriculture + Education + Catholic + Infant.Mortality

summary(m_best)

#> 
#> Call:
#> lm(formula = Fertility ~ Agriculture + Education + Catholic + 
#>     Infant.Mortality, data = swiss)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -14.6765  -6.0522   0.7514   3.1664  16.1422 
#> 
#> Coefficients:
#>                  Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)      62.10131    9.60489   6.466 8.49e-08 ***
#> Agriculture      -0.15462    0.06819  -2.267  0.02857 *  
#> Education        -0.98026    0.14814  -6.617 5.14e-08 ***
#> Catholic          0.12467    0.02889   4.315 9.50e-05 ***
#> Infant.Mortality  1.07844    0.38187   2.824  0.00722 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 7.168 on 42 degrees of freedom
#> Multiple R-squared:  0.6993, Adjusted R-squared:  0.6707 
#> F-statistic: 24.42 on 4 and 42 DF,  p-value: 1.717e-10

Multicollinearity Check

Because education-related variables are often correlated, I examine VIF values.

vif(m_best)

#>      Agriculture        Education         Catholic Infant.Mortality 
#>         2.147153         1.816361         1.299916         1.107528

Values below 5 suggest multicollinearity is not severe in the selected model.

Interpretation

Key findings from the final model:

Education (-0.98): A one-unit increase in education is associated with nearly a one-point decrease in fertility, holding other variables constant. This is substantial and consistent with demographic transition theory, where increased education leads to delayed family formation and smaller family sizes.
Catholic (0.12): Provinces with higher proportions of Catholics tend to have higher fertility. This aligns with historical patterns where more religious regions maintained higher birth rates.
Infant Mortality (1.08): Higher infant mortality is associated with higher fertility.
Agriculture (-0.15): Greater agricultural employment slightly lowers fertility in the presence of other controls, suggesting modernization effects are nuanced once education and religion are considered.

summary(m_best)$adj.r.squared

#> [1] 0.670714

Adjusted R² ≈ 0.67 indicates the model explains a meaningful portion of fertility variation given the historical context.

Diagnostic Plots

par(mfrow = c(2, 2))
plot(m_best)

par(mfrow = c(1, 1))

Residual diagnostics suggest:

Reasonable linearity.
Mild but acceptable deviation from normality.
No extreme leverage concerns.

Given the small sample size, influential observations should be interpreted cautiously.

Part 2: Logistic Regression for Fertility > 70

Define high fertility as Fertility > 70.

swiss$fertility_high <- as.integer(swiss$Fertility > 70)
table(swiss$fertility_high)

#> 
#>  0  1 
#> 23 24

The classes are nearly balanced (23 vs 24), which improves model stability.

Model Selection

g_full <- glm(fertility_high ~ Agriculture + Examination + Education + Catholic + Infant.Mortality,
              data = swiss, family = binomial)

g_best <- stepAIC(g_full, direction = "both", trace = FALSE)

summary(g_best)

#> 
#> Call:
#> glm(formula = fertility_high ~ Agriculture + Examination + Education + 
#>     Catholic + Infant.Mortality, family = binomial, data = swiss)
#> 
#> Coefficients:
#>                  Estimate Std. Error z value Pr(>|z|)  
#> (Intercept)       4.82826    5.25607   0.919   0.3583  
#> Agriculture      -0.09615    0.04011  -2.397   0.0165 *
#> Examination      -0.32116    0.13844  -2.320   0.0203 *
#> Education        -0.12078    0.08610  -1.403   0.1607  
#> Catholic          0.02078    0.01376   1.509   0.1312  
#> Infant.Mortality  0.29078    0.21051   1.381   0.1672  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 65.135  on 46  degrees of freedom
#> Residual deviance: 32.887  on 41  degrees of freedom
#> AIC: 44.887
#> 
#> Number of Fisher Scoring iterations: 6

Odds Ratios

b <- coef(g_best)
se <- sqrt(diag(vcov(g_best)))

data.frame(
  term = names(b),
  odds_ratio = exp(b),
  ci_low = exp(b - 1.96 * se),
  ci_high = exp(b + 1.96 * se)
)

Interpretation

Agriculture (OR ≈ 0.91): Higher agricultural employment decreases the odds of high fertility.
Examination (OR ≈ 0.73): Provinces with stronger education indicators are significantly less likely to exhibit high fertility.
Other predictors show weaker statistical evidence but remain directionally informative.

These results reinforce the modernization narrative: increased educational attainment corresponds to lower fertility likelihood.

Predictive Performance* (In-Sample)

p_hat <- predict(g_best, type = "response")
pred <- as.integer(p_hat >= 0.5)

cm <- table(Actual = swiss$fertility_high, Predicted = pred)
cm

#>       Predicted
#> Actual  0  1
#>      0 18  5
#>      1  2 22

sum(diag(cm)) / sum(cm)

#> [1] 0.8510638

Accuracy ≈ 85%.

Because this is in-sample performance with only 47 observations, cross-validation would provide a more honest assessment in practice.

Final Discussion

Both models consistently show education as the strongest determinant of fertility differences. Religious composition and infant mortality also contribute meaningfully.

Key takeaways:

Education plays a central role in fertility reduction.
Religious composition remains historically influential.
Logistic and linear models tell a consistent story.
Small sample size limits causal claims.