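Generalized Linear Models (GLMs; Nelder & Wedderburn, 1972) extend ordinary least squares regression to outcome variables whose conditional distribution belongs to the exponential family. A GLM is defined by three components: a random component (the exponential-family distribution of the outcome \(Y\), with mean \(\mu\)), a systematic component (the linear predictor \(\eta = \mathbf{x}^\top\boldsymbol\beta\)), and a link function \(g\) connecting the two through \(g(\mu) = \eta\).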
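The model is estimated by maximum likelihood, and inference relies on the asymptotic normality of the MLE. Parameters are tested with Wald statistics (\(z = \hat\beta / \text{SE}\)), which are asymptotically standard normal (so \(z^2\) is asymptotically \(\chi^2_1\)), or with likelihood ratio tests (LRTs), whose statistic is asymptotically \(\chi^2\) with degrees of freedom equal to the number of parameters tested.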
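The two tests can be compared side by side. The following is a minimal sketch on simulated Poisson data (the variables `x`, `y`, `fit0` and `fit1` are illustrative and not part of the datasets used later); in large samples the squared Wald statistic and the LR statistic are usually close.

```r
# Simulated data: assumption for illustration only
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.3 + 0.4 * x))   # true slope = 0.4

fit0 <- glm(y ~ 1, family = poisson(link = "log"))
fit1 <- glm(y ~ x, family = poisson(link = "log"))

# Wald test: z = estimate / standard error, referred to N(0, 1)
est    <- coef(summary(fit1))["x", "Estimate"]
se     <- coef(summary(fit1))["x", "Std. Error"]
z_wald <- est / se
p_wald <- 2 * pnorm(abs(z_wald), lower.tail = FALSE)

# Likelihood ratio test: twice the log-likelihood gain, referred to chi-square(1)
lr_stat <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))
p_lrt   <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

c(wald_z_squared = z_wald^2, lr_stat = lr_stat)
c(p_wald = p_wald, p_lrt = p_lrt)
```

The same comparison is what `summary()` (Wald) and `anova(..., test="LRT")` report for the fitted models in the sections that follow.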
| Family | Canonical link | Use case |
|---|---|---|
| Gaussian | Identity | Continuous, symmetric outcome |
| Binomial | Logit | Binary / proportion outcome |
| Poisson | Log | Non-negative integer counts |
| Negative Binomial | Log | Overdispersed counts |
| Gamma | Inverse (log link common in practice) | Positive skewed continuous |
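The default links in the table can be read directly from R's family objects. A small sketch using only base R follows; the Negative Binomial family is not part of base `stats` (it comes from MASS), so it is left out here.

```r
# Default link of each base-R family object
fams <- list(gaussian = gaussian(), binomial = binomial(),
             poisson = poisson(), Gamma = Gamma())
sapply(fams, function(f) f$link)

# linkfun maps the mean to the linear predictor; linkinv maps it back
mu  <- 0.25
eta <- binomial()$linkfun(mu)     # log(0.25 / 0.75)
binomial()$linkinv(eta)           # recovers 0.25

# A non-default link is requested explicitly, e.g. Gamma with log link
Gamma(link = "log")$link
```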
Medical insurance charges are right-skewed and strictly positive. A log transformation induces approximate normality, making a Gaussian GLM with identity link on \(\log(\text{charges})\) appropriate. Alternatively, a Gamma GLM with log link directly models the original scale without requiring a transformation.
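The difference between the two options can be illustrated on simulated data. This is a sketch under stated assumptions: the data are generated from a Gamma distribution with a log-linear mean, and the names `age`, `y`, `fit_lognorm` and `fit_gamma` are hypothetical stand-ins, not the insurance data.

```r
set.seed(42)
n   <- 1000
age <- runif(n, 18, 64)
mu  <- exp(7 + 0.03 * age)                    # mean on the original scale
y   <- rgamma(n, shape = 2, rate = 2 / mu)    # positive, right-skewed, E[y] = mu

# Option 1: Gaussian GLM on the log-transformed outcome (models E[log y])
fit_lognorm <- glm(log(y) ~ age, family = gaussian(link = "identity"))

# Option 2: Gamma GLM with log link (models log E[y] directly)
fit_gamma <- glm(y ~ age, family = Gamma(link = "log"))

# Slopes are comparable; intercepts differ because E[log y] != log E[y]
rbind(log_gaussian = coef(fit_lognorm), gamma_log = coef(fit_gamma))
```

Both slopes estimate the same proportional effect of `age`, but only the Gamma fit returns predictions of the mean on the original scale without a back-transformation.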
p1 <- ggplot(df_insurance, aes(x=charges)) +
geom_histogram(bins=50, fill="#2166ac", colour="white") +
theme_bw() + labs(title="Raw charges (right-skewed)", x="Charges (USD)")
p2 <- ggplot(df_insurance, aes(x=log_charges)) +
geom_histogram(bins=50, fill="#4dac26", colour="white") +
theme_bw() + labs(title="Log-transformed charges (approx. normal)", x="log(Charges)")
plot_multiplot(p1, p2, cols=2)
## [[1]]
We fit a full model including all main effects and the
theoretically motivated smoker × bmi interaction (smoking
may amplify the effect of obesity on health costs). A null
(intercept-only) model and a main-effects model are fitted for comparison.
m_gauss_null <- glm(log_charges ~ 1,
family=gaussian(link="identity"),
data=df_insurance)
m_gauss_main <- glm(log_charges ~ age + sex + bmi + children + smoker + region,
family=gaussian(link="identity"),
data=df_insurance)
m_gauss_full <- glm(log_charges ~ age + sex + bmi + children + smoker + region +
smoker:bmi,
family=gaussian(link="identity"),
data=df_insurance)
summary(m_gauss_full)
##
## Call:
## glm(formula = log_charges ~ age + sex + bmi + children + smoker +
##     region + smoker:bmi, family = gaussian(link = "identity"),
##     data = df_insurance)
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)      7.3373697  0.0766733  95.697  < 2e-16 ***
## age              0.0347953  0.0008429  41.281  < 2e-16 ***
## sexmale         -0.0870645  0.0236073  -3.688 0.000235 ***
## bmi              0.0034060  0.0022676   1.502 0.133336
## children         0.1031486  0.0097594  10.569  < 2e-16 ***
## smokeryes        0.1564189  0.1459995   1.071 0.284199
## regionnorthwest -0.0711306  0.0337354  -2.108 0.035176 *
## regionsoutheast -0.1627269  0.0339029  -4.800 1.77e-06 ***
## regionsouthwest -0.1375125  0.0338557  -4.062 5.16e-05 ***
## bmi:smokeryes    0.0455744  0.0046633   9.773  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1842831)
##
##     Null deviance: 1130.47  on 1337  degrees of freedom
## Residual deviance:  244.73  on 1328  degrees of freedom
## AIC: 1546.1
##
## Number of Fisher Scoring iterations: 2
Anova(m_gauss_full, type=2)
## Analysis of Deviance Table (Type II tests)
##
## Response: log_charges
##            LR Chisq Df Pr(>Chisq)
## age         1704.11  1  < 2.2e-16 ***
## sex           13.60  1   0.000226 ***
## bmi           43.61  1  4.001e-11 ***
## children     111.71  1  < 2.2e-16 ***
## smoker      2822.41  1  < 2.2e-16 ***
## region        27.42  3  4.805e-06 ***
## bmi:smoker    95.51  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lrt_interaction <- anova(m_gauss_main, m_gauss_full, test="LRT")
lrt_interaction
## Analysis of Deviance Table
##
## Model 1: log_charges ~ age + sex + bmi + children + smoker + region
## Model 2: log_charges ~ age + sex + bmi + children + smoker + region +
##     smoker:bmi
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
## 1      1329     262.33
## 2      1328     244.73  1   17.601 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
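# For a Gaussian GLM the Deviance column is the unscaled drop in deviance;
# the chi-square statistic divides it by the dispersion estimate.
chi_sq <- lrt_interaction$Deviance[2] / summary(m_gauss_full)$dispersion
p_val <- lrt_interaction$`Pr(>Chi)`[2]
cat(sprintf("\nLRT: χ²(%d) = %.2f, %s\n",
lrt_interaction$Df[2], chi_sq,
ifelse(p_val < 1e-4, "p < 0.0001", sprintf("p = %.4f", p_val))))
## ## LRT: χ²(1) = 95.51, p < 0.0001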
cat("Null deviance :", round(m_gauss_null$deviance, 2), "\n")
## Null deviance : 1130.47
cat("Residual deviance :", round(m_gauss_full$deviance, 2), "\n")
## Residual deviance : 244.73
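# For a Gaussian GLM the deviance is the residual sum of squares, so this ratio is the ordinary R²
cat("R² (deviance-based):", round(1 - m_gauss_full$deviance/m_gauss_null$deviance, 4), "\n")
## R² (deviance-based): 0.7835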
cat("AIC :", round(AIC(m_gauss_full), 2), "\n")
## AIC : 1546.11
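As a rule of thumb, VIF > 5 suggests problematic collinearity and VIF > 10 severe collinearity. The large values for smoker and bmi:smoker below are expected: an interaction term is strongly correlated with its own main effect when bmi is not centred, and this does not by itself indicate a problem with the data.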
vif(m_gauss_full)
##                 GVIF Df GVIF^(1/(2*Df))
## age         1.017506  1        1.008715
## sex         1.011478  1        1.005723
## bmi         1.387358  1        1.177862
## children    1.004195  1        1.002095
## smoker     25.203045  1        5.020263
## region      1.099757  3        1.015975
## bmi:smoker 25.533081  1        5.053027
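Why an interaction inflates the VIF of its main effect can be seen on simulated data. The sketch below does not use `car::vif`; it computes the VIF of a single column from its definition, \(1/(1-R^2)\), with a hypothetical helper `vif_of()`, and compares uncentred with mean-centred BMI. The data are simulated and only loosely mimic the insurance variables.

```r
set.seed(7)
n      <- 1000
bmi    <- rnorm(n, mean = 30, sd = 6)
smoker <- rbinom(n, 1, 0.2)

# VIF of one column = 1 / (1 - R^2) from regressing it on the other columns
# (vif_of is a hypothetical helper written for this sketch)
vif_of <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

# Uncentred: bmi * smoker is nearly proportional to smoker itself
vif_raw <- vif_of(smoker, data.frame(bmi = bmi, int = bmi * smoker))

# Centred BMI: the product term is no longer close to a rescaled smoker column
bmi_c       <- bmi - mean(bmi)
vif_centred <- vif_of(smoker, data.frame(bmi = bmi_c, int = bmi_c * smoker))

c(uncentred = vif_raw, centred = vif_centred)
```

Centring changes the interpretation of the main effects (they become effects at the mean BMI) but not the fit of the model or the interaction coefficient.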
par(mfrow=c(2,3))
plot(m_gauss_full, which=1:6)
par(mfrow=c(1,1))
nd <- expand.grid(
age=mean(df_insurance$age),
sex="female",
bmi=seq(18, 53, length.out=100),
children=0,
smoker=c("no","yes"),
region="northeast"
)
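# Predictions are on the log scale; under the log-normal assumption exp() returns the
# conditional median (geometric mean) of charges, which lies below the arithmetic mean.
nd$fit <- predict(m_gauss_full, newdata=nd, type="response")
ggplot(nd, aes(x=bmi, y=exp(fit), colour=smoker)) +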
geom_line(linewidth=1) +
scale_colour_manual(values=c("#2166ac","#d6604d")) +
theme_bw() +
labs(title="Predicted charges by BMI and smoking status",
subtitle="Marginal prediction at mean age, female, 0 children, northeast",
x="BMI", y="Predicted charges (USD)", colour="Smoker")
The df_admission dataset contains binary admission
decisions (0/1) from a graduate programme alongside GRE score, GPA, and
institutional rank (a four-level factor entered as dummy variables, with
Rank 1 as the reference). A binomial GLM with logit link, i.e. logistic
regression, models the log-odds of admission as a linear
function of the predictors.
\[\log\frac{P(\text{admit}=1\mid\mathbf{x})}{1-P(\text{admit}=1\mid\mathbf{x})} = \beta_0 + \beta_1\,\text{gre} + \beta_2\,\text{gpa} + \sum_{k=2}^{4}\gamma_k\,\mathbb{1}[\text{rank}=k]\]
p1 <- ggplot(df_admission, aes(x=gre, fill=factor(admit))) +
geom_density(alpha=0.5) +
scale_fill_manual(values=c("#2166ac","#d6604d"), labels=c("Rejected","Admitted")) +
theme_bw() + labs(title="GRE by admission", fill=NULL)
p2 <- ggplot(df_admission, aes(x=gpa, fill=factor(admit))) +
geom_density(alpha=0.5) +
scale_fill_manual(values=c("#2166ac","#d6604d"), labels=c("Rejected","Admitted")) +
theme_bw() + labs(title="GPA by admission", fill=NULL)
plot_multiplot(p1, p2, cols=2)
## [[1]]
m_admit_null <- glm(admit ~ 1,
family=binomial(link="logit"),
data=df_admission)
m_admit_main <- glm(admit ~ gre + gpa + rank,
family=binomial(link="logit"),
data=df_admission)
summary(m_admit_main)
##
## Call:
## glm(formula = admit ~ gre + gpa + rank, family = binomial(link = "logit"),
##     data = df_admission)
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.989979   1.139951  -3.500 0.000465 ***
## gre          0.002264   0.001094   2.070 0.038465 *
## gpa          0.804038   0.331819   2.423 0.015388 *
## rankRank2   -0.675443   0.316490  -2.134 0.032829 *
## rankRank3   -1.340204   0.345306  -3.881 0.000104 ***
## rankRank4   -1.551464   0.417832  -3.713 0.000205 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 499.98  on 399  degrees of freedom
## Residual deviance: 458.52  on 394  degrees of freedom
## AIC: 470.52
##
## Number of Fisher Scoring iterations: 4
lrt_admit <- anova(m_admit_null, m_admit_main, test="LRT")
lrt_admit
## Analysis of Deviance Table
##
## Model 1: admit ~ 1
## Model 2: admit ~ gre + gpa + rank
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
## 1       399     499.98
## 2       394     458.52  5   41.459 7.578e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cat(sprintf("\nOverall model: χ²(%d) = %.3f, p = %.3g\n",
lrt_admit$Df[2],
lrt_admit$Deviance[2],
lrt_admit$`Pr(>Chi)`[2]))
## ## Overall model: χ²(5) = 41.459, p = 7.58e-08
or_table <- cbind(
OR = exp(coef(m_admit_main)),
exp(confint(m_admit_main))
)
round(or_table, 3)
##                OR 2.5 % 97.5 %
## (Intercept) 0.019 0.002  0.167
## gre         1.002 1.000  1.004
## gpa         2.235 1.174  4.324
## rankRank2   0.509 0.272  0.945
## rankRank3   0.262 0.132  0.512
## rankRank4   0.212 0.091  0.471
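To make the coefficients concrete, the following sketch plugs the estimates reported by `summary(m_admit_main)` into the logit equation for one hypothetical applicant (GRE 600, GPA 3.5, Rank 2; the applicant is an assumption chosen for illustration). It also shows that the odds ratio for GRE, which looks negligible per single point, is more readable per 100 points.

```r
# Coefficients copied from the summary(m_admit_main) output
b <- c(intercept = -3.989979, gre = 0.002264, gpa = 0.804038,
       rank2 = -0.675443)

# Linear predictor (log-odds) for the hypothetical applicant
eta <- b[["intercept"]] + b[["gre"]] * 600 + b[["gpa"]] * 3.5 + b[["rank2"]]
p   <- plogis(eta)                    # inverse logit: 1 / (1 + exp(-eta))

# Odds ratios rescale with the unit of the predictor
or_gre_1   <- exp(b[["gre"]])         # per 1 GRE point
or_gre_100 <- exp(100 * b[["gre"]])   # per 100 GRE points

round(c(log_odds = eta, probability = p,
        OR_per_1 = or_gre_1, OR_per_100 = or_gre_100), 3)
```

The predicted probability is about 0.38, and 100 additional GRE points multiply the odds of admission by roughly 1.25.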
# McFadden pseudo-R²
pseudo_r2 <- 1 - m_admit_main$deviance / m_admit_null$deviance
# Nagelkerke pseudo-R²
n <- nrow(df_admission)
r2_cox <- 1 - exp((m_admit_main$deviance - m_admit_null$deviance) / n)
r2_nag <- r2_cox / (1 - exp(-m_admit_null$deviance / n))
cat("McFadden pseudo-R² :", round(pseudo_r2, 4), "\n")
## McFadden pseudo-R² : 0.0829
cat("Nagelkerke pseudo-R²:", round(r2_nag, 4), "\n")
## Nagelkerke pseudo-R²: 0.138
cat("AIC :", round(AIC(m_admit_main), 2), "\n")
## AIC : 470.52
cat("BIC :", round(BIC(m_admit_main), 2), "\n")
## BIC : 494.47
pred_prob <- predict(m_admit_main, type="response")
roc_obj <- roc(df_admission$admit, pred_prob, quiet=TRUE)
plot(roc_obj, col="#2166ac", lwd=2,
main=sprintf("ROC Curve — Graduate Admission (AUC = %.3f)", auc(roc_obj)))
abline(a=0, b=1, lty=2, col="grey60")
pred_class <- ifelse(pred_prob >= 0.5, 1, 0)
conf_mat <- table(Predicted=pred_class, Observed=df_admission$admit)
conf_mat
##          Observed
## Predicted   0   1
##         0 254  97
##         1  19  30
cat(sprintf("\nAccuracy : %.3f\n", sum(diag(conf_mat))/sum(conf_mat)))
## ## Accuracy : 0.710
cat(sprintf("Sensitivity: %.3f\n", conf_mat[2,2]/sum(conf_mat[,2])))
## Sensitivity: 0.236
cat(sprintf("Specificity: %.3f\n", conf_mat[1,1]/sum(conf_mat[,1])))
## Specificity: 0.930
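The three metrics can be re-derived by hand from the four cells of the confusion matrix. The sketch below rebuilds the matrix from the printed counts and adds two quantities not computed above, precision and the accuracy of always predicting the majority class, as a point of comparison.

```r
# Counts copied from the confusion matrix (rows = predicted, cols = observed)
conf <- matrix(c(254, 19, 97, 30), nrow = 2,
               dimnames = list(Predicted = c("0", "1"), Observed = c("0", "1")))

tn <- conf["0", "0"]; fp <- conf["1", "0"]
fn <- conf["0", "1"]; tp <- conf["1", "1"]

metrics <- c(
  accuracy    = (tp + tn) / sum(conf),
  sensitivity = tp / (tp + fn),   # admitted applicants predicted as admitted
  specificity = tn / (tn + fp),   # rejected applicants predicted as rejected
  precision   = tp / (tp + fp),   # predicted admissions that were correct
  baseline    = (tn + fp) / sum(conf)   # always predicting "rejected"
)
round(metrics, 3)
```

An accuracy of 0.710 is only slightly above the majority-class baseline of about 0.683, which is why the low sensitivity at the 0.5 threshold matters more than the headline accuracy.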
nd2 <- expand.grid(
gre = seq(220, 800, length.out=200),
gpa = c(2.5, 3.0, 3.5, 4.0),
rank = factor("Rank2", levels=levels(df_admission$rank))
)
nd2$prob <- predict(m_admit_main, newdata=nd2, type="response")
ggplot(nd2, aes(x=gre, y=prob, colour=factor(gpa), group=factor(gpa))) +
geom_line(linewidth=1) +
scale_colour_brewer(palette="RdYlBu", direction=-1) +
theme_bw() +
labs(title="Predicted admission probability by GRE and GPA",
subtitle="Institutional rank fixed at Rank 2",
x="GRE score", y="P(admit = 1)", colour="GPA")
The number of dependent children is a non-negative integer — a natural candidate for Poisson regression. A key assumption of the Poisson model is equidispersion (\(\text{Var}[Y] = \mu\)). When the variance exceeds the mean (overdispersion), the Negative Binomial GLM provides a robust alternative by introducing a dispersion parameter \(\alpha\) such that \(\text{Var}[Y] = \mu + \alpha\mu^2\).
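The two variance functions can be checked by simulation. This is a minimal sketch with arbitrary, assumed values of \(\mu\) and \(\alpha\); note that base R's `rnbinom()` and `MASS::glm.nb()` parameterise the distribution by `size` or `theta`, which equals \(1/\alpha\).

```r
set.seed(123)
mu    <- 3
theta <- 2            # "size" in rnbinom(), "theta" in MASS::glm.nb()
alpha <- 1 / theta    # dispersion parameter as written in the text

y_pois <- rpois(1e5, lambda = mu)
y_nb   <- rnbinom(1e5, mu = mu, size = theta)

# Empirical mean and variance next to the theoretical variance
rbind(
  poisson = c(mean = mean(y_pois), var = var(y_pois), theory = mu),
  negbin  = c(mean = mean(y_nb),   var = var(y_nb),   theory = mu + alpha * mu^2)
)
```

Both samples have the same mean, but the Negative Binomial variance is close to \(3 + 0.5 \times 9 = 7.5\) rather than 3.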
cat("Mean of children:", round(mean(df_insurance$children), 4), "\n")
## Mean of children: 1.0949
cat("Var of children:", round(var(df_insurance$children), 4), "\n")
## Var of children: 1.4532
cat("Dispersion ratio :", round(var(df_insurance$children)/mean(df_insurance$children), 4), "\n")
## Dispersion ratio : 1.3272
ggplot(df_insurance, aes(x=children)) +
geom_bar(fill="#2166ac", colour="white") +
theme_bw() +
labs(title="Distribution of number of children", x="Children", y="Count")
m_pois <- glm(children ~ age + sex + bmi + smoker + region,
family=poisson(link="log"),
data=df_insurance)
summary(m_pois)
##
## Call:
## glm(formula = children ~ age + sex + bmi + smoker + region, family = poisson(link = "log"),
##     data = df_insurance)
##
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)     -0.178742   0.156281  -1.144   0.2527
## age              0.003231   0.001874   1.724   0.0846 .
## sexmale          0.037921   0.052505   0.722   0.4702
## bmi              0.002430   0.004521   0.538   0.5909
## smokeryes        0.027626   0.064748   0.427   0.6696
## regionnorthwest  0.093801   0.075061   1.250   0.2114
## regionsoutheast -0.007951   0.077049  -0.103   0.9178
## regionsouthwest  0.083873   0.075436   1.112   0.2662
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
##     Null deviance: 2001.6  on 1337  degrees of freedom
## Residual deviance: 1994.5  on 1330  degrees of freedom
## AIC: 3899.8
##
## Number of Fisher Scoring iterations: 5
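An informal overdispersion check is the ratio of residual deviance to residual degrees of freedom; values substantially above 1 suggest overdispersion. This is a heuristic rather than a formal test, and the 1.5 cut-off used below is a convention, not a critical value. The Pearson \(\chi^2\) statistic divided by the residual degrees of freedom is a common alternative.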
dispersion_ratio <- m_pois$deviance / m_pois$df.residual
cat("Residual deviance / df:", round(dispersion_ratio, 4), "\n")
## Residual deviance / df: 1.4996
cat("Interpretation: ratio", ifelse(dispersion_ratio > 1.5, "> 1.5 — overdispersion likely", "≤ 1.5 — equidispersion plausible"), "\n")
## Interpretation: ratio ≤ 1.5 — equidispersion plausible
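The Pearson-based dispersion estimate can be computed in the same way as the deviance ratio. The sketch below uses simulated, deliberately overdispersed counts (an assumption; it does not use `df_insurance`) and shows that the quasi-Poisson family reports the same Pearson estimate.

```r
set.seed(99)
n <- 2000
x <- rnorm(n)
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 1.5)   # overdispersed by construction

fit <- glm(y ~ x, family = poisson(link = "log"))

# Pearson dispersion: sum of squared Pearson residuals / residual df
phi_pearson  <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
phi_deviance <- deviance(fit) / df.residual(fit)

# Quasi-Poisson keeps the coefficients and rescales standard errors by sqrt(phi)
fit_quasi <- glm(y ~ x, family = quasipoisson(link = "log"))

c(pearson = phi_pearson, deviance = phi_deviance,
  quasipoisson = summary(fit_quasi)$dispersion)
```

When the two ratios disagree, the Pearson version is usually the one reported, since the deviance ratio can be misleading for counts with small means.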
m_nb <- glm.nb(children ~ age + sex + bmi + smoker + region,
data=df_insurance)
summary(m_nb)
##
## Call:
## glm.nb(formula = children ~ age + sex + bmi + smoker + region,
##     data = df_insurance, init.theta = 2.596859124, link = log)
##
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)     -0.184715   0.185935  -0.993    0.320
## age              0.003570   0.002235   1.597    0.110
## sexmale          0.039297   0.062625   0.628    0.530
## bmi              0.002196   0.005389   0.407    0.684
## smokeryes        0.025660   0.077409   0.331    0.740
## regionnorthwest  0.092149   0.089517   1.029    0.303
## regionsoutheast -0.008422   0.091333  -0.092    0.927
## regionsouthwest  0.083844   0.089907   0.933    0.351
##
## (Dispersion parameter for Negative Binomial(2.5969) family taken to be 1)
##
##     Null deviance: 1507.2  on 1337  degrees of freedom
## Residual deviance: 1502.0  on 1330  degrees of freedom
## AIC: 3834.6
##
## Number of Fisher Scoring iterations: 1
##
##
##               Theta:  2.597
##           Std. Err.:  0.412
##
##  2 x log-likelihood:  -3816.557
cat("\nEstimated dispersion parameter (theta):", round(m_nb$theta, 4), "\n")
## ## Estimated dispersion parameter (theta): 2.5969
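Because the Poisson model is the limiting case of the NB as \(\alpha \to 0\) (equivalently \(\theta = 1/\alpha \to \infty\), where \(\theta\) is the parameter reported by glm.nb), the two can be compared with a likelihood ratio test. The null value lies on the boundary of the parameter space, so the usual \(\chi^2_1\) p-value is conservative and is conventionally halved. The deviances of the two fits are computed under different families and are not comparable, so the statistic is built from the log-likelihoods rather than with anova().
lrt_nb_stat <- as.numeric(2 * (logLik(m_nb) - logLik(m_pois)))
lrt_nb_p <- pchisq(lrt_nb_stat, df=1, lower.tail=FALSE) / 2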
cat("AIC Poisson :", round(AIC(m_pois), 2), "\n")
## AIC Poisson : 3899.8
cat("AIC Negative Binom :", round(AIC(m_nb), 2), "\n")
## AIC Negative Binom : 3834.56
cat("Log-likelihood Pois:", round(logLik(m_pois), 2), "\n")
## Log-likelihood Pois: -1941.9
cat("Log-likelihood NB :", round(logLik(m_nb), 2), "\n")
## Log-likelihood NB : -1908.28
irr_table <- cbind(
IRR = exp(coef(m_nb)),
exp(confint(m_nb))
)
round(irr_table, 3)
##                   IRR 2.5 % 97.5 %
## (Intercept)     0.831 0.576  1.197
## age             1.004 0.999  1.008
## sexmale         1.040 0.920  1.176
## bmi             1.002 0.992  1.013
## smokeryes       1.026 0.881  1.193
## regionnorthwest 1.097 0.920  1.307
## regionsoutheast 0.992 0.829  1.186
## regionsouthwest 1.087 0.912  1.297
nd3 <- expand.grid(
age = seq(18, 64, length.out=100),
sex = "female",
bmi = mean(df_insurance$bmi),
smoker = c("no","yes"),
region = "northeast"
)
nd3$predicted <- predict(m_nb, newdata=nd3, type="response")
ggplot(nd3, aes(x=age, y=predicted, colour=smoker)) +
geom_line(linewidth=1) +
scale_colour_manual(values=c("#2166ac","#d6604d")) +
theme_bw() +
labs(title="Predicted number of children by age and smoking status",
subtitle="Negative Binomial GLM — marginal at mean BMI, female, northeast",
x="Age", y="Predicted count", colour="Smoker")
The Titanic evacuation followed a “women and children first”
protocol, motivating a sex × pclass interaction: the
survival advantage of female passengers may differ by class.
m_titanic_null <- glm(survived ~ 1,
family=binomial(link="logit"),
data=df_titanic)
m_titanic_main <- glm(survived ~ sex + pclass + age,
family=binomial(link="logit"),
data=df_titanic)
m_titanic_int <- glm(survived ~ sex * pclass + age,
family=binomial(link="logit"),
data=df_titanic)
anova(m_titanic_main, m_titanic_int, test="LRT")
## Analysis of Deviance Table
##
## Model 1: survived ~ sex + pclass + age
## Model 2: survived ~ sex * pclass + age
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
## 1      1041     982.45
## 2      1039     931.99  2   50.464 1.101e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m_titanic_int)
##
## Call:
## glm(formula = survived ~ sex * pclass + age, family = binomial(link = "logit"),
##     data = df_titanic)
##
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)
## (Intercept)        4.804345   0.546937   8.784  < 2e-16 ***
## sexmale           -3.886389   0.492375  -7.893 2.95e-15 ***
## pclass2nd         -1.529875   0.566481  -2.701  0.00692 **
## pclass3rd         -4.064965   0.510661  -7.960 1.72e-15 ***
## age               -0.038401   0.006743  -5.695 1.23e-08 ***
## sexmale:pclass2nd -0.070404   0.630978  -0.112  0.91116
## sexmale:pclass3rd  2.488808   0.540042   4.609 4.05e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 1414.62  on 1045  degrees of freedom
## Residual deviance:  931.99  on 1039  degrees of freedom
## AIC: 945.99
##
## Number of Fisher Scoring iterations: 5
round(cbind(OR=exp(coef(m_titanic_int)), exp(confint(m_titanic_int))), 3)
##                        OR  2.5 %  97.5 %
## (Intercept)       122.040 45.331 399.049
## sexmale             0.021  0.007   0.049
## pclass2nd           0.217  0.065   0.630
## pclass3rd           0.017  0.006   0.043
## age                 0.962  0.949   0.975
## sexmale:pclass2nd   0.932  0.279   3.443
## sexmale:pclass3rd  12.047  4.498  38.727
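With an interaction in the model, the sexmale odds ratio applies only to the reference class (1st). The male-versus-female odds ratio within each class is obtained by adding the relevant interaction coefficient before exponentiating, as this short worked example with the reported estimates shows.

```r
# Coefficients copied from the summary(m_titanic_int) output
b_male     <- -3.886389   # sexmale: male vs female in 1st class (reference)
b_male_2nd <- -0.070404   # sexmale:pclass2nd
b_male_3rd <-  2.488808   # sexmale:pclass3rd

# Male-vs-female odds ratio within each class = exp(main effect + interaction)
or_male <- c(
  `1st` = exp(b_male),
  `2nd` = exp(b_male + b_male_2nd),
  `3rd` = exp(b_male + b_male_3rd)
)
round(or_male, 3)
```

The odds of survival for men relative to women are about 0.02 in 1st and 2nd class but about 0.25 in 3rd class, so the female advantage is much smaller in 3rd class. Because age does not interact with sex or class, these ratios hold at any fixed age.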
emm <- emmeans(m_titanic_int, ~ sex * pclass, type="response")
emm_df <- as.data.frame(emm)
ggplot(emm_df, aes(x=pclass, y=prob, colour=sex, group=sex)) +
geom_point(size=3) +
geom_line(linewidth=1) +
geom_errorbar(aes(ymin=asymp.LCL, ymax=asymp.UCL), width=0.15) +
scale_colour_manual(values=c("#d6604d","#2166ac")) +
theme_bw() +
labs(title="Estimated marginal survival probabilities",
subtitle="Titanic — sex × passenger class interaction (95% CI)",
x="Passenger class", y="P(survived = 1)", colour="Sex")
pred_titanic <- predict(m_titanic_int, type="response")
roc_titanic <- roc(df_titanic$survived, pred_titanic, quiet=TRUE)
plot(roc_titanic, col="#d6604d", lwd=2,
main=sprintf("ROC Curve — Titanic Survival (AUC = %.3f)", auc(roc_titanic)))
abline(a=0, b=1, lty=2, col="grey60")
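AIC penalises model complexity as \(-2\ell + 2k\); BIC uses \(-2\ell + k\ln n\), a heavier penalty than AIC whenever \(\ln n > 2\) (that is, \(n \ge 8\)), so it favours more parsimonious models. Both criteria are only comparable between models fitted to the same response and the same observations: in the table below, compare the two Gaussian models with each other, the Poisson with the Negative Binomial, and the two Titanic models with each other, but not across datasets. The k column counts regression coefficients only; AIC() and BIC() additionally count the Gaussian residual variance and the Negative Binomial \(\theta\).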
models <- list(
"Gaussian — main effects" = m_gauss_main,
"Gaussian — + smoker×bmi" = m_gauss_full,
"Logistic — admission" = m_admit_main,
"Poisson — children" = m_pois,
"Neg. Binom — children" = m_nb,
"Logistic — Titanic main" = m_titanic_main,
"Logistic — Titanic + int." = m_titanic_int
)
aic_df <- data.frame(
Model = names(models),
k = sapply(models, function(m) length(coef(m))),
LogLik = sapply(models, function(m) round(as.numeric(logLik(m)), 2)),
AIC = sapply(models, function(m) round(AIC(m), 2)),
BIC = sapply(models, function(m) round(BIC(m), 2)),
stringsAsFactors = FALSE
)
knitr::kable(aic_df, row.names=FALSE)
|Model                      |  k|   LogLik|     AIC|     BIC|
|:--------------------------|--:|--------:|-------:|-------:|
|Gaussian — main effects    |  9|  -808.52| 1637.03| 1689.02|
|Gaussian — + smoker×bmi    | 10|  -762.05| 1546.11| 1603.29|
|Logistic — admission       |  6|  -229.26|  470.52|  494.47|
|Poisson — children         |  8| -1941.90| 3899.80| 3941.39|
|Neg. Binom — children      |  8| -1908.28| 3834.56| 3881.35|
|Logistic — Titanic main    |  5|  -491.23|  992.45| 1017.22|
|Logistic — Titanic + int.  |  7|  -465.99|  945.99|  980.66|
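The AIC and BIC formulas can be verified by hand for one row of the table. The sketch below uses the logistic admission model, taking the log-likelihood and k from the table and n = 400 from the null degrees of freedom (399 + 1).

```r
# Values copied from the logistic admission row and its model output
loglik <- -229.26
k      <- 6      # estimated parameters (binomial: no extra dispersion parameter)
n      <- 400    # observations

aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + k * log(n)
round(c(AIC = aic, BIC = bic), 2)

# BIC's per-parameter penalty log(n) exceeds AIC's 2 once n >= 8
c(log(7) > 2, log(8) > 2)
```

Both values match the table (470.52 and 494.47). For the Gaussian and Negative Binomial rows the same arithmetic needs k + 1 parameters to reproduce the reported AIC.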
| Assumption | Gaussian GLM | Logistic GLM | Poisson GLM |
|---|---|---|---|
| Correct distributional family | Residuals ~ Normal | Binary outcome | Non-negative integer |
| Linear relationship on link scale | Partial residual plots | Log-odds linearity | Log-mean linearity |
| Independence of observations | Design assumption | Design assumption | Design assumption |
| No influential outliers | Cook’s distance, leverage | Cook’s distance | Cook’s distance |
| No multicollinearity | VIF < 5 | VIF < 5 | VIF < 5 |
| Equidispersion | — | — | Deviance / df ≈ 1 |
par(mfrow=c(1,3))
plot(m_gauss_full, which=1, main="Gaussian GLM — residuals vs fitted")
plot(m_admit_main, which=1, main="Logistic GLM — residuals vs fitted")
plot(m_nb, which=1, main="Neg. Binom — residuals vs fitted")
par(mfrow=c(1,1))
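The influence diagnostics named in the table are available for any `glm` fit through `cooks.distance()` and `hatvalues()`. The following sketch uses simulated logistic data with one planted outlier; the data and the 4/n cut-off are assumptions for illustration (the cut-off is a common rule of thumb, not a formal test).

```r
set.seed(2024)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))

# Plant one hypothetical outlier: extreme x with the unexpected outcome
x[1] <- 6
y[1] <- 0

fit <- glm(y ~ x, family = binomial(link = "logit"))

cook <- cooks.distance(fit)   # influence of each observation on the fit
lev  <- hatvalues(fit)        # leverage of each observation

# Rule-of-thumb cut-off: flag observations with Cook's distance above 4 / n
flagged <- which(cook > 4 / n)
round(c(cook = cook[[1]], leverage = lev[[1]]), 3)
head(sort(cook, decreasing = TRUE), 3)
```

Flagged observations are candidates for inspection, not automatic removal; refitting without them shows how much the coefficients depend on those points.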
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society: Series A, 135(3), 370–384.
McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Chapman & Hall.
Agresti, A. (2015). Foundations of Linear and Generalized Linear Models. Wiley.
Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S (4th ed.). Springer.