Logistic Regression
Model Checking with Deviance & Influence Diagnostics
fit <- glm(yes/(yes+no) ~ gender + race,
weights = yes + no,
family = binomial,
data = df)
summary(fit)
##
## Call:
## glm(formula = yes/(yes + no) ~ gender + race, family = binomial,
## data = df, weights = yes + no)
##
## Deviance Residuals:
## 1 2 3 4
## -0.04513 0.04402 0.17321 -0.15493
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.83035 0.16854 -4.927 8.37e-07 ***
## gendermale 0.20261 0.08519 2.378 0.01739 *
## racewhite 0.44374 0.16766 2.647 0.00813 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 12.752784 on 3 degrees of freedom
## Residual deviance: 0.057982 on 1 degrees of freedom
## AIC: 30.414
##
## Number of Fisher Scoring iterations: 3
By default when a glm object is summarized, R provides deviance residuals.
# Residual Deviance
fit$deviance
## [1] 0.05798151
# P-value for Deviance Goodness of Fit Test
1 - pchisq(fit$deviance, fit$df.residual)
## [1] 0.8097152
# Estimated Probabilities of Marijuana Use
fitted(fit)
## 1 2 3 4
## 0.4045330 0.4541297 0.3035713 0.3480244
Within Contingency Tables
\(n_{i}:\) count at setting i of explanatory variables
\(\hat{\pi_{i}}:\) model estimated probability of success
\(n_{i}\hat{\pi_{i}}:\) fitted number of successes (estimated binomial mean)
\(n_{i}(1 - \hat{\pi_{i}}):\) fitted number of failures
n <- (df$yes + df$no)
fit.yes <- n*fitted(fit)
fit.no <- n*(1 - fitted(fit))
df <- data.frame(Race = df$race,
Gender = df$gender,
Yes = df$yes,
No = df$no,
n = n,
Fit.Yes = round(fit.yes, 2),
Fit.No = round(fit.no, 2))
kable(df)
| white |
female |
420 |
620 |
1040 |
420.71 |
619.29 |
| white |
male |
483 |
579 |
1062 |
482.29 |
579.71 |
| other |
female |
25 |
55 |
80 |
24.29 |
55.71 |
| other |
male |
32 |
62 |
94 |
32.71 |
61.29 |
Residual Analysis
Standardized vs Pearson vs Deviance Residuals
t <- summary(fit)
res <- data.frame(Standardized = rstandard(fit, type = "pearson"),
Pearson = residuals(fit, type = "pearson"),
Deviance = residuals(fit, type = "deviance"),
StdDevRes = rstandard(fit, type = "deviance"),
row.names = c("White Females", "White Males", "Other Females", "Other Males"))
kable(res)
| White Females |
-0.2409612 |
-0.0451288 |
-0.0451329 |
-0.2409831 |
| White Males |
0.2409612 |
0.0440229 |
0.0440211 |
0.2409512 |
| Other Females |
0.2409612 |
0.1736850 |
0.1732133 |
0.2403069 |
| Other Males |
-0.2409612 |
-0.1546648 |
-0.1549316 |
-0.2413769 |
# Residual Degrees of Freedom
fit$df.residual
## [1] 1
Standardized residuals are preferred as they “appropriately recognize parameter redundancies” according to Agresti