Logistic Regression

Model Checking with Deviance & Influence Diagnostics

fit <- glm(yes/(yes+no) ~ gender + race, 
           weights = yes + no, 
           family = binomial, 
           data = df)

summary(fit)
## 
## Call:
## glm(formula = yes/(yes + no) ~ gender + race, family = binomial, 
##     data = df, weights = yes + no)
## 
## Deviance Residuals: 
##        1         2         3         4  
## -0.04513   0.04402   0.17321  -0.15493  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.83035    0.16854  -4.927 8.37e-07 ***
## gendermale   0.20261    0.08519   2.378  0.01739 *  
## racewhite    0.44374    0.16766   2.647  0.00813 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 12.752784  on 3  degrees of freedom
## Residual deviance:  0.057982  on 1  degrees of freedom
## AIC: 30.414
## 
## Number of Fisher Scoring iterations: 3

By default when a glm object is summarized, R provides deviance residuals.

# Residual Deviance
fit$deviance
## [1] 0.05798151
# P-value for Deviance Goodness of Fit Test
1 - pchisq(fit$deviance, fit$df.residual)
## [1] 0.8097152
# Estimated Probabilities of Marijuana Use
fitted(fit)
##         1         2         3         4 
## 0.4045330 0.4541297 0.3035713 0.3480244

Within Contingency Tables

\(n_{i}:\) count at setting i of explanatory variables
\(\hat{\pi_{i}}:\) model estimated probability of success
\(n_{i}\hat{\pi_{i}}:\) fitted number of successes (estimated binomial mean)
\(n_{i}(1 - \hat{\pi_{i}}):\) fitted number of failures

n <- (df$yes + df$no)
fit.yes <- n*fitted(fit)
fit.no <- n*(1 - fitted(fit))
df <- data.frame(Race = df$race, 
                 Gender = df$gender,
                 Yes = df$yes,
                 No = df$no, 
                 n = n,
                 Fit.Yes = round(fit.yes, 2), 
                 Fit.No = round(fit.no, 2))
kable(df)
Race Gender Yes No n Fit.Yes Fit.No
white female 420 620 1040 420.71 619.29
white male 483 579 1062 482.29 579.71
other female 25 55 80 24.29 55.71
other male 32 62 94 32.71 61.29

Residual Analysis

Standardized vs Pearson vs Deviance Residuals

t <- summary(fit)
res <- data.frame(Standardized = rstandard(fit, type = "pearson"),
                  Pearson = residuals(fit, type = "pearson"),
                  Deviance = residuals(fit, type = "deviance"),
                  StdDevRes = rstandard(fit, type = "deviance"),
                  row.names = c("White Females", "White Males", "Other Females", "Other Males"))
kable(res)
Standardized Pearson Deviance StdDevRes
White Females -0.2409612 -0.0451288 -0.0451329 -0.2409831
White Males 0.2409612 0.0440229 0.0440211 0.2409512
Other Females 0.2409612 0.1736850 0.1732133 0.2403069
Other Males -0.2409612 -0.1546648 -0.1549316 -0.2413769
# Residual Degrees of Freedom
fit$df.residual
## [1] 1

Standardized residuals are preferred as they “appropriately recognize parameter redundancies” according to Agresti