Logistic Regression

Model Checking with Deviance & Influence Diagnostics

fit <- glm(yes/(yes+no) ~ gender + race, 
           weights = yes + no, 
           family = binomial, 
           data = df)

summary(fit)

## 
## Call:
## glm(formula = yes/(yes + no) ~ gender + race, family = binomial, 
##     data = df, weights = yes + no)
## 
## Deviance Residuals: 
##        1         2         3         4  
## -0.04513   0.04402   0.17321  -0.15493  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.83035    0.16854  -4.927 8.37e-07 ***
## gendermale   0.20261    0.08519   2.378  0.01739 *  
## racewhite    0.44374    0.16766   2.647  0.00813 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 12.752784  on 3  degrees of freedom
## Residual deviance:  0.057982  on 1  degrees of freedom
## AIC: 30.414
## 
## Number of Fisher Scoring iterations: 3

By default when a glm object is summarized, R provides deviance residuals.

# Residual Deviance
fit$deviance

## [1] 0.05798151

# P-value for Deviance Goodness of Fit Test
1 - pchisq(fit$deviance, fit$df.residual)

## [1] 0.8097152

# Estimated Probabilities of Marijuana Use
fitted(fit)

##         1         2         3         4 
## 0.4045330 0.4541297 0.3035713 0.3480244

Within Contingency Tables

\(n_{i}:\) count at setting i of explanatory variables
\(\hat{\pi_{i}}:\) model estimated probability of success
\(n_{i}\hat{\pi_{i}}:\) fitted number of successes (estimated binomial mean)
\(n_{i}(1 - \hat{\pi_{i}}):\) fitted number of failures

n <- (df$yes + df$no)
fit.yes <- n*fitted(fit)
fit.no <- n*(1 - fitted(fit))
df <- data.frame(Race = df$race, 
                 Gender = df$gender,
                 Yes = df$yes,
                 No = df$no, 
                 n = n,
                 Fit.Yes = round(fit.yes, 2), 
                 Fit.No = round(fit.no, 2))
kable(df)

Race	Gender	Yes	No	n	Fit.Yes	Fit.No
white	female	420	620	1040	420.71	619.29
white	male	483	579	1062	482.29	579.71
other	female	25	55	80	24.29	55.71
other	male	32	62	94	32.71	61.29

Residual Analysis

Standardized vs Pearson vs Deviance Residuals

t <- summary(fit)
res <- data.frame(Standardized = rstandard(fit, type = "pearson"),
                  Pearson = residuals(fit, type = "pearson"),
                  Deviance = residuals(fit, type = "deviance"),
                  StdDevRes = rstandard(fit, type = "deviance"),
                  row.names = c("White Females", "White Males", "Other Females", "Other Males"))
kable(res)

	Standardized	Pearson	Deviance	StdDevRes
White Females	-0.2409612	-0.0451288	-0.0451329	-0.2409831
White Males	0.2409612	0.0440229	0.0440211	0.2409512
Other Females	0.2409612	0.1736850	0.1732133	0.2403069
Other Males	-0.2409612	-0.1546648	-0.1549316	-0.2413769

# Residual Degrees of Freedom
fit$df.residual