Slides for Week 10 are available here.
For the lab, we will be using the code materials from this book: https://www.econometrics-with-r.org/11-2-palr.html
Install and load all necessary libraries
# install.packages("AER")   # uncomment if the package is not installed yet
library(stargazer)   # regression tables
library(car)         # hypothesis tests
library(sandwich)    # heteroskedasticity-robust covariance estimators
# load the `AER` package and attach the HMDA data
library(AER)
data(HMDA)
Exploratory Data Analysis and Variable Transformation
# inspect the first few observations
head(HMDA)
##   deny pirat hirat     lvrat chist mhist phist unemp selfemp insurance condomin
## 1   no 0.221 0.221 0.8000000     5     2    no   3.9      no        no       no
## 2   no 0.265 0.265 0.9218750     2     2    no   3.2      no        no       no
## 3   no 0.372 0.248 0.9203980     1     2    no   3.2      no        no       no
## 4   no 0.320 0.250 0.8604651     1     2    no   4.3      no        no       no
## 5   no 0.360 0.350 0.6000000     1     1    no   3.2      no        no       no
## 6   no 0.240 0.170 0.5105263     1     1    no   3.9      no        no       no
##   afam single hschool
## 1   no     no     yes
## 2   no    yes     yes
## 3   no     no     yes
## 4   no     no     yes
## 5   no     no     yes
## 6   no     no     yes
# rename the variable 'afam' for consistency
colnames(HMDA)[colnames(HMDA) == "afam"] <- "black"
summary(HMDA)
##   deny          pirat            hirat            lvrat        chist
##  no :2095   Min.   :0.0000   Min.   :0.0000   Min.   :0.0200   1:1353
##  yes: 285   1st Qu.:0.2800   1st Qu.:0.2140   1st Qu.:0.6527   2: 441
##             Median :0.3300   Median :0.2600   Median :0.7795   3: 126
##             Mean   :0.3308   Mean   :0.2553   Mean   :0.7378   4:  77
##             3rd Qu.:0.3700   3rd Qu.:0.2988   3rd Qu.:0.8685   5: 182
##             Max.   :3.0000   Max.   :3.0000   Max.   :1.9500   6: 201
##  mhist    phist          unemp        selfemp    insurance  condomin
##  1: 747   no :2205   Min.   : 1.800   no :2103   no :2332   no :1694
##  2:1571   yes: 175   1st Qu.: 3.100   yes: 277   yes:  48   yes: 686
##  3:  41              Median : 3.200
##  4:  21              Mean   : 3.774
##                      3rd Qu.: 3.900
##                      Max.   :10.600
##   black       single     hschool
##  no :2041   no :1444   no :  39
##  yes: 339   yes: 936   yes:2341
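Note that the dependent variable deny is stored as a factor with levels no and yes. With family = binomial, glm() treats the first level ("no") as the baseline, so all models below describe the probability that deny equals "yes". A quick check (this snippet is an addition to the book's code):
# glm() models the probability of the second factor level, i.e. P(deny = "yes")
levels(HMDA$deny)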
We now estimate a simple Probit model of the probability of a mortgage denial.
# estimate the simple probit model
denyprobit <- glm(deny ~ pirat,
                  family = binomial(link = "probit"),
                  data = HMDA)
coeftest(denyprobit, vcov. = vcovHC, type = "HC1")
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.19415 0.18901 -11.6087 < 2.2e-16 ***
## pirat 2.96787 0.53698 5.5269 3.259e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We use predict() to compute the predicted change in the denial probability when the P/I ratio increases from 0.3 to 0.4.
# 1. compute predictions for P/I ratio = 0.3, 0.4
predictions <- predict(denyprobit,
                       newdata = data.frame("pirat" = c(0.3, 0.4)),
                       type = "response")
# 2. Compute difference in probabilities
diff(predictions)
## 2
## 0.06081433
We find that an increase in the payment-to-income ratio from 0.3 to 0.4 is predicted to increase the probability of denial by approximately 6.1%.
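The same number can be computed by hand from the estimated coefficients, since the probit model specifies P(deny = 1 | P/I ratio) = Φ(β0 + β1 × P/I ratio). The short check below is an addition to the book's code:
# reproduce the predicted change using pnorm(), the standard normal cdf
b_probit <- coef(denyprobit)
pnorm(b_probit[1] + b_probit[2] * 0.4) - pnorm(b_probit[1] + b_probit[2] * 0.3)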
We continue by using an augmented Probit model to estimate the effect of race on the probability of a mortgage application denial.
denyprobit2 <- glm(deny ~ pirat + black,
                   family = binomial(link = "probit"),
                   data = HMDA)
coeftest(denyprobit2, vcov. = vcovHC, type = "HC1")
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.258787 0.176608 -12.7898 < 2.2e-16 ***
## pirat 2.741779 0.497673 5.5092 3.605e-08 ***
## blackyes 0.708155 0.083091 8.5227 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimated regression function is \[\begin{align} \widehat{P(deny\vert P/I \ ratio, black)} = \Phi (-\underset{(0.18)}{2.26} + \underset{(0.50)}{2.74} P/I \ ratio + \underset{(0.08)}{0.71} black). \tag{11.6} \end{align}\]
All coefficients are highly significant, and the estimated coefficients on both the payments-to-income ratio and the indicator black are positive. As before, the coefficients are difficult to interpret directly, but they indicate that, first, African American applicants have a higher probability of denial than white applicants, holding the payments-to-income ratio constant, and second, applicants with a high payments-to-income ratio face a higher risk of being rejected.
How big is the estimated difference in denial probabilities between two hypothetical applicants who differ in race but have the same payments-to-income ratio? As before, we use predict() to compute this difference.
# 1. compute predictions for P/I ratio = 0.3
predictions <- predict(denyprobit2,
                       newdata = data.frame("black" = c("no", "yes"),
                                            "pirat" = c(0.3, 0.3)),
                       type = "response")
# 2. compute difference in probabilities
diff(predictions)
## 2
## 0.1578117
In this case, the estimated difference in denial probabilities is about 15.8%.
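Because the probit model is nonlinear, this race gap is not a constant: it depends on the point at which it is evaluated. As a hypothetical illustration that is not part of the book's code, we can recompute the gap at a P/I ratio of 0.4; in general the result will differ from the 15.8% found above, showing that effects in probit models vary with the covariate values.
# black/white difference in predicted denial probabilities at a P/I ratio of 0.4
predictions_04 <- predict(denyprobit2,
                          newdata = data.frame("black" = c("no", "yes"),
                                               "pirat" = c(0.4, 0.4)),
                          type = "response")
diff(predictions_04)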
It is fairly easy to estimate a Logit regression model using R.
denylogit <- glm(deny ~ pirat,
                 family = binomial(link = "logit"),
                 data = HMDA)
coeftest(denylogit, vcov. = vcovHC, type = "HC1")
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.02843 0.35898 -11.2218 < 2.2e-16 ***
## pirat 5.88450 1.00015 5.8836 4.014e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
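As in the probit case, fitted logit probabilities can be reproduced by hand, here with the logistic cdf plogis(). This check is an addition to the book's code:
# hand-computed logit probability of denial at a P/I ratio of 0.3
b_logit <- coef(denylogit)
plogis(b_logit[1] + b_logit[2] * 0.3)   # same as 1 / (1 + exp(-(b_logit[1] + b_logit[2] * 0.3)))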
# plot data (deny is converted from a factor to 0/1 for plotting)
plot(x = HMDA$pirat,
     y = as.numeric(HMDA$deny) - 1,
     main = "Probit and Logit Models of the Probability of Denial, Given P/I Ratio",
     xlab = "P/I ratio",
     ylab = "Deny",
     pch = 20,
     ylim = c(-0.4, 1.4),
     cex.main = 0.9)

# add horizontal dashed lines and text
abline(h = 1, lty = 2, col = "darkred")
abline(h = 0, lty = 2, col = "darkred")
text(2.5, 0.9, cex = 0.8, "Mortgage denied")
text(2.5, -0.1, cex = 0.8, "Mortgage approved")

# add estimated regression lines of the Probit and Logit models
x <- seq(0, 3, 0.01)
y_probit <- predict(denyprobit, list(pirat = x), type = "response")
y_logit <- predict(denylogit, list(pirat = x), type = "response")

lines(x, y_probit, lwd = 1.5, col = "steelblue")
lines(x, y_logit, lwd = 1.5, col = "black", lty = 2)

# add a legend
legend("topleft",
       horiz = TRUE,
       legend = c("Probit", "Logit"),
       col = c("steelblue", "black"),
       lty = c(1, 2))
Both models produce very similar estimates of the probability that a mortgage application will be denied, given the applicant's payment-to-income ratio.
Following the book, we extend the simple Logit model of mortgage denial with the additional regressor black.
# estimate a Logit regression with multiple regressors
denylogit2 <- glm(deny ~ pirat + black,
                  family = binomial(link = "logit"),
                  data = HMDA)
coeftest(denylogit2, vcov. = vcovHC, type = "HC1")
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.12556 0.34597 -11.9245 < 2.2e-16 ***
## pirat 5.37036 0.96376 5.5723 2.514e-08 ***
## blackyes 1.27278 0.14616 8.7081 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As for the Probit model, all coefficients are highly significant and we obtain positive estimates for the coefficients on P/I ratio and black. For comparison, we compute the predicted probability of denial for two hypothetical applicants that differ in race and have a P/I ratio of 0.3.
# 1. compute predictions for P/I ratio = 0.3
predictions <- predict(denylogit2,
                       newdata = data.frame("black" = c("no", "yes"),
                                            "pirat" = c(0.3, 0.3)),
                       type = "response")
summary(predictions)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.07485 0.11218 0.14950 0.14950 0.18682 0.22415
# 2. compute difference in probabilities
diff(predictions)
## 2
## 0.1492945
We find that the white applicant faces a denial probability of only 7.5%, while the African American applicant is rejected with a probability of 22.4%, a difference of about 14.9 percentage points.
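Because this is a logit model, the coefficient on black is a log odds ratio: exponentiating it gives the factor by which the odds of denial are multiplied for an African American applicant, holding the P/I ratio fixed. This step is an addition to the book's code:
# odds ratio implied by the logit coefficient on black
exp(coef(denylogit2)["blackyes"])
With the estimate of about 1.27, the implied odds ratio is roughly 3.6: holding the P/I ratio fixed, the odds of denial for an African American applicant are about 3.6 times those of a white applicant.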
Before turning to the full model specifications, we compute sample means and proportions for the variables used in the regressions below.
# mean P/I ratio
mean(HMDA$pirat)
## [1] 0.3308136
# mean housing expense-to-income ratio
mean(HMDA$hirat)
## [1] 0.2553461
# mean loan-to-value ratio
mean(HMDA$lvrat)
## [1] 0.7377759
# mean consumer credit score
mean(as.numeric(HMDA$chist))
## [1] 2.116387
# mean mortgage credit score
mean(as.numeric(HMDA$mhist))
## [1] 1.721008
# share of applicants with a public bad credit record
mean(as.numeric(HMDA$phist) - 1)
## [1] 0.07352941
prop.table(table(HMDA$insurance))
##
##         no        yes
## 0.97983193 0.02016807
prop.table(table(HMDA$selfemp))
##
##        no       yes
## 0.8836134 0.1163866
prop.table(table(HMDA$single))
##
##        no       yes
## 0.6067227 0.3932773
prop.table(table(HMDA$hschool))
##
##         no        yes
## 0.01638655 0.98361345
# mean unemployment rate
mean(HMDA$unemp)
## [1] 3.774496
prop.table(table(HMDA$condomin))
##
##        no       yes
## 0.7117647 0.2882353
prop.table(table(HMDA$black))
##
##       no      yes
## 0.857563 0.142437
prop.table(table(HMDA$deny))
##
##        no       yes
## 0.8802521 0.1197479
Before estimating the models we transform the loan-to-value ratio (lvrat) into a factor variable, where \[\begin{align*} lvrat = \begin{cases} \text{low} & \text{if} \ \ lvrat < 0.8, \\ \text{medium} & \text{if} \ \ 0.8 \leq lvrat \leq 0.95, \\ \text{high} & \text{if} \ \ lvrat > 0.95 \end{cases} \end{align*}\] and convert both credit scores to numeric variables.
# define low, medium and high loan-to-value ratio
HMDA$lvrat <- factor(
  ifelse(HMDA$lvrat < 0.8, "low",
         ifelse(HMDA$lvrat >= 0.8 & HMDA$lvrat <= 0.95, "medium", "high")),
  levels = c("low", "medium", "high"))
# convert credit scores to numeric
HMDA$mhist <- as.numeric(HMDA$mhist)
HMDA$chist <- as.numeric(HMDA$chist)
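A quick check (not part of the book's code) confirms that the recode produced the intended categories and that the credit scores are now numeric:
# distribution of the new loan-to-value categories
table(HMDA$lvrat)
# both credit scores should now be of class "numeric"
class(HMDA$mhist)
class(HMDA$chist)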
Next we reproduce the estimation results presented in Table 11.2 of the book.
# estimate all 5 models for the denial probability
logit_HMDA <- glm(deny ~ black + pirat + hirat + lvrat + chist + mhist + phist
                  + insurance + selfemp,
                  family = binomial(link = "logit"),
                  data = HMDA)

probit_HMDA_1 <- glm(deny ~ black + pirat + hirat + lvrat + chist + mhist + phist
                     + insurance + selfemp,
                     family = binomial(link = "probit"),
                     data = HMDA)

probit_HMDA_2 <- glm(deny ~ black + pirat + hirat + lvrat + chist + mhist + phist
                     + insurance + selfemp + single + hschool + unemp,
                     family = binomial(link = "probit"),
                     data = HMDA)

probit_HMDA_3 <- glm(deny ~ black + pirat + hirat + lvrat + chist + mhist
                     + phist + insurance + selfemp + single + hschool + unemp + condomin
                     + I(mhist==3) + I(mhist==4) + I(chist==3) + I(chist==4) + I(chist==5)
                     + I(chist==6),
                     family = binomial(link = "probit"),
                     data = HMDA)

probit_HMDA_4 <- glm(deny ~ black * (pirat + hirat) + lvrat + chist + mhist + phist
                     + insurance + selfemp + single + hschool + unemp,
                     family = binomial(link = "probit"),
                     data = HMDA)
Just as in previous chapters, we store the heteroskedasticity-robust standard errors of the coefficient estimators in a list, which is then used as the se argument in stargazer().
rob_se <- list(sqrt(diag(vcovHC(logit_HMDA, type = "HC1"))),
               sqrt(diag(vcovHC(probit_HMDA_1, type = "HC1"))),
               sqrt(diag(vcovHC(probit_HMDA_2, type = "HC1"))),
               sqrt(diag(vcovHC(probit_HMDA_3, type = "HC1"))),
               sqrt(diag(vcovHC(probit_HMDA_4, type = "HC1"))))
logit_HMDA$AIC <- AIC(logit_HMDA)
probit_HMDA_1$AIC <- AIC(probit_HMDA_1)
probit_HMDA_2$AIC <- AIC(probit_HMDA_2)
probit_HMDA_3$AIC <- AIC(probit_HMDA_3)
probit_HMDA_4$AIC <- AIC(probit_HMDA_4)
logit_HMDA$BIC <- BIC(logit_HMDA)
probit_HMDA_1$BIC <- BIC(probit_HMDA_1)
probit_HMDA_2$BIC <- BIC(probit_HMDA_2)
probit_HMDA_3$BIC <- BIC(probit_HMDA_3)
probit_HMDA_4$BIC <- BIC(probit_HMDA_4)
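Stargazer will report the number of observations together with AIC and BIC for each specification. As a complementary goodness-of-fit measure, a McFadden pseudo-R² can be computed by comparing each model's log-likelihood with that of an intercept-only model. The helper function below (pseudoR2) is a sketch added for this lab and is not part of the book's code:
# McFadden pseudo-R^2: 1 - logLik(model) / logLik(intercept-only model)
pseudoR2 <- function(model) {
  null_model <- update(model, . ~ 1)  # refit with an intercept only, same data and link
  1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))
}
sapply(list(logit_HMDA, probit_HMDA_1, probit_HMDA_2,
            probit_HMDA_3, probit_HMDA_4), pseudoR2)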
# logit model
stargazer(logit_HMDA,
          digits = 3,
          out = "logit.html",
          type = "html",
          header = FALSE,
          title = "Logit Regressions of Mortgage Denial Using the Boston HMDA data",
          dep.var.caption = "Dependent Variable: Mortgage Denial",
          se = rob_se,
          keep.stat = c("n", "aic", "bic"),
          star.char = c("+", "*", "**", "***"),
          star.cutoffs = c(0.1, 0.05, 0.01, 0.001),
          notes = c("(+ p<0.1; * p<0.05; ** p<0.01; *** p<0.001)"),
          notes.append = FALSE,
          model.numbers = FALSE,
          column.labels = c("(1)"))
| Dependent Variable: Mortgage Denial (deny) | (1)                |
|--------------------------------------------|--------------------|
| blackyes                                   | 0.688*** (0.183)   |
| pirat                                      | 4.764*** (1.332)   |
| hirat                                      | -0.109 (1.298)     |
| lvratmedium                                | 0.464** (0.160)    |
| lvrathigh                                  | 1.495*** (0.325)   |
| chist                                      | 0.290*** (0.039)   |
| mhist                                      | 0.279* (0.138)     |
| phistyes                                   | 1.226*** (0.203)   |
| insuranceyes                               | 4.548*** (0.576)   |
| selfempyes                                 | 0.666** (0.214)    |
| Constant                                   | -5.707*** (0.484)  |
| Observations                               | 2,380              |
| Akaike Inf. Crit.                          | 1,293.273          |
| Bayesian Inf. Crit.                        | 1,356.797          |
Note: + p<0.1; * p<0.05; ** p<0.01; *** p<0.001
# probit models
stargazer(probit_HMDA_1, probit_HMDA_2, probit_HMDA_3, probit_HMDA_4,
          digits = 3,
          out = "probit.html",
          type = "html",
          header = FALSE,
          title = "Probit Regressions of Mortgage Denial Using the Boston HMDA data",
          dep.var.caption = "Dependent Variable: Mortgage Denial",
          se = rob_se[2:5],
          keep.stat = c("n", "aic", "bic"),
          star.char = c("+", "*", "**", "***"),
          star.cutoffs = c(0.1, 0.05, 0.01, 0.001),
          notes = c("(+ p<0.1; * p<0.05; ** p<0.01; *** p<0.001)"),
          notes.append = FALSE,
          model.numbers = FALSE,
          column.labels = c("(2)", "(3)", "(4)", "(5)"))
| Dependent Variable: Mortgage Denial (deny) | (2)               | (3)               | (4)               | (5)               |
|--------------------------------------------|-------------------|-------------------|-------------------|-------------------|
| blackyes                                   | 0.389* (0.183)    | 0.371*** (0.099)  | 0.363*** (0.100)  | 0.246* (0.101)    |
| pirat                                      | 2.442+ (1.332)    | 2.464*** (0.673)  | 2.622*** (0.654)  | 2.572*** (0.665)  |
| hirat                                      | -0.185 (1.298)    | -0.302 (0.689)    | -0.502 (0.689)    | -0.538 (0.715)    |
| lvratmedium                                | 0.214 (0.160)     | 0.216** (0.082)   | 0.215** (0.082)   | 0.216* (0.084)    |
| lvrathigh                                  | 0.791* (0.325)    | 0.795*** (0.183)  | 0.836*** (0.184)  | 0.788*** (0.185)  |
| chist                                      | 0.155*** (0.039)  | 0.158*** (0.021)  | 0.344*** (0.021)  | 0.158 (0.108)     |
| mhist                                      | 0.148 (0.138)     | 0.110 (0.073)     | 0.162* (0.076)    | 0.111 (0.104)     |
| phistyes                                   | 0.697*** (0.203)  | 0.702*** (0.114)  | 0.717*** (0.115)  | 0.705*** (0.116)  |
| insuranceyes                               | 2.557*** (0.576)  | 2.585*** (0.305)  | 2.589*** (0.299)  | 2.590*** (0.306)  |
| selfempyes                                 | 0.359+ (0.214)    | 0.346** (0.113)   | 0.342** (0.116)   | 0.348** (0.116)   |
| singleyes                                  |                   | 0.229             | 0.230** (0.080)   | 0.226** (0.086)   |
| hschoolyes                                 |                   | -0.613            | -0.604** (0.229)  | -0.620** (0.237)  |
| unemp                                      |                   | 0.030             | 0.028 (0.018)     | 0.030+ (0.018)    |
| condominyes                                |                   |                   | -0.055            |                   |
| I(mhist == 3)                              |                   |                   | -0.107            |                   |
| I(mhist == 4)                              |                   |                   | -0.383            |                   |
| I(chist == 3)                              |                   |                   | -0.226            |                   |
| I(chist == 4)                              |                   |                   | -0.251            |                   |
| I(chist == 5)                              |                   |                   | -0.789            |                   |
| I(chist == 6)                              |                   |                   | -0.905            |                   |
| blackyes:pirat                             |                   |                   |                   | -0.579            |
| blackyes:hirat                             |                   |                   |                   | 1.232             |
| Constant                                   | -3.041*** (0.484) | -2.575*** (0.250) | -2.896*** (0.350) | -2.543*** (0.404) |
| Observations                               | 2,380             | 2,380             | 2,380             | 2,380             |
| Akaike Inf. Crit.                          | 1,295.694         | 1,285.227         | 1,292.129         | 1,288.664         |
| Bayesian Inf. Crit.                        | 1,359.218         | 1,366.075         | 1,413.401         | 1,381.062         |
Note: + p<0.1; * p<0.05; ** p<0.01; *** p<0.001
Exponentiate Coefficients for logit_HMDA and Show Odds Ratios in Stargazer
# Exponentiate the coefficients and compute delta-method robust SEs for the odds ratios
logit_OR <- exp(coef(logit_HMDA))
logit_OR_se <- exp(coef(logit_HMDA)) * sqrt(diag(vcovHC(logit_HMDA, type = "HC1")))
# Create a dummy model with exponentiated coefficients to use in stargazer
logit_HMDA_OR <- logit_HMDA
logit_HMDA_OR$coefficients <- logit_OR
# Stargazer table for exponentiated log odds (odds ratios)
stargazer(logit_HMDA_OR,
          type = "html",
          out = "logit_odds_ratios.html",
          title = "Exponentiated Coefficients (Odds Ratios) for Logit Model of Mortgage Denial",
          se = list(logit_OR_se),
          digits = 3,
          keep.stat = c("n", "aic", "bic"),
          dep.var.caption = "Dependent Variable: Mortgage Denial",
          star.char = c("+", "*", "**", "***"),
          star.cutoffs = c(0.1, 0.05, 0.01, 0.001),
          notes = c("(+ p<0.1; * p<0.05; ** p<0.01; *** p<0.001)"),
          notes.append = FALSE,
          model.numbers = FALSE,
          column.labels = c("Logit OR"))
Exponentiated Coefficients (Odds Ratios) for Logit Model of Mortgage Denial

| Dependent Variable: Mortgage Denial (deny) | Logit OR            |
|--------------------------------------------|---------------------|
| blackyes                                   | 1.991*** (0.363)    |
| pirat                                      | 117.263 (156.217)   |
| hirat                                      | 0.897 (1.164)       |
| lvratmedium                                | 1.590*** (0.255)    |
| lvrathigh                                  | 4.458** (1.448)     |
| chist                                      | 1.337*** (0.052)    |
| mhist                                      | 1.322*** (0.182)    |
| phistyes                                   | 3.407*** (0.693)    |
| insuranceyes                               | 94.459+ (54.373)    |
| selfempyes                                 | 1.947*** (0.416)    |
| Constant                                   | 0.003* (0.002)      |
| Observations                               | 2,380               |
| Akaike Inf. Crit.                          | 1,293.273           |
| Bayesian Inf. Crit.                        | 1,356.797           |
Note: + p<0.1; * p<0.05; ** p<0.01; *** p<0.001
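Two caveats on this table. First, the stars are recomputed by stargazer from the odds ratios and their delta-method standard errors, so they need not coincide with the significance levels reported on the log-odds scale above; hypothesis tests are usually carried out on the log-odds scale. Second, confidence intervals for odds ratios are better obtained by exponentiating a Wald interval built on the log-odds scale, as in the following sketch (a 95% level is assumed; this snippet and the object name OR_table are additions to the book's code):
# 95% Wald confidence intervals for the odds ratios: build them on the log-odds
# scale with heteroskedasticity-robust standard errors, then exponentiate
b <- coef(logit_HMDA)
se_rob <- sqrt(diag(vcovHC(logit_HMDA, type = "HC1")))
OR_table <- exp(cbind(OR = b, lower = b - 1.96 * se_rob, upper = b + 1.96 * se_rob))
round(OR_table, 3)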