Chapter 11: Regression with a Binary Dependent Variable

Karim Naguib (Boston University)
12/2/2013

Introduction (1)

  • Previously, we learned how to use binary variables as regressors (independent variables)
  • But in some cases we might be interested in learning how entity characteristics influence a binary dependent variable
  • For example, we might be interested in studying whether there is racial discrimination in the provision of loans
    • We are interested in comparing individuals who have different races, but are otherwise identical
    • It is not sufficient to compare average loan denial rates

Introduction (2)

We will consider two forms of regression to analyze such situations

  1. Linear Probability Models, using OLS to do multiple regression analysis with a binary dependent variable
  2. Nonlinear Regression Models, which might be a better fit for such binary dependent variables

Binary Dependent Variables and the Linear Probability Model

Binary Dependent Variables

Examples:

  • Provision of a mortgage loan
  • Decision to smoke/not smoke
  • Decision to go to college or not
  • A country receives foreign aid or not

Racial Discrimination in Mortgage Loans

  • In this chapter we are interested in studying whether there is racial discrimination in the provision of mortgage loans
  • Data compiled by researchers at the Boston Fed under the Home Mortgage Disclosure Act (HMDA)
  • The dependent variable in this example is a binary variable equal to
    • 1 if an individual's mortgage application is denied
    • 0 otherwise

Effect of Payment-to-Income Ratio

Using a subset of the data on mortgages (\( n = 127 \))
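
A minimal sketch of how such a plot can be drawn in R, assuming the hmda.data frame (with deny coded 0/1) used in the application below:

# Scatterplot of loan denial (0/1) against the P/I ratio,
# with the fitted OLS (linear probability model) line
plot(deny ~ pi.rate, data=hmda.data, xlab="P/I ratio", ylab="deny")
abline(lm(deny ~ pi.rate, data=hmda.data), col="blue")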

Interpreting the OLS Regression (1)

  • Looking at the plot we see that when \( P/I~ratio = 0.3 \), the predicted value \( \widehat{deny} = 0.2 \).
  • What does it mean for the predicted value of a binary variable to be a continuous value?
  • Using a linear probability model, we should interpret this as predicting that someone with such a P/I ratio would be denied a loan with a probability of 20%.

Interpreting the OLS Regression (2)

  • Recall that the predicted value from an OLS regression is \[ E[Y|X_1, \dots, X_k] = \beta_0 + \beta_1 X_{1} + \dots + \beta_k X_{k} \]

  • Recall that for a binary random (Bernoulli) variable \( Y \) \[ \begin{align*} E[Y] &= Pr(Y=0)\times 0 + Pr(Y=1)\times 1 \\ &= Pr(Y=1) \end{align*} \]

  • In a regression context \[ E[Y|X_1,\dots,X_k] = Pr(Y=1|X_1,\dots, X_k) \]

The Linear Probability Model

The linear probability model is

\[ Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i \]

and therefore

\[ Pr(Y = 1|X_1,\dots, X_k) = \beta_0 + \beta_1 X_{1} + \dots + \beta_k X_{k} \]

  • \( \beta_1 \) is the change in the probability that \( Y=1 \) associated with a unit change in \( X_1 \).
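
For example, with \( \beta_1 = 0.6 \) (close to the estimate obtained below), an increase in the P/I ratio of 0.1 is associated with a 6 percentage point rise in the probability of denial:

\[ \Delta Pr(Y=1) = \beta_1 \Delta X_1 = 0.6 \times 0.1 = 0.06 \]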

R Squared in a LPM

  • In a model with a continuous dependent variable, one can imagine getting \( R^2 = 1 \) when all the data lie exactly on the regression line.
  • This would be impossible with a binary dependent variable, unless the regressors are also all binary.
  • Therefore, \( R^2 \) from an LPM regression does not have a useful interpretation.

Application to the Boston HMDA Data

library(lmtest)    # for coeftest()
library(sandwich)  # for the vcovHAC() robust covariance estimator
lpm.reg.res.1 <- lm(deny ~ pi.rate, data=hmda.data)
coeftest(lpm.reg.res.1, vcov.=vcovHAC(lpm.reg.res.1))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.0799     0.0323   -2.47    0.013 *  
pi.rate       0.6035     0.1029    5.86  5.1e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lpm.reg.res.2 <- lm(deny ~ pi.rate + black, data=hmda.data)
coeftest(lpm.reg.res.2, vcov.=vcovHAC(lpm.reg.res.2))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.0905     0.0294   -3.08   0.0021 ** 
pi.rate       0.5592     0.0916    6.11  1.2e-09 ***
black         0.1774     0.0375    4.74  2.3e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
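
With the second regression in hand, predicted denial probabilities can be read off directly; a minimal sketch, assuming black is coded 0/1 in hmda.data:

# Predicted denial probability at P/I ratio = 0.3 for a white (black = 0)
# and a black (black = 1) applicant (black assumed coded 0/1)
new.apps <- data.frame(pi.rate=c(0.3, 0.3), black=c(0, 1))
predict(lpm.reg.res.2, newdata=new.apps)

By the estimates above, the two predictions differ by about 18 percentage points (the coefficient on black) at any P/I ratio.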

Probit and Logit Regression

Introduction

  • Since the predicted probabilities from a linear probability model can fall outside the \( [0, 1] \) interval, we consider two alternative nonlinear regression models
  • Since cumulative probability distribution functions (CDFs) produce values between 0 and 1, we use them to model \( Pr(Y=1|X_1,\dots,X_k) \)
  • We use two types of nonlinear models
    1. Probit regression, which uses the CDF of the standard normal distribution
    2. Logit regression, which uses a “logistic” CDF

Probit Regression

The Probit regression model with a single regressor is

\[ Pr(Y=1|X) = \Phi(\beta_0 + \beta_1 X) \]

where \( \Phi \) is the CDF of the standard normal distribution

Example

  • Consider the mortgage example, regressing loan denial on the P/I ratio
  • Suppose that \( \beta_0 = -2 \) and \( \beta_1 = 3 \)
  • What is the probability of being denied a loan if \( P/I~ratio = 0.4 \)?

\[ \begin{align*} \Phi(\beta_0 + \beta_1 P/I~ratio) &= \Phi(-2 + 3\times P/I~ratio)\\ &= \Phi(-0.8) \\ &= Pr(Z \leq -0.8) = 21.2\% \end{align*} \] where \( Z \sim N(0,1) \)
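
This value can be verified directly with the standard normal CDF in R:

# Standard normal CDF evaluated at z = -0.8
pnorm(-0.8)

[1] 0.2118554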

Interpreting the Coefficient (1)

\[ Pr(Y=1|X) = \Phi(\underbrace{\beta_0 + \beta_1 X}_{z}) \]

  • \( \beta_1 \) is the change in the \( z \)-value associated with a unit change in \( X \).
  • If \( \beta_1 > 0 \), an increase in \( X \) would lead to an increase in the \( z \)-value and in turn the probability of \( Y=1 \)
  • If \( \beta_1 < 0 \), an increase in \( X \) would lead to a decrease in the \( z \)-value and in turn the probability of \( Y=1 \)

Interpreting the Coefficient (2)

  • While the effect of \( X \) on the \( z \)-value is linear, its effect on \( Pr(Y=1) \) is nonlinear

Multiple Regressor Probit

\[ Pr(Y=1|X_1, X_2) = \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2) \]

  • Once again the parameters \( \beta_1 \) and \( \beta_2 \) represent the linear effect of a unit change in \( X_1 \) and \( X_2 \), respectively, on the \( z \)-value.
  • For example, suppose \( \beta_0 = -1.6 \), \( \beta_1 = 2 \), and \( \beta_2 = 0.5 \). If \( X_1 = 0.4 \) and \( X_2 = 1 \), the probability that \( Y=1 \) would be \( \Phi(-0.3) = 38\% \).

General Probit Model

\[ Pr(Y=1|X_1, X_2,\dots, X_k) = \Phi(\underbrace{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}_z) \]

To calculate the effect of a change in a regressor (e.g. from \( X_1 \) to \( X_1 + \Delta X_1 \)) on \( Pr(Y=1|X_1,\dots,X_k) \), subtract \[ \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k) \] from \[ \Phi(\beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2 + \dots + \beta_k X_k) \]
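
A minimal sketch of this calculation in R, reusing the hypothetical coefficients from the earlier example (\( \beta_0 = -2 \), \( \beta_1 = 3 \)) for an increase in the P/I ratio from 0.3 to 0.4:

# Change in Pr(Y = 1) when the P/I ratio rises from 0.3 to 0.4,
# using the hypothetical coefficients beta0 = -2 and beta1 = 3
b0 <- -2; b1 <- 3
pnorm(b0 + b1*0.4) - pnorm(b0 + b1*0.3)

[1] 0.07618934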

Application to Mortgage Data (1)

# Probit regression: glm() with a binomial family and probit link
pro.reg.res.1 <- glm(deny ~ pi.rate, family=binomial(link="probit"), data=hmda.data)
coeftest(pro.reg.res.1)

z test of coefficients:

            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -2.194      0.138  -15.93  < 2e-16 ***
pi.rate        2.968      0.386    7.69  1.4e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Application to Mortgage Data (2)

pro.reg.res.2 <- glm(deny ~ pi.rate + black, family=binomial(link="probit"), data=hmda.data)
coeftest(pro.reg.res.2)

z test of coefficients:

            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.2588     0.1367  -16.52  < 2e-16 ***
pi.rate       2.7418     0.3805    7.21  5.7e-13 ***
black         0.7082     0.0834    8.50  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
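
The probit coefficient on black (0.7082) is a shift in the \( z \)-value, not a probability. To express it as a change in the denial probability, compare predicted probabilities at a given P/I ratio; a sketch, again assuming black is coded 0/1:

# Difference in predicted denial probability between a black and a
# white applicant at P/I ratio = 0.3 (black assumed coded 0/1)
new.apps <- data.frame(pi.rate=c(0.3, 0.3), black=c(0, 1))
diff(predict(pro.reg.res.2, newdata=new.apps, type="response"))

With the estimates above, this difference is roughly 16 percentage points.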

Logit Regression

\[ \begin{align*} Pr(Y = 1|X_1, \dots, X_k) &= F(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k) \\ &= \frac{1}{1+\exp\left(-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)\right)} \end{align*} \]

It is similar to the probit model, except that we use the CDF for a standard logistic distribution, instead of the CDF for a standard normal.

Comparing the Probit and Logit Regression Models

Application to Mortgage Data (1)

# Logit regression: same glm() call with a logit link
log.reg.res.1 <- glm(deny ~ pi.rate + black, family=binomial(link="logit"), data=hmda.data)
coeftest(log.reg.res.1)

z test of coefficients:

            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -4.126      0.268  -15.37  < 2e-16 ***
pi.rate        5.370      0.728    7.37  1.7e-13 ***
black          1.273      0.146    8.71  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
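
Probit and logit coefficients are on different scales, so the two models are best compared through their predicted probabilities; a quick sketch using the fitted models above (black again assumed coded 0/1):

# Predicted denial probability for a black applicant with P/I ratio = 0.3
# under the probit and logit fits
new.app <- data.frame(pi.rate=0.3, black=1)
c(probit=predict(pro.reg.res.2, newdata=new.app, type="response"),
  logit=predict(log.reg.res.1, newdata=new.app, type="response"))

The two models typically produce very similar predicted probabilities, differing mainly in the tails of the distribution.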