Discrete Choice Models

This document is about estimating a model when response/dependent variable is binary.Variables on the right hand side can be of any nature (nominal, ordinal, interval or ratio). I am not discussing here theoretical aspects but only Applied aspect with a focus on interpretation of coefficients of Logit/Probit models.
Such models are in wide use in all disciplines. Whether an email one receives is a spam or not, whether you are food secure or not, one is suffering from hypertention or not, you will be denied loan or not etc. Therefore, it is important to understand models which deals with data having binary outcome.

OLS or linear probability model has three issues while estimating the model with binary outcome
i) Probability of outcome may be negative or greater than zero ii) Non-linearity of model in terms of parameters (recall interaction effect model where nonlinearity was an issue with respect to variables only interaction) iii) Residuals may be hetroscedastic video

Instead of using OLS , Logit/Probit models are estimated using Maximum Likelihood Estimation (MLE) procedure to estimate the parameters of the model.

Difference between Logit and Probit model

There is underlying difference between the two is that Logit model use Logistic distribution while Probit model underlying probability distributionis normal (Gaussian). Calculating cummulative distributive function using normal distribution was more complicated than that of logistic distribution before modern computer became avaiable. Therefore, Logistic distribution was a preferred choice. With the availability of modern computing facilities, calculating the cummulative distribution function for normal distribution has also become a trivial issue. Both Logit and Probit models produce very similar results and in practice there is hardly any difference in empirical results between the two. Choice between the two is a matter of taste. My objective of this post is to use these models and interpreting the coefficients instead of going through their theoretical underpinings.

\[\begin{equation} y=1\;\; \text {if an individual chooses to buy a car}\\ y=0\;\; \text {if an individual chooses not to buy} \label{eq:binaryydef16} \end{equation}\]

\(E(y)\) is the probability that the response variable takes the value of \(1\); therefore, a predicted value of \(y\) is a prediction for the probability that \(y=1\).

\[\begin{equation} y=E(y)+e=\beta_{1}+\beta_{2}x_{2}+...+\beta_{k}x_{k}+e \label{eq:lpmgen16} \end{equation}\]

#Load 'AER' package and HMDA data
library(AER)
library(tidyverse)
library(knitr)  ## for knitr
library(broom)  ## for tidy
library(xtable)
library(huxtable)
data(HMDA)

We observe first few observations and nature of variables used in the data

library(dplyr)
glimpse(HMDA)
## Rows: 2,380
## Columns: 14
## $ deny      <fct> no, no, no, no, no, no, no, no, yes, no, no, no, yes, no,...
## $ pirat     <dbl> 0.221, 0.265, 0.372, 0.320, 0.360, 0.240, 0.350, 0.280, 0...
## $ hirat     <dbl> 0.221, 0.265, 0.248, 0.250, 0.350, 0.170, 0.290, 0.220, 0...
## $ lvrat     <dbl> 0.8000000, 0.9218750, 0.9203980, 0.8604651, 0.6000000, 0....
## $ chist     <fct> 5, 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 2, 2, ...
## $ mhist     <fct> 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, ...
## $ phist     <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, n...
## $ unemp     <dbl> 3.9, 3.2, 3.2, 4.3, 3.2, 3.9, 3.9, 1.8, 3.1, 3.9, 3.1, 4....
## $ selfemp   <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, n...
## $ insurance <fct> no, no, no, no, no, no, no, no, yes, no, no, no, yes, no,...
## $ condomin  <fct> no, no, no, no, no, no, yes, no, no, no, yes, no, no, no,...
## $ afam      <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, n...
## $ single    <fct> no, yes, no, no, no, no, yes, no, no, yes, yes, yes, no, ...
## $ hschool   <fct> yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, ye...

Converting factor variable into numeric

#HMDA<-HMDA %>% mutate(ifelse(deny=="no",0,1))## Convert fct variable into numeric
#be mindful about "yes" and "no" are written in small letters.
HMDA$deny <- as.numeric(HMDA$deny) - 1 ## convert deny to numeric 

lm.mod<-lm(deny~pirat,data = HMDA)
lm.mod
## 
## Call:
## lm(formula = deny ~ pirat, data = HMDA)
## 
## Coefficients:
## (Intercept)        pirat  
##    -0.07991      0.60353

\[\begin{align} \widehat{deny} = -\underset{(0.032)}{0.080} + \underset{(0.098)}{0.604} P/I \ ratio. \tag{2} \end{align}\]

ggplot(HMDA)+aes(x=pirat,y=deny)+geom_point()+geom_abline()+
  labs(x="P/I ratio",y="Deny",title = "Scatterplot Mortgage Application Denial and the Pament to Income Ration")+ylim(-0.4,1.4)+xlim(0,1.5)
## Warning: Removed 1 rows containing missing values (geom_point).

library(huxtable)
auto.probit <- glm(deny~pirat, family=binomial(link="probit"), 
                   data=HMDA)
kable(tidy(auto.probit), digits=4, align='c', caption=
        "HMDA example, estimated by probit")
HMDA example, estimated by probit
term estimate std.error statistic p.value
(Intercept) -2.1941 0.1378 -15.9271 0
pirat 2.9679 0.3858 7.6936 0
huxreg(lm.mod,auto.probit) %>%  set_caption("HMDA example, estimated by probit")
HMDA example, estimated by probit
(1) (2)
(Intercept) -0.080 *** -2.194 ***
(0.021)    (0.138)   
pirat 0.604 *** 2.968 ***
(0.061)    (0.386)   
N 2380         2380        
R2 0.040             
logLik -651.424     -831.792    
AIC 1308.847     1667.585    
*** p < 0.001; ** p < 0.01; * p < 0.05.

Equation gives the marginal effect of a change in the regressor xk on the probability that y=1 .

The Logit Model for Binary Choice

This is very similar to the probit model, with the difference that logit uses the logistic function Λ to link the linear expression {1}+{2}x to the probability that the response variable is equal to 1 . Equations and give the defining expressions of the logit model (the two expressions are equivalent).

\[\begin{equation} p=\Lambda (\beta_{1}+\beta_{2}x)=\frac{1}{1+e^{-( \beta_{1}+\beta_{2}x)}} \label{eq:logitdefA16} \end{equation}\] 
\[\begin{equation} p=\frac{exp(\beta_{1}+\beta_{2}x)}{1+exp(\beta_{1}+\beta_{2}x)} \label{eq:logitdefB16} \end{equation}\]$$ 
HMDA.logit <- glm(deny~pirat, 
              data=HMDA, family=binomial(link="logit"))
kable(tidy(HMDA.logit), digits=5, align="c",
      caption="Logit estimates for the 'HMDA' dataset")
Logit estimates for the ‘HMDA’ dataset
term estimate std.error statistic p.value
(Intercept) -4.02843 0.26858 -14.99925 0
pirat 5.88450 0.73360 8.02141 0
huxreg(lm.mod,auto.probit,HMDA.logit) %>% set_caption("Models estimating probability of denying loans") %>% add_colnames("LPM","probit","logit")
Models estimating probability of denying loans
names model1 model2 model3
(1) (2) (3)
(Intercept) -0.080 *** -2.194 *** -4.028 ***
(0.021)    (0.138)    (0.269)   
pirat 0.604 *** 2.968 *** 5.884 ***
(0.061)    (0.386)    (0.734)   
N 2380         2380         2380        
R2 0.040                      
logLik -651.424     -831.792     -830.094    
AIC 1308.847     1667.585     1664.188    
*** p < 0.001; ** p < 0.01; * p < 0.05.

Rename the variable afam as Black

#HMDA
 HMDA<-HMDA %>% rename(Black=afam)
mod.lm2<-lm(deny~Black+pirat,data = HMDA)
# estimate a Logit regression with multiple regressors
logit2 <- glm(deny ~ pirat + Black, 
                  family = binomial(link = "logit"), 
                  data = HMDA)
coeftest(logit2, vcov. = vcovHC, type = "HC1")
## 
## z test of coefficients:
## 
##             Estimate Std. Error  z value  Pr(>|z|)    
## (Intercept) -4.12556    0.34597 -11.9245 < 2.2e-16 ***
## pirat        5.37036    0.96376   5.5723 2.514e-08 ***
## Blackyes     1.27278    0.14616   8.7081 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\(\begin{align} \widehat{P(deny=1 \vert P/I ratio, black)} = F(-\underset{(0.35)}{4.13} + \underset{(0.96)}{5.37} P/I \ ratio + \underset{(0.15)}{1.27} black). \tag{6} \end{align}\)

mod.probit1<-glm(deny~Black+pirat, 
              data=HMDA, family=binomial(link="probit"))
mod.logit1<-glm(deny~Black+pirat, 
              data=HMDA, family=binomial(link="logit"))
hreg<-huxreg(lpm=mod.lm2,probit1=mod.probit1,logit=mod.logit1) %>% set_caption("Three binary choice models for the HMDA dataset") %>% add_colnames()
hreg
Three binary choice models for the HMDA dataset
names lpm probit1 logit
lpm probit1 logit
(Intercept) -0.091 *** -2.259 *** -4.126 ***
(0.021)    (0.137)    (0.268)   
Blackyes 0.177 *** 0.708 *** 1.273 ***
(0.018)    (0.083)    (0.146)   
pirat 0.559 *** 2.742 *** 5.370 ***
(0.060)    (0.380)    (0.728)   
N 2380         2380         2380        
R2 0.076                      
logLik -605.611     -797.136     -795.695    
AIC 1219.222     1600.272     1597.390    
*** p < 0.001; ** p < 0.01; * p < 0.05.

\[\begin{align} \widehat{deny} =& \, -\underset{(0.029)}{0.091} + \underset{(0.089)}{0.559} P/I \ ratio + \underset{(0.025)}{0.177} black. \tag{11.3} \end{align}\]

The coefficient on black is positive and significantly different from zero at the
0.01% level. The interpretation is that, holding constant the P/I ratio, being black increases the probability of a mortgage application denial by about 17.7%. This finding is compatible with racial discrimination. However, it might be distorted by omitted variable bias so discrimination could be a premature conclusion.

The linear probability model has a clear flaw of a probability model being linear. If P/I ratio increase from 0.75 to 0.80, chances of denying a loan are the same as that of from 1.2 to 1.25. Secondly P/I ratio exceeds 1.75, probability of loan being denied exceeds 1 which is not acceptable as per probability definition.

HMDA.logit <- glm(deny~pirat, 
              data=HMDA, family=binomial(link="logit"))
summary(HMDA$deny)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.1197  0.0000  1.0000
ggplot(HMDA, aes(x=pirat, y=deny)) + geom_point() + 
  stat_smooth(method="glm", method.args=list(family="binomial"), se=TRUE)
## `geom_smooth()` using formula 'y ~ x'

library(tidyverse)
HMDA.probit <- glm(deny~pirat, 
              data=HMDA, family=binomial(link="probit"))
# Function to get mpg value for a given vs probability
probX = function(p, HMDA.probit) {
  data.frame(prob=p, 
             xval = (qnorm(p) - coef(HMDA.probit)[1])/coef(HMDA.probit)[2])
}
d <- probX(c(0.1,0.5,0.9), HMDA.probit)
d
prob xval
0.1 0.307
0.5 0.739
0.9 1.17 
ggplot(HMDA, aes(x=pirat,y=deny)) + 
  geom_point() + 
  stat_smooth(method="glm", method.args=list(family=binomial(link="probit"))) +
  geom_segment(data=d, aes(x=xval, xend=xval, y=0, yend=prob), colour="red")
## `geom_smooth()` using formula 'y ~ x'