Karim Naguib (Boston University)
12/2/2013
We will consider two forms of regression for analyzing situations in which the dependent variable is binary.
Examples:
Recall that the conditional expectation modeled by an OLS regression is \[ E[Y|X_1, \dots, X_k] = \beta_0 + \beta_1 X_{1} + \dots + \beta_k X_{k} \]
Recall that for a binary random (Bernoulli) variable \( Y \) \[ \begin{align*} E[Y] &= Pr(Y=0)\times 0 + Pr(Y=1)\times 1 \\ &= Pr(Y=1) \end{align*} \]
In a regression context \[ E[Y|X_1,\dots,X_k] = Pr(Y=1|X_1,\dots, X_k) \]
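As a quick numerical check of this identity, the sketch below uses a small made-up 0/1 vector (the values are purely illustrative, not taken from the loan data) and shows that the mean of a binary variable equals the share of ones.

```r
# Hypothetical binary outcomes (illustrative values only)
y <- c(0, 1, 1, 0, 1)

# Expectation computed from the definition: Pr(Y=0)*0 + Pr(Y=1)*1
p0 <- mean(y == 0)
p1 <- mean(y == 1)
expectation <- p0 * 0 + p1 * 1

# The sample mean gives the same number as the share of ones
sample.mean <- mean(y)
print(c(expectation, sample.mean))
```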
The linear probability model is
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i \]
and therefore
\[ Pr(Y = 1|X_1,\dots, X_k) = \beta_0 + \beta_1 X_{1} + \dots + \beta_k X_{k} \]
library(lmtest)    # provides coeftest()
library(sandwich)  # provides vcovHAC()
lpm.reg.res.1 <- lm(deny ~ pi.rate, data=hmda.data)
coeftest(lpm.reg.res.1, vcov.=vcovHAC(lpm.reg.res.1))
t test of coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.0799     0.0323   -2.47    0.013 *
pi.rate       0.6035     0.1029    5.86  5.1e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
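To see how these estimates translate into probabilities, here is a worked calculation using the rounded coefficients reported above. The P/I ratios 0.3, 0.4 and 0.1 are chosen only for illustration.

\[ \begin{align*} \widehat{Pr}(deny=1|P/I~ratio=0.3) &= -0.0799 + 0.6035\times 0.3 = 0.10115 \\ \widehat{Pr}(deny=1|P/I~ratio=0.4) &= -0.0799 + 0.6035\times 0.4 = 0.16150 \\ \widehat{Pr}(deny=1|P/I~ratio=0.1) &= -0.0799 + 0.6035\times 0.1 = -0.01955 \end{align*} \]

A 0.1 increase in the P/I ratio therefore raises the predicted denial probability by about 6 percentage points regardless of the starting value. The last line shows that the fitted value of a linear probability model can fall below zero, which is not a valid probability.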
lpm.reg.res.2 <- lm(deny ~ pi.rate + black, data=hmda.data)
coeftest(lpm.reg.res.2, vcov.=vcovHAC(lpm.reg.res.2))
t test of coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.0905     0.0294   -3.08   0.0021 **
pi.rate       0.5592     0.0916    6.11  1.2e-09 ***
black         0.1774     0.0375    4.74  2.3e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
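As an illustration of how to read the coefficient on black, take the rounded estimates above and an applicant with a P/I ratio of 0.3 (an arbitrary value chosen for this example):

\[ \begin{align*} \widehat{Pr}(deny=1|P/I~ratio=0.3, black=0) &= -0.0905 + 0.5592\times 0.3 = 0.07726 \\ \widehat{Pr}(deny=1|P/I~ratio=0.3, black=1) &= -0.0905 + 0.5592\times 0.3 + 0.1774 = 0.25466 \end{align*} \]

Because the model is linear, the gap between the two predictions is 0.1774 (about 17.7 percentage points) at every value of the P/I ratio.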
The Probit regression model with a single regressor is
\[ Pr(Y=1|X) = \Phi(\beta_0 + \beta_1 X) \]
where \( \Phi \) is the CDF of the standard normal distribution
Example: suppose \( \beta_0 = -2 \), \( \beta_1 = 3 \) and the P/I ratio is 0.4. Then \[ \begin{align*} \Phi(\beta_0 + \beta_1 \times P/I~ratio) &= \Phi(-2 + 3\times 0.4)\\ &= \Phi(-0.8) \\ &= Pr(Z \leq -0.8) = 21.2\% \end{align*} \] where \( Z \sim N(0,1) \)
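The same calculation can be sketched in R with `pnorm()`, the standard normal CDF. The values \( \beta_0 = -2 \), \( \beta_1 = 3 \) and a P/I ratio of 0.4 are the ones implied by the example (since \( -2 + 3 \times 0.4 = -0.8 \)); they are illustrative numbers, not estimates.

```r
# Illustrative coefficients and P/I ratio implied by the example
beta0 <- -2
beta1 <- 3
pi.ratio <- 0.4

z <- beta0 + beta1 * pi.ratio   # probit index, here -0.8
prob <- pnorm(z)                # Pr(Z <= z) for Z ~ N(0,1)
print(round(prob, 3))           # about 0.212
```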
\[ Pr(Y=1|X) = \Phi(\underbrace{\beta_0 + \beta_1 X}_{z}) \]
\[ Pr(Y=1|X_1, X_2) = \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2) \]
\[ Pr(Y=1|X_1, X_2,\dots, X_k) = \Phi(\underbrace{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}_z) \]
To calculate the effect of a change in a regressor (e.g. from \( X_1 \) to \( X_1 + \Delta X_1 \)) on \( Pr(Y=1|X_1,\dots,X_k) \), subtract \[ \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k) \] from \[ \Phi(\beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2 + \dots + \beta_k X_k) \]
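The subtraction described here can be written as a small helper. This is a minimal sketch: the function name `probit.effect` and its arguments are hypothetical, and the coefficients in the usage line are illustrative values rather than estimates.

```r
# Change in Pr(Y=1|X) when regressor j moves from x[j] to x[j] + delta,
# holding the other regressors fixed.
# beta: c(beta0, beta1, ..., betak); x: c(x1, ..., xk)
probit.effect <- function(beta, x, j, delta) {
  z.before <- beta[1] + sum(beta[-1] * x)
  x.after <- x
  x.after[j] <- x.after[j] + delta
  z.after <- beta[1] + sum(beta[-1] * x.after)
  pnorm(z.after) - pnorm(z.before)
}

# Illustrative values: beta0 = -2, beta1 = 3, X1 moving from 0.4 to 0.5
effect <- probit.effect(beta = c(-2, 3), x = 0.4, j = 1, delta = 0.1)
print(effect)

# The same change starting from X1 = 0.1 gives a different effect
effect.low <- probit.effect(beta = c(-2, 3), x = 0.1, j = 1, delta = 0.1)
print(effect.low)
```

Unlike in the linear probability model, the effect of a given \( \Delta X_1 \) depends on the starting values of the regressors, because \( \Phi \) is nonlinear.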
pro.reg.res.1 <- glm(deny ~ pi.rate, family=binomial(link="probit"), data=hmda.data)
coeftest(pro.reg.res.1)
z test of coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -2.194      0.138  -15.93  < 2e-16 ***
pi.rate        2.968      0.386    7.69  1.4e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
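Probit coefficients are not probabilities, so a short calculation helps with interpretation. The sketch below plugs the rounded estimates from the output above into `pnorm()`. The P/I ratios 0.3 and 0.4 are arbitrary illustrative values.

```r
# Rounded estimates copied from the coeftest() output above
b0 <- -2.194
b1 <- 2.968

# Predicted denial probabilities at two illustrative P/I ratios
p.03 <- pnorm(b0 + b1 * 0.3)   # roughly 0.096
p.04 <- pnorm(b0 + b1 * 0.4)   # roughly 0.157

# Estimated effect of raising the P/I ratio from 0.3 to 0.4
print(c(p.03, p.04, p.04 - p.03))
```

With these rounded inputs, the increase in the predicted denial probability is roughly 6 percentage points.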
pro.reg.res.2 <- glm(deny ~ pi.rate + black, family=binomial(link="probit"), data=hmda.data)
coeftest(pro.reg.res.2)
z test of coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.2588     0.1367  -16.52  < 2e-16 ***
pi.rate       2.7418     0.3805    7.21  5.7e-13 ***
black         0.7082     0.0834    8.50  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
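To gauge the size of the coefficient on black, one can compare predicted probabilities for two otherwise identical applicants. This sketch uses the rounded estimates from the output above and an arbitrary P/I ratio of 0.3.

```r
# Rounded estimates copied from the coeftest() output above
b0 <- -2.2588
b.pi <- 2.7418
b.black <- 0.7082

pi.ratio <- 0.3  # illustrative value

# Predicted denial probabilities for black = 0 and black = 1
p.nonblack <- pnorm(b0 + b.pi * pi.ratio)             # roughly 0.075
p.black <- pnorm(b0 + b.pi * pi.ratio + b.black)      # roughly 0.233

print(c(p.nonblack, p.black, p.black - p.nonblack))
```

The gap between the two predictions applies only at this P/I ratio; at other values the gap will differ because the probit model is nonlinear.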
The logit regression model is
\[ \begin{align*} Pr(Y = 1|X_1, \dots, X_k) &= F(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k) \\ &= \frac{1}{1+\exp(-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k))} \end{align*} \]
It is similar to the probit model, except that we use the CDF for a standard logistic distribution, instead of the CDF for a standard normal.
log.reg.res.1 <- glm(deny ~ pi.rate + black, family=binomial(link="logit"), data=hmda.data)
coeftest(log.reg.res.1)
z test of coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -4.126      0.268  -15.37  < 2e-16 ***
pi.rate        5.370      0.728    7.37  1.7e-13 ***
black          1.273      0.146    8.71  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
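Predicted probabilities from the logit estimates are obtained the same way as for probit, with the standard logistic CDF in place of \( \Phi \). The sketch below uses `plogis()`, which computes \( 1/(1+e^{-z}) \), together with the rounded estimates above. The P/I ratio of 0.3 is an arbitrary illustrative value.

```r
# Rounded estimates copied from the coeftest() output above
b0 <- -4.126
b.pi <- 5.370
b.black <- 1.273

pi.ratio <- 0.3  # illustrative value

z.nonblack <- b0 + b.pi * pi.ratio
z.black <- z.nonblack + b.black

# Standard logistic CDF: plogis(z) is 1 / (1 + exp(-z))
p.nonblack <- plogis(z.nonblack)   # roughly 0.075
p.black <- plogis(z.black)         # roughly 0.224

print(c(p.nonblack, p.black, p.black - p.nonblack))
```

Although the logit coefficients are larger in magnitude than the probit ones, the predicted probabilities at this P/I ratio come out close to the probit predictions computed from the corresponding rounded estimates.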