Linear Probability Model

LPM Intro

\(D_{i} = x_{i}\beta + \mu_{i}\), \(E\mu_{i}=0,\; E\mu_{i} \mu_{\ell}=0,\, i\ne\ell\)
where \(D_i= \begin{cases} 1, & \text{if "state A" occurs with probability } p_i\\ 0, & \text{otherwise with probability }1-p_i \end{cases}\)

  • \(Pr(D_i=0)+Pr(D_i=1) = 1\)
  • The OLS estimator for \(\beta\) is \(\hat{\beta}=(x^{\prime}x)^{-1}x^{\prime}D\), where \(D\) is \(n\times 1\) vector, and \(Var(\hat{\beta})=\sigma^{2}(x^{\prime}x)^{-1}\)
  • \(E(D_{i}|x_{i}) = x_{i}\beta = p_{i}\)
  • \(Var(D_{i}|x_{i}) = p_{i}(1-p_{i})=x_{i}\beta(1-x_{i}\beta)\)
    • The variance depends on \(x_i\) (i.e., heteroscedasticity).
    • If we ignore this, \(\hat{\beta}\) is no longer minimum variance, although it remains unbiased: \(E\hat{\beta}=\beta\) (a robust-variance sketch follows this list).
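
Since the usual OLS variance formula is invalid here, a common remedy is heteroscedasticity-robust (White) standard errors. Below is a minimal sketch, assuming the ‘sandwich’ and ‘lmtest’ packages are installed; the data-generating process is purely illustrative.

# illustrative LPM with the heteroscedasticity built in
set.seed(1)
x <- runif(500, 0, 10)
p <- 0.05 + 0.09*x                  # true P(D=1|x), kept inside [0,1]
D <- rbinom(500, size = 1, prob = p)
lpm <- lm(D ~ x)

library(sandwich)                   # heteroscedasticity-consistent vcov
library(lmtest)                     # coeftest()
coeftest(lpm, vcov. = vcovHC(lpm, type = "HC1"))  # robust standard errors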

Consider a simple regression with a discrete dependent variable: \[ \begin{array}{lcl} D_i & = & \beta_1 + \beta_2 x_{i2} + \mu_i\\ \hat{D}_i & = & \hat{\beta}_1 + \hat{\beta}_2 x_{i2} \end{array}\]

\(\frac{d \hat{D}_{i}}{d x_{i2}} = \hat{\beta}_{2}\): the change in the predicted probability that \(D_i=1\) due to a one-unit increase in \(x_{i2}\).

What if your predicted values fall outside the \([0,1]\) range?

Solutions to \(\hat{p}_{i}<0\) or \(\hat{p}_{i}>1\):

  1. Do nothing (ignore)
  2. \(\hat{p}_{i} = x_{i}\hat{\beta}\) if \(x_{i}\hat{\beta}\in[0,1]\); otherwise, replace \(\hat{p}_{i} = 0\) if \(\hat{p}_{i} < 0\) and \(\hat{p}_{i} = 1\) if \(\hat{p}_{i} > 1\) (see the sketch after this list).
  • Note: \(\mu_i = D_{i} - x_{i}\beta\): if \(D_i = 1\), then \(\mu_i = 1 - x_{i}\beta\); if \(D_i = 0\), then \(\mu_i = - x_{i}\beta\).
    \(\Rightarrow D_i\) and \(\mu_i\) each have a two-point (binomial) distribution! (This matters when making inference with a finite sample.)
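
A minimal sketch of option 2 in R, applying ‘pmin’/‘pmax’ to a hypothetical vector of out-of-range predictions:

# option 2: clip predicted probabilities into [0, 1]
p.hat <- c(-0.07, 0.23, 0.58, 1.12)  # illustrative out-of-range predictions
pmin(pmax(p.hat, 0), 1)              # returns 0.00 0.23 0.58 1.00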

Simple Example

Consider \[y^*_{i} = x_{i}\beta + \mu_i,\] where \(y^*_{i}\) is the latent (true) bankruptcy propensity, \(x_i\) is an observed index of financial distress (from 0 to 10), and \(\mu_i\) is i.i.d. with mean 0 and standard deviation 10.

  • Let’s say that \(y^*_{i}\) is unobservable, but we can observe who went bankrupt.
  • \(D_{i} = 1\) if the person went bankrupt and 0 otherwise.
  • We would like to predict the bankruptcy probability using the observed financial index.
# simple example 
rm(list=ls())
set.seed(123)
e <- rnorm(1000, mean=0, sd=10)
x <- runif(1000, 0, 10)

beta <- 10
y <- beta*x + e 

dta <- as.data.frame(cbind(y,x))
dta$D <- rep(0, nrow(dta))   # D = 1 if latent y crosses the threshold of 60
dta[dta$y > 60,]$D <- 1

dta <- dta[!(dta$y > 100),]  # keep latent values inside [0, 100]
dta <- dta[!(dta$y < 0),]
dta$bankruptcy <- as.factor(dta$D)
with(dta, plot(x, y, col=bankruptcy))
legend(9, 4.3, unique(dta$bankruptcy), col=1:nlevels(dta$bankruptcy), pch=1)

m0 <- lm(y~x, data=dta)
m1 <- lm(D~x, data=dta)

# install.packages("sjPlot")
# install.packages("sjmisc")
# install.packages("sjlabelled")
library(sjPlot)
## Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!

library(sjmisc)
library(sjlabelled)
library(lme4)
## Loading required package: Matrix
sjPlot::tab_model(m0, m1)
                   y                                    D
Predictors         Estimates   CI            p          Estimates   CI             p
(Intercept)        2.81        1.59 – 4.04   <0.001     -0.32       -0.36 – -0.28  <0.001
x                  9.46        9.25 – 9.68   <0.001      0.14        0.13 – 0.15   <0.001
Observations       923                                   923
R2 / R2 adjusted   0.891 / 0.891                         0.644 / 0.644
plot(D ~ x, cex = 1.3, data=dta, ylim=c(-0.5, 1.5))
abline(lm(D ~ x, data=dta), col="blue")
legend(0.1, 1.5, lty=c(1), col=c("blue"),
      legend=c("Fitted line"), box.col="white")

par(mfrow=c(1,2))
with(m1, plot(residuals ~ fitted.values, main="Linear probability residuals"))
abline(h=0, col="red", lty=2)
with(m0, plot(residuals ~ fitted.values, main="True latent variable residuals"))
abline(h=0, col="red", lty=2)

#par(mfrow=c(1,2))
#hist(dta$D, main ="Di histogram")
par(mfrow=c(1,1))
par(mar = c(5,5,2,5))
hist(m1$residuals, breaks=30, main = "Residuals histogram", yaxt="n") 
par(new=TRUE)
plot(density(m1$residuals, bw=0.04), main="", axes=FALSE, ylab=NA, xlab=NA, col="blue", lty=2)
axis(side=4)
mtext(side = 4, line = 3, 'prob. density')
legend("topleft", lty=c(1,2), col=c("black", "blue"), 
      legend=c("frequency", "nonparametric 
estimation (Gaussian)"), box.col="white")

We can easily tell that there is heteroscedasticity in the linear probability residuals.

Economic Application

How to derive a discrete choice model using economics?

We consider an example of technology adoption in agriculture. Suppose that a farmer \(i\) has two types of biotechnologies (e.g., conventional seed and genetically engineered seed), and she has to choose either one to be planted in her farmland. There are \(n\) number of farmers in the economy.

For \(i=1, \dots, n\),

  • \(\pi_{1i}=x_{1i}\beta_{1} + \mu_{1i}\): stochastic profit of farm \(i\) adopting “new technology”
  • \(\pi_{0i}=x_{0i}\beta_{0} + \mu_{0i}\): stochastic profit of farm \(i\) maintaining “conventional technology”.

The economic decision to adopt the “new technology” if \(\pi_{0i} < \pi_{1i}\) can be expressed as \[ \text{Define } D_i= \begin{cases} 1, & \text{if }\pi_{0i}<\pi_{1i}\; (P_{i}),\text{ i.e., adopting the new technology}\\ 0, & \text{otherwise, i.e., } \pi_{0i} \ge \pi_{1i}\; (1-P_{i}),\text{ staying with conventional technology} \end{cases} \] \[ \begin{array}{lcl} Pr[D_{i}=1|x_{1i}, x_{0i}] &=& Pr[\pi_{1i}>\pi_{0i}] \\ &=& Pr[x_{1i}\beta_{1} + \mu_{1i} > x_{0i}\beta_{0} + \mu_{0i}] \\ &=& Pr[x_{1i}\beta_{1} - x_{0i}\beta_{0} > \mu_{0i} - \mu_{1i} ] \end{array} \]

  • Let \(\mu_{0i} - \mu_{1i} \equiv \varepsilon_i\) and \(x_{1i}\beta_{1} - x_{0i}\beta_{0} \equiv x_{i}\beta^*\)
  • \(Pr[\varepsilon_{i}<x_{i}\beta^{*}] = \int^{x_i \beta^*}_{-\infty}f_{\varepsilon}(t)dt = F_{\varepsilon}(x_{i}\beta^*) = P_i\)
  • \(Pr[D_i =0 | x_{1i}, x_{0i}] = 1 - Pr[D_{i}=1|x_{1i}, x_{0i}] = 1- P_i\)

The probability distribution of \(\varepsilon_i\), i.e., \(f_{\varepsilon}(\cdot)\) and \(F_{\varepsilon}(\cdot)\), where \(\varepsilon_i \equiv \mu_{0i} - \mu_{1i}\), is the key object we need to specify.

  • Note that, in the case of uniform distribution (min=0, max=1), \(Pr[D_i=1|x_i] = \int^{x_i\beta^*}_{-\infty}f_{\varepsilon}(t)dt=F_{\varepsilon}(x_i\beta^*)=x_i\beta^*\) if \(x_i\beta^* \in [0,1]\), otherwise, 0 if \(x_i\beta^* <0\), and 1 if \(x_i\beta^* >1\).
  • Thus, in this case, the LPM is the correct economic model.
  • Think about a specific input involved with the technology (e.g., human capital).

    • \(\frac{\partial P_i}{\partial x_j} = \beta_{1j} - \beta_{0j} \quad(x_j\text{ is in both } x_{1i} \text{ and } x_{0i})\)
    • \(\frac{\partial P_i}{\partial x_j} = \beta_{1j} \quad(\text{if }x_j\text{ is only in } x_{1i})\)
    • \(\frac{\partial P_i}{\partial x_j} = -\beta_{0j} \quad(\text{if }x_j\text{ is only in } x_{0i})\)
  • Example: “\(r^0\)” = price of the old technology.
    If the price of the conventional technology increases, it urges farmers to adopt the new technology: \(\frac{\partial P_i}{\partial r^0} = -\beta_{0r^0} >0\).

Probit Model

Probit model estimation

In the previous part, we highlighted the importance of specifying the distribution of \(\varepsilon_i\), \(F_{\varepsilon}(x_i\beta^*)\).

  • Recall that, when \(\varepsilon_i\) is uniform on \([0,1]\), \(Pr[D_i=1|x_i] = \int^{x_i\beta^*}_{-\infty}f_{\varepsilon}(t)dt=F_{\varepsilon}(x_i\beta^*)=x_i\beta^*\) if \(x_i\beta^* \in [0,1]\); otherwise, 0 if \(x_i\beta^* <0\) and 1 if \(x_i\beta^* >1\).

The probit model assumes normality: \(\mu_{0i} \sim N(0, \sigma_{0}^2)\) and \(\mu_{1i} \sim N(0, \sigma_{1}^2)\). A linear combination of jointly normal random variables is itself normal. Hence, \(\varepsilon_{i} \equiv \mu_{0i} - \mu_{1i} \sim N(0,\sigma_0^2 + \sigma_1^2 - 2\sigma_{01})\), where \(\sigma_{01}\equiv cov(\mu_{0i}, \mu_{1i})\).
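
Normalizing the variance of \(\varepsilon_i\) to one (only \(\beta^*\) up to that scale is identified), the probit choice probability is the standard normal CDF evaluated at the index: \[ Pr[D_i=1|x_i] = F_{\varepsilon}(x_i\beta^{*}) = \Phi(x_i\beta^{*}) = \int^{x_i\beta^{*}}_{-\infty}\frac{1}{\sqrt{2\pi}}e^{-t^{2}/2}dt \]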

  • The probit model is available as a canned routine in most statistical packages.
  • In R, the ‘glm’ command runs the probit estimation.
  • The main idea is to exploit the assumed normality, so the estimator is a maximum likelihood estimator (MLE).

What is MLE?

  • Suppose \(y_i = \beta_1 +\mu_i,\quad \mu_i \sim N(0,\sigma^2)\)
  • Likelihood function: joint pdf for \(y_i\)’s, viewed as a function of \(\beta_1, \sigma^2\)
  • MLE is defined as \(\arg \max \ln L(\beta_1, \sigma^2|y)\)
  • For joint tests of MLE estimates, use likelihood-ratio test statistics rather than the \(F\) statistic.
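
For the simple model above, the log-likelihood has a closed form, and maximizing it yields the familiar estimators: \[ \ln L(\beta_1, \sigma^2|y) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_1)^2 \;\Rightarrow\; \hat{\beta}_{1,ML} = \bar{y},\quad \hat{\sigma}^{2}_{ML} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 \]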

A brief explanation of the likelihood ratio test

\(\Omega:\; y_i = x_i\beta + \epsilon_i, \quad i=1,\dots,m, \quad \epsilon_i\sim N(0,\sigma^2_{\epsilon}),\quad E\epsilon_{i} \epsilon_{\ell}=0,\; i \ne \ell\) \(H_0:\; R\beta = C,\; R\text{ is a }(q\times k) \text{ matrix},\;rank(R)=q \le k = \dim(\beta)\)

  • Say \(L(\hat{W})\) is the maximized likelihood of the restricted regression model (under \(H_0\)). Similarly, \(L(\hat{\Omega})\) is the maximized likelihood of the unrestricted model \(\Omega\).
  • The likelihood ratio for the two models is defined as \(\Phi = \frac{L(\hat{W})}{L(\hat{\Omega})}\), which is at most 1 by construction.
  • Decision: reject the null if \(\chi^2_0 = -2\ln\Phi>\chi_{q,\alpha}^2\) at the \(\alpha \cdot 100\%\) significance level; otherwise, do not reject \(H_0\) (a worked sketch follows this list).

    • Note that \(-2\ln \Phi \sim \chi^2_q\) asymptotically, where \(q=rank(R)\).
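
A minimal sketch of the likelihood ratio test in R, using ‘logLik()’ on two nested models; the data-generating process and variable names are illustrative.

# illustrative LR test of H0: the coefficient on x3 is zero (q = 1)
set.seed(42)
x2 <- rnorm(200); x3 <- rnorm(200)
y  <- 1 + 0.5*x2 + rnorm(200)       # x3 is irrelevant by construction

unrestricted <- lm(y ~ x2 + x3)     # model Omega
restricted   <- lm(y ~ x2)          # model under H0

chi2 <- -2*(logLik(restricted) - logLik(unrestricted))  # -2 ln(Phi)
pchisq(as.numeric(chi2), df = 1, lower.tail = FALSE)    # p-value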

Let’s return to \(y=x\beta+\mu\), but \(\mu \sim MVN(0, \sigma^2 I_{n})\)

  • Then the MLE for \(\beta\) is \(\hat{\beta}_{OLS}=(x^{\prime}x)^{-1}x^{\prime}y=\hat{\beta}_{ML}\).
  • Issue of MLE when \(\mu \sim\) MVN: recall \(\hat{\sigma}^{2}_{ML} = \frac{\hat{\mu}^{\prime}\hat{\mu}}{n} \ne \frac{\hat{\mu}^{\prime}\hat{\mu}}{n-k} = \hat{\sigma}^{2}_{OLS}\) except as \(n\rightarrow \infty\), where \(\hat{\mu}= y- x\hat{\beta}_{OLS} = y- x\hat{\beta}_{ML}\).
  • Choice in testing \(H_0: R\beta = C_0\) vs. \(H_1:\) its negation: use \(F\) or \(\chi^2\).
  • Use \(\chi^2\) when \(F\) does not apply,
    • e.g., non-linear models:
      \(y_i = \beta_1 + \frac{1}{\beta_2}x_{i3} + \beta^2_{3}x_{i4} + \mu_i\)
      \(y = A\cdot L^{\beta_2}K^{\beta_{3}}\mu \implies \ln y = \beta_1 + \beta_2 \ln L + \beta_3 \ln K + \ln \mu\)

Estimation practice

  • In the chunk below, I show two ways to run the probit model: using the ‘glm’ command and manually solving the MLE process.
    • For this exercise, I use “titanic” survival records (downloaded from https://stanford.io/2O9RUCF).
    • The purpose of this analysis is to predict the chance of survival conditional on observable characteristics (such as passenger’s class, gender, age, fare).
rm(list=ls())
titanic <- read.csv("https://www.dropbox.com/s/pze3wk43boevzyh/titanic.csv?dl=1")

# convert categorical variables to factors
titanic$class.f <- factor(titanic$Pclass)
titanic$sex.f <- factor(titanic$Sex)


attach(titanic)
# probit regression (MLE) via glm
native <- glm(Survived ~ class.f + sex.f + Age + Fare, family = binomial(link = "probit"), na.action = na.pass)

summary(native)
## 
## Call:
## glm(formula = Survived ~ class.f + sex.f + Age + Fare, family = binomial(link = "probit"), 
##     na.action = na.pass)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8103  -0.6806  -0.4179   0.6488   2.4720  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.0612500  0.2406347   8.566  < 2e-16 ***
## class.f2    -0.6770830  0.1694124  -3.997 6.42e-05 ***
## class.f3    -1.3648159  0.1665982  -8.192 2.56e-16 ***
## sex.fmale   -1.5237500  0.1050024 -14.512  < 2e-16 ***
## Age         -0.0188588  0.0040956  -4.605 4.13e-06 ***
## Fare         0.0002994  0.0012500   0.240    0.811    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1182.77  on 886  degrees of freedom
## Residual deviance:  803.88  on 881  degrees of freedom
## AIC: 815.88
## 
## Number of Fisher Scoring iterations: 5
class_dummies <- model.matrix(~class.f)
#class_dummies$`(Intercept)`[class_dummies$class.f2==1 | class_dummies$class.f3==1] <- 0
class_dummies <- class_dummies[,2:3]
head(class_dummies) # 2nd, 3rd dummy
##   class.f2 class.f3
## 1        0        1
## 2        0        0
## 3        0        1
## 4        0        0
## 5        0        1
## 6        0        1
sex_dummies <- model.matrix(~sex.f)
sex_dummies <- sex_dummies[,2]
head(sex_dummies) # male dummy
## 1 2 3 4 5 6 
## 1 0 0 0 1 1
dat <- data.matrix(cbind(Survived, 1, class_dummies, sex_dummies, Age, Fare))

init <- (lm(Survived ~ class.f + sex.f + Age + Fare))$coefficients # LPM starting values gave a different optim result
# other initial-value trials
init <- native$coefficients # works
init <- rep(0, 6)           # worked well; this is the value passed to optim below

Y <- as.numeric(dat[, 1])
X <- data.matrix(dat[, 2:7])


# requires model matrix `X` and binary response `Y`
probit.nll <- function (beta) {
  # linear predictor
  eta <- X %*% beta
  # probability
  p <- pnorm(eta)
  # negative log-likelihood
  -sum((1 - Y) * log(1 - p) + Y * log(p))
}

probit.gr <- function (beta) {
  # linear predictor
  eta <- X %*% beta
  # probability
  p <- pnorm(eta)
  # chain rule
  u <- dnorm(eta) * (Y - p) / (p * (1 - p))
  # gradient
  -crossprod(X, u) # same as -t(X)%*%u
}


fit <- optim(init, probit.nll, gr = probit.gr, method = "BFGS", hessian = TRUE)

names(fit$par) <- names(native$coefficients)
# install.packages("htmlTable")
# install.packages("magrittr")
library(htmlTable)
## Warning: package 'htmlTable' was built under R version 3.6.3
library(magrittr)


names(native$coefficients) <- c("(intercept)", "2nd class", "3rd class", "male", "age", "fare")
  
matrix(round(c(fit$par, native$coefficients), digits=3),
       nrow=6, ncol=2,
       dimnames = list(c(names(native$coefficients)),
                       c("manual", "glm"))) %>% 
htmlTable
                 manual      glm
(intercept)       2.061    2.061
2nd class        -0.677   -0.677
3rd class        -1.365   -1.365
male             -1.524   -1.524
age              -0.019   -0.019
fare              0.000    0.000
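
Since ‘optim’ was called with hessian = TRUE, we can also back out standard errors from the inverse Hessian of the negative log-likelihood and compare them with those from ‘glm’. A quick sketch along these lines (output not shown):

# standard errors: square root of the diagonal of the inverse Hessian
se.manual <- sqrt(diag(solve(fit$hessian)))
cbind(manual = se.manual, glm = sqrt(diag(vcov(native))))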