Linear Probability Model

LPM Intro

\(D_{i} = x_{i}\beta + \mu_{i}\), \(E\mu_{i}=0,\; E\mu_{i} \mu_{\ell}=0,\, i\ne\ell\)
where \(D_i= \begin{cases} 1, & \text{if "state A" occurs with probability } p_i\\ 0, & \text{otherwise with probability }1-p_i \end{cases}\)

  • \(Pr(D_i=0)+Pr(D_i=1) = 1\)
  • The OLS estimator for \(\beta\) is \(\hat{\beta}=(x^{\prime}x)^{-1}x^{\prime}D\), where \(D\) is \(n\times 1\) vector, and \(Var(\hat{\beta})=\sigma^{2}(x^{\prime}x)^{-1}\)
  • \(E(D_{i}|x_{i}) = x_{i}\beta = p_{i}\)
  • \(Var(D_{i}|x_{i}) = p_{i}(1-p_{i})=x_{i}\beta(1-x_{i}\beta)\)
    • The variance depends on \(x_i\) (i.e., heteroscedasticity).
    • If we ignore this, \(\hat{\beta}\) is no longer minimum variance, although it remains unbiased: \(E\hat{\beta}=\beta\) (a robust-variance sketch follows this list).
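
Since the usual OLS variance formula is invalid here, a common remedy is heteroscedasticity-robust (White) standard errors. Below is a minimal sketch, assuming the ‘sandwich’ and ‘lmtest’ packages are installed; the data-generating process is purely illustrative.

# illustrative LPM with the heteroscedasticity built in
set.seed(1)
x <- runif(500, 0, 10)
p <- 0.05 + 0.09*x                  # true P(D=1|x), kept inside [0,1]
D <- rbinom(500, size = 1, prob = p)
lpm <- lm(D ~ x)

library(sandwich)                   # heteroscedasticity-consistent vcov
library(lmtest)                     # coeftest()
coeftest(lpm, vcov. = vcovHC(lpm, type = "HC1"))  # robust standard errors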

Consider a simple regression with a discrete dependent variable: \[ \begin{array}{lcl} D_i & = & \beta_1 + \beta_2 x_{i2} + \mu_i\\ \hat{D}_i & = & \hat{\beta}_1 + \hat{\beta}_2 x_{i2} \end{array}\]

\(\frac{d \hat{D}_{i}}{d x_{i2}} = \hat{\beta}_{2}\): the change in the predicted probability that \(D_i=1\) due to a one-unit increase in \(x_{i2}\).

What if your predicted values fall outside the \([0,1]\) range?

Solutions to \(\hat{p}_{i}<0\) or \(\hat{p}_{i}>1\):

  1. Do nothing (ignore)
  2. \(\hat{p}_{i} = x_{i}\hat{\beta}\) if \(x_{i}\hat{\beta}\in[0,1]\); otherwise, replace \(\hat{p}_{i} = 0\) if \(\hat{p}_{i} < 0\) and \(\hat{p}_{i} = 1\) if \(\hat{p}_{i} > 1\) (see the sketch after this list).
  • Note: \(\mu_i = D_{i} - x_{i}\beta\): if \(D_i = 1\), then \(\mu_i = 1 - x_{i}\beta\); if \(D_i = 0\), then \(\mu_i = - x_{i}\beta\).
    \(\Rightarrow D_i\) and \(\mu_i\) each have a two-point (binomial) distribution! (This matters when making inference with a finite sample.)
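
A minimal sketch of option 2 in R, applying ‘pmin’/‘pmax’ to a hypothetical vector of out-of-range predictions:

# option 2: clip predicted probabilities into [0, 1]
p.hat <- c(-0.07, 0.23, 0.58, 1.12)  # illustrative out-of-range predictions
pmin(pmax(p.hat, 0), 1)              # returns 0.00 0.23 0.58 1.00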

Simple Example

Consider \[y^*_{i} = x_{i}\beta + \mu_i,\] where \(y^*_{i}\) is the latent (true) bankruptcy propensity, \(x_i\) is an observed index of financial distress (from 0 to 10), and \(\mu_i\) is i.i.d. with mean 0 and standard deviation 10.

  • Let’s say that \(y^*_{i}\) is unobservable, but we can observe who went bankrupt.
  • \(D_{i} = 1\) if the person went bankrupt and 0 otherwise.
  • We would like to predict the bankruptcy probability using the observed financial index.
# simple example 
rm(list=ls())
set.seed(123)
e <- rnorm(1000, mean=0, sd=10)
x <- runif(1000, 0, 10)

beta <- 10
y <- beta*x + e 

dta <- as.data.frame(cbind(y,x))
dta$D <- rep(0, nrow(dta))   # D = 1 if latent y crosses the threshold of 60
dta[dta$y > 60,]$D <- 1

dta <- dta[!(dta$y > 100),]  # keep latent values inside [0, 100]
dta <- dta[!(dta$y < 0),]
dta$bankruptcy <- as.factor(dta$D)
with(dta, plot(x, y, col=bankruptcy))
legend(9, 4.3, unique(dta$bankruptcy), col=1:nlevels(dta$bankruptcy), pch=1)

m0 <- lm(y~x, data=dta)
m1 <- lm(D~x, data=dta)

# install.packages("sjPlot")
# install.packages("sjmisc")
# install.packages("sjlabelled")
library(sjPlot)
## Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!

library(sjmisc)
library(sjlabelled)
library(lme4)
## Loading required package: Matrix
sjPlot::tab_model(m0, m1)
                   y                                    D
Predictors         Estimates   CI            p          Estimates   CI             p
(Intercept)        2.81        1.59 – 4.04   <0.001     -0.32       -0.36 – -0.28  <0.001
x                  9.46        9.25 – 9.68   <0.001      0.14        0.13 – 0.15   <0.001
Observations       923                                   923
R2 / R2 adjusted   0.891 / 0.891                         0.644 / 0.644
plot(D ~ x, cex = 1.3, data=dta, ylim=c(-0.5, 1.5))
abline(lm(D ~ x, data=dta), col="blue")
legend(0.1, 1.5, lty=c(1), col=c("blue"),
      legend=c("Fitted line"), box.col="white")

par(mfrow=c(1,2))
with(m1, plot(residuals ~ fitted.values, main="Linear probability residuals"))
abline(h=0, col="red", lty=2)
with(m0, plot(residuals ~ fitted.values, main="True latent variable residuals"))
abline(h=0, col="red", lty=2)

#par(mfrow=c(1,2))
#hist(dta$D, main ="Di histogram")
par(mfrow=c(1,1))
par(mar = c(5,5,2,5))
hist(m1$residuals, breaks=30, main = "Residuals histogram", yaxt="n") 
par(new=TRUE)
plot(density(m1$residuals, bw=0.04), main="", axes=FALSE, ylab=NA, xlab=NA, col="blue", lty=2)
axis(side=4)
mtext(side = 4, line = 3, 'prob. density')
legend("topleft", lty=c(1,2), col=c("black", "blue"), 
      legend=c("frequency", "nonparametric 
estimation (Gaussian)"), box.col="white")

We can easily tell that there is heteroscedasticity in the linear probability residuals.

Economic Application

How to derive a discrete choice model using economics?

We consider an example of technology adoption in agriculture. Suppose that a farmer \(i\) has two types of biotechnologies (e.g., conventional seed and genetically engineered seed), and she has to choose either one to be planted in her farmland. There are \(n\) number of farmers in the economy.

For \(i=1, \dots, n\),

  • \(\pi_{1i}=x_{1i}\beta_{1} + \mu_{1i}\): stochastic profit of farm \(i\) adopting “new technology”
  • \(\pi_{0i}=x_{0i}\beta_{0} + \mu_{0i}\): stochastic profit of farm \(i\) maintaining “conventional technology”.

The economic decision to adopt the “new technology” if \(\pi_{0i} < \pi_{1i}\) can be expressed as \[ \text{Define } D_i= \begin{cases} 1, & \text{if }\pi_{0i}<\pi_{1i}\; (P_{i}),\text{ i.e., adopting the new technology}\\ 0, & \text{otherwise, i.e., } \pi_{0i} \ge \pi_{1i}\; (1-P_{i}),\text{ staying with conventional technology} \end{cases} \] \[ \begin{array}{lcl} Pr[D_{i}=1|x_{1i}, x_{0i}] &=& Pr[\pi_{1i}>\pi_{0i}] \\ &=& Pr[x_{1i}\beta_{1} + \mu_{1i} > x_{0i}\beta_{0} + \mu_{0i}] \\ &=& Pr[x_{1i}\beta_{1} - x_{0i}\beta_{0} > \mu_{0i} - \mu_{1i} ] \end{array} \]

  • Let \(\mu_{0i} - \mu_{1i} \equiv \varepsilon_i\) and \(x_{1i}\beta_{1} - x_{0i}\beta_{0} \equiv x_{i}\beta^*\)
  • \(Pr[\varepsilon_{i}<x_{i}\beta^{*}] = \int^{x_i \beta^*}_{-\infty}f_{\varepsilon}(t)dt = F_{\varepsilon}(x_{i}\beta^*) = P_i\)
  • \(Pr[D_i =0 | x_{1i}, x_{0i}] = 1 - Pr[D_{i}=1|x_{1i}, x_{0i}] = 1- P_i\)

The probability distribution of \(\varepsilon_i\), i.e., \(f_{\varepsilon}(\cdot)\) and \(F_{\varepsilon}(\cdot)\), where \(\varepsilon_i \equiv \mu_{0i} - \mu_{1i}\), is the key object we need to specify.

  • Note that, in the case of uniform distribution (min=0, max=1), \(Pr[D_i=1|x_i] = \int^{x_i\beta^*}_{-\infty}f_{\varepsilon}(t)dt=F_{\varepsilon}(x_i\beta^*)=x_i\beta^*\) if \(x_i\beta^* \in [0,1]\), otherwise, 0 if \(x_i\beta^* <0\), and 1 if \(x_i\beta^* >1\).
  • Thus, in this case, the LPM is the correct economic model.
  • Think about a specific input involved with the technology (e.g., human capital).

    • \(\frac{\partial P_i}{\partial x_j} = \beta_{1j} - \beta_{0j} \quad(x_j\text{ is in both } x_{1i} \text{ and } x_{0i})\)
    • \(\frac{\partial P_i}{\partial x_j} = \beta_{1j} \quad(\text{if }x_j\text{ is only in } x_{1i})\)
    • \(\frac{\partial P_i}{\partial x_j} = -\beta_{0j} \quad(\text{if }x_j\text{ is only in } x_{0i})\)
  • Example: “\(r^0\)” = price of the old technology.
    If the price of the conventional technology increases, it urges farmers to adopt the new technology: \(\frac{\partial P_i}{\partial r^0} = -\beta_{0r^0} >0\).

Probit Model

Probit model estimation

In the previous part, we highlighted the importance of specifying the distribution of \(\varepsilon_i\), \(F_{\varepsilon}(x_i\beta^*)\).

  • Recall that, when \(\varepsilon_i\) is uniform on \([0,1]\), \(Pr[D_i=1|x_i] = \int^{x_i\beta^*}_{-\infty}f_{\varepsilon}(t)dt=F_{\varepsilon}(x_i\beta^*)=x_i\beta^*\) if \(x_i\beta^* \in [0,1]\); otherwise, 0 if \(x_i\beta^* <0\) and 1 if \(x_i\beta^* >1\).

The probit model assumes normality: \(\mu_{0i} \sim N(0, \sigma_{0}^2)\) and \(\mu_{1i} \sim N(0, \sigma_{1}^2)\). A linear combination of jointly normal random variables is itself normal. Hence, \(\varepsilon_{i} \equiv \mu_{0i} - \mu_{1i} \sim N(0,\sigma_0^2 + \sigma_1^2 - 2\sigma_{01})\), where \(\sigma_{01}\equiv cov(\mu_{0i}, \mu_{1i})\).
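
Normalizing the variance of \(\varepsilon_i\) to one (only \(\beta^*\) up to that scale is identified), the probit choice probability is the standard normal CDF evaluated at the index: \[ Pr[D_i=1|x_i] = F_{\varepsilon}(x_i\beta^{*}) = \Phi(x_i\beta^{*}) = \int^{x_i\beta^{*}}_{-\infty}\frac{1}{\sqrt{2\pi}}e^{-t^{2}/2}dt \]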

  • The probit model is available as a canned routine in most statistical packages.
  • In R, the ‘glm’ command runs the probit estimation.
  • The main idea is to exploit the assumed normality, so the estimator is a maximum likelihood estimator (MLE).

What is MLE?

  • Suppose \(y_i = \beta_1 +\mu_i,\quad \mu_i \sim N(0,\sigma^2)\)
  • Likelihood function: joint pdf for \(y_i\)’s, viewed as a function of \(\beta_1, \sigma^2\)
  • MLE is defined as \(\arg \max \ln L(\beta_1, \sigma^2|y)\)
  • For joint tests of MLE estimates, use likelihood-ratio test statistics rather than the \(F\) statistic.
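
For the simple model above, the log-likelihood has a closed form, and maximizing it yields the familiar estimators: \[ \ln L(\beta_1, \sigma^2|y) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_1)^2 \;\Rightarrow\; \hat{\beta}_{1,ML} = \bar{y},\quad \hat{\sigma}^{2}_{ML} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 \]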

A brief explanation of the likelihood ratio test

\(\Omega:\; y_i = x_i\beta + \epsilon_i, \quad i=1,\dots,m, \quad \epsilon_i\sim N(0,\sigma^2_{\epsilon}),\quad E\epsilon_{i} \epsilon_{\ell}=0,\; i \ne \ell\) \(H_0:\; R\beta = C,\; R\text{ is a }(q\times k) \text{ matrix},\;rank(R)=q \le k = \dim(\beta)\)

  • Say \(L(\hat{W})\) is the maximized likelihood of the restricted regression model (under \(H_0\)). Similarly, \(L(\hat{\Omega})\) is the maximized likelihood of the unrestricted model \(\Omega\).
  • The likelihood ratio for the two models is defined as \(\Phi = \frac{L(\hat{W})}{L(\hat{\Omega})}\), which is at most 1 by construction.
  • Decision: reject the null if \(\chi^2_0 = -2\ln\Phi>\chi_{q,\alpha}^2\) at the \(\alpha \cdot 100\%\) significance level; otherwise, do not reject \(H_0\) (a worked sketch follows this list).

    • Note that \(-2\ln \Phi \sim \chi^2_q\) asymptotically, where \(q=rank(R)\).
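
A minimal sketch of the likelihood ratio test in R, using ‘logLik()’ on two nested models; the data-generating process and variable names are illustrative.

# illustrative LR test of H0: the coefficient on x3 is zero (q = 1)
set.seed(42)
x2 <- rnorm(200); x3 <- rnorm(200)
y  <- 1 + 0.5*x2 + rnorm(200)       # x3 is irrelevant by construction

unrestricted <- lm(y ~ x2 + x3)     # model Omega
restricted   <- lm(y ~ x2)          # model under H0

chi2 <- -2*(logLik(restricted) - logLik(unrestricted))  # -2 ln(Phi)
pchisq(as.numeric(chi2), df = 1, lower.tail = FALSE)    # p-value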

Let’s return to \(y=x\beta+\mu\), but \(\mu \sim MVN(0, \sigma^2 I_{n})\)

  • Then the MLE for \(\beta\) is \(\hat{\beta}_{OLS}=(x^{\prime}x)^{-1}x^{\prime}y=\hat{\beta}_{ML}\).
  • Issue of MLE when \(\mu \sim\) MVN: recall \(\hat{\sigma}^{2}_{ML} = \frac{\hat{\mu}^{\prime}\hat{\mu}}{n} \ne \frac{\hat{\mu}^{\prime}\hat{\mu}}{n-k} = \hat{\sigma}^{2}_{OLS}\) except as \(n\rightarrow \infty\), where \(\hat{\mu}= y- x\hat{\beta}_{OLS} = y- x\hat{\beta}_{ML}\).
  • Choice in testing \(H_0: R\beta = C_0\) vs. \(H_1:\) its negation: use \(F\) or \(\chi^2\).
  • Use \(\chi^2\) when \(F\) does not apply,
    • e.g., non-linear models:
      \(y_i = \beta_1 + \frac{1}{\beta_2}x_{i3} + \beta^2_{3}x_{i4} + \mu_i\)
      \(y = A\cdot L^{\beta_2}K^{\beta_{3}}\mu \implies \ln y = \beta_1 + \beta_2 \ln L + \beta_3 \ln K + \ln \mu\)

Estimation practice

  • In the chunk below, I show two ways to run the probit model: using the ‘glm’ command and manually solving the MLE process.
    • For this exercise, I use “titanic” survival records (downloaded from https://stanford.io/2O9RUCF).
    • The purpose of this analysis is to predict the chance of survival conditional on observable characteristics (such as passenger’s class, gender, age, fare).
rm(list=ls())
titanic <- read.csv("https://www.dropbox.com/s/pze3wk43boevzyh/titanic.csv?dl=1")

# convert categorical variables to factors
titanic$class.f <- factor(titanic$Pclass)
titanic$sex.f <- factor(titanic$Sex)


attach(titanic)
# probit regression (MLE) via glm
native <- glm(Survived ~ class.f + sex.f + Age + Fare, family = binomial(link = "probit"), na.action = na.pass)

summary(native)
## 
## Call:
## glm(formula = Survived ~ class.f + sex.f + Age + Fare, family = binomial(link = "probit"), 
##     na.action = na.pass)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8103  -0.6806  -0.4179   0.6488   2.4720  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.0612500  0.2406347   8.566  < 2e-16 ***
## class.f2    -0.6770830  0.1694124  -3.997 6.42e-05 ***
## class.f3    -1.3648159  0.1665982  -8.192 2.56e-16 ***
## sex.fmale   -1.5237500  0.1050024 -14.512  < 2e-16 ***
## Age         -0.0188588  0.0040956  -4.605 4.13e-06 ***
## Fare         0.0002994  0.0012500   0.240    0.811    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1182.77  on 886  degrees of freedom
## Residual deviance:  803.88  on 881  degrees of freedom
## AIC: 815.88
## 
## Number of Fisher Scoring iterations: 5
class_dummies <- model.matrix(~class.f)
#class_dummies$`(Intercept)`[class_dummies$class.f2==1 | class_dummies$class.f3==1] <- 0
class_dummies <- class_dummies[,2:3]
head(class_dummies) # 2nd, 3rd dummy
##   class.f2 class.f3
## 1        0        1
## 2        0        0
## 3        0        1
## 4        0        0
## 5        0        1
## 6        0        1
sex_dummies <- model.matrix(~sex.f)
sex_dummies <- sex_dummies[,2]
head(sex_dummies) # male dummy
## 1 2 3 4 5 6 
## 1 0 0 0 1 1
dat <- data.matrix(cbind(Survived, 1, class_dummies, sex_dummies, Age, Fare))

init <- (lm(Survived ~ class.f + sex.f + Age + Fare))$coefficients # LPM starting values gave a different optim result
# other initial-value trials
init <- native$coefficients # works
init <- rep(0, 6)           # worked well; this is the value passed to optim below

Y <- as.numeric(dat[, 1])
X <- data.matrix(dat[, 2:7])


# requires model matrix `X` and binary response `Y`
probit.nll <- function (beta) {
  # linear predictor
  eta <- X %*% beta
  # probability
  p <- pnorm(eta)
  # negative log-likelihood
  -sum((1 - Y) * log(1 - p) + Y * log(p))
}

probit.gr <- function (beta) {
  # linear predictor
  eta <- X %*% beta
  # probability
  p <- pnorm(eta)
  # chain rule
  u <- dnorm(eta) * (Y - p) / (p * (1 - p))
  # gradient
  -crossprod(X, u) # same as -t(X)%*%u
}


fit <- optim(init, probit.nll, gr = probit.gr, method = "BFGS", hessian = TRUE)

names(fit$par) <- names(native$coefficients)
# install.packages("htmlTable")
# install.packages("magrittr")
library(htmlTable)
## Warning: package 'htmlTable' was built under R version 3.6.3
library(magrittr)


names(native$coefficients) <- c("(intercept)", "2nd class", "3rd class", "male", "age", "fare")
  
matrix(round(c(fit$par, native$coefficients), digits=3),
       nrow=6, ncol=2,
       dimnames = list(c(names(native$coefficients)),
                       c("manual", "glm"))) %>% 
htmlTable
                 manual      glm
(intercept)       2.061    2.061
2nd class        -0.677   -0.677
3rd class        -1.365   -1.365
male             -1.524   -1.524
age              -0.019   -0.019
fare              0.000    0.000
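
Since ‘optim’ was called with hessian = TRUE, we can also back out standard errors from the inverse Hessian of the negative log-likelihood and compare them with those from ‘glm’. A quick sketch along these lines (output not shown):

# standard errors: square root of the diagonal of the inverse Hessian
se.manual <- sqrt(diag(solve(fit$hessian)))
cbind(manual = se.manual, glm = sqrt(diag(vcov(native))))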