\(D_{i} = x_{i}\beta + \mu_{i}\), \(E\mu_{i}=0,\; E\mu_{i} \mu_{\ell}=0,\, i\ne\ell\)
where \(D_i= \begin{cases} 1, & \text{if "state A" occurs with probability } p_i\\ 0, & \text{otherwise with probability }1-p_i \end{cases}\)
Consider a simple regression with a discrete dependent variable: \[ \begin{array}{lcl} D_i & = & \beta_1 + \beta_2 x_{i2} + \mu_i\\ \hat{D}_i & = & \hat{\beta}_1 + \hat{\beta}_2 x_{i2} \end{array}\]
\(\frac{d \hat{D}_{i}}{d x_{i2}} = \hat{\beta}_{2}\): since \(E[D_i \mid x_i] = 1\cdot p_i + 0\cdot(1-p_i) = p_i\), the fitted value \(\hat{D}_i\) estimates \(p_i\), and \(\hat{\beta}_{2}\) is the change in the probability that \(D_i = 1\) from a one-unit increase in \(x_{i2}\).
What if your predicted values fall outside the \([0,1]\) range?
Solution to \(\hat{p}_{i}<0\) or \(\hat{p}_{i}>1\):
Consider the latent-variable model \[y^*_{i} = x_{i}\beta + \mu_i,\] where \(y^*_{i}\) is the true latent bankruptcy index, \(x_i\) is an observed index of financial distress (from 0 to 10), and the \(\mu_i\) are i.i.d. with mean 0 and standard deviation 10.
# simple example
rm(list=ls())
set.seed(123)
e <- rnorm(1000, mean=0, sd=10)
x <- runif(1000, 0, 10)
beta <- 10
y <- beta*x + e
dta <- as.data.frame(cbind(y,x))
dta$D <- rep(0, nrow(dta))
dta[dta$y > 60,]$D <- 1
dta <- dta[!(dta$y > 100),]
dta <- dta[!(dta$y < 0),]
dta$bankruptcy <- as.factor(dta$D)
with(dta, plot(x, y, col=bankruptcy))
legend(9, 4.3, legend=levels(dta$bankruptcy), col=1:length(levels(dta$bankruptcy)), pch=1)
m0 <- lm(y~x, data=dta)
m1 <- lm(D~x, data=dta)
# install.packages("sjPlot")
# install.packages("sjmisc")
# install.packages("sjlabelled")
library(sjPlot)
## Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!
library(sjmisc)
library(sjlabelled)
library(lme4)
## Loading required package: Matrix
sjPlot::tab_model(m0, m1)
| Predictors | y: Estimates | y: CI | y: p | D: Estimates | D: CI | D: p |
|---|---|---|---|---|---|---|
| (Intercept) | 2.81 | 1.59 – 4.04 | <0.001 | -0.32 | -0.36 – -0.28 | <0.001 |
| x | 9.46 | 9.25 – 9.68 | <0.001 | 0.14 | 0.13 – 0.15 | <0.001 |
| Observations | 923 | | | 923 | | |
| R2 / R2 adjusted | 0.891 / 0.891 | | | 0.644 / 0.644 | | |
plot(D ~ x, cex = 1.3, data=dta, ylim=c(-0.5, 1.5))
abline(lm(D ~ x, data=dta), col="blue")
legend(0.1, 1.5, lty=c(1), col=c("blue"),
       legend=c("Fitted line"), box.col="white")
par(mfrow=c(1,2))
with(m1, plot(residuals ~ fitted.values, main="Linear probability residuals"))
abline(h=0, col="red", lty=2)
with(m0, plot(residuals ~ fitted.values, main="True latent variable residuals"))
abline(h=0, col="red", lty=2)
#par(mfrow=c(1,2))
#hist(dta$D, main ="Di histogram")
par(mfrow=c(1,1))
par(mar = c(5,5,2,5))
hist(m1$residuals, breaks=30, main = "Residuals histogram", yaxt="n")
par(new=TRUE)
plot(density(m1$residuals, bw=0.04),main="", axes=F, ylab=NA, xlab=NA, col="blue", lty=2)
axis(side=4)
mtext(side = 4, line = 3, 'prob. density')
legend("topleft", lty=c(1,2), col=c("black", "blue"),
legend=c("frequency", "nonparametric
estimation (Gaussian)"), box.col="white")
The residual plots make the heteroskedasticity of the linear probability model easy to see: for a given \(x_i\), the residual can take only two values, \(1 - \hat{p}_i\) or \(-\hat{p}_i\), so \(var(\mu_i \mid x_i) = p_i(1-p_i)\) varies systematically with \(x_i\).
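Beyond eyeballing the plot, a formal check is easy. A minimal sketch (not part of the original notes) using the Breusch–Pagan test from the `lmtest` package, assumed installed; a small p-value rejects homoskedasticity:
# install.packages("lmtest")
library(lmtest)
bptest(m1)  # Breusch-Pagan test on the linear probability model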
We consider an example of technology adoption in agriculture. Suppose that a farmer \(i\) has two types of biotechnologies (e.g., conventional seed and genetically engineered seed), and she has to choose either one to be planted in her farmland. There are \(n\) number of farmers in the economy.
For \(i=1, \dots, n\),
A farmer adopts the "new technology" when it is more profitable, where \(\pi_{0i} = x_{0i}\beta_{0} + \mu_{0i}\) and \(\pi_{1i} = x_{1i}\beta_{1} + \mu_{1i}\) denote profits under the conventional and the new technology. \[ \text{Define } D_i= \begin{cases} 1, & \text{if }\pi_{0i}<\pi_{1i} & (P_{i})\text{ i.e. adopting the new technology}\\ 0, & \text{if }\pi_{0i} \ge \pi_{1i} & (1-P_{i}) \text{ i.e. staying with the conventional technology} \end{cases} \] \[ \begin{array}{lcl} Pr[D_{i}=1|x_{1i}, x_{0i}] &=& Pr[\pi_{1i}>\pi_{0i}] \\ &=& Pr[x_{1i}\beta_{1} + \mu_{1i} > x_{0i}\beta_{0} + \mu_{0i}] \\ &=& Pr[\mu_{0i} - \mu_{1i} < x_{1i}\beta_{1} - x_{0i}\beta_{0}] \\ &=& F_{\varepsilon}(x_{1i}\beta_{1} - x_{0i}\beta_{0}) \end{array} \]
The probability distribution of \(\varepsilon_i \equiv \mu_{0i} - \mu_{1i}\), i.e., its density \(f_{\varepsilon}(\cdot)\) and CDF \(F_{\varepsilon}(\cdot)\), is the key object we need to specify.
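To see the mechanics, here is a minimal simulation sketch (the coefficient values and the independent standard-normal shocks are illustrative assumptions, not from the notes); with \(\mu_{0i}, \mu_{1i}\) i.i.d. \(N(0,1)\), \(\varepsilon_i \sim N(0,2)\):
# toy random-utility simulation (illustrative parameter values)
set.seed(42)
n <- 10000
x1 <- runif(n); x0 <- runif(n)               # observed profit shifters
b1 <- 2; b0 <- 1                             # assumed coefficients
mu1 <- rnorm(n); mu0 <- rnorm(n)             # technology-specific shocks
D <- as.numeric(x1*b1 + mu1 > x0*b0 + mu0)   # adopt iff pi_1 > pi_0
mean(D)                                      # empirical adoption share
mean(pnorm(x1*b1 - x0*b0, sd=sqrt(2)))       # F_eps(x1*b1 - x0*b0): matches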
Think about a specific input involved with the technology (e.g., human capital).
Example: “\(r^0\)” = price of the old technology.
If the price of the conventional technology increases, it pushes farmers toward the new technology: \(\frac{\partial P_i}{\partial r^0} = -\beta_{0r^0}\, f_{\varepsilon}(x_{1i}\beta_1 - x_{0i}\beta_0) > 0\), since \(f_{\varepsilon}(\cdot) \ge 0\) and a higher input price lowers old-technology profit (\(\beta_{0r^0} < 0\)).
In the previous part, we highlighted the importance of specifying the distribution of \(\varepsilon_i\), \(F_{\varepsilon}(x_i\beta^*)\).
The probit model assumes normality: \(\mu_{0i} \sim N(0, \sigma_{0}^2)\) and \(\mu_{1i} \sim N(0, \sigma_{1}^2)\). A linear combination of jointly normal random variables is itself normal. Hence \(\varepsilon_{i} = \mu_{0i} - \mu_{1i} \sim N(0,\sigma_0^2 + \sigma_1^2 - 2\sigma_{01})\), where \(\sigma_{01}\equiv cov(\mu_{0i}, \mu_{1i})\).
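Because only the sign of the profit comparison is observed, the scale of \(\varepsilon_i\) is not identified, so the index is rescaled (a standard probit normalization, stated here for completeness):
\[ Pr[D_i=1|x_{1i},x_{0i}] = Pr\!\left[\frac{\varepsilon_i}{\sigma_{\varepsilon}} < \frac{x_{1i}\beta_1 - x_{0i}\beta_0}{\sigma_{\varepsilon}}\right] = \Phi(x_i\beta^{*}), \qquad \sigma^2_{\varepsilon} = \sigma_0^2 + \sigma_1^2 - 2\sigma_{01}, \]
where \(x_i\beta^{*}\) collects the rescaled coefficients and \(\Phi\) is the standard normal CDF.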
To test linear restrictions on \(\beta\) with a likelihood ratio test, let the maintained model be \(\Omega:\; y_i = x_i\beta + \epsilon_i, \quad i=1,\dots,m, \quad \epsilon_i\sim N(0,\sigma^2_{\epsilon}),\quad E\epsilon_{i} \epsilon_{\ell}=0,\; i \ne \ell\), with \(H_0:\; R\beta = C\), where \(R\) is a \((q\times k)\) matrix and \(rank(R)=q \le k = length(\beta)\).
Decision: reject the null if \(\chi^2_0 = -2\ln\lambda > \chi_{q,\alpha}^2\) at the \(\alpha \cdot 100\%\) significance level, where \(\lambda\) is the ratio of the restricted to the unrestricted maximized likelihood; otherwise, do not reject \(H_0\).
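As an illustration (a sketch with simulated data, not part of the original notes), the LR statistic for dropping one regressor from a probit follows directly from the two maximized log-likelihoods:
set.seed(1)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- as.numeric(x1 + 0.5*x2 + rnorm(n) > 0)
unrestricted <- glm(y ~ x1 + x2, family = binomial(link = "probit"))
restricted   <- glm(y ~ x1,      family = binomial(link = "probit"))  # H0: coefficient on x2 is 0
LR <- -2 * (logLik(restricted) - logLik(unrestricted))                # chi-squared with q = 1 df
pchisq(as.numeric(LR), df = 1, lower.tail = FALSE)                    # p-value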
Let’s return to \(y=x\beta+\mu\), but \(\mu \sim MVN(0, \sigma^2 I_{n})\)
rm(list=ls())
titanic <- read.csv("https://www.dropbox.com/s/pze3wk43boevzyh/titanic.csv?dl=1")
data <- titanic
# logistic regression (MLE)
titanic$class.f <- factor(data$Pclass)
titanic$sex.f <- factor(data$Sex)
attach(titanic)
native <- glm(Survived ~ class.f + sex.f + Age + Fare, family = binomial(link = "probit"), na.action = na.pass)
summary(native)
##
## Call:
## glm(formula = Survived ~ class.f + sex.f + Age + Fare, family = binomial(link = "probit"),
## na.action = na.pass)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8103 -0.6806 -0.4179 0.6488 2.4720
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.0612500 0.2406347 8.566 < 2e-16 ***
## class.f2 -0.6770830 0.1694124 -3.997 6.42e-05 ***
## class.f3 -1.3648159 0.1665982 -8.192 2.56e-16 ***
## sex.fmale -1.5237500 0.1050024 -14.512 < 2e-16 ***
## Age -0.0188588 0.0040956 -4.605 4.13e-06 ***
## Fare 0.0002994 0.0012500 0.240 0.811
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1182.77 on 886 degrees of freedom
## Residual deviance: 803.88 on 881 degrees of freedom
## AIC: 815.88
##
## Number of Fisher Scoring iterations: 5
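A reminder before moving on: probit coefficients are index coefficients, not marginal effects; the effect of a regressor on \(Pr[Survived=1]\) scales with the normal density at the fitted index. A hedged sketch of the average marginal effect of `Age`, computed by hand from the fitted model (this assumes the regressors have no missing values, so `model.matrix()` returns exactly the rows `glm` used):
# average marginal effect of Age: mean of dnorm(index) times the coefficient
X.glm <- model.matrix(native)               # design matrix used by glm
eta.hat <- X.glm %*% coef(native)           # fitted latent index
mean(dnorm(eta.hat)) * coef(native)["Age"]  # AME of Age on Pr[Survived = 1]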
class_dummies <- model.matrix(~class.f)
#class_dummies$`(Intercept)`[class_dummies$class.f2==1 | class_dummies$class.f3==1] <- 0
class_dummies <- class_dummies[,2:3]
head(class_dummies) # 2nd, 3rd dummy
## class.f2 class.f3
## 1 0 1
## 2 0 0
## 3 0 1
## 4 0 0
## 5 0 1
## 6 0 1
sex_dummies <- model.matrix(~sex.f)
sex_dummies <- sex_dummies[,2]
head(sex_dummies) # male dummy
## 1 2 3 4 5 6
## 1 0 0 0 1 1
dat <- data.matrix(cbind(Survived, 1, class_dummies, sex_dummies, Age, Fare))
init <- (lm(Survived ~ class.f + sex.f + Age + Fare))$coefficients # OLS start: optim converged to a different result
# Other initial value trials
init <- native$coefficients # glm estimates as start: works
init <- rep(0,6) # zeros: also worked well (this is the start value actually used below)
Y <- as.numeric(dat[, 1])
X <- data.matrix(dat[, 2:7])
# requires model matrix `X` and binary response `Y`
probit.nll <- function (beta) {
# linear predictor
eta <- X %*% beta
# probability
p <- pnorm(eta)
# negative log-likelihood
-sum((1 - Y) * log(1 - p) + Y * log(p))
}
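A side note (added here, not in the original code): when \(|\eta|\) is large, `p` underflows to 0 or 1 and `log(p)` or `log(1 - p)` returns `-Inf`. A numerically safer variant uses `pnorm()`'s `log.p` option, since \(1-\Phi(\eta)=\Phi(-\eta)\):
probit.nll.stable <- function (beta) {
  eta <- X %*% beta
  # log Phi(eta) and log Phi(-eta) computed directly on the log scale
  -sum(Y * pnorm(eta, log.p=TRUE) + (1 - Y) * pnorm(-eta, log.p=TRUE))
}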
probit.gr <- function (beta) {
# linear predictor
eta <- X %*% beta
# probability
p <- pnorm(eta)
# chain rule
u <- dnorm(eta) * (Y - p) / (p * (1 - p))
# gradient
-crossprod(X, u) # same as -t(X)%*%u
}
fit <- optim(init, probit.nll, gr = probit.gr, method = "BFGS", hessian = TRUE)
names(fit$par) <- names(native$coefficients)
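Because `optim()` was called with `hessian = TRUE` on the negative log-likelihood, asymptotic standard errors follow from the inverse Hessian; a quick sketch to compare with the `Std. Error` column of the `glm` summary above:
# MLE standard errors: sqrt of the diagonal of the inverse Hessian
se.manual <- sqrt(diag(solve(fit$hessian)))
round(cbind(estimate=fit$par, se=se.manual), digits=4)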
# install.packages("htmlTable")
# install.packages("magrittr")
library(htmlTable)
library(magrittr)
names(native$coefficients) <- c("(intercept)", "2nd class", "3rd class", "male", "age", "fare")
matrix(round(c(fit$par, native$coefficients), digits=3),
nrow=6, ncol=2,
dimnames = list(c(names(native$coefficients)),
c("manual", "glm"))) %>%
htmlTable
|  | manual | glm |
|---|---|---|
| (intercept) | 2.061 | 2.061 |
| 2nd class | -0.677 | -0.677 |
| 3rd class | -1.365 | -1.365 |
| male | -1.524 | -1.524 |
| age | -0.019 | -0.019 |
| fare | 0 | 0 |