Using Monte Carlo simulation to illustrate how the asymptotic variance of the maximum likelihood estimator works in the probit model.
The function Probit_LL returns the (negative) log-likelihood of the probit model.
Recall that the likelihood of the probit model is \[ f(y \mid x_i) = \Phi(x_i\theta)^{y}\,\bigl(1-\Phi(x_i\theta)\bigr)^{1-y}. \] Thus the log-likelihood of the probit model is \[ \log f(y \mid x_i) = y \log \Phi(x_i\theta) + (1-y)\log\bigl(1-\Phi(x_i\theta)\bigr). \]
Probit_LL <- function(y, x, par) {
  Phi = pnorm(x %*% par)  # Phi(x_i * theta) = P(y_i = 1 | x_i)
  # Compute the log-likelihood, then flip its sign so that minimizing
  # the returned value maximizes the likelihood
  f = sum(y * log(Phi)) + sum((1 - y) * log(1 - Phi))
  f = -f
  return(f)
}
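As a quick sanity check, the function can be evaluated at any parameter value. A minimal example with made-up toy data (not part of the original script; x_toy, y_toy are hypothetical):
x_toy = cbind(1, c(-1, 0, 1))       # 3 observations: intercept + one regressor
y_toy = c(0, 1, 1)                  # made-up binary outcomes
Probit_LL(y_toy, x_toy, c(0.5, 1))  # negative log-likelihood, a single number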
Probit_LL_g.R returns the gradient (derivative) of the (negative) log-likelihood function of the probit model. Recall: \[ \frac{\partial}{\partial \theta} \sum_i \log f(y_i \mid x_i) = \sum_i \left[ y_i \, \frac{\phi(x_i\theta)}{\Phi(x_i\theta)} - (1-y_i)\, \frac{\phi(x_i\theta)}{1-\Phi(x_i\theta)} \right] x_i. \] Based on this formula, we write the function Probit_LL_g to compute the gradient:
Probit_LL_g <- function(y, x, par) {
  Phi = pnorm(x %*% par)  # cumulative probability Phi(x_i * theta)
  phi = dnorm(x %*% par)  # probability density phi(x_i * theta)
  n = length(y)           # sample size
  k = length(par)         # number of coefficients
  # Gradient: sum_i [ y_i*phi/Phi - (1-y_i)*phi/(1-Phi) ] * x_i, computed by
  # replicating the observation weights across the k columns of x
  g = t(matrix(rep(phi / Phi, k), nrow = n) * x) %*% y -
      t(matrix(rep(phi / (1 - Phi), k), nrow = n) * x) %*% (1 - y)
  g = -g  # flip sign: gradient of the negative log-likelihood
  return(g)
}
We set the true parameters to (1, 1, 1). Hence the true model has the form \[ y_i^* = 1 + x_{1i} + x_{2i} + u_i, \qquad y_i = \mathbf{1}\{ y_i^* > 0 \}. \]
We generate the data based on the true parameters; the error terms are standard normal and independent across i.
library(numDeriv) # loads the functions grad and hessian which numerically evaluate the gradient and hessian
rm(list=ls()) # Clear workspace
set.seed(1) # Set seed for random number generator
n = 100 # Set the sample size
theta0 = c(1,1,1) # True parameter value
k = length(theta0)
# Data generating process
x = cbind(matrix(1,n,1),matrix(rnorm(2*n,0,1),ncol=2)) # regressors
u = rnorm(n,0,1) # error term
y_star = x %*% theta0 + u # latent "utility"
y = ceiling(y_star/max(abs(y_star))) # observed outcome (y=1 if y_star >0, otherwise y=0)
dat = data.frame(x,y) # Create data frame
# Set working directory
setwd("E://STUDY//R//Topic in Econometrics//CodeR")
# Load the log-likelihood, its derivative, and the hessian
source("Probit_LL.R")
source("Probit_LL_g.R")
source("Probit_LL_h.R")
Before estimating the parameters, we check whether the gradient function is programmed correctly by comparing it to a numerical approximation (the grad function from numDeriv). The two results should be the same. Indeed:
Probit_LL_g(y,x,theta0)
## [,1]
## [1,] -3.3350902
## [2,] -0.2611465
## [3,] 5.5465893
grad(function(u) Probit_LL(y,x,u),theta0)
## [1] -3.3350902 -0.2611465 5.5465893
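The analytical and numerical gradients agree. We now estimate the parameters by minimizing the negative log-likelihood with optim, using the BFGS method and supplying the analytical gradient: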
result <- optim(par = theta0, Probit_LL, y = y, x = x, gr = Probit_LL_g,
                method = "BFGS", control = list(reltol = 1e-9), hessian = TRUE)
result$par
## [1] 1.0021022 0.9801432 0.7803194
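The estimates are reasonably close to the true value (1, 1, 1). Since optim was called with hessian = TRUE, the returned hessian of the negative log-likelihood can also be used for standard errors. A minimal sketch (our addition, not in the original output):
se = sqrt(diag(solve(result$hessian)))  # inverse hessian of the negative log-likelihood
se  # approximate standard errors of theta_hat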
Next we do the same thing as above, in a slightly modified version. Instead of generating the data once, we wrap a loop around the data generating process and the estimation step. When the loop finishes, we have a 1000 x 3 matrix of estimated parameters (theta_hat).
library(numDeriv) # loads the functions grad and hessian which numerically evaluate the gradient and hessian
rm(list=ls()) # Clear workspace
# Set working directory
setwd("E://STUDY//R//Topic in Econometrics//CodeR")
# Load the log-likelihood, its derivative, and the hessian
source("Probit_LL.R")
source("Probit_LL_g.R")
source("Probit_LL_h.R")
set.seed(2) # Set seed for random number generator
n = 1000 # Set the sample size
theta0 = c(1,1,1) # True parameter value
k = length(theta0) # Number of parameters
num = 1000 # Number of Monte Carlo iterations
theta_hat_vec = matrix(0,num,k) # Matrix of 0 with dimension 1000x3
#Create a loop of 1000 iterations
for (it in 1:num) {
# Data generating process
x = cbind(matrix(1,n,1),matrix(rnorm(2*n,0,1),ncol=2)) # regressors
u = rnorm(n,0,1) # error terms
y_star = x %*% theta0 + u # latent "utility" (unobserved y)
y = ceiling(y_star/max(abs(y_star))) # observed outcome (y=1 if y_star>0, otherwise y=0)
dat = data.frame(x,y) # Create data frame
# Compute parameters that optimize the log-likelihood function with given data
result <- optim(par = theta0, Probit_LL, y = y, x = x, gr = Probit_LL_g,
                method = "BFGS", control = list(reltol = 1e-9))
theta_hat_vec[it,1:k] = result$par #Store computed parameter of one loop into theta_hat_vec
# When the loop finishes, we will get a vector of 1000 sets of theta_hat
}
hist(theta_hat_vec[1:num,1])
- The histogram looks like a normal distribution, as asymptotic theory predicts, since we use a relatively large sample size (n = 1000). With a smaller sample size, say n = 100, the histogram would look noticeably less normal.
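A quick way to check the normality claim (our addition) is a QQ-plot of the Monte Carlo draws for the first coefficient:
qqnorm(theta_hat_vec[, 1])  # points should lie close to a straight line
qqline(theta_hat_vec[, 1])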
colMeans(theta_hat_vec)
## [1] 1.059279 1.089475 1.099344
var(theta_hat_vec)
## [,1] [,2] [,3]
## [1,] 0.05860257 0.03592809 0.03491932
## [2,] 0.03592809 0.09123579 0.05179772
## [3,] 0.03491932 0.05179772 0.09194506
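The Monte Carlo means are close to the true value (1, 1, 1), with a small finite-sample bias, and var(theta_hat_vec) estimates the finite-sample variance of theta_hat. Asymptotic theory says this should be close to the inverse hessian of the negative log-likelihood. A sketch of the comparison (our addition), using the last simulated dataset and the numerical hessian from numDeriv:
H = hessian(function(t) Probit_LL(y, x, t), result$par)  # numDeriv::hessian
solve(H)  # asymptotic variance estimate; compare with var(theta_hat_vec)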