I. Objective

We use Monte Carlo simulation to illustrate how the asymptotic properties (consistency and asymptotic variance) of the maximum likelihood estimator work in the probit model.

II. Create the Needed Functions

1. Function for computing the log-likelihood (named Probit_LL)

This function returns the (negative) log-likelihood of the probit model.

Recall that the likelihood of the probit model is as follows: \[ f(y_i|x_i)= \Phi(x_i\theta)^{y_i}\,\big(1-\Phi(x_i\theta)\big)^{1-y_i} \] Thus the log-likelihood of the probit model is: \[ \log f(y_i|x_i)= y_i \log \Phi(x_i\theta) + (1-y_i)\log\big(1-\Phi(x_i\theta)\big) \]

Probit_LL <- function(y, x, par) {
    Phi = pnorm(x %*% par)  # Phi(x_i theta), the probability that y_i = 1

    # Computing the log-likelihood
    f = sum(y*log(Phi)) + sum((1-y)*log(1-Phi))
    f = -f                  # return the negative, since optim minimizes

    return(f)
}

2. Function for computing the gradient (named Probit_LL_g)

Probit_LL_g returns the gradient (first derivative) of the (negative) log-likelihood of the probit model. Recall: \[ \frac{\partial \log f(y|x)}{\partial \theta} = \sum_{i=1}^{n}\left[ y_i\,\frac{\phi(x_i\theta)}{\Phi(x_i\theta)} - (1-y_i)\,\frac{\phi(x_i\theta)}{1-\Phi(x_i\theta)} \right] x_i \] Based on this formula, we write the function Probit_LL_g to compute the gradient as follows:

Probit_LL_g <- function(y, x, par) {
    Phi = pnorm(x %*% par)  # Phi is the cumulative probability
    phi = dnorm(x %*% par)  # phi is the probability density

    # Gradient: sum_i [ y_i*phi_i/Phi_i - (1-y_i)*phi_i/(1-Phi_i) ] * x_i
    g = t(x) %*% (y * (phi/Phi) - (1 - y) * (phi/(1 - Phi)))
    g = -g                  # negative gradient, matching the negative log-likelihood

    return(g)
}
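
The scripts below also source a file Probit_LL_h.R for the Hessian, which is not shown in this document. For reference, here is a minimal sketch of what such a function could look like (the implementation is an assumption, not the original file), using the standard probit result \[ \frac{\partial^2 \log f(y|x)}{\partial \theta\,\partial \theta^T} = -\sum_{i=1}^{n} \lambda_i(\lambda_i + x_i\theta)\, x_i x_i^T \] where lambda_i is the bracketed term in the gradient formula above:

# Sketch of Probit_LL_h (assumed implementation; the original file is not shown).
# Returns the Hessian of the (negative) log-likelihood of the probit model.
Probit_LL_h <- function(y, x, par) {
    xb  = x %*% par
    Phi = pnorm(xb)         # cumulative probability
    phi = dnorm(xb)         # probability density

    # Generalized residual lambda_i, the bracketed term in the gradient
    lambda = y * (phi/Phi) - (1 - y) * (phi/(1 - Phi))

    # Hessian of the log-likelihood: -sum_i lambda_i*(lambda_i + x_i theta) x_i x_i'
    h = -t(x) %*% (as.vector(lambda * (lambda + xb)) * x)

    return(-h)              # negative, to match the negative log-likelihood
}

Such a function can be checked against numDeriv's hessian function in the same way as the gradient check below.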

III. Monte Carlo Simulation

1. A first glimpse of the asymptotic concept

  • We set the true parameters to (1, 1, 1). Hence our true model has the form \[ y_i^* = 1 + x_{2i} + x_{3i} + u_i, \qquad y_i = 1\{y_i^* > 0\} \]

  • At first we set the sample size to n = 100.
  • We generate the data based on the true parameters; the error terms are standard normal and independent across i.

library(numDeriv)   # loads the functions grad and hessian which numerically evaluate the gradient and hessian
rm(list=ls())       # Clear workspace

set.seed(1)         # Set seed for random number generator

n = 100         # Set the sample size

theta0 = c(1,1,1)   # True parameter value

k = length(theta0) 

# Data generating process
x = cbind(matrix(1,n,1),matrix(rnorm(2*n,0,1),ncol=2))  # regressors

u = rnorm(n,0,1)                    # error term

y_star = x %*% theta0 + u               # latent "utility"

y = as.numeric(y_star > 0)              # observed outcome (y=1 if y_star > 0, otherwise y=0)
    
dat = data.frame(x,y)                   # Create data frame
  • We will use the functions Probit_LL and Probit_LL_g (plus Probit_LL_h for the Hessian). Hence, we save their files in the working directory and then load them with source().
# Set working directory
setwd("E://STUDY//R//Topic in Econometrics//CodeR")


# Load the log-likelihood, its derivative, and the hessian
source("Probit_LL.R")
source("Probit_LL_g.R")
source("Probit_LL_h.R")

Before estimating the parameters, we check whether the gradient function is correctly programmed by comparing it to a numerical approximation. The two results should be the same. Indeed:

Probit_LL_g(y,x,theta0)
##            [,1]
## [1,] -3.3350902
## [2,] -0.2611465
## [3,]  5.5465893
grad(function(u) Probit_LL(y,x,u),theta0)
## [1] -3.3350902 -0.2611465  5.5465893
  • Now everything is ready for maximum likelihood estimation. We use optim to estimate the parameters of the model.
result <- optim(par = theta0, Probit_LL, y = y, x = x, gr = Probit_LL_g, 
                method = c("BFGS"), control = list(reltol=1e-9), hessian=TRUE)
  • To see the estimated parameters:
result$par
## [1] 1.0021022 0.9801432 0.7803194
  • We can see that the first and second estimated parameters are quite close to the true values. However, the third one is quite far away (0.78 is quite different from 1).
  • We repeat the above process with a larger sample size, say n = 1000, and see what happens to the estimated parameters (the rerun code is sketched at the end of this list):
## [1] 0.9705418 1.0313936 0.8723140
  • The estimated parameters are now closer to the true values (1, 1, 1) than in the case with the smaller sample size (n = 100).
  • To capture the asymptotic concept, we use the Monte Carlo method as in the following part.
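For reference, the code for the n = 1000 rerun does not appear above; a minimal sketch, reusing the same data generating process with only the sample size changed (the printed estimates depend on the state of the random number generator):

n = 1000                                                # larger sample size; everything else unchanged
x = cbind(matrix(1,n,1),matrix(rnorm(2*n,0,1),ncol=2))  # regressors
u = rnorm(n,0,1)                                        # error term
y_star = x %*% theta0 + u                               # latent "utility"
y = as.numeric(y_star > 0)                              # observed outcome

result <- optim(par = theta0, Probit_LL, y = y, x = x, gr = Probit_LL_g,
                method = c("BFGS"), control = list(reltol=1e-9))
result$par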

2. Using Monte Carlo to illustrate the Asymptotic concept

Basically we do the same thing as above, in a slightly modified version. Instead of generating the data once, we create a loop around the data generating process and the estimation step. When the loop finishes, we obtain a 1000 x 3 matrix of estimated parameters (theta_hat).

library(numDeriv)   # loads the functions grad and hessian which numerically evaluate the gradient and hessian

rm(list=ls())       # Clear workspace

# Set working directory
setwd("E://STUDY//R//Topic in Econometrics//CodeR")

# Load the log-likelihood, its derivative, and the hessian
source("Probit_LL.R")
source("Probit_LL_g.R")
source("Probit_LL_h.R")


set.seed(2)         # Set seed for random number generator

n = 1000        # Set the sample size

theta0 = c(1,1,1)   # True parameter value

k = length(theta0)      # Number of parameters

num = 1000      # Number of Monte Carlo iterations

theta_hat_vec = matrix(0,num,k) # Matrix of 0 with dimension 1000x3

#Create a loop of 1000 iterations
for (it in 1:num) { 
    # Data generating process
    x = cbind(matrix(1,n,1),matrix(rnorm(2*n,0,1),ncol=2))  # regressors

    u = rnorm(n,0,1)                                        # error terms

    y_star = x %*% theta0 + u               # latent "utility" (unobserved y)

    y = as.numeric(y_star > 0)              # observed outcome (y=1 if y_star>0, otherwise y=0)
    
    dat = data.frame(x,y)                   # Create data frame

    # Compute parameters that optimize the log-likelihood function with given data
    result <- optim(par = theta0, Probit_LL, y = y, x = x, gr = Probit_LL_g, method = c("BFGS"), control = list(reltol=1e-9))

    theta_hat_vec[it,1:k] = result$par  # Store the estimates from this iteration in theta_hat_vec
        # When the loop finishes, we will have a 1000x3 matrix of theta_hat estimates
    
}
  • Plot the histogram of the distribution of the first estimated parameter over the Monte Carlo iterations:
hist(theta_hat_vec[1:num,1])

  • The histogram looks like a normal distribution (since we use a relatively large sample size, n = 1000). If we set a smaller sample size, say n = 100, the histogram will not look as close to a normal distribution.
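
To make the comparison with the normal distribution concrete, one can overlay the normal density implied by the Monte Carlo mean and standard deviation on the histogram (a small sketch, not part of the original code):

m = mean(theta_hat_vec[,1])                    # Monte Carlo mean of the first parameter
s = sd(theta_hat_vec[,1])                      # Monte Carlo standard deviation
hist(theta_hat_vec[,1], freq = FALSE)          # density scale, so the curve is comparable
curve(dnorm(x, mean = m, sd = s), add = TRUE)  # normal density with matching moments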

  • By the law of large numbers (the estimator is consistent), the average of theta_hat across the Monte Carlo iterations should be close to the true value. Indeed,
colMeans(theta_hat_vec)
## [1] 1.059279 1.089475 1.099344
  • Finally, we look at the variance-covariance matrix of the estimated parameters over the Monte Carlo iterations:
var(theta_hat_vec)
##            [,1]       [,2]       [,3]
## [1,] 0.05860257 0.03592809 0.03491932
## [2,] 0.03592809 0.09123579 0.05179772
## [3,] 0.03491932 0.05179772 0.09194506
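
Since the objective is the asymptotic variance, it is natural to compare this Monte Carlo variance with the asymptotic variance estimated from a single fit. Below is a minimal sketch, under the assumption that we refit on the last simulated data set (y and x from the final loop iteration): because Probit_LL is the negative log-likelihood, the inverse of its Hessian at the optimum estimates the asymptotic variance of theta_hat directly.

fit <- optim(par = theta0, Probit_LL, y = y, x = x, gr = Probit_LL_g,
             method = c("BFGS"), control = list(reltol=1e-9), hessian=TRUE)

solve(fit$hessian)   # estimated asymptotic variance (inverse of the observed information)
var(theta_hat_vec)   # Monte Carlo variance, for comparison

The two matrices should be of similar magnitude, which is exactly the asymptotic variance result the simulation is meant to illustrate.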