The central limit theorem tells us that, as the sample size grows, the maximum likelihood estimator is asymptotically normal. In practice the sample is always finite, so the question becomes: which estimator of the asymptotic variance should you use? Here we run a Monte Carlo experiment for the probit model to compare three estimators of the parameter variance: one based on the outer product of the gradient (J1), one based on the Hessian of the log-likelihood (J2), and one based on the expected value of the Hessian (J3).
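For reference (my notation, not from the original script): writing $s_i(\theta)$ for the score contribution of observation $i$ and $\ell_i(\theta)$ for its log-likelihood, the three information matrix estimates, whose inverses estimate the asymptotic variance, are

$$
\hat{J}_1 = \sum_{i=1}^{n} s_i(\hat\theta)\,s_i(\hat\theta)', \qquad
\hat{J}_2 = -\sum_{i=1}^{n} \frac{\partial^2 \ell_i(\hat\theta)}{\partial\theta\,\partial\theta'}, \qquad
\hat{J}_3 = \sum_{i=1}^{n} \mathrm{E}\!\left[-\frac{\partial^2 \ell_i(\hat\theta)}{\partial\theta\,\partial\theta'}\,\Big|\,x_i\right].
$$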
J_1 returns an estimate of the inverse of the asymptotic variance matrix, based on the outer product of the score (gradient) contributions
J_1 <- function(y, x, par) {
  Phi = pnorm(x %*% par)   # Phi(x'par): probability that y = 1
  phi = dnorm(x %*% par)   # standard normal density at x'par
  n = length(y)            # sample size
  k = length(par)          # number of parameters
  # Each row of g is the score contribution of one observation
  g = matrix(rep(y*phi/Phi - (1-y)*phi/(1-Phi), k), nrow = n) * x
  # Sum of the outer products of the score contributions
  f = t(g) %*% g
  return(f)
}
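As a small sanity check (not part of the original script), J_1 should equal the sum over observations of the outer products of the individual score contributions. check_J1 below is a hypothetical helper, assuming y, x and par are available.

# Hypothetical check: rebuild J_1 observation by observation.
# The score contribution of observation i is w_i * x_i, so its outer product is w_i^2 * x_i x_i'.
check_J1 <- function(y, x, par) {
  Phi <- pnorm(x %*% par)
  phi <- dnorm(x %*% par)
  w   <- as.vector(y * phi / Phi - (1 - y) * phi / (1 - Phi))
  S   <- Reduce(`+`, lapply(seq_along(w), function(i) w[i]^2 * x[i, ] %*% t(x[i, ])))
  all.equal(S, J_1(y, x, par))  # should be TRUE
}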
Probit_LL_h returns the Hessian of the (negative) log-likelihood function of the probit model, which serves as the J2 estimate of the inverse of the asymptotic variance matrix
Probit_LL_h <- function(y, x, par) {
  Phi = pnorm(x %*% par)   # Phi(x'par): probability that y = 1
  phi = dnorm(x %*% par)   # standard normal density at x'par
  n = length(y)            # sample size
  k = length(par)          # number of parameters
  # Computing the Hessian (matrix of second partial derivatives)
  h = t(x) %*% (matrix(rep((1-y)*(phi/(1-Phi)*(x%*%par) - (phi^2)/(1-Phi)^2), k), nrow = n) * x) +
      t(x) %*% (matrix(rep((y)*(phi/Phi*(-x%*%par) - (phi^2)/Phi^2), k), nrow = n) * x)
  # Flip the sign to get the Hessian of the negative log-likelihood
  h = -h
  return(h)
}
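The terms inside the two matrix products follow from differentiating the score once more. Writing $z_i = x_i'\theta$ and using $\varphi'(z) = -z\,\varphi(z)$,

$$
\frac{d}{dz}\frac{\varphi(z)}{\Phi(z)} = -z\,\frac{\varphi(z)}{\Phi(z)} - \frac{\varphi(z)^2}{\Phi(z)^2},
\qquad
\frac{d}{dz}\frac{\varphi(z)}{1-\Phi(z)} = -z\,\frac{\varphi(z)}{1-\Phi(z)} + \frac{\varphi(z)^2}{\bigl(1-\Phi(z)\bigr)^2},
$$

so the Hessian of the log-likelihood is

$$
\sum_{i=1}^{n}\left[\, y_i\left(-z_i\frac{\varphi_i}{\Phi_i} - \frac{\varphi_i^2}{\Phi_i^2}\right) + (1-y_i)\left(z_i\frac{\varphi_i}{1-\Phi_i} - \frac{\varphi_i^2}{(1-\Phi_i)^2}\right)\right] x_i x_i',
$$

which is exactly what the code computes before flipping the sign.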
J_3 returns an estimate of the inverse of the asymptotic variance matrix, based on the expected value of the Hessian
J_3 <- function(y, x, par) {
  Phi = pnorm(x %*% par)   # Phi(x'par): probability that y = 1
  phi = dnorm(x %*% par)   # standard normal density at x'par
  n = length(y)            # sample size
  k = length(par)          # number of parameters
  # Expected (negative) Hessian: weight phi^2/(Phi*(1-Phi)) on each x_i x_i'
  f = t(x) %*% (matrix(rep(phi/Phi * phi/(1-Phi), k), nrow = n) * x)
  return(f)
}
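The weight $\varphi_i^2/\bigl(\Phi_i(1-\Phi_i)\bigr)$ used in J_3 is the conditional expectation of the outer product of the score (equivalently, of minus the Hessian). Since the score contribution can be written as $s_i = \frac{(y_i-\Phi_i)\,\varphi_i}{\Phi_i(1-\Phi_i)}\,x_i$ and $\mathrm{Var}(y_i \mid x_i) = \Phi_i(1-\Phi_i)$,

$$
\mathrm{E}\!\left[s_i s_i' \mid x_i\right]
= \frac{\varphi_i^2}{\Phi_i^2(1-\Phi_i)^2}\,\mathrm{Var}(y_i \mid x_i)\,x_i x_i'
= \frac{\varphi_i^2}{\Phi_i(1-\Phi_i)}\,x_i x_i'.
$$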
Probit_LL returns the (negative) log-likelihood function of the probit model
Probit_LL <- function(y, x, par) {
  Phi = pnorm(x %*% par)   # Phi(x'par): probability that y = 1
  phi = dnorm(x %*% par)   # standard normal density at x'par (not used here)
  n = length(y)            # sample size
  k = length(par)          # number of parameters
  # Computing the log-likelihood
  f = sum(y*log(Phi)) + sum((1-y)*log(1-Phi))
  # Return the negative log-likelihood, since optim() minimizes
  f = -f
  return(f)
}
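As a quick cross-check (not part of the original script), the minimizer of Probit_LL should agree, up to numerical tolerance, with R's built-in probit fit from glm. The snippet below is a sketch that assumes y and x have already been generated as in the Monte Carlo loop further down.

# Hypothetical cross-check, assuming simulated y and x as in the loop below.
# x already contains a column of ones, so "0 +" drops glm's own intercept.
fit_optim <- optim(par = rep(0, ncol(x)), Probit_LL, y = y, x = x, method = "BFGS")
fit_glm   <- glm(as.numeric(y) ~ 0 + x, family = binomial(link = "probit"))
cbind(optim = fit_optim$par, glm = coef(fit_glm))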
Probit_LL_g returns the gradient of the (negative) log-likelihood function of the probit model
Probit_LL_g <- function(y, x, par) {
  Phi = pnorm(x %*% par)   # cumulative probability Phi(x'par)
  phi = dnorm(x %*% par)   # probability density at x'par
  n = length(y)            # sample size
  k = length(par)          # number of coefficients
  # Computing the gradient: the derivative with respect to theta of the
  # log density of y_i given x_i (see Wooldridge, equation 13.6)
  g = t(matrix(rep(phi/Phi, k), nrow = n) * x) %*% y -
      t(matrix(rep(phi/(1-Phi), k), nrow = n) * x) %*% (1-y)
  # Flip the sign to get the gradient of the negative log-likelihood
  g = -g
  return(g)
}
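Because the script loads numDeriv below precisely for its grad and hessian functions, the analytic derivatives can be cross-checked numerically. This is a sketch, assuming y, x and a parameter vector par are available; numDeriv differentiates with respect to the first argument, hence the wrapper.

library(numDeriv)
# Cross-check the analytic gradient and Hessian against numerical derivatives
# (hypothetical check, assuming y, x and par exist).
f_wrap   <- function(p) Probit_LL(y, x, p)
grad_num <- grad(f_wrap, par)
hess_num <- hessian(f_wrap, par)
all.equal(as.vector(Probit_LL_g(y, x, par)), grad_num, tolerance = 1e-6)  # should be TRUE
all.equal(Probit_LL_h(y, x, par), hess_num, tolerance = 1e-6)             # should be TRUE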
We build a loop around the data generating process and the estimation step, and compute the average of the three variance estimates based on J1, J2, and J3, which are coded in the separate functions J_1, Probit_LL_h (for J2), and J_3.
library(numDeriv) # loads the functions grad and hessian which numerically evaluate the gradient and hessian
rm(list=ls()) # Clear workspace
# Load the log-likelihood, its derivative, and the hessian
source("Probit_LL.R")
source("Probit_LL_g.R")
source("Probit_LL_h.R")
source("J_1.R")
source("J_3.R")
set.seed(2) # Set seed for random number generator
n = 300 # Set the sample size
theta0 = c(1,1,1) # True parameter value
k = length(theta0) # Number of parameters
num = 1000 # Number of Monte Carlo iterations
theta_hat_vec = matrix(0,num,k) # num x k matrix (1000x3) to store the estimated parameters
J_1_inv = matrix(0,k,k)         # k x k (3x3) accumulator for the variance estimates based on J1
J_2_inv = matrix(0,k,k)         # k x k (3x3) accumulator for the variance estimates based on J2
J_3_inv = matrix(0,k,k)         # k x k (3x3) accumulator for the variance estimates based on J3
Monte Carlo loop:
# Loop over num = 1000 Monte Carlo iterations
for (it in 1:num) {
  # Data generating process
  x = cbind(matrix(1,n,1), matrix(rnorm(2*n,0,1), ncol=2)) # regressors: a constant and two standard normals
  u = rnorm(n,0,1)                      # error terms
  y_star = x %*% theta0 + u             # latent "utility" (unobserved)
  y = ceiling(y_star/max(abs(y_star)))  # observed outcome (y=1 if y_star>0, otherwise y=0)
  dat = data.frame(x,y)                 # Create data frame
  # Find the parameters that minimize the negative log-likelihood for the simulated data
  result <- optim(par = theta0, Probit_LL, y = y, x = x, gr = Probit_LL_g,
                  method = "BFGS", control = list(reltol = 1e-9))
  # Store this iteration's estimates; after the loop we have num sets of theta_hat
  theta_hat_vec[it,1:k] = result$par
  # Accumulate the inverses of the three information matrix estimates
  J_1_inv = J_1_inv + solve(J_1(y,x,result$par))
  J_2_inv = J_2_inv + solve(Probit_LL_h(y,x,result$par))
  J_3_inv = J_3_inv + solve(J_3(y,x,result$par))
}
Variance of theta_hat over Monte Carlo iterations
var(theta_hat_vec)
## [,1] [,2] [,3]
## [1,] 0.01758578 0.01102783 0.01101053
## [2,] 0.01102783 0.02116617 0.01212997
## [3,] 0.01101053 0.01212997 0.02279791
Average of variance estimate based on J_1
J_1_inv/num
## [,1] [,2] [,3]
## [1,] 0.016575690 0.009627735 0.009638856
## [2,] 0.009627735 0.020458742 0.009568992
## [3,] 0.009638856 0.009568992 0.020447097
Average of variance estimate based on J_2
J_2_inv/num
## [,1] [,2] [,3]
## [1,] 0.015767336 0.008933802 0.008978805
## [2,] 0.008933802 0.019363556 0.009129810
## [3,] 0.008978805 0.009129810 0.019434550
Average of variance estimate based on J_3
J_3_inv/num
## [,1] [,2] [,3]
## [1,] 0.016587160 0.009633201 0.009643042
## [2,] 0.009633201 0.020470625 0.009572925
## [3,] 0.009643042 0.009572925 0.020459840
With a sample size of 300, the variance estimate based on J3 is the closest to the variance obtained from the Monte Carlo iterations.
Repeating the same exercise with a sample size of 1000 instead of 300, the estimate based on J1 turns out to be the closest.
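One way to run that comparison is to wrap the whole experiment in a function of the sample size. The helper run_mc below is a sketch; its name and structure are mine, not part of the original script, and it generates the outcome with as.numeric(. > 0), which is equivalent to the ceiling trick used above.

# Hypothetical wrapper: rerun the Monte Carlo experiment for a given sample size n
# and return the Monte Carlo variance alongside the three averaged estimates.
run_mc <- function(n, num = 1000, theta0 = c(1, 1, 1)) {
  k <- length(theta0)
  theta_hat_vec <- matrix(0, num, k)
  J_1_inv <- J_2_inv <- J_3_inv <- matrix(0, k, k)
  for (it in 1:num) {
    x <- cbind(1, matrix(rnorm(2 * n), ncol = 2))          # constant plus two regressors
    y <- as.numeric(x %*% theta0 + rnorm(n) > 0)           # observed binary outcome
    result <- optim(par = theta0, Probit_LL, y = y, x = x, gr = Probit_LL_g,
                    method = "BFGS", control = list(reltol = 1e-9))
    theta_hat_vec[it, ] <- result$par
    J_1_inv <- J_1_inv + solve(J_1(y, x, result$par))
    J_2_inv <- J_2_inv + solve(Probit_LL_h(y, x, result$par))
    J_3_inv <- J_3_inv + solve(J_3(y, x, result$par))
  }
  list(mc_var = var(theta_hat_vec),
       J1 = J_1_inv / num, J2 = J_2_inv / num, J3 = J_3_inv / num)
}
# Usage: compare the results for the two sample sizes
# run_mc(300); run_mc(1000)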