In this section, I will start to explore the MLE, with an emphasis on understanding its statistical propertities. At the same time, I wish to dig a bit on how R estimate based on MLE, which should be helpful to R users. To begin simply, I use the MLE to estimate parameters of a random variable which follows normal distribution.
n <- 1000
x <- rnorm(n,2,3) # with mean = 2, sd = 3. (true parameter)
The essential part of MLE is to specify the likelihood function. In R, you can easily use “dnorm” to obtain the density, and specify “log = TRUE”. Then, the objective to minimize the negative sum of log likelihood function, which is equivalent to maximize the positive sum.
LL <- function(beta, sigma){
R = dnorm(x, beta, sigma, log = TRUE)
-sum(R)
}
Following that, the parameters to be estimated can be passed to the “mle2” function, available in package “bbmle”, which uses the optimization technique to find the solution.
library(bbmle)
## Warning: package 'bbmle' was built under R version 3.3.1
## Loading required package: stats4
fit_norm <- mle2(LL, start = list(beta = 0, sigma = 1), lower = c(-Inf, 0), upper = c(Inf, Inf), method = 'L-BFGS-B')
## Warning in fix_order(call$lower, "lower bounds", -Inf): lower bounds not
## named: rearranging to match 'start'
summary(fit_norm)
## Maximum likelihood estimation
##
## Call:
## mle2(minuslogl = LL, start = list(beta = 0, sigma = 1), method = "L-BFGS-B",
## lower = c(-Inf, 0), upper = c(Inf, Inf))
##
## Coefficients:
## Estimate Std. Error z value Pr(z)
## beta 1.938032 0.088086 22.002 < 2.2e-16 ***
## sigma 2.785509 0.062286 44.721 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## -2 log L: 4886.738
The whole estimation is intuitive and straightforward. In Greene’s “Econometric Analysis”, three alternatives to calculate the variance or covariance are proposed. Which one is used in the R package? The answer is that it actually use the empirical hessian matrix.
diag(vcov(fit_norm)) # default way to calculate covariance in the package.
## beta sigma
## 0.007759060 0.003879531
diag(solve(fit_norm@details$hessian)) # check the covariance using hessian package.
## [1] 0.007759060 0.003879531
The reason is that the optimization technique makes use of the hessian information. It would be convenient to calculate the covariance using it. However, which one could be the most efficient one? To answer the question, I calculate the information matrix and BHHH estimator, respectively. (see Greene’s “Econometric Analysis” 6 edition, page 493 and 495, for details)
cc <- coef(fit_norm)
sigma_hat <- cc['sigma']
info_cov <- c((sigma_hat^2)/n, (2*sigma_hat^4)/n) # information matrix
BHHH <- diag(2)
diag(BHHH) <- c(-n/sigma_hat^2, n/(2*sigma_hat^4) - sum((x - cc['beta'])^2)/sigma_hat^(6)) # BHHH
info_cov
## sigma sigma
## 0.00775906 0.12040604
diag(-solve(BHHH))
## [1] 0.00775906 0.12040608
In this case, we see that the empirical hessian estimator is more efficient in estimating sigma. While the information matrix estimator and BHHH estimator are almost identical. The intuitive reason for the first part of the arguement above is that the hessian matrix has the information about second-order derivatives embedded, which naturally bring more accuracy in estimation. For the second half, the information matrix estimator is a analogue of the BHHH estimator, which should be very close to each other, asymptotically, based on law of large numbers.