Code Example

The Likelihood Function of a Normal Distribution

Below is the log-likelihood function of a normal distribution coded in R. It drops the constant term and is parameterized by the mean and the variance (see the theory section below).

normalF <- function(parvec) {
  # Log-likelihood of a normal distribution (constant term dropped)
  # parvec[1] - mean
  # parvec[2] - variance
  # x - set of observations. Should be initialized before MLE
  sum ( -0.5* log(parvec[2]) - 0.5*(x - parvec[1])^2/parvec[2] )
}

x = c(1,2,3,4) # set of observations
normalF(c(1,1)) # log-likelihood value for the given x with mean 1 and variance 1
## [1] -7
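
As a quick sanity check, moving the mean parameter towards the centre of the data should increase the log-likelihood, and it does:

normalF(c(mean(x),1)) # mean(x) = 2.5, variance still 1
## [1] -2.5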

Given a set of observations we can find the parameters \(\mu\) and \(\sigma\) that maximize the value of the likelihood function.
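
Before handing this to an optimizer, a crude brute-force sketch makes the idea concrete: evaluate normalF over a grid of candidate values for the mean and the variance and keep the best pair (the grid ranges below are arbitrary choices, and x is still the small vector defined above):

grid <- expand.grid(mu = seq(0, 5, by = 0.1), v = seq(0.25, 5, by = 0.25))
loglik <- apply(grid, 1, normalF)
grid[which.max(loglik), ] # roughly mu = 2.5 (the sample mean) and v = 1.25

In practice we let optim do the searching for us.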

set.seed(1729)
x = rnorm(100,2,4) # a hundred draws from a normal with mean 2 and sd 4
MLE = optim(c(0.1,0.1), # initial values for the mean and the variance
        fn = normalF, # function to maximize
        method = "L-BFGS-B", # this method lets us set lower bounds on the parameters
        lower = 0.00001, # lower limit for the parameters
        control = list(fnscale = -1), # maximize rather than minimize the function
        hessian = T # return the Hessian matrix, which we will need for confidence intervals
        )
MLE
## $par
## [1]  1.943386 12.206119
## 
## $value
## [1] -175.0967
## 
## $counts
## function gradient 
##       21       21 
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
## 
## $hessian
##               [,1]          [,2]
## [1,] -8.192612e+00  8.419931e-07
## [2,]  8.419931e-07 -3.355920e-01
# This can also be done using the bbmle package
library(bbmle)
m <- mle2(x~dnorm(mean=mu,sd=sd),start=list(mu=0.1,sd=0.1),data=data.frame(x))
m
## 
## Call:
## mle2(minuslogl = x ~ dnorm(mean = mu, sd = sd), start = list(mu = 0.1, 
##     sd = 0.1), data = data.frame(x))
## 
## Coefficients:
##       mu       sd 
## 1.943385 3.493719 
## 
## Log-likelihood: -266.99

At first glance it looks like the bbmle package found a different answer than optim, but the two fits actually agree. The second parameter returned by optim is the variance, not the standard deviation, and \(\sqrt{12.206} \approx 3.494\) is exactly the sd reported by bbmle. The log-likelihoods differ only because normalF drops the constant term \(-\frac{n}{2}\ln(2\pi) \approx -91.89\); adding it to \(-175.10\) gives \(-266.99\), the value reported by bbmle.
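
Both checks are quick to do in R (a small verification sketch using the objects defined above):

sqrt(MLE$par[2]) # about 3.4937, the sd reported by bbmle
MLE$value - (length(x)/2)*log(2*pi) # about -266.99, the log-likelihood reported by bbmle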

Confidence Intervals for the Parameters

Now that we have the parameter estimates that maximize the log-likelihood function, together with the Hessian matrix, we can calculate confidence intervals for the parameters. For the bbmle output we can just use the confint function.

confint(m)
##       2.5 %   97.5 %
## mu 1.251950 2.634819
## sd 3.060392 4.040214

To do it manually for the optim fit, we use the Hessian matrix returned above.

# find confidence intervals using the Hessian matrix from the MLE routine
# https://stats.stackexchange.com/questions/27033/in-r-given-an-output-from-optim-with-a-hessian-matrix-how-to-calculate-paramet
hess.inv <- solve(-MLE$hessian) # inverse of the negative Hessian = estimated covariance matrix
prop_sigma <- sqrt(diag(hess.inv)) # standard errors of the two parameters
upper <- MLE$par + 1.96*prop_sigma
lower <- MLE$par - 1.96*prop_sigma
interval <- data.frame(estimated.value=MLE$par, upper=upper, lower=lower, row.names = c("mu","var"))
interval # the interval for mu closely matches bbmle; the second row is for the variance
##     estimated.value     upper    lower
## mu         1.943386  2.628156 1.258615
## var       12.206119 15.589495 8.822743
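
Note that the second row above is an interval for the variance, not the standard deviation. Squaring the bbmle interval for sd puts the two outputs on the same scale (only a rough comparison, since profile and Wald intervals need not coincide):

confint(m)["sd", ]^2 # roughly (9.4, 16.3) on the variance scale, comparable to the row above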

A Bit of Theory Behind MLE of a Normal Distribution

Given a set of points, we would like to find the parameters of a distribution (the mean \(\mu\) and standard deviation \(\sigma\) for a normal distribution) that maximize the likelihood of observing those points under that distribution.

We start with the probability density function (PDF) of the normal distribution.

\[ P(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/(2\sigma^2)} \]
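
This is the same density that R computes with dnorm; a quick check at the arbitrary point 1.5 with \(\mu = 2\) and \(\sigma = 4\) (example values only):

1/(sqrt(2*pi)*4)*exp(-(1.5 - 2)^2/(2*4^2))
dnorm(1.5, mean = 2, sd = 4) # same value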

We have the data to which we would like to fit the PDF, so the vector \(x\) is known. We need to find \(\mu\) and \(\sigma\). Since \(x\) is fixed, we vary the parameters \(\mu\) and \(\sigma\) to maximize the value of the likelihood function \(L(\mu,\sigma)\), the joint density of the \(n\) observations viewed as a function of the parameters.

\[ L(\mu,\sigma)= \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}e^{-(x_i-\mu)^2/(2\sigma^2)} \]

Next we take the log of the likelihood function, because it is easier to work with sums than with products.

\[ \ln[L(\mu,\sigma)] = \sum_{i=1}^{n}\ln\left[\frac{1}{\sqrt{2\pi}\sigma}e^{-(x_i-\mu)^2/(2\sigma^2)}\right] = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i-\mu)^2 \]

The very first term does not depend on \(\mu\) or \(\sigma\), so we can drop it without changing where the maximum occurs. We are left with

\[ -\frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i-\mu)^2 = \sum_{i=1}^{n}\left[ -\frac{1}{2}\ln(\sigma^2) - \frac{(x_i-\mu)^2}{2\sigma^2} \right] \]

The sum on the right-hand side is exactly what the normalF function computes, with parvec[1] playing the role of \(\mu\) and parvec[2] the role of \(\sigma^2\), and it is what we maximized with optim in the section above.
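
As a final check tying the code to the formula, normalF agrees with the full log-likelihood computed via dnorm once the dropped constant \(-\frac{n}{2}\ln(2\pi)\) is added back (a small verification sketch):

normalF(MLE$par)
sum(dnorm(x, mean = MLE$par[1], sd = sqrt(MLE$par[2]), log = TRUE)) + (length(x)/2)*log(2*pi) # same value as above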

References