Definition: Maximum likelihood estimation (MLE) finds the parameter values that make the observed data most probable under the assumed model
Key concepts:
Likelihood function \[ L(\theta | x) = \prod_{i=1}^n f(x_i | \theta) \]
Log-likelihood function \[ l(\theta | x) = \sum_{i=1}^n \log f(x_i | \theta) \]
Maximum likelihood estimator \( \hat{\theta} = \arg\max_{\theta} \, l(\theta | x) \)
Why use Log-Likelihood?
Converts products to sums
Numerically more stable
Preserves the maximum
Easier to differentiate
Same parameter estimates as the likelihood function (a short sketch below illustrates this)
## $true_params
## [1] 2.0 1.5
##
## $mle_estimates
## [1] 2.024363 1.486704
##
## $loglik_value
## [1] -1815.564
##
## $plot
(plot not reproduced here)
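To see concretely that taking logs preserves the maximum, here is a small self-contained sketch; the sample, seed, and grid are chosen purely for illustration:
# Illustration: the likelihood and log-likelihood peak at the same value of p
# (sample, seed, and grid chosen purely for illustration)
set.seed(1)
y <- rbinom(10, 1, 0.6)                      # ten Bernoulli(0.6) draws
p_grid <- seq(0.01, 0.99, by = 0.01)         # candidate values of p
lik    <- sapply(p_grid, function(p) prod(p^y * (1 - p)^(1 - y)))            # product of densities
loglik <- sapply(p_grid, function(p) sum(y * log(p) + (1 - y) * log(1 - p))) # sum of log-densities
p_grid[which.max(lik)]      # same maximizer...
p_grid[which.max(loglik)]   # ...for both functions
Because the logarithm is strictly increasing, whichever value of \(p\) maximizes the likelihood also maximizes the log-likelihood.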
Write the log-likelihood function for a Bernoulli distribution
First, let’s generate some sample data from a Bernoulli distribution. Suppose \(y\) is a random variable that takes on only the values 1 or 0.
Also suppose that y follows a Bernoulli distribution where \(p=0.7\). In other words, \(y\) is a random variable such that it takes on the value 1 with probability \(p=0.7\) and it takes on the value 0 with probability \(1-p=0.3\).
Below, we generate a sample of \(n=100\) draws of \(y\) using this information about its true distribution:
# Problem 1: Bernoulli Distribution
# Generate sample data
set.seed(123)
n <- 100
true_p <- 0.7
y_bern <- rbinom(n, 1, true_p)
Now that we have our sample, we can use maximum likelihood estimation to estimate the parameter of the distribution of \(y\). In this example we actually know the true distribution of \(y\): it is Bernoulli with \(p=0.7\). But we will estimate \(p\) as if its true value were unknown.
We will then compare that estimate \(\hat{p}\) to the true \(p\).
To do this, first write down the log-likelihood function for the Bernoulli distribution:
# Log-likelihood function for Bernoulli
bern_loglik <- function(p, data) {
sum(data * log(p) + (1 - data) * log(1 - p))
}
Choose \(\hat{p}\) to maximize that log-likelihood function:
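The maximization step itself is not shown above. Below is a minimal sketch of one way to do it, assuming base R's optimize() and the ggplot2 package; the object names bern_mle and p1 match the summary list printed at the end of this section, though the original code may differ:
library(ggplot2)   # assumed available for the likelihood plots
# Maximize the Bernoulli log-likelihood numerically (optimize() assumed here)
bern_mle <- optimize(bern_loglik, interval = c(0.001, 0.999),
                     data = y_bern, maximum = TRUE)
bern_mle$maximum   # numerical MLE of p
mean(y_bern)       # closed-form MLE: p_hat equals the sample mean
# Plot the log-likelihood over a grid of p values
p_grid <- seq(0.01, 0.99, by = 0.01)
ll_bern <- sapply(p_grid, bern_loglik, data = y_bern)
p1 <- ggplot(data.frame(p = p_grid, loglik = ll_bern), aes(x = p, y = loglik)) +
  geom_line() +
  geom_vline(xintercept = bern_mle$maximum, linetype = "dashed") +
  labs(title = "Bernoulli Log-likelihood", x = "p", y = "Log-likelihood") +
  theme_minimal()
Since the closed-form Bernoulli MLE is \(\hat{p} = \bar{y}\), the numerical optimum should agree with mean(y_bern) up to the optimizer's tolerance.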
We will now find the MLE for the mean of a normal distribution with known variance.
For a normal distribution with known variance \(\sigma^2\), the log-likelihood is:
\[ l(\mu | x, \sigma^2) = -\frac{n}{2} \log(2\pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n ( x_i - \mu )^2 \]
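Setting the derivative of this log-likelihood with respect to \(\mu\) equal to zero gives the familiar closed-form estimator, which the numerical optimization below should reproduce:
\[ \frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu) = 0 \quad \Rightarrow \quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x} \]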
Now write this log-likelihood function in R:
# Generate sample data from a normal distribution with known standard deviation
sigma_known <- 2
x_norm <- rnorm(n, mean = 5, sd = sigma_known)
# Log-likelihood function for normal with known variance
norm_known_var_loglik <- function(mu, data, sigma) {
sum(dnorm(data, mean = mu, sd = sigma, log = TRUE))
}
Choose \(\hat{\mu}\) to maximize that log-likelihood function and plot the log-likelihood:
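As above, the optimization code is not shown; here is a minimal sketch, again assuming optimize() and ggplot2. The object names norm_mle and p2 match the summary list below, and the grid range for \(\mu\) is an illustrative choice:
# Maximize the normal log-likelihood numerically (optimize() assumed here)
norm_mle <- optimize(norm_known_var_loglik, interval = c(0, 10),
                     data = x_norm, sigma = sigma_known, maximum = TRUE)
norm_mle$maximum   # numerical MLE of mu
mean(x_norm)       # closed-form MLE: mu_hat equals the sample mean
# Plot the log-likelihood over a grid of mu values (grid range chosen for illustration)
mu_grid <- seq(3, 7, length.out = 200)
ll_norm <- sapply(mu_grid, norm_known_var_loglik, data = x_norm, sigma = sigma_known)
p2 <- ggplot(data.frame(mu = mu_grid, loglik = ll_norm), aes(x = mu, y = loglik)) +
  geom_line() +
  geom_vline(xintercept = norm_mle$maximum, linetype = "dashed") +
  labs(title = "Normal Log-likelihood (known variance)", x = "\u03bc", y = "Log-likelihood") +
  theme_minimal()
The same mu_grid is reused below when comparing log-likelihoods across sample sizes.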
# Compare the log-likelihood for two different sample sizes
n_small <- 20
n_large <- 200
# Generate data
x_small <- rnorm(n_small, mean = 5, sd = 2)
x_large <- rnorm(n_large, mean = 5, sd = 2)
# Calculate log-likelihoods
ll_small <- sapply(mu_grid, norm_known_var_loglik,
data = x_small, sigma = sigma_known)
ll_large <- sapply(mu_grid, norm_known_var_loglik,
data = x_large, sigma = sigma_known)
# Plot comparison
p4 <- ggplot() +
geom_line(data = data.frame(mu = mu_grid, loglik = ll_small),
aes(x = mu, y = loglik, color = "n = 20")) +
geom_line(data = data.frame(mu = mu_grid, loglik = ll_large),
aes(x = mu, y = loglik, color = "n = 200")) +
labs(title = "Log-likelihood Functions for Different Sample Sizes",
x = "\u03bc", y = "Log-likelihood",
color = "Sample Size") +
theme_minimal()
# Prepare outputs
list(
bernoulli_mle = bern_mle$maximum,
normal_mle = norm_mle$maximum,
plot_bernoulli = p1,
plot_normal = p2,
plot_comparison = p4
)
## $bernoulli_mle
## [1] 0.7100019
##
## $normal_mle
## [1] 4.892508
##
## $plot_bernoulli
##
## $plot_normal
##
## $plot_comparison
(plots not reproduced here)
Some Observations:
Larger sample size (n = 200) produces a more sharply peaked log-likelihood curve
Smaller sample size (n = 20) produces a flatter log-likelihood curve
Both curves are centered near the true parameter value \(\mu = 5\)
This increasing concentration of the likelihood around the truth as n grows underlies the consistency of the MLE (a quick numerical check follows below)
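One quick way to quantify the difference in peakedness, reusing the objects defined above (the 0.5 offset is an arbitrary illustrative choice):
# How far the log-likelihood drops when mu moves 0.5 away from its MLE (the sample mean)
off <- 0.5
drop_small <- norm_known_var_loglik(mean(x_small), x_small, sigma_known) -
  norm_known_var_loglik(mean(x_small) + off, x_small, sigma_known)
drop_large <- norm_known_var_loglik(mean(x_large), x_large, sigma_known) -
  norm_known_var_loglik(mean(x_large) + off, x_large, sigma_known)
c(n_20 = drop_small, n_200 = drop_large)   # ten times larger for n = 200
With known \(\sigma\), the drop equals \(n \, \delta^2 / (2\sigma^2)\) for an offset \(\delta\), so it scales linearly with the sample size: the n = 200 curve penalizes the same departure from \(\hat{\mu}\) exactly ten times more heavily.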
These solutions demonstrate key properties of maximum likelihood estimation:
Consistency: Estimates converge to true parameters
Efficiency: larger samples produce more sharply peaked likelihoods and hence more precise estimates
Invariance to monotone transformation: the log-likelihood attains its maximum at the same parameter value as the likelihood, because the logarithm is strictly increasing
Computational feasibility: Numerical optimization methods work well in R
The visualizations help understand the shape of the likelihood surface and how it changes with sample size, which is crucial for understanding the behavior of maximum likelihood estimators.