Definition: Maximum likelihood estimation (MLE) finds the parameter values that make the observed data most probable under the assumed model
Key concepts:
Likelihood function \[ L(\theta | x) = \prod_{i=1}^n f(x_i | \theta) \]
Log-likelihood function \[ l(\theta | x) = \sum_{i=1}^n \log f(x_i | \theta) \]
Maximum likelihood estimator \( \hat{\theta} = \arg\max_{\theta} \, l(\theta | x) \)
Why use Log-Likelihood?
Converts products to sums
Numerically more stable
Preserves the maximum
Easier to differentiate
Same parameter estimates as the likelihood function (a short sketch below illustrates this)
## $true_params
## [1] 2.0 1.5
##
## $mle_estimates
## [1] 2.024363 1.486704
##
## $loglik_value
## [1] -1815.564
##
## $plot
(plot not reproduced here)
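To see concretely that taking logs preserves the maximum, here is a small self-contained sketch; the sample, seed, and grid are chosen purely for illustration:
# Illustration: the likelihood and log-likelihood peak at the same value of p
# (sample, seed, and grid chosen purely for illustration)
set.seed(1)
y <- rbinom(10, 1, 0.6)                      # ten Bernoulli(0.6) draws
p_grid <- seq(0.01, 0.99, by = 0.01)         # candidate values of p
lik    <- sapply(p_grid, function(p) prod(p^y * (1 - p)^(1 - y)))            # product of densities
loglik <- sapply(p_grid, function(p) sum(y * log(p) + (1 - y) * log(1 - p))) # sum of log-densities
p_grid[which.max(lik)]      # same maximizer...
p_grid[which.max(loglik)]   # ...for both functions
Because the logarithm is strictly increasing, whichever value of \(p\) maximizes the likelihood also maximizes the log-likelihood.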
Write the log-likelihood function for a Bernoulli distribution
First, let’s generate some sample data from a Bernoulli distribution. Suppose \(y\) is a random variable that takes on only the values 1 or 0.
Also suppose that y follows a Bernoulli distribution where \(p=0.7\). In other words, \(y\) is a random variable such that it takes on the value 1 with probability \(p=0.7\) and it takes on the value 0 with probability \(1-p=0.3\).
Below, we generate a sample of \(n=100\) draws of \(y\) using this information about its true distribution:
# Problem 1: Bernoulli Distribution
# Generate sample data
set.seed(123)
n <- 100
true_p <- 0.7
y_bern <- rbinom(n, 1, true_p)
Now that we have our sample, we can use maximum likelihood estimation to estimate the parameter of the distribution of \(y\). In this example we actually know the true distribution of \(y\): it is Bernoulli with \(p=0.7\). But we will estimate \(p\) as if its true value were unknown.
We will then compare that estimate \(\hat{p}\) to the true \(p\).
To do this, first write down the log-likelihood function for the Bernoulli distribution:
# Log-likelihood function for Bernoulli
bern_loglik <- function(p, data) {
sum(data * log(p) + (1 - data) * log(1 - p))
}
Choose \(\hat{p}\) to maximize that log-likelihood function:
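The maximization step itself is not shown above. Below is a minimal sketch of one way to do it, assuming base R's optimize() and the ggplot2 package; the object names bern_mle and p1 match the summary list printed at the end of this section, though the original code may differ:
library(ggplot2)   # assumed available for the likelihood plots
# Maximize the Bernoulli log-likelihood numerically (optimize() assumed here)
bern_mle <- optimize(bern_loglik, interval = c(0.001, 0.999),
                     data = y_bern, maximum = TRUE)
bern_mle$maximum   # numerical MLE of p
mean(y_bern)       # closed-form MLE: p_hat equals the sample mean
# Plot the log-likelihood over a grid of p values
p_grid <- seq(0.01, 0.99, by = 0.01)
ll_bern <- sapply(p_grid, bern_loglik, data = y_bern)
p1 <- ggplot(data.frame(p = p_grid, loglik = ll_bern), aes(x = p, y = loglik)) +
  geom_line() +
  geom_vline(xintercept = bern_mle$maximum, linetype = "dashed") +
  labs(title = "Bernoulli Log-likelihood", x = "p", y = "Log-likelihood") +
  theme_minimal()
Since the closed-form Bernoulli MLE is \(\hat{p} = \bar{y}\), the numerical optimum should agree with mean(y_bern) up to the optimizer's tolerance.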
We will now find the MLE for the mean of a normal distribution with known variance.
For a normal distribution with known variance \(\sigma^2\), the log-likelihood is:
\[ l(\mu | x, \sigma^2) = -\frac{n}{2} \log(2\pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n ( x_i - \mu )^2 \]
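Setting the derivative of this log-likelihood with respect to \(\mu\) equal to zero gives the familiar closed-form estimator, which the numerical optimization below should reproduce:
\[ \frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu) = 0 \quad \Rightarrow \quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x} \]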
Now write this log-likelihood function in R:
# Generate sample data from a normal distribution with known standard deviation
sigma_known <- 2
x_norm <- rnorm(n, mean = 5, sd = sigma_known)
# Log-likelihood function for normal with known variance
norm_known_var_loglik <- function(mu, data, sigma) {
sum(dnorm(data, mean = mu, sd = sigma, log = TRUE))
}
Choose \(\hat{\mu}\) to maximize that log-likelihood function and plot the log-likelihood:
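As above, the optimization code is not shown; here is a minimal sketch, again assuming optimize() and ggplot2. The object names norm_mle and p2 match the summary list below, and the grid range for \(\mu\) is an illustrative choice:
# Maximize the normal log-likelihood numerically (optimize() assumed here)
norm_mle <- optimize(norm_known_var_loglik, interval = c(0, 10),
                     data = x_norm, sigma = sigma_known, maximum = TRUE)
norm_mle$maximum   # numerical MLE of mu
mean(x_norm)       # closed-form MLE: mu_hat equals the sample mean
# Plot the log-likelihood over a grid of mu values (grid range chosen for illustration)
mu_grid <- seq(3, 7, length.out = 200)
ll_norm <- sapply(mu_grid, norm_known_var_loglik, data = x_norm, sigma = sigma_known)
p2 <- ggplot(data.frame(mu = mu_grid, loglik = ll_norm), aes(x = mu, y = loglik)) +
  geom_line() +
  geom_vline(xintercept = norm_mle$maximum, linetype = "dashed") +
  labs(title = "Normal Log-likelihood (known variance)", x = "\u03bc", y = "Log-likelihood") +
  theme_minimal()
The same mu_grid is reused below when comparing log-likelihoods across sample sizes.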
# Compare the log-likelihood for two different sample sizes
n_small <- 20
n_large <- 200
# Generate data
x_small <- rnorm(n_small, mean = 5, sd = 2)
x_large <- rnorm(n_large, mean = 5, sd = 2)
# Calculate log-likelihoods
ll_small <- sapply(mu_grid, norm_known_var_loglik,
data = x_small, sigma = sigma_known)
ll_large <- sapply(mu_grid, norm_known_var_loglik,
data = x_large, sigma = sigma_known)
# Plot comparison
p4 <- ggplot() +
geom_line(data = data.frame(mu = mu_grid, loglik = ll_small),
aes(x = mu, y = loglik, color = "n = 20")) +
geom_line(data = data.frame(mu = mu_grid, loglik = ll_large),
aes(x = mu, y = loglik, color = "n = 200")) +
labs(title = "Log-likelihood Functions for Different Sample Sizes",
x = "\u03bc", y = "Log-likelihood",
color = "Sample Size") +
theme_minimal()
# Prepare outputs
list(
bernoulli_mle = bern_mle$maximum,
normal_mle = norm_mle$maximum,
plot_bernoulli = p1,
plot_normal = p2,
plot_comparison = p4
)
## $bernoulli_mle
## [1] 0.7100019
##
## $normal_mle
## [1] 4.892508
##
## $plot_bernoulli
##
## $plot_normal
##
## $plot_comparison
(plots not reproduced here)
Some Observations:
Larger sample size (n = 200) produces a more sharply peaked log-likelihood curve
Smaller sample size (n = 20) produces a flatter log-likelihood curve
Both curves are centered near the true parameter value \(\mu = 5\)
This increasing concentration of the likelihood around the truth as n grows underlies the consistency of the MLE (a quick numerical check follows below)
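One quick way to quantify the difference in peakedness, reusing the objects defined above (the 0.5 offset is an arbitrary illustrative choice):
# How far the log-likelihood drops when mu moves 0.5 away from its MLE (the sample mean)
off <- 0.5
drop_small <- norm_known_var_loglik(mean(x_small), x_small, sigma_known) -
  norm_known_var_loglik(mean(x_small) + off, x_small, sigma_known)
drop_large <- norm_known_var_loglik(mean(x_large), x_large, sigma_known) -
  norm_known_var_loglik(mean(x_large) + off, x_large, sigma_known)
c(n_20 = drop_small, n_200 = drop_large)   # ten times larger for n = 200
With known \(\sigma\), the drop equals \(n \, \delta^2 / (2\sigma^2)\) for an offset \(\delta\), so it scales linearly with the sample size: the n = 200 curve penalizes the same departure from \(\hat{\mu}\) exactly ten times more heavily.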
These solutions demonstrate key properties of maximum likelihood estimation:
Consistency: Estimates converge to true parameters
Efficiency: larger samples produce more sharply peaked likelihoods and hence more precise estimates
Invariance to monotone transformation: the log-likelihood attains its maximum at the same parameter value as the likelihood, because the logarithm is strictly increasing
Computational feasibility: Numerical optimization methods work well in R
The visualizations help understand the shape of the likelihood surface and how it changes with sample size, which is crucial for understanding the behavior of maximum likelihood estimators.