In this lab we will explore maximum likelihood estimation (MLE), which is a flexible and powerful tool for estimating the unknown parameters of a distribution.
The primary goals of this lab are to:
Recall that we typically use the Poisson\((\lambda)\) distribution to model the number of events that take place in a given area or in a given time period. For example, we might use it to model the number of cell phone towers in each county, or we might use it to model the number of earthquakes per year in a given location.
As we have seen, the MLE for the rate parameter \(\lambda\) is given by
\[ \hat{\lambda} = \frac{\sum\nolimits_{i=1}^{n}x_i}{n}, \] where \(x_1,\dots,x_n\) is our observed data.
So far, we have stated the formula for the MLE, but let’s see why it makes sense by looking at the log-likelihood. If \(x_1,\dots,x_n\) are observations from a Poisson\((\lambda\)) distribution, then the log-likelihood is given by \[ \ell(\lambda) = \log(\lambda) \left( \sum\limits_{i=1}^{n} x_i \right) - n \lambda + C, \] where \(C\) is a constant that does not depend on \(\lambda\). The log-likelihood measures how plausible different values of \(\lambda\) are given the observed data. In other words, it tells us for which value of \(\lambda\) the observed data would have been most likely.
Below, we’ll plot the log-likelihood function for several small example data sets to see how it changes depending on the observed counts.
p1, contains counts with small
values.p2, contains moderate counts.p3, contains large counts.# Example data sets (6 points each, covering small, medium, and large counts)
p1 <- c(0, 0, 1, 1, 0, 2) # small counts → small lambda
p2 <- c(3, 4, 5, 4, 6, 5) # moderate counts → moderate lambda
p3 <- c(8, 9, 10, 7, 11, 12) # large counts → large lambda
# Log-likelihood function for Poisson(lambda)
loglik_pois <- function(lambda, x) {
sum(x) * log(lambda) - length(x) * lambda
}
# Range of lambda values to evaluate
lambda_vals <- seq(0.01, 15, length.out = 400)
# Compute log-likelihoods
ll1 <- sapply(lambda_vals, loglik_pois, x = p1)
ll2 <- sapply(lambda_vals, loglik_pois, x = p2)
ll3 <- sapply(lambda_vals, loglik_pois, x = p3)
# Plot all three curves
plot(lambda_vals, ll1, type = "l", lwd = 2, col = "steelblue",
ylim = range(c(ll1, ll2, ll3)),
xlab = expression(lambda),
ylab = "Log-Likelihood",
main = "Log-Likelihood Functions for Different Data Sets")
lines(lambda_vals, ll2, col = "darkorange", lwd = 2, lty = 2)
lines(lambda_vals, ll3, col = "seagreen", lwd = 2, lty = 3)
legend("topright",
legend = c("p1 = small counts", "p2 = moderate counts", "p3 = large counts"),
col = c("steelblue", "darkorange", "seagreen"),
lwd = 2, lty = 1:3, bty = "n")Each curve shows how the log-likelihood depends on the observed data:
The MLE for each data set is simply the value of \(\lambda\) (on the x-axis) where the corresponding log-likelihood function is maximized.
The paper at https://www.actuaries.org.uk/system/files/documents/pdf/0481.pdf describes a study about the bombing of London during the second world war. Each data point describes the number of bombs dropped into a particular 1/4 square kilometre region of London.
The data set is manually created in the code chunk below.
We will assume that the number of bombs dropped in each region
follows a Poisson\((\lambda)\)
distribution, where \(\lambda\) is the
unknown parameter describing the average number of bombs dropped
in each area. Recall that the MLE for \(\lambda\) is simply the sample average,
which we calculate and save as lambda_hat below.
## [1] 0.9288194
By estimating \(\lambda\), we have effectively fitted a model for the number of bombs per region. Our estimate \(\hat{\lambda} \approx 0.93\) means we are modeling the counts as coming from a Poisson(0.93) distribution. That is, based on the data, if \(X\) denotes the number of bombs dropped in a particular area, then we would estimate the pmf of \(X\) as \[ \mathbb{P}(X = q) = e^{-0.93} \frac{(0.93)^q}{q!} , \quad q = 0,1,2,\dots. \]
We can visualize how well this fitted model aligns with the observed data by overlaying the fitted model pmf on top of the histogram of observed proportions.
# Histogram of observed counts
hist(london, breaks = seq(-0.5, max(london)+0.5, by = 1), freq = FALSE,
main = "Histogram of Bomb Counts with Fitted Poisson(0.93)",
xlab = "Bomb Count", border = "gray50", col = "lightgray")
# Overlay the fitted Poisson PMF
x_vals <- 0:max(london)
y_vals <- dpois(x_vals, lambda_hat)
points(x_vals, y_vals, col = "blue", pch = 19, type = "b")
legend("topright", legend = "fitted",
col = "blue", pch = 19, lty = 1, bty = "n")Exercise: Does the model seem to fit the data well? Briefly explain. Yes, the model fits the data well. The fitted Poisson distribution with λ≈0.93. λ≈0.93 closely matches the observed proportions, especially for the most common counts (0 and 1). The probabilities decrease in a similar way for higher counts, indicating that the Poisson model is a good approximation for this data.
Next, we will compare the predicted probabilities from our fitted model with the empirical proportions from the data.
# Counts to evaluate
q_vals <- 0:6
# Model-based probabilities
model_probs <- dpois(q_vals, lambda = lambda_hat)
# Empirical proportions
empirical_props <- sapply(q_vals, function(q) mean(london == q))
# Combine into a data frame
comparison <- data.frame(
q = q_vals,
model_prob = model_probs,
empirical_prop = empirical_props,
difference = empirical_props - model_probs
)
comparison## q model_prob empirical_prop difference
## 1 0 0.3950197780 0.397569444 2.549666e-03
## 2 1 0.3669020507 0.366319444 -5.826063e-04
## 3 2 0.1703928795 0.161458333 -8.934546e-03
## 4 3 0.0527547399 0.060763889 8.009149e-03
## 5 4 0.0122499070 0.012152778 -9.712927e-05
## 6 5 0.0022755904 0.001736111 -5.394793e-04
## 7 6 0.0003522688 0.000000000 -3.522688e-04
Each row of this table shows:
q: number of bombs in a region,model_prob: predicted probability under our fitted
Poisson model.empirical_prop: observed proportion from the data,difference: the deviation between the two.Now, visualize the comparison.
plot(comparison$q, comparison$model_prob, type = "b", pch = 19, ylim = c(0, 0.4),
xlab = "Bomb Count (q)", ylab = "Probability",
main = "Empirical vs. Model-Based Probabilities")
lines(comparison$q, comparison$empirical_prop, type = "b", pch = 19, lty = 2)
legend("topright", legend = c("Model (Poisson)", "Empirical Data"),
lty = c(1, 2), pch = 19, bty = "n")Exercise: Do the fitted probabilities appear to be close to the proportions observed in the data? Briefly explain. Yes, the fitted probabilities are close to the observed proportions. The model aligns especially well for the most common counts (0 and 1), and while there are small differences for higher counts, the overall pattern is very similar. This suggests the Poisson model provides a good fit to the data. —
Recall that the exponential distribution is often used to model the time between events. For example, we might use it to model the time between customer arrivals at a store or the time between eruptions of a geyser. For an exponential random variable \(X \sim \text{Exponential}(\lambda)\), the pdf is \[ f(x) = \lambda e^{-\lambda x} , \; x \ge 0, \] where \(\lambda > 0\) is the rate parameter.
Recall that the MLE for \(\lambda\) is given by \[ \hat{\lambda} = \frac{n}{\sum\limits_{i=1}^{n} x_i}. \]
So far, we have estimated the rate parameter \(\lambda\) using data and the MLE formula. But what does that formula really tell us? To understand this, let’s look at the log-likelihood function for the Exponential\((\lambda)\) model.
If we observe data \(x_1, x_2, \dots, x_n\) (and we model that data as coming from some exponential distribution with an unknown rate parameter \(\lambda\)), then the log-likelihood of the observed data is given by
\[ \ell(\lambda) = \log L(\lambda) = n \log(\lambda) - \lambda \sum_{i=1}^{n} x_i. \]
The log-likelihood measures how plausible different values of \(\lambda\) are given the observed
data.
In other words, it tells us for which value of \(\lambda\) the observed data would have been
most likely.
Below, we’ll plot the log-likelihood function for several small example data sets to see how it changes depending on the observed values.
d1, contains the
values 0.3, 0.30, 0.25, 0.18, 0.31, 0.25.d2, contains the
values 2.5, 3.0, 3.2, 2.8, 3.5, 2.9.d3, contains the
values 5, 7, 8, 3.9, 8, 6.1.Since there are three different data sets, we will plot three different log-likelihood functions (each one corresponding to a different data set).
# Example data sets (6 points each, spanning different time scales)
d1 <- c(0.3, 0.30, 0.25, 0.18, 0.31, 0.25) # very short times → large lambda
d2 <- c(2.5, 3.0, 3.2, 2.8, 3.5, 2.9) # moderate times → moderate lambda
d3 <- c(5, 7, 8, 3.9, 8, 6.1) # long times → small lambda
# Log-likelihood for Exponential(lambda)
loglik_exp <- function(lambda, x) {
n <- length(x)
n * log(lambda) - lambda * sum(x)
}
# Range of lambda values to evaluate
lambda_vals <- seq(0.01, 5, length.out = 400)
# Compute log-likelihood curves
ll1 <- sapply(lambda_vals, loglik_exp, x = d1)
ll2 <- sapply(lambda_vals, loglik_exp, x = d2)
ll3 <- sapply(lambda_vals, loglik_exp, x = d3)
# Plot all three curves
plot(lambda_vals, ll1, type = "l", lwd = 2, col = "steelblue",
ylim = range(c(ll1, ll2, ll3)),
xlab = expression(lambda),
ylab = "Log-Likelihood",
main = "Log-Likelihood Functions for Different Data Sets")
lines(lambda_vals, ll2, col = "darkorange", lwd = 2, lty = 2)
lines(lambda_vals, ll3, col = "seagreen", lwd = 2, lty = 3)
legend("topright",
legend = c("data set 1 = short times", "data set 2 = moderate times", "data set 3 = long times"),
col = c("steelblue", "darkorange", "seagreen"),
lwd = 2, lty = 1:3, bty = "n")The MLE for each data set is simply the value of \(\lambda\) (on the x-axis) where the corresponding log-likelihood function is maximized.
The coal data set (in the boot package, see
https://r-packages.io/datasets/coal ) records the dates
of coal mining disasters in Great Britain between the mid-1800s and
1962. Because the Exponential distribution is commonly used for times
between events, we’ll convert the disaster dates into interarrival times
(“gaps”) and then fit an exponential distribution to the data.
Let’s start by loading the data and inspecting the first few entries:
##
## Attaching package: 'boot'
## The following object is masked from 'package:openintro':
##
## salinity
## date
## 1 1851.203
## 2 1851.632
## 3 1851.969
## 4 1851.975
## 5 1852.314
## 6 1852.347
Since the exponential distribution is used for time
between events, we create a new vector
gaps that measures the time between successive disasters.
We’ll use diff() on the (numeric) date variable.
## [1] 0.429842574 0.336755647 0.005475702 0.339493498 0.032854209 0.010951403
Now, let’s make a histogram of the interarrival times. A histogram will help us judge whether an exponential distribution looks like a plausible model.
At first glance, an exponential distribution seems like a plausible model (a large mass near zero with a decreasing tail), but we’ll estimate \(\lambda\) (using maximum likelihood estimation) and then compare model-based probabilities to empirical proportions.
Now we are ready to compute the MLE \(\hat{\lambda}\) for the Exponential(\(\lambda\)) model of the gaps
data. Recall that if \(x_1,\dots,x_n\)
are independent Exponential(\(\lambda\)) observations, then the MLE for
\(\lambda\) is given by \[
\hat{\lambda} = \frac{1}{\bar{x}} = \frac{n}{\sum\nolimits_{i=1}^{n}
x_i}.
\]
By estimating \(\lambda\) from the
data, we have effectively fitted a model for the time
between coal mining disasters.
Our estimate \(\hat{\lambda} \approx
1.711\) means we are modeling the interarrival times as following
an Exponential(1.711) distribution.
In other words, we “believe,” based on the observed data, that the gaps between disasters can be described by a random variable
\[ X \sim \text{Exponential}(1.711), \] whose probability density function (pdf) is
\[ f(x) = 1.711\, e^{-1.711 x}, \qquad x \ge 0. \]
We can visualize how well this fitted exponential curve aligns with the observed data by overlaying the model pdf on top of the histogram of interarrival times.
# Plot histogram of the observed gaps
hist(gaps, breaks = 30, probability = TRUE,
main = "Histogram of Interarrival Times with Fitted Exponential(1.711)",
xlab = "Gap (years)")
# Overlay the fitted exponential density curve
x_vals <- seq(0, max(gaps), length.out = 200)
y_vals <- dexp(x_vals, rate = mle_lambda)
lines(x_vals, y_vals, col = "blue")
# Annotate with model text
legend("topright", legend = "fitted pdf",
col = "blue", lwd = 1)Exercise: Does the model seem to fit the data well? Briefly explain. Yes, the model seems to fit the data reasonably well. The exponential curve captures the overall shape of the histogram, with a high density near zero and a decreasing tail. While there are some small differences between the curve and the observed data, the exponential distribution appears to be a good approximation for the interarrival times.
Now, let’s look at the using the MLE to estimate some probabilities. Recall that if \(X \sim \text{Exponential}(\lambda)\) then \[ \mathbb{P} ( X \le q) = 1 - e^{-\lambda q}. \]
Of course, we don’t know the true value of \(\lambda\). Instead, we plug in our estimated value \(\hat{\lambda}\) from the data, and use it to approximate probabilities under our fitted model:
\[ \widehat{\mathbb{P}}(X \le q) = 1 - e^{-\hat{\lambda} q}. \]
The pexp() function in R computes these probabilities
directly, pexp(q, rate = mle_lambda, lower.tail = TRUE).
Now, we can check how well our model describes the data by
comparing:
If our exponential model fits the data well, these two quantities should be reasonably close for a range of thresholds.
Let’s compute and visualize this comparison.
# Thresholds (in years) to evaluate
qs <- 1:10
# Model-based probabilities using the fitted lambda_hat
model_probs <- pexp(qs, rate = mle_lambda, lower.tail = TRUE)
# Empirical proportions from the data
empirical_props <- sapply(qs, function(q) mean(gaps <= q))
# Combine results into a single data frame
comparison <- data.frame(
q = qs,
model_prob = model_probs,
empirical_prop = empirical_props,
difference = empirical_props - model_probs
)
comparison## q model_prob empirical_prop difference
## 1 1 0.8193959 0.8631579 4.376201e-02
## 2 2 0.9673822 0.9421053 -2.527689e-02
## 3 3 0.9941091 0.9684211 -2.568803e-02
## 4 4 0.9989361 0.9842105 -1.472555e-02
## 5 5 0.9998079 0.9947368 -5.071009e-03
## 6 6 0.9999653 0.9947368 -5.228455e-03
## 7 7 0.9999937 1.0000000 6.267487e-06
## 8 8 0.9999989 1.0000000 1.131934e-06
## 9 9 0.9999998 1.0000000 2.044319e-07
## 10 10 1.0000000 1.0000000 3.692124e-08
Each row of this table shows:
q: the chosen threshold (in years)model_prob: the estimated probability under our
exponential modelempirical_prop: the observed proportion of gaps that
are less than each value of qdifference: how far the empirical proportions deviate
from the estimated probabilities.Exercise: What proportion of the gaps
observations were less than 2? Under our fitted model, what is our
estimate for the probability that a gap will be less than 2? Does the
probability estimated by the model seem reasonably close to what we see
in the data? The observed proportion of gaps less than 2 is around 0.85,
while the model estimate is about 0.97. The model slightly overestimates
the probability, but the values are reasonably close, so the fit is
acceptable. —
Now, visualize the comparison.
plot(comparison$q, comparison$model_prob, type = "b", pch = 19, ylim = c(0, 1),
xlab = "Threshold q (years)", ylab = "Probability",
main = "Observed Proportions vs. Estimated Probabilities")
lines(comparison$q, comparison$empirical_prop, type = "b", pch = 19, lty = 2)
legend("bottomright", legend = c("Estimated Probabilities", "Empirical Proportions"),
lty = c(1, 2), pch = 19, bty = "n")Exercise: Overall, do the probabilities predicted by our fitted model seem to be similar to the proportions observed in the data? Briefly explain. Yes, overall the probabilities predicted by the fitted model are similar to the observed proportions. The two curves follow the same general pattern as the threshold increases, and although the model slightly overestimates the probabilities, the differences are small. This indicates that the exponential model provides a reasonably good fit to the data. —
The faithful dataset contains information about the Old Faithful geyser. We’ll model waiting times between eruptions with an Exponential distribution and mirror the same workflow you used for the coal data.
Exercise: Load the data and extract the waiting times.
faithful using data(faithful).waiting times as a vector named
waiting_times.summary of the data.## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 43.0 58.0 76.0 70.9 82.0 96.0
Exercise: Compute the MLE for \(\lambda\) under this exponential model.
## [1] 0.01410496
Exercise: Overlay the fitted exponential pdf on a histogram of the data.
waiting_times with
probability = TRUE.hist(waiting_times, probability = TRUE,
main = "Histogram with Fitted Exponential",
xlab = "Waiting Time")
x_vals <- seq(0, max(waiting_times), length.out = 200)
y_vals <- dexp(x_vals, rate = lambda_hat_faithful)
lines(x_vals, y_vals, col = "blue", lwd = 2)| The exponential model does not fit the data very well. The histogram is not strongly right-skewed and appears somewhat bimodal, while the exponential distribution assumes a single decreasing shape. This leads to noticeable mismatch between the curve and the data. |
Exercise: Use the fitted model to estimate probabilities for multiple thresholds and compare with empirical proportions.
q <- c(40, 50, 60, 70).pexp(q, rate = lambda_hat_faithful, lower.tail = TRUE).q.comparison data frame with columns
q, model_prob, empirical_prop,
and difference = empirical_prop - model_prob. Print
it.q <- c(40, 50, 60, 70)
model_probs <- pexp(q, rate = lambda_hat_faithful)
empirical_props <- sapply(q, function(q) mean(waiting_times <= q))
comparison <- data.frame(
q = q,
model_prob = model_probs,
empirical_prop = empirical_props,
difference = empirical_props - model_probs
)
comparison## q model_prob empirical_prop difference
## 1 40 0.4311840 0.00000000 -0.4311840
## 2 50 0.5060139 0.09558824 -0.4104257
## 3 60 0.5709996 0.30514706 -0.2658526
## 4 70 0.6274362 0.39338235 -0.2340539
plot(q, model_probs, type = "b", pch = 19,
ylim = c(0, 1),
xlab = "Threshold (minutes)",
ylab = "Probability",
main = "Model vs Empirical Probabilities")
lines(q, empirical_props, type = "b", pch = 19, lty = 2)
legend("bottomright",
legend = c("Model", "Empirical"),
lty = c(1, 2), pch = 19, bty = "n")Exercise: Plot the estimated probabilities vs the observed proportions.
q.The fitted model does not match the observed data particularly well. The empirical proportions differ noticeably from the model-based probabilities, especially across different thresholds. This suggests that the exponential distribution is not a good fit for the waiting times, likely because the data are not well described by a simple decreasing distribution.