In this lab we will explore maximum likelihood estimation (MLE), which is a flexible and powerful tool for estimating the unknown parameters of a distribution.

The primary goals of this lab are to:

  • Understand log-likelihood functions.
  • Investigate the MLEs of several data sets.
  • Practice using MLEs to say things about the populations from which data sets were sampled.

Part 1: Maximum Likelihood Estimation for Poisson Data

Recall that we typically use the Poisson\((\lambda)\) distribution to model the number of events that take place in a given area or in a given time period. For example, we might use it to model the number of cell phone towers in each county, or we might use it to model the number of earthquakes per year in a given location.

As we have seen, the MLE for the rate parameter \(\lambda\) is given by

\[ \hat{\lambda} = \frac{\sum\nolimits_{i=1}^{n}x_i}{n}, \] where \(x_1,\dots,x_n\) is our observed data.

Investigating the Log-Likelihood of the Poisson Distribution

So far, we have stated the formula for the MLE, but let’s see why it makes sense by looking at the log-likelihood. If \(x_1,\dots,x_n\) are observations from a Poisson\((\lambda\)) distribution, then the log-likelihood is given by \[ \ell(\lambda) = \log(\lambda) \left( \sum\limits_{i=1}^{n} x_i \right) - n \lambda + C, \] where \(C\) is a constant that does not depend on \(\lambda\). The log-likelihood measures how plausible different values of \(\lambda\) are given the observed data. In other words, it tells us for which value of \(\lambda\) the observed data would have been most likely.

Below, we’ll plot the log-likelihood function for several small example data sets to see how it changes depending on the observed counts.

  • The first data set, p1, contains counts with small values.
  • The second data set, p2, contains moderate counts.
  • The third data set, p3, contains large counts.
# Example data sets (6 points each, covering small, medium, and large counts)
p1 <- c(0, 0, 1, 1, 0, 2)     # small counts → small lambda
p2 <- c(3, 4, 5, 4, 6, 5)     # moderate counts → moderate lambda
p3 <- c(8, 9, 10, 7, 11, 12)  # large counts → large lambda

# Log-likelihood function for Poisson(lambda)
loglik_pois <- function(lambda, x) {
  sum(x) * log(lambda) - length(x) * lambda
}

# Range of lambda values to evaluate
lambda_vals <- seq(0.01, 15, length.out = 400)

# Compute log-likelihoods
ll1 <- sapply(lambda_vals, loglik_pois, x = p1)
ll2 <- sapply(lambda_vals, loglik_pois, x = p2)
ll3 <- sapply(lambda_vals, loglik_pois, x = p3)

# Plot all three curves
plot(lambda_vals, ll1, type = "l", lwd = 2, col = "steelblue",
     ylim = range(c(ll1, ll2, ll3)),
     xlab = expression(lambda),
     ylab = "Log-Likelihood",
     main = "Log-Likelihood Functions for Different Data Sets")
lines(lambda_vals, ll2, col = "darkorange", lwd = 2, lty = 2)
lines(lambda_vals, ll3, col = "seagreen", lwd = 2, lty = 3)

legend("topright",
       legend = c("p1 = small counts", "p2 = moderate counts", "p3 = large counts"),
       col = c("steelblue", "darkorange", "seagreen"),
       lwd = 2, lty = 1:3, bty = "n")

Each curve shows how the log-likelihood depends on the observed data:

  • The blue curve (small counts) peaks at a small \(\lambda\).
  • The orange curve (moderate counts) peaks at a moderate \(\lambda\).
  • The green curve (large counts) peaks at a large \(\lambda\).

The MLE for each data set is simply the value of \(\lambda\) (on the x-axis) where the corresponding log-likelihood function is maximized.

Poisson MLE for World War II Data

The paper at https://www.actuaries.org.uk/system/files/documents/pdf/0481.pdf describes a study about the bombing of London during the second world war. Each data point describes the number of bombs dropped into a particular 1/4 square kilometre region of London.

The data set is manually created in the code chunk below.

london <- c(rep(0, 229), rep(1, 211), rep(2,93), rep(3,35), rep(4,7), rep(5,1))

We will assume that the number of bombs dropped in each region follows a Poisson\((\lambda)\) distribution, where \(\lambda\) is the unknown parameter describing the average number of bombs dropped in each area. Recall that the MLE for \(\lambda\) is simply the sample average, which we calculate and save as lambda_hat below.

lambda_hat <- mean(london)
lambda_hat
## [1] 0.9288194

Visualizing the Fitted Poisson Model

By estimating \(\lambda\), we have effectively fitted a model for the number of bombs per region. Our estimate \(\hat{\lambda} \approx 0.93\) means we are modeling the counts as coming from a Poisson(0.93) distribution. That is, based on the data, if \(X\) denotes the number of bombs dropped in a particular area, then we would estimate the pmf of \(X\) as \[ \mathbb{P}(X = q) = e^{-0.93} \frac{(0.93)^q}{q!} , \quad q = 0,1,2,\dots. \]

We can visualize how well this fitted model aligns with the observed data by overlaying the fitted model pmf on top of the histogram of observed proportions.

# Histogram of observed counts
hist(london, breaks = seq(-0.5, max(london)+0.5, by = 1), freq = FALSE,
     main = "Histogram of Bomb Counts with Fitted Poisson(0.93)",
     xlab = "Bomb Count", border = "gray50", col = "lightgray")

# Overlay the fitted Poisson PMF
x_vals <- 0:max(london)
y_vals <- dpois(x_vals, lambda_hat)
points(x_vals, y_vals, col = "blue", pch = 19, type = "b")

legend("topright", legend = "fitted",
       col = "blue", pch = 19, lty = 1, bty = "n")


Exercise: Does the model seem to fit the data well? Briefly explain. Yes, the model fits the data well. The fitted Poisson distribution with λ≈0.93. λ≈0.93 closely matches the observed proportions, especially for the most common counts (0 and 1). The probabilities decrease in a similar way for higher counts, indicating that the Poisson model is a good approximation for this data.


Comparing Model-Based and Empirical Probabilities

Next, we will compare the predicted probabilities from our fitted model with the empirical proportions from the data.

# Counts to evaluate
q_vals <- 0:6

# Model-based probabilities
model_probs <- dpois(q_vals, lambda = lambda_hat)

# Empirical proportions
empirical_props <- sapply(q_vals, function(q) mean(london == q))

# Combine into a data frame
comparison <- data.frame(
  q = q_vals,
  model_prob = model_probs,
  empirical_prop = empirical_props,
  difference = empirical_props - model_probs
)
comparison
##   q   model_prob empirical_prop    difference
## 1 0 0.3950197780    0.397569444  2.549666e-03
## 2 1 0.3669020507    0.366319444 -5.826063e-04
## 3 2 0.1703928795    0.161458333 -8.934546e-03
## 4 3 0.0527547399    0.060763889  8.009149e-03
## 5 4 0.0122499070    0.012152778 -9.712927e-05
## 6 5 0.0022755904    0.001736111 -5.394793e-04
## 7 6 0.0003522688    0.000000000 -3.522688e-04

Each row of this table shows:

  • q: number of bombs in a region,
  • model_prob: predicted probability under our fitted Poisson model.
  • empirical_prop: observed proportion from the data,
  • difference: the deviation between the two.

Now, visualize the comparison.

plot(comparison$q, comparison$model_prob, type = "b", pch = 19, ylim = c(0, 0.4),
     xlab = "Bomb Count (q)", ylab = "Probability",
     main = "Empirical vs. Model-Based Probabilities")
lines(comparison$q, comparison$empirical_prop, type = "b", pch = 19, lty = 2)
legend("topright", legend = c("Model (Poisson)", "Empirical Data"),
       lty = c(1, 2), pch = 19, bty = "n")


Exercise: Do the fitted probabilities appear to be close to the proportions observed in the data? Briefly explain. Yes, the fitted probabilities are close to the observed proportions. The model aligns especially well for the most common counts (0 and 1), and while there are small differences for higher counts, the overall pattern is very similar. This suggests the Poisson model provides a good fit to the data. —

Part 2: Maximum Likelihood Estimation for Exponential Data

Recall that the exponential distribution is often used to model the time between events. For example, we might use it to model the time between customer arrivals at a store or the time between eruptions of a geyser. For an exponential random variable \(X \sim \text{Exponential}(\lambda)\), the pdf is \[ f(x) = \lambda e^{-\lambda x} , \; x \ge 0, \] where \(\lambda > 0\) is the rate parameter.

Recall that the MLE for \(\lambda\) is given by \[ \hat{\lambda} = \frac{n}{\sum\limits_{i=1}^{n} x_i}. \]

Investigating the Log-Likelihood of the Exponential Distribution

So far, we have estimated the rate parameter \(\lambda\) using data and the MLE formula. But what does that formula really tell us? To understand this, let’s look at the log-likelihood function for the Exponential\((\lambda)\) model.

If we observe data \(x_1, x_2, \dots, x_n\) (and we model that data as coming from some exponential distribution with an unknown rate parameter \(\lambda\)), then the log-likelihood of the observed data is given by

\[ \ell(\lambda) = \log L(\lambda) = n \log(\lambda) - \lambda \sum_{i=1}^{n} x_i. \]

The log-likelihood measures how plausible different values of \(\lambda\) are given the observed data.
In other words, it tells us for which value of \(\lambda\) the observed data would have been most likely.

Below, we’ll plot the log-likelihood function for several small example data sets to see how it changes depending on the observed values.

  • The first data set, which we save as d1, contains the values 0.3, 0.30, 0.25, 0.18, 0.31, 0.25.
  • The second data set, which we save as d2, contains the values 2.5, 3.0, 3.2, 2.8, 3.5, 2.9.
  • The third data set, which we save as d3, contains the values 5, 7, 8, 3.9, 8, 6.1.

Since there are three different data sets, we will plot three different log-likelihood functions (each one corresponding to a different data set).

# Example data sets (6 points each, spanning different time scales)
d1 <- c(0.3, 0.30, 0.25, 0.18, 0.31, 0.25)   # very short times → large lambda
d2 <- c(2.5, 3.0, 3.2, 2.8, 3.5, 2.9)         # moderate times → moderate lambda
d3 <- c(5, 7, 8, 3.9, 8, 6.1)                   # long times → small lambda

# Log-likelihood for Exponential(lambda)
loglik_exp <- function(lambda, x) {
  n <- length(x)
  n * log(lambda) - lambda * sum(x)
}

# Range of lambda values to evaluate
lambda_vals <- seq(0.01, 5, length.out = 400)

# Compute log-likelihood curves
ll1 <- sapply(lambda_vals, loglik_exp, x = d1)
ll2 <- sapply(lambda_vals, loglik_exp, x = d2)
ll3 <- sapply(lambda_vals, loglik_exp, x = d3)

# Plot all three curves
plot(lambda_vals, ll1, type = "l", lwd = 2, col = "steelblue",
     ylim = range(c(ll1, ll2, ll3)),
     xlab = expression(lambda),
     ylab = "Log-Likelihood",
     main = "Log-Likelihood Functions for Different Data Sets")
lines(lambda_vals, ll2, col = "darkorange", lwd = 2, lty = 2)
lines(lambda_vals, ll3, col = "seagreen", lwd = 2, lty = 3)

legend("topright",
       legend = c("data set 1 = short times", "data set 2 = moderate times", "data set 3 = long times"),
       col = c("steelblue", "darkorange", "seagreen"),
       lwd = 2, lty = 1:3, bty = "n")

The MLE for each data set is simply the value of \(\lambda\) (on the x-axis) where the corresponding log-likelihood function is maximized.

Exponential MLE for Coal Data

The coal data set (in the boot package, see https://r-packages.io/datasets/coal ) records the dates of coal mining disasters in Great Britain between the mid-1800s and 1962. Because the Exponential distribution is commonly used for times between events, we’ll convert the disaster dates into interarrival times (“gaps”) and then fit an exponential distribution to the data.

Let’s start by loading the data and inspecting the first few entries:

library(boot)
## 
## Attaching package: 'boot'
## The following object is masked from 'package:openintro':
## 
##     salinity
data(coal)
head(coal) #print out the first few rows
##       date
## 1 1851.203
## 2 1851.632
## 3 1851.969
## 4 1851.975
## 5 1852.314
## 6 1852.347

Since the exponential distribution is used for time between events, we create a new vector gaps that measures the time between successive disasters. We’ll use diff() on the (numeric) date variable.

gaps <- diff(as.numeric(coal$date))
head(gaps)
## [1] 0.429842574 0.336755647 0.005475702 0.339493498 0.032854209 0.010951403

Now, let’s make a histogram of the interarrival times. A histogram will help us judge whether an exponential distribution looks like a plausible model.

hist(gaps, breaks = 30)

At first glance, an exponential distribution seems like a plausible model (a large mass near zero with a decreasing tail), but we’ll estimate \(\lambda\) (using maximum likelihood estimation) and then compare model-based probabilities to empirical proportions.

Now we are ready to compute the MLE \(\hat{\lambda}\) for the Exponential(\(\lambda\)) model of the gaps data. Recall that if \(x_1,\dots,x_n\) are independent Exponential(\(\lambda\)) observations, then the MLE for \(\lambda\) is given by \[ \hat{\lambda} = \frac{1}{\bar{x}} = \frac{n}{\sum\nolimits_{i=1}^{n} x_i}. \]

mle_lambda <- 1/mean(gaps)

Visualizing the Fitted Exponential Model

By estimating \(\lambda\) from the data, we have effectively fitted a model for the time between coal mining disasters.
Our estimate \(\hat{\lambda} \approx 1.711\) means we are modeling the interarrival times as following an Exponential(1.711) distribution.

In other words, we “believe,” based on the observed data, that the gaps between disasters can be described by a random variable

\[ X \sim \text{Exponential}(1.711), \] whose probability density function (pdf) is

\[ f(x) = 1.711\, e^{-1.711 x}, \qquad x \ge 0. \]

We can visualize how well this fitted exponential curve aligns with the observed data by overlaying the model pdf on top of the histogram of interarrival times.

# Plot histogram of the observed gaps
hist(gaps, breaks = 30, probability = TRUE,
     main = "Histogram of Interarrival Times with Fitted Exponential(1.711)",
     xlab = "Gap (years)")

# Overlay the fitted exponential density curve
x_vals <- seq(0, max(gaps), length.out = 200)
y_vals <- dexp(x_vals, rate = mle_lambda)
lines(x_vals, y_vals, col = "blue")

# Annotate with model text
legend("topright", legend = "fitted pdf",
       col = "blue", lwd = 1)


Exercise: Does the model seem to fit the data well? Briefly explain. Yes, the model seems to fit the data reasonably well. The exponential curve captures the overall shape of the histogram, with a high density near zero and a decreasing tail. While there are some small differences between the curve and the observed data, the exponential distribution appears to be a good approximation for the interarrival times.


Comparing Model-Based and Empirical Probabilities

Now, let’s look at the using the MLE to estimate some probabilities. Recall that if \(X \sim \text{Exponential}(\lambda)\) then \[ \mathbb{P} ( X \le q) = 1 - e^{-\lambda q}. \]

Of course, we don’t know the true value of \(\lambda\). Instead, we plug in our estimated value \(\hat{\lambda}\) from the data, and use it to approximate probabilities under our fitted model:

\[ \widehat{\mathbb{P}}(X \le q) = 1 - e^{-\hat{\lambda} q}. \]

The pexp() function in R computes these probabilities directly, pexp(q, rate = mle_lambda, lower.tail = TRUE). Now, we can check how well our model describes the data by comparing:

  1. Model-based probabilities from the fitted exponential distribution, and
  2. Empirical proportions of observed interarrival times less than a given threshold.

If our exponential model fits the data well, these two quantities should be reasonably close for a range of thresholds.

Let’s compute and visualize this comparison.

# Thresholds (in years) to evaluate
qs <- 1:10

# Model-based probabilities using the fitted lambda_hat
model_probs <- pexp(qs, rate = mle_lambda, lower.tail = TRUE)

# Empirical proportions from the data
empirical_props <- sapply(qs, function(q) mean(gaps <= q))

# Combine results into a single data frame
comparison <- data.frame(
  q = qs,
  model_prob = model_probs,
  empirical_prop = empirical_props,
  difference = empirical_props - model_probs
)
comparison
##     q model_prob empirical_prop    difference
## 1   1  0.8193959      0.8631579  4.376201e-02
## 2   2  0.9673822      0.9421053 -2.527689e-02
## 3   3  0.9941091      0.9684211 -2.568803e-02
## 4   4  0.9989361      0.9842105 -1.472555e-02
## 5   5  0.9998079      0.9947368 -5.071009e-03
## 6   6  0.9999653      0.9947368 -5.228455e-03
## 7   7  0.9999937      1.0000000  6.267487e-06
## 8   8  0.9999989      1.0000000  1.131934e-06
## 9   9  0.9999998      1.0000000  2.044319e-07
## 10 10  1.0000000      1.0000000  3.692124e-08

Each row of this table shows:

  • q: the chosen threshold (in years)
  • model_prob: the estimated probability under our exponential model
  • empirical_prop: the observed proportion of gaps that are less than each value of q
  • difference: how far the empirical proportions deviate from the estimated probabilities.

Exercise: What proportion of the gaps observations were less than 2? Under our fitted model, what is our estimate for the probability that a gap will be less than 2? Does the probability estimated by the model seem reasonably close to what we see in the data? The observed proportion of gaps less than 2 is around 0.85, while the model estimate is about 0.97. The model slightly overestimates the probability, but the values are reasonably close, so the fit is acceptable. —

Now, visualize the comparison.

plot(comparison$q, comparison$model_prob, type = "b", pch = 19, ylim = c(0, 1),
     xlab = "Threshold q (years)", ylab = "Probability",
     main = "Observed Proportions vs. Estimated Probabilities")
lines(comparison$q, comparison$empirical_prop, type = "b", pch = 19, lty = 2)
legend("bottomright", legend = c("Estimated Probabilities", "Empirical Proportions"),
       lty = c(1, 2), pch = 19, bty = "n")


Exercise: Overall, do the probabilities predicted by our fitted model seem to be similar to the proportions observed in the data? Briefly explain. Yes, overall the probabilities predicted by the fitted model are similar to the observed proportions. The two curves follow the same general pattern as the threshold increases, and although the model slightly overestimates the probabilities, the differences are small. This indicates that the exponential model provides a reasonably good fit to the data. —

Part 3: Do at Home

The faithful dataset contains information about the Old Faithful geyser. We’ll model waiting times between eruptions with an Exponential distribution and mirror the same workflow you used for the coal data.


Exercise: Load the data and extract the waiting times.

  • Load faithful using data(faithful).
  • Save the waiting times as a vector named waiting_times.
  • Print a summary of the data.
data(faithful)
waiting_times <- faithful$waiting
summary(waiting_times)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    43.0    58.0    76.0    70.9    82.0    96.0


Exercise: Compute the MLE for \(\lambda\) under this exponential model.

lambda_hat_faithful <- 1 / mean(waiting_times)
lambda_hat_faithful
## [1] 0.01410496


Exercise: Overlay the fitted exponential pdf on a histogram of the data.

  • Make a histogram of waiting_times with probability = TRUE.
  • Overlay the fitted pdf.
  • Does the fitted model seem to described the observed data well?
hist(waiting_times, probability = TRUE,
     main = "Histogram with Fitted Exponential",
     xlab = "Waiting Time")

x_vals <- seq(0, max(waiting_times), length.out = 200)
y_vals <- dexp(x_vals, rate = lambda_hat_faithful)

lines(x_vals, y_vals, col = "blue", lwd = 2)

The exponential model does not fit the data very well. The histogram is not strongly right-skewed and appears somewhat bimodal, while the exponential distribution assumes a single decreasing shape. This leads to noticeable mismatch between the curve and the data.

Exercise: Use the fitted model to estimate probabilities for multiple thresholds and compare with empirical proportions.

  • Consider thresholds (in minutes), q <- c(40, 50, 60, 70).
  • Compute model-based probabilities with pexp(q, rate = lambda_hat_faithful, lower.tail = TRUE).
  • Compute empirical proportions for each q.
  • Create a comparison data frame with columns q, model_prob, empirical_prop, and difference = empirical_prop - model_prob. Print it.
q <- c(40, 50, 60, 70)

model_probs <- pexp(q, rate = lambda_hat_faithful)
empirical_props <- sapply(q, function(q) mean(waiting_times <= q))

comparison <- data.frame(
  q = q,
  model_prob = model_probs,
  empirical_prop = empirical_props,
  difference = empirical_props - model_probs
)

comparison
##    q model_prob empirical_prop difference
## 1 40  0.4311840     0.00000000 -0.4311840
## 2 50  0.5060139     0.09558824 -0.4104257
## 3 60  0.5709996     0.30514706 -0.2658526
## 4 70  0.6274362     0.39338235 -0.2340539
plot(q, model_probs, type = "b", pch = 19,
     ylim = c(0, 1),
     xlab = "Threshold (minutes)",
     ylab = "Probability",
     main = "Model vs Empirical Probabilities")

lines(q, empirical_props, type = "b", pch = 19, lty = 2)

legend("bottomright",
       legend = c("Model", "Empirical"),
       lty = c(1, 2), pch = 19, bty = "n")



Exercise: Plot the estimated probabilities vs the observed proportions.

  • On the same plot, draw the model-based probabilities (solid line) and empirical proportions (dashed) against q.
  • Does the fitted model appear to fit the observed data well? Why or why not?

The fitted model does not match the observed data particularly well. The empirical proportions differ noticeably from the model-based probabilities, especially across different thresholds. This suggests that the exponential distribution is not a good fit for the waiting times, likely because the data are not well described by a simple decreasing distribution.