Profile Likelihood: A Concrete Example

Introduction

Profile likelihood is a technique for making inference about a parameter of interest in the presence of nuisance parameters. The idea is simple:

For each fixed value of the parameter of interest \(\gamma\), maximize the likelihood over the nuisance parameter \(\eta\).

This gives a profile likelihood function \(L_p(\gamma) = \max_{\eta} L(\gamma,\eta)\).

Treat \(L_p(\gamma)\) as a likelihood for \(\gamma\) alone.

This document works through a concrete example.

Model Setup

We have three independent observations:

\[ \begin{aligned} y_1 &\sim N(\gamma + \eta, 1) \\ y_2 &\sim N(\gamma + 2\eta, 1) \\ y_3 &\sim N(\gamma - \eta, 1) \end{aligned} \]

Here:

\(\gamma\) is the parameter of interest
\(\eta\) is the nuisance parameter
The variance is known and equals 1 (common for all observations)

Data

Observed values:

\[y_1 = 2,\; y_2 = 5,\; y_3 = 0\]

Full Likelihood

The likelihood function (up to a constant) is:

\[L(\gamma,\eta) \propto \exp\left[-\frac{1}{2}(2-\gamma-\eta)^2 - \frac{1}{2}(5-\gamma-2\eta)^2 - \frac{1}{2}(0-\gamma+\eta)^2\right]\]

The log-likelihood (ignoring constants) is:

\[\ell(\gamma,\eta) = -\frac{1}{2}\left[(2-\gamma-\eta)^2 + (5-\gamma-2\eta)^2 + (0-\gamma+\eta)^2\right]\]

Because each term is squared, we can flip the sign inside every square without changing its value. This form is easier to differentiate:

\[\ell(\gamma,\eta) = -\frac{1}{2}\left[(\gamma+\eta-2)^2 + (\gamma+2\eta-5)^2 + (\gamma-\eta)^2\right]\]
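As a quick numerical sanity check, the R sketch below evaluates the log-likelihood two ways: from the normal densities via `dnorm()`, and from the sum-of-squares form above. The two should differ only by the dropped constant \(-\tfrac{3}{2}\log(2\pi)\). The names `loglik_dnorm`, `loglik_ss`, and the test points are illustrative choices, not part of the original analysis.

```r
# Observed data from the example
y <- c(2, 5, 0)

# Log-likelihood computed from the normal densities (constant included)
loglik_dnorm <- function(gamma, eta) {
  mu <- c(gamma + eta, gamma + 2 * eta, gamma - eta)
  sum(dnorm(y, mean = mu, sd = 1, log = TRUE))
}

# Log-likelihood in sum-of-squares form (constant dropped)
loglik_ss <- function(gamma, eta) {
  -0.5 * ((gamma + eta - 2)^2 + (gamma + 2 * eta - 5)^2 + (gamma - eta)^2)
}

# The difference should equal -(3/2) * log(2 * pi) at any (gamma, eta)
const <- -1.5 * log(2 * pi)
diff_a <- loglik_dnorm(1, 1) - loglik_ss(1, 1)
diff_b <- loglik_dnorm(-0.5, 3) - loglik_ss(-0.5, 3)
print(c(diff_a = diff_a, diff_b = diff_b, const = const))
```

Because the difference does not depend on \((\gamma,\eta)\), dropping it changes neither the maximizer nor any likelihood ratio.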

Step 1: For fixed \(\gamma\), maximize over \(\eta\)

Treat \(\gamma\) as a constant and take the derivative with respect to \(\eta\):

\[\frac{\partial \ell}{\partial \eta} = -\frac{1}{2}\left[2(\gamma+\eta-2)(1) + 2(\gamma+2\eta-5)(2) + 2(\gamma-\eta)(-1)\right]\]

Cancel the factor 2 with \(-\frac{1}{2}\):

\[\frac{\partial \ell}{\partial \eta} = -\left[(\gamma+\eta-2) + 2(\gamma+2\eta-5) - (\gamma-\eta)\right]\]

Set the derivative to zero:

\[(\gamma+\eta-2) + 2(\gamma+2\eta-5) - (\gamma-\eta) = 0\]

Expand:

\[\gamma+\eta-2 + 2\gamma+4\eta-10 - \gamma+\eta = 0\]

Combine terms:

\(\gamma + 2\gamma - \gamma = 2\gamma\)
\(\eta + 4\eta + \eta = 6\eta\)
Constants: \(-2 -10 = -12\)

Thus:

\[2\gamma + 6\eta - 12 = 0\]

Solve for \(\eta\). Since \(\partial^2 \ell / \partial \eta^2 = -6 < 0\), the solution is a maximum:

\[\hat{\eta}(\gamma) = \frac{12 - 2\gamma}{6} = 2 - \frac{\gamma}{3}\]

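Important observation: \(\hat{\eta}(\gamma)\) depends on \(\gamma\). This is typical of profile likelihood: the best-fitting value of the nuisance parameter generally changes as the parameter of interest changes.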
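The closed form for \(\hat{\eta}(\gamma)\) can be checked numerically. The sketch below maximizes the log-likelihood over \(\eta\) with `optimize()` for a few fixed values of \(\gamma\) and compares the result with \(2 - \gamma/3\). The helper name `eta_hat_numeric`, the search interval \([-10, 10]\), and the chosen \(\gamma\) values are assumptions made for illustration.

```r
# Sum-of-squares log-likelihood for the example (constants dropped)
loglik <- function(gamma, eta) {
  -0.5 * ((gamma + eta - 2)^2 + (gamma + 2 * eta - 5)^2 + (gamma - eta)^2)
}

# Numerically maximize over eta for one fixed gamma
eta_hat_numeric <- function(gamma) {
  optimize(function(eta) loglik(gamma, eta),
           interval = c(-10, 10), maximum = TRUE, tol = 1e-10)$maximum
}

gammas <- c(-1, 0, 1, 3)
numeric_eta <- sapply(gammas, eta_hat_numeric)
closed_eta <- 2 - gammas / 3

# The two columns should agree up to optimizer tolerance
print(cbind(gamma = gammas, numeric = numeric_eta, closed_form = closed_eta))
```

In models where no closed form exists, this inner numerical maximization is what profiling amounts to in practice.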

Step 2: Plug \(\hat{\eta}(\gamma)\) back into the likelihood

We have \(\hat{\eta}(\gamma) = 2 - \frac{\gamma}{3}\).

Compute each term:

\[ \begin{aligned} \gamma + \hat{\eta}(\gamma) - 2 &= \gamma + \left(2 - \frac{\gamma}{3}\right) - 2 = \frac{2\gamma}{3} \\ \gamma + 2\hat{\eta}(\gamma) - 5 &= \gamma + 4 - \frac{2\gamma}{3} - 5 = \frac{\gamma}{3} - 1 \\ \gamma - \hat{\eta}(\gamma) &= \gamma - \left(2 - \frac{\gamma}{3}\right) = \frac{4\gamma}{3} - 2 \end{aligned} \]

The profile log-likelihood is:

\[\ell_p(\gamma) = -\frac{1}{2}\left[\left(\frac{2\gamma}{3}\right)^2 + \left(\frac{\gamma}{3} - 1\right)^2 + \left(\frac{4\gamma}{3} - 2\right)^2\right]\]

Step 3: Simplify \(\ell_p(\gamma)\)

Expand the squares:

\[ \begin{aligned} \left(\frac{2\gamma}{3}\right)^2 &= \frac{4\gamma^2}{9} \\ \left(\frac{\gamma}{3} - 1\right)^2 &= \frac{\gamma^2}{9} - \frac{2\gamma}{3} + 1 \\ \left(\frac{4\gamma}{3} - 2\right)^2 &= \frac{16\gamma^2}{9} - \frac{16\gamma}{3} + 4 \end{aligned} \]

Sum them:

Quadratic terms: \(\frac{4\gamma^2}{9} + \frac{\gamma^2}{9} + \frac{16\gamma^2}{9} = \frac{21\gamma^2}{9} = \frac{7\gamma^2}{3}\)
Linear terms: \(-\frac{2\gamma}{3} - \frac{16\gamma}{3} = -\frac{18\gamma}{3} = -6\gamma\)
Constant terms: \(1 + 4 = 5\)

Therefore:

\[\ell_p(\gamma) = -\frac{1}{2}\left(\frac{7\gamma^2}{3} - 6\gamma + 5\right)\]
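The algebra in Steps 2 and 3 can be checked by comparing the simplified quadratic with a direct substitution of \(\hat{\eta}(\gamma) = 2 - \gamma/3\) into the log-likelihood. This is a minimal sketch; the function names and the grid of \(\gamma\) values are illustrative.

```r
# Sum-of-squares log-likelihood for the example (constants dropped)
loglik <- function(gamma, eta) {
  -0.5 * ((gamma + eta - 2)^2 + (gamma + 2 * eta - 5)^2 + (gamma - eta)^2)
}

# Profile log-likelihood by direct substitution of eta_hat(gamma)
profile_by_substitution <- function(gamma) loglik(gamma, 2 - gamma / 3)

# Profile log-likelihood from the simplified quadratic
profile_closed_form <- function(gamma) -0.5 * (7 * gamma^2 / 3 - 6 * gamma + 5)

# Largest absolute difference over a grid; should be at rounding-error level
gammas <- seq(-2, 5, by = 0.5)
max_gap <- max(abs(profile_by_substitution(gammas) - profile_closed_form(gammas)))
print(max_gap)
```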

Step 4: Maximize \(\ell_p(\gamma)\) over \(\gamma\)

Take the derivative:

\[\ell_p'(\gamma) = -\frac{1}{2}\left(\frac{14\gamma}{3} - 6\right) = -\frac{7\gamma}{3} + 3\]

Set to zero:

\[-\frac{7\gamma}{3} + 3 = 0 \quad\Rightarrow\quad \frac{7\gamma}{3} = 3 \quad\Rightarrow\quad \hat{\gamma}_p = \frac{9}{7} \approx 1.2857\]

Then:

\[\hat{\eta}(\hat{\gamma}_p) = 2 - \frac{9/7}{3} = 2 - \frac{9}{21} = 2 - \frac{3}{7} = \frac{11}{7} \approx 1.5714\]
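Because the means are linear in \((\gamma,\eta)\) and the errors are Gaussian with unit variance, maximizing the full likelihood is the same as ordinary least squares. The sketch below solves the normal equations directly and should reproduce \((9/7, 11/7)\). The design matrix `X` and the name `mle` are introduced here for illustration.

```r
# Observed data and design matrix: mean of y is X %*% c(gamma, eta)
y <- c(2, 5, 0)
X <- cbind(gamma = c(1, 1, 1), eta = c(1, 2, -1))

# Solve the normal equations (X'X) beta = X'y
mle <- drop(solve(crossprod(X), crossprod(X, y)))

print(mle)
print(c(gamma = 9 / 7, eta = 11 / 7))
```

Agreement between the two printed vectors confirms that profiling out \(\eta\) and then maximizing over \(\gamma\) gives the joint maximizer.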

Visualization

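Let’s visualize the result. The left panel plots the profile log-likelihood \(\ell_p(\gamma)\); the right panel shows contours of the full log-likelihood with the profile path \(\hat{\eta}(\gamma)\) overlaid.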

# Define the profile log-likelihood function
profile_loglik <- function(gamma) {
  -0.5 * ((7*gamma^2)/3 - 6*gamma + 5)
}

# Define the full log-likelihood for contour plot
full_loglik <- function(gamma, eta) {
  term1 <- (gamma + eta - 2)^2
  term2 <- (gamma + 2*eta - 5)^2
  term3 <- (gamma - eta)^2
  -0.5 * (term1 + term2 + term3)
}

# Create data for profile likelihood
gamma_vals <- seq(-2, 5, length.out = 200)
profile_vals <- profile_loglik(gamma_vals)

# Create data for contour plot
gamma_grid <- seq(-2, 5, length.out = 100)
eta_grid <- seq(-2, 5, length.out = 100)
# full_loglik is already vectorized, so outer() can call it directly
z <- outer(gamma_grid, eta_grid, full_loglik)

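# Profile path: eta_hat(gamma) = 2 - gamma/3
eta_path <- 2 - gamma_vals/3
# Log-likelihood along the path; this reproduces the closed-form profile values
profile_path_vals <- full_loglik(gamma_vals, eta_path)
stopifnot(isTRUE(all.equal(profile_path_vals, profile_vals)))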

# Find maximum
gamma_hat <- 9/7
eta_hat <- 11/7

# Plot profile likelihood
par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))

# Left panel: Profile likelihood function
plot(gamma_vals, profile_vals, type = "l", lwd = 2,
     xlab = expression(gamma), ylab = expression(ell[p](gamma)),
     main = "Profile Log-Likelihood")
abline(v = gamma_hat, col = "red", lty = 2, lwd = 2)
abline(h = profile_loglik(gamma_hat), col = "red", lty = 2, lwd = 2)
points(gamma_hat, profile_loglik(gamma_hat), col = "red", pch = 19, cex = 1.5)
legend("topright", legend = c(expression(hat(gamma)[p] == 9/7)),
       col = "red", lty = 2, lwd = 2, bty = "n")

# Right panel: Contour plot of full likelihood with profile path
contour(gamma_grid, eta_grid, z, levels = pretty(z, 20),
        xlab = expression(gamma), ylab = expression(eta),
        main = "Full Log-Likelihood with Profile Path")
# Add profile path
lines(gamma_vals, eta_path, col = "blue", lwd = 2)
# Add maximum point
points(gamma_hat, eta_hat, col = "red", pch = 19, cex = 1.5)
legend("topright", legend = c("Profile path", "MLE"),
       col = c("blue", "red"), lty = c(1, NA), pch = c(NA, 19),
       lwd = 2, bty = "n")

Key Insights

The profile path (blue line in the right panel) shows how \(\hat{\eta}(\gamma)\) changes with \(\gamma\). Here it is a straight line, \(\hat{\eta}(\gamma) = 2 - \gamma/3\).

For each \(\gamma\), we move vertically (in \(\eta\)) to the maximum of the likelihood for that fixed \(\gamma\). The resulting likelihood values trace the profile likelihood curve.

The profile MLE \(\hat{\gamma}_p = 9/7\) is the value of \(\gamma\) that maximizes \(\ell_p(\gamma)\).

The full MLE (red point in the contour plot) has the same \(\gamma\) coordinate: maximizing \(\ell_p(\gamma) = \max_{\eta} \ell(\gamma,\eta)\) over \(\gamma\) is the same as maximizing \(\ell(\gamma,\eta)\) jointly over both parameters. In this case, the full MLE is \((\hat{\gamma}, \hat{\eta}) = (9/7, 11/7)\).

Why profile likelihood?

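It reduces the dimension of the problem from two parameters to one. We can then construct confidence intervals for \(\gamma\) from the profile likelihood ratio statistic. Evaluated at the true \(\gamma\), it follows a \(\chi^2_1\) distribution, approximately in general (under standard regularity conditions) and exactly in this linear Gaussian model with known variance: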

\[ R(\gamma) = 2[\ell_p(\hat{\gamma}_p) - \ell_p(\gamma)] \sim \chi^2_1 \]
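For this example the statistic has a simple closed form. As a worked step added for illustration, completing the square in \(\ell_p(\gamma) = -\frac{1}{2}\left(\frac{7\gamma^2}{3} - 6\gamma + 5\right)\) gives:

\[ \ell_p(\gamma) = -\frac{1}{2}\left[\frac{7}{3}\left(\gamma - \frac{9}{7}\right)^2 + \frac{8}{7}\right], \qquad \ell_p(\hat{\gamma}_p) = -\frac{4}{7} \]

and therefore:

\[ R(\gamma) = 2\left[\ell_p(\hat{\gamma}_p) - \ell_p(\gamma)\right] = \frac{7}{3}\left(\gamma - \frac{9}{7}\right)^2 = \frac{(\gamma - \hat{\gamma}_p)^2}{3/7} \]

The denominator \(3/7\) is the variance of the least-squares estimator of \(\gamma\) in this model, so \(R(\gamma)\) is the squared standardized distance between \(\gamma\) and \(\hat{\gamma}_p\).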

Profile Likelihood Ratio Confidence Interval

A 95% confidence interval for \(\gamma\) consists of all values with \(R(\gamma) \le \chi^2_{1,\,0.95} \approx 3.84\). The endpoints are found numerically:

# Profile likelihood ratio statistic
profile_lr <- function(gamma) {
  2 * (profile_loglik(gamma_hat) - profile_loglik(gamma))
}

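# Critical value: 95% quantile of chi-square with 1 df (about 3.84)
chi2_95 <- qchisq(0.95, df = 1)

# R(gamma) minus the critical value; its roots are the interval endpoints
lr_minus_crit <- function(g) profile_lr(g) - chi2_95

# Lower endpoint: root to the left of gamma_hat
# (tol tightened so the printed 4 decimals are reliable)
lower <- uniroot(lr_minus_crit, c(-2, gamma_hat), tol = 1e-8)$root

# Upper endpoint: root to the right of gamma_hat
upper <- uniroot(lr_minus_crit, c(gamma_hat, 5), tol = 1e-8)$root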

# Plot profile likelihood with confidence interval
gamma_vals_ci <- seq(-1, 4, length.out = 500)
profile_vals_ci <- profile_loglik(gamma_vals_ci)
lr_vals <- profile_lr(gamma_vals_ci)

par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))

# Left: Profile log-likelihood with CI
plot(gamma_vals_ci, profile_vals_ci, type = "l", lwd = 2,
     xlab = expression(gamma), ylab = expression(ell[p](gamma)),
     main = "95% Confidence Interval for Gamma")
abline(v = gamma_hat, col = "red", lty = 2, lwd = 1)
abline(h = profile_loglik(gamma_hat) - chi2_95/2, col = "darkgreen", lty = 2)
abline(v = lower, col = "blue", lty = 3, lwd = 2)
abline(v = upper, col = "blue", lty = 3, lwd = 2)
legend("topright", legend = c(expression(hat(gamma)[p]), "95% CI"),
       col = c("red", "blue"), lty = c(2, 3), lwd = 2, bty = "n")

# Right: Profile likelihood ratio
plot(gamma_vals_ci, lr_vals, type = "l", lwd = 2,
     xlab = expression(gamma), ylab = expression(R(gamma)),
     main = "Profile Likelihood Ratio")
abline(h = chi2_95, col = "darkgreen", lty = 2, lwd = 2)
abline(v = lower, col = "blue", lty = 3, lwd = 2)
abline(v = upper, col = "blue", lty = 3, lwd = 2)
legend("topright", legend = c(expression(chi[1]^2(0.95) == 3.84), "95% CI"),
       col = c("darkgreen", "blue"), lty = c(2, 3), lwd = 2, bty = "n")

cat("95% Profile Likelihood Confidence Interval for γ:\n")

## 95% Profile Likelihood Confidence Interval for γ:

cat(sprintf("(%.4f, %.4f)\n", lower, upper))

## (0.0026, 2.5688)
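The numerical endpoints can be cross-checked in closed form. Since \(\ell_p(\gamma)\) is a quadratic with second derivative \(-7/3\), the ratio statistic is \(R(\gamma) = \frac{7}{3}(\gamma - \hat{\gamma}_p)^2\), and \(R(\gamma) = c\) has roots \(\hat{\gamma}_p \pm \sqrt{3c/7}\). The sketch below evaluates these roots; the variable names are illustrative.

```r
# Profile MLE and 95% critical value of chi-square with 1 df
gamma_hat_check <- 9 / 7
crit <- qchisq(0.95, df = 1)

# R(gamma) = (7/3) * (gamma - gamma_hat)^2 = crit  =>  half-width sqrt(3 * crit / 7)
half_width <- sqrt(3 * crit / 7)
ci_closed_form <- c(lower = gamma_hat_check - half_width,
                    upper = gamma_hat_check + half_width)

print(round(ci_closed_form, 4))
```

The rounded values should match the interval reported above, (0.0026, 2.5688).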

Summary

Profile likelihood is a powerful tool for inference when nuisance parameters are present.

For each fixed \(\gamma\), we maximize over \(\eta\), obtaining \(\hat{\eta}(\gamma)\).

The resulting \(\ell_p(\gamma)\) is treated as a likelihood for \(\gamma\) alone.

The profile MLE \(\hat{\gamma}_p\) equals the full MLE’s \(\gamma\) component (when the full MLE exists and is unique).

Confidence intervals can be constructed using the profile likelihood ratio statistic.

This example shows that \(\hat{\eta}(\gamma)\) is not constant: it varies with \(\gamma\) (here, \(\hat{\eta}(\gamma) = 2 - \gamma/3\)), which is why we can’t simply “plug in” a single estimate of \(\eta\).