Profile likelihood is a technique for making inference about a parameter of interest in the presence of nuisance parameters. The idea is simple:
For each fixed value of the parameter of interest \(\gamma\), maximize the likelihood over the nuisance parameter \(\eta\).
This gives a profile likelihood function \(L_p(\gamma) = \max_{\eta} L(\gamma,\eta)\).
Treat \(L_p(\gamma)\) as a likelihood for \(\gamma\) alone.
This document works through a concrete example.
We have three independent observations:
\[ \begin{aligned} y_1 &\sim N(\gamma + \eta, 1) \\ y_2 &\sim N(\gamma + 2\eta, 1) \\ y_3 &\sim N(\gamma - \eta, 1) \end{aligned} \]
Here \(\gamma\) is the parameter of interest and \(\eta\) is the nuisance parameter.
Observed values:
\[y_1 = 2,\; y_2 = 5,\; y_3 = 0\]
The likelihood function (up to a constant) is:
\[L(\gamma,\eta) \propto \exp\left[-\frac{1}{2}(2-\gamma-\eta)^2 - \frac{1}{2}(5-\gamma-2\eta)^2 - \frac{1}{2}(0-\gamma+\eta)^2\right]\]
The log-likelihood (ignoring constants) is:
\[\ell(\gamma,\eta) = -\frac{1}{2}\left[(2-\gamma-\eta)^2 + (5-\gamma-2\eta)^2 + (0-\gamma+\eta)^2\right]\]
To make differentiation easier, flip the sign inside each square, which leaves the squared terms unchanged:
\[\ell(\gamma,\eta) = -\frac{1}{2}\left[(\gamma+\eta-2)^2 + (\gamma+2\eta-5)^2 + (\gamma-\eta)^2\right]\]
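The claim that constants can be ignored can be checked numerically. The sketch below compares the full Gaussian log-likelihood (via `dnorm`) with the kernel used in the text; the names `x`, `loglik_exact`, and `loglik_kernel` are illustrative and not part of the original document. If the kernel is right, the two should differ by the same constant, \(-\tfrac{3}{2}\log(2\pi)\), at every parameter value.

```r
# Observed data and the coefficient of eta in each mean (mean_i = gamma + x_i * eta).
y <- c(2, 5, 0)
x <- c(1, 2, -1)

# Full Gaussian log-likelihood, including the normalizing constant.
loglik_exact <- function(gamma, eta) {
  sum(dnorm(y, mean = gamma + x * eta, sd = 1, log = TRUE))
}

# Kernel used in the text: minus one half of the residual sum of squares.
loglik_kernel <- function(gamma, eta) {
  -0.5 * sum((y - gamma - x * eta)^2)
}

# The difference should equal -(3/2) * log(2 * pi) wherever we evaluate it.
const <- -1.5 * log(2 * pi)
diffs <- c(
  loglik_exact(0, 0) - loglik_kernel(0, 0),
  loglik_exact(1.3, 1.6) - loglik_kernel(1.3, 1.6),
  loglik_exact(-2, 4) - loglik_kernel(-2, 4)
)
print(diffs - const)  # all entries should be zero up to rounding error
```

Because the constant does not depend on \(\gamma\) or \(\eta\), dropping it changes neither the maximizer nor any likelihood ratio.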
Treat \(\gamma\) as a constant and differentiate with respect to \(\eta\):
\[\frac{\partial \ell}{\partial \eta} = -\frac{1}{2}\left[2(\gamma+\eta-2)(1) + 2(\gamma+2\eta-5)(2) + 2(\gamma-\eta)(-1)\right]\]
Cancel the factor 2 with \(-\frac{1}{2}\):
\[\frac{\partial \ell}{\partial \eta} = -\left[(\gamma+\eta-2) + 2(\gamma+2\eta-5) - (\gamma-\eta)\right]\]
Set the derivative to zero:
\[(\gamma+\eta-2) + 2(\gamma+2\eta-5) - (\gamma-\eta) = 0\]
Expand:
\[\gamma+\eta-2 + 2\gamma+4\eta-10 - \gamma+\eta = 0\]
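Combine like terms: the \(\gamma\) terms give \(\gamma + 2\gamma - \gamma = 2\gamma\), the \(\eta\) terms give \(\eta + 4\eta + \eta = 6\eta\), and the constants give \(-2 - 10 = -12\).
Thus: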
\[2\gamma + 6\eta - 12 = 0\]
Solve for \(\eta\):
\[\hat{\eta}(\gamma) = \frac{12 - 2\gamma}{6} = 2 - \frac{\gamma}{3}\]
Important observation: \(\hat{\eta}(\gamma)\) depends on \(\gamma\). This is typical in profile likelihood.
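The closed form for \(\hat{\eta}(\gamma)\) can be checked by maximizing over \(\eta\) numerically at a few fixed values of \(\gamma\). This is a minimal sketch: the helper `eta_hat_numeric`, the search interval \((-10, 10)\), and the test values of \(\gamma\) are assumptions chosen for illustration.

```r
y <- c(2, 5, 0)
x <- c(1, 2, -1)  # coefficient of eta in each mean

# Log-likelihood kernel from the text.
loglik <- function(gamma, eta) -0.5 * sum((y - gamma - x * eta)^2)

# Numerically maximize over eta for a fixed gamma.
# The interval (-10, 10) is an assumption, wide enough for the gamma values below.
eta_hat_numeric <- function(gamma) {
  optimize(function(eta) loglik(gamma, eta),
           interval = c(-10, 10), maximum = TRUE)$maximum
}

gamma_test <- c(-1, 0, 1, 9 / 7, 3)
comparison <- data.frame(
  gamma    = gamma_test,
  numeric  = sapply(gamma_test, eta_hat_numeric),
  analytic = 2 - gamma_test / 3
)
print(comparison)
```

The `numeric` and `analytic` columns should agree up to the tolerance of `optimize`. In models where no closed form exists, the same numerical inner maximization is what defines the profile likelihood.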
Substitute \(\hat{\eta}(\gamma) = 2 - \frac{\gamma}{3}\) into each term inside the squares of \(\ell(\gamma,\eta)\):
\[ \begin{aligned} \gamma + \hat{\eta}(\gamma) - 2 &= \gamma + \left(2 - \frac{\gamma}{3}\right) - 2 = \frac{2\gamma}{3} \\ \gamma + 2\hat{\eta}(\gamma) - 5 &= \gamma + 4 - \frac{2\gamma}{3} - 5 = \frac{\gamma}{3} - 1 \\ \gamma - \hat{\eta}(\gamma) &= \gamma - \left(2 - \frac{\gamma}{3}\right) = \frac{4\gamma}{3} - 2 \end{aligned} \]
The profile log-likelihood is:
\[\ell_p(\gamma) = -\frac{1}{2}\left[\left(\frac{2\gamma}{3}\right)^2 + \left(\frac{\gamma}{3} - 1\right)^2 + \left(\frac{4\gamma}{3} - 2\right)^2\right]\]
Expand the squares:
\[ \begin{aligned} \left(\frac{2\gamma}{3}\right)^2 &= \frac{4\gamma^2}{9} \\ \left(\frac{\gamma}{3} - 1\right)^2 &= \frac{\gamma^2}{9} - \frac{2\gamma}{3} + 1 \\ \left(\frac{4\gamma}{3} - 2\right)^2 &= \frac{16\gamma^2}{9} - \frac{16\gamma}{3} + 4 \end{aligned} \]
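Sum them: the \(\gamma^2\) coefficients give \(\frac{4}{9} + \frac{1}{9} + \frac{16}{9} = \frac{21}{9} = \frac{7}{3}\), the \(\gamma\) coefficients give \(-\frac{2}{3} - \frac{16}{3} = -6\), and the constants give \(1 + 4 = 5\).
Therefore: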
\[\ell_p(\gamma) = -\frac{1}{2}\left(\frac{7\gamma^2}{3} - 6\gamma + 5\right)\]
Differentiate with respect to \(\gamma\):
\[\ell_p'(\gamma) = -\frac{1}{2}\left(\frac{14\gamma}{3} - 6\right) = -\frac{7\gamma}{3} + 3\]
Set to zero:
\[-\frac{7\gamma}{3} + 3 = 0 \quad\Rightarrow\quad \frac{7\gamma}{3} = 3 \quad\Rightarrow\quad \hat{\gamma}_p = \frac{9}{7} \approx 1.2857\]
The corresponding estimate of the nuisance parameter is:
\[\hat{\eta}(\hat{\gamma}_p) = 2 - \frac{9/7}{3} = 2 - \frac{9}{21} = 2 - \frac{3}{7} = \frac{11}{7} \approx 1.5714\]
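As a cross-check on the calculus, completing the square in \(\ell_p(\gamma)\) gives the maximizer and the maximum value directly. This rearrangement is an addition to the derivation above, not a step from it:

\[\ell_p(\gamma) = -\frac{1}{2}\left(\frac{7\gamma^2}{3} - 6\gamma + 5\right) = -\frac{7}{6}\left(\gamma - \frac{9}{7}\right)^2 - \frac{4}{7}\]

The squared term vanishes at \(\gamma = 9/7\), so \(\hat{\gamma}_p = 9/7\) and \(\ell_p(\hat{\gamma}_p) = -4/7\). This equals \(-\frac{1}{2}\) times the residual sum of squares at \((9/7, 11/7)\), which is \(\left(-\frac{6}{7}\right)^2 + \left(\frac{4}{7}\right)^2 + \left(\frac{2}{7}\right)^2 = \frac{8}{7}\).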
Let’s visualize the profile likelihood function.
# Define the profile log-likelihood function
profile_loglik <- function(gamma) {
  -0.5 * ((7 * gamma^2) / 3 - 6 * gamma + 5)
}
# Define the full log-likelihood for contour plot
full_loglik <- function(gamma, eta) {
  term1 <- (gamma + eta - 2)^2
  term2 <- (gamma + 2 * eta - 5)^2
  term3 <- (gamma - eta)^2
  -0.5 * (term1 + term2 + term3)
}
# Create data for profile likelihood
gamma_vals <- seq(-2, 5, length.out = 200)
profile_vals <- profile_loglik(gamma_vals)
# Create data for contour plot
gamma_grid <- seq(-2, 5, length.out = 100)
eta_grid <- seq(-2, 5, length.out = 100)
z <- outer(gamma_grid, eta_grid, Vectorize(full_loglik))
# Profile path: eta_hat(gamma) = 2 - gamma/3
eta_path <- 2 - gamma_vals/3
profile_path_vals <- sapply(gamma_vals, function(g) full_loglik(g, 2 - g/3))
# Maximum, taken from the analytic derivation above
gamma_hat <- 9/7
eta_hat <- 11/7
# Plot profile likelihood
par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))
# Left panel: Profile likelihood function
plot(gamma_vals, profile_vals, type = "l", lwd = 2,
     xlab = expression(gamma), ylab = expression(ell[p](gamma)),
     main = "Profile Log-Likelihood")
abline(v = gamma_hat, col = "red", lty = 2, lwd = 2)
abline(h = profile_loglik(gamma_hat), col = "red", lty = 2, lwd = 2)
points(gamma_hat, profile_loglik(gamma_hat), col = "red", pch = 19, cex = 1.5)
legend("topright", legend = c(expression(hat(gamma)[p] == 9/7)),
       col = "red", lty = 2, lwd = 2, bty = "n")
# Right panel: Contour plot of full likelihood with profile path
contour(gamma_grid, eta_grid, z, levels = pretty(z, 20),
        xlab = expression(gamma), ylab = expression(eta),
        main = "Full Likelihood with Profile Path")
# Add profile path
lines(gamma_vals, eta_path, col = "blue", lwd = 2)
# Add maximum point
points(gamma_hat, eta_hat, col = "red", pch = 19, cex = 1.5)
legend("topright", legend = c("Profile path", "MLE"),
       col = c("blue", "red"), lty = c(1, NA), pch = c(NA, 19),
       lwd = 2, bty = "n")
The profile path (blue line in the right panel) shows how \(\hat{\eta}(\gamma)\) changes with \(\gamma\). It is a straight line here because \(\hat{\eta}(\gamma) = 2 - \gamma/3\) is linear in \(\gamma\).
For each \(\gamma\), we move vertically (in \(\eta\)) to the maximum of the likelihood for that fixed \(\gamma\). The likelihood values along this path trace out the profile likelihood curve.
The profile MLE \(\hat{\gamma}_p = 9/7\) is the value of \(\gamma\) that maximizes \(\ell_p(\gamma)\).
The full MLE (red point in the contour plot) occurs at the same \(\gamma\): maximizing \(L(\gamma,\eta)\) jointly over both parameters is equivalent to first maximizing over \(\eta\) at each \(\gamma\) and then maximizing the result over \(\gamma\). In this case, the full MLE is \((\hat{\gamma}, \hat{\eta}) = (9/7, 11/7)\).
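The joint MLE can be confirmed without profiling. Because the errors are Gaussian with known unit variance, maximizing the likelihood is the same as least squares, so the normal equations give \((\hat{\gamma}, \hat{\eta})\) directly. The sketch below assumes a design matrix `X` (a name introduced here) with a column of ones for \(\gamma\) and the coefficients \(1, 2, -1\) for \(\eta\).

```r
y <- c(2, 5, 0)

# Design matrix: each mean is 1 * gamma + x_i * eta.
X <- cbind(gamma = 1, eta = c(1, 2, -1))

# Solve the normal equations (X'X) beta = X'y.
beta_hat <- solve(crossprod(X), crossprod(X, y))
print(beta_hat)

# Compare with the values obtained by profiling.
print(c(gamma = 9 / 7, eta = 11 / 7))
```

Both printouts should show the same pair of values, about 1.2857 and 1.5714.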
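Profiling reduces the dimension of the problem from two parameters to one. We can now construct confidence intervals for \(\gamma\) using the profile likelihood ratio statistic, which, evaluated at the true value of \(\gamma\), has approximately a \(\chi^2_1\) distribution in large samples (in this linear Gaussian model with known variance the distribution is exact):
\[ R(\gamma) = 2[\ell_p(\hat{\gamma}_p) - \ell_p(\gamma)] \sim \chi^2_1 \]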
Inverting this statistic gives a 95% confidence interval for \(\gamma\): the set of values with \(R(\gamma) \le \chi^2_{1,0.95} \approx 3.84\). The endpoints can be found numerically:
# Profile likelihood ratio statistic
profile_lr <- function(gamma) {
  2 * (profile_loglik(gamma_hat) - profile_loglik(gamma))
}
# 95% quantile of chi-square with 1 df (about 3.84)
chi2_95 <- qchisq(0.95, df = 1)
# The interval endpoints are the roots of R(gamma) - chi2_95
lr_minus_cutoff <- function(g) profile_lr(g) - chi2_95
# Lower bound: root to the left of the maximum
lower <- uniroot(lr_minus_cutoff, c(-2, gamma_hat))$root
# Upper bound: root to the right of the maximum
upper <- uniroot(lr_minus_cutoff, c(gamma_hat, 5))$root
# Plot profile likelihood with confidence interval
gamma_vals_ci <- seq(-1, 4, length.out = 500)
profile_vals_ci <- profile_loglik(gamma_vals_ci)
lr_vals <- profile_lr(gamma_vals_ci)
par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))
# Left: Profile log-likelihood with CI
plot(gamma_vals_ci, profile_vals_ci, type = "l", lwd = 2,
     xlab = expression(gamma), ylab = expression(ell[p](gamma)),
     main = "95% Confidence Interval for Gamma")
abline(v = gamma_hat, col = "red", lty = 2, lwd = 1)
abline(h = profile_loglik(gamma_hat) - chi2_95/2, col = "darkgreen", lty = 2)
abline(v = lower, col = "blue", lty = 3, lwd = 2)
abline(v = upper, col = "blue", lty = 3, lwd = 2)
legend("topright", legend = c(expression(hat(gamma)[p]), "95% CI"),
       col = c("red", "blue"), lty = c(2, 3), lwd = 2, bty = "n")
# Right: Profile likelihood ratio
plot(gamma_vals_ci, lr_vals, type = "l", lwd = 2,
     xlab = expression(gamma), ylab = expression(R(gamma)),
     main = "Profile Likelihood Ratio")
abline(h = chi2_95, col = "darkgreen", lty = 2, lwd = 2)
abline(v = lower, col = "blue", lty = 3, lwd = 2)
abline(v = upper, col = "blue", lty = 3, lwd = 2)
legend("topright", legend = c(expression(chi[1]^2(0.95) == 3.84), "95% CI"),
       col = c("darkgreen", "blue"), lty = c(2, 3), lwd = 2, bty = "n")
cat("95% Profile Likelihood Confidence Interval for γ:\n")
## 95% Profile Likelihood Confidence Interval for γ:
cat(sprintf("(%.4f, %.4f)\n", lower, upper))
## (0.0026, 2.5688)
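Because \(\ell_p(\gamma)\) is quadratic in this example, the interval can also be written in closed form, which gives a check on the root-finding. Expanding \(2[\ell_p(9/7) - \ell_p(\gamma)]\) gives \(R(\gamma) = \frac{7}{3}\left(\gamma - \frac{9}{7}\right)^2\), so \(R(\gamma) \le c\) is equivalent to \(|\gamma - 9/7| \le \sqrt{3c/7}\). The names `half_width` and `ci_closed` below are introduced only for this sketch.

```r
gamma_hat <- 9 / 7
chi2_95 <- qchisq(0.95, df = 1)

# Solve (7/3) * (gamma - 9/7)^2 = chi2_95 for gamma.
half_width <- sqrt(3 * chi2_95 / 7)
ci_closed <- c(lower = gamma_hat - half_width,
               upper = gamma_hat + half_width)
print(round(ci_closed, 4))
```

To four decimal places this should reproduce the interval reported above, (0.0026, 2.5688). The closed form exists only because the log-likelihood is exactly quadratic here; in general the endpoints have to be found numerically, as with `uniroot`.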
Profile likelihood is a powerful tool for inference when nuisance parameters are present.
For each fixed \(\gamma\), we maximize over \(\eta\), obtaining \(\hat{\eta}(\gamma)\).
The resulting \(\ell_p(\gamma)\) is treated as a likelihood for \(\gamma\) alone.
The profile MLE \(\hat{\gamma}_p\) equals the full MLE’s \(\gamma\) component (when the full MLE exists and is unique).
Confidence intervals can be constructed using the profile likelihood ratio statistic.
This example shows that \(\hat{\eta}(\gamma)\) is not constant: it varies with \(\gamma\), which is why we cannot simply plug in a single estimate of \(\eta\).