Putting down some thoughts about estimating the maximum of a set of means, possibly relevant to using uncertainty in likelihood estimates to calculate confidence intervals.

Simple example

Let \(Y_i\), \(i=1, ..., N\), be a set of observations where \(Y_i \sim N(\mu_i, \sigma^2)\). For now we assume a shared variance across \(i\), although that could be changed later. We are interested in estimating \(\phi = \max_i(\mu_i)\). What is the MLE for \(\phi\), and what is \(Var(\hat\phi)\)?
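
One possible starting point, sketched here rather than derived in full: since the MLE of each \(\mu_i\) is \(\hat\mu_i = Y_i\), the invariance property of maximum likelihood gives \[\hat\phi = \max_i(\hat\mu_i) = \max_i(Y_i).\] The variance is less straightforward, because the maximum of normals is not itself normal; for independent \(Y_i\) its distribution function is \[P(\hat\phi \le t) = \prod_{i=1}^{N} \Phi\!\left(\frac{t - \mu_i}{\sigma}\right),\] from which \(Var(\hat\phi)\) would have to be obtained (e.g., numerically).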

Multi-observation example

Let \(Y_{ij}\), \(i=1, ..., N_i\) and \(j=1, ..., N_j\) be observations on a continuous random variable. The data is generated by an underlying model specified as \(Y_{ij} \sim N(\mu_i, \sigma^2)\), assuming a shared variance across \(i\). The \(\mu_i\) are fixed but unknown values. We wish to estimate these values, and specifically, we are interested in estimating the maximum of the means, say, \(\phi = \max_i(\mu_i)\).

(In the context of our particle filtering example, \(N_i\) could be the number of grid-points for a second parameter, and \(N_j\) could be the number of filters at each grid-point.)

We propose two estimators for \(\phi\). First, we start with a “naive” estimator \(\hat \phi^{A} = \max_i(\bar Y_i)\), where \(\bar Y_i = \frac{1}{N_j}\sum_{j=1}^{N_j}Y_{ij}\). This estimator of \(\phi\) is simply the largest observed group mean within the data.
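
For a single dataset this is a one-liner; a minimal sketch in R (with y the vector of observations and g the grouping factor, both hypothetical names):

phi_hatA <- max(tapply(y, g, mean))  ## naive estimate: the largest observed group mean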
Second, we propose a “shrinkage” estimator \(\hat \phi^B\) that comes from a random effects model. Specifically, if we fit the model \[Y_{ij} = \alpha + \mu_i + \epsilon_{ij}\] with \(\mu_i \sim N(0, \tau^2)\) and \(\epsilon_{ij} \sim N(0, \sigma^2)\), then \(\hat\phi^B = \hat\alpha + \max_i(\hat\mu_i)\), where the \(\hat \mu_i\) are taken as the standard BLUP estimators for the random effects.
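
To see why “shrinkage” is an apt name, note that for this balanced one-way random effects model the BLUPs take the familiar form (stated here for intuition; the simulation below simply uses the values returned by lme4) \[\hat\mu_i = \frac{\hat\tau^2}{\hat\tau^2 + \hat\sigma^2/N_j}\,(\bar Y_i - \hat\alpha),\] so each raw group mean is pulled toward the overall mean \(\hat\alpha\) before the maximum is taken.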

We fix the following parameters and values for our model:

mu <- c(3, 2, 1)              ## group-specific means ==> phi = max(mu) = 3
sigma <- rep(10, length(mu))  ## within-group standard deviation (shared across groups)
Nj <- 5                       ## number of observations per group
nsim <- 1000                  ## number of simulation iterations

Then, we run a simulation to compare the estimators:

library(dplyr)
library(ggplot2)
library(lme4)
library(tidyr)
theme_set(theme_bw())
set.seed(189)

## data structure
x <- expand.grid(ni=1:length(mu), nj=1:Nj)
x$ni <- factor(x$ni)

sim_data <- data.frame(
    id = 1:nsim,
    phi_hatA = NA,
    phi_hatB = NA,
    phi_hatA_sd = NA
)
for(i in 1:nsim) {
    ## generate the data
    x$y <- rnorm(nrow(x), mu[x$ni], sigma)
    y_sums <- group_by(x, ni) %>% 
        summarize(
            meany = mean(y),
            sdy = sd(y)
            )
    sim_data[i,"phi_hatA"] <- max(y_sums$meany)
    sim_data[i,"phi_hatA_sd"] <- y_sums$sdy[which.max(y_sums$meany)]
    lme_fit <- lmer(y ~ (1|ni), data=x)
    sim_data[i,"phi_hatB"] <- max(unlist(ranef(lme_fit))) + fixef(lme_fit)
}    
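
As a quick numeric check on the simulations (a sketch, not part of the original loop), we can compare the average of each estimator to the true \(\phi\):

## average bias of each estimator relative to the true phi = max(mu)
colMeans(sim_data[, c("phi_hatA", "phi_hatB")]) - max(mu)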

The following boxplot shows the estimates for each estimator. The horizontal red dashed line is the true value of \(\phi\). The shrinkage estimator looks virtually unbiased.

sim_data_long <- gather(sim_data, key=param, value = phi, c(-id, -phi_hatA_sd))
qplot(x=param, y=phi, data=sim_data_long, geom="boxplot") + 
    geom_hline(yintercept=max(mu), color="red", linetype=2)

The following plot shows an example of one simulated dataset. The x-axis indicates the group number and the black points are simulated observations. The red triangles are the true underlying \(\mu_i\) and the red dashed line is the true maximum \(\mu_i\). The blue crosses are the raw group means used to estimate \(\hat\phi^A\). The green x’s indicate the group means calculated by the random effects model used to estimate \(\hat\phi^B\). Note that in this example the shrinkage estimate is closer to the true maximum than the naive estimate.

In the context of particle filtering, the black points represent estimates of the likelihood from particle filters at a single value of the parameter of interest and across length(mu) values of a second parameter.

ggplot() +
    geom_hline(yintercept=max(mu), color="red", linetype=2) +
    geom_point(aes(y=y, x=factor(ni)), data=x, alpha=I(.5)) +
    geom_point(aes(x=factor(seq_along(mu)), y=mu), shape=2, color="red") +
    geom_point(aes(x=factor(ni), y=meany), data=y_sums, shape=3, color="blue") +
    geom_point(aes(x=factor(ni), y=unlist(ranef(lme_fit)) + fixef(lme_fit)), data=y_sums, shape=4, color="green")

Open questions