Putting down some thoughts about estimating the maximum of a set of means, possibly relevant to using uncertainty in likelihood estimates to calculate confidence intervals.

Simple example

Let \(Y_i\), \(i=1, ..., N\), be a set of observations where \(Y_i \sim N(\mu_i, \sigma^2)\). For now we assume a shared variance across \(i\), although that could be changed later. We are interested in estimating \(\phi = \max_i(\mu_i)\). What is the MLE for \(\phi\), and what is \(Var(\hat\phi)\)?
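
One possible starting point, sketched here rather than derived in full: since the MLE of each \(\mu_i\) is \(\hat\mu_i = Y_i\), the invariance property of maximum likelihood gives \[\hat\phi = \max_i(\hat\mu_i) = \max_i(Y_i).\] The variance is less straightforward, because the maximum of normals is not itself normal; for independent \(Y_i\) its distribution function is \[P(\hat\phi \le t) = \prod_{i=1}^{N} \Phi\!\left(\frac{t - \mu_i}{\sigma}\right),\] from which \(Var(\hat\phi)\) would have to be obtained (e.g., numerically).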

Multi-observation example

Let \(Y_{ij}\), \(i=1, ..., N_i\) and \(j=1, ..., N_j\) be observations on a continuous random variable. The data is generated by an underlying model specified as \(Y_{ij} \sim N(\mu_i, \sigma^2)\), assuming a shared variance across \(i\). The \(\mu_i\) are fixed but unknown values. We wish to estimate these values, and specifically, we are interested in estimating the maximum of the means, say, \(\phi = \max_i(\mu_i)\).

(In the context of our particle filtering example, \(N_i\) could be the number of grid-points for a second parameter, and \(N_j\) could be the number of filters at each grid-point.)

We propose two estimators for \(\phi\). First, we start with a “naive” estimator \(\hat \phi^{A} = \max_i(\bar Y_i)\), where \(\bar Y_i = \frac{1}{N_j}\sum_{j=1}^{N_j}Y_{ij}\). This estimator of \(\phi\) is simply the largest observed group mean within the data.
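
For a single dataset this is a one-liner; a minimal sketch in R (with y the vector of observations and g the grouping factor, both hypothetical names):

phi_hatA <- max(tapply(y, g, mean))  ## naive estimate: the largest observed group mean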
Second, we propose a “shrinkage” estimator \(\hat \phi^B\) that comes from a random effects model. Specifically, if we fit the model \[Y_{ij} = \alpha + \mu_i + \epsilon_{ij}\] with \(\mu_i \sim N(0, \tau^2)\) and \(\epsilon_{ij} \sim N(0, \sigma^2)\), then \(\hat\phi^B = \hat\alpha + \max_i(\hat\mu_i)\), where the \(\hat \mu_i\) are taken as the standard BLUP estimators for the random effects.
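
To see why “shrinkage” is an apt name, note that for this balanced one-way random effects model the BLUPs take the familiar form (stated here for intuition; the simulation below simply uses the values returned by lme4) \[\hat\mu_i = \frac{\hat\tau^2}{\hat\tau^2 + \hat\sigma^2/N_j}\,(\bar Y_i - \hat\alpha),\] so each raw group mean is pulled toward the overall mean \(\hat\alpha\) before the maximum is taken.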

We fix the following parameters and values for our model:

mu <- c(3, 2, 1)              ## group-specific means ==> phi = max(mu) = 3
sigma <- rep(10, length(mu))  ## within-group standard deviation (shared across groups)
Nj <- 5                       ## number of observations per group
nsim <- 1000                  ## number of simulation iterations

Then, we run a simulation to compare the estimators:

library(dplyr)
library(ggplot2)
library(lme4)
library(tidyr)
theme_set(theme_bw())
set.seed(189)

## data structure
x <- expand.grid(ni=1:length(mu), nj=1:Nj)
x$ni <- factor(x$ni)

sim_data <- data.frame(
    id = 1:nsim,
    phi_hatA = NA,
    phi_hatB = NA,
    phi_hatA_sd = NA
)
for(i in 1:nsim) {
    ## generate the data
    x$y <- rnorm(nrow(x), mu[x$ni], sigma)
    y_sums <- group_by(x, ni) %>% 
        summarize(
            meany = mean(y),
            sdy = sd(y)
            )
    sim_data[i,"phi_hatA"] <- max(y_sums$meany)
    sim_data[i,"phi_hatA_sd"] <- y_sums$sdy[which.max(y_sums$meany)]
    lme_fit <- lmer(y ~ (1|ni), data=x)
    sim_data[i,"phi_hatB"] <- max(unlist(ranef(lme_fit))) + fixef(lme_fit)
}    
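
As a quick numeric check on the simulations (a sketch, not part of the original loop), we can compare the average of each estimator to the true \(\phi\):

## average bias of each estimator relative to the true phi = max(mu)
colMeans(sim_data[, c("phi_hatA", "phi_hatB")]) - max(mu)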

The following boxplot shows the estimates for each estimator. The horizontal red dashed line is the true value of \(\phi\). The shrinkage estimator looks virtually unbiased.

sim_data_long <- gather(sim_data, key=param, value = phi, c(-id, -phi_hatA_sd))
qplot(x=param, y=phi, data=sim_data_long, geom="boxplot") + 
    geom_hline(yintercept=max(mu), color="red", linetype=2)

The following plot shows an example of one simulated dataset. The x-axis indicates the group number and the black points are simulated observations. The red triangles are the true underlying \(\mu_i\) and the red dashed line is the true maximum \(\mu_i\). The blue crosses are the raw group means used to estimate \(\hat\phi^A\). The green x’s indicate the group means calculated by the random effects model used to estimate \(\hat\phi^B\). Note that in this example the shrinkage estimate is closer to the true maximum than the naive estimate.

In the context of particle filtering, the black points represent estimates of the likelihood from particle filters at a single value of the parameter of interest and across length(mu) values of a second parameter.

ggplot() +
    geom_hline(yintercept=max(mu), color="red", linetype=2) +
    geom_point(aes(y=y, x=factor(ni)), data=x, alpha=I(.5)) +
    geom_point(aes(x=factor(seq_along(mu)), y=mu), shape=2, color="red") +
    geom_point(aes(x=factor(ni), y=meany), data=y_sums, shape=3, color="blue") +
    geom_point(aes(x=factor(ni), y=unlist(ranef(lme_fit)) + fixef(lme_fit)), data=y_sums, shape=4, color="green")

Open questions