MCLE Adjustment After Positive Phase 2 Selection

In this note, we show that

The maximum conditional likelihood estimator (MCLE) can be derived from the sufficient statistics alone
Instead of solving a two-parameter system, the MCLE can be obtained from a one-dimensional equation
The MCLE of \(\mu\) approaches \(-\infty\) when the observed phase 2 mean barely exceeds the go/no-go threshold
- We give an approximation to \(\widehat \mu\) for that case
A fast simulation method exists
The MCLE is more likely to be ill-posed when the true effect is small
The MCLE is applicable in practice when a go decision is made, even if the true effect is unknown

1. Problem setup

Here we consider the one-sample problem. The result can be applied to the two-sample problem, as in a two-arm randomized trial, by replacing \(n\) with \(n/2\), where \(n\) then denotes the per-arm sample size.
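To see where the substitution comes from, consider the following sketch. It assumes \(n\) patients per arm, a common variance \(\sigma^2\), and arm means \(\bar X_T\) and \(\bar X_C\) with expectations \(\mu_T\) and \(\mu_C\) (notation introduced here for illustration only):

\[ \bar X_T-\bar X_C \sim N\left(\mu_T-\mu_C,\ \frac{2\sigma^2}{n}\right) = N\left(\mu_T-\mu_C,\ \frac{\sigma^2}{n/2}\right). \]

The difference in means therefore behaves like a one-sample mean based on \(n/2\) observations. This covers only the distribution of the mean. A pooled two-sample variance estimate would carry \(2(n-1)\) degrees of freedom, so the variance part of the one-sample formulas is at best an approximation in that setting.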

Suppose

\[ X_1,\ldots,X_n \overset{iid}{\sim} N(\mu,\sigma^2). \]

Only the summary statistics are available:

\[ Y=\bar X,\qquad S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2. \]

Let the observed values be

\[ y=\bar x,\qquad s^2=S^2_{\text{obs}}. \]

The phase 2 trial moves to phase 3 only if

\[ Y>c, \]

where \(c\) is the go/no-go threshold. The objective is to estimate \((\mu,\sigma^2)\) conditional on the selection event

\[ A=\{Y>c\}. \]

Assume throughout that

\[ y>c,\qquad s^2>0,\qquad n>1. \]

2. Conditional likelihood based on \((\bar X,S^2)\)

Under the normal model,

\[ Y\sim N\left(\mu,\frac{\sigma^2}{n}\right), \qquad \frac{(n-1)S^2}{\sigma^2}\sim \chi^2_{n-1}, \]

and \(Y\) and \(S^2\) are independent.

Because the selection event \(A=\{Y>c\}\) depends only on \(Y\), the conditional likelihood based on \((y,s^2)\) is

\[ L_c(\mu,\sigma\mid y,s^2,Y>c) = \frac{ f_Y(y\mid \mu,\sigma) f_{S^2}(s^2\mid \sigma) }{ P_{\mu,\sigma}(Y>c) }. \]

Define

\[ a=\frac{\sqrt n(c-\mu)}{\sigma}. \]

Then

\[ P_{\mu,\sigma}(Y>c)=1-\Phi(a). \]

Ignoring constants not involving \((\mu,\sigma)\),

\[ L_c(\mu,\sigma) \propto \frac{ \sigma^{-n} \exp\left[ -\frac{(n-1)s^2+n(y-\mu)^2}{2\sigma^2} \right] }{ 1-\Phi(a) }. \]

Equivalently,

\[ \ell_c(\mu,\sigma) = -n\log\sigma - \frac{(n-1)s^2+n(y-\mu)^2}{2\sigma^2} - \log\{1-\Phi(a)\} + \text{constant}. \]
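The log-likelihood above can be evaluated directly from \((y,s^2)\). The following is a minimal Python sketch using only the standard library; the function names `log_upper_tail` and `cond_loglik` are hypothetical and the additive constant is dropped.

```python
import math

def log_upper_tail(a: float) -> float:
    """log{1 - Phi(a)}, computed through erfc for accuracy in the upper tail.

    Note: erfc underflows to 0 for a above roughly 37, where this sketch fails.
    """
    return math.log(0.5 * math.erfc(a / math.sqrt(2.0)))

def cond_loglik(mu: float, sigma: float, y: float, s2: float, n: int, c: float) -> float:
    """Conditional log-likelihood l_c(mu, sigma), up to an additive constant."""
    a = math.sqrt(n) * (c - mu) / sigma
    quad = (n - 1) * s2 + n * (y - mu) ** 2
    return -n * math.log(sigma) - quad / (2.0 * sigma ** 2) - log_upper_tail(a)
```

Because \(-\log\{1-\Phi(a)\}\) is positive, the conditional log-likelihood always lies above the unconditional one at the same \((\mu,\sigma)\), and the gap grows as \(\mu\) decreases.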

3. Fast simulation of summary statistics conditional on \(A=\{Y>c\}\)

To simulate selected phase 2 trials efficiently, there is no need to generate individual observations \(X_1,\ldots,X_n\). It is enough to simulate the sufficient statistics \((Y,S^2)\) directly.

Under the normal model,

\[ Y\sim N\left(\mu,\frac{\sigma^2}{n}\right), \qquad \frac{(n-1)S^2}{\sigma^2}\sim \chi^2_{n-1}, \]

and \(Y\) is independent of \(S^2\). Since the selection event

\[ A=\{Y>c\} \]

depends only on \(Y\), conditioning on \(A\) only changes the distribution of \(Y\). The conditional distribution of \(S^2\) remains unchanged.

Therefore,

\[ \boxed{ Y\mid A \sim N\left(\mu,\frac{\sigma^2}{n}\right) \text{ truncated below at } c } \tag{1} \]

and

\[ \boxed{ \frac{(n-1)S^2}{\sigma^2}\mid A \sim \chi^2_{n-1}. } \tag{2} \]

Moreover,

\[ \boxed{ Y\mid A \quad\text{and}\quad S^2\mid A \text{ are independent.} } \tag{3} \]

Thus, selected summary statistics can be simulated as follows:

Simulate

\[ Y \sim N\left(\mu,\frac{\sigma^2}{n}\right) \]

conditional on \(Y>c\).

Independently simulate

\[ W\sim \chi^2_{n-1} \]

and set

\[ S^2=\frac{\sigma^2 W}{n-1}. \]

Equivalently, let

\[ \alpha=\frac{\sqrt n(c-\mu)}{\sigma}. \]

If \(U\sim \mathrm{Uniform}(\Phi(\alpha),1)\), then

\[ \boxed{ Y = \mu+\frac{\sigma}{\sqrt n}\Phi^{-1}(U) } \tag{4} \]

has the desired conditional distribution \(Y\mid Y>c\). Independently,

\[ \boxed{ S^2 = \frac{\sigma^2}{n-1}\chi^2_{n-1}. } \tag{5} \]

This directly simulates \((Y,S^2)\mid Y>c\). It avoids rejection sampling and is much faster when \(P(Y>c)\) is small.
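The recipe in (4) and (5) can be sketched in Python with the standard library alone. The function name `simulate_selected` is hypothetical, and the chi-square draw uses the fact that \(\chi^2_{n-1}\) is a gamma distribution with shape \((n-1)/2\) and scale 2.

```python
import math
import random
from statistics import NormalDist

def simulate_selected(mu, sigma, n, c, rng):
    """Draw one (Y, S^2) pair conditional on Y > c, following (4) and (5)."""
    std_normal = NormalDist()
    alpha = math.sqrt(n) * (c - mu) / sigma
    lower = std_normal.cdf(alpha)
    # U ~ Uniform(Phi(alpha), 1); clamp just below 1 so inv_cdf stays defined.
    u = min(lower + (1.0 - lower) * rng.random(), 1.0 - 2.0 ** -53)
    y = mu + sigma / math.sqrt(n) * std_normal.inv_cdf(u)
    # chi^2 with n-1 degrees of freedom is Gamma(shape=(n-1)/2, scale=2).
    w = rng.gammavariate((n - 1) / 2.0, 2.0)
    s2 = sigma ** 2 * w / (n - 1)
    return y, s2

# Example: 1000 selected trials under mu = 0 with the settings n = 25, c = 0.33.
rng = random.Random(1)
sample = [simulate_selected(0.0, 1.0, 25, 0.33, rng) for _ in range(1000)]
```

When \(\Phi(\alpha)\) is extremely close to 1, the inverse-CDF step loses precision in double arithmetic, so a tail-specific sampler may be preferable in that regime.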

Using the estimation method derived in Sections 5 and 6 below, we simulate the following scenarios to assess the performance of the MCLE:

\(n = 25\), equivalent to 50 patients per arm in the two-sample problem
\(\sigma^2 = 1\)
\(c = 0.33\)
\(\mu = 0, 0.05, 0.1, \ldots, 0.95, 1.0\)
1000 replicates for each \(\mu\)
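To illustrate why direct simulation matters for these scenarios, the sketch below computes the go probability \(P(Y>c)=1-\Phi(\sqrt n(c-\mu)/\sigma)\) on the same grid. The names `go_probability` and `go_probs` are hypothetical. At \(\mu=0\) the probability is \(1-\Phi(1.65)\), about 5%, so rejection sampling would discard roughly 95% of the generated trials there.

```python
import math
from statistics import NormalDist

n, sigma, c = 25, 1.0, 0.33
mus = [round(0.05 * k, 2) for k in range(21)]  # 0, 0.05, ..., 1.0

def go_probability(mu, sigma, n, c):
    """P(Y > c) = 1 - Phi(sqrt(n) (c - mu) / sigma)."""
    a = math.sqrt(n) * (c - mu) / sigma
    return 1.0 - NormalDist().cdf(a)

# Go probability for each simulated true effect.
go_probs = {mu: go_probability(mu, sigma, n, c) for mu in mus}
```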

The results are summarized below.

A few observations:

Left: An MCLE is called ill-posed when the MCLE of \(\mu\) is below \(-10\). The proportion of ill-posed estimates increases sharply as the true effect moves away from the go threshold \(c = 0.33\) toward the null. Intuitively, the closer the true effect is to the null, the more likely it is that the observed effect \(Y\) lies just above the threshold, given that a go decision is made, and values of \(Y\) just above \(c\) are what produce an ill-posed MCLE.
- The percentage of ill-posed estimates is below 50% in every simulated scenario, so the median bias of the MCLE is not affected by them.
Middle: As expected, the observed effect is highly biased when the true effect is close to the null.
Right: Several conclusions can be drawn for the MCLE.
- The MCLE tends to underestimate the effect.
- The MCLE is less biased when the true effect is larger.
- The bias of the MCLE is small when the true effect exceeds the go threshold: the relative bias is at most about 14% in magnitude (\(-0.05/0.35 \approx -14\%\)). This makes the MCLE usable when planning confirmatory trials.
- The MCLE is highly biased when the true effect is close to the null. In practice this is acceptable, because a “go” decision in that case is a false positive.

The plot of the percentage of ill-posed MCLE reflects an interesting observation that we will discuss further in section 4.

4. Conditional Density Ratio of \(Y\) After Selection

Recall that

\[ Y\sim N\left(\mu,\frac{\sigma^2}{n}\right). \]

The phase 2 trial is selected only if

\[ Y>c. \]

For a fixed value of \(\mu\), the conditional density of \(Y\) given \(Y>c\) is

\[ f_\mu(y\mid Y>c) = \frac{ \frac{\sqrt n}{\sigma} \phi\left(\frac{\sqrt n(y-\mu)}{\sigma}\right) }{ 1-\Phi\left(\frac{\sqrt n(c-\mu)}{\sigma}\right) }, \qquad y>c. \]

We compare the conditional density under \(\mu=0\) with that under \(\mu>0\), while keeping \(n\), \(\sigma\), and \(c\) fixed.

Main conclusion

Define the density ratio

\[ R(y) = \frac{ f_0(y\mid Y>c) }{ f_\mu(y\mid Y>c) }, \qquad y>c,\quad \mu>0. \]

Then \(R(y)\) is strictly decreasing in \(y\). Moreover, \(R(c)>1\) and \(R(y)\to 0\) as \(y\to\infty\). Therefore, there exists a unique crossing point \(y^\star>c\) such that

\[ f_0(y\mid Y>c)>f_\mu(y\mid Y>c), \qquad c<y<y^\star, \]

while

\[ f_0(y\mid Y>c)<f_\mu(y\mid Y>c), \qquad y>y^\star. \]

In words, after conditioning on \(Y>c\), the model with \(\mu=0\) puts relatively more mass near the threshold \(c\), while the model with \(\mu>0\) puts relatively more mass farther into the right tail. This matches the pattern in the left panel of the plot in Section 3, where ill-posed estimates are most frequent for small true effects.

Proof

From the conditional density formula,

\[ R(y) = \frac{ 1-\Phi\left(\frac{\sqrt n(c-\mu)}{\sigma}\right) }{ 1-\Phi\left(\frac{\sqrt n c}{\sigma}\right) } \exp\left\{ -\frac{n\mu y}{\sigma^2} + \frac{n\mu^2}{2\sigma^2} \right\}. \]

Hence

\[ \log R(y) = \text{constant} - \frac{n\mu}{\sigma^2}y. \]

Since \(\mu>0\),

\[ \frac{d}{dy}\log R(y) = -\frac{n\mu}{\sigma^2} <0. \]

Therefore, \(R(y)\) is strictly decreasing in \(y\).

Next, evaluate the ratio at \(y=c\). We have

\[ \log R(c) = \log\left\{1-\Phi\left(\frac{\sqrt n(c-\mu)}{\sigma}\right)\right\} - \log\left\{1-\Phi\left(\frac{\sqrt n c}{\sigma}\right)\right\} - \frac{n\mu c}{\sigma^2} + \frac{n\mu^2}{2\sigma^2}. \]

Using

\[ \lambda(a)=\frac{\phi(a)}{1-\Phi(a)}, \]

and

\[ \frac{d}{d\mu} \log\left\{ 1-\Phi\left(\frac{\sqrt n(c-\mu)}{\sigma}\right) \right\} = \frac{\sqrt n}{\sigma} \lambda\left(\frac{\sqrt n(c-\mu)}{\sigma}\right), \]

we can write

\[ \begin{aligned} \log R(c) &= \int_0^\mu \frac{\sqrt n}{\sigma} \lambda\left(\frac{\sqrt n(c-u)}{\sigma}\right) \,du - \int_0^\mu \frac{n(c-u)}{\sigma^2} \,du \\ &= \int_0^\mu \frac{\sqrt n}{\sigma} \left[ \lambda\left(\frac{\sqrt n(c-u)}{\sigma}\right) - \frac{\sqrt n(c-u)}{\sigma} \right] \,du. \end{aligned} \]

For the standard normal inverse Mills ratio,

\[ \lambda(a)>a \qquad \text{for all } a\in\mathbb R. \]
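One standard way to see this, sketched here for completeness: for \(a\le 0\) the inequality is immediate because \(\lambda(a)>0\ge a\). For \(a>0\),

\[ 1-\Phi(a)=\int_a^\infty \phi(t)\,dt < \int_a^\infty \frac{t}{a}\,\phi(t)\,dt = \frac{\phi(a)}{a}, \]

using \(t/a>1\) on \((a,\infty)\) and \(\int_a^\infty t\,\phi(t)\,dt=\phi(a)\). Rearranging gives \(\lambda(a)=\phi(a)/\{1-\Phi(a)\}>a\).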

Thus the integrand is positive for every \(u\in[0,\mu]\). Therefore,

\[ \log R(c)>0, \]

which implies

\[ R(c)>1. \]

Finally, since

\[ \log R(y) = \text{constant} - \frac{n\mu}{\sigma^2}y, \]

we have

\[ R(y)\to 0 \qquad \text{as } y\to\infty. \]

Since \(R(y)\) is continuous and strictly decreasing, with \(R(c)>1\) and \(R(y)\to 0\), there is a unique crossing point \(y^\star>c\).
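Because \(\log R(y)\) is linear in \(y\), the crossing point can be written in closed form by setting \(\log R(y^\star)=0\) in the expression for \(R(y)\): \(y^\star=\mu/2+\frac{\sigma^2}{n\mu}\log\frac{1-\Phi(\sqrt n(c-\mu)/\sigma)}{1-\Phi(\sqrt n c/\sigma)}\). This closed form is derived here rather than stated in the note. The Python sketch below evaluates it; the function names are hypothetical.

```python
import math

def upper_tail(a):
    """1 - Phi(a), computed through erfc."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def log_ratio_const(mu, sigma, n, c):
    """Log of the ratio of the two selection probabilities appearing in R(y)."""
    a_mu = math.sqrt(n) * (c - mu) / sigma
    a_0 = math.sqrt(n) * c / sigma
    return math.log(upper_tail(a_mu)) - math.log(upper_tail(a_0))

def log_ratio(y, mu, sigma, n, c):
    """log R(y) = log f_0(y | Y > c) - log f_mu(y | Y > c), for y > c and mu > 0."""
    const = log_ratio_const(mu, sigma, n, c)
    return const - n * mu * y / sigma ** 2 + n * mu ** 2 / (2.0 * sigma ** 2)

def crossing_point(mu, sigma, n, c):
    """Solve log R(y*) = 0 for y*, using linearity of log R in y."""
    return mu / 2.0 + sigma ** 2 * log_ratio_const(mu, sigma, n, c) / (n * mu)

# Example with n = 25, sigma = 1, c = 0.33 and an illustrative mu = 0.5.
y_star = crossing_point(0.5, 1.0, 25, 0.33)
```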

Interpretation

If \(\mu=0\), passing the threshold \(Y>c\) is relatively surprising. Conditional on this event, \(Y\) is more likely to be just above \(c\), which results in a higher chance of an ill-posed MCLE.

If \(\mu>0\), passing the threshold is less surprising, and the conditional distribution puts relatively more mass farther above \(c\).

This provides a useful intuition for selection bias adjustment: a phase 2 result that barely exceeds the go/no-go threshold is more consistent with a smaller true effect combined with selection-induced upward fluctuation than with a genuinely larger true effect.

5. Score equations

We now show how to obtain the MCLE from the sufficient statistics \((Y, S^2)\). Let

\[ \lambda(a)=\frac{\phi(a)}{1-\Phi(a)} \]

be the inverse Mills ratio.

Setting the scores with respect to \(\mu\) and \(\sigma\) to zero gives

\[ \boxed{ \frac{\sqrt n(y-\mu)}{\sigma} = \lambda(a) } \tag{6} \]

and

\[ \boxed{ \sigma^2 = \frac{(n-1)s^2}{n+a\lambda(a)-\lambda(a)^2} } \tag{7} \]

Therefore, once \(a\) is known, \(\mu\) and \(\sigma^2\) can be recovered directly.
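The intermediate steps can be sketched as follows, using only the log-likelihood \(\ell_c\) from Section 2. Since \(\partial a/\partial\mu=-\sqrt n/\sigma\), \(\partial a/\partial\sigma=-a/\sigma\), and \(\frac{d}{da}\log\{1-\Phi(a)\}=-\lambda(a)\),

\[ \frac{\partial \ell_c}{\partial\mu} = \frac{n(y-\mu)}{\sigma^2}-\frac{\sqrt n}{\sigma}\lambda(a), \qquad \frac{\partial \ell_c}{\partial\sigma} = -\frac{n}{\sigma}+\frac{(n-1)s^2+n(y-\mu)^2}{\sigma^3}-\frac{a\lambda(a)}{\sigma}. \]

Setting the first derivative to zero gives (6). Setting the second to zero, multiplying by \(\sigma\), and substituting \(n(y-\mu)^2/\sigma^2=\lambda(a)^2\) from (6) gives

\[ \frac{(n-1)s^2}{\sigma^2} = n+a\lambda(a)-\lambda(a)^2, \]

which rearranges to (7).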

6. One-dimensional equation for \(a\)

Equations (6) and (7) involve the two unknown parameters \((\mu, \sigma^2)\). Here we show that the MCLE can instead be obtained from a one-dimensional equation in \(a\).

From

\[ a=\frac{\sqrt n(c-\mu)}{\sigma} \]

and the first score equation,

\[ \lambda(a)=\frac{\sqrt n(y-\mu)}{\sigma}, \]

subtracting gives

\[ \lambda(a)-a = \frac{\sqrt n(y-c)}{\sigma}. \]

Thus

\[ \sigma^2 = \frac{n(y-c)^2}{\{\lambda(a)-a\}^2}. \]

Combining this with the score-based expression for \(\sigma^2\) yields the one-dimensional equation

\[ \boxed{ n(y-c)^2\{n+a\lambda(a)-\lambda(a)^2\} - (n-1)s^2\{\lambda(a)-a\}^2 = 0 } \tag{8} \]

Let \(\widehat{a}\) be the solution. Then the conditional MLE of \(\sigma^2\) is

\[ \boxed{ \widehat{\sigma}^2 = \frac{(n-1)s^2}{n+\widehat{a}\lambda(\widehat{a})-\lambda(\widehat{a})^2} } \tag{9} \]

Equivalently,

\[ \boxed{ \widehat{\sigma}^2 = \frac{n(y-c)^2}{\{\lambda(\widehat{a})-\widehat{a}\}^2} } \tag{10} \]

The corresponding estimator of \(\mu\) is

\[ \boxed{ \widehat{\mu} = c-\frac{\widehat{a}\widehat{\sigma}}{\sqrt n} } \tag{11} \]

Equivalently,

\[ \boxed{ \widehat{\mu} = y-\frac{\widehat{\sigma}}{\sqrt n}\lambda(\widehat{a}) } \tag{12} \]
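A minimal Python sketch of this procedure is given below. It is not the note's own implementation: it solves a rescaled form of (8), obtained by dividing by \(n(y-c)^2\) and writing \(m(a)=\lambda(a)-a\), namely \((r+1)m(a)^2+a\,m(a)-n=0\) with \(r=(n-1)s^2/\{n(y-c)^2\}\) (the form used in Section 7). It uses plain bisection and assumes a single sign change inside the bracket, which is not proved here. The names `mills_excess` and `solve_mcle` are hypothetical.

```python
import math

def mills_excess(a):
    """m(a) = lambda(a) - a, where lambda is the inverse Mills ratio."""
    if a > 30.0:
        # Asymptotic series for large a; avoids underflow of 1 - Phi(a).
        return 1.0 / a - 2.0 / a ** 3 + 10.0 / a ** 5 - 74.0 / a ** 7
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(a / math.sqrt(2.0))
    return phi / tail - a

def solve_mcle(y, s2, n, c, tol=1e-12):
    """Return (a_hat, mu_hat, sigma2_hat) from the one-dimensional equation."""
    if not (y > c and s2 > 0 and n > 1):
        raise ValueError("requires y > c, s2 > 0 and n > 1")
    r = (n - 1) * s2 / (n * (y - c) ** 2)

    def g(a):
        m = mills_excess(a)
        return (r + 1.0) * m * m + a * m - n

    # g is positive for very negative a and tends to 1 - n < 0 for large a.
    lo, hi = -1.0, 1.0
    while g(lo) <= 0.0:
        lo *= 2.0
    while g(hi) >= 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * max(1.0, abs(mid)):
            break
    a_hat = 0.5 * (lo + hi)
    sigma_hat = math.sqrt(n) * (y - c) / mills_excess(a_hat)  # equation (10)
    mu_hat = c - a_hat * sigma_hat / math.sqrt(n)             # equation (11)
    return a_hat, mu_hat, sigma_hat ** 2

# Example with illustrative observed values y = 0.6 and s^2 = 1.
a_hat, mu_hat, sigma2_hat = solve_mcle(0.6, 1.0, 25, 0.33)
```

Bisection is chosen for robustness rather than speed; a Newton-type update on the same function would converge faster but needs a safeguard when \(\widehat a\) is large.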

7. Behavior as \(y\to c^+\)

Now consider the case where the observed phase 2 mean barely exceeds the go/no-go threshold:

\[ y-c\downarrow 0. \]

Then

\[ r= \frac{(n-1)s^2}{n(y-c)^2} \to\infty. \]

Let

\[ m(a)=\lambda(a)-a. \]

Substituting \(\lambda(a)=a+m(a)\) into (8) and dividing by \(n(y-c)^2\) shows that the one-dimensional equation is equivalent to

\[ \boxed{ (r+1)m(a)^2+a m(a)-n=0 } \tag{16} \]

This form makes the boundary behavior transparent.

First, \(\widehat{a}\) cannot remain bounded as \(r\to\infty\). If \(a\) were bounded, then \(m(a)=\lambda(a)-a\) would be bounded away from zero, so \((r+1)m(a)^2\) would diverge while \(a\,m(a)-n\) stayed bounded, and the equation could not hold. Therefore, any solution must diverge.

It cannot diverge to \(-\infty\), because when \(a\to-\infty\),

\[ \lambda(a)\to 0, \qquad m(a)=\lambda(a)-a\to\infty, \]

so that \((r+1)m(a)^2+a\,m(a)\approx r a^2\) also diverges and the equation cannot hold. Hence the relevant solution satisfies

\[ \boxed{ \widehat{a}\to+\infty } \tag{17} \]

For large positive \(a\),

\[ \lambda(a)-a = \frac{1}{a}+O(a^{-3}). \]

Therefore,

\[ m(a)\sim \frac{1}{a}, \qquad a m(a)\to 1. \]

Substituting this into the equation above gives

\[ \frac{r+1}{a^2}+1-n\approx 0. \]

Thus

\[ \widehat{a}^2 \sim \frac{r}{n-1}. \]

Using the definition of \(r\),

\[ \boxed{ \widehat{a} \sim \frac{s}{\sqrt n(y-c)} } \tag{18} \]

Now,

\[ \widehat{\sigma} = \frac{\sqrt n(y-c)}{\lambda(\widehat{a})-\widehat{a}}. \]

Since

\[ \lambda(\widehat{a})-\widehat{a} \sim \frac{1}{\widehat{a}}, \]

we obtain

\[ \widehat{\sigma} \sim \sqrt n(y-c)\widehat{a} \sim s. \]

Therefore,

\[ \boxed{ \widehat{\sigma}^2\to s^2 } \tag{19} \]

Finally,

\[ \widehat{\mu} = c-\frac{\widehat{a}\widehat{\sigma}}{\sqrt n}. \]

Using the asymptotic expression for \(\widehat{a}\) and \(\widehat{\sigma}\to s\), we obtain

\[ \boxed{ \widehat{\mu} \sim c-\frac{s^2}{n(y-c)} } \tag{20} \]

Hence,

\[ \boxed{ \widehat{\mu}\to-\infty \qquad \text{as } y\to c^+ } \tag{21} \]

In words, when the observed phase 2 mean barely passes the threshold, the conditional MLE attributes most of the apparent success to the conditioning event \(Y>c\). The estimate of \(\sigma^2\) remains close to the observed sample variance, but the estimate of \(\mu\) is pulled sharply downward and has no finite limit as \(y\downarrow c\).
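As a hedged numerical illustration of (18) to (20), take the simulation settings \(n=25\) and \(c=0.33\) from Section 3 together with hypothetical observed values \(s^2=1\) and \(y=0.331\), so that \(y-c=0.001\):

\[ \widehat a \approx \frac{s}{\sqrt n(y-c)} = \frac{1}{5\times 0.001}=200, \qquad \widehat\mu \approx c-\frac{s^2}{n(y-c)} = 0.33-\frac{1}{25\times 0.001} = -39.67. \]

These are leading-order approximations rather than exact solutions of (16), but they show how an observed mean only 0.001 above the threshold drives \(\widehat\mu\) far below the \(-10\) cutoff used to flag ill-posed estimates in Section 3.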

Han Zhang

2026-04-29
