In this note, we show that
- MCLE can be derived from the two-sample sufficient statistics alone;
- Instead of solving a two-parameter system of score equations, MCLE can be obtained
from a one-dimensional equation in \(a\);
- MCLE of the treatment effect \(\delta\) diverges to \(-\infty\) when the observed phase 2
treatment effect barely exceeds the go/no-go threshold;
- We give an approximation to \(\widehat\delta\) in that case;
- A fast simulation method exists;
- MCLE is more likely to be ill-posed when the true treatment effect
is small;
- MCLE is applicable in practice when a go decision is made, even if
the true treatment effect is unknown.
1. Problem setup
Here we consider the two-sample problem under the equal-variance
assumption.
Suppose
\[
X_{Ti}\overset{iid}{\sim}N(\mu_T,\sigma^2),
\qquad
i=1,\ldots,n_T,
\]
and
\[
X_{Cj}\overset{iid}{\sim}N(\mu_C,\sigma^2),
\qquad
j=1,\ldots,n_C.
\]
The treatment effect is
\[
\delta=\mu_T-\mu_C.
\]
Only the two-sample summary statistics are available. Define
\[
Y=\bar X_T-\bar X_C,
\]
and the pooled variance
\[
S_p^2
=
\frac{
(n_T-1)S_T^2+(n_C-1)S_C^2
}{
n_T+n_C-2
}.
\]
Let the observed values be
\[
y=\bar x_T-\bar x_C,
\qquad
s_p^2=S_{p,\text{obs}}^2.
\]
Define
\[
\kappa=\frac{1}{n_T}+\frac{1}{n_C},
\qquad
\nu=n_T+n_C-2.
\]
The phase 2 trial moves to phase 3 only if
\[
Y>c,
\]
where \(c\) is the go/no-go
threshold. The objective is to estimate \((\delta,\sigma^2)\) conditional on the
selection event
\[
A=\{Y>c\}.
\]
Assume throughout that
\[
y>c,\qquad s_p^2>0,\qquad \nu>0.
\]
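As a concrete illustration, the reduction to \((y,s_p^2,\kappa,\nu)\) takes only a few lines of code. The following is a minimal Python sketch; the function and variable names are illustrative, and the inputs are the per-arm means, standard deviations, and sample sizes.

```python
import numpy as np

def two_sample_summaries(xbar_T, s_T, n_T, xbar_C, s_C, n_C):
    """Reduce per-arm summaries to (y, s_p^2, kappa, nu)."""
    y = xbar_T - xbar_C                                     # observed effect y
    nu = n_T + n_C - 2                                      # pooled degrees of freedom
    s_p2 = ((n_T - 1) * s_T**2 + (n_C - 1) * s_C**2) / nu   # pooled variance s_p^2
    kappa = 1.0 / n_T + 1.0 / n_C                           # kappa = 1/n_T + 1/n_C
    return y, s_p2, kappa, nu

# illustrative input: balanced arms with 50 patients each
y, s_p2, kappa, nu = two_sample_summaries(0.45, 1.02, 50, 0.05, 0.98, 50)
```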
2. Conditional likelihood based on \((Y,S_p^2)\)
Under the two-sample normal model with common variance,
\[
Y\sim N(\delta,\sigma^2\kappa),
\qquad
\frac{\nu S_p^2}{\sigma^2}\sim\chi^2_{\nu},
\]
and \(Y\) and \(S_p^2\) are independent.
Because the selection event \(A=\{Y>c\}\) depends only on \(Y\), the conditional likelihood based on
\((y,s_p^2)\) is
\[
L_c(\delta,\sigma\mid y,s_p^2,Y>c)
=
\frac{
f_Y(y\mid \delta,\sigma) f_{S_p^2}(s_p^2\mid \sigma)
}{
P_{\delta,\sigma}(Y>c)
}.
\]
Define
\[
a=\frac{c-\delta}{\sigma\sqrt{\kappa}}.
\]
Then
\[
P_{\delta,\sigma}(Y>c)=1-\Phi(a).
\]
Ignoring constants not involving \((\delta,\sigma)\),
\[
L_c(\delta,\sigma)
\propto
\frac{
\sigma^{-(\nu+1)}
\exp\left[
-\frac{\nu s_p^2+(y-\delta)^2/\kappa}{2\sigma^2}
\right]
}{
1-\Phi(a)
}.
\]
Equivalently,
\[
\ell_c(\delta,\sigma)
=
-(\nu+1)\log\sigma
-
\frac{\nu s_p^2+(y-\delta)^2/\kappa}{2\sigma^2}
-
\log\{1-\Phi(a)\}
+
\text{constant}.
\]
The first term is \(-(\nu+1)\log\sigma\), because the density
of \(Y\) contributes one factor of
\(\sigma^{-1}\), while the density of
\(S_p^2\) contributes \(\sigma^{-\nu}\).
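For reference, a minimal Python sketch of \(\ell_c(\delta,\sigma)\) up to an additive constant follows; the function name and argument order are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def cond_loglik(delta, sigma, y, s_p2, kappa, nu, c):
    """Conditional log-likelihood ell_c(delta, sigma), up to an additive constant."""
    a = (c - delta) / (sigma * np.sqrt(kappa))
    return (-(nu + 1) * np.log(sigma)
            - (nu * s_p2 + (y - delta)**2 / kappa) / (2 * sigma**2)
            - norm.logsf(a))   # -log{1 - Phi(a)}, computed stably on the log scale
```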
3. Fast simulation of summary statistics conditional on \(A=\{Y>c\}\)
To simulate selected phase 2 trials efficiently, there is no need to
generate individual observations in the treatment and control groups. It
is enough to simulate the sufficient statistics \((Y,S_p^2)\) directly.
Under the equal-variance normal model,
\[
Y\sim N(\delta,\sigma^2\kappa),
\qquad
\frac{\nu S_p^2}{\sigma^2}\sim\chi^2_\nu,
\]
and \(Y\) is independent of \(S_p^2\). Since the selection event
\[
A=\{Y>c\}
\]
depends only on \(Y\), conditioning
on \(A\) only changes the distribution
of \(Y\). The conditional distribution
of \(S_p^2\) remains unchanged.
Therefore,
\[
\boxed{
Y\mid A
\sim
N(\delta,\sigma^2\kappa)
\text{ truncated below at } c
}
\tag{1}
\]
and
\[
\boxed{
\frac{\nu S_p^2}{\sigma^2}\mid A
\sim
\chi^2_\nu.
}
\tag{2}
\]
Moreover,
\[
\boxed{
Y\mid A
\quad\text{and}\quad
S_p^2\mid A
\text{ are independent.}
}
\tag{3}
\]
Thus, selected summary statistics can be simulated as follows:
- Simulate
\[
Y\sim N(\delta,\sigma^2\kappa)
\]
conditional on \(Y>c\).
- Independently simulate
\[
W\sim\chi^2_\nu.
\]
- Set
\[
S_p^2=\frac{\sigma^2 W}{\nu}.
\]
Equivalently, let
\[
\alpha=\frac{c-\delta}{\sigma\sqrt{\kappa}}.
\]
If \(U\sim\mathrm{Uniform}(\Phi(\alpha),1)\),
then
\[
\boxed{
Y
=
\delta+\sigma\sqrt{\kappa}\Phi^{-1}(U)
}
\tag{4}
\]
has the desired conditional distribution \(Y\mid Y>c\). Independently,
\[
\boxed{
S_p^2
=
\frac{\sigma^2}{\nu}\chi^2_\nu.
}
\tag{5}
\]
This directly simulates \((Y,S_p^2)\mid
Y>c\). It avoids rejection sampling and is much faster when
\(P(Y>c)\) is small.
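A minimal Python sketch of this direct simulation of \((Y,S_p^2)\mid Y>c\) is given below; the function name is illustrative, and the inverse-CDF step loses tail precision only when \(P(Y>c)\) is extremely small.

```python
import numpy as np
from scipy.stats import norm

def simulate_selected(delta, sigma, kappa, nu, c, n_rep, rng=None):
    """Draw n_rep pairs (Y, S_p^2) conditional on the go event {Y > c}, via (4)-(5)."""
    rng = np.random.default_rng(rng)
    alpha = (c - delta) / (sigma * np.sqrt(kappa))
    u = rng.uniform(norm.cdf(alpha), 1.0, size=n_rep)     # U ~ Uniform(Phi(alpha), 1)
    y = delta + sigma * np.sqrt(kappa) * norm.ppf(u)      # truncated-normal draw (4)
    s_p2 = sigma**2 * rng.chisquare(nu, size=n_rep) / nu  # independent scaled chi-square (5)
    return y, s_p2
```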
For balanced randomization with \(n_T=n_C=N\),
\[
\kappa=\frac{2}{N},
\qquad
\nu=2N-2.
\]
Thus, for the treatment effect estimator \(Y=\bar X_T-\bar X_C\), the role of the
one-sample effective sample size is
\[
\frac{1}{\kappa}=\frac{N}{2}.
\]
This explains why the one-sample simulation with \(n=25\) corresponds approximately to a
balanced two-arm trial with 50 patients per arm, when the endpoint
variance is the same in both arms.
Using this simulation method together with the MCLE equations derived in the sections
below, one can simulate the following two-sample scenarios to assess MCLE’s performance:
- \(n_T=n_C=50\)
- \(\sigma^2=1\)
- \(c=0.33\)
- \(\delta=0,0.05,0.1,\ldots,0.95,1.0\)
- 1000 replicates for each \(\delta\)
The results are summarized below.

The same interpretation applies as in the one-sample case:
- Ill-posed MCLE is more frequent when the true treatment effect is
close to the null.
- The observed selected effect \(Y\)
is upward biased when the true effect is small.
- MCLE can be adjusted strongly downward when the true effect is
small, partially correcting for a false-positive go decision.
- When the true effect is sufficiently above the go threshold, MCLE is
typically more stable and more useful for confirmatory trial
planning.
4. Conditional density ratio of \(Y\) after selection
Recall that
\[
Y\sim N(\delta,\sigma^2\kappa).
\]
The phase 2 trial is selected only if
\[
Y>c.
\]
For a fixed value of \(\delta\), the
conditional density of \(Y\) given
\(Y>c\) is
\[
f_\delta(y\mid Y>c)
=
\frac{
\frac{1}{\sigma\sqrt{\kappa}}
\phi\left(\frac{y-\delta}{\sigma\sqrt{\kappa}}\right)
}{
1-\Phi\left(\frac{c-\delta}{\sigma\sqrt{\kappa}}\right)
},
\qquad y>c.
\]
We compare the conditional density under \(\delta=0\) with that under \(\delta>0\), while keeping \(n_T\), \(n_C\), \(\sigma\), and \(c\) fixed.
Main conclusion
Define the density ratio
\[
R(y)
=
\frac{
f_0(y\mid Y>c)
}{
f_\delta(y\mid Y>c)
},
\qquad y>c,\quad \delta>0.
\]
Then \(R(y)\) is strictly decreasing
in \(y\). Moreover, \(R(c)>1\) and \(R(y)\to 0\) as \(y\to\infty\). Therefore, there exists a
unique crossing point \(y^\star>c\)
such that
\[
f_0(y\mid Y>c)>f_\delta(y\mid Y>c),
\qquad c<y<y^\star,
\]
while
\[
f_0(y\mid Y>c)<f_\delta(y\mid Y>c),
\qquad y>y^\star.
\]
In words, after conditioning on \(Y>c\), the model with \(\delta=0\) puts relatively more mass near
the threshold \(c\), while the model
with \(\delta>0\) puts relatively
more mass farther into the right tail. This explains why ill-posed MCLE
is more likely when the true treatment effect is small.
Proof
From the conditional density formula,
\[
R(y)
=
\frac{
1-\Phi\left(\frac{c-\delta}{\sigma\sqrt{\kappa}}\right)
}{
1-\Phi\left(\frac{c}{\sigma\sqrt{\kappa}}\right)
}
\exp\left\{
-\frac{\delta y}{\sigma^2\kappa}
+
\frac{\delta^2}{2\sigma^2\kappa}
\right\}.
\]
Hence
\[
\log R(y)
=
\text{constant}
-
\frac{\delta}{\sigma^2\kappa}y.
\]
Since \(\delta>0\),
\[
\frac{d}{dy}\log R(y)
=
-\frac{\delta}{\sigma^2\kappa}
<0.
\]
Therefore, \(R(y)\) is strictly
decreasing in \(y\).
Next, evaluate the ratio at \(y=c\).
We have
\[
\log R(c)
=
\log\left\{
1-\Phi\left(\frac{c-\delta}{\sigma\sqrt{\kappa}}\right)
\right\}
-
\log\left\{
1-\Phi\left(\frac{c}{\sigma\sqrt{\kappa}}\right)
\right\}
-
\frac{\delta c}{\sigma^2\kappa}
+
\frac{\delta^2}{2\sigma^2\kappa}.
\]
Using
\[
\lambda(a)=\frac{\phi(a)}{1-\Phi(a)},
\]
and
\[
\frac{d}{d\delta}
\log\left\{
1-\Phi\left(\frac{c-\delta}{\sigma\sqrt{\kappa}}\right)
\right\}
=
\frac{1}{\sigma\sqrt{\kappa}}
\lambda\left(\frac{c-\delta}{\sigma\sqrt{\kappa}}\right),
\]
we can write
\[
\begin{aligned}
\log R(c)
&=
\int_0^\delta
\frac{1}{\sigma\sqrt{\kappa}}
\lambda\left(\frac{c-u}{\sigma\sqrt{\kappa}}\right)
\,du
-
\int_0^\delta
\frac{c-u}{\sigma^2\kappa}
\,du \\
&=
\int_0^\delta
\frac{1}{\sigma\sqrt{\kappa}}
\left[
\lambda\left(\frac{c-u}{\sigma\sqrt{\kappa}}\right)
-
\frac{c-u}{\sigma\sqrt{\kappa}}
\right]
\,du.
\end{aligned}
\]
For the standard normal inverse Mills ratio,
\[
\lambda(a)>a
\qquad \text{for all } a\in\mathbb R.
\]
Thus the integrand is positive for every \(u\in[0,\delta]\). Therefore,
\[
\log R(c)>0,
\]
which implies
\[
R(c)>1.
\]
Finally, since
\[
\log R(y)
=
\text{constant}
-
\frac{\delta}{\sigma^2\kappa}y,
\]
we have
\[
R(y)\to 0
\qquad \text{as } y\to\infty.
\]
Since \(R(y)\) is continuous and
strictly decreasing, with \(R(c)>1\)
and \(R(y)\to 0\), there is a unique
crossing point \(y^\star>c\).
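The crossing point can also be located numerically from the explicit form of \(\log R(y)\); the following Python sketch uses illustrative parameter values and a simple root finder.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def log_R(y, delta, sigma, kappa, c):
    """log of the conditional density ratio R(y) = f_0(y | Y>c) / f_delta(y | Y>c)."""
    sk = sigma * np.sqrt(kappa)
    return (norm.logsf((c - delta) / sk) - norm.logsf(c / sk)
            - delta * y / (sigma**2 * kappa) + delta**2 / (2 * sigma**2 * kappa))

# illustrative values: locate the unique crossing point y* > c where R(y*) = 1
delta, sigma, kappa, c = 0.3, 1.0, 2 / 50, 0.33
y_star = brentq(lambda y: log_R(y, delta, sigma, kappa, c), c + 1e-9, c + 10.0)
```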
Interpretation
If \(\delta=0\), passing the
threshold \(Y>c\) is relatively
surprising. Conditional on this event, \(Y\) is more likely to be just above \(c\), resulting in a higher chance of an
ill-posed MCLE.
If \(\delta>0\), passing the
threshold is less surprising, and the conditional distribution puts
relatively more mass farther above \(c\).
This provides a useful intuition for selection bias adjustment: a
phase 2 result that barely exceeds the go/no-go threshold is more
consistent with a smaller true treatment effect combined with
selection-induced upward fluctuation than with a genuinely larger
treatment effect.
5. Score equations
Now we illustrate how to obtain MCLE using sufficient statistics
\((Y,S_p^2)\). Let
\[
\lambda(a)=\frac{\phi(a)}{1-\Phi(a)}
\]
be the inverse Mills ratio.
Setting the score equations with respect to \(\delta\) and \(\sigma\) to zero gives
\[
\boxed{
\frac{y-\delta}{\sigma\sqrt{\kappa}}
=
\lambda(a)
}
\tag{6}
\]
and
\[
\boxed{
\sigma^2
=
\frac{\nu s_p^2}{\nu+1+a\lambda(a)-\lambda(a)^2}
}
\tag{7}
\]
Therefore, once \(a\) is known,
\(\delta\) and \(\sigma^2\) can be recovered directly.
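When implementing these equations, \(\lambda(a)\) should be computed on the log scale so that the denominator \(1-\Phi(a)\) does not underflow for large positive \(a\); a minimal sketch (function name illustrative):

```python
import numpy as np
from scipy.stats import norm

def inv_mills(a):
    """Inverse Mills ratio lambda(a) = phi(a) / {1 - Phi(a)}, evaluated on the log scale."""
    return np.exp(norm.logpdf(a) - norm.logsf(a))

# sanity check: lambda(0) = sqrt(2/pi)
assert np.isclose(inv_mills(0.0), np.sqrt(2 / np.pi))
```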
6. One-dimensional equation for \(a\)
Equations (6) and (7) jointly involve the two parameters \((\delta,\sigma^2)\). Here we show that the MCLE
can instead be obtained from a one-dimensional equation.
From
\[
a=\frac{c-\delta}{\sigma\sqrt{\kappa}}
\]
and the first score equation,
\[
\lambda(a)=\frac{y-\delta}{\sigma\sqrt{\kappa}},
\]
subtracting gives
\[
\lambda(a)-a
=
\frac{y-c}{\sigma\sqrt{\kappa}}.
\]
Thus
\[
\sigma^2
=
\frac{(y-c)^2}{\kappa\{\lambda(a)-a\}^2}.
\]
Combining this with the score-based expression for \(\sigma^2\) yields the one-dimensional
equation
\[
\boxed{
\frac{(y-c)^2}{\kappa}
\{\nu+1+a\lambda(a)-\lambda(a)^2\}
-
\nu s_p^2\{\lambda(a)-a\}^2
=
0
}
\tag{8}
\]
Let \(\widehat a\) be the solution.
Then the conditional MLEs are
\[
\boxed{
\widehat\sigma^2
=
\frac{\nu s_p^2}{\nu+1+\widehat a\lambda(\widehat a)-\lambda(\widehat
a)^2}
}
\tag{9}
\]
Equivalently,
\[
\boxed{
\widehat\sigma^2
=
\frac{(y-c)^2}{\kappa\{\lambda(\widehat a)-\widehat a\}^2}
}
\tag{10}
\]
The corresponding estimator of \(\delta\) is
\[
\boxed{
\widehat\delta
=
c-\widehat a\,\widehat\sigma\sqrt{\kappa}
}
\tag{11}
\]
Equivalently,
\[
\boxed{
\widehat\delta
=
y-\widehat\sigma\sqrt{\kappa}\lambda(\widehat a)
}
\tag{12}
\]
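Putting equations (8)-(12) together, the MCLE can be computed by a one-dimensional root search in \(a\). The Python sketch below is one such implementation; the function name, the heuristic default bracket, and the handling of a missing sign change are illustrative choices, not part of the derivation.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def mcle_two_sample(y, s_p2, kappa, nu, c, a_lo=None, a_hi=None):
    """Solve the one-dimensional equation (8) for a, then recover the MCLEs via (10)-(11).
    The default bracket is a heuristic (widen it if no sign change is found);
    returns None when the bracket does not contain a sign change."""
    lam = lambda a: np.exp(norm.logpdf(a) - norm.logsf(a))  # inverse Mills ratio

    if a_lo is None:
        a_lo = -10.0 - 2.0 * (y - c) / np.sqrt(s_p2 * kappa)
    if a_hi is None:
        # a_hat ~ s_p * sqrt(kappa) / (y - c) when y is close to c (cf. Section 7 below)
        a_hi = 10.0 + 2.0 * np.sqrt(s_p2 * kappa) / (y - c)

    def g(a):                                               # left-hand side of (8)
        la = lam(a)
        return ((y - c)**2 / kappa * (nu + 1 + a * la - la**2)
                - nu * s_p2 * (la - a)**2)

    if g(a_lo) * g(a_hi) > 0:
        return None
    a_hat = brentq(g, a_lo, a_hi)
    sigma2_hat = (y - c)**2 / (kappa * (lam(a_hat) - a_hat)**2)   # (10)
    delta_hat = c - a_hat * np.sqrt(sigma2_hat * kappa)           # (11)
    return delta_hat, sigma2_hat, a_hat
```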
7. Behavior as \(y\to c^+\)
Now consider the case where the observed phase 2 treatment effect
barely exceeds the go/no-go threshold:
\[
y-c\downarrow 0.
\]
Then
\[
r=
\frac{\nu s_p^2\kappa}{(y-c)^2}
\to\infty.
\]
Let
\[
m(a)=\lambda(a)-a.
\]
Since \(\lambda(a)=a+m(a)\), the
one-dimensional equation is equivalent to
\[
\boxed{
(r+1)m(a)^2+a m(a)-(\nu+1)=0
}
\tag{13}
\]
This form makes the boundary behavior transparent.
First, \(\widehat a\) cannot remain
bounded as \(r\to\infty\). If \(a\) stayed bounded, then \(m(a)=\lambda(a)-a\) would be bounded away
from zero, so the term \((r+1)m(a)^2\)
would diverge while the remaining terms of (13) stayed bounded, and the equation could not hold. Therefore, any solution must diverge.
It cannot diverge to \(-\infty\),
because when \(a\to-\infty\),
\[
\lambda(a)\to 0,
\qquad
m(a)=\lambda(a)-a\to\infty,
\]
so \((r+1)m(a)^2\approx(r+1)a^2\) dominates the negative term \(a\,m(a)\approx-a^2\), and the
left-hand side of (13) again diverges. Hence the relevant solution satisfies
\[
\boxed{
\widehat a\to+\infty
}
\tag{14}
\]
For large positive \(a\),
\[
\lambda(a)-a
=
\frac{1}{a}+O(a^{-3}).
\]
Therefore,
\[
m(a)\sim \frac{1}{a},
\qquad
a m(a)\to 1.
\]
Substituting these approximations into (13) gives
\[
\frac{r+1}{a^2}+1-(\nu+1)\approx 0.
\]
Thus
\[
\widehat a^2
\sim
\frac{r}{\nu}.
\]
Using the definition of \(r\),
\[
\boxed{
\widehat a
\sim
\frac{s_p\sqrt{\kappa}}{y-c}
}
\tag{15}
\]
Now,
\[
\widehat\sigma
=
\frac{y-c}{\sqrt{\kappa}\{\lambda(\widehat a)-\widehat a\}}.
\]
Since
\[
\lambda(\widehat a)-\widehat a
\sim
\frac{1}{\widehat a},
\]
we obtain
\[
\widehat\sigma
\sim
\frac{(y-c)\widehat a}{\sqrt{\kappa}}
\sim
s_p.
\]
Therefore,
\[
\boxed{
\widehat\sigma^2\to s_p^2
}
\tag{16}
\]
Finally,
\[
\widehat\delta
=
c-\widehat a\,\widehat\sigma\sqrt{\kappa}.
\]
Using the asymptotic expression for \(\widehat a\) and \(\widehat\sigma\to s_p\), we obtain
\[
\boxed{
\widehat\delta
\sim
c-\frac{s_p^2\kappa}{y-c}
}
\tag{17}
\]
Hence,
\[
\boxed{
\widehat\delta\to-\infty
\qquad \text{as } y\to c^+
}
\tag{18}
\]
In words, when the observed phase 2 treatment effect barely passes
the threshold, the conditional MLE attributes most of the apparent
success to the conditioning event \(Y>c\). The estimate of \(\sigma^2\) remains close to the observed
pooled sample variance, but the estimate of the treatment effect \(\delta\) is pulled sharply downward and has
no finite limit as \(y\downarrow
c\).
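This boundary behavior can be checked numerically by comparing the exact \(\widehat\delta\) with the approximation (17) as \(y\downarrow c\). The sketch below assumes the mcle_two_sample function from the sketch in Section 6 and uses illustrative parameter values.

```python
# assumes mcle_two_sample(...) from the sketch in Section 6; values are illustrative
s_p2, kappa, nu, c = 1.0, 2 / 50, 98, 0.33
for gap in (0.1, 0.01, 0.001):
    y = c + gap
    fit = mcle_two_sample(y, s_p2, kappa, nu, c)
    delta_hat = fit[0] if fit is not None else float("nan")
    delta_approx = c - s_p2 * kappa / (y - c)            # approximation (17)
    print(f"y - c = {gap:7.3f}  delta_hat = {delta_hat:9.3f}  approx = {delta_approx:9.3f}")
```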
8. Practical implementation mapping from the one-sample code
The one-sample implementation can be adapted directly by replacing
the one-sample summary inputs with the two-sample effective
quantities.
The core replacements are:
\[
\mu \longrightarrow \delta,
\qquad
s^2 \longrightarrow s_p^2,
\qquad
n \longrightarrow \frac{1}{\kappa},
\qquad
n-1 \longrightarrow \nu.
\]
However, the replacement \(n\to1/\kappa\) should be applied carefully.
In the likelihood, the power of \(\sigma\) is controlled by \(\nu+1\), not by \(1/\kappa\). Therefore, the safest
implementation is to use the two-sample equations directly:
\[
\frac{(y-c)^2}{\kappa}
\{\nu+1+a\lambda(a)-\lambda(a)^2\}
-
\nu s_p^2\{\lambda(a)-a\}^2
=
0.
\]
Then recover
\[
\widehat\sigma
=
\frac{y-c}{\sqrt{\kappa}\{\lambda(\widehat a)-\widehat a\}},
\]
and
\[
\widehat\delta
=
c-\widehat a\,\widehat\sigma\sqrt{\kappa}.
\]
For fast simulation under a selected two-sample phase 2 trial,
use
\[
Y
=
\delta+\sigma\sqrt{\kappa}\Phi^{-1}(U),
\qquad
U\sim\mathrm{Uniform}\left(
\Phi\left(\frac{c-\delta}{\sigma\sqrt{\kappa}}\right),
1
\right),
\]
and independently
\[
S_p^2
=
\frac{\sigma^2}{\nu}\chi^2_\nu.
\]
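As a final sketch, the pieces above can be combined into a scenario loop of the kind described in Section 3; it assumes the simulate_selected and mcle_two_sample functions from the earlier sketches, and the summary bookkeeping is illustrative.

```python
import numpy as np

# assumes simulate_selected(...) and mcle_two_sample(...) from the sketches above
n_T = n_C = 50
kappa, nu = 1 / n_T + 1 / n_C, n_T + n_C - 2
sigma, c, n_rep = 1.0, 0.33, 1000

for delta in np.arange(0.0, 1.0 + 1e-9, 0.05):
    ys, s_p2s = simulate_selected(delta, sigma, kappa, nu, c, n_rep)
    fits = [mcle_two_sample(y, s2, kappa, nu, c) for y, s2 in zip(ys, s_p2s)]
    ok = [f for f in fits if f is not None]
    frac_ok = len(ok) / n_rep                              # fraction of well-posed fits
    bias = np.mean([f[0] for f in ok]) - delta if ok else float("nan")
    # summarize frac_ok and bias over the delta grid as desired
```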