In this note, we show that
- MCLE can be derived from the two-sample sufficient statistics alone;
- Instead of solving a two-parameter system of score equations, MCLE can be obtained
from a one-dimensional equation in \(a\);
- MCLE of the treatment effect \(\delta\) diverges to \(-\infty\) when the observed phase 2
treatment effect barely exceeds the go/no-go threshold;
- We give an approximation to \(\widehat\delta\) in that case;
- A fast simulation method exists;
- MCLE is more likely to be ill-posed when the true treatment effect
is small;
- MCLE is applicable in practice when a go decision is made, even if
the true treatment effect is unknown.
1. Problem setup
Here we consider the two-sample problem under the equal-variance
assumption.
Suppose
\[
X_{Ti}\overset{iid}{\sim}N(\mu_T,\sigma^2),
\qquad
i=1,\ldots,n_T,
\]
and
\[
X_{Cj}\overset{iid}{\sim}N(\mu_C,\sigma^2),
\qquad
j=1,\ldots,n_C.
\]
The treatment effect is
\[
\delta=\mu_T-\mu_C.
\]
Only the two-sample summary statistics are available. Define
\[
Y=\bar X_T-\bar X_C,
\]
and the pooled variance
\[
S_p^2
=
\frac{
(n_T-1)S_T^2+(n_C-1)S_C^2
}{
n_T+n_C-2
}.
\]
Let the observed values be
\[
y=\bar x_T-\bar x_C,
\qquad
s_p^2=S_{p,\text{obs}}^2.
\]
Define
\[
\kappa=\frac{1}{n_T}+\frac{1}{n_C},
\qquad
\nu=n_T+n_C-2.
\]
The phase 2 trial moves to phase 3 only if
\[
Y>c,
\]
where \(c\) is the go/no-go
threshold. The objective is to estimate \((\delta,\sigma^2)\) conditional on the
selection event
\[
A=\{Y>c\}.
\]
Assume throughout that
\[
y>c,\qquad s_p^2>0,\qquad \nu>0.
\]
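As a concrete illustration, the reduction to \((y,s_p^2,\kappa,\nu)\) takes only a few lines of code. The following is a minimal Python sketch; the function and variable names are illustrative, and the inputs are the per-arm means, standard deviations, and sample sizes.

```python
import numpy as np

def two_sample_summaries(xbar_T, s_T, n_T, xbar_C, s_C, n_C):
    """Reduce per-arm summaries to (y, s_p^2, kappa, nu)."""
    y = xbar_T - xbar_C                                     # observed effect y
    nu = n_T + n_C - 2                                      # pooled degrees of freedom
    s_p2 = ((n_T - 1) * s_T**2 + (n_C - 1) * s_C**2) / nu   # pooled variance s_p^2
    kappa = 1.0 / n_T + 1.0 / n_C                           # kappa = 1/n_T + 1/n_C
    return y, s_p2, kappa, nu

# illustrative input: balanced arms with 50 patients each
y, s_p2, kappa, nu = two_sample_summaries(0.45, 1.02, 50, 0.05, 0.98, 50)
```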
2. Conditional likelihood based on \((Y,S_p^2)\)
Under the two-sample normal model with common variance,
\[
Y\sim N(\delta,\sigma^2\kappa),
\qquad
\frac{\nu S_p^2}{\sigma^2}\sim\chi^2_{\nu},
\]
and \(Y\) and \(S_p^2\) are independent.
Because the selection event \(A=\{Y>c\}\) depends only on \(Y\), the conditional likelihood based on
\((y,s_p^2)\) is
\[
L_c(\delta,\sigma\mid y,s_p^2,Y>c)
=
\frac{
f_Y(y\mid \delta,\sigma) f_{S_p^2}(s_p^2\mid \sigma)
}{
P_{\delta,\sigma}(Y>c)
}.
\]
Define
\[
a=\frac{c-\delta}{\sigma\sqrt{\kappa}}.
\]
Then
\[
P_{\delta,\sigma}(Y>c)=1-\Phi(a).
\]
Ignoring constants not involving \((\delta,\sigma)\),
\[
L_c(\delta,\sigma)
\propto
\frac{
\sigma^{-(\nu+1)}
\exp\left[
-\frac{\nu s_p^2+(y-\delta)^2/\kappa}{2\sigma^2}
\right]
}{
1-\Phi(a)
}.
\]
Equivalently,
\[
\ell_c(\delta,\sigma)
=
-(\nu+1)\log\sigma
-
\frac{\nu s_p^2+(y-\delta)^2/\kappa}{2\sigma^2}
-
\log\{1-\Phi(a)\}
+
\text{constant}.
\]
The first term is \(-(\nu+1)\log\sigma\), because the density
of \(Y\) contributes one factor of
\(\sigma^{-1}\), while the density of
\(S_p^2\) contributes \(\sigma^{-\nu}\).
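For reference, a minimal Python sketch of \(\ell_c(\delta,\sigma)\) up to an additive constant follows; the function name and argument order are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def cond_loglik(delta, sigma, y, s_p2, kappa, nu, c):
    """Conditional log-likelihood ell_c(delta, sigma), up to an additive constant."""
    a = (c - delta) / (sigma * np.sqrt(kappa))
    return (-(nu + 1) * np.log(sigma)
            - (nu * s_p2 + (y - delta)**2 / kappa) / (2 * sigma**2)
            - norm.logsf(a))   # -log{1 - Phi(a)}, computed stably on the log scale
```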
3. Fast simulation of summary statistics conditional on \(A=\{Y>c\}\)
To simulate selected phase 2 trials efficiently, there is no need to
generate individual observations in the treatment and control groups. It
is enough to simulate the sufficient statistics \((Y,S_p^2)\) directly.
Under the equal-variance normal model,
\[
Y\sim N(\delta,\sigma^2\kappa),
\qquad
\frac{\nu S_p^2}{\sigma^2}\sim\chi^2_\nu,
\]
and \(Y\) is independent of \(S_p^2\). Since the selection event
\[
A=\{Y>c\}
\]
depends only on \(Y\), conditioning
on \(A\) only changes the distribution
of \(Y\). The conditional distribution
of \(S_p^2\) remains unchanged.
Therefore,
\[
\boxed{
Y\mid A
\sim
N(\delta,\sigma^2\kappa)
\text{ truncated below at } c
}
\tag{1}
\]
and
\[
\boxed{
\frac{\nu S_p^2}{\sigma^2}\mid A
\sim
\chi^2_\nu.
}
\tag{2}
\]
Moreover,
\[
\boxed{
Y\mid A
\quad\text{and}\quad
S_p^2\mid A
\text{ are independent.}
}
\tag{3}
\]
Thus, selected summary statistics can be simulated as follows:
- Simulate
\[
Y\sim N(\delta,\sigma^2\kappa)
\]
conditional on \(Y>c\).
- Independently simulate
\[
W\sim\chi^2_\nu.
\]
- Set
\[
S_p^2=\frac{\sigma^2 W}{\nu}.
\]
Equivalently, let
\[
\alpha=\frac{c-\delta}{\sigma\sqrt{\kappa}}.
\]
If \(U\sim\mathrm{Uniform}(\Phi(\alpha),1)\),
then
\[
\boxed{
Y
=
\delta+\sigma\sqrt{\kappa}\Phi^{-1}(U)
}
\tag{4}
\]
has the desired conditional distribution \(Y\mid Y>c\). Independently,
\[
\boxed{
S_p^2
=
\frac{\sigma^2}{\nu}\chi^2_\nu.
}
\tag{5}
\]
This directly simulates \((Y,S_p^2)\mid
Y>c\). It avoids rejection sampling and is much faster when
\(P(Y>c)\) is small.
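A minimal Python sketch of this direct simulation of \((Y,S_p^2)\mid Y>c\) is given below; the function name is illustrative, and the inverse-CDF step loses tail precision only when \(P(Y>c)\) is extremely small.

```python
import numpy as np
from scipy.stats import norm

def simulate_selected(delta, sigma, kappa, nu, c, n_rep, rng=None):
    """Draw n_rep pairs (Y, S_p^2) conditional on the go event {Y > c}, via (4)-(5)."""
    rng = np.random.default_rng(rng)
    alpha = (c - delta) / (sigma * np.sqrt(kappa))
    u = rng.uniform(norm.cdf(alpha), 1.0, size=n_rep)     # U ~ Uniform(Phi(alpha), 1)
    y = delta + sigma * np.sqrt(kappa) * norm.ppf(u)      # truncated-normal draw (4)
    s_p2 = sigma**2 * rng.chisquare(nu, size=n_rep) / nu  # independent scaled chi-square (5)
    return y, s_p2
```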
For balanced randomization with \(n_T=n_C=N\),
\[
\kappa=\frac{2}{N},
\qquad
\nu=2N-2.
\]
Thus, for the treatment effect estimator \(Y=\bar X_T-\bar X_C\), the role of the
one-sample effective sample size is
\[
\frac{1}{\kappa}=\frac{N}{2}.
\]
This explains why the one-sample simulation with \(n=25\) corresponds approximately to a
balanced two-arm trial with 50 patients per arm, when the endpoint
variance is the same in both arms.
Using this simulation method together with the MCLE equations derived in the sections
below, one can simulate the following two-sample scenarios to assess MCLE’s performance:
- \(n_T=n_C=50\)
- \(\sigma^2=1\)
- \(c=0.33\)
- \(\delta=0,0.05,0.1,\ldots,0.95,1.0\)
- 1000 replicates for each \(\delta\)
The results are summarized below.

The same interpretation applies as in the one-sample case:
- Ill-posed MCLE is more frequent when the true treatment effect is
close to the null.
- The observed selected effect \(Y\)
is upward biased when the true effect is small.
- MCLE can be adjusted strongly downward when the true effect is
small, partially correcting for a false-positive go decision.
- When the true effect is sufficiently above the go threshold, MCLE is
typically more stable and more useful for confirmatory trial
planning.
4. Conditional density ratio of \(Y\) after selection
Recall that
\[
Y\sim N(\delta,\sigma^2\kappa).
\]
The phase 2 trial is selected only if
\[
Y>c.
\]
For a fixed value of \(\delta\), the
conditional density of \(Y\) given
\(Y>c\) is
\[
f_\delta(y\mid Y>c)
=
\frac{
\frac{1}{\sigma\sqrt{\kappa}}
\phi\left(\frac{y-\delta}{\sigma\sqrt{\kappa}}\right)
}{
1-\Phi\left(\frac{c-\delta}{\sigma\sqrt{\kappa}}\right)
},
\qquad y>c.
\]
We compare the conditional density under \(\delta=0\) with that under \(\delta>0\), while keeping \(n_T\), \(n_C\), \(\sigma\), and \(c\) fixed.
Main conclusion
Define the density ratio
\[
R(y)
=
\frac{
f_0(y\mid Y>c)
}{
f_\delta(y\mid Y>c)
},
\qquad y>c,\quad \delta>0.
\]
Then \(R(y)\) is strictly decreasing
in \(y\). Moreover, \(R(c)>1\) and \(R(y)\to 0\) as \(y\to\infty\). Therefore, there exists a
unique crossing point \(y^\star>c\)
such that
\[
f_0(y\mid Y>c)>f_\delta(y\mid Y>c),
\qquad c<y<y^\star,
\]
while
\[
f_0(y\mid Y>c)<f_\delta(y\mid Y>c),
\qquad y>y^\star.
\]
In words, after conditioning on \(Y>c\), the model with \(\delta=0\) puts relatively more mass near
the threshold \(c\), while the model
with \(\delta>0\) puts relatively
more mass farther into the right tail. This explains why ill-posed MCLE
is more likely when the true treatment effect is small.
Proof
From the conditional density formula,
\[
R(y)
=
\frac{
1-\Phi\left(\frac{c-\delta}{\sigma\sqrt{\kappa}}\right)
}{
1-\Phi\left(\frac{c}{\sigma\sqrt{\kappa}}\right)
}
\exp\left\{
-\frac{\delta y}{\sigma^2\kappa}
+
\frac{\delta^2}{2\sigma^2\kappa}
\right\}.
\]
Hence
\[
\log R(y)
=
\text{constant}
-
\frac{\delta}{\sigma^2\kappa}y.
\]
Since \(\delta>0\),
\[
\frac{d}{dy}\log R(y)
=
-\frac{\delta}{\sigma^2\kappa}
<0.
\]
Therefore, \(R(y)\) is strictly
decreasing in \(y\).
Next, evaluate the ratio at \(y=c\).
We have
\[
\log R(c)
=
\log\left\{
1-\Phi\left(\frac{c-\delta}{\sigma\sqrt{\kappa}}\right)
\right\}
-
\log\left\{
1-\Phi\left(\frac{c}{\sigma\sqrt{\kappa}}\right)
\right\}
-
\frac{\delta c}{\sigma^2\kappa}
+
\frac{\delta^2}{2\sigma^2\kappa}.
\]
Using
\[
\lambda(a)=\frac{\phi(a)}{1-\Phi(a)},
\]
and
\[
\frac{d}{d\delta}
\log\left\{
1-\Phi\left(\frac{c-\delta}{\sigma\sqrt{\kappa}}\right)
\right\}
=
\frac{1}{\sigma\sqrt{\kappa}}
\lambda\left(\frac{c-\delta}{\sigma\sqrt{\kappa}}\right),
\]
we can write
\[
\begin{aligned}
\log R(c)
&=
\int_0^\delta
\frac{1}{\sigma\sqrt{\kappa}}
\lambda\left(\frac{c-u}{\sigma\sqrt{\kappa}}\right)
\,du
-
\int_0^\delta
\frac{c-u}{\sigma^2\kappa}
\,du \\
&=
\int_0^\delta
\frac{1}{\sigma\sqrt{\kappa}}
\left[
\lambda\left(\frac{c-u}{\sigma\sqrt{\kappa}}\right)
-
\frac{c-u}{\sigma\sqrt{\kappa}}
\right]
\,du.
\end{aligned}
\]
For the standard normal inverse Mills ratio,
\[
\lambda(a)>a
\qquad \text{for all } a\in\mathbb R.
\]
Thus the integrand is positive for every \(u\in[0,\delta]\). Therefore,
\[
\log R(c)>0,
\]
which implies
\[
R(c)>1.
\]
Finally, since
\[
\log R(y)
=
\text{constant}
-
\frac{\delta}{\sigma^2\kappa}y,
\]
we have
\[
R(y)\to 0
\qquad \text{as } y\to\infty.
\]
Since \(R(y)\) is continuous and
strictly decreasing, with \(R(c)>1\)
and \(R(y)\to 0\), there is a unique
crossing point \(y^\star>c\).
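The crossing point can also be located numerically from the explicit form of \(\log R(y)\); the following Python sketch uses illustrative parameter values and a simple root finder.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def log_R(y, delta, sigma, kappa, c):
    """log of the conditional density ratio R(y) = f_0(y | Y>c) / f_delta(y | Y>c)."""
    sk = sigma * np.sqrt(kappa)
    return (norm.logsf((c - delta) / sk) - norm.logsf(c / sk)
            - delta * y / (sigma**2 * kappa) + delta**2 / (2 * sigma**2 * kappa))

# illustrative values: locate the unique crossing point y* > c where R(y*) = 1
delta, sigma, kappa, c = 0.3, 1.0, 2 / 50, 0.33
y_star = brentq(lambda y: log_R(y, delta, sigma, kappa, c), c + 1e-9, c + 10.0)
```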
Interpretation
If \(\delta=0\), passing the
threshold \(Y>c\) is relatively
surprising. Conditional on this event, \(Y\) is more likely to be just above \(c\), resulting in a higher chance of an
ill-posed MCLE.
If \(\delta>0\), passing the
threshold is less surprising, and the conditional distribution puts
relatively more mass farther above \(c\).
This provides a useful intuition for selection bias adjustment: a
phase 2 result that barely exceeds the go/no-go threshold is more
consistent with a smaller true treatment effect combined with
selection-induced upward fluctuation than with a genuinely larger
treatment effect.
5. Score equations
Now we illustrate how to obtain MCLE using sufficient statistics
\((Y,S_p^2)\). Let
\[
\lambda(a)=\frac{\phi(a)}{1-\Phi(a)}
\]
be the inverse Mills ratio.
Setting the score equations with respect to \(\delta\) and \(\sigma\) to zero gives
\[
\boxed{
\frac{y-\delta}{\sigma\sqrt{\kappa}}
=
\lambda(a)
}
\tag{6}
\]
and
\[
\boxed{
\sigma^2
=
\frac{\nu s_p^2}{\nu+1+a\lambda(a)-\lambda(a)^2}
}
\tag{7}
\]
Therefore, once \(a\) is known,
\(\delta\) and \(\sigma^2\) can be recovered directly.
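When implementing these equations, \(\lambda(a)\) should be computed on the log scale so that the denominator \(1-\Phi(a)\) does not underflow for large positive \(a\); a minimal sketch (function name illustrative):

```python
import numpy as np
from scipy.stats import norm

def inv_mills(a):
    """Inverse Mills ratio lambda(a) = phi(a) / {1 - Phi(a)}, evaluated on the log scale."""
    return np.exp(norm.logpdf(a) - norm.logsf(a))

# sanity check: lambda(0) = sqrt(2/pi)
assert np.isclose(inv_mills(0.0), np.sqrt(2 / np.pi))
```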
6. One-dimensional equation for \(a\)
Equations (6) and (7) jointly involve the two parameters \((\delta,\sigma^2)\). Here we show that the MCLE
can instead be obtained from a one-dimensional equation.
From
\[
a=\frac{c-\delta}{\sigma\sqrt{\kappa}}
\]
and the first score equation,
\[
\lambda(a)=\frac{y-\delta}{\sigma\sqrt{\kappa}},
\]
subtracting gives
\[
\lambda(a)-a
=
\frac{y-c}{\sigma\sqrt{\kappa}}.
\]
Thus
\[
\sigma^2
=
\frac{(y-c)^2}{\kappa\{\lambda(a)-a\}^2}.
\]
Combining this with the score-based expression for \(\sigma^2\) yields the one-dimensional
equation
\[
\boxed{
\frac{(y-c)^2}{\kappa}
\{\nu+1+a\lambda(a)-\lambda(a)^2\}
-
\nu s_p^2\{\lambda(a)-a\}^2
=
0
}
\tag{8}
\]
Let \(\widehat a\) be the solution.
Then the conditional MLEs are
\[
\boxed{
\widehat\sigma^2
=
\frac{\nu s_p^2}{\nu+1+\widehat a\lambda(\widehat a)-\lambda(\widehat
a)^2}
}
\tag{9}
\]
Equivalently,
\[
\boxed{
\widehat\sigma^2
=
\frac{(y-c)^2}{\kappa\{\lambda(\widehat a)-\widehat a\}^2}
}
\tag{10}
\]
The corresponding estimator of \(\delta\) is
\[
\boxed{
\widehat\delta
=
c-\widehat a\,\widehat\sigma\sqrt{\kappa}
}
\tag{11}
\]
Equivalently,
\[
\boxed{
\widehat\delta
=
y-\widehat\sigma\sqrt{\kappa}\lambda(\widehat a)
}
\tag{12}
\]
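Putting equations (8)-(12) together, the MCLE can be computed by a one-dimensional root search in \(a\). The Python sketch below is one such implementation; the function name, the heuristic default bracket, and the handling of a missing sign change are illustrative choices, not part of the derivation.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def mcle_two_sample(y, s_p2, kappa, nu, c, a_lo=None, a_hi=None):
    """Solve the one-dimensional equation (8) for a, then recover the MCLEs via (10)-(11).
    The default bracket is a heuristic (widen it if no sign change is found);
    returns None when the bracket does not contain a sign change."""
    lam = lambda a: np.exp(norm.logpdf(a) - norm.logsf(a))  # inverse Mills ratio

    if a_lo is None:
        a_lo = -10.0 - 2.0 * (y - c) / np.sqrt(s_p2 * kappa)
    if a_hi is None:
        # a_hat ~ s_p * sqrt(kappa) / (y - c) when y is close to c (cf. Section 7 below)
        a_hi = 10.0 + 2.0 * np.sqrt(s_p2 * kappa) / (y - c)

    def g(a):                                               # left-hand side of (8)
        la = lam(a)
        return ((y - c)**2 / kappa * (nu + 1 + a * la - la**2)
                - nu * s_p2 * (la - a)**2)

    if g(a_lo) * g(a_hi) > 0:
        return None
    a_hat = brentq(g, a_lo, a_hi)
    sigma2_hat = (y - c)**2 / (kappa * (lam(a_hat) - a_hat)**2)   # (10)
    delta_hat = c - a_hat * np.sqrt(sigma2_hat * kappa)           # (11)
    return delta_hat, sigma2_hat, a_hat
```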
7. Behavior as \(y\to c^+\)
Now consider the case where the observed phase 2 treatment effect
barely exceeds the go/no-go threshold:
\[
y-c\downarrow 0.
\]
Then
\[
r=
\frac{\nu s_p^2\kappa}{(y-c)^2}
\to\infty.
\]
Let
\[
m(a)=\lambda(a)-a.
\]
Since \(\lambda(a)=a+m(a)\), the
one-dimensional equation is equivalent to
\[
\boxed{
(r+1)m(a)^2+a m(a)-(\nu+1)=0
}
\tag{13}
\]
This form makes the boundary behavior transparent.
First, \(\widehat a\) cannot remain
bounded as \(r\to\infty\). If \(a\) stayed bounded, then \(m(a)=\lambda(a)-a\) would be bounded away
from zero, so the term \((r+1)m(a)^2\)
would diverge while the remaining terms of (13) stayed bounded, and the equation could not hold. Therefore, any solution must diverge.
It cannot diverge to \(-\infty\),
because when \(a\to-\infty\),
\[
\lambda(a)\to 0,
\qquad
m(a)=\lambda(a)-a\to\infty,
\]
so \((r+1)m(a)^2\approx(r+1)a^2\) dominates the negative term \(a\,m(a)\approx-a^2\), and the
left-hand side of (13) again diverges. Hence the relevant solution satisfies
\[
\boxed{
\widehat a\to+\infty
}
\tag{14}
\]
For large positive \(a\),
\[
\lambda(a)-a
=
\frac{1}{a}+O(a^{-3}).
\]
Therefore,
\[
m(a)\sim \frac{1}{a},
\qquad
a m(a)\to 1.
\]
Substituting these approximations into (13) gives
\[
\frac{r+1}{a^2}+1-(\nu+1)\approx 0.
\]
Thus
\[
\widehat a^2
\sim
\frac{r}{\nu}.
\]
Using the definition of \(r\),
\[
\boxed{
\widehat a
\sim
\frac{s_p\sqrt{\kappa}}{y-c}
}
\tag{15}
\]
Now,
\[
\widehat\sigma
=
\frac{y-c}{\sqrt{\kappa}\{\lambda(\widehat a)-\widehat a\}}.
\]
Since
\[
\lambda(\widehat a)-\widehat a
\sim
\frac{1}{\widehat a},
\]
we obtain
\[
\widehat\sigma
\sim
\frac{(y-c)\widehat a}{\sqrt{\kappa}}
\sim
s_p.
\]
Therefore,
\[
\boxed{
\widehat\sigma^2\to s_p^2
}
\tag{16}
\]
Finally,
\[
\widehat\delta
=
c-\widehat a\,\widehat\sigma\sqrt{\kappa}.
\]
Using the asymptotic expression for \(\widehat a\) and \(\widehat\sigma\to s_p\), we obtain
\[
\boxed{
\widehat\delta
\sim
c-\frac{s_p^2\kappa}{y-c}
}
\tag{17}
\]
Hence,
\[
\boxed{
\widehat\delta\to-\infty
\qquad \text{as } y\to c^+
}
\tag{18}
\]
In words, when the observed phase 2 treatment effect barely passes
the threshold, the conditional MLE attributes most of the apparent
success to the conditioning event \(Y>c\). The estimate of \(\sigma^2\) remains close to the observed
pooled sample variance, but the estimate of the treatment effect \(\delta\) is pulled sharply downward and has
no finite limit as \(y\downarrow
c\).
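This boundary behavior can be checked numerically by comparing the exact \(\widehat\delta\) with the approximation (17) as \(y\downarrow c\). The sketch below assumes the mcle_two_sample function from the sketch in Section 6 and uses illustrative parameter values.

```python
# assumes mcle_two_sample(...) from the sketch in Section 6; values are illustrative
s_p2, kappa, nu, c = 1.0, 2 / 50, 98, 0.33
for gap in (0.1, 0.01, 0.001):
    y = c + gap
    fit = mcle_two_sample(y, s_p2, kappa, nu, c)
    delta_hat = fit[0] if fit is not None else float("nan")
    delta_approx = c - s_p2 * kappa / (y - c)            # approximation (17)
    print(f"y - c = {gap:7.3f}  delta_hat = {delta_hat:9.3f}  approx = {delta_approx:9.3f}")
```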
8. Practical implementation mapping from the one-sample code
The one-sample implementation can be adapted directly by replacing
the one-sample summary inputs with the two-sample effective
quantities.
The core replacements are:
\[
\mu \longrightarrow \delta,
\qquad
s^2 \longrightarrow s_p^2,
\qquad
n \longrightarrow \frac{1}{\kappa},
\qquad
n-1 \longrightarrow \nu.
\]
However, the replacement \(n\to1/\kappa\) should be applied carefully.
In the likelihood, the power of \(\sigma\) is controlled by \(\nu+1\), not by \(1/\kappa\). Therefore, the safest
implementation is to use the two-sample equations directly:
\[
\frac{(y-c)^2}{\kappa}
\{\nu+1+a\lambda(a)-\lambda(a)^2\}
-
\nu s_p^2\{\lambda(a)-a\}^2
=
0.
\]
Then recover
\[
\widehat\sigma
=
\frac{y-c}{\sqrt{\kappa}\{\lambda(\widehat a)-\widehat a\}},
\]
and
\[
\widehat\delta
=
c-\widehat a\,\widehat\sigma\sqrt{\kappa}.
\]
For fast simulation under a selected two-sample phase 2 trial,
use
\[
Y
=
\delta+\sigma\sqrt{\kappa}\Phi^{-1}(U),
\qquad
U\sim\mathrm{Uniform}\left(
\Phi\left(\frac{c-\delta}{\sigma\sqrt{\kappa}}\right),
1
\right),
\]
and independently
\[
S_p^2
=
\frac{\sigma^2}{\nu}\chi^2_\nu.
\]
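As a final sketch, the pieces above can be combined into a scenario loop of the kind described in Section 3; it assumes the simulate_selected and mcle_two_sample functions from the earlier sketches, and the summary bookkeeping is illustrative.

```python
import numpy as np

# assumes simulate_selected(...) and mcle_two_sample(...) from the sketches above
n_T = n_C = 50
kappa, nu = 1 / n_T + 1 / n_C, n_T + n_C - 2
sigma, c, n_rep = 1.0, 0.33, 1000

for delta in np.arange(0.0, 1.0 + 1e-9, 0.05):
    ys, s_p2s = simulate_selected(delta, sigma, kappa, nu, c, n_rep)
    fits = [mcle_two_sample(y, s2, kappa, nu, c) for y, s2 in zip(ys, s_p2s)]
    ok = [f for f in fits if f is not None]
    frac_ok = len(ok) / n_rep                              # fraction of well-posed fits
    bias = np.mean([f[0] for f in ok]) - delta if ok else float("nan")
    # summarize frac_ok and bias over the delta grid as desired
```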