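In this note, we show that the conditional MLE (MCLE) of \(\mu\) after a go decision can be computed from a one-dimensional equation, and that it becomes ill-posed, diverging to \(-\infty\), when the observed phase 2 mean barely exceeds the go/no-go threshold.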
Here we consider the one-sample problem. The results also apply to the two-sample problem, as in a two-arm randomized trial with \(n\) subjects per arm, by replacing \(n\) with \(n/2\).
Suppose
\[ X_1,\ldots,X_n \overset{iid}{\sim} N(\mu,\sigma^2). \]
Only the summary statistics are available:
\[ Y=\bar X,\qquad S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2. \]
Let the observed values be
\[ y=\bar x,\qquad s^2=S^2_{\text{obs}}. \]
The phase 2 trial moves to phase 3 only if
\[ Y>c, \]
where \(c\) is the go/no-go threshold. The objective is to estimate \((\mu,\sigma^2)\) conditional on the selection event
\[ A=\{Y>c\}. \]
Assume throughout that
\[ y>c,\qquad s^2>0,\qquad n>1. \]
Under the normal model,
\[ Y\sim N\left(\mu,\frac{\sigma^2}{n}\right), \qquad \frac{(n-1)S^2}{\sigma^2}\sim \chi^2_{n-1}, \]
and \(Y\) and \(S^2\) are independent.
Because the selection event \(A=\{Y>c\}\) depends only on \(Y\), the conditional likelihood based on \((y,s^2)\) is
\[ L_c(\mu,\sigma\mid y,s^2,Y>c) = \frac{ f_Y(y\mid \mu,\sigma) f_{S^2}(s^2\mid \sigma) }{ P_{\mu,\sigma}(Y>c) }. \]
Define
\[ a=\frac{\sqrt n(c-\mu)}{\sigma}. \]
Then
\[ P_{\mu,\sigma}(Y>c)=1-\Phi(a). \]
Ignoring constants not involving \((\mu,\sigma)\),
\[ L_c(\mu,\sigma) \propto \frac{ \sigma^{-n} \exp\left[ -\frac{(n-1)s^2+n(y-\mu)^2}{2\sigma^2} \right] }{ 1-\Phi(a) }. \]
Equivalently,
\[ \ell_c(\mu,\sigma) = -n\log\sigma - \frac{(n-1)s^2+n(y-\mu)^2}{2\sigma^2} - \log\{1-\Phi(a)\} + \text{constant}. \]
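For numerical work, \(\ell_c\) can be evaluated directly. The following is a minimal sketch in Python using only the standard library; the names `log_upper_tail` and `log_cond_lik` are hypothetical. Because \(1-\Phi(a)\) underflows in floating point for large \(a\), the sketch switches (at an arbitrarily chosen cutoff \(a>30\)) to the standard asymptotic series \(1-\Phi(a)\approx \phi(a)\,a^{-1}(1-a^{-2}+3a^{-4}-15a^{-6})\). This is an implementation choice, not part of the derivation above.

```python
import math

def log_upper_tail(a: float) -> float:
    """log(1 - Phi(a)) for the standard normal, stable for large positive a."""
    if a > 30.0:
        # Asymptotic Mills-ratio series (hypothetical cutoff 30):
        # 1 - Phi(a) ~ phi(a)/a * (1 - x + 3x^2 - 15x^3), with x = 1/a^2.
        x = 1.0 / (a * a)
        return (-0.5 * a * a - 0.5 * math.log(2.0 * math.pi) - math.log(a)
                + math.log1p(-x + 3.0 * x * x - 15.0 * x ** 3))
    # erfc keeps relative accuracy in the upper tail, unlike 1 - cdf(a).
    return math.log(0.5 * math.erfc(a / math.sqrt(2.0)))

def log_cond_lik(mu: float, sigma: float, y: float, s2: float, n: int, c: float) -> float:
    """Conditional log-likelihood l_c(mu, sigma) given Y > c, up to an additive constant."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    a = math.sqrt(n) * (c - mu) / sigma          # a = sqrt(n)(c - mu)/sigma
    quad = (n - 1) * s2 + n * (y - mu) ** 2      # (n-1)s^2 + n(y - mu)^2
    return -n * math.log(sigma) - quad / (2.0 * sigma ** 2) - log_upper_tail(a)
```

For example, with \(n=10\), \(y=0.5\), \(s^2=1\) and \(c=0\), evaluating at \(\mu=0\), \(\sigma=1\) gives \(a=0\) and \(\ell_c=-11.5/2+\log 2\approx-5.057\).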
To simulate selected phase 2 trials efficiently, there is no need to generate individual observations \(X_1,\ldots,X_n\). It is enough to simulate the sufficient statistics \((Y,S^2)\) directly.
Under the normal model,
\[ Y\sim N\left(\mu,\frac{\sigma^2}{n}\right), \qquad \frac{(n-1)S^2}{\sigma^2}\sim \chi^2_{n-1}, \]
and \(Y\) is independent of \(S^2\). Since the selection event
\[ A=\{Y>c\} \]
depends only on \(Y\), conditioning on \(A\) changes only the distribution of \(Y\); the distribution of \(S^2\) is unchanged.
Therefore,
\[ \boxed{ Y\mid A \sim N\left(\mu,\frac{\sigma^2}{n}\right) \text{ truncated below at } c } \tag{1} \]
and
\[ \boxed{ \frac{(n-1)S^2}{\sigma^2}\mid A \sim \chi^2_{n-1}. } \tag{2} \]
Moreover,
\[ \boxed{ Y\mid A \quad\text{and}\quad S^2\mid A \text{ are independent.} } \tag{3} \]
Thus, selected summary statistics can be simulated as follows:
1. Draw \(Y\sim N\left(\mu,\frac{\sigma^2}{n}\right)\) conditional on \(Y>c\).
2. Independently, draw \(W\sim \chi^2_{n-1}\).
3. Set \(S^2=\frac{\sigma^2 W}{n-1}\).
Equivalently, recall that
\[ a=\frac{\sqrt n(c-\mu)}{\sigma}. \]
If \(U\sim \mathrm{Uniform}(\Phi(a),1)\), then
\[ \boxed{ Y = \mu+\frac{\sigma}{\sqrt n}\Phi^{-1}(U) } \tag{4} \]
has the desired conditional distribution \(Y\mid Y>c\). Independently,
\[ \boxed{ S^2 = \frac{\sigma^2}{n-1}\chi^2_{n-1}. } \tag{5} \]
This simulates \((Y,S^2)\mid Y>c\) directly. It avoids rejection sampling, which needs on average \(1/P(Y>c)\) candidate draws per accepted trial, and is therefore much faster when \(P(Y>c)\) is small.
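A minimal Python sketch of this sampler, assuming only the standard library (`simulate_selected` is a hypothetical name). With \(a=\sqrt n(c-\mu)/\sigma\), instead of drawing \(U\sim\mathrm{Uniform}(\Phi(a),1)\) as in (4), it draws \(V=1-U\sim\mathrm{Uniform}(0,1-\Phi(a))\) and uses \(\Phi^{-1}(U)=-\Phi^{-1}(V)\). The two are equivalent, but the second keeps precision when \(\Phi(a)\) is close to 1, which is exactly the small-\(P(Y>c)\) case. The chi-square draw uses \(\chi^2_{n-1}=\mathrm{Gamma}\{(n-1)/2,\ \text{scale } 2\}\).

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; inv_cdf is the probit function

def simulate_selected(mu, sigma, n, c, size, rng=None):
    """Draw `size` pairs (Y, S^2) from their joint distribution given Y > c."""
    rng = rng if rng is not None else random.Random()
    se = sigma / math.sqrt(n)                      # standard deviation of Y
    a = (c - mu) / se                              # a = sqrt(n)(c - mu)/sigma
    p_tail = 0.5 * math.erfc(a / math.sqrt(2.0))   # P(Y > c) = 1 - Phi(a)
    if p_tail == 0.0:
        raise ValueError("P(Y > c) underflows to 0; selection is numerically impossible")
    draws = []
    for _ in range(size):
        # V = 1 - U ~ Uniform(0, 1 - Phi(a)) and Phi^{-1}(U) = -Phi^{-1}(V): this is (4).
        v = 0.0
        while v <= 0.0:                            # inv_cdf requires 0 < v < 1
            v = p_tail * rng.random()
        y = mu + se * (-STD_NORMAL.inv_cdf(v))
        # (5): (n - 1) S^2 / sigma^2 ~ chi^2_{n-1} = Gamma(shape=(n-1)/2, scale=2),
        # drawn independently of Y.
        w = rng.gammavariate((n - 1) / 2.0, 2.0)
        draws.append((y, sigma ** 2 * w / (n - 1)))
    return draws
```

For example, `simulate_selected(0.0, 1.0, 25, 0.33, 10000)` draws 10,000 selected trials with true effect 0 and threshold 0.33; every draw is accepted, however small \(P(Y>c)\) is.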
Using the method derived in the sections below, I simulate the following scenarios to assess the performance of the MCLE:
The results are summarized below.
A few observations:
Left: The MCLE is called ill-posed if it is below \(-10\). The proportion of ill-posed MCLEs increases dramatically as the true effect moves away from the go threshold \(c = 0.33\) toward the null. Intuitively, given that a go decision is made, the closer the true effect is to the null, the more likely the observed effect \(Y\) is to lie just above the threshold: when the true effect is below \(c\), the conditional density of \(Y\) given \(Y>c\) is highest at \(c\) and decreases above it, so values of \(Y\) near \(c\) are more likely than values far above it, and such values lead to an ill-posed MCLE.
Middle: As expected, the observed effect \(Y\) is strongly biased upward when the true effect is close to the null.
Right: A few conclusions can be drawn about the MCLE.
We now show how to compute the MCLE from the sufficient statistics \((Y, S^2)\). Let
\[ \lambda(a)=\frac{\phi(a)}{1-\Phi(a)} \]
be the inverse Mills ratio.
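Direct evaluation of \(\lambda(a)\) fails for large \(a\), where both \(\phi(a)\) and \(1-\Phi(a)\) underflow. The Python sketch below (standard library only; `inv_mills` is a hypothetical name and the cutoff \(a>30\) is an arbitrary choice) computes \(1-\Phi(a)\) with `math.erfc`, which keeps relative accuracy in the upper tail, and beyond the cutoff falls back on the standard large-\(a\) series \(\lambda(a)=a+a^{-1}-2a^{-3}+10a^{-5}-74a^{-7}+\cdots\).

```python
import math

def inv_mills(a: float) -> float:
    """lambda(a) = phi(a) / (1 - Phi(a)) for the standard normal."""
    if a > 30.0:
        # Large-a series lambda(a) = a + 1/a - 2/a^3 + 10/a^5 - 74/a^7 + ...,
        # used where phi(a) and 1 - Phi(a) approach floating-point underflow.
        x = 1.0 / (a * a)
        return a + (1.0 - 2.0 * x + 10.0 * x * x - 74.0 * x ** 3) / a
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    upper = 0.5 * math.erfc(a / math.sqrt(2.0))   # 1 - Phi(a), accurate in the upper tail
    return phi / upper
```

A quick sanity check: \(\lambda(0)=\phi(0)/(1/2)=\sqrt{2/\pi}\), and \(\lambda(a)>a\) for every \(a\), since \(\lambda(a)=E(Z\mid Z>a)\) for \(Z\sim N(0,1)\).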
Setting the partial derivatives of \(\ell_c\) with respect to \(\mu\) and \(\sigma\) to zero gives
\[ \boxed{ \frac{\sqrt n(y-\mu)}{\sigma} = \lambda(a) } \tag{6} \]
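and, after using (6) to replace \(n(y-\mu)^2/\sigma^2\) by \(\lambda(a)^2\) in the \(\sigma\)-equation,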
\[ \boxed{ \sigma^2 = \frac{(n-1)s^2}{n+a\lambda(a)-\lambda(a)^2} } \tag{7} \]
Therefore, once \(a\) is known, \(\mu\) and \(\sigma^2\) can be recovered directly.
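The stationarity conditions can be checked numerically. In the sketch below (Python, standard library only; all function names are hypothetical, and the direct formula for \(\lambda\) assumes moderate \(a\)), `score` codes the analytic partial derivatives \(\partial\ell_c/\partial\mu=n(y-\mu)/\sigma^2-\sqrt n\,\lambda(a)/\sigma\) and \(\partial\ell_c/\partial\sigma=-n/\sigma+\{(n-1)s^2+n(y-\mu)^2\}/\sigma^3-a\lambda(a)/\sigma\), which follow from \(\partial a/\partial\mu=-\sqrt n/\sigma\) and \(\partial a/\partial\sigma=-a/\sigma\). Setting the first to zero gives (6); setting the second to zero and substituting (6) gives (7). `numeric_score` cross-checks them with central finite differences.

```python
import math

def inv_mills(a):
    """lambda(a) = phi(a) / (1 - Phi(a)); direct formula, adequate for moderate a."""
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    return phi / (0.5 * math.erfc(a / math.sqrt(2.0)))

def log_cond_lik(mu, sigma, y, s2, n, c):
    """l_c(mu, sigma) up to a constant (direct tail formula, moderate a only)."""
    a = math.sqrt(n) * (c - mu) / sigma
    quad = (n - 1) * s2 + n * (y - mu) ** 2
    return (-n * math.log(sigma) - quad / (2.0 * sigma ** 2)
            - math.log(0.5 * math.erfc(a / math.sqrt(2.0))))

def score(mu, sigma, y, s2, n, c):
    """Analytic (d l_c / d mu, d l_c / d sigma); their zeros give (6) and (7)."""
    a = math.sqrt(n) * (c - mu) / sigma
    lam = inv_mills(a)
    quad = (n - 1) * s2 + n * (y - mu) ** 2
    d_mu = n * (y - mu) / sigma ** 2 - math.sqrt(n) * lam / sigma
    d_sigma = -n / sigma + quad / sigma ** 3 - a * lam / sigma
    return d_mu, d_sigma

def numeric_score(mu, sigma, y, s2, n, c, h=1e-6):
    """Central finite-difference approximation of the score, for cross-checking."""
    def f(m, s):
        return log_cond_lik(m, s, y, s2, n, c)
    return ((f(mu + h, sigma) - f(mu - h, sigma)) / (2.0 * h),
            (f(mu, sigma + h) - f(mu, sigma - h)) / (2.0 * h))

# Compare both at an arbitrary illustrative point (mu, sigma) = (0.1, 0.9).
print(score(0.1, 0.9, 0.5, 1.0, 20, 0.33))
print(numeric_score(0.1, 0.9, 0.5, 1.0, 20, 0.33))
```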
Equations (6) and (7) involve two unknowns, \(\mu\) and \(\sigma^2\). We now show that the MCLE can instead be obtained by solving a one-dimensional equation in \(a\).
From
\[ a=\frac{\sqrt n(c-\mu)}{\sigma} \]
and the first score equation,
\[ \lambda(a)=\frac{\sqrt n(y-\mu)}{\sigma}, \]
subtracting gives
\[ \lambda(a)-a = \frac{\sqrt n(y-c)}{\sigma}. \]
Thus
\[ \sigma^2 = \frac{n(y-c)^2}{\{\lambda(a)-a\}^2}. \]
Combining this with the score-based expression for \(\sigma^2\) yields the one-dimensional equation
\[ \boxed{ n(y-c)^2\{n+a\lambda(a)-\lambda(a)^2\} - (n-1)s^2\{\lambda(a)-a\}^2 = 0 } \tag{8} \]
Let \(\widehat{a}\) be a solution of (8). Then the conditional MLEs are
\[ \boxed{ \widehat{\sigma}^2 = \frac{(n-1)s^2}{n+\widehat{a}\lambda(\widehat{a})-\lambda(\widehat{a})^2} } \tag{9} \]
Equivalently,
\[ \boxed{ \widehat{\sigma}^2 = \frac{n(y-c)^2}{\{\lambda(\widehat{a})-\widehat{a}\}^2} } \tag{10} \]
The corresponding estimator of \(\mu\) is
\[ \boxed{ \widehat{\mu} = c-\frac{\widehat{a}\widehat{\sigma}}{\sqrt n} } \tag{11} \]
Equivalently,
\[ \boxed{ \widehat{\mu} = y-\frac{\widehat{\sigma}}{\sqrt n}\lambda(\widehat{a}) } \tag{12} \]
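Equation (8) is straightforward to solve numerically. The Python sketch below (standard library only; `mills_gap` and `mcle` are hypothetical names) works with \(m(a)=\lambda(a)-a\) and writes \(n+a\lambda(a)-\lambda(a)^2=n-a\,m(a)-m(a)^2\) to avoid cancellation when \(a\) is large; for \(a>30\) it uses the standard asymptotic series for \(m(a)\). Dividing (8) by \(n(y-c)^2\) gives \(g(a)=n-a\,m(a)-(r+1)m(a)^2\) with \(r=(n-1)s^2/\{n(y-c)^2\}\). Since \(g(a)\to-\infty\) as \(a\to-\infty\) and \(g(a)\to n-1>0\) as \(a\to+\infty\), bisection on an expanding bracket finds a root; the sketch does not check whether that root is unique. It then applies (9) and (11).

```python
import math

def mills_gap(a: float) -> float:
    """m(a) = lambda(a) - a, where lambda(a) = phi(a) / (1 - Phi(a))."""
    if a > 30.0:
        # Large-a asymptotic series; avoids underflow of phi and 1 - Phi
        # and the cancellation in lambda(a) - a.
        x = 1.0 / (a * a)
        return (1.0 - 2.0 * x + 10.0 * x * x - 74.0 * x ** 3) / a
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    upper = 0.5 * math.erfc(a / math.sqrt(2.0))   # 1 - Phi(a)
    return phi / upper - a

def mcle(y: float, s2: float, n: int, c: float, tol: float = 1e-12, max_iter: int = 500):
    """Return (mu_hat, sigma2_hat, a_hat): bisection on (8), then (9) and (11)."""
    if not (y > c and s2 > 0.0 and n > 1):
        raise ValueError("requires y > c, s2 > 0 and n > 1")
    r = (n - 1) * s2 / (n * (y - c) ** 2)

    def g(a: float) -> float:
        # Left-hand side of (8) divided by n (y - c)^2, written with lambda = a + m.
        m = mills_gap(a)
        return n - a * m - (r + 1.0) * m * m

    lo, hi = -1.0, 1.0
    while g(lo) >= 0.0:   # g(a) ~ n - r a^2 -> -inf as a -> -inf
        lo *= 2.0
    while g(hi) <= 0.0:   # g(a) -> n - 1 > 0 as a -> +inf
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * max(1.0, abs(mid)):
            break
    a_hat = 0.5 * (lo + hi)
    m = mills_gap(a_hat)
    sigma2_hat = (n - 1) * s2 / (n - a_hat * m - m * m)   # (9)
    mu_hat = c - a_hat * math.sqrt(sigma2_hat / n)         # (11)
    return mu_hat, sigma2_hat, a_hat
```

For example, `mcle(0.45, 0.25, 30, 0.33)` returns the MCLE for an illustrative trial whose observed mean clears the threshold \(c=0.33\) by 0.12. At the returned root, (6), (10) and (12) hold up to the bisection tolerance, which is a convenient self-check.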
Now consider the case where the observed phase 2 mean barely exceeds the go/no-go threshold:
\[ y-c\downarrow 0. \]
Then
\[ r= \frac{(n-1)s^2}{n(y-c)^2} \to\infty. \]
Let
\[ m(a)=\lambda(a)-a. \]
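Dividing (8) by \(n(y-c)^2\) and substituting \(\lambda(a)=a+m(a)\), so that \(a\lambda(a)-\lambda(a)^2=-a\,m(a)-m(a)^2\), shows that the one-dimensional equation is equivalent to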
\[ \boxed{ (r+1)m(a)^2+a m(a)-n=0 } \tag{16} \]
This form makes the boundary behavior transparent.
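First, \(\widehat{a}\) cannot remain bounded as \(r\to\infty\). Since \(\lambda(a)>a\) for every \(a\), \(m(a)\) is positive and continuous, so if \(a\) stayed bounded, \(m(a)\) would be bounded away from zero; the term \((r+1)m(a)^2\) would then diverge while \(a\,m(a)-n\) stays bounded, and (16) could not hold. Therefore \(|\widehat{a}|\to\infty\).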
It cannot diverge to \(-\infty\) either, because when \(a\to-\infty\),
\[ \lambda(a)\to 0, \qquad m(a)=\lambda(a)-a\sim -a\to\infty, \]
so \((r+1)m(a)^2\sim (r+1)a^2\) dominates \(a\,m(a)\sim -a^2\) and the left-hand side of (16) diverges. Hence the relevant solution satisfies
\[ \boxed{ \widehat{a}\to+\infty } \tag{17} \]
For large positive \(a\),
\[ \lambda(a)-a = \frac{1}{a}+O(a^{-3}). \]
Therefore,
\[ m(a)\sim \frac{1}{a}, \qquad a m(a)\to 1. \]
Substituting these approximations into (16) gives
\[ \frac{r+1}{a^2}+1-n\approx 0. \]
Thus
\[ \widehat{a}^2 \sim \frac{r}{n-1}. \]
Using the definition of \(r\),
\[ \boxed{ \widehat{a} \sim \frac{s}{\sqrt n(y-c)} } \tag{18} \]
Now,
\[ \widehat{\sigma} = \frac{\sqrt n(y-c)}{\lambda(\widehat{a})-\widehat{a}}. \]
Since
\[ \lambda(\widehat{a})-\widehat{a} \sim \frac{1}{\widehat{a}}, \]
we obtain
\[ \widehat{\sigma} \sim \sqrt n(y-c)\widehat{a} \sim s. \]
Therefore,
\[ \boxed{ \widehat{\sigma}^2\to s^2 } \tag{19} \]
Finally,
\[ \widehat{\mu} = c-\frac{\widehat{a}\widehat{\sigma}}{\sqrt n}. \]
Using the asymptotic expression for \(\widehat{a}\) and \(\widehat{\sigma}\to s\), we obtain
\[ \boxed{ \widehat{\mu} \sim c-\frac{s^2}{n(y-c)} } \tag{20} \]
Hence,
\[ \boxed{ \widehat{\mu}\to-\infty \qquad \text{as } y\to c^+ } \tag{21} \]
In words, when the observed phase 2 mean barely passes the threshold, the conditional MLE attributes most of the apparent success to the conditioning event \(Y>c\). The estimate of \(\sigma^2\) remains close to the observed sample variance, but the estimate of \(\mu\) is pulled sharply downward and has no finite limit as \(y\downarrow c\).
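The boundary behavior can also be seen numerically. The Python sketch below (standard library only; function names are hypothetical) solves (16) by bisection for shrinking gaps \(y-c\), recovers \(\widehat\sigma\) and \(\widehat\mu\) from (10) and (11), and prints them next to the asymptotic forms (18) and (20). The large-\(a\) series for \(m(a)\) matters here, because \(\widehat a\) grows like \(1/(y-c)\). The inputs \(s^2=1\) and \(n=30\) are illustrative assumptions, and the sketch does not check whether the root it finds is unique.

```python
import math

def mills_gap(a: float) -> float:
    """m(a) = lambda(a) - a, with a large-a asymptotic branch."""
    if a > 30.0:
        x = 1.0 / (a * a)
        return (1.0 - 2.0 * x + 10.0 * x * x - 74.0 * x ** 3) / a
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    return phi / (0.5 * math.erfc(a / math.sqrt(2.0))) - a

def solve_16(r: float, n: int, iters: int = 200) -> float:
    """A root of (16), (r + 1) m(a)^2 + a m(a) - n = 0, by bisection."""
    def f(a: float) -> float:
        m = mills_gap(a)
        return (r + 1.0) * m * m + a * m - n
    lo, hi = -1.0, 1.0
    while f(lo) <= 0.0:   # f -> +inf as a -> -inf
        lo *= 2.0
    while f(hi) >= 0.0:   # f -> 1 - n < 0 as a -> +inf
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def boundary_rows(s2: float, n: int, c: float, gaps):
    """For each gap d = y - c: (d, a_hat, (18), sigma2_hat, mu_hat, (20))."""
    rows = []
    for d in gaps:
        r = (n - 1) * s2 / (n * d * d)
        a_hat = solve_16(r, n)
        sigma_hat = math.sqrt(n) * d / mills_gap(a_hat)    # (10)
        mu_hat = c - a_hat * sigma_hat / math.sqrt(n)      # (11)
        rows.append((d, a_hat, math.sqrt(s2) / (math.sqrt(n) * d),
                     sigma_hat ** 2, mu_hat, c - s2 / (n * d)))
    return rows

for d, a_hat, a_asym, s2_hat, mu_hat, mu_asym in boundary_rows(1.0, 30, 0.33, [1e-1, 1e-2, 1e-3]):
    print(f"y-c={d:g}: a_hat={a_hat:.3f} (18: {a_asym:.3f}), "
          f"sigma2_hat={s2_hat:.4f}, mu_hat={mu_hat:.3f} (20: {mu_asym:.3f})")
```

As \(y-c\) shrinks, \(\widehat a\) and \(\widehat\mu\) should track (18) and (20) increasingly closely, while \(\widehat\sigma^2\) settles near \(s^2\), in line with (19).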