In this note, we show how to compute the conditional maximum likelihood estimator (MCLE) of \((\mu,\sigma^2)\) from normal summary statistics given a go/no-go selection event, how to simulate the selected summary statistics directly without rejection sampling, and how the MCLE behaves when the observed mean barely exceeds the go/no-go threshold.

1. Problem setup

Here we consider the one-sample problem; the result can be applied to the two-sample problem, such as a two-arm randomized trial, by replacing \(n\) with \(n/2\).

Suppose

\[ X_1,\ldots,X_n \overset{iid}{\sim} N(\mu,\sigma^2). \]

Only the summary statistics are available:

\[ Y=\bar X,\qquad S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2. \]

Let the observed values be

\[ y=\bar x,\qquad s^2=S^2_{\text{obs}}. \]

The phase 2 trial moves to phase 3 only if

\[ Y>c, \]

where \(c\) is the go/no-go threshold. The objective is to estimate \((\mu,\sigma^2)\) conditional on the selection event

\[ A=\{Y>c\}. \]

Assume throughout that

\[ y>c,\qquad s^2>0,\qquad n>1. \]

2. Conditional likelihood based on \((\bar X,S^2)\)

Under the normal model,

\[ Y\sim N\left(\mu,\frac{\sigma^2}{n}\right), \qquad \frac{(n-1)S^2}{\sigma^2}\sim \chi^2_{n-1}, \]

and \(Y\) and \(S^2\) are independent.

Because the selection event \(A=\{Y>c\}\) depends only on \(Y\), the conditional likelihood based on \((y,s^2)\) is

\[ L_c(\mu,\sigma\mid y,s^2,Y>c) = \frac{ f_Y(y\mid \mu,\sigma) f_{S^2}(s^2\mid \sigma) }{ P_{\mu,\sigma}(Y>c) }. \]

Define

\[ a=\frac{\sqrt n(c-\mu)}{\sigma}. \]

Then

\[ P_{\mu,\sigma}(Y>c)=1-\Phi(a). \]

Ignoring constants not involving \((\mu,\sigma)\),

\[ L_c(\mu,\sigma) \propto \frac{ \sigma^{-n} \exp\left[ -\frac{(n-1)s^2+n(y-\mu)^2}{2\sigma^2} \right] }{ 1-\Phi(a) }. \]

Equivalently,

\[ \ell_c(\mu,\sigma) = -n\log\sigma - \frac{(n-1)s^2+n(y-\mu)^2}{2\sigma^2} - \log\{1-\Phi(a)\} + \text{constant}. \]
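
For numerical work it is convenient to have \(\ell_c\) in code. Below is a minimal Python sketch (the function name `cond_loglik` and the reliance on SciPy are implementation choices of mine, not part of the derivation); `norm.logsf` evaluates \(\log\{1-\Phi(\cdot)\}\) stably even when the selection probability is tiny.

```python
import numpy as np
from scipy.stats import norm

def cond_loglik(mu, sigma, y, s2, n, c):
    """Conditional log-likelihood l_c(mu, sigma) given the selection event Y > c,
    based on the summary statistics (y, s2), up to an additive constant."""
    a = np.sqrt(n) * (c - mu) / sigma
    quad = ((n - 1) * s2 + n * (y - mu) ** 2) / (2 * sigma ** 2)
    # norm.logsf(a) = log{1 - Phi(a)}, computed stably even for large a
    return -n * np.log(sigma) - quad - norm.logsf(a)
```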

3. Fast simulation of summary statistics conditional on \(A=\{Y>c\}\)

To simulate selected phase 2 trials efficiently, there is no need to generate individual observations \(X_1,\ldots,X_n\). It is enough to simulate the sufficient statistics \((Y,S^2)\) directly.

Under the normal model,

\[ Y\sim N\left(\mu,\frac{\sigma^2}{n}\right), \qquad \frac{(n-1)S^2}{\sigma^2}\sim \chi^2_{n-1}, \]

and \(Y\) is independent of \(S^2\). Since the selection event

\[ A=\{Y>c\} \]

depends only on \(Y\), conditioning on \(A\) only changes the distribution of \(Y\). The conditional distribution of \(S^2\) remains unchanged.

Therefore,

\[ \boxed{ Y\mid A \sim N\left(\mu,\frac{\sigma^2}{n}\right) \text{ truncated below at } c } \tag{1} \]

and

\[ \boxed{ \frac{(n-1)S^2}{\sigma^2}\mid A \sim \chi^2_{n-1}. } \tag{2} \]

Moreover,

\[ \boxed{ Y\mid A \quad\text{and}\quad S^2\mid A \text{ are independent.} } \tag{3} \]

Thus, selected summary statistics can be simulated as follows:

  1. Simulate

\[ Y \sim N\left(\mu,\frac{\sigma^2}{n}\right) \]

conditional on \(Y>c\).

  2. Independently simulate

\[ W\sim \chi^2_{n-1}. \]

  3. Set

\[ S^2=\frac{\sigma^2 W}{n-1}. \]

Equivalently, let

\[ \alpha=\frac{\sqrt n(c-\mu)}{\sigma}. \]

If \(U\sim \mathrm{Uniform}(\Phi(\alpha),1)\), then

\[ \boxed{ Y = \mu+\frac{\sigma}{\sqrt n}\Phi^{-1}(U) } \tag{4} \]

has the desired conditional distribution \(Y\mid Y>c\). Independently,

\[ \boxed{ S^2 = \frac{\sigma^2 W}{n-1},\qquad W\sim \chi^2_{n-1}. } \tag{5} \]

This directly simulates \((Y,S^2)\mid Y>c\). It avoids rejection sampling and is much faster when \(P(Y>c)\) is small.
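
A minimal Python sketch of (4)-(5) follows; the function name `simulate_selected_summary` and the use of NumPy/SciPy are illustrative choices rather than part of the derivation.

```python
import numpy as np
from scipy.stats import norm

def simulate_selected_summary(mu, sigma, n, c, n_trials, seed=None):
    """Draw (Y, S^2) for selected trials, i.e. conditional on Y > c,
    via the inverse-CDF construction (4)-(5); no rejection sampling."""
    rng = np.random.default_rng(seed)
    alpha = np.sqrt(n) * (c - mu) / sigma
    # (4): U ~ Uniform(Phi(alpha), 1) and Y = mu + (sigma / sqrt(n)) * Phi^{-1}(U)
    u = rng.uniform(norm.cdf(alpha), 1.0, size=n_trials)
    y = mu + sigma / np.sqrt(n) * norm.ppf(u)
    # (5): independently, S^2 = sigma^2 * W / (n - 1) with W ~ chi^2_{n-1}
    w = rng.chisquare(n - 1, size=n_trials)
    s2 = sigma ** 2 * w / (n - 1)
    return y, s2
```

When \(1-\Phi(\alpha)\) is extremely small, it is numerically safer to draw \(V\sim\mathrm{Uniform}(0,\,1-\Phi(\alpha))\) and set \(Y=\mu+\tfrac{\sigma}{\sqrt n}\Phi^{-1}(1-V)\), evaluating both the tail probability and its inverse on the survival-function scale (`norm.sf` / `norm.isf`).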

Using the simulation scheme above and the MCLE derived in the sections below, I assess the MCLE's performance by simulation across a range of scenarios.

4. Score equations

Now we show how to obtain the MCLE from the sufficient statistics \((Y, S^2)\). Let

\[ \lambda(a)=\frac{\phi(a)}{1-\Phi(a)} \]

be the inverse Mills ratio.
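
Evaluating \(\lambda(a)\) literally as \(\phi(a)/\{1-\Phi(a)\}\) fails numerically for large positive \(a\), where both numerator and denominator underflow; this is exactly the regime that matters in the boundary analysis later. A stable log-scale version (a small Python sketch; the name `inv_mills` is mine) is:

```python
import numpy as np
from scipy.stats import norm

def inv_mills(a):
    """Inverse Mills ratio lambda(a) = phi(a) / {1 - Phi(a)},
    evaluated on the log scale so it stays accurate for large positive a."""
    return np.exp(norm.logpdf(a) - norm.logsf(a))
```

The sketches below use the same log-scale expression.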

Differentiating \(\ell_c\) with respect to \(\mu\) and \(\sigma\) and setting the score equations to zero gives

\[ \boxed{ \frac{\sqrt n(y-\mu)}{\sigma} = \lambda(a) } \tag{6} \]

and

\[ \boxed{ \sigma^2 = \frac{(n-1)s^2}{n+a\lambda(a)-\lambda(a)^2} } \tag{7} \]

Therefore, once \(a\) is known, \(\mu\) and \(\sigma^2\) can be recovered directly.

5. One-dimensional equation for \(a\)

Equations (6) and (7) jointly involve the two unknown parameters \((\mu, \sigma^2)\). Here we show that the MCLE can instead be obtained by solving a single one-dimensional equation in \(a\).

From

\[ a=\frac{\sqrt n(c-\mu)}{\sigma} \]

and the first score equation,

\[ \lambda(a)=\frac{\sqrt n(y-\mu)}{\sigma}, \]

subtracting gives

\[ \lambda(a)-a = \frac{\sqrt n(y-c)}{\sigma}. \]

Thus

\[ \sigma^2 = \frac{n(y-c)^2}{\{\lambda(a)-a\}^2}. \]

Combining this with the score-based expression for \(\sigma^2\) yields the one-dimensional equation

\[ \boxed{ n(y-c)^2\{n+a\lambda(a)-\lambda(a)^2\} - (n-1)s^2\{\lambda(a)-a\}^2 = 0 } \tag{8} \]

Let \(\widehat{a}\) be the solution. Then the conditional MLEs are

\[ \boxed{ \widehat{\sigma}^2 = \frac{(n-1)s^2}{n+\widehat{a}\lambda(\widehat{a})-\lambda(\widehat{a})^2} } \tag{9} \]

Equivalently,

\[ \boxed{ \widehat{\sigma}^2 = \frac{n(y-c)^2}{\{\lambda(\widehat{a})-\widehat{a}\}^2} } \tag{10} \]

The corresponding estimator of \(\mu\) is

\[ \boxed{ \widehat{\mu} = c-\frac{\widehat{a}\widehat{\sigma}}{\sqrt n} } \tag{11} \]

Equivalently,

\[ \boxed{ \widehat{\mu} = y-\frac{\widehat{\sigma}}{\sqrt n}\lambda(\widehat{a}) } \tag{12} \]
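
Equations (8)-(11) translate directly into a short numerical routine. The sketch below is mine (the function name, SciPy's `brentq`, and the bracket-expansion step are implementation choices, not part of the derivation): it solves (8) for \(\widehat a\) and then recovers \((\widehat\mu,\widehat\sigma^2)\).

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def mcle(y, s2, n, c):
    """Conditional MLE of (mu, sigma^2) given selection Y > c,
    from the summary statistics (y, s2). Requires y > c, s2 > 0, n > 1."""
    def lam(a):
        # inverse Mills ratio lambda(a), on the log scale for stability
        return np.exp(norm.logpdf(a) - norm.logsf(a))

    def g(a):
        # left-hand side of equation (8)
        la = lam(a)
        return n * (y - c) ** 2 * (n + a * la - la ** 2) - (n - 1) * s2 * (la - a) ** 2

    # bracket the root: g -> -infinity as a -> -infinity, and g > 0 for large positive a
    lo, hi = -1.0, 1.0
    while g(lo) >= 0:
        lo *= 2.0
    while g(hi) <= 0:
        hi *= 2.0
    a_hat = brentq(g, lo, hi)

    la = lam(a_hat)
    sigma2_hat = (n - 1) * s2 / (n + a_hat * la - la ** 2)   # equation (9)
    mu_hat = c - a_hat * np.sqrt(sigma2_hat / n)             # equation (11)
    return mu_hat, sigma2_hat, a_hat
```

Equation (12) then serves as a cheap consistency check: \(y-\widehat\sigma\lambda(\widehat a)/\sqrt n\) should agree with (11) up to numerical tolerance.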

6. Behavior as \(y\to c^+\)

Now consider the case where the observed phase 2 mean barely exceeds the go/no-go threshold:

\[ y-c\downarrow 0. \]

Then

\[ r= \frac{(n-1)s^2}{n(y-c)^2} \to\infty. \]

Let

\[ m(a)=\lambda(a)-a. \]

Dividing equation (8) by \(n(y-c)^2\) and substituting \(\lambda(a)=a+m(a)\) shows that the one-dimensional equation is equivalent to

\[ \boxed{ (r+1)m(a)^2+a m(a)-n=0 } \tag{16} \]

This form makes the boundary behavior transparent.

First, \(\widehat{a}\) cannot remain bounded as \(r\to\infty\). If \(a\) stayed in a bounded set, then \(m(a)=\lambda(a)-a\), which is strictly positive for every \(a\), would be bounded away from zero, so the term \((r+1)m(a)^2\) in (16) would diverge while \(a\,m(a)-n\) remained bounded, and (16) could not hold. Therefore, any solution must diverge.

It cannot diverge to \(-\infty\), because when \(a\to-\infty\),

\[ \lambda(a)\to 0, \qquad m(a)=\lambda(a)-a\to\infty, \]

so \((r+1)m(a)^2\approx (r+1)a^2\) dominates \(a\,m(a)\approx -a^2\) and the left-hand side of (16) again diverges. Hence the relevant solution satisfies

\[ \boxed{ \widehat{a}\to+\infty } \tag{17} \]

For large positive \(a\),

\[ \lambda(a)-a = \frac{1}{a}+O(a^{-3}). \]

Therefore,

\[ m(a)\sim \frac{1}{a}, \qquad a m(a)\to 1. \]

Substituting this into the equation above gives

\[ \frac{r+1}{a^2}+1-n\approx 0. \]

Thus

\[ \widehat{a}^2 \sim \frac{r}{n-1}. \]

Using the definition of \(r\),

\[ \boxed{ \widehat{a} \sim \frac{s}{\sqrt n(y-c)} } \tag{18} \]

Now,

\[ \widehat{\sigma} = \frac{\sqrt n(y-c)}{\lambda(\widehat{a})-\widehat{a}}. \]

Since

\[ \lambda(\widehat{a})-\widehat{a} \sim \frac{1}{\widehat{a}}, \]

we obtain

\[ \widehat{\sigma} \sim \sqrt n(y-c)\widehat{a} \sim s. \]

Therefore,

\[ \boxed{ \widehat{\sigma}^2\to s^2 } \tag{19} \]

Finally,

\[ \widehat{\mu} = c-\frac{\widehat{a}\widehat{\sigma}}{\sqrt n}. \]

Using the asymptotic expression for \(\widehat{a}\) and \(\widehat{\sigma}\to s\), we obtain

\[ \boxed{ \widehat{\mu} \sim c-\frac{s^2}{n(y-c)} } \tag{20} \]

Hence,

\[ \boxed{ \widehat{\mu}\to-\infty \qquad \text{as } y\to c^+ } \tag{21} \]

In words, when the observed phase 2 mean barely passes the threshold, the conditional MLE attributes most of the apparent success to the conditioning event \(Y>c\). The estimate of \(\sigma^2\) remains close to the observed sample variance, but the estimate of \(\mu\) is pulled sharply downward and has no finite limit as \(y\downarrow c\).
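
As a numerical illustration of (18)-(21), one can shrink \(y-c\) while holding \(s^2\) and \(n\) fixed and compare the exact MCLE with the asymptotic approximations. The sketch below is self-contained; the inputs \(s^2=1\), \(n=40\), \(c=0\) are hypothetical values chosen only for illustration, and the root-finding step mirrors the `mcle` sketch above.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def lam(a):
    # inverse Mills ratio, on the log scale for stability
    return np.exp(norm.logpdf(a) - norm.logsf(a))

# hypothetical summary-statistic inputs, for illustration only
s2, n, c = 1.0, 40, 0.0

for gap in (0.5, 0.1, 0.02, 0.004):
    y = c + gap

    def g(a):
        # left-hand side of equation (8) at the current y
        la = lam(a)
        return n * (y - c) ** 2 * (n + a * la - la ** 2) - (n - 1) * s2 * (la - a) ** 2

    lo, hi = -1.0, 1.0
    while g(lo) >= 0:
        lo *= 2.0
    while g(hi) <= 0:
        hi *= 2.0
    a_hat = brentq(g, lo, hi)

    la = lam(a_hat)
    sigma2_hat = (n - 1) * s2 / (n + a_hat * la - la ** 2)   # equation (9)
    mu_hat = c - a_hat * np.sqrt(sigma2_hat / n)             # equation (11)
    print(f"y-c={gap:6.3f}  a_hat={a_hat:8.2f} (approx {np.sqrt(s2)/(np.sqrt(n)*gap):8.2f})  "
          f"mu_hat={mu_hat:9.3f} (approx {c - s2/(n*gap):9.3f})  sigma2_hat={sigma2_hat:.3f}")
```

As \(y-c\) shrinks, the printed \(\widehat\sigma^2\) should approach \(s^2\) while \(\widehat\mu\) tracks the increasingly negative approximation (20).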