Two-Sample MCLE Adjustment After Positive Phase 2 Selection Under Unequal Variance

In this note, we show that

The maximum conditional likelihood estimate (MCLE) can be derived from the two-sample sufficient statistics alone;
Under unequal variances, the likelihood depends on \((Y,S_T^2,S_C^2)\) rather than \((Y,S_p^2)\);
Unlike the equal-variance case, the MCLE generally cannot be reduced to a single one-dimensional equation in \(a\);
A fast simulation method still exists because the selection event depends only on \(Y\);
The MCLE of the treatment effect \(\delta\) diverges to \(-\infty\) as the observed phase 2 treatment effect approaches the go/no-go threshold from above;
MCLE is more likely to be ill-posed when the true treatment effect is small.

1. Problem setup

Here we consider the two-sample problem without assuming equal variances.

Suppose

\[ X_{Ti}\overset{iid}{\sim}N(\mu_T,\sigma_T^2), \qquad i=1,\ldots,n_T, \]

and

\[ X_{Cj}\overset{iid}{\sim}N(\mu_C,\sigma_C^2), \qquad j=1,\ldots,n_C. \]

The treatment effect is

\[ \delta=\mu_T-\mu_C. \]

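Only the two-sample summary statistics are available. Define

\[ Y=\bar X_T-\bar X_C, \]

and let \(S_T^2\) and \(S_C^2\) denote the sample variances of the treatment and control arms, computed with divisors \(n_T-1\) and \(n_C-1\).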

Let the observed values be

\[ y=\bar x_T-\bar x_C, \qquad s_T^2=S_{T,\text{obs}}^2, \qquad s_C^2=S_{C,\text{obs}}^2. \]

Define

\[ \nu_T=n_T-1, \qquad \nu_C=n_C-1. \]

The phase 2 trial moves to phase 3 only if

\[ Y>c, \]

where \(c\) is the go/no-go threshold. The objective is to estimate

\[ (\delta,\sigma_T^2,\sigma_C^2) \]

conditional on the selection event

\[ A=\{Y>c\}. \]

Assume throughout that

\[ y>c,\qquad s_T^2>0,\qquad s_C^2>0,\qquad n_T>1,\qquad n_C>1. \]

2. Conditional likelihood based on \((Y,S_T^2,S_C^2)\)

Under the two-sample normal model,

\[ Y\sim N(\delta,V), \]

where

\[ V= \frac{\sigma_T^2}{n_T} + \frac{\sigma_C^2}{n_C}. \]

Also,

\[ \frac{\nu_T S_T^2}{\sigma_T^2}\sim\chi^2_{\nu_T}, \qquad \frac{\nu_C S_C^2}{\sigma_C^2}\sim\chi^2_{\nu_C}. \]

Moreover,

\[ Y,\quad S_T^2,\quad S_C^2 \]

are mutually independent.

Because the selection event \(A=\{Y>c\}\) depends only on \(Y\), the conditional likelihood based on \((y,s_T^2,s_C^2)\) is

\[ L_c(\delta,\sigma_T^2,\sigma_C^2\mid y,s_T^2,s_C^2,Y>c) = \frac{ f_Y(y\mid \delta,\sigma_T^2,\sigma_C^2) f_{S_T^2}(s_T^2\mid \sigma_T^2) f_{S_C^2}(s_C^2\mid \sigma_C^2) }{ P_{\delta,\sigma_T,\sigma_C}(Y>c) }. \]

Define

\[ a=\frac{c-\delta}{\sqrt V}. \]

Then

\[ P_{\delta,\sigma_T,\sigma_C}(Y>c)=1-\Phi(a). \]

Ignoring constants not involving \((\delta,\sigma_T^2,\sigma_C^2)\),

\[ \boxed{ L_c(\delta,\sigma_T^2,\sigma_C^2) \propto \frac{ V^{-1/2} \exp\left[ -\frac{(y-\delta)^2}{2V} \right] (\sigma_T^2)^{-\nu_T/2} \exp\left[ -\frac{\nu_T s_T^2}{2\sigma_T^2} \right] (\sigma_C^2)^{-\nu_C/2} \exp\left[ -\frac{\nu_C s_C^2}{2\sigma_C^2} \right] }{ 1-\Phi(a) }. } \tag{1} \]

Equivalently,

\[ \boxed{ \begin{aligned} \ell_c(\delta,\sigma_T^2,\sigma_C^2) =& -\frac12\log V -\frac{(y-\delta)^2}{2V} \\ &-\frac{\nu_T}{2}\log\sigma_T^2 -\frac{\nu_Ts_T^2}{2\sigma_T^2} \\ &-\frac{\nu_C}{2}\log\sigma_C^2 -\frac{\nu_Cs_C^2}{2\sigma_C^2} \\ &-\log\{1-\Phi(a)\} +\text{constant}. \end{aligned} } \tag{2} \]

This is the main difference from the equal-variance case: there is no single pooled variance parameter. Instead, the likelihood depends separately on \(\sigma_T^2\) and \(\sigma_C^2\).
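As an illustration, equation (2) can be evaluated directly from the three summary statistics. The following is a minimal Python sketch, not the R script that accompanies this note; the function names `log_upper_tail` and `cond_loglik` and the argument names are hypothetical, and the additive constant in (2) is dropped.

```python
import math


def log_upper_tail(a):
    """log{1 - Phi(a)} for a standard normal, guarded against underflow."""
    tail = 0.5 * math.erfc(a / math.sqrt(2.0))
    if tail > 0.0:
        return math.log(tail)
    # erfc underflows for very large a; fall back on the leading-order
    # approximation 1 - Phi(a) ~ phi(a) / a (an approximation, not exact).
    return -0.5 * a * a - math.log(a) - 0.5 * math.log(2.0 * math.pi)


def cond_loglik(delta, var_t, var_c, y, s2_t, s2_c, n_t, n_c, c):
    """Conditional log-likelihood of equation (2), up to its constant."""
    nu_t, nu_c = n_t - 1, n_c - 1
    V = var_t / n_t + var_c / n_c          # variance of Y
    a = (c - delta) / math.sqrt(V)         # standardized threshold
    return (
        -0.5 * math.log(V) - (y - delta) ** 2 / (2.0 * V)
        - 0.5 * nu_t * math.log(var_t) - nu_t * s2_t / (2.0 * var_t)
        - 0.5 * nu_c * math.log(var_c) - nu_c * s2_c / (2.0 * var_c)
        - log_upper_tail(a)
    )
```

Because \(-\log\{1-\Phi(a)\}\) is always positive, the conditional log-likelihood at any parameter value exceeds the unconditional one; setting `c` to a very large negative number recovers the unconditional case.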

3. Fast simulation of summary statistics conditional on \(A=\{Y>c\}\)

To simulate selected phase 2 trials efficiently, there is no need to generate individual observations. It is enough to simulate the sufficient statistics

\[ (Y,S_T^2,S_C^2) \]

directly.

Under the unequal-variance normal model,

\[ Y\sim N(\delta,V), \qquad V= \frac{\sigma_T^2}{n_T} + \frac{\sigma_C^2}{n_C}, \]

and

\[ \frac{\nu_T S_T^2}{\sigma_T^2}\sim\chi^2_{\nu_T}, \qquad \frac{\nu_C S_C^2}{\sigma_C^2}\sim\chi^2_{\nu_C}. \]

Because \(Y,S_T^2,S_C^2\) are mutually independent and the selection event depends only on \(Y\), conditioning on \(A=\{Y>c\}\) only changes the distribution of \(Y\). The distributions of \(S_T^2\) and \(S_C^2\) remain unchanged.

Therefore,

\[ \boxed{ Y\mid A \sim N(\delta,V) \text{ truncated below at } c } \tag{3} \]

and

\[ \boxed{ \frac{\nu_T S_T^2}{\sigma_T^2}\mid A \sim \chi^2_{\nu_T}, \qquad \frac{\nu_C S_C^2}{\sigma_C^2}\mid A \sim \chi^2_{\nu_C}. } \tag{4} \]

Moreover,

\[ \boxed{ Y\mid A,\quad S_T^2\mid A,\quad S_C^2\mid A \text{ are mutually independent.} } \tag{5} \]

Thus, selected summary statistics can be simulated as follows:

Compute

\[ V= \frac{\sigma_T^2}{n_T} + \frac{\sigma_C^2}{n_C}, \qquad a=\frac{c-\delta}{\sqrt V}. \]

Simulate

\[ U\sim \mathrm{Uniform}(\Phi(a),1) \]

and set

\[ \boxed{ Y = \delta+\sqrt V\,\Phi^{-1}(U). } \tag{6} \]

Independently simulate

\[ W_T\sim\chi^2_{\nu_T}, \qquad W_C\sim\chi^2_{\nu_C}, \]

and set

\[ \boxed{ S_T^2=\frac{\sigma_T^2}{\nu_T}W_T, \qquad S_C^2=\frac{\sigma_C^2}{\nu_C}W_C. } \tag{7} \]

This directly simulates \((Y,S_T^2,S_C^2)\mid Y>c\). It avoids rejection sampling and is much faster when \(P(Y>c)\) is small.
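The steps in equations (6) and (7) can be sketched in Python with the standard library only. This is an illustrative sketch, not the accompanying R script; the function name `simulate_selected` is hypothetical. As one design choice (an assumption of this sketch), it draws \(T=1-U\) from the upper tail and uses \(\Phi^{-1}(U)=-\Phi^{-1}(T)\), which is equivalent to equation (6) but keeps precision when \(\Phi\{(c-\delta)/\sqrt V\}\) is close to 1.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_selected(delta, var_t, var_c, n_t, n_c, c, rng):
    """Draw one (Y, S_T^2, S_C^2) given Y > c, following equations (6)-(7)."""
    nu_t, nu_c = n_t - 1, n_c - 1
    sd = math.sqrt(var_t / n_t + var_c / n_c)
    a = (c - delta) / sd
    tail = 0.5 * math.erfc(a / math.sqrt(2.0))   # P(Y > c) = 1 - Phi(a)
    # T = 1 - U lies in (0, tail]. If tail underflows to 0 (extremely
    # large a), inv_cdf raises an error; this sketch does not handle that.
    t = tail * (1.0 - rng.random())
    t = min(t, 1.0 - 2.0 ** -53)                 # keep inv_cdf's argument < 1
    y = delta - sd * STD_NORMAL.inv_cdf(t)       # same as delta + sd * Phi^{-1}(U)
    # A chi-square with nu degrees of freedom is Gamma(shape=nu/2, scale=2).
    s2_t = var_t / nu_t * rng.gammavariate(nu_t / 2.0, 2.0)
    s2_c = var_c / nu_c * rng.gammavariate(nu_c / 2.0, 2.0)
    return y, s2_t, s2_c
```

For example, `simulate_selected(0.0, 4.0, 9.0, 20, 25, 0.5, random.Random(1))` returns one selected triple; every call yields a draw with \(Y>c\) (up to floating-point rounding), so no draws are discarded.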

4. Conditional Density Ratio of \(Y\) After Selection

Recall that

\[ Y\sim N(\delta,V), \qquad V= \frac{\sigma_T^2}{n_T} + \frac{\sigma_C^2}{n_C}. \]

The phase 2 trial is selected only if

\[ Y>c. \]

For fixed \((\delta,\sigma_T^2,\sigma_C^2)\), the conditional density of \(Y\) given \(Y>c\) is

\[ f_\delta(y\mid Y>c) = \frac{ V^{-1/2}\phi\left(\frac{y-\delta}{\sqrt V}\right) }{ 1-\Phi\left(\frac{c-\delta}{\sqrt V}\right) }, \qquad y>c. \]

For this comparison, keep \(V\) fixed and compare \(\delta=0\) with \(\delta>0\).

Main conclusion

Define the density ratio

\[ R(y) = \frac{ f_0(y\mid Y>c) }{ f_\delta(y\mid Y>c) }, \qquad y>c,\quad \delta>0. \]

Then \(R(y)\) is continuous and strictly decreasing in \(y\). Moreover, \(R(c)>1\) and \(R(y)\to0\) as \(y\to\infty\). Therefore, there exists a unique crossing point \(y^\star>c\) with \(R(y^\star)=1\): \(R(y)>1\) for \(c<y<y^\star\) and \(R(y)<1\) for \(y>y^\star\). In words, after conditioning on \(Y>c\), the model with \(\delta=0\) puts relatively more mass near the threshold \(c\), while the model with \(\delta>0\) puts relatively more mass farther into the right tail.

Proof

From the conditional density formula,

\[ R(y) = \frac{ 1-\Phi\left(\frac{c-\delta}{\sqrt V}\right) }{ 1-\Phi\left(\frac{c}{\sqrt V}\right) } \exp\left[ -\frac{\delta y}{V} + \frac{\delta^2}{2V} \right]. \]

Hence

\[ \log R(y) = \text{constant} - \frac{\delta}{V}y. \]

Since \(\delta>0\),

\[ \frac{d}{dy}\log R(y) = -\frac{\delta}{V}<0. \]

Therefore, \(R(y)\) is strictly decreasing in \(y\).

To show \(R(c)>1\), use

\[ \lambda(a)=\frac{\phi(a)}{1-\Phi(a)} \]

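and note that \(\log R(c)=0\) at \(\delta=0\), so integrating its derivative with respect to \(\delta\) gives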

\[ \log R(c) = \int_0^\delta \frac{1}{\sqrt V} \left[ \lambda\left(\frac{c-u}{\sqrt V}\right) - \frac{c-u}{\sqrt V} \right] \,du. \]

Since \(\lambda(a)>a\) for all \(a\), the integrand is positive. Thus \(R(c)>1\). Finally, since \(\log R(y)=\text{constant}-\delta y/V\), we have \(R(y)\to0\) as \(y\to\infty\).
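Because \(\log R(y)\) is linear in \(y\), setting it to zero in the expression above gives the crossing point in closed form, \(y^\star=\delta/2+(V/\delta)\log K\), where \(K\) denotes the ratio of tail probabilities in front of the exponential. The Python sketch below evaluates both quantities; the function names are hypothetical and the code is only meant as a numerical check of the statements in this section.

```python
import math


def upper_tail(a):
    """1 - Phi(a) for a standard normal."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))


def log_tail_ratio(delta, V, c):
    """log K, with K = {1 - Phi((c - delta)/sqrt V)} / {1 - Phi(c/sqrt V)}."""
    sd = math.sqrt(V)
    return math.log(upper_tail((c - delta) / sd)) - math.log(upper_tail(c / sd))


def log_ratio(y, delta, V, c):
    """log R(y) = log f_0(y | Y > c) - log f_delta(y | Y > c), for y > c."""
    return log_tail_ratio(delta, V, c) - delta * y / V + delta ** 2 / (2.0 * V)


def crossing_point(delta, V, c):
    """Solve log R(y) = 0 for y; requires delta > 0."""
    return delta / 2.0 + V * log_tail_ratio(delta, V, c) / delta
```

For instance, with \(\delta=0.5\), \(V=0.65\), and \(c=0.5\), `log_ratio(0.5, 0.5, 0.65, 0.5)` is positive and `crossing_point(0.5, 0.65, 0.5)` lies above \(c\), in line with \(R(c)>1\).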

5. Score equations

We now show how to obtain the MCLE from the sufficient statistics \((Y,S_T^2,S_C^2)\). Let

\[ \lambda(a)=\frac{\phi(a)}{1-\Phi(a)} \]

be the inverse Mills ratio.

The score equation for \(\delta\) is

\[ \boxed{ \frac{y-\delta}{\sqrt V} = \lambda(a). } \tag{8} \]

This has the same form as in the one-sample and equal-variance two-sample cases, except that

\[ V= \frac{\sigma_T^2}{n_T} + \frac{\sigma_C^2}{n_C}. \]

For the variance score equations, write

\[ \theta_T=\sigma_T^2, \qquad \theta_C=\sigma_C^2. \]

For \(i\in\{T,C\}\), let \(n_i\), \(\nu_i\), \(s_i^2\), and \(\theta_i\) denote the corresponding group-specific quantities. Then

\[ \boxed{ -\frac{\nu_i}{2\theta_i} + \frac{\nu_i s_i^2}{2\theta_i^2} + \frac{1}{2n_iV} \left[ \frac{(y-\delta)^2}{V} -1 -a\lambda(a) \right] = 0. } \tag{9} \]

Using the \(\delta\)-score equation,

\[ \frac{(y-\delta)^2}{V} = \lambda(a)^2. \]

Therefore, the variance score equations can also be written as

\[ \boxed{ -\frac{\nu_i}{2\theta_i} + \frac{\nu_i s_i^2}{2\theta_i^2} + \frac{1}{2n_iV} \left[ \lambda(a)^2 -1 -a\lambda(a) \right] = 0, \qquad i=T,C. } \tag{10} \]
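For reference, the chain-rule steps behind equations (8) and (9) can be sketched as follows, using only the definitions of \(V\), \(a\), and \(\lambda(a)\) above. The building blocks are

\[ \frac{d}{da}\log\{1-\Phi(a)\}=-\lambda(a), \qquad \frac{\partial a}{\partial\delta}=-\frac{1}{\sqrt V}, \qquad \frac{\partial V}{\partial\theta_i}=\frac{1}{n_i}, \qquad \frac{\partial a}{\partial\theta_i}=-\frac{a}{2n_iV}. \]

Differentiating equation (2) term by term then gives

\[ \frac{\partial\ell_c}{\partial\delta} = \frac{y-\delta}{V} - \frac{\lambda(a)}{\sqrt V}, \]

and

\[ \frac{\partial\ell_c}{\partial\theta_i} = -\frac{\nu_i}{2\theta_i} + \frac{\nu_i s_i^2}{2\theta_i^2} + \frac{1}{n_i} \left[ -\frac{1}{2V} + \frac{(y-\delta)^2}{2V^2} \right] - \frac{a\lambda(a)}{2n_iV}. \]

Setting the first expression to zero and multiplying by \(\sqrt V\) yields equation (8); collecting the last two terms of the second expression over the common factor \(1/(2n_iV)\) yields equation (9).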

6. Reparameterization using \(a\)

As before,

\[ a=\frac{c-\delta}{\sqrt V} \]

and the \(\delta\)-score equation gives

\[ \lambda(a)=\frac{y-\delta}{\sqrt V}. \]

Subtracting the two equations gives

\[ \lambda(a)-a = \frac{y-c}{\sqrt V}. \]

Thus

\[ \boxed{ V = \frac{(y-c)^2}{\{\lambda(a)-a\}^2}. } \tag{11} \]

Given \(a\), the treatment effect is recovered as

\[ \boxed{ \delta = c-a\sqrt V. } \tag{12} \]

Equivalently,

\[ \boxed{ \delta = y-\sqrt V\,\lambda(a). } \tag{13} \]

However, unlike the equal-variance case, \(V\) does not identify the two variance components. It only imposes

\[ \frac{\sigma_T^2}{n_T} + \frac{\sigma_C^2}{n_C} = V. \]

Therefore, the unequal-variance MCLE generally cannot be reduced to a single one-dimensional equation in \(a\). The safest implementation is to directly maximize the conditional log-likelihood in equation (2), for example over

\[ (\delta,\log\sigma_T,\log\sigma_C). \]
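A direct maximization of equation (2) over \((\delta,\log\sigma_T,\log\sigma_C)\) might look like the Python sketch below. It is not the accompanying R script: the optimizer is a simple derivative-free compass (coordinate) search chosen here only to keep the example dependency-free, and all function names are hypothetical. A production implementation would more likely call a library optimizer.

```python
import math


def log_upper_tail(a):
    """log{1 - Phi(a)}, with an asymptotic fallback if erfc underflows."""
    tail = 0.5 * math.erfc(a / math.sqrt(2.0))
    if tail > 0.0:
        return math.log(tail)
    return -0.5 * a * a - math.log(a) - 0.5 * math.log(2.0 * math.pi)


def hazard(a):
    """lambda(a) = phi(a) / {1 - Phi(a)}, computed on the log scale."""
    log_phi = -0.5 * a * a - 0.5 * math.log(2.0 * math.pi)
    return math.exp(log_phi - log_upper_tail(a))


def neg_loglik(par, y, s2_t, s2_c, n_t, n_c, c):
    """Negative of equation (2) on the (delta, log sigma_T, log sigma_C) scale."""
    delta, log_sd_t, log_sd_c = par
    var_t, var_c = math.exp(2.0 * log_sd_t), math.exp(2.0 * log_sd_c)
    V = var_t / n_t + var_c / n_c
    a = (c - delta) / math.sqrt(V)
    loglik = (
        -0.5 * math.log(V) - (y - delta) ** 2 / (2.0 * V)
        - (n_t - 1) * (log_sd_t + s2_t / (2.0 * var_t))
        - (n_c - 1) * (log_sd_c + s2_c / (2.0 * var_c))
        - log_upper_tail(a)
    )
    return -loglik


def compass_search(f, x0, step=0.5, tol=1e-9, max_sweeps=200000):
    """Minimize f by trying +/- step along each coordinate, halving on failure."""
    x, fx = list(x0), f(x0)
    for _ in range(max_sweeps):
        if step < tol:
            break
        improved = False
        for j in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[j] += sign * step
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step *= 0.5
    return x, fx


def fit_mcle(y, s2_t, s2_c, n_t, n_c, c):
    """Return (delta_hat, var_t_hat, var_c_hat), starting from the naive estimates."""
    start = [y, 0.5 * math.log(s2_t), 0.5 * math.log(s2_c)]
    best, _ = compass_search(
        lambda p: neg_loglik(p, y, s2_t, s2_c, n_t, n_c, c), start
    )
    return best[0], math.exp(2.0 * best[1]), math.exp(2.0 * best[2])
```

A call such as `fit_mcle(y=1.0, s2_t=4.0, s2_c=9.0, n_t=20, n_c=20, c=0.5)` returns the three estimates; the fitted \(\widehat\delta\) can then be checked against equation (8) with the `hazard` helper.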

7. Behavior as \(y\to c^+\)

Now consider the case where the observed phase 2 treatment effect barely exceeds the go/no-go threshold:

\[ y-c\downarrow 0. \]

The same boundary mechanism still applies. The relevant solution has

\[ \boxed{ \widehat a\to+\infty. } \tag{14} \]

For large positive \(a\),

\[ \lambda(a)-a = \frac{1}{a}+O(a^{-3}). \]

From equation (11),

\[ V = \frac{(y-c)^2}{\{\lambda(a)-a\}^2}. \]

Therefore, near the boundary,

\[ V\sim (y-c)^2a^2. \]

In this regime, the selection term in equation (10) vanishes, because \(\lambda(a)\{\lambda(a)-a\}\to1\) and hence \(\lambda(a)^2-1-a\lambda(a)\to0\) as \(a\to\infty\). The variance estimates therefore approach the observed sample variances:

\[ \boxed{ \widehat\sigma_T^2\to s_T^2, \qquad \widehat\sigma_C^2\to s_C^2. } \tag{15} \]

Define the observed Welch-type variance of \(Y\) as

\[ V_{\text{obs}} = \frac{s_T^2}{n_T} + \frac{s_C^2}{n_C}. \]

Then

\[ \boxed{ \widehat V\to V_{\text{obs}}. } \tag{16} \]

Since

\[ \widehat V \sim (y-c)^2\widehat a^2, \]

we obtain

\[ \boxed{ \widehat a \sim \frac{\sqrt{V_{\text{obs}}}}{y-c}. } \tag{17} \]

Finally,

\[ \widehat\delta = c-\widehat a\sqrt{\widehat V}. \]

Using \(\widehat V\to V_{\text{obs}}\), we obtain

\[ \boxed{ \widehat\delta \sim c-\frac{V_{\text{obs}}}{y-c}. } \tag{18} \]

Hence,

\[ \boxed{ \widehat\delta\to-\infty \qquad \text{as } y\to c^+. } \tag{19} \]

In words, when the observed phase 2 treatment effect barely passes the threshold, the conditional MLE attributes most of the apparent success to the conditioning event \(Y>c\). The two variance estimates remain close to the observed sample variances, but the treatment effect estimate is pulled sharply downward and has no finite limit as \(y\downarrow c\).
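The boundary approximations (17) and (18) can be checked numerically. The Python sketch below makes a simplifying assumption that is not part of the full MCLE: it holds \(V\) fixed at \(V_{\text{obs}}\) and solves \(\lambda(a)-a=(y-c)/\sqrt{V_{\text{obs}}}\) by bisection, then compares the result with the asymptotic formulas. The function names are hypothetical, and the bracket \([-10,30]\) limits the sketch to moderate values of \((y-c)/\sqrt{V_{\text{obs}}}\) (roughly between 0.034 and 10).

```python
import math


def hazard_minus_a(a):
    """lambda(a) - a, with lambda(a) = phi(a) / {1 - Phi(a)}."""
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(a / math.sqrt(2.0))
    return phi / tail - a


def solve_a(y, c, V, lo=-10.0, hi=30.0):
    """Bisection for lambda(a) - a = (y - c)/sqrt(V); the left side decreases in a."""
    target = (y - c) / math.sqrt(V)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if hazard_minus_a(mid) > target:
            lo = mid      # root lies to the right of mid
        else:
            hi = mid      # root lies to the left of mid
    return 0.5 * (lo + hi)


def compare(y, c, V_obs):
    """Return (a_exact, a_approx, delta_exact, delta_approx) with V held at V_obs."""
    a_exact = solve_a(y, c, V_obs)
    a_approx = math.sqrt(V_obs) / (y - c)           # equation (17)
    delta_exact = c - a_exact * math.sqrt(V_obs)    # equation (12) with V = V_obs
    delta_approx = c - V_obs / (y - c)              # equation (18)
    return a_exact, a_approx, delta_exact, delta_approx
```

Evaluating `compare(c + gap, c, V_obs)` for a shrinking sequence of gaps, for example 0.5, 0.1, and 0.05 with `c = 0.5` and `V_obs = 0.65`, shows the exact and approximate values of \(a\) converging in ratio while \(\delta\) moves steadily toward \(-\infty\).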

8. Practical implementation

The unequal-variance case is best implemented by directly maximizing equation (2). A stable parameterization is

\[ (\delta,\log\sigma_T,\log\sigma_C), \]

which automatically enforces \(\sigma_T>0\) and \(\sigma_C>0\).

The same fast simulation idea remains valid:

\[ Y = \delta+\sqrt V\,\Phi^{-1}(U), \qquad U\sim\mathrm{Uniform}\left( \Phi\left(\frac{c-\delta}{\sqrt V}\right), 1 \right), \]

where

\[ V=\frac{\sigma_T^2}{n_T}+\frac{\sigma_C^2}{n_C}. \]

Independently,

\[ S_T^2=\frac{\sigma_T^2}{\nu_T}\chi^2_{\nu_T}, \qquad S_C^2=\frac{\sigma_C^2}{\nu_C}\chi^2_{\nu_C}. \]

The R script accompanying this note implements this approach and reports both the observed selected estimate and the unequal-variance MCLE across simulation scenarios.