10.3 - The Neyman-Pearson Lemma and most powerful tests

Introduction

  • We’ve seen how we test hypotheses using a test statistic \(T\), by specifying \(\alpha\) and determining a critical value and rejection region.
  • We’ve also seen how to compare the power of different tests of the same size (the equal sizes being essential for a fair comparison!)
  • Is there an “optimal” test statistic to use?
  • Enter the Neyman-Pearson Lemma!

Jerzy Neyman and Egon Pearson

[Photos: Jerzy Neyman (image source: UC Berkeley) and Egon Pearson (image source: Wikipedia)]

  • “On the Problem of the Most Efficient Tests of Statistical Hypotheses,” published in the Philosophical Transactions of the Royal Society, 1933
  • “Unexpectedly favorably” refereed by R.A. Fisher (according to Neyman)
  • Formulated the idea of type-I and type-II errors and “optimality” under a fixed type-I error rate

Hypothesis test: a general definition

  • Define \(\phi(y_1,...,y_n) = \phi(\mathbf{y})\), the indicator that the data fall in the rejection region \(RR\):

\[\phi(\mathbf{y}) = \left\{\begin{array} {ll} 1 & \mathbf{y} \in RR \\ 0 & \mathbf{y} \notin RR\\ \end{array}\right.\]

  • Consider simple hypotheses of the form:

\[H_0: \theta = \theta_0\] \[H_a: \theta = \theta_a\]

  • Then:
    • \(\alpha = P_{\theta_0}(\phi(\mathbf{y}) = 1)\)
    • \(Power=P_{\theta_a}(\phi(\mathbf{y}) = 1)\)

The likelihood ratio test…

Define \(\phi(\mathbf{y})= 1\) when:

\[\frac{L(\theta_a)}{L(\theta_0)} = \frac{f(\mathbf{y};\theta_a)}{f(\mathbf{y};\theta_0)}> c,\]

and \(\phi(\mathbf{y})= 0\) when:

\[\frac{L(\theta_a)}{L(\theta_0)} = \frac{f(\mathbf{y};\theta_a)}{f(\mathbf{y};\theta_0)}< c,\]

for some \(c\) chosen such that \(P_{\theta_0}(\phi(\mathbf{y})=1) =\alpha\).

Intuition: reject \(H_0\) if the data are “convincingly more likely” under \(\theta_a\) than they are under \(\theta_0\).

… is the most powerful size-\(\alpha\) test!

That is, for any other test \(\phi'(\mathbf{y})\) that satisfies:

\[\alpha = P_{\theta_0}(\phi'(\mathbf{y})=1) = P_{\theta_0}(\phi(\mathbf{y})=1),\]

the power of \(\phi(\mathbf{y})\) (the likelihood ratio test) is at least as large as the power of \(\phi'(\mathbf{y})\); that is:

\[P_{\theta_a}(\phi(\mathbf{y})=1) \geq P_{\theta_a}(\phi'(\mathbf{y})=1).\]
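To make the claim concrete, here is a minimal simulation sketch (not from the lemma itself; the normal setup and all values are illustrative assumptions). For i.i.d. \(N(\mu, 1)\) data with \(H_0: \mu = 0\) vs \(H_a: \mu = 1\), the likelihood ratio is increasing in \(\sum_i y_i\), so the LR test rejects for large \(\bar{y}\); a competitor test of the same size that uses only \(y_1\) should show visibly less power:

# Sketch: compare the LR test to another exact size-0.05 test under H_a
set.seed(1)
n <- 10; reps <- 10000
phi_lr    <- function(y) mean(y) > qnorm(0.95)/sqrt(n)  # LR test (size 0.05)
phi_prime <- function(y) y[1] > qnorm(0.95)             # uses only y_1 (size 0.05)
sims <- replicate(reps, rnorm(n, mean = 1))             # data under theta_a
mean(apply(sims, 2, phi_lr))     # power of LR test: about 0.94
mean(apply(sims, 2, phi_prime))  # power of competitor: about 0.26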

Proof preliminary 1: expressing rejection probability as expectation

  • Note that for any \(\theta\):

\[P_\theta(\phi(\mathbf{y})=1) = 1\cdot P_\theta(\phi(\mathbf{y})=1) + 0\cdot P_\theta(\phi(\mathbf{y})=0)\]

\[ = E_\theta(\phi(\mathbf{y})) = \int \phi(\mathbf{y})\cdot f(\mathbf{y};\theta) d\mathbf{y}\]

Implications:

  • \(\alpha =P_{\theta_0}(\phi(\mathbf{y})=1) = E_{\theta_0}(\phi(\mathbf{y}))\)
  • \(Power =P_{\theta_a}(\phi(\mathbf{y})=1) = E_{\theta_a}(\phi(\mathbf{y}))\)
  • Equivalently for any other test:

\[P_\theta(\phi'(\mathbf{y})=1) = E_\theta(\phi'(\mathbf{y}))\]
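A quick numerical illustration of this identity (a sketch, reusing the assumed normal setup from above): averaging the 0/1 values of \(\phi(\mathbf{y})\) over simulated datasets estimates the rejection probability under whichever \(\theta\) generated the data.

# Sketch: the mean of the 0/1 values phi(y) estimates P_theta(phi(y) = 1)
set.seed(2)
n <- 10; reps <- 10000
phi <- function(y) as.numeric(mean(y) > qnorm(0.95)/sqrt(n))  # 1 = reject
mean(replicate(reps, phi(rnorm(n, mean = 0))))  # estimates alpha (about 0.05)
mean(replicate(reps, phi(rnorm(n, mean = 1))))  # estimates power (about 0.94)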

Proof preliminary 2: re-expressing the likelihood ratio

  • As expressed, the most-powerful size-\(\alpha\) test rejects the null (\(\phi(\mathbf{y}) = 1\)) when:

\[\frac{L(\theta_a)}{L(\theta_0)} = \frac{f(\mathbf{y};\theta_a)}{f(\mathbf{y};\theta_0)}> c.\]

  • By simple re-arrangement of the ratio, this is equivalent to rejecting when:

\[f(\mathbf{y};\theta_a) - cf(\mathbf{y};\theta_0) > 0.\]

Proof of Neyman-Pearson lemma

Define:

\[\delta(\mathbf{y}) = (\phi(\mathbf{y}) - \phi'(\mathbf{y}))(f(\mathbf{y};\theta_a)-cf(\mathbf{y};\theta_0))\]

Note that both \(\phi(\mathbf{y})\) and \(\phi'(\mathbf{y})\) are always either 1 or 0. We know the conditions under which \(\phi(\mathbf{y})\) takes on each value:

  • Case 1:

\[f(\mathbf{y};\theta_a)-cf(\mathbf{y};\theta_0) > 0 \Rightarrow \phi(\mathbf{y})=1 \Rightarrow (\phi(\mathbf{y}) - \phi'(\mathbf{y}))\ge 0 \Rightarrow \delta(\mathbf{y}) \ge 0\]

  • Case 2:

\[f(\mathbf{y};\theta_a)-cf(\mathbf{y};\theta_0) < 0 \Rightarrow \phi(\mathbf{y})=0\Rightarrow (\phi(\mathbf{y}) - \phi'(\mathbf{y})) \le 0 \Rightarrow \delta(\mathbf{y}) \ge 0\]

  • Case 3:

\[f(\mathbf{y};\theta_a)-cf(\mathbf{y};\theta_0)= 0 \Rightarrow \delta(\mathbf{y}) = 0\]

Implication: \(\delta(\mathbf{y})\ge 0\) always.

Proof of Neyman-Pearson lemma (continued)

Since \(\delta(\mathbf{y})\ge 0\) always:

\[0 \le \int \delta(\mathbf{y}) d\mathbf{y} = \int(\phi(\mathbf{y}) - \phi'(\mathbf{y}))(f(\mathbf{y};\theta_a)-cf(\mathbf{y};\theta_0)) d\mathbf{y}\]

\[= \int(\phi(\mathbf{y}) - \phi'(\mathbf{y}))f(\mathbf{y};\theta_a) d\mathbf{y} - c\int(\phi(\mathbf{y}) - \phi'(\mathbf{y}))f(\mathbf{y};\theta_0) d\mathbf{y}\]

\[= E_{\theta_a}(\phi(\mathbf{y})) - E_{\theta_a}(\phi'(\mathbf{y})) - c\left[\underbrace{E_{\theta_0}(\phi(\mathbf{y})) - E_{\theta_0}(\phi'(\mathbf{y}))}_{\alpha - \alpha} \right]\]

\[= E_{\theta_a}(\phi(\mathbf{y})) - E_{\theta_a}(\phi'(\mathbf{y})) \]

\[ \therefore E_{\theta_a}(\phi'(\mathbf{y})) \le E_{\theta_a}(\phi(\mathbf{y})) \]

Uniformly most powerful tests

  • Consider testing

\[H_0: \theta = \theta_0\]

\[H_a: \theta \in \Omega_a\]

  • “Upper tail” example: \(\Omega_a= \{\theta: \theta > \theta_0\}\)
  • “Lower tail” example: \(\Omega_a= \{\theta: \theta < \theta_0\}\)

A test \(\phi(\mathbf{y})\) is said to be a uniformly most powerful (UMP) size-\(\alpha\) test if for all \(\theta \in \Omega_a\):

\[P_{\theta}(\phi(\mathbf{y})=1) \geq P_{\theta}(\phi'(\mathbf{y})=1),\]

for any other size-\(\alpha\) test \(\phi'(\mathbf{y})\).

Using Neyman-Pearson to find UMP tests

  • As stated and proven, the N-P lemma shows a way to find a most powerful test of the simple hypotheses \(H_0: \theta = \theta_0\) vs \(H_a: \theta = \theta_a\).
  • It turns out that the Neyman-Pearson Lemma can also be used to find uniformly most powerful tests against composite alternatives of the form:

\[H_0: \theta = \theta_0\] \[H_a: \theta > \theta_0,\]

or:

\[H_0: \theta = \theta_0\] \[H_a: \theta < \theta_0.\]

Process

  1. Set up a simple hypothesis for a specific \(\theta_a \in \Omega_a\);
  2. Use the N-P lemma to find the most powerful size-\(\alpha\) test for this simple hypothesis;
  3. Show that the rejection region does not depend on the specific value of \(\theta_a\) chosen, but only on the fact that \(\theta_a \in \Omega_a\).

Example: Finding UMP test for Poisson rate

  • Suppose we have an i.i.d. sample of size \(n\) where \(Y_i \sim POI(\lambda)\).
  • Derive a uniformly most powerful size-\(\alpha\) test for testing:

\[H_0: \lambda = \lambda_0\] \[H_a: \lambda > \lambda_0\]

Finding MP for simple \(H_a\)

For a value \(\lambda_a > \lambda_0\), the N-P lemma says to reject when:

\[\frac{L(\lambda_a)}{L(\lambda_0)} = \frac{e^{-n\lambda_a }\lambda_a^{\sum_i y_i}/ \prod_i y_i !}{e^{-n\lambda_0 }\lambda_0^{\sum_i y_i}/ \prod_i y_i !}> c,\]

\[ \mbox{ i.e. when } e^{n(\lambda_0 - \lambda_a) }\left(\frac{\lambda_a}{\lambda_0}\right)^{\sum_i y_i}> c, \]

\[\mbox{ i.e. when }\left(\frac{\lambda_a}{\lambda_0}\right)^{\sum_i y_i}> c', \]

\[\mbox{ i.e. when }\sum_i y_i \underbrace{[\log(\lambda_a) -\log(\lambda_0)]}_{> 0}> c'', \]

\[\mbox{ i.e. when }\sum_i y_i > c'''.\]
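As a numerical sanity check (a sketch; the particular values of \(\lambda_0\), \(\lambda_a\), and \(n\) are illustrative assumptions), the log of the likelihood ratio above is indeed increasing in \(\sum_i y_i\) whenever \(\lambda_a > \lambda_0\):

# Sketch: the Poisson log-LR, n*(lambda_0 - lambda_a) + sum(y)*log(lambda_a/lambda_0),
# is increasing in sum(y) when lambda_a > lambda_0
lambda_0 <- 5; lambda_a <- 7; n <- 50
sums <- 200:300                                  # candidate values of sum(y)
log_lr <- n*(lambda_0 - lambda_a) + sums*log(lambda_a/lambda_0)
all(diff(log_lr) > 0)                            # TRUE: ordering by LR = ordering by sum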

Establishing UMP

  • Note that the decision:

\[\mbox{Reject the null when } \sum_i y_i \mbox{ is large}\]

depended not on the specific \(\lambda_a\) chosen, only on the fact that \(\lambda_a > \lambda_0\).

  • Thus, using the test statistic \(T = \sum_i Y_i\) and rejecting when it is large (with “large” still to be defined) yields the UMP size-\(\alpha\) test of \(H_0: \lambda=\lambda_0\) vs \(H_a: \lambda > \lambda_0\).

Determining “large”

  • It remains to define “large” with respect to \(T = \sum_i Y_i\) so that we have \(\alpha\) as close to a specified value as possible (typical: 0.05)
  • Suppose \(\lambda_0 = 5\), and we are testing these hypotheses using a sample of size \(n=50\).
  • Want \(c\) such that:

\[P_{\lambda =5}(T>c)\approx \alpha\]

  • Under \(H_0: \lambda = 5\), \(T \sim POI(50\cdot 5)\)
qpois(0.95, lambda = 50*5)
[1] 276
1-ppois(276, lambda = 50*5)
[1] 0.04865865
1-ppois(275, lambda = 50*5)
[1] 0.05515271
  • \(\therefore\) rejecting \(H_0\) when \(T \ge 277\) yields the UMP size-\(\alpha = 0.049\) test.
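With the critical value in hand, the power of this UMP test at any \(\lambda_a > 5\) follows from the same Poisson tail calculation; for instance (a sketch, with \(\lambda_a = 6\) an assumed alternative):

# Power at an assumed alternative lambda_a = 6: T ~ POI(50*6) = POI(300),
# and we reject when T >= 277
1 - ppois(276, lambda = 50*6)  # P(T >= 277) under lambda_a; roughly 0.91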

Visualizing

[Figure: distribution of \(T \sim POI(50\cdot \lambda_0) = POI(250)\), with the rejection region of 277 or larger highlighted; note the mean of 250.]
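A sketch of plotting code (assuming ggplot2, loaded via tidyverse as in the later examples) that would produce such a figure:

library(tidyverse)

# Sketch: pmf of T ~ POI(250) under H_0, highlighting the rejection region T >= 277
tibble(t = 180:320, pmf = dpois(t, lambda = 250),
       region = if_else(t >= 277, 'Reject', 'Fail to reject')) %>%
  ggplot(aes(x = t, y = pmf, fill = region)) +
  geom_col() +
  geom_vline(xintercept = 250, linetype = 2) +  # the mean of 250 under the null
  theme_classic() +
  labs(x = 'T', y = 'P(T = t)', fill = '')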

Example: Finding UMP test for geometric \(p\)

  • Suppose we have an i.i.d. sample of size \(n\) where \(Y_i \sim GEO(p)\) (number of failures version).
  • Derive a uniformly most powerful size-\(\alpha\) test for testing:

\[H_0: p = p_0\] \[H_a: p > p_0\]

Finding MP for simple \(H_a\)

For a specified value of \(p_a > p_0\), the N-P lemma says to reject when:

\[\frac{L(p_a)}{L(p_0)} = \frac{p_a^n(1-p_a)^{\sum_i y_i}}{p_0^n(1-p_0)^{\sum_i y_i}}> c,\]

\[ \mbox{ i.e. when } \left(\frac{p_a}{p_0}\right)^n \left(\frac{1-p_a}{1-p_0}\right)^{\sum_i y_i}> c, \]

\[\mbox{ i.e. when } \left(\frac{1-p_a}{1-p_0}\right)^{\sum_i y_i}> c', \]

\[\mbox{ i.e. when }\sum_i y_i \cdot\underbrace{\log\left(\underbrace{\frac{1-p_a}{1-p_0}}_{<1}\right)}_{<0}> c'', \]

\[\mbox{ i.e. when }\sum_i y_i < c'''.\]

Establishing UMP

  • Note that the decision:

\[\mbox{Reject the null when } \sum_i y_i \mbox{ is small}\]

depended not on the specific \(p_a\) chosen, only on the fact that \(p_a > p_0\).

  • Thus, using the test statistic \(T = \sum_i Y_i\) and rejecting when it is small (with “small” still to be defined) yields the UMP size-\(\alpha\) test of \(H_0: p=p_0\) vs \(H_a: p>p_0\).

Determining “small”

  • It remains to define “small” with respect to \(T = \sum_i Y_i\) so that we have \(\alpha\) as close to a specified value as possible (typical: 0.05)
  • Suppose \(p_0 = 0.5\), and we are testing these hypotheses using a sample of size \(n=50\).
  • Want \(c\) such that:

\[P_{p = 0.5}(T\le c)\approx \alpha\]

  • Under \(H_0: p = 0.5\), \(T \sim NB(50, 0.5)\)
qnbinom(0.05, size = 50, prob = 0.5)
[1] 34
pnbinom(34, size = 50, prob = 0.5)
[1] 0.05056817
pnbinom(33, size = 50, prob = 0.5)
[1] 0.03920981
  • \(\therefore\) rejecting \(H_0\) when \(T \le 34\) yields the UMP size-\(\alpha = 0.051\) test.
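As in the Poisson example, the power at any \(p_a > 0.5\) follows from the corresponding negative binomial tail calculation; for instance (a sketch, with \(p_a = 0.6\) an assumed alternative):

# Power at an assumed alternative p_a = 0.6: T ~ NB(50, 0.6) under p_a,
# and we reject when T <= 34
pnbinom(34, size = 50, prob = 0.6)  # P(T <= 34) under p_a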

Visualizing

[Figure: distribution of \(T \sim NB(50, p_0)\), with the rejection region of 34 or smaller highlighted.]
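Analogous plotting code (a sketch, again assuming ggplot2) with dnbinom in place of dpois:

library(tidyverse)

# Sketch: pmf of T ~ NB(50, 0.5) under H_0, highlighting the rejection region T <= 34
tibble(t = 10:90, pmf = dnbinom(t, size = 50, prob = 0.5),
       region = if_else(t <= 34, 'Reject', 'Fail to reject')) %>%
  ggplot(aes(x = t, y = pmf, fill = region)) +
  geom_col() +
  theme_classic() +
  labs(x = 'T', y = 'P(T = t)', fill = '')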

Example: UMP for exponential rate

  • Recall a previous example, using i.i.d. \(EXP(\lambda)\) data to test:

\[H_0: \lambda = 1/5\] \[H_a: \lambda < 1/5.\]

  • We previously specified the form of the size-\(\alpha\) test using the test statistic \(Y_{(1)}\).
    • For \(\alpha = 0.05\), this test has \(\phi'(\mathbf{y}) = 1\) when \(Y_{(1)} > c = \frac{\ln(20)}{0.2n}\), the 95th percentile of the \(EXP(n\lambda_0)\) distribution.
    • Power of this test: \(P_{\lambda_a}(\phi'(\mathbf{y})=1) = e^{-n\lambda_a c} = e^{-\lambda_a \ln(20)/0.2}\); i.e., \(1-F_{\lambda_a}(c)\), with \(F_{\lambda_a}(\cdot)\) the CDF of an \(EXP(n\lambda_a)\) random variable.
  • Use the Neyman-Pearson Lemma to find the UMP size-\(\alpha=0.05\) test for testing these hypotheses, and use visualization to show it is UMP for a grid of \(n\) values.

Finding UMP test

For a specified value of \(\lambda_a < \lambda_0\), the N-P lemma says to reject when:

\[\frac{L(\lambda_a)}{L(\lambda_0)} = \frac{\lambda_a^n e^{-\lambda_a\sum_i y_i}}{\lambda_0^ne^{-\lambda_0\sum_i y_i}}> c,\]

\[ \mbox{ i.e. when } e^{\underbrace{(\lambda_0-\lambda_a)}_{>0}\sum_i y_i}> c', \]

\[\mbox{ i.e. when } \sum_i y_i \mbox{ is large}. \]

Since the form of this test depended not on the specific \(\lambda_a\) chosen, only on the fact that \(\lambda_a < \lambda_0\), it is UMP for alternatives of the form \(H_a: \lambda < \lambda_0\).

Determining “large”

  • Under \(H_0: \lambda = 1/5\), \(T \sim GAM(n, 1/5)\).
  • Determine the rejection region using cv <- qgamma(0.95, shape = n, rate = 1/5); the resulting test has size exactly \(\alpha= 0.05\).
  • For any \(\lambda_a < 1/5\), determine power using 1-pgamma(cv, shape = n, rate = lambda_a)

Finding rejection probabilities over \(\{\lambda, n\}\) grid

Code for finding the analytic rejection probabilities for these two tests:

library(purrrfect)
library(tidyverse)

lambda_0 = 1/5

(two_rate_tests <- parameters(~lambda_true, ~n, 
                    seq(0.001, 0.2, l=100), c(10, 20, 40, 80))
         %>% mutate(crit_value_min = qexp(0.95, rate = n*lambda_0),
                    crit_value_sum = qgamma(0.95, shape = n, rate = lambda_0),
                    prob_min_rejects = 1-pexp(crit_value_min, rate = n*lambda_true),
                    prob_sum_rejects = 1-pgamma(crit_value_sum, shape = n, rate = lambda_true)
                    )
) %>% head 
# A tibble: 6 × 6
  lambda_true     n crit_value_min crit_value_sum prob_min_rejects prob_sum_rejects
        <dbl> <dbl>          <dbl>          <dbl>            <dbl>            <dbl>
1     0.001      10          1.50            78.5            0.985            1    
2     0.001      20          0.749          139.             0.985            1    
3     0.001      40          0.374          255.             0.985            1    
4     0.001      80          0.187          476.             0.985            1    
5     0.00301    10          1.50            78.5            0.956            1.000
6     0.00301    20          0.749          139.             0.956            1    

Visualizing

ggplot(data = two_rate_tests) + 
  geom_line(aes(x = lambda_true, y = prob_min_rejects, col = 'Test based on min')) +
  geom_line(aes(x = lambda_true, y = prob_sum_rejects, col = 'Test based on sum')) +
  facet_wrap(~n, labeller = label_both)+
  scale_y_continuous(breaks = c(0.05, seq(0.25, 1, by = 0.25)))+
  geom_hline(aes(yintercept= 0.05), linetype = 2) + 
  theme_classic() +
  labs(x=expression(lambda[true]),
       y='P(reject null)', color='')