10.3 - The Neyman-Pearson Lemma and most powerful tests

Introduction

  • We’ve seen how we test hypotheses using a test statistic \(T\), by specifying \(\alpha\) and determining a critical value and rejection region.
  • We’ve also seen how to compare the power of different tests of the same size (the equal sizes being essential for a fair comparison!)
  • Is there an “optimal” test statistic to use?
  • Enter the Neyman-Pearson Lemma!

Jerzy Neyman and Egon Pearson

[Photos: Jerzy Neyman (image source: UC Berkeley) and Egon Pearson (image source: Wikipedia)]

  • “On the Problem of the Most Efficient Tests of Statistical Hypotheses,” published in the Philosophical Transactions of the Royal Society, 1933
  • “Unexpectedly favorably” refereed by R.A. Fisher (according to Neyman)
  • Formulated the idea of type-I and type-II errors and “optimality” under a fixed type-I error rate

Hypothesis test: a general definition

  • Define \(\phi(y_1,...,y_n) = \phi(\mathbf{y})\), the indicator that the data fall in the rejection region \(RR\):

\[\phi(\mathbf{y}) = \left\{\begin{array} {ll} 1 & \mathbf{y} \in RR \\ 0 & \mathbf{y} \notin RR\\ \end{array}\right.\]

  • Consider simple hypotheses of the form:

\[H_0: \theta = \theta_0\] \[H_a: \theta = \theta_a\]

  • Then:
    • \(\alpha = P_{\theta_0}(\phi(\mathbf{y}) = 1)\)
    • \(Power=P_{\theta_a}(\phi(\mathbf{y}) = 1)\)

The likelihood ratio test…

Define \(\phi(\mathbf{y})= 1\) when:

\[\frac{L(\theta_a)}{L(\theta_0)} = \frac{f(\mathbf{y};\theta_a)}{f(\mathbf{y};\theta_0)}> c,\]

and \(\phi(\mathbf{y})= 0\) when:

\[\frac{L(\theta_a)}{L(\theta_0)} = \frac{f(\mathbf{y};\theta_a)}{f(\mathbf{y};\theta_0)}< c,\]

for some \(c\) chosen such that \(P_{\theta_0}(\phi(\mathbf{y})=1) =\alpha\).

Intuition: reject \(H_0\) if the data are “convincingly more likely” under \(\theta_a\) than they are under \(\theta_0\).

… is the most powerful size-\(\alpha\) test!

That is, for any other test \(\phi'(\mathbf{y})\) that satisfies:

\[\alpha = P_{\theta_0}(\phi'(\mathbf{y})=1) = P_{\theta_0}(\phi(\mathbf{y})=1),\]

the power of \(\phi(\mathbf{y})\) (the likelihood ratio test) is at least as large as the power of \(\phi'(\mathbf{y})\); that is:

\[P_{\theta_a}(\phi(\mathbf{y})=1) \geq P_{\theta_a}(\phi'(\mathbf{y})=1).\]
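To make the claim concrete, here is a minimal simulation sketch (not from the lemma itself; the normal setup and all values are illustrative assumptions). For i.i.d. \(N(\mu, 1)\) data with \(H_0: \mu = 0\) vs \(H_a: \mu = 1\), the likelihood ratio is increasing in \(\sum_i y_i\), so the LR test rejects for large \(\bar{y}\); a competitor test of the same size that uses only \(y_1\) should show visibly less power:

# Sketch: compare the LR test to another exact size-0.05 test under H_a
set.seed(1)
n <- 10; reps <- 10000
phi_lr    <- function(y) mean(y) > qnorm(0.95)/sqrt(n)  # LR test (size 0.05)
phi_prime <- function(y) y[1] > qnorm(0.95)             # uses only y_1 (size 0.05)
sims <- replicate(reps, rnorm(n, mean = 1))             # data under theta_a
mean(apply(sims, 2, phi_lr))     # power of LR test: about 0.94
mean(apply(sims, 2, phi_prime))  # power of competitor: about 0.26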

Proof preliminary 1: expressing rejection probability as expectation

  • Note that for any \(\theta\):

\[P_\theta(\phi(\mathbf{y})=1) = 1\cdot P_\theta(\phi(\mathbf{y})=1) + 0\cdot P_\theta(\phi(\mathbf{y})=0)\]

\[ = E_\theta(\phi(\mathbf{y})) = \int \phi(\mathbf{y})\cdot f(\mathbf{y};\theta) d\mathbf{y}\]

Implications:

  • \(\alpha =P_{\theta_0}(\phi(\mathbf{y})=1) = E_{\theta_0}(\phi(\mathbf{y}))\)
  • \(Power =P_{\theta_a}(\phi(\mathbf{y})=1) = E_{\theta_a}(\phi(\mathbf{y}))\)
  • Equivalently for any other test:

\[P_\theta(\phi'(\mathbf{y})=1) = E_\theta(\phi'(\mathbf{y}))\]
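A quick numerical illustration of this identity (a sketch, reusing the assumed normal setup from above): averaging the 0/1 values of \(\phi(\mathbf{y})\) over simulated datasets estimates the rejection probability under whichever \(\theta\) generated the data.

# Sketch: the mean of the 0/1 values phi(y) estimates P_theta(phi(y) = 1)
set.seed(2)
n <- 10; reps <- 10000
phi <- function(y) as.numeric(mean(y) > qnorm(0.95)/sqrt(n))  # 1 = reject
mean(replicate(reps, phi(rnorm(n, mean = 0))))  # estimates alpha (about 0.05)
mean(replicate(reps, phi(rnorm(n, mean = 1))))  # estimates power (about 0.94)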

Proof preliminary 2: re-expressing the likelihood ratio

  • As expressed, the most-powerful size-\(\alpha\) test rejects the null (\(\phi(\mathbf{y}) = 1\)) when:

\[\frac{L(\theta_a)}{L(\theta_0)} = \frac{f(\mathbf{y};\theta_a)}{f(\mathbf{y};\theta_0)}> c.\]

  • By simple re-arrangement of the ratio, this is equivalent to rejecting when:

\[f(\mathbf{y};\theta_a) - cf(\mathbf{y};\theta_0) > 0.\]

Proof of Neyman-Pearson lemma

Define:

\[\delta(\mathbf{y}) = (\phi(\mathbf{y}) - \phi'(\mathbf{y}))(f(\mathbf{y};\theta_a)-cf(\mathbf{y};\theta_0))\]

Note that both \(\phi(\mathbf{y})\) and \(\phi'(\mathbf{y})\) are always either 1 or 0. We know the conditions under which \(\phi(\mathbf{y})\) takes on each value:

  • Case 1:

\[f(\mathbf{y};\theta_a)-cf(\mathbf{y};\theta_0) > 0 \Rightarrow \phi(\mathbf{y})=1 \Rightarrow (\phi(\mathbf{y}) - \phi'(\mathbf{y}))\ge 0 \Rightarrow \delta(\mathbf{y}) \ge 0\]

  • Case 2:

\[f(\mathbf{y};\theta_a)-cf(\mathbf{y};\theta_0) < 0 \Rightarrow \phi(\mathbf{y})=0\Rightarrow (\phi(\mathbf{y}) - \phi'(\mathbf{y})) \le 0 \Rightarrow \delta(\mathbf{y}) \ge 0\]

  • Case 3:

\[f(\mathbf{y};\theta_a)-cf(\mathbf{y};\theta_0)= 0 \Rightarrow \delta(\mathbf{y}) = 0\]

Implication: \(\delta(\mathbf{y})\ge 0\) always.

Proof of Neyman-Pearson lemma (continued)

Since \(\delta(\mathbf{y})\ge 0\) always:

\[0 \le \int \delta(\mathbf{y}) d\mathbf{y} = \int(\phi(\mathbf{y}) - \phi'(\mathbf{y}))(f(\mathbf{y};\theta_a)-cf(\mathbf{y};\theta_0)) d\mathbf{y}\]

\[= \int(\phi(\mathbf{y}) - \phi'(\mathbf{y}))f(\mathbf{y};\theta_a) d\mathbf{y} - c\int(\phi(\mathbf{y}) - \phi'(\mathbf{y}))f(\mathbf{y};\theta_0) d\mathbf{y}\]

\[= E_{\theta_a}(\phi(\mathbf{y})) - E_{\theta_a}(\phi'(\mathbf{y})) - c\left[\underbrace{E_{\theta_0}(\phi(\mathbf{y})) - E_{\theta_0}(\phi'(\mathbf{y}))}_{\alpha - \alpha} \right]\]

\[= E_{\theta_a}(\phi(\mathbf{y})) - E_{\theta_a}(\phi'(\mathbf{y})) \]

\[ \therefore E_{\theta_a}(\phi'(\mathbf{y})) \le E_{\theta_a}(\phi(\mathbf{y})) \]

Uniformly most powerful tests

  • Consider testing

\[H_0: \theta = \theta_0\]

\[H_a: \theta \in \Omega_a\]

  • “Upper tail” example: \(\Omega_a= \{\theta: \theta > \theta_0\}\)
  • “Lower tail” example: \(\Omega_a= \{\theta: \theta < \theta_0\}\)

A test \(\phi(\mathbf{y})\) is said to be a uniformly most powerful (UMP) size-\(\alpha\) test if for all \(\theta \in \Omega_a\):

\[P_{\theta}(\phi(\mathbf{y})=1) \geq P_{\theta}(\phi'(\mathbf{y})=1),\]

for any other size-\(\alpha\) test \(\phi'(\mathbf{y})\).

Using Neyman-Pearson to find UMP tests

  • As stated and proven, the N-P lemma shows a way to find a most powerful test of the simple hypotheses \(H_0: \theta = \theta_0\) vs \(H_a: \theta = \theta_a\).
  • It turns out that the Neyman-Pearson Lemma can also be used to find uniformly most powerful tests against composite alternatives of the form:

\[H_0: \theta = \theta_0\] \[H_a: \theta > \theta_0,\]

or:

\[H_0: \theta = \theta_0\] \[H_a: \theta < \theta_0.\]

Process

  1. Set up a simple hypothesis for a specific \(\theta_a \in \Omega_a\);
  2. Use the N-P lemma to find the most powerful size-\(\alpha\) test for this simple hypothesis;
  3. Show that the rejection region does not depend on the specific value of \(\theta_a\) chosen, but only on the fact that \(\theta_a \in \Omega_a\).

Example: Finding UMP test for Poisson rate

  • Suppose we have an i.i.d. sample of size \(n\) where \(Y_i \sim POI(\lambda)\).
  • Derive a uniformly most powerful size-\(\alpha\) test for testing:

\[H_0: \lambda = \lambda_0\] \[H_a: \lambda > \lambda_0\]

Finding MP for simple \(H_a\)

For a value \(\lambda_a > \lambda_0\), the N-P lemma says to reject when:

\[\frac{L(\lambda_a)}{L(\lambda_0)} = \frac{e^{-n\lambda_a }\lambda_a^{\sum_i y_i}/ \prod_i y_i !}{e^{-n\lambda_0 }\lambda_0^{\sum_i y_i}/ \prod_i y_i !}> c,\]

\[ \mbox{ i.e. when } e^{n(\lambda_0 - \lambda_a) }\left(\frac{\lambda_a}{\lambda_0}\right)^{\sum_i y_i}> c, \]

\[\mbox{ i.e. when }\left(\frac{\lambda_a}{\lambda_0}\right)^{\sum_i y_i}> c', \]

\[\mbox{ i.e. when }\sum_i y_i \underbrace{[\log(\lambda_a) -\log(\lambda_0)]}_{> 0}> c'', \]

\[\mbox{ i.e. when }\sum_i y_i > c'''.\]
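As a numerical sanity check (a sketch; the particular values of \(\lambda_0\), \(\lambda_a\), and \(n\) are illustrative assumptions), the log of the likelihood ratio above is indeed increasing in \(\sum_i y_i\) whenever \(\lambda_a > \lambda_0\):

# Sketch: the Poisson log-LR, n*(lambda_0 - lambda_a) + sum(y)*log(lambda_a/lambda_0),
# is increasing in sum(y) when lambda_a > lambda_0
lambda_0 <- 5; lambda_a <- 7; n <- 50
sums <- 200:300                                  # candidate values of sum(y)
log_lr <- n*(lambda_0 - lambda_a) + sums*log(lambda_a/lambda_0)
all(diff(log_lr) > 0)                            # TRUE: ordering by LR = ordering by sum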

Establishing UMP

  • Note that the decision:

\[\mbox{Reject the null when } \sum_i y_i \mbox{ is large}\]

depended not on the specific \(\lambda_a\) chosen, only on the fact that \(\lambda_a > \lambda_0\).

  • Thus, using the test statistic \(T = \sum_i Y_i\) and rejecting when it is large (with “large” still to be defined) yields the UMP size-\(\alpha\) test of \(H_0: \lambda=\lambda_0\) vs \(H_a: \lambda > \lambda_0\).

Determining “large”

  • It remains to define “large” with respect to \(T = \sum_i Y_i\) so that we have \(\alpha\) as close to a specified value as possible (typical: 0.05)
  • Suppose \(\lambda_0 = 5\), and we are testing these hypotheses using a sample of size \(n=50\).
  • Want \(c\) such that:

\[P_{\lambda =5}(T>c)\approx \alpha\]

  • Under \(H_0: \lambda = 5\), \(T \sim POI(50\cdot 5)\)
qpois(0.95, lambda = 50*5)
[1] 276
1-ppois(276, lambda = 50*5)
[1] 0.04865865
1-ppois(275, lambda = 50*5)
[1] 0.05515271
  • \(\therefore\) rejecting \(H_0\) when \(T \ge 277\) yields the UMP size-\(\alpha = 0.049\) test.
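With the critical value in hand, the power of this UMP test at any \(\lambda_a > 5\) follows from the same Poisson tail calculation; for instance (a sketch, with \(\lambda_a = 6\) an assumed alternative):

# Power at an assumed alternative lambda_a = 6: T ~ POI(50*6) = POI(300),
# and we reject when T >= 277
1 - ppois(276, lambda = 50*6)  # P(T >= 277) under lambda_a; roughly 0.91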

Visualizing

[Figure: distribution of \(T \sim POI(50\cdot \lambda_0) = POI(250)\), with the rejection region of 277 or larger highlighted; note the mean of 250.]
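A sketch of plotting code (assuming ggplot2, loaded via tidyverse as in the later examples) that would produce such a figure:

library(tidyverse)

# Sketch: pmf of T ~ POI(250) under H_0, highlighting the rejection region T >= 277
tibble(t = 180:320, pmf = dpois(t, lambda = 250),
       region = if_else(t >= 277, 'Reject', 'Fail to reject')) %>%
  ggplot(aes(x = t, y = pmf, fill = region)) +
  geom_col() +
  geom_vline(xintercept = 250, linetype = 2) +  # the mean of 250 under the null
  theme_classic() +
  labs(x = 'T', y = 'P(T = t)', fill = '')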

Example: Finding UMP test for geometric \(p\)

  • Suppose we have an i.i.d. sample of size \(n\) where \(Y_i \sim GEO(p)\) (number of failures version).
  • Derive a uniformly most powerful size-\(\alpha\) test for testing:

\[H_0: p = p_0\] \[H_a: p > p_0\]

Finding MP for simple \(H_a\)

For a specified value of \(p_a > p_0\), the N-P lemma says to reject when:

\[\frac{L(p_a)}{L(p_0)} = \frac{p_a^n(1-p_a)^{\sum_i y_i}}{p_0^n(1-p_0)^{\sum_i y_i}}> c,\]

\[ \mbox{ i.e. when } \left(\frac{p_a}{p_0}\right)^n \left(\frac{1-p_a}{1-p_0}\right)^{\sum_i y_i}> c, \]

\[\mbox{ i.e. when } \left(\frac{1-p_a}{1-p_0}\right)^{\sum_i y_i}> c', \]

\[\mbox{ i.e. when }\sum_i y_i \cdot\underbrace{\log\left(\underbrace{\frac{1-p_a}{1-p_0}}_{<1}\right)}_{<0}> c'', \]

\[\mbox{ i.e. when }\sum_i y_i < c'''.\]

Establishing UMP

  • Note that the decision:

\[\mbox{Reject the null when } \sum_i y_i \mbox{ is small}\]

depended not on the specific \(p_a\) chosen, only on the fact that \(p_a > p_0\).

  • Thus, using the test statistic \(T = \sum_i Y_i\) and rejecting when it is small (with “small” still to be defined) yields the UMP size-\(\alpha\) test of \(H_0: p=p_0\) vs \(H_a: p>p_0\).

Determining “small”

  • It remains to define “small” with respect to \(T = \sum_i Y_i\) so that we have \(\alpha\) as close to a specified value as possible (typical: 0.05)
  • Suppose \(p_0 = 0.5\), and we are testing these hypotheses using a sample of size \(n=50\).
  • Want \(c\) such that:

\[P_{p = 0.5}(T\le c)\approx \alpha\]

  • Under \(H_0: p = 0.5\), \(T \sim NB(50, 0.5)\)
qnbinom(0.05, size = 50, prob = 0.5)
[1] 34
pnbinom(34, size = 50, prob = 0.5)
[1] 0.05056817
pnbinom(33, size = 50, prob = 0.5)
[1] 0.03920981
  • \(\therefore\) rejecting \(H_0\) when \(T \le 34\) yields the UMP size-\(\alpha = 0.051\) test.
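As in the Poisson example, the power at any \(p_a > 0.5\) follows from the corresponding negative binomial tail calculation; for instance (a sketch, with \(p_a = 0.6\) an assumed alternative):

# Power at an assumed alternative p_a = 0.6: T ~ NB(50, 0.6) under p_a,
# and we reject when T <= 34
pnbinom(34, size = 50, prob = 0.6)  # P(T <= 34) under p_a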

Visualizing

[Figure: distribution of \(T \sim NB(50, p_0)\), with the rejection region of 34 or smaller highlighted.]
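Analogous plotting code (a sketch, again assuming ggplot2) with dnbinom in place of dpois:

library(tidyverse)

# Sketch: pmf of T ~ NB(50, 0.5) under H_0, highlighting the rejection region T <= 34
tibble(t = 10:90, pmf = dnbinom(t, size = 50, prob = 0.5),
       region = if_else(t <= 34, 'Reject', 'Fail to reject')) %>%
  ggplot(aes(x = t, y = pmf, fill = region)) +
  geom_col() +
  theme_classic() +
  labs(x = 'T', y = 'P(T = t)', fill = '')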

Example: UMP for exponential rate

  • Recall a previous example, using i.i.d. \(EXP(\lambda)\) data to test:

\[H_0: \lambda = 1/5\] \[H_a: \lambda < 1/5.\]

  • We previously specified the form of the size-\(\alpha\) test using the test statistic \(Y_{(1)}\).
    • For \(\alpha = 0.05\), this test has \(\phi'(\mathbf{y}) = 1\) when \(Y_{(1)} > c = \frac{\ln(20)}{0.2n}\), the 95th percentile of the \(EXP(n\lambda_0)\) distribution.
    • Power of this test: \(P_{\lambda_a}(\phi'(\mathbf{y})=1) = e^{-n\lambda_a c} = e^{-\lambda_a \ln(20)/0.2}\); i.e., \(1-F_{\lambda_a}(c)\), with \(F_{\lambda_a}(\cdot)\) the CDF of an \(EXP(n\lambda_a)\) random variable.
  • Use the Neyman-Pearson Lemma to find the UMP size-\(\alpha=0.05\) test for testing these hypotheses, and use visualization to show it is UMP for a grid of \(n\) values.

Finding UMP test

For a specified value of \(\lambda_a < \lambda_0\), the N-P lemma says to reject when:

\[\frac{L(\lambda_a)}{L(\lambda_0)} = \frac{\lambda_a^n e^{-\lambda_a\sum_i y_i}}{\lambda_0^ne^{-\lambda_0\sum_i y_i}}> c,\]

\[ \mbox{ i.e. when } e^{\underbrace{(\lambda_0-\lambda_a)}_{>0}\sum_i y_i}> c', \]

\[\mbox{ i.e. when } \sum_i y_i \mbox{ is large}. \]

Since the form of this test depended not on the specific \(\lambda_a\) chosen, only on the fact that \(\lambda_a < \lambda_0\), it is UMP for alternatives of the form \(H_a: \lambda < \lambda_0\).

Determining “large”

  • Under \(H_0: \lambda = 1/5\), \(T \sim GAM(n, 1/5)\).
  • Determine the rejection region using cv <- qgamma(0.95, shape = n, rate = 1/5); the resulting test has size exactly \(\alpha= 0.05\).
  • For any \(\lambda_a < 1/5\), determine power using 1-pgamma(cv, shape = n, rate = lambda_a)

Finding rejection probabilities over \(\{\lambda, n\}\) grid

Code for finding the analytic rejection probabilities for these two tests:

library(purrrfect)
library(tidyverse)

lambda_0 = 1/5

(two_rate_tests <- parameters(~lambda_true, ~n, 
                    seq(0.001, 0.2, l=100), c(10, 20, 40, 80))
         %>% mutate(crit_value_min = qexp(0.95, rate = n*lambda_0),
                    crit_value_sum = qgamma(0.95, shape = n, rate = lambda_0),
                    prob_min_rejects = 1-pexp(crit_value_min, rate = n*lambda_true),
                    prob_sum_rejects = 1-pgamma(crit_value_sum, shape = n, rate = lambda_true)
                    )
) %>% head 
# A tibble: 6 × 6
  lambda_true     n crit_value_min crit_value_sum prob_min_rejects prob_sum_rejects
        <dbl> <dbl>          <dbl>          <dbl>            <dbl>            <dbl>
1     0.001      10          1.50            78.5            0.985            1    
2     0.001      20          0.749          139.             0.985            1    
3     0.001      40          0.374          255.             0.985            1    
4     0.001      80          0.187          476.             0.985            1    
5     0.00301    10          1.50            78.5            0.956            1.000
6     0.00301    20          0.749          139.             0.956            1    

Visualizing

ggplot(data = two_rate_tests) + 
  geom_line(aes(x = lambda_true, y = prob_min_rejects, col = 'Test based on min')) +
  geom_line(aes(x = lambda_true, y = prob_sum_rejects, col = 'Test based on sum')) +
  facet_wrap(~n, labeller = label_both)+
  scale_y_continuous(breaks = c(0.05, seq(0.25, 1, by = 0.25)))+
  geom_hline(aes(yintercept= 0.05), linetype = 2) + 
  theme_classic() +
  labs(x=expression(lambda[true]),
       y='P(reject null)', color='')