10.2 - power curves and simulations

W1

Suppose \(Y_1,...,Y_n \stackrel{i.i.d.}{\sim} N(\mu,\sigma^2 = 36)\). We want to test:

\[H_0: \mu = 3\] \[H_a: \mu > 3\]

We will reject \(H_0\) using the test statistic:

\[T = \frac{\bar Y -3}{\sigma/\sqrt{n}}\]

and will reject when \(T > c\).

A. Suppose we want to set \(c\) such that \(\alpha = 0.05\). Find \(c\).

B. Find an expression for the power as a function of \(n\) and the “effect size”, \(d=(\mu_a - \mu_0)/\sigma\).

C. Consider an effect size of \(d=1\). Determine the sample size necessary to achieve a power of at least 0.8.

D. Plot the analytic power function as a function of the effect size \(d\) for \(n = 20\).

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(purrrfect)


Attaching package: 'purrrfect'

The following objects are masked from 'package:base':

    replicate, tabulate

(ggplot()
  + geom_function(fun = \(x) 1 - pnorm(1.645 - sqrt(20)*x))
  + labs(x = 'Effect size', ylab = 'Rejection rate')
  + geom_hline(aes(yintercept = 0.05), linetype = 2)
  + theme_classic()
  )

Ignoring unknown labels:
• ylab : "Rejection rate"

E. Plot the analytic power function as a function of \(n\) if \(d = 1\).

(ggplot()
 + geom_function(fun = \(n) 1 - pnorm(1.645 - sqrt(n)*1))
 + labs(x = 'Sample size', y = 'Power')
 + xlim(c(1,20))
 + theme_classic()
  )

F. Although not necessary, use a simulation study to approximate the values of \(c\) for each \(n\in \{5,10,...,50\}\). Compare the simulated values of \(c\) to their analytic counterparts.

## Simulate distribution of T under H0: mu = 3, for various n
mu0 <- 3

(null_test_stats <- parameters(~n,
                               seq(5, 50, by = 5)
                               )
  %>% add_trials(10000)
  %>% mutate(Ysample = map(n, .f = \(n) rnorm(n, mean = mu0, sd = 6)),
             T = pmap_dbl(list(n, Ysample), \(n,y) (mean(y) - mu0)/(6/sqrt(n)))
             )
) %>% head()

# A tibble: 6 × 4
      n .trial Ysample         T
  <dbl>  <dbl> <list>      <dbl>
1     5      1 <dbl [5]> -1.22  
2     5      2 <dbl [5]> -0.398 
3     5      3 <dbl [5]> -1.76  
4     5      4 <dbl [5]> -0.0688
5     5      5 <dbl [5]>  1.11  
6     5      6 <dbl [5]> -1.46

(ggplot(null_test_stats)
  + geom_histogram(aes(x = T))
  + facet_wrap(~n)
  + theme_classic()
)

`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

(crit_values <- null_test_stats
    %>% summarize(c = quantile(T, 0.95), .by = n)
)

# A tibble: 10 × 2
       n     c
   <dbl> <dbl>
 1     5  1.61
 2    10  1.63
 3    15  1.64
 4    20  1.63
 5    25  1.67
 6    30  1.61
 7    35  1.66
 8    40  1.71
 9    45  1.63
10    50  1.66

G. Simulate the rejection rates of this test using the analytic rejection regions for \(\mu\in\{3,3.5,4,4.5\}\). Plot the rejection rates as a function of \(n\) faceted by the effect size \(d\).

(variety_test_stats <- parameters(~n, ~mu_true,
                               seq(5, 50, by = 5),
                               c(3,3.5, 4, 4.5)
                               )
  %>% add_trials(10000)
  %>% mutate(Ysample = pmap(list(mu_true, n), .f = \(m, n) rnorm(n, mean = m, sd = 6)),
             T = pmap_dbl(list(n, Ysample), \(n,y) (mean(y) - 3)/(6/sqrt(n)))
             )
 %>% inner_join(., crit_values, by = 'n')
 %>% mutate(reject = ifelse(T > c, 1, 0))
) %>% head()

# A tibble: 6 × 7
      n mu_true .trial Ysample        T     c reject
  <dbl>   <dbl>  <dbl> <list>     <dbl> <dbl>  <dbl>
1     5       3      1 <dbl [5]> -1.01   1.61      0
2     5       3      2 <dbl [5]> -0.383  1.61      0
3     5       3      3 <dbl [5]>  0.578  1.61      0
4     5       3      4 <dbl [5]>  0.900  1.61      0
5     5       3      5 <dbl [5]> -1.10   1.61      0
6     5       3      6 <dbl [5]> -0.493  1.61      0

(rejection_rates <- variety_test_stats
  %>% summarize(prob_reject = mean(reject), 
                .by = c(n, mu_true))
)

# A tibble: 40 × 3
       n mu_true prob_reject
   <dbl>   <dbl>       <dbl>
 1     5     3        0.0504
 2     5     3.5      0.0788
 3     5     4        0.109 
 4     5     4.5      0.150 
 5    10     3        0.0561
 6    10     3.5      0.0888
 7    10     4        0.133 
 8    10     4.5      0.202 
 9    15     3        0.0487
10    15     3.5      0.0943
# ℹ 30 more rows

(ggplot(data = rejection_rates)
  + geom_line(aes(x = n, y = prob_reject))
  + geom_hline(aes(yintercept = 0.05), linetype = 2)
  + facet_grid(~mu_true, labeller = label_both)
  + theme_classic()
)

Problem 1

Suppose \(Y_1,...,Y_n\) are an i.i.d. \(UNIF(0,\theta)\) sample. Consider two different tests of \(H_0: \theta = 1\) versus \(H_a: \theta > 1\).

Test 1 rejects \(H_0\) if \(Y_{(n)}> c_1\).
Test 2 rejects \(H_0\) if \(\bar Y> c_2\).

We want both tests to have 5% type-I error rate.

A. Show that \(c_1\) can be found analytically to be \(0.95^{1/n}\).

B. Find a general expression for the power function of Test 1 as a function of \(n\) and \(\theta_a\). Plot curve as a function of \(n\) for \(\theta = 1.2.\)

theta <- 1.2
(ggplot()
 + geom_function(fun = \(n) 1 - (0.95/theta^n))
 + labs(x = 'Sample size', y = 'Power')
 + xlim(c(1,20))
 + theme_classic()
  )

C. Explain why an exact value of \(c_2\) cannot be found analytically and must be simulated.
In order to find \(c_2\) analytically, we need to find the the 95th percentile of the sampling distribution \(\bar Y\), which is a scaled sum of uniform random variables. There is no well-known closed form for the CDF of \(\bar Y\), so we are unable to find the exact value of \(c_2\) analytically.

D. Simulate values of \(c_2\) that yield 5% type-I error rates for Test 2, for \(n \in \{5,10,...,50\}\).

## Simulate distribution of T under H0: theta = 1, for various n
theta <- 1

(null_test_stats <- parameters(~n,
                               seq(5, 50, by = 5))
  %>% add_trials(10000)
  %>% mutate(Ysample = map(n, .f = \(n) runif(n, 0, theta)),
             T = map_dbl(Ysample, mean))
)

# A tibble: 100,000 × 4
       n .trial Ysample       T
   <dbl>  <dbl> <list>    <dbl>
 1     5      1 <dbl [5]> 0.703
 2     5      2 <dbl [5]> 0.466
 3     5      3 <dbl [5]> 0.585
 4     5      4 <dbl [5]> 0.236
 5     5      5 <dbl [5]> 0.853
 6     5      6 <dbl [5]> 0.480
 7     5      7 <dbl [5]> 0.557
 8     5      8 <dbl [5]> 0.700
 9     5      9 <dbl [5]> 0.370
10     5     10 <dbl [5]> 0.569
# ℹ 99,990 more rows

(ggplot(data = null_test_stats)
  + geom_histogram(aes(x = T))
  + facet_wrap(~n)
  + theme_classic()
)

`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

(crit_values <- null_test_stats
  %>% summarize(c2 = quantile(T, 0.95),
                .by = n)
  )

# A tibble: 10 × 2
       n    c2
   <dbl> <dbl>
 1     5 0.711
 2    10 0.652
 3    15 0.622
 4    20 0.606
 5    25 0.595
 6    30 0.586
 7    35 0.580
 8    40 0.574
 9    45 0.568
10    50 0.568

E. Use a simulation study to compare the sizes and the power curves of the two tests as a function of \(n\) for \(\theta_{true} \in \{1,1.1,1.2,1.3\}.\) Use analytic rejection regions for \(Y_{(n)}\) and simulated rejection regions for \(\bar Y\), but simulate the rejection probabilities for both. Plot the simulated Type-I error rate/power as a function of \(n\) for each \(\theta_{true}\). Discuss the properties of these tests: does one seem preferable?

(variety_test_stats <- parameters(~n, ~true_theta,
                                  seq(5, 50, by = 5), c(1, 1.1, 1.2, 1.3)
                                  )
  %>% add_trials(10000)
  %>% inner_join(., crit_values, by = 'n')
  %>% mutate(c1 = 0.95^(1/n),
             Ysample = pmap(list(n, true_theta), .f = \(n, t) runif(n, 0, t)),
             T1 = map_dbl(Ysample, max),
             T2 = map_dbl(Ysample, mean)
             )
  %>% mutate(T1_decision = ifelse(T1 > c1, 1, 0),
             T2_decision = ifelse(T2 > c2, 1, 0))
  )

# A tibble: 400,000 × 10
       n true_theta .trial    c2    c1 Ysample      T1    T2 T1_decision
   <dbl>      <dbl>  <dbl> <dbl> <dbl> <list>    <dbl> <dbl>       <dbl>
 1     5          1      1 0.711 0.990 <dbl [5]> 0.689 0.447           0
 2     5          1      2 0.711 0.990 <dbl [5]> 0.720 0.311           0
 3     5          1      3 0.711 0.990 <dbl [5]> 0.813 0.343           0
 4     5          1      4 0.711 0.990 <dbl [5]> 0.794 0.532           0
 5     5          1      5 0.711 0.990 <dbl [5]> 0.998 0.620           1
 6     5          1      6 0.711 0.990 <dbl [5]> 0.877 0.506           0
 7     5          1      7 0.711 0.990 <dbl [5]> 0.832 0.555           0
 8     5          1      8 0.711 0.990 <dbl [5]> 0.977 0.475           0
 9     5          1      9 0.711 0.990 <dbl [5]> 0.893 0.594           0
10     5          1     10 0.711 0.990 <dbl [5]> 0.788 0.557           0
# ℹ 399,990 more rows
# ℹ 1 more variable: T2_decision <dbl>

(simulated_prob_reject <- variety_test_stats
  %>% summarize(power_T1 = mean(T1_decision),
                power_T2 = mean(T2_decision),
                .by = c(n, true_theta))
)

# A tibble: 40 × 4
       n true_theta power_T1 power_T2
   <dbl>      <dbl>    <dbl>    <dbl>
 1     5        1     0.0533   0.0538
 2     5        1.1   0.398    0.130 
 3     5        1.2   0.615    0.245 
 4     5        1.3   0.742    0.361 
 5    10        1     0.0535   0.0468
 6    10        1.1   0.631    0.160 
 7    10        1.2   0.851    0.319 
 8    10        1.3   0.932    0.494 
 9    15        1     0.0471   0.0498
10    15        1.1   0.768    0.194 
# ℹ 30 more rows

(ggplot(data = simulated_prob_reject)
  + geom_line(aes(x = n, y = power_T1, col = 'Test using maximum'))
  + geom_line(aes(x = n, y = power_T2, col = 'Test using mean'))
  + scale_y_continuous(breaks = c(0.05, seq(0.2, 1, by = 0.2)))
  + geom_hline(aes(yintercept = 0.05), linetype = 2)
  + labs(y = 'Proportion of replication rejecting null',
         x = 'n',
         color = '')
  + facet_wrap(~true_theta, labeller = label_bquote(theta == .(true_theta)))
  + theme_classic()
)

We see that when theta = 1 (the null value), both testing statistics have a 5% chance of rejecting the null at every n (which is what we expect to see). We see that for all alternative values of theta, the power of T1 and T2 both increase as n increases. We also see that the further theta is from the null hypothesis, the quicker both testing statistics converge towards 1. At every alternative theta value, we see that T1 (the test statistic using the maximum) has a higher power than T2 at every value of n. Thus, we would want to use T1.