10.2 - power curves and simulations

W1

Suppose \(Y_1,...,Y_n \stackrel{i.i.d.}{\sim} N(\mu,\sigma^2 = 36)\). We want to test:

\[H_0: \mu = 3\] \[H_a: \mu > 3\]

We will reject \(H_0\) using the test statistic:

\[T = \frac{\bar Y -3}{\sigma/\sqrt{n}}\]

and will reject when \(T > c\).

A. Suppose we want to set \(c\) such that \(\alpha = 0.05\). Find \(c\).

B. Find an expression for the power as a function of \(n\) and the “effect size”, \(d=(\mu_a - \mu_0)/\sigma\).

C. Consider an effect size of \(d=1\). Determine the sample size necessary to achieve a power of at least 0.8.

D. Plot the analytic power function as a function of the effect size \(d\) for \(n = 20\).

library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.5.2
Warning: package 'ggplot2' was built under R version 4.5.2
Warning: package 'readr' was built under R version 4.5.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.2.0
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(purrrfect)

Attaching package: 'purrrfect'

The following objects are masked from 'package:base':

    replicate, tabulate
ggplot()+
  geom_function(fun = \(x) 1- pnorm(1.645 - sqrt(20)*x))+
  labs(x='effect size', y ='rejection rate')+
  geom_hline(aes(yintercept = 0.05), linetype= 2)

E. Plot the analytic power function as a function of \(n\) if \(d = 1\).

ggplot()+
  geom_function(fun = \(n) 1- pnorm(1.645 - sqrt(n)*1))+
  labs(x='sample size', y ='power')+
  xlim(c(1,20))

F. Although not necessary, use a simulation study to approximate the values of \(c\) for each \(n\in \{5,10,...,50\}\). Compare the simulated values of \(c\) to their analytic counterparts.

# sim dist of test stat under null where mu=3
mu0 <- 3


(null_study <- parameters(
                    ~n, seq(5,50, by =5)
                          ) %>% 
              add_trials(10000) %>% 
              mutate(
                y_sample = map(n, .f = \(n) rnorm(n, mean = mu0, sd = 6)),
                T = pmap_dbl(list(n,y_sample), \(n,y) ((mean(y)- mu0) / (6/sqrt(n))))
              )
    
) %>% head()
# A tibble: 6 × 4
      n .trial y_sample       T
  <dbl>  <dbl> <list>     <dbl>
1     5      1 <dbl [5]> -0.209
2     5      2 <dbl [5]>  0.316
3     5      3 <dbl [5]>  0.496
4     5      4 <dbl [5]>  1.40 
5     5      5 <dbl [5]>  0.795
6     5      6 <dbl [5]>  0.377
# finiding quantiles by n
(crit_values <- null_study
%>% summarize(c = quantile(T, 0.95), .by = n)
)
# A tibble: 10 × 2
       n     c
   <dbl> <dbl>
 1     5  1.64
 2    10  1.68
 3    15  1.64
 4    20  1.67
 5    25  1.67
 6    30  1.60
 7    35  1.63
 8    40  1.65
 9    45  1.64
10    50  1.66
ggplot(null_study)+
  geom_histogram(aes(x=T))+
  facet_wrap(~n)+
  theme_classic()
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

G. Simulate the rejection rates of this test using the analytic rejection regions for \(\mu\in\{3,3.5,4,4.5\}\). Plot the rejection rates as a function of \(n\) faceted by the effect size \(d\).

(various_test_study <- parameters(
                    ~n, ~mu_true,
                    seq(5,50, by =5), c(3,3.5,4,4.5)
                          ) %>% 
              add_trials(10000) %>% 
              mutate(
                y_sample = pmap(list(mu_true,n), .f = \(m,n) rnorm(n, mean = m, sd = 6)),
                T = pmap_dbl(list(n,y_sample), \(n,y) ((mean(y)- 3) / (6/sqrt(n))))
              ) %>% 
    inner_join(., crit_values, by ='n') %>% 
    mutate( reject = ifelse(T > c, 1,0))
) %>% head()
# A tibble: 6 × 7
      n mu_true .trial y_sample        T     c reject
  <dbl>   <dbl>  <dbl> <list>      <dbl> <dbl>  <dbl>
1     5       3      1 <dbl [5]> -0.0820  1.64      0
2     5       3      2 <dbl [5]> -0.790   1.64      0
3     5       3      3 <dbl [5]>  1.42    1.64      0
4     5       3      4 <dbl [5]>  0.434   1.64      0
5     5       3      5 <dbl [5]>  2.23    1.64      1
6     5       3      6 <dbl [5]> -0.286   1.64      0
(rejection_rates <- various_test_study %>% 
  summarize(prob_reject = mean(reject), .by = c(n, mu_true))
)
# A tibble: 40 × 3
       n mu_true prob_reject
   <dbl>   <dbl>       <dbl>
 1     5     3        0.053 
 2     5     3.5      0.0712
 3     5     4        0.105 
 4     5     4.5      0.138 
 5    10     3        0.0449
 6    10     3.5      0.0786
 7    10     4        0.126 
 8    10     4.5      0.194 
 9    15     3        0.0515
10    15     3.5      0.0967
# ℹ 30 more rows
ggplot(data = rejection_rates)+
  geom_line(aes(x=n, y = prob_reject))+
  facet_wrap(~mu_true)+
  geom_hline(aes(yintercept=0.05), linetype = 2)

Problem 1

Suppose \(Y_1,...,Y_n\) are an i.i.d. \(UNIF(0,\theta)\) sample. Consider two different tests of \(H_0: \theta = 1\) versus \(H_a: \theta > 1\).

  • Test 1 rejects \(H_0\) if \(Y_{(n)}> c_1\).
  • Test 2 rejects \(H_0\) if \(\bar Y> c_2\).

We want both tests to have 5% type-I error rate.

A. Show that \(c_1\) can be found analytically to be \(0.95^{1/n}\).

  1. Find a general expression for the power function of Test 1 as a function of \(n\) and \(\theta_a\). Plot curve as a function of \(n\) for \(\theta = 1.2.\)
# Plotting the analytic power curve as a function of n for theta = 1.2
ggplot() +
  geom_function(fun = \(n) 1 - (0.95 / (1.2^n)), color = "blue") +
  xlim(c(1, 50)) + 
  labs(
    x = "Sample Size (n)",
    y = "Power",
    )+
  theme_classic(base_size = 14)

C. Explain why an exact value of \(c_2\) cannot be found analytically and must be simulated.
We cannot find c2 analytically because the exact sampling distribution of the sample mean for uniform variables is very complex.

D. Simulate values of \(c_2\) that yield 5% type-I error rates for Test 2, for \(n \in \{5,10,...,50\}\).

theta_0 <- 1

(c2_sim_data <- parameters(~n, seq(5, 50, by = 5)) %>%
  add_trials(10000) %>%
  mutate(
    Ysample = map(n, .f = \(n) runif(n, min = 0, max = theta_0)),
    T2 = map_dbl(Ysample, mean)
  ) %>%
  summarize(
    c2_simulated = quantile(T2, 0.95),
    .by = n
  )
) %>% head()
# A tibble: 6 × 2
      n c2_simulated
  <dbl>        <dbl>
1     5        0.712
2    10        0.648
3    15        0.623
4    20        0.607
5    25        0.595
6    30        0.587

E. Use a simulation study to compare the sizes and the power curves of the two tests as a function of \(n\) for \(\theta_{true} \in \{1,1.1,1.2,1.3\}.\) Use analytic rejection regions for \(Y_{(n)}\) and simulated rejection regions for \(\bar Y\), but simulate the rejection probabilities for both. Plot the simulated Type-I error rate/power as a function of \(n\) for each \(\theta_{true}\). Discuss the properties of these tests: does one seem preferable?

(part_e_decisions <- parameters(~n, ~true_theta,
                               seq(5, 50, by = 5),
                               c(1, 1.1, 1.2, 1.3)) %>%
  add_trials(10000) %>%
  inner_join(c2_sim_data, by = "n") %>%
  mutate(
    Ysample = pmap(list(n, true_theta), .f = \(n, theta) runif(n, min = 0, max = theta)),
    T1 = map_dbl(Ysample, max),
    T2 = map_dbl(Ysample, mean),
    c1_analytic = 0.95^(1/n),
    T1_decision = ifelse(T1 > c1_analytic, 1, 0),
    T2_decision = ifelse(T2 > c2_simulated, 1, 0)
  )
)
# A tibble: 400,000 × 10
       n true_theta .trial c2_simulated Ysample      T1    T2 c1_analytic
   <dbl>      <dbl>  <dbl>        <dbl> <list>    <dbl> <dbl>       <dbl>
 1     5          1      1        0.712 <dbl [5]> 0.907 0.453       0.990
 2     5          1      2        0.712 <dbl [5]> 0.912 0.486       0.990
 3     5          1      3        0.712 <dbl [5]> 0.990 0.724       0.990
 4     5          1      4        0.712 <dbl [5]> 0.969 0.576       0.990
 5     5          1      5        0.712 <dbl [5]> 0.743 0.468       0.990
 6     5          1      6        0.712 <dbl [5]> 0.998 0.570       0.990
 7     5          1      7        0.712 <dbl [5]> 0.979 0.549       0.990
 8     5          1      8        0.712 <dbl [5]> 0.968 0.642       0.990
 9     5          1      9        0.712 <dbl [5]> 0.407 0.284       0.990
10     5          1     10        0.712 <dbl [5]> 0.877 0.646       0.990
# ℹ 399,990 more rows
# ℹ 2 more variables: T1_decision <dbl>, T2_decision <dbl>
(simulated_power <- part_e_decisions %>%
  summarize(
    power_T1 = mean(T1_decision),
    power_T2 = mean(T2_decision),
    .by = c(n, true_theta)
  )
)
# A tibble: 40 × 4
       n true_theta power_T1 power_T2
   <dbl>      <dbl>    <dbl>    <dbl>
 1     5        1     0.0474   0.0522
 2     5        1.1   0.407    0.130 
 3     5        1.2   0.617    0.241 
 4     5        1.3   0.749    0.360 
 5    10        1     0.0475   0.0491
 6    10        1.1   0.627    0.164 
 7    10        1.2   0.845    0.334 
 8    10        1.3   0.933    0.503 
 9    15        1     0.047    0.0507
10    15        1.1   0.772    0.187 
# ℹ 30 more rows
ggplot(data = simulated_power) +
  geom_line(aes(x = n, y = power_T1, color = "Test 1 (Max)")) +
  geom_line(aes(x = n, y = power_T2, color = "Test 2 (Mean)")) +
  geom_hline(yintercept = 0.05, linetype = 2) +
  facet_wrap(~true_theta, labeller = label_bquote(theta == .(true_theta))) +
  labs(
    x = "Sample Size",
    y = "Simulated Rejection Probability",
    color = "Test Statistic"
  ) +
  theme_classic(base_size = 14)

We can see both tests fall on the 5% rejection line when theta = 1, regardless of sample size, which makes sense, it is our null hypothesis value. For all other thetas (1.1, 1.2, and 1.3), we can see that the Test 1, the maximum order stat does much better than test 2, the mean. i.e. Test 1 goes to 100% power much quicker when increasing sample size, regardless of the theta value. Also, the larger the value of theta, the quicker the power goes to 100%.