10.2 - power curves and simulations

W1

Suppose \(Y_1,...,Y_n \stackrel{i.i.d.}{\sim} N(\mu,\sigma^2 = 36)\). We want to test:

\[H_0: \mu = 3\] \[H_a: \mu > 3\]

We will reject \(H_0\) using the test statistic:

\[T = \frac{\bar Y -3}{\sigma/\sqrt{n}}\]

and will reject when \(T > c\).

A. Suppose we want to set \(c\) such that \(\alpha = 0.05\). Find \(c\).

B. Find an expression for the power as a function of \(n\) and the “effect size”, \(d=(\mu_a - \mu_0)/\sigma\).

C. Consider an effect size of \(d=1\). Determine the sample size necessary to achieve a power of at least 0.8.

D. Plot the analytic power function as a function of the effect size \(d\) for \(n = 20\).

E. Plot the analytic power function as a function of \(n\) if \(d = 1\).

F. Although not necessary, use a simulation study to approximate the values of \(c\) for each \(n\in \{5,10,...,50\}\). Compare the simulated values of \(c\) to their analytic counterparts.

G. Simulate the rejection rates of this test using the analytic rejection regions for \(\mu\in\{3,3.5,4,4.5\}\). Plot the rejection rates as a function of \(n\) faceted by the effect size \(d\).

Problem 1

Suppose \(Y_1,...,Y_n\) are an i.i.d. \(UNIF(0,\theta)\) sample. Consider two different tests of \(H_0: \theta = 1\) versus \(H_a: \theta > 1\).

  • Test 1 rejects \(H_0\) if \(Y_{(n)}> c_1\).
  • Test 2 rejects \(H_0\) if \(\bar Y> c_2\).

We want both tests to have 5% type-I error rate.

A. Show that \(c_1\) can be found analytically to be \(0.95^{1/n}\).

  1. Find a general expression for the power function of Test 1 as a function of \(n\) and \(\theta_a\). Plot curve as a function of \(n\) for $\theta

    library(tidyverse)
    Warning: package 'tidyverse' was built under R version 4.4.3
    Warning: package 'ggplot2' was built under R version 4.4.3
    Warning: package 'tidyr' was built under R version 4.4.3
    Warning: package 'readr' was built under R version 4.4.3
    Warning: package 'purrr' was built under R version 4.4.3
    Warning: package 'dplyr' was built under R version 4.4.3
    Warning: package 'stringr' was built under R version 4.4.3
    Warning: package 'forcats' was built under R version 4.4.3
    Warning: package 'lubridate' was built under R version 4.4.3
    ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
    ✔ dplyr     1.1.4     ✔ readr     2.1.5
    ✔ forcats   1.0.0     ✔ stringr   1.5.1
    ✔ ggplot2   4.0.1     ✔ tibble    3.2.1
    ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
    ✔ purrr     1.0.4     
    ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
    ✖ dplyr::filter() masks stats::filter()
    ✖ dplyr::lag()    masks stats::lag()
    ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
    theta_a <- 1.2
    
    ggplot() + 
      geom_function(fun = \(n) 1 - 0.95/(theta_a^n))+
      xlim(c(1,50))+
      geom_hline(aes(yintercept = 0.05), linetype = 2)+
      labs (y = "Power", x = "n",
            title = "Power Curve for Test 1 (theta = 1.2)")+
      theme_classic(base_size = 14)

  2. C. Explain why an exact value of \(c_2\) cannot be found analytically and must be simulated.

D. Simulate values of \(c_2\) that yield 5% type-I error rates for Test 2, for \(n \in \{5,10,...,50\}\).

library(tidyverse)
library(purrrfect)

Attaching package: 'purrrfect'
The following objects are masked from 'package:base':

    replicate, tabulate
(sim_data <- parameters(~n, seq(5,50,by = 5))%>%
    add_trials(100000)%>%
    mutate(
      Ysample = map(n, \(n) runif(n,0,1)),
      Ybar = map_dbl(Ysample, mean)
    )
  )
# A tibble: 1,000,000 × 4
       n .trial Ysample    Ybar
   <dbl>  <dbl> <list>    <dbl>
 1     5      1 <dbl [5]> 0.613
 2     5      2 <dbl [5]> 0.834
 3     5      3 <dbl [5]> 0.472
 4     5      4 <dbl [5]> 0.251
 5     5      5 <dbl [5]> 0.190
 6     5      6 <dbl [5]> 0.715
 7     5      7 <dbl [5]> 0.617
 8     5      8 <dbl [5]> 0.597
 9     5      9 <dbl [5]> 0.606
10     5     10 <dbl [5]> 0.523
# ℹ 999,990 more rows
(c2_values <- sim_data %>%
    group_by(n) %>%
    summarize(
      c2_simulated = quantile(Ybar,0.95),
      .groups = "drop"
    )
  )
# A tibble: 10 × 2
       n c2_simulated
   <dbl>        <dbl>
 1     5        0.713
 2    10        0.651
 3    15        0.623
 4    20        0.606
 5    25        0.595
 6    30        0.587
 7    35        0.580
 8    40        0.575
 9    45        0.571
10    50        0.567
(type1_check <- sim_data %>% 
    inner_join(c2_values, by = "n") %>%
    mutate(reject = ifelse(Ybar > c2_simulated, 1, 0))%>%
    group_by(n) %>% 
    summarize(
      type1 = mean(reject),
      .groups = "drop"
    )
)
# A tibble: 10 × 2
       n type1
   <dbl> <dbl>
 1     5  0.05
 2    10  0.05
 3    15  0.05
 4    20  0.05
 5    25  0.05
 6    30  0.05
 7    35  0.05
 8    40  0.05
 9    45  0.05
10    50  0.05
ggplot(c2_values) + 
  geom_line(aes(x = n, y = c2_simulated))+
  geom_hline(yintercept = 0.5, linetype = 2)+
  labs(
    y = expression(c[2]),
    x = "n",
    title = "Simulated critical values for Test 2"
  )+ 
  theme_classic(base_size = 14)


E. Use a simulation study to compare the sizes and the power curves of the two tests as a function of \(n\) for \(\theta_{true} \in \{1,1.1,1.2,1.3\}.\) Use analytic rejection regions for \(Y_{(n)}\) and simulated rejection regions for \(\bar Y\), but simulate the rejection probabilities for both. Plot the simulated Type-I error rate/power as a function of \(n\) for each \(\theta_{true}\). Discuss the properties of these tests: does one seem preferable?

library(tidyverse)
library(purrrfect)

options(width = 10000)
(exp_decisions <- parameters(~n, ~true_theta,
                             seq(5, 50, by = 5),
                             c(1, 1.1, 1.2, 1.3)) %>%
  add_trials(2000) %>%
  inner_join(c2_values, by = "n") %>%
  mutate(
    
    Ysample = pmap(list(n, true_theta),
                   \(n, theta) runif(n, 0, theta)),
    
    T1 = map_dbl(Ysample, max),
    c1 = 0.95^(1/n),
    T1_decision = ifelse(T1 > c1, 1, 0),
    
    Ybar = map_dbl(Ysample, mean),
    T2_decision = ifelse(Ybar > c2_simulated, 1, 0)
  )
  )
# A tibble: 80,000 × 10
       n true_theta .trial c2_simulated Ysample      T1    c1 T1_decision  Ybar T2_decision
   <dbl>      <dbl>  <dbl>        <dbl> <list>    <dbl> <dbl>       <dbl> <dbl>       <dbl>
 1     5          1      1        0.713 <dbl [5]> 0.699 0.990           0 0.313           0
 2     5          1      2        0.713 <dbl [5]> 0.851 0.990           0 0.575           0
 3     5          1      3        0.713 <dbl [5]> 0.960 0.990           0 0.361           0
 4     5          1      4        0.713 <dbl [5]> 0.951 0.990           0 0.275           0
 5     5          1      5        0.713 <dbl [5]> 0.978 0.990           0 0.544           0
 6     5          1      6        0.713 <dbl [5]> 0.922 0.990           0 0.672           0
 7     5          1      7        0.713 <dbl [5]> 0.777 0.990           0 0.501           0
 8     5          1      8        0.713 <dbl [5]> 0.707 0.990           0 0.405           0
 9     5          1      9        0.713 <dbl [5]> 0.771 0.990           0 0.352           0
10     5          1     10        0.713 <dbl [5]> 0.788 0.990           0 0.428           0
# ℹ 79,990 more rows
simulated_prob_reject <- exp_decisions %>% 
  group_by(n, true_theta) %>%
  summarize(
    power_T1 = mean(T1_decision),
    power_T2 = mean(T2_decision),
    .groups = "drop"
  )

ggplot(data = simulated_prob_reject)+
  geom_line(aes(x = n, y = power_T1, col = 'Test using max'))+
  geom_line(aes(x = n, y = power_T2, col = 'Test using mean'))+
  scale_y_continuous(breaks = c(0.05, seq(0.2, 1, by = .2)))+
  geom_hline(yintercept = 0.05, linetype = 2) + 
  labs(
    y = 'Proportion of replications rejecting null',
    x = 'n',
    color = ''
  )+
  facet_wrap(~true_theta,
             labeller = label_bquote(theta == .(true_theta)))+
  theme_classic(base_size = 14)

The Maximum Y(n) is the most informative statistic for theta. Test 1 direclty targets the parameter which makes it more efficient than Test 2. This makes Test 1 more preferable because it has uniromly higher power whie=le maintaining the corret Type 1 error rate. Test 1 has a more better analtic rejection region then Test 2.