10.2 - power curves and simulations

W1

Suppose $Y_1,...,Y_n \stackrel{i.i.d.}{\sim} N(\mu,\sigma^2 = 36)$. We want to test:

\[H_0: \mu = 3\] \[H_a: \mu > 3\]

We will reject $H_0$ using the test statistic:

\[T = \frac{\bar Y -3}{\sigma/\sqrt{n}}\]

and will reject when $T > c$.

A. Suppose we want to set $c$ such that $\alpha = 0.05$. Find $c$.

B. Find an expression for the power as a function of $n$ and the “effect size”, $d=(\mu_a - \mu_0)/\sigma$.

C. Consider an effect size of $d=1$. Determine the sample size necessary to achieve a power of at least 0.8.

D. Plot the analytic power function as a function of the effect size $d$ for $n = 20$.

E. Plot the analytic power function as a function of $n$ if $d = 1$.

F. Although not necessary, use a simulation study to approximate the values of $c$ for each $n\in \{5,10,...,50\}$. Compare the simulated values of $c$ to their analytic counterparts.

G. Simulate the rejection rates of this test using the analytic rejection regions for $\mu\in\{3,3.5,4,4.5\}$. Plot the rejection rates as a function of $n$ faceted by the effect size $d$.

Problem 1

Suppose $Y_1,...,Y_n$ are an i.i.d. $UNIF(0,\theta)$ sample. Consider two different tests of $H_0: \theta = 1$ versus $H_a: \theta > 1$.

Test 1 rejects $H_0$ if $Y_{(n)}> c_1$.
Test 2 rejects $H_0$ if $\bar Y> c_2$.

We want both tests to have 5% type-I error rate.

A. Show that $c_1$ can be found analytically to be $0.95^{1/n}$.

Find a general expression for the power function of Test 1 as a function of $n$ and $\theta_a$. Plot curve as a function of $n$ for $\theta

library(tidyverse)

Warning: package 'tidyverse' was built under R version 4.4.3

Warning: package 'ggplot2' was built under R version 4.4.3

Warning: package 'tidyr' was built under R version 4.4.3

Warning: package 'readr' was built under R version 4.4.3

Warning: package 'purrr' was built under R version 4.4.3

Warning: package 'dplyr' was built under R version 4.4.3

Warning: package 'stringr' was built under R version 4.4.3

Warning: package 'forcats' was built under R version 4.4.3

Warning: package 'lubridate' was built under R version 4.4.3

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

theta_a <- 1.2

ggplot() + 
  geom_function(fun = \(n) 1 - 0.95/(theta_a^n))+
  xlim(c(1,50))+
  geom_hline(aes(yintercept = 0.05), linetype = 2)+
  labs (y = "Power", x = "n",
        title = "Power Curve for Test 1 (theta = 1.2)")+
  theme_classic(base_size = 14)

C. Explain why an exact value of $c_2$ cannot be found analytically and must be simulated.

D. Simulate values of $c_2$ that yield 5% type-I error rates for Test 2, for $n \in \{5,10,...,50\}$.

library(tidyverse)
library(purrrfect)


Attaching package: 'purrrfect'

The following objects are masked from 'package:base':

    replicate, tabulate

(sim_data <- parameters(~n, seq(5,50,by = 5))%>%
    add_trials(100000)%>%
    mutate(
      Ysample = map(n, \(n) runif(n,0,1)),
      Ybar = map_dbl(Ysample, mean)
    )
  )

# A tibble: 1,000,000 × 4
       n .trial Ysample    Ybar
   <dbl>  <dbl> <list>    <dbl>
 1     5      1 <dbl [5]> 0.613
 2     5      2 <dbl [5]> 0.834
 3     5      3 <dbl [5]> 0.472
 4     5      4 <dbl [5]> 0.251
 5     5      5 <dbl [5]> 0.190
 6     5      6 <dbl [5]> 0.715
 7     5      7 <dbl [5]> 0.617
 8     5      8 <dbl [5]> 0.597
 9     5      9 <dbl [5]> 0.606
10     5     10 <dbl [5]> 0.523
# ℹ 999,990 more rows

(c2_values <- sim_data %>%
    group_by(n) %>%
    summarize(
      c2_simulated = quantile(Ybar,0.95),
      .groups = "drop"
    )
  )

# A tibble: 10 × 2
       n c2_simulated
   <dbl>        <dbl>
 1     5        0.713
 2    10        0.651
 3    15        0.623
 4    20        0.606
 5    25        0.595
 6    30        0.587
 7    35        0.580
 8    40        0.575
 9    45        0.571
10    50        0.567

(type1_check <- sim_data %>% 
    inner_join(c2_values, by = "n") %>%
    mutate(reject = ifelse(Ybar > c2_simulated, 1, 0))%>%
    group_by(n) %>% 
    summarize(
      type1 = mean(reject),
      .groups = "drop"
    )
)

# A tibble: 10 × 2
       n type1
   <dbl> <dbl>
 1     5  0.05
 2    10  0.05
 3    15  0.05
 4    20  0.05
 5    25  0.05
 6    30  0.05
 7    35  0.05
 8    40  0.05
 9    45  0.05
10    50  0.05

ggplot(c2_values) + 
  geom_line(aes(x = n, y = c2_simulated))+
  geom_hline(yintercept = 0.5, linetype = 2)+
  labs(
    y = expression(c[2]),
    x = "n",
    title = "Simulated critical values for Test 2"
  )+ 
  theme_classic(base_size = 14)

E. Use a simulation study to compare the sizes and the power curves of the two tests as a function of $n$ for $\theta_{true} \in \{1,1.1,1.2,1.3\}.$ Use analytic rejection regions for $Y_{(n)}$ and simulated rejection regions for $\bar Y$, but simulate the rejection probabilities for both. Plot the simulated Type-I error rate/power as a function of $n$ for each $\theta_{true}$. Discuss the properties of these tests: does one seem preferable?

library(tidyverse)
library(purrrfect)

options(width = 10000)
(exp_decisions <- parameters(~n, ~true_theta,
                             seq(5, 50, by = 5),
                             c(1, 1.1, 1.2, 1.3)) %>%
  add_trials(2000) %>%
  inner_join(c2_values, by = "n") %>%
  mutate(
    
    Ysample = pmap(list(n, true_theta),
                   \(n, theta) runif(n, 0, theta)),
    
    T1 = map_dbl(Ysample, max),
    c1 = 0.95^(1/n),
    T1_decision = ifelse(T1 > c1, 1, 0),
    
    Ybar = map_dbl(Ysample, mean),
    T2_decision = ifelse(Ybar > c2_simulated, 1, 0)
  )
  )

# A tibble: 80,000 × 10
       n true_theta .trial c2_simulated Ysample      T1    c1 T1_decision  Ybar T2_decision
   <dbl>      <dbl>  <dbl>        <dbl> <list>    <dbl> <dbl>       <dbl> <dbl>       <dbl>
 1     5          1      1        0.713 <dbl [5]> 0.699 0.990           0 0.313           0
 2     5          1      2        0.713 <dbl [5]> 0.851 0.990           0 0.575           0
 3     5          1      3        0.713 <dbl [5]> 0.960 0.990           0 0.361           0
 4     5          1      4        0.713 <dbl [5]> 0.951 0.990           0 0.275           0
 5     5          1      5        0.713 <dbl [5]> 0.978 0.990           0 0.544           0
 6     5          1      6        0.713 <dbl [5]> 0.922 0.990           0 0.672           0
 7     5          1      7        0.713 <dbl [5]> 0.777 0.990           0 0.501           0
 8     5          1      8        0.713 <dbl [5]> 0.707 0.990           0 0.405           0
 9     5          1      9        0.713 <dbl [5]> 0.771 0.990           0 0.352           0
10     5          1     10        0.713 <dbl [5]> 0.788 0.990           0 0.428           0
# ℹ 79,990 more rows

simulated_prob_reject <- exp_decisions %>% 
  group_by(n, true_theta) %>%
  summarize(
    power_T1 = mean(T1_decision),
    power_T2 = mean(T2_decision),
    .groups = "drop"
  )

ggplot(data = simulated_prob_reject)+
  geom_line(aes(x = n, y = power_T1, col = 'Test using max'))+
  geom_line(aes(x = n, y = power_T2, col = 'Test using mean'))+
  scale_y_continuous(breaks = c(0.05, seq(0.2, 1, by = .2)))+
  geom_hline(yintercept = 0.05, linetype = 2) + 
  labs(
    y = 'Proportion of replications rejecting null',
    x = 'n',
    color = ''
  )+
  facet_wrap(~true_theta,
             labeller = label_bquote(theta == .(true_theta)))+
  theme_classic(base_size = 14)

The Maximum Y(n) is the most informative statistic for theta. Test 1 direclty targets the parameter which makes it more efficient than Test 2. This makes Test 1 more preferable because it has uniromly higher power whie=le maintaining the corret Type 1 error rate. Test 1 has a more better analtic rejection region then Test 2.