10.2 - power curves and simulations

Problem 1

Suppose \(Y_1,...,Y_n\) are an i.i.d. \(UNIF(0,\theta)\) sample. Consider two different tests of \(H_0: \theta = 1\) versus \(H_a: \theta > 1\).

Test 1 rejects \(H_0\) if \(Y_{(n)}> c_1\).
Test 2 rejects \(H_0\) if \(\bar Y> c_2\).

We want both tests to have 5% type-I error rate.

A. Show that \(c_1\) can be found analytically to be \(0.95^{1/n}\).

Find a general expression for the power function of Test 1 as a function of \(n\) and \(\theta_a\). Plot curve as a function of \(n\) for \(\theta = 1.2.\)

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(purrrfect)


Attaching package: 'purrrfect'

The following objects are masked from 'package:base':

    replicate, tabulate

library(dplyr)
ggplot() + 
  geom_function(fun = \(n) 1 - (0.95 / (1.2^n))) + 
  labs(x = 'Sample Size', y = 'Power') + 
  xlim(c(1,20))

C. Explain why an exact value of \(c_2\) cannot be found analytically and must be simulated.

An exact value of c2 can’t be found because it doesn’t have closed/simple form like c1 where you could find a form that can be computed analytically and therefore it must be simulated to find it.

D. Simulate values of \(c_2\) that yield 5% type-I error rates for Test 2, for \(n \in \{5,10,...,50\}\).

(null_test_2 <- parameters(~n,
                               seq(5,50, by =5)
                               ) %>%
  add_trials(10000) %>%
  mutate(Ysample = map(n, .f = \(n) runif(n, 0, 1)),
         T = map_dbl(Ysample, mean)
  )
)

# A tibble: 100,000 × 4
       n .trial Ysample       T
   <dbl>  <dbl> <list>    <dbl>
 1     5      1 <dbl [5]> 0.495
 2     5      2 <dbl [5]> 0.654
 3     5      3 <dbl [5]> 0.240
 4     5      4 <dbl [5]> 0.447
 5     5      5 <dbl [5]> 0.441
 6     5      6 <dbl [5]> 0.344
 7     5      7 <dbl [5]> 0.505
 8     5      8 <dbl [5]> 0.682
 9     5      9 <dbl [5]> 0.420
10     5     10 <dbl [5]> 0.653
# ℹ 99,990 more rows

ggplot(null_test_2) + 
  geom_histogram(aes(x = T)) + 
  facet_wrap(~n)

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

(crit_values <- null_test_2 %>%
  summarize(c2 = quantile(T, 0.95), .by = n)
)

# A tibble: 10 × 2
       n    c2
   <dbl> <dbl>
 1     5 0.712
 2    10 0.649
 3    15 0.621
 4    20 0.605
 5    25 0.595
 6    30 0.585
 7    35 0.579
 8    40 0.573
 9    45 0.571
10    50 0.568

E. Use a simulation study to compare the sizes and the power curves of the two tests as a function of \(n\) for \(\theta_{true} \in \{1,1.1,1.2,1.3\}.\) Use analytic rejection regions for \(Y_{(n)}\) and simulated rejection regions for \(\bar Y\), but simulate the rejection probabilities for both. Plot the simulated Type-I error rate/power as a function of \(n\) for each \(\theta_{true}\). Discuss the properties of these tests: does one seem preferable?

(exp_crit_vals <- parameters(~n ,
            seq(5,50,by=5)
            )
  %>% add_trials(1000)
  %>% mutate(Ysample = map(n,.f = \(n) runif(n, 0, 1)),
             T1 = map_dbl(Ysample, max),
             T2 = map_dbl(Ysample, mean)
             )
  %>% summarize(c1_simulated = quantile(T1, 0.95),   
                c2_simulated = quantile(T2, 0.95), 
                .by = n
                )
  %>% mutate(c1_analytic = 0.95^(1/n))
)  %>% head

# A tibble: 6 × 4
      n c1_simulated c2_simulated c1_analytic
  <dbl>        <dbl>        <dbl>       <dbl>
1     5        0.991        0.702       0.990
2    10        0.995        0.649       0.995
3    15        0.997        0.620       0.997
4    20        0.997        0.602       0.997
5    25        0.998        0.592       0.998
6    30        0.998        0.590       0.998

options(width = 10000)

(exp_decisions <- parameters(~n , ~true_theta, 
            seq(5,50,by=5), c(1,1.1,1.2,1.3)
            )
  %>% add_trials(1000)
  %>% inner_join(., exp_crit_vals, by ='n')
  %>% mutate(Ysample = pmap(list(n,true_theta),.f = \(n,theta) runif(n, 0, theta)),
             T1 = map_dbl(Ysample, max),
             T2 = map_dbl(Ysample, mean),
             T1_decision = ifelse(T1 > c1_analytic, 1,0),
             T2_decision = ifelse(T2 > c2_simulated, 1,0)
             )
)  %>% head()

# A tibble: 6 × 11
      n true_theta .trial c1_simulated c2_simulated c1_analytic Ysample      T1    T2 T1_decision T2_decision
  <dbl>      <dbl>  <dbl>        <dbl>        <dbl>       <dbl> <list>    <dbl> <dbl>       <dbl>       <dbl>
1     5          1      1        0.991        0.702       0.990 <dbl [5]> 0.777 0.327           0           0
2     5          1      2        0.991        0.702       0.990 <dbl [5]> 0.911 0.506           0           0
3     5          1      3        0.991        0.702       0.990 <dbl [5]> 0.965 0.609           0           0
4     5          1      4        0.991        0.702       0.990 <dbl [5]> 0.946 0.765           0           1
5     5          1      5        0.991        0.702       0.990 <dbl [5]> 0.439 0.223           0           0
6     5          1      6        0.991        0.702       0.990 <dbl [5]> 0.924 0.535           0           0

(simulated_prob_reject <- exp_decisions
 %>% summarize(power_T1 = mean(T1_decision),
               power_T2 = mean(T2_decision),
               .by=c(n,true_theta)
               )
) %>% head

# A tibble: 6 × 4
      n true_theta power_T1 power_T2
  <dbl>      <dbl>    <dbl>    <dbl>
1     5        1     0.0450   0.06  
2     5        1.1   0.425    0.176 
3     5        1.2   0.632    0.263 
4     5        1.3   0.748    0.381 
5    10        1     0.0460   0.0500
6    10        1.1   0.636    0.180

ggplot(data = simulated_prob_reject) + 
  geom_line(aes(x=n, y = power_T1, col='Test 1')) + 
  geom_line(aes(x=n, y = power_T2, col='Test 2')) + 
  scale_y_continuous(breaks = c(0.05, seq(0.2,1,by=.2)))+
  geom_hline(aes(yintercept = 0.05), linetype=2)+ 
  labs(y='Rejection Rate', 
       x='n',
       color='')+
  facet_wrap(~true_theta, labeller = label_bquote(theta == .(true_theta)))+
  theme_classic(base_size=14)

We can see using the max that at theta equal to 1 our null we have around a 5% no matter the sample size which is what we would expect from our null theta to be around a 5% rejection rate we can see though as our theta grows through 1.1, 1.2, and 1.3 that our test 1 has a higher rejection rate than our test 2 and that our test 2 is a more effective test for a lower rejection rate than test 1. If we were to use one of these tests over the others I would use the test 2 for its lower rejection rate at every value of n.