E. Plot the analytic power function as a function of \(n\) if \(d = 1\).
(ggplot() +
  geom_function(fun = \(n) 1 - pnorm(1.645 - sqrt(n) * 1)) +
  labs(x = 'Sample size', y = 'Power') +
  xlim(c(1, 20)) +
  theme_classic())
F. Although not necessary, use a simulation study to approximate the values of \(c\) for each \(n\in \{5,10,...,50\}\). Compare the simulated values of \(c\) to their analytic counterparts.
## Simulate distribution of T under H0: mu = 3, for various n
mu0 <- 3
(null_test_stats <- parameters(~n, seq(5, 50, by = 5)) %>%
  add_trials(10000) %>%
  mutate(Ysample = map(n, .f = \(n) rnorm(n, mean = mu0, sd = 6)),
         T = pmap_dbl(list(n, Ysample), \(n, y) (mean(y) - mu0) / (6 / sqrt(n))))) %>%
  head()
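The comparison asked for in part F can also be sketched in base R, without the simulation helpers used above. This is a minimal illustration, not the original solution: `simulate_crit()` and `crit_compare` are hypothetical names, the seed is arbitrary, and \(\alpha = 0.05\) is assumed from the 1.645 cutoff used earlier.

```r
# Sketch: simulated vs. analytic critical values for
# T = (Ybar - mu0) / (sigma / sqrt(n)) under H0.
# simulate_crit() is a hypothetical helper, not part of the original solution.
set.seed(1)  # arbitrary seed, for reproducibility only

mu0   <- 3     # null mean (from the problem)
sigma <- 6     # known standard deviation (from the problem)
alpha <- 0.05  # assumed type-I error rate

simulate_crit <- function(n, trials = 10000) {
  # Draw `trials` samples of size n under H0 and compute T for each one
  t_stats <- replicate(trials, {
    y <- rnorm(n, mean = mu0, sd = sigma)
    (mean(y) - mu0) / (sigma / sqrt(n))
  })
  # Simulated critical value: empirical (1 - alpha) quantile of T
  unname(quantile(t_stats, 1 - alpha))
}

ns <- seq(5, 50, by = 5)
crit_compare <- data.frame(
  n           = ns,
  c_simulated = sapply(ns, simulate_crit),
  c_analytic  = qnorm(1 - alpha)  # the same normal quantile for every n
)
print(crit_compare)
```

Because \(T\) is exactly standard normal under \(H_0\), the simulated values should scatter around the single analytic value, with the scatter shrinking as the number of trials grows.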
G. Simulate the rejection rates of this test using the analytic rejection regions for \(\mu\in\{3,3.5,4,4.5\}\). Plot the rejection rates as a function of \(n\) faceted by the effect size \(d\).
(variety_test_stats <- parameters(~n, ~mu_true,
                                  seq(5, 50, by = 5),
                                  c(3, 3.5, 4, 4.5)) %>%
  add_trials(10000) %>%
  mutate(Ysample = pmap(list(mu_true, n), .f = \(m, n) rnorm(n, mean = m, sd = 6)),
         T = pmap_dbl(list(n, Ysample), \(n, y) (mean(y) - 3) / (6 / sqrt(n)))) %>%
  inner_join(., crit_values, by = 'n') %>%
  mutate(reject = ifelse(T > c, 1, 0))) %>%
  head()
(ggplot(data = rejection_rates) +
  geom_line(aes(x = n, y = prob_reject)) +
  geom_hline(aes(yintercept = 0.05), linetype = 2) +
  facet_grid(~mu_true, labeller = label_both) +
  theme_classic())
Problem 1
Suppose \(Y_1,...,Y_n\) are an i.i.d. \(UNIF(0,\theta)\) sample. Consider two different tests of \(H_0: \theta = 1\) versus \(H_a: \theta > 1\).
Test 1 rejects \(H_0\) if \(Y_{(n)}> c_1\).
Test 2 rejects \(H_0\) if \(\bar Y> c_2\).
We want both tests to have a 5% type-I error rate.
A. Show that \(c_1\) can be found analytically to be \(0.95^{1/n}\).
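One way to see this: under \(H_0\), each \(Y_i\) has CDF \(P(Y_i \le y) = y\) on \([0,1]\), and the \(Y_i\) are independent, so the maximum exceeds \(c_1\) unless every observation falls at or below \(c_1\):

\[
P(Y_{(n)} > c_1 \mid \theta = 1) = 1 - P(Y_1 \le c_1, \dots, Y_n \le c_1) = 1 - c_1^{\,n} = 0.05
\quad\Longrightarrow\quad c_1 = 0.95^{1/n}.
\]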
B. Find a general expression for the power function of Test 1 as a function of \(n\) and \(\theta_a\). Plot the curve as a function of \(n\) for \(\theta_a = 1.2\).
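A sketch of the derivation, assuming \(\theta_a \ge 1\) so that \(c_1 = 0.95^{1/n} < \theta_a\): when the true parameter is \(\theta_a\), each \(Y_i\) has CDF \(y/\theta_a\) on \([0,\theta_a]\), so

\[
\text{Power}(n, \theta_a) = P(Y_{(n)} > c_1 \mid \theta_a) = 1 - \left(\frac{c_1}{\theta_a}\right)^{n} = 1 - \frac{0.95}{\theta_a^{\,n}}.
\]

Setting \(\theta_a = 1\) recovers the 5% type-I error rate. For \(\theta_a = 1.2\) the power is \(1 - 0.95/1.2^{n}\), which is about 0.62 at \(n = 5\) and about 0.85 at \(n = 10\).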
C. Explain why an exact value of \(c_2\) cannot be found analytically and must be simulated.
To find \(c_2\) analytically, we need the 95th percentile of the sampling distribution of \(\bar Y\) under \(H_0\), which is a scaled sum of uniform random variables. The CDF of such a sum (the Irwin-Hall distribution) is a piecewise polynomial with no convenient closed-form inverse for general \(n\), so we cannot solve for the exact value of \(c_2\) analytically and instead approximate it by simulation.
D. Simulate values of \(c_2\) that yield 5% type-I error rates for Test 2, for \(n \in \{5,10,...,50\}\).
## Simulate distribution of T under H0: theta = 1, for various n
theta <- 1
(null_test_stats <- parameters(~n, seq(5, 50, by = 5)) %>%
  add_trials(10000) %>%
  mutate(Ysample = map(n, .f = \(n) runif(n, 0, theta)),
         T = map_dbl(Ysample, mean)))
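The step from the simulated null distribution to the critical values themselves can be sketched in base R as follows. This is an illustration under stated assumptions rather than the original solution: `simulate_c2()` and `c2_table` are hypothetical names, the seed is arbitrary, and \(c_2\) is taken as the empirical 95th percentile of \(\bar Y\) under \(\theta = 1\).

```r
# Sketch: simulated critical values c2 for Test 2 (reject H0 if Ybar > c2).
# simulate_c2() is a hypothetical helper, not part of the original solution.
set.seed(1)  # arbitrary seed, for reproducibility only

alpha <- 0.05  # target type-I error rate

simulate_c2 <- function(n, trials = 10000) {
  # Sample means of `trials` UNIF(0, 1) samples of size n (the null distribution)
  ybar <- replicate(trials, mean(runif(n, min = 0, max = 1)))
  # c2 is the empirical (1 - alpha) quantile of Ybar under H0
  unname(quantile(ybar, 1 - alpha))
}

ns <- seq(5, 50, by = 5)
c2_table <- data.frame(
  n  = ns,
  c2 = sapply(ns, simulate_c2),  # simulated cutoff for Test 2
  c1 = 0.95^(1 / ns)             # analytic cutoff for Test 1, for reference
)
print(c2_table)
```

The simulated \(c_2\) values should decrease toward 0.5 as \(n\) grows, since \(\bar Y\) concentrates around its null mean of 0.5, while \(c_1\) increases toward 1.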
E. Use a simulation study to compare the sizes and the power curves of the two tests as a function of \(n\) for \(\theta_{true} \in \{1,1.1,1.2,1.3\}.\) Use analytic rejection regions for \(Y_{(n)}\) and simulated rejection regions for \(\bar Y\), but simulate the rejection probabilities for both. Plot the simulated Type-I error rate/power as a function of \(n\) for each \(\theta_{true}\). Discuss the properties of these tests: does one seem preferable?
(ggplot(data = simulated_prob_reject) +
  geom_line(aes(x = n, y = power_T1, col = 'Test using maximum')) +
  geom_line(aes(x = n, y = power_T2, col = 'Test using mean')) +
  scale_y_continuous(breaks = c(0.05, seq(0.2, 1, by = 0.2))) +
  geom_hline(aes(yintercept = 0.05), linetype = 2) +
  labs(y = 'Proportion of replications rejecting null',
       x = 'n',
       color = '') +
  facet_wrap(~true_theta, labeller = label_bquote(theta == .(true_theta))) +
  theme_classic())
When \(\theta = 1\) (the null value), both tests reject the null in about 5% of replications at every \(n\), as expected for tests of size 0.05. For every alternative value of \(\theta\), the power of both tests increases with \(n\), and the further \(\theta\) is from the null value, the faster the power approaches 1. At every alternative \(\theta\) and every \(n\), Test 1 (based on the maximum) has higher power than Test 2 (based on the mean), so Test 1 is preferable.