Power = 0.8
Significance Level = 0.05
\(p_0\) is the expected prevalence of 0.36
\(p_1\) is our sample’s prevalence
Testing: \(H_0: p_1=p_0=0.36\) versus \(H_1: p_1>p_0\)
Null Hypothesis: The prevalence of H. pylori in our sample \(p_1\) is equal to the expected prevalence \(p_0 = 0.36\).
Alternative Hypothesis: The prevalence of H. pylori in our sample \(p_1\) is greater than 0.36 by at least X, expressed as a difference in proportions. See the figure below to choose X, the minimum difference to detect.
library(ggplot2)
library(pwr) # provides pwr.p.test() and ES.h()
# One proportion sample size calculation
difference = seq(from = 0.05, to = 0.20, by = 0.025)
p0 = 0.36
raw_sample_size = numeric(length(difference))
for (i in seq_along(difference)) {
  size = pwr.p.test(
    h = ES.h(p0 + difference[i], p0), # Effect size (h) from the alternative proportion (p1 = p0 + difference) and the null proportion (p0)
    sig.level = 0.05,
    power = 0.8,
    alternative = "greater"
  )
  raw_sample_size[i] = ceiling(size$n)
}
df = data.frame(difference, raw_sample_size)
# Adjusted sample sizes: raw size divided by the proportion of kits expected to be returned
df$attrition50 = df$raw_sample_size / 0.5  # 50% of kits returned
df$attrition25 = df$raw_sample_size / 0.25 # 25% of kits returned
ggplot(df, aes(y = difference, x = raw_sample_size)) +
  geom_point() +
  geom_text(aes(label = raw_sample_size), vjust = -0.5, hjust = -0.3) +
  geom_line() +
  labs(x = "Raw Sample Size (kits that are processed)", y = "(X) Difference to Detect") +
  xlim(0, 620) +
  ylim(0.05, 0.225)
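Note: this figure is not yet adjusted for attrition. To adjust, divide the raw sample size by the proportion of kits expected to be returned:
\(\text{adjusted sample size} = \frac{\text{raw sample size}}{\text{proportion of kits expected to be returned}}\)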
Example interpretation: We want to detect a prevalence at least 10 percentage points above 0.36 (X = 0.10), and we expect 25% of all kits to be returned. The raw sample size is 149, so the adjusted sample size is 149 / 0.25 = 596.
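The adjustment in the example above can be sketched as a small helper. The function name `adjust_for_return` is hypothetical and not part of the original analysis; it only illustrates dividing the raw sample size by the expected return proportion.

```r
# Hypothetical helper (an illustration, not the original analysis code):
# inflate a raw sample size by the proportion of kits expected to be returned.
adjust_for_return <- function(raw_n, return_rate) {
  stopifnot(return_rate > 0, return_rate <= 1)
  ceiling(raw_n / return_rate)
}

n_25 <- adjust_for_return(149, 0.25) # 25% of kits returned
n_50 <- adjust_for_return(149, 0.50) # 50% of kits returned
print(c(n_25, n_50))
```

With a raw sample size of 149 this gives 596 and 298 kits, matching the attrition25 and attrition50 columns for the 0.100 row.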
df
## difference raw_sample_size attrition50 attrition25
## 1 0.050 585 1170 2340
## 2 0.075 263 526 1052
## 3 0.100 149 298 596
## 4 0.125 96 192 384
## 5 0.150 67 134 268
## 6 0.175 50 100 200
## 7 0.200 38 76 152
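As a rough cross-check on the table above, the raw sample size for one row can be approximated in base R without the pwr package. This is a sketch that assumes the arcsine effect size \(h = 2\arcsin\sqrt{p_1} - 2\arcsin\sqrt{p_0}\) and the one-sided normal approximation \(n = \left((z_{1-\alpha} + z_{\text{power}})/h\right)^2\); the variable names are illustrative.

```r
# Closed-form normal approximation for a one-sided, one-proportion test.
# Assumption: effect size is the arcsine-transformed difference (Cohen's h).
p0 <- 0.36
p1 <- 0.46 # p0 + 0.10, the X = 0.10 row
alpha <- 0.05
power <- 0.80

h <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p0))
n <- ((qnorm(1 - alpha) + qnorm(power)) / h)^2
print(ceiling(n))
```

For the 0.100 row this should agree with the tabulated raw sample size of 149, up to the numerical tolerance of the solver used by the power routine.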
\(P(\text{A or B or C})=P(A)+P(B)+P(C)-P(\text{A and B}) -P(\text{A and C}) -P(\text{B and C}) + P(\text{A and B and C})\)
Note that I only have information on the first three terms (from a past meeting):
\(\frac{1}{416.5}+\frac{1}{279}+\frac{1}{250}=0.009985\), which is also close to the \(0.013\) prevalence Luis suggested using.
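The arithmetic above can be checked in a few lines of R. The sum of the three single-event terms is an upper bound on \(P(\text{A or B or C})\). The second quantity below is purely illustrative: it assumes the three conditions are independent, which the source does not state, to show how small the omitted overlap terms would be in that case.

```r
# Component prevalences quoted above (1 in 416.5, 1 in 279, 1 in 250).
p <- c(1 / 416.5, 1 / 279, 1 / 250)

# Sum of the single-event terms only: an upper bound on P(A or B or C).
upper_bound <- sum(p)

# ASSUMPTION (not from the source): if the three conditions were independent,
# the union probability would be 1 - prod(1 - p).
union_if_independent <- 1 - prod(1 - p)

print(round(c(upper_bound, union_if_independent), 6))
```

Under that independence assumption the two values differ only in the fifth decimal place, so dropping the overlap terms changes little at these prevalences.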
difference = seq(from = 0.002, to = 0.018, by = 0.001)
p0 = 0.013
raw_sample_size = numeric(length(difference))
for (i in seq_along(difference)) {
  size = pwr.p.test(
    h = ES.h(p0 + difference[i], p0), # Effect size (h) from the alternative proportion (p1 = p0 + difference) and the null proportion (p0)
    sig.level = 0.05,
    power = 0.8,
    alternative = "two.sided"
  )
  raw_sample_size[i] = ceiling(size$n)
}
df = data.frame(difference, raw_sample_size)
df$label = paste0("D=", df$difference, ", ", "n=", df$raw_sample_size)
df$attrition50 = df$raw_sample_size / 0.5
df$attrition25 = df$raw_sample_size / 0.25
ggplot(df, aes(y = difference, x = raw_sample_size)) +
  geom_point() +
  geom_text(aes(label = label), vjust = -0.5, hjust = -0.1, size = 3.5) +
  geom_line() +
  labs(x = "Raw Sample Size", y = "(X) Difference in Prevalence to Detect\n(+/- away from 0.013)") +
  xlim(450, 32000)
Note: Consider further adjustment for attrition.