H. pylori Testing Pilot

Sample Size for One Proportion, One-Sided Hypothesis

Power = 0.8

Significance Level = 0.05

\(p_0\) is the expected prevalence of 0.36

\(p_1\) is our sample’s prevalence

Testing: \(H_0: p_1=p_0=0.36\) versus \(H_1: p_1>p_0\)

Null Hypothesis: The prevalence of H. pylori in our sample, \(p_1\), equals the expected prevalence \(p_0 = 0.36\).

Alternative Hypothesis: The prevalence of H. pylori in our sample, \(p_1\), is greater than 0.36 by at least X, where X is an absolute difference in proportion (e.g. X = 0.10 means \(p_1 = 0.46\)). Use the figure below to choose X, the minimum difference to detect.
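For reference, `ES.h()` returns Cohen's effect size \(h\), and for a one-sided test at level \(\alpha\) with power \(1-\beta\) the normal approximation gives the sample size in closed form. This is a sketch of the standard formula; `pwr.p.test()` solves the same kind of power equation numerically, so its \(n\) should agree up to rounding:

\[
h = 2\arcsin\sqrt{p_1} - 2\arcsin\sqrt{p_0}, \qquad
n = \left(\frac{z_{1-\alpha} + z_{1-\beta}}{h}\right)^2
\]

For example, with \(\alpha = 0.05\) (\(z_{0.95} \approx 1.645\)), power 0.8 (\(z_{0.80} \approx 0.842\)) and X = 0.10 (\(p_1 = 0.46\)): \(h \approx 1.4907 - 1.2870 = 0.2037\) and \(n \approx (2.486 / 0.2037)^2 \approx 149\), which matches the raw sample size listed for a 0.10 difference in the table below.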

library(ggplot2)
library(pwr)  # provides pwr.p.test() and ES.h()

# One proportion sample size calculation
difference = seq(from = 0.05, to = 0.20, by = 0.025)  # candidate values of X
p0 = 0.36
raw_sample_size = numeric(length(difference))

for (i in seq_along(difference)) {
  size = pwr.p.test(
    h = ES.h(p0 + difference[i], p0),  # Cohen's h: alternative proportion (p0 + X) vs. null proportion (p0)
    sig.level = 0.05,
    power = 0.8,
    alternative = "greater"
  )
  raw_sample_size[i] = ceiling(size$n)  # round up to whole kits
}

df = data.frame(difference, raw_sample_size)

# Inflate for kits that are never returned: divide by the expected return rate
df$attrition50 = df$raw_sample_size / 0.50  # 50% of kits returned
df$attrition25 = df$raw_sample_size / 0.25  # 25% of kits returned

ggplot(df, aes(y = difference, x = raw_sample_size)) +
  geom_point() +
  geom_text(aes(label = raw_sample_size), vjust = -0.5, hjust = -0.3) +
  geom_line() +
  labs(x = "Raw Sample Size (kits that are processed)", y = "(X) Difference to Detect") +
  xlim(0, 620) +
  ylim(0.05, 0.225)

Note: this figure is not yet adjusted for attrition.
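The `raw_sample_size` values can be cross-checked without the pwr package using the closed-form normal approximation built on Cohen's h. This is a minimal sketch: `cohen_h()` and `n_one_sided()` are hypothetical helper names, and the formula assumes the same one-sided alternative, alpha = 0.05 and power = 0.8 as the loop above. Because `pwr.p.test()` finds n by numerical root-finding, an occasional difference of one kit after rounding up would not be surprising.

```r
# Cohen's h (arcsine-transformed difference in proportions); same formula as ES.h().
cohen_h <- function(p1, p0) {
  2 * asin(sqrt(p1)) - 2 * asin(sqrt(p0))
}

# Hypothetical helper: closed-form normal-approximation sample size for a
# one-sided ("greater") one-sample proportion test. Vectorized over p1.
n_one_sided <- function(p1, p0, sig.level = 0.05, power = 0.8) {
  h <- cohen_h(p1, p0)
  z_alpha <- qnorm(1 - sig.level)  # one-sided critical value
  z_beta <- qnorm(power)           # quantile matching the target power
  ceiling(((z_alpha + z_beta) / h)^2)
}

difference <- seq(from = 0.05, to = 0.20, by = 0.025)
p0 <- 0.36
closed_form_n <- n_one_sided(p0 + difference, p0)
print(data.frame(difference, closed_form_n))
```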

Attrition Adjustment

\(\text{adjusted sample size} = \frac{\text{raw sample size}}{\text{proportion of kits expected to be returned}}\)

Example interpretation: To detect a prevalence at least 0.10 higher than expected (\(p_1 = 0.46\) vs. \(p_0 = 0.36\)) when we expect 25% of all kits to be returned, the adjusted sample size is \(149 / 0.25 = 596\) kits.

df
##   difference raw_sample_size attrition50 attrition25
## 1      0.050             585        1170        2340
## 2      0.075             263         526        1052
## 3      0.100             149         298         596
## 4      0.125              96         192         384
## 5      0.150              67         134         268
## 6      0.175              50         100         200
## 7      0.200              38          76         152
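The attrition columns above cover only 50% and 25% return rates. A small helper (the name `adjust_for_return()` is hypothetical) generalizes the adjustment to any expected return rate and rounds up to whole kits, since rates such as 0.30 give non-integer results. This sketch assumes the raw sample size of 149 for X = 0.10 from the table above; the extra return rates are illustrative, not planning figures.

```r
# Hypothetical helper: number of kits to send out so that, at the expected
# return rate, about raw_n kits come back. Rounds up to whole kits.
adjust_for_return <- function(raw_n, return_rate) {
  stopifnot(all(return_rate > 0 & return_rate <= 1))
  ceiling(raw_n / return_rate)
}

raw_n <- 149  # raw sample size for a 0.10 difference (table above)
return_rates <- c(0.25, 0.30, 0.40, 0.50)  # illustrative return rates
mailed <- adjust_for_return(raw_n, return_rates)
print(data.frame(return_rate = return_rates, kits_to_send = mailed))
```

With a 25% return rate this gives the 596 kits from the example interpretation.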

Universal Testing Sample Size

\(P(\text{A or B or C})=P(A)+P(B)+P(C)-P(\text{A and B}) -P(\text{A and C}) -P(\text{B and C}) + P(\text{A and B and C})\)

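Note that I only have information on the first three terms, \(P(A)\), \(P(B)\) and \(P(C)\) (from a past meeting):

\(\frac{1}{416.5}+\frac{1}{279}+\frac{1}{250}=0.009985\), which is reasonably close to the \(0.013\) prevalence Luis suggested using. Because the intersection terms are left out, this sum is an upper bound on \(P(\text{A or B or C})\) (Boole's inequality).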
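To get a feel for how much the missing intersection terms could matter, the sketch below evaluates the full inclusion-exclusion formula under an explicit ASSUMPTION that is not in the meeting notes: that A, B and C are independent, so every intersection probability is a product of the individual ones. Real components may well be dependent; the point is only to compare the three-term sum with a fully evaluated union.

```r
# Individual probabilities of the three components quoted above.
p <- c(A = 1 / 416.5, B = 1 / 279, C = 1 / 250)

# Sum of the three known terms; an upper bound on P(A or B or C) regardless
# of how the events overlap (Boole's inequality).
first_order <- sum(p)

# ASSUMPTION: independence, so P(A and B) = P(A) * P(B), etc.
pairwise <- p[["A"]] * p[["B"]] + p[["A"]] * p[["C"]] + p[["B"]] * p[["C"]]
triple <- prod(p)
union_indep <- first_order - pairwise + triple

# Under independence, inclusion-exclusion equals 1 - P(none of A, B, C).
stopifnot(isTRUE(all.equal(union_indep, 1 - prod(1 - p))))

cat(sprintf("sum of known terms:   %.6f\n", first_order))
cat(sprintf("union if independent: %.6f\n", union_indep))
```

Because all three probabilities are small, the pairwise and triple terms are tiny under this assumption, so the three-term sum is close to the fully evaluated union.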

library(pwr)  # provides pwr.p.test() and ES.h()

difference = seq(from = 0.002, to = 0.018, by = 0.001)  # candidate values of X
p0 = 0.013
raw_sample_size = numeric(length(difference))

for (i in seq_along(difference)) {
  size = pwr.p.test(
    h = ES.h(p0 + difference[i], p0),  # Cohen's h: p0 + X vs. the null proportion p0
    sig.level = 0.05,
    power = 0.8,
    alternative = "two.sided"
  )
  raw_sample_size[i] = ceiling(size$n)  # round up to whole kits
}

df = data.frame(difference, raw_sample_size)
df$label = paste0("D=", df$difference, ", ", "n=", df$raw_sample_size)

df$attrition50 = df$raw_sample_size / 0.50  # 50% of kits returned
df$attrition25 = df$raw_sample_size / 0.25  # 25% of kits returned

ggplot(df, aes(y = difference, x = raw_sample_size)) +
  geom_point() +
  geom_text(aes(label = label), vjust = -0.5, hjust = -0.1, size = 3.5) +
  geom_line() +
  labs(x = "Raw Sample Size", y = "(X) Difference in Prevalence to Detect\n(+/- away from 0.013)") +
  xlim(450, 32000)

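Note: this figure is not yet adjusted for attrition; the same adjustment as in the Attrition Adjustment section (the attrition50 and attrition25 columns) applies.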
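The y-axis label reads "+/- away from 0.013", but the loop above only evaluates \(p_0 + X\). Because the arcsine transform in Cohen's h stretches proportions near 0, a downward shift of the same size gives a larger \(|h|\) and therefore needs no more kits; and for \(X \ge 0.013\) there is no downward alternative at all, since a prevalence cannot be negative. The sketch below checks this with a closed-form two-sided normal approximation. `n_two_sided()` is a hypothetical helper that ignores the tiny rejection probability in the opposite tail, so it may differ from `pwr.p.test()` by about one kit; the values of D are illustrative.

```r
# Hypothetical helper: closed-form two-sided sample size with Cohen's h.
# Ignores the opposite-tail rejection probability (negligible here).
n_two_sided <- function(p1, p0, sig.level = 0.05, power = 0.8) {
  h <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p0))
  ceiling(((qnorm(1 - sig.level / 2) + qnorm(power)) / abs(h))^2)
}

p0 <- 0.013
D <- c(0.002, 0.005, 0.010, 0.012, 0.015)  # illustrative differences

n_above <- n_two_sided(p0 + D, p0)
# The "-" side only exists when p0 - D > 0; pmax() avoids sqrt() of a negative.
n_below <- ifelse(p0 - D > 0, n_two_sided(pmax(p0 - D, 0), p0), NA)
print(data.frame(D, n_above, n_below))
```

If this holds, the sample sizes in the figure, computed for \(p_0 + X\), are conservative for a downward difference of the same size whenever that difference is possible.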