Power Calculations for Breast Cancer Precision Medicine Project

Version 1 (10.28.22) April Vang

We have two groups: group \(1\) (systemic population-based approach) and group \(2\) (family history approach). Let \(p_i\) be the prevalence (the number of individuals carrying the mutations divided by the total number of individuals in the group) for \(i \in \{1, 2\}\). We will conduct a hypothesis test for two proportions.

Four criteria to consider for power calculations:

\(\alpha = 0.05\) where \(\alpha\) is the Type I error rate
power = \(1-\beta = 0.8\), \(\beta = 0.2\) where \(\beta\) is the Type II error rate
We will vary sample sizes
We will vary effect sizes (Cohen’s h)

We need to make assumptions about the prevalence in group \(1\) and group \(2\). This report considers a range of prevalence values that we may see in the study.

Note that for Cohen’s h, an \(h\) near 0.2 is a small effect, an \(h\) near 0.5 is a medium effect, and an \(h\) near 0.8 is a large effect.

We calculate Cohen’s h effect size in the following way:

\(h = \varphi_1 - \varphi_2\), where \(\varphi_i = 2\sin^{-1}(\sqrt{p_i})\) is the arcsine square-root (angular) transformation of \(p_i\).
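The transformation above can be sketched in base R as follows. The function name `cohen_h` is a hypothetical helper introduced only for illustration; it is not part of the report or of the `pwr` package.

```r
# Hypothetical helper (not from the report): Cohen's h from two proportions,
# using the arcsine square-root transformation phi = 2 * asin(sqrt(p)).
cohen_h <- function(p1, p2) {
  stopifnot(p1 >= 0, p1 <= 1, p2 >= 0, p2 <= 1)  # proportions must lie in [0, 1]
  phi1 <- 2 * asin(sqrt(p1))
  phi2 <- 2 * asin(sqrt(p2))
  phi1 - phi2
}

# Illustrative values only: equal proportions give h = 0,
# and h is positive when p1 exceeds p2.
cohen_h(0.30, 0.30)
cohen_h(0.30, 0.20)
```

Because the transformation is nonlinear, the same absolute difference in proportions gives a different \(h\) depending on where the proportions sit in \([0, 1]\).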

The one-sided hypothesis test we consider is

\(H_0: h = 0\) vs. \(H_a: h > 0\). We want to test whether group \(1\) (the systemic population-based approach) detects a significantly greater proportion of individuals with the mutations than group \(2\) (the family history approach).
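For intuition, the power of this one-sided test can be sketched with the usual normal approximation, assuming \(n\) individuals in each group: power \(= \Phi(h\sqrt{n/2} - z_{1-\alpha})\). The helper `power_one_sided` below is a hypothetical name used only for illustration. We expect it to agree closely with `pwr.2p.test(..., alternative = "greater")`, but that agreement is an assumption worth checking rather than a guarantee.

```r
# Hypothetical helper (not from the report): normal-approximation power for a
# one-sided, two-sample test of proportions with n individuals per group.
power_one_sided <- function(h, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha)        # critical value under H0
  pnorm(h * sqrt(n / 2) - z_crit)   # P(reject H0 | true effect size h)
}

# Power at an illustrative effect size h = 0.2 for several per-group sizes
power_one_sided(h = 0.2, n = c(100, 150, 200, 250))
```

When \(h = 0\) the expression reduces to \(\alpha\), which is the Type I error rate, and power increases with both \(h\) and \(n\).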

The following calculation considers the same number of individuals per group.

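# Library for power calculations using test of proportions
library(pwr)

# Grid of Cohen's h effect sizes and per-group sample sizes
h = data.frame(h = seq(from = 0.08, to = 0.30, by = 0.02))
sample_size = seq(from = 100, to = 250, by = 10)

# One column of power values per sample size
# (one-sided test; pwr.2p.test uses sig.level = 0.05 by default)
for (i in seq_along(sample_size)) {
  h[, i + 1] = pwr.2p.test(h = h[, 1], n = sample_size[i], alternative = "greater")$power
}
names(h)[-1] = sample_size

# Transpose so that rows are sample sizes and columns are effect sizes
df = data.frame(t(h))
names(df) = as.character(df[1, ])
df = df[-1, ]
# Store sample size as a number so it plots on a continuous axis
df$size = as.numeric(rownames(df))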

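Next, we plot power against the per-group sample size for each effect size. The horizontal line marks the target power of 0.8.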

library(data.table)
df2  = melt(setDT(df), id.vars = "size", variable.name = "h")
df2$h = factor(df2$h)
names(df2)[3] = "power"

library(ggplot2)
ggplot(df2, aes(x = size, y = power, group = h, color = h)) + 
  geom_point() + 
  geom_line() + 
  guides(color = guide_legend(title = "Cohen's h")) +
  labs(x = "Sample Size (per group)", y = "Power") +
  geom_hline(aes(yintercept=0.8))

Example Case

Note that we may observe a small effect size in our project. Consider the case where \(p_1 = 0.20\) is the proportion of individuals with mutations in group \(1\) (systemic population-based approach) and \(p_2 = 0.15\) is the proportion of individuals with mutations in group \(2\) (family history approach). We then observe a small effect size, with Cohen’s h at 0.13. With a small effect size, we need a larger sample size.

p1 = 0.2
p2 = 0.15
ES.h(p1, p2)

## [1] 0.1318964
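To make the sample size implication concrete, the normal approximation can be solved for \(n\): \(n = 2\left((z_{1-\alpha} + z_{1-\beta})/h\right)^2\) per group. The sketch below uses base R only, and `n_per_group` is a hypothetical helper name. It assumes equal group sizes, the one-sided test, \(\alpha = 0.05\), and power \(= 0.8\) as stated earlier.

```r
# Hypothetical helper (not from the report): per-group sample size for a
# one-sided two-proportion test under the normal approximation.
n_per_group <- function(p1, p2, alpha = 0.05, power = 0.80) {
  h <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))        # Cohen's h (requires p1 > p2)
  n <- 2 * ((qnorm(1 - alpha) + qnorm(power)) / h)^2  # solve power equation for n
  ceiling(n)                                          # round up to whole individuals
}

n_per_group(p1 = 0.20, p2 = 0.15)
```

For these prevalences the result is several hundred individuals per group, well above the 100 to 250 range explored in the grid above.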

Other Considerations

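This analysis applies to individuals who have consented and for whom we have data. To allow for a 20% dropout rate, we may inflate the target sample size. A common rule of thumb is \(n \times 1.2\), although retaining \(n\) individuals after 20% dropout strictly requires enrolling \(n / (1 - 0.2) = 1.25n\).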
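A minimal sketch of the dropout adjustment, in base R. The helper `enroll_target` and the value of `n_required` are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not from the report): enrollment target per group so
# that about n_required individuals remain after a given dropout rate.
enroll_target <- function(n_required, dropout = 0.20) {
  # The small tolerance guards against floating-point error before rounding up
  ceiling(n_required / (1 - dropout) - 1e-9)
}

n_required <- 250            # illustrative per-group analysis size
enroll_target(n_required)    # inflation by n / (1 - dropout)
ceiling(n_required * 1.2)    # rule-of-thumb inflation by n * 1.2
```

The two targets differ by only a few percent, so the choice matters mainly when the study is close to the power threshold.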