Power Analysis and Sample Size Estimation

Sources

Triola, Mario F., and Laura Iossi. Elementary Statistics. Pearson, 2018.
Rosner, Bernard. Fundamentals of Biostatistics. Cengage Learning, 2017.
Kabacoff, Robert I. R in Action. Manning, 2015.

Learning Objectives

Understand the definitions of power, sample size, effect size, and significance level, and how they relate to one another
Learn how to estimate power and sample size for selected statistical tests

Load Libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openxlsx)
library(pwr)
## Warning: package 'pwr' was built under R version 4.5.2

Errors in Hypothesis Testing

Even when we apply the proper statistical procedures, we can still reach the wrong conclusion when we reject or fail to reject the null hypothesis. There are two types of error, called Type I and Type II errors:

Type I error: The mistake of rejecting the null hypothesis when it is actually true. The probability of a Type I error is \(\alpha\).
Type II error: The mistake of failing to reject the null hypothesis when it is false. The probability of a Type II error is represented by the symbol \(\beta\) and is equal to 1 - power.
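A small simulation can make the Type I error rate concrete. The sketch below is not part of the original examples: it assumes two groups drawn from the same normal distribution (so the null hypothesis is true by construction), and the group size of 20, the 2,000 replications, and the seed are arbitrary choices.

```r
# Simulate the Type I error rate of a two-sample t-test when H0 is true.
set.seed(123)          # arbitrary seed, for reproducibility only
alpha  <- 0.05
n_sims <- 2000         # number of simulated studies (assumed)
n      <- 20           # subjects per group (assumed)

p_values <- replicate(n_sims, {
  x <- rnorm(n, mean = 0, sd = 1)  # group 1
  y <- rnorm(n, mean = 0, sd = 1)  # group 2, same distribution as group 1
  t.test(x, y)$p.value
})

# Share of simulated studies that wrongly reject H0
type1_rate <- mean(p_values < alpha)
type1_rate
```

Because every rejection here is a false positive, the proportion of rejections should land close to \(\alpha\) = 0.05, with some simulation noise.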

Power and Sample Size

When designing a new study, researchers pay special attention to the power, effect size, sample size, and significance level. These quantities are interconnected: given any three, you can calculate the fourth. Failing to account for them in the study design can lead to wrong conclusions or to a study that cannot detect the effect it was built to find.
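As a minimal sketch of this "any three give the fourth" relationship, base R's power.t.test() solves for whichever quantity is left unspecified. The numbers below (a difference of 5 units, a standard deviation of 10, 64 subjects per group) are hypothetical and chosen only for illustration.

```r
# stats::power.t.test() solves for the one quantity that is not supplied.
# All inputs are hypothetical: difference of 5 units, SD of 10.

# Unknown: power
res_power <- power.t.test(n = 64, delta = 5, sd = 10, sig.level = 0.05)
res_power$power

# Unknown: sample size per group
res_n <- power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.80)
res_n$n

# Unknown: smallest difference detectable with 80% power
res_delta <- power.t.test(n = 64, sd = 10, sig.level = 0.05, power = 0.80)
res_delta$delta
```

The three calls describe the same design from different angles, so their answers should agree with each other up to rounding.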

Power is the probability that the test correctly rejects the null hypothesis when the alternative hypothesis is true. The power of a test is \(1-\beta\).

Image source: https://en.wikipedia.org/wiki/File:Statistical_test,_significance_level,_power.png

Effect size is the magnitude of the difference between groups. Determining the effect size answers the question “what is the magnitude of the difference or relationship between two variables?” instead of relying on the p-value alone. A major advantage of effect size is that it allows us to compare the findings of different studies that investigate the same outcome. The formula for effect size depends on the statistical method used in the hypothesis test.

Alpha (\(\alpha\)), the significance level, is the probability of rejecting the null hypothesis when it is actually true.
Sample size refers to the number of observations in each experimental group.

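To calculate power and sample size, we are going to use the pwr package, which covers several statistical tests. Its main functions are summarized below.

pwr.t.test: t-tests (one-sample, two-sample, paired)
pwr.t2n.test: two-sample t-test with unequal group sizes
pwr.2p.test: two proportions, equal group sizes
pwr.2p2n.test: two proportions, unequal group sizes
pwr.p.test: one proportion
pwr.anova.test: balanced one-way ANOVA
pwr.chisq.test: chi-squared test
pwr.r.test: correlation
pwr.f2.test: general linear model
pwr.norm.test: mean of a normal distribution with known variance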

T-test of two independent samples

Example 1: You are tasked with conducting a study to compare the mean height of adult males and females in the United States. In a previous survey, the heights of adult men in the United States were approximately normally distributed with a mean of 70 inches and a standard deviation of 3 inches. The heights of adult women were approximately normally distributed with a mean of 64.5 inches and a standard deviation of 2.5 inches. Using a significance level of 5%, calculate the sample size needed to achieve a power of 80%.

\({H_0}: \mu_{males} = \mu_{females}\)

\({H_1}: \mu_{males} \neq\ \mu_{females}\)

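The power can be approximated by the following formula, where \(\Phi\) is the standard normal cumulative distribution function:

\[Power= \large{\Phi\left(-z_{1-\alpha/2}+\frac{|\mu_1-\mu_2|}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}}\right)}\]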

The formula to calculate the sample size per group (n) is:

\[\large{n=\frac{2\sigma^2_{pooled}(z_{1-\alpha/2}+z_{1-\beta})^2}{(\mu_1-\mu_2)^2}=\frac{2(z_{1-\alpha/2}+z_{1-\beta})^2}{d^2}}\]

Where \(\sigma_{pooled}\) is the pooled standard deviation, \(\sigma_{pooled}=\sqrt{\frac{\sigma_1^2+\sigma_2^2}{2}}\), and Cohen’s d (\(d\)) is \(d=\frac{\mu_1-\mu_2}{\sigma_{pooled}}\).

Luckily, the pwr package will do the math for us as long as we provide Cohen’s d. We start by calculating the pooled standard deviation.

s1 <- 3
s2 <-  2.5
sd.pooled  <-  sqrt((s1^2+s2^2)/2)

Next, we calculate Cohen’s d.

u1 <- 70
u2 <- 64.5

cohen.d <- (u1-u2)/sd.pooled
cohen.d

## [1] 1.991786

We calculate the per-group sample size (n) using the pwr package. Keep in mind that n should always be rounded up.

pwr::pwr.t.test(d = cohen.d, sig.level = 0.05, power = 0.8, type = "two.sample")

## 
##      Two-sample t test power calculation 
## 
##               n = 5.121023
##               d = 1.991786
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

Interpretation: We need to enroll at least six males and six females in our study to achieve a power of 80%.
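As a rough cross-check, the closed-form sample size formula given earlier in this section can be evaluated by hand in base R. This is a sketch of the normal approximation, not what pwr.t.test() computes internally; the variable names are illustrative.

```r
# Normal-approximation sample size for Example 1:
# n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2
alpha <- 0.05
power <- 0.80
d     <- (70 - 64.5) / sqrt((3^2 + 2.5^2) / 2)  # Cohen's d from Example 1

z_alpha <- qnorm(1 - alpha / 2)  # critical value, about 1.96
z_beta  <- qnorm(power)          # z_{1-beta}, about 0.84

n_approx <- 2 * (z_alpha + z_beta)^2 / d^2
n_approx            # roughly 3.96
ceiling(n_approx)   # rounds up to 4
```

The approximation suggests about four subjects per group, fewer than the six implied by the pwr output (n = 5.12). The gap arises because the formula uses normal quantiles, while the t-test with so few degrees of freedom needs a wider critical value; for very small samples the t-based result is the safer one.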

Using the same example, what would be the power of your study if you were only able to recruit four subjects in each group?

pwr::pwr.t.test(n = 4, d = cohen.d, sig.level = 0.05, type = "two.sample")

## 
##      Two-sample t test power calculation 
## 
##               n = 4
##               d = 1.991786
##       sig.level = 0.05
##           power = 0.6533829
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

Interpretation: If you enroll four subjects in each group, the power of the study will be 65%.
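To see how quickly power changes with enrollment in this example, the calculation can be repeated over a range of group sizes. The sketch below uses base R's power.t.test() rather than pwr, with strict = TRUE so that both tails of the two-sided test are counted; the range of 2 to 8 subjects per group is an arbitrary choice.

```r
# Power of the two-sample t-test in Example 1 for several group sizes.
sd_pooled <- sqrt((3^2 + 2.5^2) / 2)   # pooled SD from Example 1
n_values  <- 2:8                       # subjects per group (assumed range)

power_by_n <- sapply(n_values, function(n) {
  power.t.test(n = n, delta = 70 - 64.5, sd = sd_pooled,
               sig.level = 0.05, strict = TRUE)$power
})

power_table <- data.frame(n_per_group = n_values,
                          power = round(power_by_n, 3))
power_table
```

The row for four subjects per group should reproduce the 65% reported above, and the table crosses 80% power between five and six subjects per group, which matches the earlier sample size result.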

Example 2: You are interested in studying the effect of oral contraceptives (OC) on blood pressure in nonpregnant, premenopausal women aged 35 to 39 years. In a previous small-scale study, OC users had a mean systolic blood pressure (SBP) of 132.86 mm Hg with a sample standard deviation of 15.34 mm Hg. Non-OC users had a mean SBP of 127.44 mm Hg with a sample standard deviation of 18.23 mm Hg. Determine the appropriate sample size for the proposed study using a two-sided test with a significance level of 0.05 and a power of 0.80.

We start by calculating the pooled standard deviation.

sd1 <- 15.34
sd2 <- 18.23
sd.pooled  <-  sqrt((sd1^2+sd2^2)/2)
sd.pooled

## [1] 16.84708

Next, we calculate Cohen’s d.

u1 <- 132.86
u2 <- 127.44

cohen.d <- (u1-u2)/sd.pooled
cohen.d

## [1] 0.3217174

Using the pwr package, we calculate the per-group sample size (n), which should always be rounded up.

pwr::pwr.t.test(d = cohen.d, sig.level = 0.05, power = 0.8, type = "two.sample")

## 
##      Two-sample t test power calculation 
## 
##               n = 152.6321
##               d = 0.3217174
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

Interpretation: We will need to enroll 153 subjects in each group to achieve a power of 80%.
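Both examples follow the same three steps: pooled standard deviation, effect size, sample size. One way to avoid repeating them is a small wrapper. The function n_per_group() below is a hypothetical helper, not part of pwr; it relies on base R's power.t.test() and assumes equal group sizes and a two-sided test.

```r
# Hypothetical helper: per-group n for a two-sided, two-sample t-test,
# starting from the summary statistics of a previous study.
n_per_group <- function(mean1, mean2, sd1, sd2, alpha = 0.05, power = 0.80) {
  sd_pooled <- sqrt((sd1^2 + sd2^2) / 2)  # pooled SD for equal group sizes
  res <- power.t.test(delta = abs(mean1 - mean2), sd = sd_pooled,
                      sig.level = alpha, power = power,
                      type = "two.sample", alternative = "two.sided")
  ceiling(res$n)                          # sample sizes are always rounded up
}

n_example1 <- n_per_group(70, 64.5, 3, 2.5)               # heights example
n_example2 <- n_per_group(132.86, 127.44, 15.34, 18.23)   # blood pressure example
c(example1 = n_example1, example2 = n_example2)
```

The two calls should reproduce the six and 153 subjects per group found with pwr.t.test() above.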

Comparing Two Proportions

Example: Crohn’s disease (CD) is a chronic relapsing inflammatory bowel disease. It is characterized by transmural inflammation that can affect any part of the gastrointestinal tract, most commonly the ileum, the colon, or both. A placebo-controlled randomized trial is proposed to assess the efficacy of Drug A in achieving remission in CD patients. A previous study showed that the proportion of subjects achieving remission with Drug A was 30%, compared to 15% in the placebo group. Using a significance level of 5%, what is the sample size needed to achieve a power of 80%?

\({H_0}\): The remission proportion is independent of treatment (\(p_1=p_2\)).

\({H_1}\): The remission proportion depends on treatment (\(p_1 \neq\ p_2\)).

\[Power=\Phi\left(\frac{|p_1-p_2|}{\sqrt{p_1(1-p_1)/n_1+p_2(1-p_2)/n_2}}-z_{1-\alpha/2}\right) \]

\[n_i=\frac{(z_{1-\alpha/2}+z_{1-\beta})^2}{(p_1-p_2)^2}[p_1(1-p_1)+p_2(1-p_2)]\]

To calculate the sample size using the pwr package, we need Cohen’s h, which has the following formula:

\[h=2[\arcsin\sqrt{p_1}-\arcsin\sqrt{p_2}] \]

The pwr package has the ES.h function to calculate Cohen’s h.

p1 <- 0.3
p2 <- 0.15
cohen.h <- pwr::ES.h(p1,p2)
cohen.h

## [1] 0.3638807
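The same value can be obtained directly from the formula for h, which is a quick way to confirm what ES.h returns. This sketch uses only base R; h_manual is an illustrative name.

```r
# Cohen's h computed by hand from the arcsine formula.
p1 <- 0.30   # remission proportion with Drug A
p2 <- 0.15   # remission proportion with placebo

h_manual <- 2 * (asin(sqrt(p1)) - asin(sqrt(p2)))
h_manual
```

The result should agree with the ES.h output above to the digits shown.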

Using the pwr package, we calculate the per-group sample size (n), which should always be rounded up.

power1 <- pwr.2p.test(h = cohen.h, sig.level = 0.05, power = .80)
power1

## 
##      Difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.3638807
##               n = 118.5547
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: same sample sizes

Interpretation: We will need to enroll 119 subjects in each group to achieve a power of 80%.
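For comparison, the closed-form sample size formula given earlier in this section can be evaluated by hand, alongside a normal approximation based on Cohen’s h. Both are sketches in base R: the second expression, n = 2(z sum)^2 / h^2, is an assumed simplification that ignores the negligible far-tail term of the two-sided test, not the exact routine inside pwr.2p.test.

```r
# Two closed-form approximations of the per-group n for the Crohn's disease example.
alpha <- 0.05
power <- 0.80
p1    <- 0.30
p2    <- 0.15

z_sum <- qnorm(1 - alpha / 2) + qnorm(power)   # z_{1-alpha/2} + z_{1-beta}

# (a) Formula from the text, based on the raw difference in proportions
n_formula <- z_sum^2 / (p1 - p2)^2 * (p1 * (1 - p1) + p2 * (1 - p2))

# (b) Approximation based on Cohen's h (arcsine transformation)
h         <- 2 * (asin(sqrt(p1)) - asin(sqrt(p2)))
n_arcsine <- 2 * z_sum^2 / h^2

c(formula = n_formula, arcsine = n_arcsine)
ceiling(c(formula = n_formula, arcsine = n_arcsine))
```

The two approaches differ by about one subject per group, which is a reminder that the answer depends slightly on which approximation is used. Rounding up the larger value is the conservative choice.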

Factors Affecting Statistical Power

If the significance level is made smaller (\(\alpha\) decreases), the critical value \(z_{1-\alpha/2}\) increases and hence the power decreases.

If the alternative mean is shifted farther away from the null mean (\(|\mu_1-\mu_2|\) increases) or the alternative proportion is shifted farther away from the null proportion (\(|p_1-p_2|\) increases), then the power increases.

If the standard deviation of the distribution of individual observations increases (\(\sigma\) increases), then the power decreases.
If the sample size (N) increases, then the power increases.
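These four effects can be checked numerically with the normal-approximation power formula from the t-test section. The function approx_power() below is a hypothetical helper written for this illustration; it assumes equal group sizes and a common standard deviation, and all input values are made up.

```r
# Normal-approximation power for a two-sided comparison of two means,
# with n subjects per group and a common standard deviation sigma.
approx_power <- function(delta, sigma, n, alpha = 0.05) {
  pnorm(-qnorm(1 - alpha / 2) + abs(delta) / sqrt(2 * sigma^2 / n))
}

# Hypothetical baseline design, then one change at a time
base_case     <- approx_power(delta = 5, sigma = 10, n = 50)
smaller_alpha <- approx_power(delta = 5, sigma = 10, n = 50, alpha = 0.01)
larger_delta  <- approx_power(delta = 8, sigma = 10, n = 50)
larger_sigma  <- approx_power(delta = 5, sigma = 15, n = 50)
larger_n      <- approx_power(delta = 5, sigma = 10, n = 100)

round(c(base = base_case, smaller_alpha = smaller_alpha,
        larger_delta = larger_delta, larger_sigma = larger_sigma,
        larger_n = larger_n), 3)
```

Relative to the baseline, power should fall for the smaller \(\alpha\) and the larger \(\sigma\), and rise for the larger difference and the larger sample size.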

Check out these visualizations of the factors influencing statistical power: https://ytliu0.github.io/Stat_Med/power2.html and https://demonstrations.wolfram.com/StatisticalPower/

Power Analysis and Sample Size Estimation

Zaid Yousif, PharmD, MAS

1/29/2024
