Intro

This is a quick introduction to calculating the sample size for an A/B test that measures lift in a proportion metric (such as conversion rate).

Quick Theory

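To estimate sample size, you need four of these five parameters. The fifth is then determined by the other four:

  • n: number of observations per group
  • p1: baseline rate (e.g. the control group's conversion rate)
  • p2: rate in the other group that you want to be able to detect
  • significance level (α): Type I error probability
  • power (1 − β): probability of detecting the difference if it really exists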
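Here is a minimal sketch of how these five quantities relate. It assumes the usual normal approximation for comparing two proportions, which, as far as we can tell, is what power.prop.test solves in its default two-sided, non-strict mode. Under that approximation, the sample size per group is:

```latex
n = \frac{\left( z_{1-\alpha/2}\,\sqrt{2\bar{p}(1-\bar{p})} \;+\; z_{1-\beta}\,\sqrt{p_1(1-p_1) + p_2(1-p_2)} \right)^2}{(p_1 - p_2)^2},
\qquad \bar{p} = \frac{p_1 + p_2}{2}
```

Here α is the significance level, 1 − β is the power, and z_q is the standard normal q-quantile. Plugging in p1 = 0.01, p2 = 0.02, z_0.975 ≈ 1.95996 and z_0.8 ≈ 0.84162 gives √n ≈ 48.147, or n ≈ 2318 per group. This is in line with the power.prop.test output below.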

Alternative Functions to Estimate Sample Size

Function 1: power.prop.test

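One function that does this for us is power.prop.test, from R's built-in stats package. Exactly one of n, p1, p2, power and sig.level must be passed as NULL, and that parameter is solved for from the others. Because sig.level defaults to 0.05, you must set it to NULL explicitly if you want to solve for it. Function parameters: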

  • n: number of observations per group
  • p1: probability in one group
  • p2: probability in other group
  • sig.level: significance level (Type I error probability)
  • power: power of the test (1 minus Type II error probability)
  • alternative: one- or two-sided test (“two.sided”, the default, or “one.sided”)
  • strict: use strict interpretation in two-sided case
  • tol: numerical tolerance used in root finding
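To illustrate the alternative parameter, here is a small sketch that compares the per-group n for a two-sided and a one-sided test. It uses the same inputs as the first example below and only documented power.prop.test arguments. The variable names are illustrative.

```r
# Same rates, power and significance level, two different alternatives
two_sided <- power.prop.test(p1 = 0.01, p2 = 0.02, power = 0.8,
                             sig.level = 0.05, alternative = "two.sided")
one_sided <- power.prop.test(p1 = 0.01, p2 = 0.02, power = 0.8,
                             sig.level = 0.05, alternative = "one.sided")

# The result is a "power.htest" list; $n holds the solved per-group size
c(two_sided = two_sided$n, one_sided = one_sided$n)
```

The one-sided test needs fewer observations per group, but it cannot flag a drop in the rate. It only makes sense if a decrease would be treated the same as no change.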
# Sample size per group needed to detect an increase from 1% to 2% (a 1 percentage-point lift)
power.prop.test(p1=.01, p2=.02, power=0.8, sig.level=0.05)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 2318.165
##              p1 = 0.01
##              p2 = 0.02
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
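To see where n = 2318.165 comes from, here is a sketch of the closed-form normal-approximation formula in base R. n_per_group is a hypothetical helper, not a function from stats or any package. It assumes a two-sided test and power.prop.test's default strict = FALSE.

```r
# Hypothetical helper: per-group n for a two-sided two-proportion test
n_per_group <- function(p1, p2, sig.level = 0.05, power = 0.8) {
  z_alpha <- qnorm(1 - sig.level / 2)  # two-sided critical value
  z_beta  <- qnorm(power)              # quantile matching the desired power
  p_bar   <- (p1 + p2) / 2             # pooled rate under the null hypothesis
  num <- z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
         z_beta  * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
  (num / abs(p1 - p2))^2
}

n <- n_per_group(0.01, 0.02)
n_rounded <- ceiling(n)  # a fraction of a user cannot be enrolled
total <- 2 * n_rounded   # n is per group, so double it for two arms
```

Rounding up gives 2319 per group, or 4638 users in total across both arms.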
# Smallest p2 detectable with 80% power given 1000 observations in each group
power.prop.test(p1=.01, n=1000, power=0.8, sig.level=0.05)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 1000
##              p1 = 0.01
##              p2 = 0.02684817
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
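The same function can also answer the reverse question: with 1000 observations per group, how likely is the test to detect the original 1% to 2% lift? Here is a sketch that leaves power out so it is solved for:

```r
# Fix n, p1 and p2; power is omitted (NULL), so power.prop.test solves for it
res <- power.prop.test(n = 1000, p1 = 0.01, p2 = 0.02, sig.level = 0.05)
res$power  # roughly 0.45
```

Power comes out at roughly 0.45. At this sample size, a doubling from 1% to 2% would be missed more often than detected.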

Function 2: pwr.2p2n.test {pwr}

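This function can calculate sample sizes when the two groups are of unequal size. Its effect size h is Cohen's h, the difference between arcsine-transformed proportions: h = 2·asin(√p1) − 2·asin(√p2). Cohen's conventional benchmarks for h are 0.2 for small effects, 0.5 for medium and 0.8 for large effects.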
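The effect size h used by pwr is Cohen's h, defined as 2·asin(√p1) − 2·asin(√p2). Because h is on an arcsine scale, it helps to convert the rates you expect into h rather than relying on the benchmarks. Below is a base-R sketch. cohens_h is a hypothetical helper that should match what pwr's ES.h(p1, p2) computes.

```r
# Hypothetical helper: Cohen's h, the difference of arcsine-square-root transforms
cohens_h <- function(p1, p2) 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# The 1% -> 2% lift from the power.prop.test example
h <- cohens_h(0.02, 0.01)
h  # roughly 0.083
```

That is well under the 0.2 "small" benchmark. For low baseline conversion rates, even a doubling of the rate can be a small effect in Cohen's terms.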

Parameters

  • h - Effect size (Cohen's h)
  • n1 - Number of observations in the first sample
  • n2 - Number of observations in the second sample
  • sig.level - Significance level (Type I error probability)
  • power - Power of test (1 minus Type II error probability)
  • alternative - a character string specifying the alternative hypothesis, must be one of “two.sided” (default), “greater” or “less”
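library("pwr")  # CRAN package; install once with install.packages("pwr")
# n2 needed to detect a small effect (h = 0.2) with 80% power when n1 = 1000
pwr.2p2n.test(h=0.2, n1=1000, sig.level=0.05, power=0.8)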
## 
##      difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.2
##              n1 = 1000
##              n2 = 244.1239
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: different sample sizes
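Here is a sketch of where n2 = 244.1239 comes from. It assumes the arcsine test statistic h·√(n1·n2/(n1 + n2)) is compared against the two-sided critical value. It also ignores the negligible rejection probability in the opposite tail, so the result can differ slightly from pwr's. n2_needed is a hypothetical helper, not part of pwr.

```r
# Hypothetical helper: approximate n2 for a two-sided arcsine test
n2_needed <- function(h, n1, sig.level = 0.05, power = 0.8) {
  # Required value of n1 * n2 / (n1 + n2) to reach the target power
  n_eff <- ((qnorm(1 - sig.level / 2) + qnorm(power)) / h)^2
  if (n1 <= n_eff) stop("n1 is too small: no finite n2 reaches the target power")
  n_eff * n1 / (n1 - n_eff)  # solve n1 * n2 / (n1 + n2) = n_eff for n2
}

n2 <- n2_needed(h = 0.2, n1 = 1000)
n2  # roughly 244.12
```

The denominator n1 − n_eff shows why unbalanced designs get expensive. As n1 shrinks toward n_eff (about 196 here), the required n2 grows without bound. With equal groups, each would need 2·n_eff, roughly 392.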

Function to Evaluate Actual Test Results: prop.test

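Actual Sample Size Test

Suppose the test is run with the sample size from the first power.prop.test example, rounded up to 2319 observations per group. It records 23 conversions in group 1 and 48 in group 2. prop.test checks whether the two observed rates differ:

# Conversions per group, then the number of observations per group
prop.test(c(23, 48), c(2319,2319))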
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(23, 48) out of c(2319, 2319)
## X-squared = 8.2388, df = 1, p-value = 0.0041
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.01827178 -0.00328924
## sample estimates:
##      prop 1      prop 2 
## 0.009918068 0.020698577
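Here is a sketch of how to pull the decision-relevant numbers out of the prop.test result, using only documented fields of the returned htest object. The confidence interval is for prop 1 minus prop 2, which is why it is negative here. As a cross-check, for two groups the continuity-corrected statistic should equal that of chisq.test on the 2×2 table of successes and failures.

```r
# Conversions and group sizes from the test above
res <- prop.test(x = c(23, 48), n = c(2319, 2319))

res$p.value   # about 0.0041: significant at sig.level = 0.05
res$conf.int  # 95% CI for prop 1 - prop 2 (negative: group 2 converts more)

# Relative lift of group 2 over group 1 (48/23 - 1, since group sizes are equal)
lift <- unname(res$estimate[2] / res$estimate[1] - 1)
lift

# Cross-check: rows are successes/failures, columns are the two groups
tab <- matrix(c(23, 2319 - 23, 48, 2319 - 48), nrow = 2)
chisq.test(tab)$statistic  # same X-squared as prop.test
```

lift comes out at about 1.09, meaning group 2's rate is roughly double group 1's.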