Hypergeometric Distribution Sample Size Calculator

November 24, 2017

Motivation

R currently supports calculating confidence intervals for the hypergeometric distribution through the 'samplingbook' package available on CRAN. However, I could not find any R package that supports sample size calculations.

Link to samplingbook package in R

https://cran.r-project.org/web/packages/samplingbook/index.html

Link to the Hypergeometric Distribution on Wikipedia

https://en.wikipedia.org/wiki/Hypergeometric_distribution

Full Disclosure: I have a client that needed this application. I am using it both to satisfy the client and to demonstrate the use of Shiny.

Example Problem

Example: Suppose you would like to poll a population of N=100 likely voters. How many of them do you need to sample for the 95% confidence interval to have a width of at most .1? To calculate the sample size, we need to know p (proportion estimated defective), N (population size), Confidence Level, and Maximum Confidence Interval Width. For this problem …

p (proportion estimated defective) = .5. Confidence intervals are widest at p = .5, so using .5 ensures an adequate sample size for any p.

N (population size) = 100. Only 100 people are eligible to vote, so these 100 people make up the population.

Confidence Level = .95. The resulting confidence interval will have a 95% chance of including the population proportion.

Target Max CI Width = .1.

Note: Using the Hypergeometric distribution requires two assumptions. 1) There are only two candidates. 2) Only 100 people are eligible to vote.
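The choice of p = .5 can be checked with a short base R sketch. It uses the standard deviation of the sample proportion under the hypergeometric model, sqrt(p(1-p)/n * (N-n)/(N-1)), as a stand-in for interval width. The helper name `prop_sd` and the sample size n = 50 are illustrative assumptions, not part of the calculator.

```r
# Hypothetical helper: standard deviation of the sample proportion when
# n units are drawn without replacement from a population of size N.
prop_sd <- function(p, n, N) {
  sqrt(p * (1 - p) / n * (N - n) / (N - 1))
}

p_grid  <- (1:9) / 10                       # candidate proportions .1, .2, ..., .9
sd_grid <- prop_sd(p_grid, n = 50, N = 100) # n = 50 is an arbitrary illustration
p_widest <- p_grid[which.max(sd_grid)]      # proportion with the largest spread

print(data.frame(p = p_grid, sd = round(sd_grid, 4)))
print(p_widest)
```

Because the spread depends on p only through p(1-p), the largest value falls at p = .5 for any n and N, which is why planning with .5 is the conservative choice.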

HypoSampleSize Function

I've written the following function, which repeatedly tries different sample sizes and calculates the resulting confidence interval width for each. The signature of this function is below.

HypoSampleSize(proportion estimated defective, N, Confidence Level, startN, endN)

Note that startN and endN are optional parameters that define the range of sample sizes for the calculator to try. Below is the solution for the voting example. Note that n=54 is the smallest n for which both the lower and upper confidence interval widths are <=.1.

HypoSampleSize(.5, 100, .95, 52, 55)

##   N (pop size) n (sample size) m (# events) CI Lower Width proportion (m/n) CI Upper Width
## 1          100              52           26     0.11000000        0.5000000      0.1100000
## 2          100              53           26     0.10056604        0.4905660      0.1094340
## 3          100              54           27     0.10000000        0.5000000      0.1000000
## 4          100              55           28     0.09909091        0.5090909      0.1009091
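The search idea can be sketched in base R as follows. This is not the HypoSampleSize source: the names `hyper_ci` and `find_sample_size`, the `max_width` argument, and the choice to build the interval by inverting two one-sided hypergeometric tail tests are all assumptions made for illustration. The 'samplingbook' package may construct its interval differently, so the numbers from this sketch need not match the table above.

```r
# Hypothetical helper (not the author's code): exact two-sided CI for the
# population proportion M/N, found by inverting hypergeometric tail tests.
hyper_ci <- function(m, n, N, level = 0.95) {
  alpha <- 1 - level
  # Feasible counts of events in the population, given m events in n draws
  M <- m:(N - (n - m))
  # phyper(q, m, n, k): q events drawn, m events and n non-events in the
  # population, k draws without replacement
  p_le <- phyper(m, M, N - M, n)                          # P(X <= m | M)
  p_ge <- phyper(m - 1, M, N - M, n, lower.tail = FALSE)  # P(X >= m | M)
  keep <- M[p_le > alpha / 2 & p_ge > alpha / 2]
  c(lower = min(keep) / N, upper = max(keep) / N)
}

# Hypothetical search: smallest n in start_n..end_n for which both CI bounds
# lie within max_width of the sample proportion m/n.
find_sample_size <- function(p, N, level = 0.95, max_width = 0.1,
                             start_n = 2, end_n = N) {
  for (n in start_n:end_n) {
    m <- round(n * p)                  # expected number of events in the sample
    ci <- hyper_ci(m, n, N, level)
    p_hat <- m / n
    widths <- c(p_hat - ci[["lower"]], ci[["upper"]] - p_hat)
    # Small tolerance guards against floating-point noise at the boundary
    if (all(widths <= max_width + 1e-9)) return(n)
  }
  NA_integer_                          # no n in the range met the target
}

n_needed <- find_sample_size(p = 0.5, N = 100, level = 0.95, max_width = 0.1)
print(n_needed)
```

Because the hypergeometric distribution is discrete, the widths do not shrink smoothly as n grows, so it is worth inspecting a few sample sizes around the returned value rather than relying on the first hit alone.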

Shiny Application for HypoSampleSize

Shiny provides an excellent way to use HypoSampleSize. Using the link below, you can control the parameters of HypoSampleSize through a GUI and see the impact of changing n in plots instead of data frames.

USE GOOGLE CHROME FOR BEST RESULTS. For some reason, IE and Edge don't display the plots properly at first. If you choose to use IE or Edge, you will need to change one of the inputs to get the plots to work.

https://philipmayfield.shinyapps.io/hypergeometric_sample_size_calculator/