Introduction

Suppose that a random sample of \(n\) objects is chosen without replacement from a group of \(N\) objects, \(S\) of which are successes. The distribution of the number of successes, \(X\), in the sample is called the hypergeometric distribution.

Probability Distribution

The probability function of the hypergeometric distribution uses what we learned about combinations in Lesson 3.2. \(C_x^S\) is the number of ways to choose \(x\) successes from the \(S\) successes, \(C_{n-x}^{N-S}\) is the number of ways to choose \(n-x\) failures from the \(N-S\) failures, and \(C_n^N\) is the total number of ways to choose \(n\) objects from the \(N\) objects. Because every sample of size \(n\) is equally likely, the probability of exactly \(x\) successes is the number of favorable samples divided by the total number of samples.

\[p(x) = P(X=x)=\frac{C_x^S C_{n-x}^{N-S}}{C_n^N}\]

where \(x\) can take integer values ranging from the larger of \(0\) and \(n-(N-S)\) to the smaller of \(n\) and \(S\).

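In R, \(p(x) = \text{dhyper}(x,S,N-S,n)\). The arguments are, in order, the value \(x\), the number of successes in the population, the number of failures in the population, and the sample size.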
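As a quick check of the formula above, the following sketch computes the probability function directly from combinations and compares it with dhyper. The helper name hyper_pmf and the values of N, S, and n are illustrative assumptions, not part of R or of the lesson's definitions.

```r
# Hypothetical helper (not part of R): the probability function written
# directly from the combination formula.
hyper_pmf <- function(x, S, N, n) {
  choose(S, x) * choose(N - S, n - x) / choose(N, n)
}

N <- 10; S <- 4; n <- 3                # illustrative values
x <- max(0, n - (N - S)):min(n, S)     # every value X can take

by_hand  <- hyper_pmf(x, S, N, n)
built_in <- dhyper(x, S, N - S, n)

all.equal(by_hand, built_in)   # the two vectors should agree
sum(built_in)                  # probabilities over the support sum to 1
```

Note that dhyper takes the number of failures, \(N-S\), as its third argument rather than the population size \(N\).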

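The cumulative distribution function is \[F(x)=P(X \le x)=\sum_{k=0}^{x} \frac{C_k^S C_{n-k}^{N-S}}{C_n^N}\] Any term with \(k\) below the smallest possible value of \(X\) is zero and contributes nothing to the sum.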

In R, \(F(x) = \text{phyper}(x,S,N-S,n)\)
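The relationship between the two functions can be sketched in a few lines: the cumulative distribution function is the running total of the probability function. The values of N, S, and n below are illustrative assumptions.

```r
N <- 10; S <- 4; n <- 3   # illustrative values
x <- 0:n

pmf <- dhyper(x, S, N - S, n)
cdf_by_sum   <- cumsum(pmf)               # p(0), p(0) + p(1), ...
cdf_built_in <- phyper(x, S, N - S, n)

all.equal(cdf_by_sum, cdf_built_in)   # the two vectors should agree
```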

Example 1: Three different computers are selected at random from the ten in a department and checked. Four of the ten computers have illegal software loaded. What is the probability that

  • exactly two have illegal software loaded?
  • at most two have illegal software loaded?

\(N=10\), \(n=3\), \(S=4\), \(x=2\)

  • \(p(2) = P(X=2) = \frac{C_2^4 C_{3-2}^{10-4}}{C_3^{10}} = \frac{6 \cdot 6}{120} = 0.3 = \text{dhyper}(2,4,6,3)\)

  • \(F(2) = P(X \le 2) = \sum_{x=0}^2 \frac{C_x^4 C_{3-x}^{10-4}}{C_3^{10}} = \frac{20 + 60 + 36}{120} = \frac{116}{120} = \text{phyper}(2,4,6,3) \approx 0.9667\)
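The two answers in Example 1 can be reproduced in R as follows. This is a minimal sketch; the variable names are illustrative, and the complement check is an extra verification step rather than part of the original solution.

```r
N <- 10; S <- 4; n <- 3   # values from Example 1

p2 <- dhyper(2, S, N - S, n)   # exactly two: 36/120 = 0.3
F2 <- phyper(2, S, N - S, n)   # at most two: 116/120

# Complement: "at most two" fails only when all three checked computers
# have illegal software loaded, which has probability 4/120.
F2_complement <- 1 - dhyper(3, S, N - S, n)

round(c(p2, F2, F2_complement), 4)
```

The complement gives the same value as the sum because \(X\) cannot exceed \(3\) in a sample of three.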

Summary

\(X\) is the number of successes in a sample of \(n\) objects selected without replacement from a population of size \(N\) that contains \(S\) successes.

\[p(x) = P(X=x)=\frac{C_x^S C_{n-x}^{N-S}}{C_n^N} = \text{dhyper}(x,S,N-S,n)\]

\[F(x)=P(X \le x)=\sum_{k=0}^{x} \frac{C_k^S C_{n-k}^{N-S}}{C_n^N} = \text{phyper}(x,S,N-S,n)\]