set.seed(seed = 17)
We know that when rolling a fair six-sided die, the probability of any given value being rolled is \(\frac{1}{6}\), and that the expected value of this uniform distribution is 3.5. In practice, however, rolling such a die six times does not guarantee that each number comes up exactly once. Over hundreds, thousands, or even millions of rolls, though, we would expect the observed proportions to even out and the average roll to settle near 3.5. This is what the Law of Large Numbers describes: as the sample size of an experiment increases, the sample mean converges to the population mean. In other words, for independent draws of a random variable X with population mean \(\mu\), the mean \(\bar{X}_n\) of the first n draws satisfies \(\bar{X}_n \to \mu\) as \(n \to \infty\) (formally, \(\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) = 0\) for every \(\varepsilon > 0\)).
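This convergence can be checked without rolling anything. The sketch below is not part of the original analysis: it builds the exact distribution of the sum of several fair dice by repeated convolution, then computes the probability that the average roll lands within 0.25 of 3.5. The helper names `dice_sum_pmf` and `prob_mean_near` and the tolerance of 0.25 are illustrative choices.

```r
# Exact probability mass function of the sum of `n_dice` fair six-sided dice.
# (Hypothetical helper for illustration; entry j is P(sum = n_dice + j - 1).)
dice_sum_pmf <- function(n_dice) {
  pmf <- 1  # with zero dice the sum is 0 with probability 1
  for (i in seq_len(n_dice)) {
    len <- length(pmf)
    new_pmf <- numeric(len + 5)
    # adding one die shifts the current distribution by 1..6, each with weight 1/6
    for (face in 1:6) {
      idx <- face:(face + len - 1)
      new_pmf[idx] <- new_pmf[idx] + pmf / 6
    }
    pmf <- new_pmf
  }
  pmf
}

# P(|average of n_dice rolls - 3.5| <= tol), computed exactly
prob_mean_near <- function(n_dice, tol = 0.25) {
  sums <- n_dice:(6 * n_dice)
  sum(dice_sum_pmf(n_dice)[abs(sums / n_dice - 3.5) <= tol])
}

n_rolls <- c(6, 60, 600)
probs <- sapply(n_rolls, prob_mean_near)
data.frame(n_rolls, prob_within_0.25 = probs)
```

The probability climbs toward 1 as the number of rolls grows, which is the behavior the Law of Large Numbers describes.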
The Central Limit Theorem describes how, with a sufficiently large sample size, the mean (or sum) of independent draws of a random variable X can be modeled with a normal distribution, whatever the distribution of X itself, as long as its variance is finite. A common rule of thumb puts the cutoff at a sample size of 30 or more. Once this is reached, the sample mean \(\bar{X}\) is treated as approximately normal, centered on the population mean \(\mu\), with a standard deviation (the standard error) equal to the ratio of the population standard deviation to the square root of the sample size, \(\frac{\sigma}{\sqrt{n}}\). Using dice as the example, if we were to roll two six-sided dice instead of one, the sum of the two rolls (the sum of two uniform random variables) has a symmetric, triangular distribution with mean 7 and standard deviation \(\frac{\sqrt{210}}{6}\) (\(\approx 2.4152\)). This already starts to resemble a normal distribution, and the resemblance improves as more dice are added to the sum.
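The figures quoted for two dice can be verified by listing all 36 equally likely outcomes. This is a minimal sketch with illustrative variable names. It also prints the exact probability of each total next to the normal density with the same mean and standard deviation, to show how close the two already are.

```r
faces <- 1:6
# All 36 equally likely outcomes for the sum of two dice
two_dice <- as.vector(outer(faces, faces, "+"))

mu_sum <- sum(two_dice) / length(two_dice)
sigma_sum <- sqrt(sum((two_dice - mu_sum)^2) / length(two_dice))
c(mean = mu_sum, sd = sigma_sum, claimed_sd = sqrt(210) / 6)

# Exact probabilities of each total next to the matching normal density
totals <- 2:12
exact <- as.numeric(table(two_dice)) / 36
round(data.frame(totals, exact, normal = dnorm(totals, mu_sum, sigma_sum)), 4)
```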
(This website helped me understand the key differences between the LLN and the CLT: https://www.geeksforgeeks.org/maths/central-limit-theorem/)
Both the Law of Large Numbers and the Central Limit Theorem describe what happens to the sample mean as the sample size grows. Where they differ is in what they say about it. The Law of Large Numbers states only that the sample mean converges to the population mean. The Central Limit Theorem describes how the sample mean is distributed around the population mean along the way: approximately normally, with standard deviation \(\frac{\sigma}{\sqrt{n}}\). For the Central Limit Theorem, the distribution of the underlying random variable is often unknown and, provided its variance is finite, irrelevant to the fact that the sample mean can eventually be modeled with a normal distribution.
The hypergeometric distribution is a discrete probability distribution. Given a finite population in which a fixed number of elements have a desired trait (the successes), it describes the probability of drawing a given number of successes in a fixed number of draws made without replacement. A familiar use is finding the probability of drawing one or more specific cards from a deck within a number of draws, such as when trying to find the last card of a given suit to complete a flush in poker. The key aspects of this distribution are that the population size is fixed and that sampling is done without replacement, so the number of successes in the sample can never exceed the number of successes in the population.
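The probability of exactly \(k\) successes follows from counting: \(P(X = k) = \binom{K}{k}\binom{N-K}{n-k} / \binom{N}{n}\). As a small sketch, the code below evaluates that formula by hand and compares it with R's built-in `dhyper`. The numbers are illustrative only (a five-card hand from a single standard 52-card deck) and are not the scenario analyzed in this report.

```r
# Illustrative numbers (not the scenario used later): a five-card hand
# from a standard 52-card deck containing 13 hearts
deck_size <- 52
hearts <- 13
hand <- 5
k <- 0:hand

# Counting formula: choose k hearts and (hand - k) non-hearts
by_formula <- choose(hearts, k) * choose(deck_size - hearts, hand - k) /
  choose(deck_size, hand)

# R's built-in density takes (successes, failures, draws)
by_dhyper <- dhyper(k, hearts, deck_size - hearts, hand)

all.equal(by_formula, by_dhyper)
round(by_dhyper, 4)

# Probability of a hand made up entirely of hearts
dhyper(5, hearts, deck_size - hearts, hand)
```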
In our scenario, let’s shuffle together four decks of playing cards, including the jokers. With 54 cards per deck, this yields a total of \(4 \times 54 = 216\) cards (\(N\)). Our aim is to draw hearts, of which there are \(4 \times 13 = 52\) in the pile (\(K\)). I’ll now pick a value for the number of draws, \(n\), such that \(n<0.1N\) (here \(0.1N = 21.6\)): \(n = 21\).
K <- 52
N <- 216
n <- 21
# Rule of thumb for using a normal approximation: check that n*K/N > 5 and n*(1 - K/N) > 5
(n*K/N > 5) && (n*(1-K/N) > 5)
## [1] TRUE
x <- 0:n  # possible numbers of hearts, from 0 up to the number of draws
plot(x, dhyper(x, K, N-K, n), type = "h", lwd = 5, ylab = "Probability", xlab = "# of Hearts Drawn", main = "Hypergeometric Distribution")
If we’re going to use these parameters to approximate a normal distribution, let’s now calculate the mean and standard deviation: \(\mu = n\frac{K}{N} = 21 \cdot \frac{52}{216} \approx 5.0556\) and \(\sigma = \sqrt{n\frac{K}{N}\frac{N-K}{N}\frac{N-n}{N-1}} \approx 1.8659\).
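As a sanity check on these closed-form expressions, the mean and standard deviation can also be computed directly from the probabilities that `dhyper` returns. This is a minimal sketch; the `_formula` and `_direct` names are illustrative and chosen so that nothing else in the analysis is overwritten.

```r
K <- 52; N <- 216; n <- 21
x <- 0:n
p <- dhyper(x, K, N - K, n)

# Closed-form mean and standard deviation
mu_formula <- n * K / N
sigma_formula <- sqrt(n * (K / N) * ((N - K) / N) * ((N - n) / (N - 1)))

# The same quantities as probability-weighted sums over the distribution
mu_direct <- sum(x * p)
sigma_direct <- sqrt(sum((x - mu_direct)^2 * p))

round(c(mu_formula, mu_direct, sigma_formula, sigma_direct), 4)
```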
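# Mean and standard deviation of the hypergeometric distribution
# (these names shadow base R's mean() and sd(); calls such as mean(data) still work)
mean <- n*K/N
sd <- sqrt((n*K/N)*(N-K)/N*(N-n)/(N-1))
plot(x, dnorm(x, mean, sd), type = "b", ylab = "Probability", xlab = "# of Hearts Drawn", main = "Normal Approximation")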
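To put a number on how close the two plots are, one option is to compare the exact hypergeometric probabilities with the normal probability of the interval from \(x - 0.5\) to \(x + 0.5\). This continuity correction is an addition that the analysis above does not use, and the variable names are illustrative.

```r
K <- 52; N <- 216; n <- 21
mu <- n * K / N
sigma <- sqrt(mu * (N - K) / N * (N - n) / (N - 1))

x <- 0:n
exact <- dhyper(x, K, N - K, n)
# Normal probability of the interval (x - 0.5, x + 0.5): a continuity correction
normal_approx <- pnorm(x + 0.5, mu, sigma) - pnorm(x - 0.5, mu, sigma)

comparison <- data.frame(hearts = x, exact, normal_approx,
                         error = normal_approx - exact)
round(comparison[1:13, ], 4)

# Largest absolute gap between the two sets of probabilities
max(abs(comparison$error))
```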
Now we compare this approximation against a random sample of our initial
hypergeometric distribution. 1000 iterations of the experiment should
do.
data <- rhyper(1000, K, N-K, n)  # 1000 simulated experiments of n draws each
hist(data)
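The sample is roughly bell-shaped with a slight right skew, which is expected here because fewer than half of the cards are hearts and fewer than half of the cards are drawn. Its mean is around 5, in line with the mean of the normal approximation and with the peak of the hypergeometric probability plot above.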
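The skew can be quantified as the third standardized moment. The sketch below computes it for the exact distribution and for a simulated sample. The seed is an assumption chosen to mirror the one set at the top of the report, and the variable names are illustrative.

```r
K <- 52; N <- 216; n <- 21
x <- 0:n
p <- dhyper(x, K, N - K, n)

# Skewness of the exact distribution: third standardized moment
mu <- sum(x * p)
sigma <- sqrt(sum((x - mu)^2 * p))
skew_exact <- sum(((x - mu) / sigma)^3 * p)

# Skewness of a simulated sample (seed assumed, for reproducibility only)
set.seed(17)
sim <- rhyper(1000, K, N - K, n)
centered <- sim - sum(sim) / length(sim)
skew_sample <- (sum(centered^3) / length(sim)) /
  (sum(centered^2) / length(sim))^1.5

c(exact = skew_exact, sample = skew_sample)
```

A positive value means a longer tail to the right. A normal distribution has skewness 0, so the exact figure gives a rough sense of how far this distribution is from perfectly symmetric.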
c(mean(data), mean)
## [1] 4.989000 5.055556
The Central Limit Theorem seems to hold: the histogram is close to bell-shaped, and the sample mean lands close to the mean of the approximation. The difference between the two means is about 0.067, which is about 1.3% of the mean and about 3.6% of one standard deviation.
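A more telling yardstick for the gap is the standard error of the sample mean, \(\frac{\sigma}{\sqrt{1000}}\), since that is the scale on which the mean of 1000 simulated experiments is expected to vary. A minimal sketch, taking the sample mean of 4.989 from the output above and using illustrative variable names:

```r
K <- 52; N <- 216; n <- 21
mu <- n * K / N
sigma <- sqrt(mu * (N - K) / N * (N - n) / (N - 1))

sample_mean <- 4.989   # value reported in the output above
n_sims <- 1000         # number of simulated experiments

std_error <- sigma / sqrt(n_sims)          # standard deviation of the sample mean
z_score <- (sample_mean - mu) / std_error  # gap measured in standard errors

c(difference = sample_mean - mu, std_error = std_error, z_score = z_score)
```

A gap of roughly one standard error is well within what chance alone would produce.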