The Law of Large Numbers and Central Limit Theorem with Simulation in Python
A Short Guide to the Law of Large Numbers and Central Limit Theorem
This article provides a brief exploration of two fundamental statistical theorems: the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT). It explains the meaning, applications, and significance of both theorems in fields such as insurance, polling, and hypothesis testing. Practical simulations are demonstrated in Python, including rolling a die to illustrate the LLN and sampling from a uniform distribution to showcase the CLT. Visualizations are generated with Matplotlib and Seaborn to highlight how sample means converge to expected values and approximate normal distributions. These concepts are essential for understanding statistical inference and the behavior of sample data.
# Load the libraries used throughout this guide
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
1 Introduction
The Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) are two of the most fundamental theorems in probability and statistics. This document provides a short guide to understanding these concepts, their practical applications, and how to simulate them in Python using Matplotlib for visualizations.
2 Part 1: The Law of Large Numbers
2.1 Meaning of the Law of Large Numbers
The Law of Large Numbers (LLN) states that as the number of independent and identically distributed (i.i.d.) trials or observations increases, the sample average converges to the expected value of the population. In other words, the more data you collect, the closer the sample mean will be to the population mean (Black 2023).
2.1.1 Types of LLN
- Weak Law of Large Numbers (WLLN): Convergence in probability.
- Strong Law of Large Numbers (SLLN): Almost sure convergence.
2.1.2 Formula
If \(X_{1}, X_2, ..., X_n\) are i.i.d. random variables with a finite expected value \(E(X)\), the sample average is given by:
\[ \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \]
As \(n\) increases, \(\bar{X}_n\) approaches \(E(X)\).
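As a quick, hedged illustration of this convergence (a minimal sketch using a Uniform(0, 1) population, which is separate from the die example below), we can estimate how often the sample mean lands more than a small tolerance away from \(E(X) = 0.5\) and watch that proportion shrink as \(n\) grows:
# Sketch: the chance that the sample mean misses E(X) = 0.5 by more than eps
# shrinks as n grows, which is the (weak) Law of Large Numbers in action
np.random.seed(42)
eps = 0.02
for n_obs in [10, 100, 1000, 5000]:
    means_check = np.random.uniform(0, 1, size=(1000, n_obs)).mean(axis=1)
    print(n_obs, np.mean(np.abs(means_check - 0.5) > eps))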
2.2 Applications of the Law of Large Numbers
LLN is foundational to many fields, including:
- Insurance: Insurers rely on LLN to predict average loss and set premiums (see the sketch after this list).
- Gambling: Casinos use LLN to ensure that, over time, they make a predictable profit.
- Polling: As more people are polled, the average response should approximate the population mean.
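As a hedged illustration of the insurance point above (all numbers hypothetical), the average claim per policyholder settles down as the number of policies grows, which is what makes premium-setting feasible:
# Hypothetical example: average claim per policy stabilizes as the portfolio grows
np.random.seed(7)
mean_claim = 800  # assumed average annual claim, in dollars
claims = np.random.exponential(mean_claim, size=100000)
for n_policies in [100, 1000, 10000, 100000]:
    print(n_policies, round(claims[:n_policies].mean(), 2))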
2.3 Simulating the Law of Large Numbers in Python
2.3.1 Example: Rolling a Fair Die
We will simulate rolling a fair six-sided die. The expected value is:
\[ E(X) = \frac{1+2+3+4+5+6}{6} = 3.5 \]
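A one-line check of this arithmetic in NumPy:
# Expected value of a fair six-sided die
print(np.arange(1, 7).mean())  # 3.5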
The simulation will show how the average of dice rolls converges to this expected value as the number of rolls increases.
# Set seed for reproducibility
np.random.seed(123)

# Simulate rolling a fair die
n = 10000
dice_rolls = np.random.randint(1, 7, size=n)
cumulative_avg = np.cumsum(dice_rolls) / np.arange(1, n + 1)
We then summarise and visualize the variables.
myllm_df = pd.DataFrame({"rolls": dice_rolls, "avg": cumulative_avg})
myllm_df.describe()
| | rolls | avg |
|---|---|---|
| count | 10000.000000 | 10000.000000 |
| mean | 3.486400 | 3.480132 |
| std | 1.700031 | 0.043055 |
| min | 1.000000 | 2.687500 |
| 25% | 2.000000 | 3.473821 |
| 50% | 3.000000 | 3.482198 |
| 75% | 5.000000 | 3.488255 |
| max | 6.000000 | 6.000000 |
= "mako", kind = "kde", corner = True)
sns.pairplot(myllm_df, palette plt.show()
Next, we visualize the simulation of the LLN.
# Plot using Matplotlib
plt.figure(figsize=(10, 6))
plt.plot(cumulative_avg, color='blue', label='Cumulative Average')
plt.axhline(y=3.5, color='red', linestyle='--', label='Expected Value (3.5)')
plt.title('Law of Large Numbers: Convergence of Sample Mean')
plt.xlabel('Number of Rolls')
plt.ylabel('Cumulative Average')
plt.legend()
plt.show()
2.3.2 Interpretation
In the plot above, as the number of dice rolls increases, the sample mean (blue line) converges toward the expected value of 3.5 (red dashed line). This demonstrates the Law of Large Numbers.
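We can also read the convergence off the cumulative_avg array computed above, printing how far the running average sits from 3.5 at a few checkpoints (a small sketch reusing the simulation results):
# Distance between the running average and the expected value at selected points
for n_rolls in [10, 100, 1000, 10000]:
    print(n_rolls, round(abs(cumulative_avg[n_rolls - 1] - 3.5), 4))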
3 Part 2: The Central Limit Theorem
3.1 Meaning of the Central Limit Theorem
The Central Limit Theorem (CLT) states that the distribution of the sample mean of a sufficiently large number of i.i.d. random variables, regardless of the original distribution, will tend to follow a normal distribution, provided the original population has a finite variance (Black 2023).
3.1.1 Formula
Given a population with mean \(\mu\) and variance \(\sigma^2\), the distribution of the sample mean \(\bar{X}_n\) for large \(n\) is approximately normal:
\[ \bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]
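A minimal numerical check of the variance term, assuming a Uniform(0, 1) population (variance \(1/12\)): the standard deviation of simulated sample means of size 50 should be close to \(\sigma/\sqrt{n}\).
# Empirical vs. theoretical standard error for Uniform(0, 1) samples of size 50
np.random.seed(99)
sim_means = np.random.uniform(0, 1, size=(5000, 50)).mean(axis=1)
print(round(sim_means.std(), 4), round(np.sqrt((1 / 12) / 50), 4))  # both ~0.041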
3.1.2 Conditions for the CLT
- The samples must be independent.
- The sample size should be large (typically \(n > 30\)); a short sketch after this list illustrates this condition.
- The population should have finite variance.
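The large-sample condition can be sketched empirically: using a strongly skewed exponential population (an assumption made only for this illustration), the skewness of the sample-mean distribution shrinks toward zero, the value for a normal distribution, as the sample size passes roughly 30:
# Skewness of the sample-mean distribution for an exponential population
np.random.seed(21)
for size in [2, 10, 30, 100]:
    means = np.random.exponential(1.0, size=(2000, size)).mean(axis=1)
    z = (means - means.mean()) / means.std()
    print(size, round(np.mean(z ** 3), 2))  # sample skewness, approaching 0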
3.2 Applications of the Central Limit Theorem
CLT is crucial in many statistical methods, including:
- Hypothesis Testing: CLT allows us to use normal approximations for the distribution of sample means.
- Confidence Intervals: Sample means can be assumed to be normally distributed for large samples (see the sketch after this list).
- Quality Control: CLT is used in process control charts to monitor deviations in manufacturing processes.
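A minimal sketch of the confidence-interval application, assuming a single Uniform(0, 1) sample of size 50 and using the normal approximation the CLT justifies (1.96 is the usual 95% critical value):
# 95% confidence interval for a mean via the CLT normal approximation
np.random.seed(11)
sample = np.random.uniform(0, 1, 50)
se = sample.std(ddof=1) / np.sqrt(len(sample))
# the true mean here is 0.5; an interval like this covers it in roughly 95% of repetitions
print(round(sample.mean() - 1.96 * se, 3), round(sample.mean() + 1.96 * se, 3))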
3.3 Simulating the Central Limit Theorem in Python
3.3.1 Example: Sampling from a Uniform Distribution
Let’s take 1,000 samples of size 50 from a uniform distribution (which is not normal) and demonstrate how the sample means approximate a normal distribution as per the CLT.
# Simulate sampling from a uniform distribution
n_samples = 1000
sample_size = 50
sample_means = [np.mean(np.random.uniform(0, 1, sample_size)) for _ in range(n_samples)]
We then summarise the variable.
clt_df = pd.DataFrame(sample_means)
clt_df.describe()
| | 0 |
|---|---|
| count | 1000.000000 |
| mean | 0.501076 |
| std | 0.040134 |
| min | 0.366184 |
| 25% | 0.475069 |
| 50% | 0.501407 |
| 75% | 0.527721 |
| max | 0.651768 |
Next, we plot the data demonstrating the CLT.
# Plot the distribution of sample means using Seaborn
plt.figure(figsize=(10, 6))
sns.histplot(sample_means, kde=True, stat='density', color='lightblue', bins=30)
plt.title('Central Limit Theorem: Distribution of Sample Means')
plt.xlabel('Sample Means')
plt.ylabel('Density')
plt.show()
3.3.2 Interpretation
In the histogram above, the distribution of the sample means (light blue bars) approximates a normal distribution, as the smooth density curve shows, even though the original data came from a uniform distribution. This illustrates the Central Limit Theorem in action.
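To make the comparison explicit, the normal density the CLT predicts, \(N(0.5, (1/12)/50)\), can be overlaid on the sample means (a sketch assuming SciPy is available for the normal pdf):
# Overlay the CLT-implied normal density on the distribution of sample means
from scipy.stats import norm
plt.figure(figsize=(10, 6))
sns.histplot(sample_means, stat='density', color='lightblue', bins=30)
x = np.linspace(min(sample_means), max(sample_means), 200)
plt.plot(x, norm.pdf(x, loc=0.5, scale=np.sqrt((1 / 12) / 50)), color='red', label='Normal approximation')
plt.legend()
plt.show()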
4 Conclusion
The Law of Large Numbers and the Central Limit Theorem are foundational to probability and statistics. The LLN ensures that, as the sample size increases, the sample mean converges to the population mean, while the CLT shows that the distribution of sample means approaches normality as the sample size grows. These principles underpin much of statistical inference and are essential for understanding how sample data behave (Grolemund and Wickham 2023; McKinney et al. 2011; Bressert 2012; Waskom 2021).