The Law of Large Numbers and Central Limit Theorem with Simulation in Python

A Short Guide to the Law of Large Numbers and Central Limit Theorem

Author
Affiliations

John Karuitha, PhD

Karatina University, School of Business and Economics

University of the Witwatersrand, School of Construction Economics & Management

Published

October 7, 2024

Modified

October 7, 2024

Executive Summary

This article provides a brief exploration of two fundamental statistical theorems: the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT). It explains the meaning, applications, and significance of both theorems in fields such as insurance, polling, and hypothesis testing. Practical simulations are demonstrated in Python, including rolling a die to illustrate the LLN and sampling from a uniform distribution to showcase the CLT. Visualizations are generated with Matplotlib and Seaborn to highlight how sample means converge to expected values and how their distribution approximates a normal distribution. These concepts are essential for understanding statistical inference and the behavior of sample data.

1 Introduction

The Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) are two of the most fundamental theorems in probability and statistics. This document provides a short guide to understanding these concepts, their practical applications, and how to simulate them in Python using NumPy and pandas, with Matplotlib and Seaborn for visualizations.

2 Part 1: The Law of Large Numbers

2.1 Meaning of the Law of Large Numbers

The Law of Large Numbers (LLN) states that as the number of independent and identically distributed (i.i.d.) trials or observations increases, the sample average converges to the expected value of the population. In other words, the more data you collect, the closer the sample mean will be to the population mean (Black 2023).

2.1.1 Types of LLN

  • Weak Law of Large Numbers (WLLN): Convergence in probability.
  • Strong Law of Large Numbers (SLLN): Almost sure convergence.
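
Stated formally, writing \(\mu = E(X)\) and \(\bar{X}_n\) for the average of the first \(n\) observations, the two modes of convergence can be written as follows. The weak law says that, for every tolerance \(\varepsilon > 0\), the probability of a deviation larger than \(\varepsilon\) vanishes:

\[ \lim_{n \to \infty} P\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) = 0 \]

The strong law says that the sequence of sample averages itself converges to \(\mu\) with probability one:

\[ P\left( \lim_{n \to \infty} \bar{X}_n = \mu \right) = 1 \]

Almost sure convergence implies convergence in probability, so the strong law implies the weak law.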

2.1.2 Formula

If \(X_{1}, X_2, ..., X_n\) are i.i.d. random variables with a finite expected value \(E(X)\), the sample average is given by:

\[ \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \]

As \(n\) increases, \(\bar{X}_n\) approaches \(E(X)\).
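
To make the formula concrete, the following is a minimal sketch that computes \(\bar{X}_n\) for increasing \(n\). It assumes Bernoulli trials with a success probability of 0.3, so that \(E(X) = 0.3\); the probability, the seed, and the variable names are illustrative choices and not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility only
p = 0.3                           # assumed success probability, so E(X) = p

# One long sequence of i.i.d. Bernoulli(p) draws stored as 0/1 integers
draws = (rng.random(1_000_000) < p).astype(int)

# Sample average for increasing n, using the first n draws each time
sample_means = {n: draws[:n].mean() for n in (10, 1_000, 100_000, 1_000_000)}

for n, x_bar in sample_means.items():
    print(f"n = {n:>9,}: sample mean = {x_bar:.4f}, |error| = {abs(x_bar - p):.4f}")
```

The error for small \(n\) can be sizeable, while for the largest \(n\) it should be very small. The shrinkage is not guaranteed to be monotone from one \(n\) to the next.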

2.2 Applications of the Law of Large Numbers

LLN is foundational to many fields, including:

  • Insurance: Insurers rely on LLN to predict average loss and set premiums.
  • Gambling: Casinos use LLN to ensure that, over time, they make a predictable profit.
  • Polling: As more people are polled, the average response should approximate the population mean.
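
The insurance case can be sketched with a toy portfolio. The example below assumes each policy has a 5% chance of a claim and that claim sizes follow an exponential distribution with mean 10,000, giving an expected loss of 500 per policy. These figures are hypothetical and chosen only to show how the average loss per policy stabilizes as the portfolio grows.

```python
import numpy as np

rng = np.random.default_rng(7)    # arbitrary seed
claim_prob = 0.05                 # hypothetical probability that a policy claims
mean_claim = 10_000.0             # hypothetical mean claim size
expected_loss = claim_prob * mean_claim   # 500 per policy under these assumptions


def average_loss(n_policies):
    """Simulate one portfolio and return the average loss per policy."""
    has_claim = rng.random(n_policies) < claim_prob
    claim_size = rng.exponential(mean_claim, size=n_policies)
    return float(np.mean(has_claim * claim_size))


avg_losses = {n: average_loss(n) for n in (100, 10_000, 1_000_000)}

for n, loss in avg_losses.items():
    print(f"{n:>9,} policies: average loss = {loss:8.2f} (expected {expected_loss:.2f})")
```

A small portfolio can land far from 500 in either direction, which is why an insurer with few policies faces more uncertainty about its average loss than one with many.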

2.3 Simulating the Law of Large Numbers in Python

2.3.1 Example: Rolling a Fair Die

We will simulate rolling a fair six-sided die. The expected value is:

\[ E(X) = \frac{1+2+3+4+5+6}{6} = 3.5 \]
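
The same value can be checked numerically, together with the variance of a single roll, which determines how quickly the average settles. This is a small sketch using the standard definitions of mean and variance for a discrete distribution; the variable names are illustrative.

```python
import numpy as np

faces = np.arange(1, 7)        # outcomes 1, 2, ..., 6
probs = np.full(6, 1 / 6)      # a fair die gives each face probability 1/6

expected_value = np.sum(faces * probs)
variance = np.sum((faces - expected_value) ** 2 * probs)
std_dev = np.sqrt(variance)

print(f"E(X) = {expected_value:.4f}, Var(X) = {variance:.4f}, SD(X) = {std_dev:.4f}")
```

The variance equals \(35/12 \approx 2.917\), so a single roll has a standard deviation of about 1.708. This is a useful reference point when inspecting the simulated rolls.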

The simulation will show how the average of dice rolls converges to this expected value as the number of rolls increases.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# Set seed for reproducibility
np.random.seed(123)

# Simulate rolling a fair die
n = 10000
dice_rolls = np.random.randint(1, 7, size=n)
cumulative_avg = np.cumsum(dice_rolls) / np.arange(1, n+1)

We then summarise and visualize the variables.

myllm_df = pd.DataFrame({"rolls": dice_rolls, "avg": cumulative_avg})
myllm_df.describe()

              rolls           avg
count  10000.000000  10000.000000
mean       3.486400      3.480132
std        1.700031      0.043055
min        1.000000      2.687500
25%        2.000000      3.473821
50%        3.000000      3.482198
75%        5.000000      3.488255
max        6.000000      6.000000

sns.pairplot(myllm_df, palette = "mako", kind = "kde", corner = True)
plt.show()

Pairs Plot of Rolls and Average Outcomes

Next, we visualize the simulation of the LLN.

# Plot using Matplotlib
plt.figure(figsize=(10, 6))
plt.plot(cumulative_avg, color='blue', label='Cumulative Average')
plt.axhline(y=3.5, color='red', linestyle='--', label='Expected Value (3.5)')
plt.title('Law of Large Numbers: Convergence of Sample Mean')
plt.xlabel('Number of Rolls')
plt.ylabel('Cumulative Average')
plt.legend()
plt.show()

2.3.2 Interpretation

In the plot above, as the number of dice rolls increases, the sample mean (blue line) converges toward the expected value of 3.5 (red dashed line). This demonstrates the Law of Large Numbers.
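
The convergence can also be read off numerically. The sketch below re-simulates the rolls with NumPy's `default_rng` generator, so the individual numbers will differ from the run above even though the seed value is the same, and compares the observed deviation from 3.5 with the typical scale \(\sigma/\sqrt{n}\), where \(\sigma = \sqrt{35/12}\) is the standard deviation of one roll.

```python
import numpy as np

rng = np.random.default_rng(123)
n = 10_000
rolls = rng.integers(1, 7, size=n)                  # upper bound is exclusive
running_mean = np.cumsum(rolls) / np.arange(1, n + 1)

sigma = np.sqrt(35 / 12)                            # SD of a single fair-die roll
checkpoints = [10, 100, 1_000, 10_000]
deviations = {k: abs(running_mean[k - 1] - 3.5) for k in checkpoints}

for k, dev in deviations.items():
    print(f"n = {k:>6,}: |mean - 3.5| = {dev:.4f}   sigma/sqrt(n) = {sigma / np.sqrt(k):.4f}")
```

The observed deviations should be of roughly the same order as \(\sigma/\sqrt{n}\), which shrinks by a factor of about 3.2 for every tenfold increase in the number of rolls.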

3 Part 2: The Central Limit Theorem

3.1 Meaning of the Central Limit Theorem

The Central Limit Theorem (CLT) states that the distribution of the sample mean of a sufficiently large number of i.i.d. random variables, regardless of the original distribution, will tend to follow a normal distribution, provided the original population has a finite variance (Black 2023).

3.1.1 Formula

Given a population with mean \(\mu\) and variance \(\sigma^2\), the distribution of the sample mean \(\bar{X}_n\) for large \(n\) is approximately normal:

\[ \bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]
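
Equivalently, the result can be stated for the standardized sample mean, which is the form typically used when computing z-scores. Assuming \(\sigma > 0\):

\[ Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty \]

Here \(\sigma / \sqrt{n}\) is the standard error of the mean, and \(\xrightarrow{d}\) denotes convergence in distribution. The statement \(\bar{X}_n \sim N(\mu, \sigma^2/n)\) should therefore be read as an approximation for large \(n\), not as an exact distribution.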

3.1.2 Conditions for CLT

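  • The observations must be independent and identically distributed.
  • The sample size should be large. A common rule of thumb is \(n \geq 30\), although strongly skewed populations may need more.
  • The population should have finite variance.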
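
The role of the sample size can be illustrated with a skewed population. The sketch below assumes an exponential population with mean 1 and measures the skewness of the sample means for a few sample sizes; a normal distribution has skewness 0. The population, the sample sizes, and the helper `skewness` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)    # arbitrary seed
n_repeats = 20_000                # number of sample means per sample size


def skewness(x):
    """Sample skewness: the mean of the cubed z-scores."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))


skew_by_n = {}
for n in (2, 30, 200):
    # Each row is one sample of size n from the exponential population
    means = rng.exponential(1.0, size=(n_repeats, n)).mean(axis=1)
    skew_by_n[n] = skewness(means)
    print(f"n = {n:>3}: skewness of sample means = {skew_by_n[n]:.3f}")
```

For this population the skewness of the sample mean is \(2/\sqrt{n}\) in theory, so it is still noticeable at \(n = 30\) and fades only gradually. This is why the \(n \geq 30\) guideline is a rule of thumb and not a guarantee.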

3.2 Applications of the Central Limit Theorem

CLT is crucial in many statistical methods, including:

  • Hypothesis Testing: CLT allows us to use normal approximations for the distribution of sample means.
  • Confidence Intervals: Sample means can be assumed to be normally distributed for large samples.
  • Quality Control: CLT is used in process control charts to monitor deviations in manufacturing processes.
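
As an illustration of the confidence interval use case, the following sketch builds normal-approximation 95% intervals for the mean and checks how often they cover the true value. It assumes an exponential population with mean 2 and samples of size 100; these settings and the use of the critical value 1.96 are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2024)   # arbitrary seed
true_mean = 2.0                     # hypothetical population mean
n, n_repeats = 100, 5_000
z = 1.96                            # approximate 97.5th percentile of N(0, 1)

# Each row is one sample; build one interval per row
samples = rng.exponential(true_mean, size=(n_repeats, n))
means = samples.mean(axis=1)
std_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)

lower = means - z * std_errors
upper = means + z * std_errors
coverage = float(np.mean((lower <= true_mean) & (true_mean <= upper)))

print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The coverage should be close to the nominal 95%, though typically slightly below it here because the population is skewed and the standard deviation is estimated from each sample.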

3.3 Simulating the Central Limit Theorem in Python

3.3.1 Example: Sampling from a Uniform Distribution

Let’s take 1,000 samples of size 50 from a uniform distribution (which is not normal) and demonstrate how the sample means approximate a normal distribution as per the CLT.

# Simulate sampling from a uniform distribution
n_samples = 1000
sample_size = 50
sample_means = [np.mean(np.random.uniform(0, 1, sample_size)) for _ in range(n_samples)]

We then summarise the variable.

clt_df = pd.DataFrame(sample_means)
clt_df.describe()

                 0
count  1000.000000
mean      0.501076
std       0.040134
min       0.366184
25%       0.475069
50%       0.501407
75%       0.527721
max       0.651768

Next, we plot the data demonstrating the CLT.

# Plot the distribution of sample means using Seaborn
plt.figure(figsize=(10, 6))
# stat='density' scales the bars so the y-axis matches the 'Density' label
sns.histplot(sample_means, kde=True, stat='density', color='lightblue', bins=30)
plt.title('Central Limit Theorem: Distribution of Sample Means')
plt.xlabel('Sample Means')
plt.ylabel('Density')
plt.show()

3.3.2 Interpretation

In the histogram above, the distribution of the sample means (light blue bars, with the kernel density estimate drawn as a smooth curve) is approximately bell-shaped and centered near 0.5, even though the original data came from a uniform distribution. This illustrates the Central Limit Theorem in action.
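
The visual impression can be backed by a comparison with the values the CLT predicts. For a uniform distribution on \([0, 1]\), \(\mu = 0.5\) and \(\sigma^2 = 1/12\), so with samples of size 50 the standard error is \(\sqrt{1/600} \approx 0.0408\), which is consistent with the standard deviation of 0.040134 in the summary table. The sketch below re-simulates the sample means with its own seed, so its numbers will not match the table exactly.

```python
import numpy as np

rng = np.random.default_rng(1)      # arbitrary seed
n_samples, sample_size = 1_000, 50

# Each row is one sample of size 50; take the mean of each row
means = rng.uniform(0, 1, size=(n_samples, sample_size)).mean(axis=1)

mu = 0.5                                           # mean of Uniform(0, 1)
theoretical_se = np.sqrt((1 / 12) / sample_size)   # sigma / sqrt(n)
empirical_se = means.std(ddof=1)

# Under normality, about 95% of sample means fall within 1.96 standard errors
within = float(np.mean(np.abs(means - mu) <= 1.96 * theoretical_se))

print(f"Theoretical SE: {theoretical_se:.4f}, empirical SD of means: {empirical_se:.4f}")
print(f"Share of means within 1.96 SE of 0.5: {within:.3f}")
```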

4 Conclusion

The Law of Large Numbers and the Central Limit Theorem are foundational to probability and statistics. The LLN ensures that, as the sample size increases, the sample mean converges to the population mean, while the CLT shows that the distribution of sample means approaches normality as the sample size grows. These principles underpin much of statistical inference and are essential for understanding how sample data behave (Grolemund and Wickham 2023; McKinney et al. 2011; Bressert 2012; Waskom 2021).

References

Black, Ken. 2023. Business Statistics: For Contemporary Decision Making. John Wiley & Sons.
Bressert, Eli. 2012. “SciPy and NumPy: An Overview for Developers.”
Grolemund, Garrett, and Hadley Wickham. 2023. R for Data Science, 2nd Edition. O’Reilly Media.
McKinney, Wes et al. 2011. “Pandas: A Foundational Python Library for Data Analysis and Statistics.” Python for High Performance and Scientific Computing 14 (9): 1–9.
Waskom, Michael L. 2021. “Seaborn: Statistical Data Visualization.” Journal of Open Source Software 6 (60): 3021.