0.

# Set the random seed so the results below are reproducible
set.seed(42)

1.

The Law of Large Numbers (LLN) states that as the number of independent and identically distributed (i.i.d.) random variables increases, their sample mean converges to the expected value, i.e. the population mean. In other words, if you repeatedly perform a random experiment and average the outcomes, that average gets closer and closer to the expected value as the number of trials grows. The law rests on two assumptions: each trial is independent, and all trials follow the same probability distribution.

For example, suppose you flip a fair coin, scoring 1 for heads and 0 for tails. The expected value is 0.5, since heads and tails are equally likely. By the LLN, the average over a large number of flips will approach 0.5, and the more flips you make, the closer it tends to be, as the sketch below shows.

The LLN is a fundamental result in probability theory and statistics, with applications in fields such as finance, insurance, economics, and scientific research. It provides the mathematical foundation for understanding the long-run behavior of random variables and for drawing statistical inferences from observed data.
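
To make the coin-flip example concrete, here is a minimal R sketch (an added illustration, not part of the assignment prompt; the flip count of 10,000 is an arbitrary choice) that simulates fair-coin flips and tracks the running mean, which should drift toward 0.5:

# Simulate 10,000 fair-coin flips (1 = heads, 0 = tails)
flips <- rbinom(10000, size = 1, prob = 0.5)
# Running mean after each flip
running_mean <- cumsum(flips) / seq_along(flips)
# The running mean should settle near the expected value 0.5
plot(running_mean, type = "l", xlab = "Number of flips", ylab = "Running mean")
abline(h = 0.5, col = "red", lty = 2)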

2.

The Central Limit Theorem (CLT) is a fundamental result in probability theory and statistics that describes the behavior of the sum or average of a large number of independent and identically distributed (i.i.d.) random variables. It states that, regardless of the shape of the original distribution, when the sample size is sufficiently large the distribution of the sample mean approaches a normal distribution. In simpler terms: if we take repeated samples from a population and compute the mean of each sample, the distribution of those sample means will resemble a bell-shaped curve, even if the population distribution itself is not normal.

The CLT has several important implications and uses in statistics. Here are a few key points:

- Approximation of the sampling distribution: the CLT lets us approximate the sampling distribution of the sample mean regardless of the underlying population distribution. This is significant because the normal distribution has many desirable properties and is mathematically well understood.
- Estimation and hypothesis testing: the CLT enables inferences about population parameters from sample statistics. For example, we can estimate the population mean and calculate confidence intervals, or perform hypothesis tests, using the sample mean (see the sketch after this list).
- Sample size determination: the CLT helps determine an appropriate sample size for statistical analyses. As the sample size increases, the sampling distribution becomes more nearly normal, allowing more accurate estimates and inferences.
- Application to other statistics: the CLT is not limited to the sample mean; it can also be applied to other statistics, such as the sample sum, sample proportion, or sample variance.
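
As a small sketch of the estimation point above (an added example; the sample size of 100 and the 95% level are arbitrary choices), here is a CLT-based confidence interval for the mean of skewed, non-normal data:

# Draw a skewed exponential sample; the true mean is 1/0.5 = 2
x <- rexp(100, rate = 0.5)
# CLT-based 95% confidence interval for the population mean
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * sd(x) / sqrt(length(x))
print(ci)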

3.

Similarities:

- Both the LLN and the CLT describe the behavior of random variables when the sample size is large.
- Both apply to independent and identically distributed (i.i.d.) random variables.
- Both are used to make inferences about population parameters based on sample statistics.
- Both are fundamental principles of statistical analysis and play a crucial role in hypothesis testing and estimation.

Differences:

- The LLN concerns the behavior of the sample mean itself: it states that the sample mean converges to the population mean as the sample size grows larger. The CLT concerns the distribution of the sample mean: it states that this distribution approximates a normal distribution as the sample size increases, regardless of the shape of the population distribution.
- The LLN is about convergence of a value (the sample mean to the population mean), while the CLT is about convergence of a distribution (the sample mean's distribution to a normal distribution).
- Both assume independent, identically distributed random variables, but they differ in their moment requirements: the LLN needs only a finite population mean, while the classical CLT additionally requires a finite population variance. Neither makes any other assumption about the shape of the population distribution (the sketch below shows what goes wrong when these moments do not exist).
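
To see why the moment conditions matter, here is a brief sketch (an added illustration, not part of the assignment prompt) contrasting the Cauchy distribution, which has no finite mean or variance, with the exponential distribution used later in this assignment; the Cauchy sample means never settle down:

# Sample means of Cauchy draws (no finite mean/variance) vs. exponential draws
cauchy_means <- replicate(1000, mean(rcauchy(100)))
exp_means <- replicate(1000, mean(rexp(100, rate = 0.5)))
# The spread of the Cauchy means stays erratic and large, while the
# exponential means cluster tightly, as the LLN and CLT predict
print(sd(cauchy_means))
print(sd(exp_means))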

4.

The Exponential distribution is a continuous probability distribution that models the time between events in a Poisson process. It is characterized by its rate parameter, often denoted lambda, which represents the average rate of event occurrence; its mean is 1/lambda. The Exponential distribution is commonly used to model the lifespan or time to failure of systems or events that exhibit a constant hazard rate.

# Draw 100 random values from an Exponential distribution with rate 0.5
random_numbers <- rexp(100, rate = 0.5)
print(head(random_numbers, 5))
## [1] 0.3966736 1.3217905 0.5669821 0.0763838 0.9463533
# Density (PDF) evaluated at x = 1
pdf_value <- dexp(1, rate = 0.5)
print(pdf_value)
## [1] 0.3032653
# Cumulative probability (CDF) at x = 2, i.e. P(X <= 2)
cdf_value <- pexp(2, rate = 0.5)
print(cdf_value)
## [1] 0.6321206
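
For completeness, the fourth member of R's d/p/q/r family for this distribution is the quantile function qexp, which inverts the CDF (this line is an added example; qexp is used again in 5B below):

# Quantile (inverse CDF): recovers x from the CDF value computed above,
# so this should return approximately 2
q_value <- qexp(0.6321206, rate = 0.5)
print(q_value)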

5A.

# Set the parameters for the exponential distribution
rate <- 0.5
sample_size <- 100
num_samples <- 1000

# Create an empty vector to store the sample means
sample_means <- numeric(num_samples)

# Generate multiple samples and calculate their means
for (i in 1:num_samples) {
  sample <- rexp(sample_size, rate = rate)
  sample_means[i] <- mean(sample)
}

# Plot a histogram of the sample means
hist(sample_means, breaks = 30, freq = FALSE, main = "Sample Means Distribution",
     xlab = "Sample Mean", ylab = "Density")

# Add a theoretical normal distribution curve
mu <- 1/rate  # Mean of the exponential distribution
sigma <- sqrt(1/(rate^2 * sample_size))  # Standard deviation of the sample mean
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "red", lwd = 2)

5B.

# Set the parameters for the exponential distribution
rate <- 0.5
sample_size <- 100
num_samples <- 1000

# Create an empty vector to store the sample 25th percentiles
sample_percentiles <- numeric(num_samples)

# Generate multiple samples and calculate their 25th percentiles
for (i in 1:num_samples) {
  sample <- rexp(sample_size, rate = rate)
  sample_percentiles[i] <- quantile(sample, 0.25)
}

# Plot a histogram of the sample 25th percentiles
hist(sample_percentiles, breaks = 30, freq = FALSE, main = "Sample 25th Percentiles Distribution",
     xlab = "Sample 25th Percentile", ylab = "Density")

# Add a theoretical normal distribution curve
mu <- qexp(0.25, rate)  # 25th percentile of the exponential distribution
# The asymptotic standard error of a sample p-quantile is
# sqrt(p * (1 - p) / n) / f(x_p), where f is the population density
# evaluated at the true quantile x_p
sigma <- sqrt(0.25 * 0.75 / sample_size) / dexp(mu, rate)  # SE of the sample 25th percentile
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "red", lwd = 2)

# Set the parameters for the exponential distribution
rate <- 0.5
sample_size <- 100
num_samples <- 1000

# Create an empty vector to store the sample 80th percentiles
sample_percentiles <- numeric(num_samples)

# Generate multiple samples and calculate their 80th percentiles
for (i in 1:num_samples) {
  sample <- rexp(sample_size, rate = rate)
  sample_percentiles[i] <- quantile(sample, 0.8)
}

# Plot a histogram of the sample 80th percentiles
hist(sample_percentiles, breaks = 30, freq = FALSE, main = "Sample 80th Percentiles Distribution",
     xlab = "Sample 80th Percentile", ylab = "Density")

# Add a theoretical normal distribution curve
mu <- qexp(0.8, rate)  # 80th percentile of the exponential distribution
# Same asymptotic formula as above: sqrt(p * (1 - p) / n) / f(x_p)
sigma <- sqrt(0.8 * 0.2 / sample_size) / dexp(mu, rate)  # SE of the sample 80th percentile
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "red", lwd = 2)

Yes, the Central Limit Theorem (CLT) holds as expected. Three key points illustrate its validity and implications:

- Convergence to a normal distribution: the CLT states that the sample mean (and other suitably standardized sample statistics) of i.i.d. random variables converges to a normal distribution as the sample size increases, regardless of the shape of the original distribution. Even though the individual exponential draws are strongly skewed, their sample means and sample percentiles follow near-normal distributions, as demonstrated by the histograms and the overlaid theoretical normal curves in the code above.
- Robustness to distributional assumptions: the CLT holds even when the original distribution is not normal, as long as certain conditions are met (such as independence and finite variance). This makes the CLT widely applicable across fields, since it allows us to make probabilistic inferences and construct confidence intervals for the population mean or other sample statistics without knowing the specific distribution of the data.
- Sample size importance: the CLT highlights the role of sample size in achieving convergence. As the sample size increases, the sampling distribution of the sample mean becomes increasingly close to a normal distribution, with a smaller spread. Larger samples therefore provide more reliable estimates of the population mean or other statistics; the CLT quantifies the relationship between sample size and the accuracy of statistical inference, as the sketch below illustrates.
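
As a quick numerical check of the sample-size point (an added sketch reusing the Exp(rate = 0.5) setup from above; the values of n are arbitrary choices), the spread of the sample means should shrink like 1/sqrt(n):

# Empirical vs. theoretical sd of the sample mean for growing n;
# theory: sd = (1 / rate) / sqrt(n) = 2 / sqrt(n)
for (n in c(10, 100, 1000)) {
  means <- replicate(1000, mean(rexp(n, rate = 0.5)))
  cat("n =", n, "| empirical sd =", round(sd(means), 4),
      "| theoretical sd =", round(2 / sqrt(n), 4), "\n")
}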