Dis_5

1) Describe the Law of Large Numbers.

The Law of Large Numbers (LLN) is a fundamental result in probability theory and statistics. It states that as the number of independent, identically distributed (i.i.d.) random variables with a finite expected value grows, their sample average converges to that expected value. In simpler terms, the more observations or trials we collect, the closer the average outcome tends to be to the true expected value.
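As a small illustration of this statement, the sketch below tracks the running average of simulated fair die rolls, whose expected value is 3.5. The die example, the seed, and the names `rolls`, `running_mean`, and `lln_table` are assumptions chosen for illustration only.

```r
# Minimal LLN sketch (illustrative assumption: rolls of a fair six-sided die)
set.seed(1)
n_max <- 100000
rolls <- sample(1:6, n_max, replace = TRUE)

# Average of the first n rolls, for every n from 1 to n_max
running_mean <- cumsum(rolls) / seq_len(n_max)

# Compare the running average with the expected value 3.5 at a few sample sizes
checkpoints <- c(10, 100, 1000, 10000, 100000)
lln_table <- data.frame(
  n = checkpoints,
  running_mean = running_mean[checkpoints],
  abs_error = abs(running_mean[checkpoints] - 3.5)
)
print(lln_table)
```

The error need not shrink at every single step, but for large n the running average should sit very close to 3.5.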

2) Please explain the CLT in your own words:

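The central limit theorem (CLT) is a statement about the sampling distribution of the sample mean. For i.i.d. samples of size n drawn from a population with finite mean and variance, two important facts hold for that distribution:

a. Mean of the sample means = Mean of the population

b. Standard deviation of the sample means (the standard error) = Standard deviation of the population / sqrt(n)

To put it another way, as the sample size increases, the distribution of the sample mean approaches a normal distribution regardless of the shape of the population distribution, its variance shrinks, and the sample mean concentrates around the population mean. The theorem does not turn the underlying data into normal data. It only says that averages of many observations are approximately normally distributed, which is what lets us use normal-based inference for the means of non-normal populations.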
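In symbols, the statement can be sketched as follows. The notation is an assumption introduced here, not part of the original answer: X_1, ..., X_n are i.i.d. with mean μ and finite variance σ², and X̄_n is their sample mean.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
E[\bar{X}_n] = \mu, \qquad
\mathrm{SD}(\bar{X}_n) = \frac{\sigma}{\sqrt{n}}, \qquad
\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1)
\quad \text{as } n \to \infty
```

The first two identities hold for every n; only the last one, the convergence in distribution, is the limit statement.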

What are the similarities and differences between LLN and CLT?

Both the Law of Large Numbers and the Central Limit Theorem describe how the sample mean of i.i.d. observations behaves as the sample size grows, and both underpin statistical inference. They differ in what they describe. The LLN says where the sample mean ends up: it converges to the expected value. The CLT says how the sample mean fluctuates around that value: the distribution of the standardized sum or average converges to a normal distribution, with a spread of order 1/sqrt(n). The LLN needs only a finite mean, while the classical CLT also requires a finite variance.
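The contrast can be made concrete with a small simulation. The sketch below is only an illustration under assumed settings (an exponential population with rate 0.2, so the population mean and standard deviation are both 5; the names `sizes`, `reps`, and `comparison` are hypothetical).

```r
# Illustrative sketch: LLN vs CLT on means of Exponential(rate = 0.2) samples
set.seed(42)
rate <- 0.2                 # assumed rate; population mean and sd are both 1 / rate = 5
sizes <- c(10, 100, 1000)   # sample sizes to compare
reps <- 2000                # number of simulated samples per size

# Standard deviation of the simulated sample means at each sample size
sd_of_means <- sapply(sizes, function(n) sd(replicate(reps, mean(rexp(n, rate)))))

comparison <- data.frame(
  n = sizes,
  sd_of_mean = sd_of_means,               # shrinks with n: the mean concentrates (LLN)
  scaled_sd = sd_of_means * sqrt(sizes)   # stays near 1 / rate: the sqrt(n) scale of the CLT
)
print(comparison)
```

The unscaled spread shrinks toward zero, which is the LLN view, while the spread multiplied by sqrt(n) stays roughly constant, which is the scale on which the CLT describes a normal shape.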

Pick any distribution

The exponential distribution is a continuous probability distribution used to model the time between events in a Poisson process. It is characterized by a single parameter, λ (lambda), which represents the average rate of events occurring per unit of time.

The probability density function (PDF) of the exponential distribution is given by f(x) = λ * e^(-λx), where e is the base of the natural logarithm and x represents a non-negative value.

The exponential distribution is commonly used in fields such as reliability engineering, queuing theory, and survival analysis. It is particularly useful for situations where events occur randomly and independently over time. Its memorylessness property means that the probability of waiting at least a further t units of time for the next event does not depend on how much time has already passed.

The exponential distribution exhibits a right-skewed shape, with a decreasing probability density as x increases. The mean of the distribution is equal to 1/λ, while the standard deviation is also equal to 1/λ.
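These properties can be checked numerically. The sketch below assumes rate = 0.2 and arbitrary time points `s0` and `t0`; all names are illustrative. It compares the simulated mean and standard deviation with 1/λ and verifies memorylessness using the exact survival function from `pexp`.

```r
# Illustrative check of the exponential distribution's mean, sd, and memorylessness
rate <- 0.2   # assumed rate, so 1 / rate = 5
set.seed(7)
x <- rexp(200000, rate)

sim_mean <- mean(x)   # should be close to 1 / rate
sim_sd <- sd(x)       # should also be close to 1 / rate

# Memorylessness: P(X > s0 + t0 | X > s0) equals P(X > t0)
s0 <- 3   # time already elapsed (arbitrary)
t0 <- 4   # additional waiting time (arbitrary)
cond_prob <- pexp(s0 + t0, rate, lower.tail = FALSE) / pexp(s0, rate, lower.tail = FALSE)
uncond_prob <- pexp(t0, rate, lower.tail = FALSE)

print(c(sim_mean = sim_mean, sim_sd = sim_sd,
        cond_prob = cond_prob, uncond_prob = uncond_prob))
```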

Then, apply the CLT on the sample mean of this chosen distribution

# Parameters for the Exponential distribution
rate <- 0.2

# Number of observations in each sample drawn from the Exponential distribution
sample_size <- 1000

# Number of times to repeat the sampling process
num_samples <- 1000

# Function to generate sample means
generate_sample_means <- function(sample_size, num_samples, rate) {
  sample_means <- replicate(num_samples, mean(rexp(sample_size, rate)))
  return(sample_means)
}

# Generate sample means
set.seed(123)  # for reproducibility
sample_means <- generate_sample_means(sample_size, num_samples, rate)

# Plot the histogram of sample means
hist(sample_means, main = "Distribution of Sample Means (CLT)",
     xlab = "Sample Mean", col = "skyblue", border = "black")
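One way to judge the histogram is to compare it with the normal distribution the CLT predicts, which has mean 1/rate and standard deviation (1/rate)/sqrt(sample_size). The sketch below repeats the simulation with the same settings so it runs on its own; `theoretical_mean` and `theoretical_se` are names introduced here for illustration.

```r
# Sketch: compare simulated sample means with the CLT's normal approximation
set.seed(123)
rate <- 0.2
sample_size <- 1000
num_samples <- 1000
sample_means <- replicate(num_samples, mean(rexp(sample_size, rate)))

# Values predicted by theory for the sampling distribution of the mean
theoretical_mean <- 1 / rate
theoretical_se <- (1 / rate) / sqrt(sample_size)

print(c(simulated_mean = mean(sample_means), theoretical_mean = theoretical_mean,
        simulated_sd = sd(sample_means), theoretical_se = theoretical_se))

# Density-scaled histogram with the predicted normal curve drawn on top
hist(sample_means, freq = FALSE, main = "Sample Means vs Normal Approximation",
     xlab = "Sample Mean", col = "skyblue", border = "black")
curve(dnorm(x, mean = theoretical_mean, sd = theoretical_se), add = TRUE, lwd = 2)
```

If the CLT approximation is good, the curve should follow the tops of the histogram bars closely.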

Apply the CLT on any other sample statistic like say the sample median, sample 25th percentile or even the sample 80th percentile.

# Function to generate sample medians
generate_sample_medians <- function(sample_size, num_samples, rate) {
  sample_medians <- replicate(num_samples, median(rexp(sample_size, rate)))
  return(sample_medians)
}

# Function to generate sample 25th percentiles
generate_sample_25th_percentiles <- function(sample_size, num_samples, rate) {
  sample_25th_percentiles <- replicate(num_samples, quantile(rexp(sample_size, rate), probs = 0.25))
  return(sample_25th_percentiles)
}

# Generate sample medians
set.seed(123)  # for reproducibility
sample_medians <- generate_sample_medians(sample_size, num_samples, rate)

# Generate sample 25th percentiles
set.seed(123)  # for reproducibility
sample_25th_percentiles <- generate_sample_25th_percentiles(sample_size, num_samples, rate)

# Plot the histogram of sample medians
hist(sample_medians, main = "Distribution of Sample Medians (CLT)",
     xlab = "Sample Median", col = "lightgreen", border = "black")

# Plot the histogram of sample 25th percentiles
hist(sample_25th_percentiles, main = "Distribution of Sample 25th Percentiles (CLT)",
     xlab = "Sample 25th Percentile", col = "lightblue", border = "black")
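For sample quantiles the relevant result is not the classical CLT but a related asymptotic normality result: the sample p-th quantile is approximately normal around the true quantile q_p with standard deviation sqrt(p(1 - p)/n) / f(q_p), where f is the population density. The sketch below checks this for the median and the 25th percentile; it is an illustration under the same settings as above, and the names `q_true`, `asym_sd`, and `sim` are assumptions.

```r
# Sketch: simulated spread of sample quantiles vs the asymptotic formula
set.seed(123)
rate <- 0.2
sample_size <- 1000
num_samples <- 1000
p <- c(0.25, 0.5)   # 25th percentile and median

# True quantiles and asymptotic standard deviations for the exponential distribution
q_true <- qexp(p, rate)
asym_sd <- sqrt(p * (1 - p) / sample_size) / dexp(q_true, rate)

# Each column holds the two sample quantiles of one simulated sample
sim <- replicate(num_samples,
                 quantile(rexp(sample_size, rate), probs = p, names = FALSE))

quantile_check <- data.frame(
  p = p,
  true_quantile = q_true,
  simulated_center = rowMeans(sim),
  asymptotic_sd = asym_sd,
  simulated_sd = apply(sim, 1, sd)
)
print(quantile_check)
```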

Does the central limit theorem hold as expected?

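Yes. With 1,000 observations per sample, the histograms of all three statistics (sample mean, sample median, and sample 25th percentile) should look roughly symmetric and bell-shaped, even though the underlying exponential distribution is strongly right-skewed. Each should be centered near its theoretical value for rate = 0.2: 1/λ = 5 for the mean, ln(2)/λ ≈ 3.47 for the median, and ln(4/3)/λ ≈ 1.44 for the 25th percentile.

That the statistics settle on these values reflects the law of large numbers; that their sampling distributions look normal is what the central limit theorem predicts for the mean. Strictly speaking, the CLT covers sums and averages, and the approximate normality of the sample median and other percentiles follows from a related asymptotic result for sample quantiles. The histograms will not be exactly normal, because the sample size is finite and only 1,000 replications were drawn, but the approximation improves as the sample size grows.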
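One way to see the trend toward normality is to repeat the simulation at several sample sizes and measure how skewed the sample means are. The sketch below is an illustration only: the sample sizes, the seed, and the helper `sample_skewness` are assumptions. For an exponential population the skewness of the sample mean is 2/sqrt(n) in theory, so it should fall toward 0 (the value for a normal distribution) as n grows.

```r
# Sketch: skewness of the sample mean at increasing sample sizes
set.seed(2024)
rate <- 0.2
sizes <- c(5, 30, 1000)   # illustrative sample sizes
reps <- 5000              # simulated samples per size

# Hypothetical helper: moment-based sample skewness
sample_skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_by_n <- sapply(sizes, function(n) {
  sample_skewness(replicate(reps, mean(rexp(n, rate))))
})

skew_table <- data.frame(
  n = sizes,
  simulated_skewness = skew_by_n,
  theoretical_skewness = 2 / sqrt(sizes)
)
print(skew_table)
```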