set.seed(seed = 178)

1. Please Google and describe the Law of Large Numbers in your own words.

This law is based on the idea that random events tend to average out over time. When we take a small sample from a large population, there might be some variability, and the sample average might not perfectly reflect the population average. However, as we increase the sample size, the influence of individual random variations diminishes. With a large enough sample, the average of the sample will converge towards the true average of the entire population.

In practical terms, the Law of Large Numbers is why we can make reliable predictions and decisions based on data. It provides a mathematical foundation for understanding how random processes behave when observed over a large scale, ensuring that our conclusions are more likely to be accurate when we have a substantial amount of data to analyze.
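As a minimal sketch of this idea, the running mean of simulated die rolls can be tracked as the number of rolls grows. The fair-die example and all variable names are illustrative assumptions, not part of the assignment.

```r
set.seed(178)  # same seed value as at the top of this document

# Hypothetical example: rolls of a fair six-sided die (true mean = 3.5)
n_rolls <- 10000
rolls   <- sample(1:6, size = n_rolls, replace = TRUE)

# Running mean after 1, 2, ..., n_rolls rolls
running_mean <- cumsum(rolls) / seq_len(n_rolls)

# Distance from the true mean at a few checkpoints
checkpoints <- c(10, 100, 1000, 10000)
abs_error   <- abs(running_mean[checkpoints] - 3.5)
print(data.frame(n = checkpoints,
                 running_mean = running_mean[checkpoints],
                 abs_error = abs_error))

# Running mean on a log-scaled x axis, with the true mean as a dashed line
plot(running_mean, type = "l", log = "x",
     xlab = "Number of rolls", ylab = "Running mean")
abline(h = 3.5, col = "blue", lty = 2)
```

The early part of the curve typically wanders, while the later part settles close to 3.5, which is the behavior the law describes.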

  1. Please explain CLT in your own words.

The Central Limit Theorem (CLT) is a fundamental concept in statistics that describes the distribution of sample means. According to the CLT, regardless of the shape of the original population distribution, the distribution of the sample means will tend to be approximately normal (bell-shaped) if the sample size is sufficiently large. This is a powerful idea with wide-ranging applications in statistics and data analysis.

To break it down further, let’s say we have a population whose distribution has any shape (it could be uniform, skewed, or multimodal), as long as its variance is finite. If we repeatedly draw independent samples of the same size from this population and calculate the mean of each sample, the distribution of those sample means will be approximately normal if our sample size is large enough. This sampling distribution is centered on the population mean, and its standard deviation (the standard error) equals the population standard deviation divided by the square root of the sample size, regardless of the shape of the original population.

The CLT is incredibly useful because it allows statisticians and researchers to make inferences about a population based on the distribution of sample means. For example, it forms the basis for many hypothesis tests and confidence intervals. When we know that the distribution of sample means is approximately normal, we can use the properties of the normal distribution to make statistical predictions and draw conclusions about the population mean.

One practical implication of the CLT is that it provides a basis for justifying the use of techniques like Z-tests and t-tests, which are commonly used in hypothesis testing. These tests rely on the assumption that the sampling distribution of the sample mean is approximately normal, allowing researchers to make reasonably accurate inferences about population parameters even when they don’t know the shape of the population distribution.
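Stated a little more formally, in a standard textbook form: assume independent, identically distributed observations $X_1, \dots, X_n$ with finite mean $\mu$ and finite variance $\sigma^2$. This notation is introduced here for illustration and does not come from the assignment.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\sqrt{n}\,\left(\bar{X}_n - \mu\right)}{\sigma}
\;\xrightarrow{d}\; \mathcal{N}(0, 1)
\quad \text{as } n \to \infty .
```

Equivalently, for large $n$ the sample mean is approximately $\mathcal{N}(\mu, \sigma^2/n)$, so its standard deviation shrinks like $\sigma/\sqrt{n}$.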

I found this YouTube video very helpful in understanding the concept.

  1. What are the similarities and differences between LLN and CLT? Write a few lines.

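Similarities:

Both describe how the sample mean behaves as the sample size grows, both assume independent draws from the same population, and both justify using samples to learn about a population.

Differences:

The LLN says where the sample mean ends up: it converges to the population mean. The CLT describes the shape and spread of the sample mean's distribution around that value: approximately normal, with standard deviation equal to the population standard deviation divided by the square root of the sample size. The LLN needs only a finite mean, while the classical CLT also requires a finite variance.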

  1. Pick up any distribution apart from normal, uniform or poisson.

One interesting probability distribution apart from the normal, uniform, or Poisson distribution is the Exponential Distribution. This distribution is often used to model the time between events in a Poisson process, where events occur continuously and independently at a constant average rate. In the exponential distribution, the probability density function is skewed to the right, meaning that shorter intervals between events are more likely than longer intervals. The distribution is defined by a single parameter, often denoted as λ (lambda), which represents the rate at which events occur. Its mean is 1/λ, its variance is 1/λ², and its median is ln(2)/λ. A smaller λ corresponds to a slower event rate and longer average waiting times, while a larger λ corresponds to a faster event rate. The exponential distribution is widely applied in various fields, including reliability engineering, queuing theory, and survival analysis.
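The properties described above can be checked with a quick simulation. This is only a sketch: the rate 0.5 matches the value used in this document's simulation code, while the draw count of 100000 and the variable names are arbitrary choices made here.

```r
set.seed(178)

lambda <- 0.5                      # rate parameter (assumed, matches the simulation code)
x <- rexp(100000, rate = lambda)   # large sample, so estimates sit close to theory

# Theoretical values for an exponential distribution with rate lambda
theory <- c(mean     = 1 / lambda,
            variance = 1 / lambda^2,
            median   = log(2) / lambda)

# Empirical counterparts from the simulated draws
empirical <- c(mean     = mean(x),
               variance = var(x),
               median   = median(x))

print(round(rbind(theory, empirical), 3))
```

The mean exceeding the median is the numerical signature of the right skew.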

  1. A. Apply the CLT on the sample mean of this chosen distribution in R.
rm(list = ls()) # Clear environment
# Set the parameters for the exponential distribution
lambda <- 0.5  

# Number of samples and sample size
num_samples <- 1000  # Number of samples
sample_size <- 30    # Size of each sample

# Create an empty vector to store sample means
sample_means <- numeric(num_samples)

# Generate random samples and calculate sample means
# (the draws are stored in `draws` to avoid masking base R's sample() function)
for (i in 1:num_samples) {
  draws <- rexp(sample_size, rate = lambda)  # Generate an exponential sample
  sample_means[i] <- mean(draws)             # Calculate the sample mean
}
# Plot the distribution of sample means
hist(
  sample_means, 
  breaks = 30, 
  prob   = TRUE, 
  main   = "Sample Means Distribution"
  )

# Add the normal curve predicted by the CLT:
# mean 1/lambda, sd (1/lambda)/sqrt(sample_size)
curve(
  dnorm(
    x, 
    mean = 1/lambda, 
    sd   = sqrt(1/(lambda^2*sample_size))
    ), 
  add  = TRUE, 
  col  = "blue"
  )
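As a numeric complement to the histogram, the simulated sample means can be compared with the mean and standard deviation that the CLT predicts. This is a self-contained sketch: it re-runs the simulation with `replicate()` in place of the loop, and the names `clt_mean`, `clt_sd`, and `comparison` are introduced here for illustration.

```r
set.seed(178)

lambda      <- 0.5
num_samples <- 1000
sample_size <- 30

# Each replicate draws one exponential sample and returns its mean
sample_means <- replicate(num_samples, mean(rexp(sample_size, rate = lambda)))

# CLT prediction for the sample mean: N(1/lambda, 1/(lambda^2 * n))
clt_mean <- 1 / lambda
clt_sd   <- 1 / (lambda * sqrt(sample_size))

comparison <- data.frame(
  quantity  = c("mean", "sd"),
  simulated = c(mean(sample_means), sd(sample_means)),
  clt       = c(clt_mean, clt_sd)
)
print(comparison)

# Q-Q plot: points close to the line suggest approximate normality
qqnorm(sample_means)
qqline(sample_means, col = "blue")
```

With a right-skewed population, the upper tail of the Q-Q plot may bend slightly away from the line at this sample size.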

# Set the parameters for the exponential distribution
lambda <- 0.5  

# Number of samples and sample size
num_samples <- 1000  # Number of samples
sample_size <- 30    # Size of each sample

# Create an empty vector to store sample medians
sample_medians <- numeric(num_samples)

# Generate random samples and calculate sample medians
# (the draws are stored in `draws` to avoid masking base R's sample() function)
for (i in 1:num_samples) {
  draws <- rexp(sample_size, rate = lambda)  # Generate an exponential sample
  sample_medians[i] <- median(draws)         # Calculate the sample median
}
# Plot the distribution of sample medians
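hist(
  sample_medians,
  breaks = 30,
  prob   = TRUE,
  main   = "Sample Medians Distribution"
  )

# Add a normal curve for comparison, using the asymptotic distribution of the
# sample median: centered on the population median log(2)/lambda, with
# sd 1/(2 * f(m) * sqrt(n)) = 1/(lambda * sqrt(n)), since f(m) = lambda/2.
# (The factor 1.2533 applies only when the population itself is normal.)
curve(
  dnorm(
    x,
    mean = log(2)/lambda,
    sd   = 1/(lambda*sqrt(sample_size))
    ),
  add  = TRUE,
  col  = "blue"
  )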
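For the sample median, the relevant large-sample result is the asymptotic normality of sample quantiles, not the CLT for means. The sketch below compares simulated medians against that approximation. It is self-contained and re-runs the simulation, and the names `pop_median`, `density_m`, and `asym_sd` are introduced here for illustration.

```r
set.seed(178)

lambda      <- 0.5
num_samples <- 1000
sample_size <- 30

# Each replicate draws one exponential sample and returns its median
sample_medians <- replicate(num_samples, median(rexp(sample_size, rate = lambda)))

# Asymptotic theory for the sample median of an exponential population:
#   center: population median m = log(2) / lambda
#   sd:     1 / (2 * f(m) * sqrt(n)), where f is the population density
pop_median <- log(2) / lambda
density_m  <- dexp(pop_median, rate = lambda)   # equals lambda / 2
asym_sd    <- 1 / (2 * density_m * sqrt(sample_size))

print(c(simulated_center = mean(sample_medians), theory_center = pop_median))
print(c(simulated_sd = sd(sample_medians), asymptotic_sd = asym_sd))
```

Small gaps between the simulated and asymptotic values are expected at a sample size of 30, because the approximation only becomes exact as the sample size grows.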

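  1. Sample Size: With a sample size of 30, the histogram of sample means is already close to the normal curve, which is consistent with the common rule of thumb for a “sufficiently large” sample. Because the exponential distribution is strongly right-skewed, a slight right skew can remain at this sample size, and it fades as the sample size grows.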

  2. Population Distribution: The exponential distribution used in the code has a finite mean (1/λ = 2) and a finite variance (1/λ² = 4), so the conditions of the CLT are satisfied. Any departure from the normal curve at this sample size comes from the skewness of the population, not from a failure of the theorem.

  3. Representation of Results: As the histograms show, the sample means cluster around the population mean 1/λ = 2 and are approximately normal, as the CLT predicts. The sample medians are also approximately normal, but this follows from a separate large-sample result for sample quantiles, not from the classical CLT, and they center on the population median ln(2)/λ ≈ 1.39, not on the mean. In both cases the approximation improves as the sample size increases.
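The effect of sample size on normality can be sketched by measuring the skewness of the simulated sample means for several sample sizes. The sample sizes 5, 30, and 200, the replicate count of 5000, and the small `skewness()` helper are all choices made here for illustration. The theoretical column uses the fact that the mean of n exponential draws follows a gamma distribution with shape n, whose skewness is 2/√n.

```r
set.seed(178)

lambda      <- 0.5
num_samples <- 5000

# Simple moment-based sample skewness (helper defined here for illustration)
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

sample_sizes <- c(5, 30, 200)

# For each sample size, simulate sample means and measure their skewness
sim_skew <- sapply(sample_sizes, function(n) {
  means <- replicate(num_samples, mean(rexp(n, rate = lambda)))
  skewness(means)
})

# Skewness of the mean of n exponential draws is 2 / sqrt(n)
print(data.frame(n         = sample_sizes,
                 simulated = sim_skew,
                 theory    = 2 / sqrt(sample_sizes)))
```

A skewness close to zero indicates a nearly symmetric, normal-looking distribution, so the shrinking values illustrate why larger samples give a better normal approximation.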