Objective

Explore the properties of the sample average using a simulated dataset from a non-normal distribution.

Instructions

  1. Generate a simulated dataset: Draw 100 observations from a non-normal distribution (e.g., with the rgamma or rpois function).
set.seed(123)
simulated_data <- rgamma(100, shape = 2, rate = 1)
  2. Calculate the sample average: Compute the sample average of the dataset in R.
sample_mean <- mean(simulated_data)
cat("Sample Mean:", sample_mean, "\n")
## Sample Mean: 1.7215
  3. Evaluate whether the sample mean is a good approximation of the true mean, and discuss its consistency. For a Gamma distribution with shape = 2 and rate = 1, the true mean is shape / rate = 2.

    • Consistency of the Sample Mean:
      • As the sample size increases, check whether the sample mean converges to the true mean.
      • Perform hypothesis tests or construct confidence intervals using samples of different sizes.
    • Law of Large Numbers:
      • Discuss the Law of Large Numbers, which states that for independent, identically distributed observations with a finite mean, the sample mean converges to the true mean of the population as the sample size increases.
      • The Law of Large Numbers does not require normality: even when the original data come from a non-normal distribution, the sample mean becomes a more accurate estimate of the population mean as the sample size grows.
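One way to see the Law of Large Numbers at work is to track the running mean of a single long sample. The following is a minimal sketch, assuming the same Gamma(shape = 2, rate = 1) distribution (true mean 2); the seed, the sample length of 10000, and the names `running_mean` and `checkpoints` are illustrative choices, not part of the original exercise.

```r
# Sketch: running mean of one long Gamma(shape = 2, rate = 1) sample.
set.seed(321)                               # arbitrary seed, for reproducibility only
true_mean <- 2 / 1                          # Gamma mean = shape / rate
x <- rgamma(10000, shape = 2, rate = 1)

# Mean of the first n observations, for every n from 1 to 10000
running_mean <- cumsum(x) / seq_along(x)

# Inspect the running mean and its absolute error at a few sample sizes
checkpoints <- c(10, 100, 1000, 10000)
print(data.frame(
  n = checkpoints,
  running_mean = running_mean[checkpoints],
  abs_error = abs(running_mean[checkpoints] - true_mean)
))
```

The absolute error need not shrink at every step, since a single path can wander, but it tends to get smaller as n grows.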
# Consistency check using samples of different sizes
sample_sizes <- c(10, 100, 1000)
for (size in sample_sizes) {
  sample_data <- rgamma(size, shape = 2, rate = 1)
  sample_mean <- mean(sample_data)
  
  cat("Sample Size:", size, "\t Sample Mean:", sample_mean, "\n")
}
## Sample Size: 10   Sample Mean: 2.187421 
## Sample Size: 100      Sample Mean: 1.905998 
## Sample Size: 1000     Sample Mean: 1.937552
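The suggestion above to construct confidence intervals at different sample sizes could be sketched as follows. This assumes a 95% t-interval from `t.test`, which relies on the sample mean being approximately normal and is therefore only a rough guide for n = 10 with skewed data. The seed and the name `ci_table` are illustrative.

```r
# Sketch: 95% confidence intervals for the mean at several sample sizes.
set.seed(456)                               # arbitrary seed, for reproducibility only
true_mean <- 2 / 1                          # Gamma mean = shape / rate
sample_sizes <- c(10, 100, 1000)

ci_table <- t(sapply(sample_sizes, function(n) {
  x <- rgamma(n, shape = 2, rate = 1)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  c(n = n, mean = mean(x), lower = ci[1], upper = ci[2], width = ci[2] - ci[1])
}))

print(ci_table)
```

The interval width should shrink roughly in proportion to 1 / sqrt(n), which is the practical meaning of the sample mean becoming more precise as the sample grows.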
  4. How can we find the variance of the sample average? Discuss the concept of standard error and how to calculate it.
standard_error <- sd(simulated_data) / sqrt(length(simulated_data))
cat("Standard Error of the Mean:", standard_error, "\n")
## Standard Error of the Mean: 0.1116929
# The estimated variance of the sample average is the square of the standard error (s^2 / n)
variance_of_sample_mean <- standard_error^2
cat("Variance of the Sample Mean:", variance_of_sample_mean, "\n")
## Variance of the Sample Mean: 0.01247531
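The standard error above is an estimate based on one sample. Because the data are simulated, it can be compared with the theoretical value: a Gamma distribution with shape k and rate r has variance k / r^2, so the variance of the mean of n independent draws is k / (r^2 n). A minimal sketch, with an arbitrary seed and illustrative variable names:

```r
# Sketch: theoretical vs. estimated standard error for Gamma(shape = 2, rate = 1), n = 100.
shape <- 2
rate <- 1
n <- 100

theoretical_var_mean <- (shape / rate^2) / n   # sigma^2 / n = 2 / 100 = 0.02
theoretical_se <- sqrt(theoretical_var_mean)   # sqrt(0.02), about 0.1414

set.seed(789)                                  # arbitrary seed, for reproducibility only
x <- rgamma(n, shape = shape, rate = rate)
estimated_se <- sd(x) / sqrt(n)                # plug-in estimate from one sample

cat("Theoretical SE:", theoretical_se, "\n")
cat("Estimated SE:  ", estimated_se, "\n")
```

The estimate varies from sample to sample because sd(x) is itself a random quantity; the theoretical value is fixed.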
  5. Explore the sampling distribution of the mean:
    • Repeat the sampling process multiple times.
    • Create a histogram of the sample means.
num_samples <- 1000
sample_means <- replicate(num_samples, mean(rgamma(100, shape = 2, rate = 1)))

# Create a histogram of the sample means
hist(sample_means, main = "Sampling Distribution of the Mean", xlab = "Sample Mean", col = "skyblue", border = "black")
# Mark the true mean; the line-width argument is lwd (lw is not a graphical parameter in abline)
abline(v = 2, col = "red", lty = 2, lwd = 2)
legend("topright", legend = c("True Mean=2"), col = c("red"), lty = c(2), lwd = c(2))

Discuss the shape and properties of the sampling distribution.
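As a starting point for that discussion, the center, spread, and skewness of the simulated sample means can be compared with their theoretical values. The sketch below regenerates the sample means so that it is self-contained. It assumes the same Gamma(shape = 2, rate = 1) setup, and uses the fact that the skewness of a Gamma distribution with shape k is 2 / sqrt(k) and that the skewness of a mean of n independent draws shrinks by a factor of sqrt(n). The seed and the simple moment-based skewness formula are illustrative choices.

```r
# Sketch: numeric summaries of the sampling distribution of the mean.
set.seed(2024)                                 # arbitrary seed, for reproducibility only
num_samples <- 1000
n <- 100
sample_means <- replicate(num_samples, mean(rgamma(n, shape = 2, rate = 1)))

center <- mean(sample_means)                   # compare with the true mean, 2
spread <- sd(sample_means)                     # compare with sqrt(2 / 100), about 0.1414
skew <- mean((sample_means - center)^3) / spread^3   # moment-based sample skewness

theoretical_skew <- (2 / sqrt(2)) / sqrt(n)    # about 0.1414, far below 1.414 for one draw

cat("Center:  ", center, "(theory: 2)\n")
cat("Spread:  ", spread, "(theory:", sqrt(2 / n), ")\n")
cat("Skewness:", skew, "(theory:", theoretical_skew, ")\n")
```

A spread close to the theoretical standard error and a skewness much smaller than that of the raw data are consistent with the sampling distribution being approximately normal, even though individual observations are clearly skewed.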