Objective

Explore the properties of the sample average using a simulated dataset from a non-normal distribution.

Instructions

Generate a simulated dataset: Draw 100 observations from a non-normal distribution (e.g., with the rgamma or rpois function).

set.seed(123)
simulated_data <- rgamma(100, shape = 2, rate = 1)

Calculate the sample average: Compute the mean of the simulated dataset in R.

sample_mean <- mean(simulated_data)
cat("Sample Mean:", sample_mean, "\n")

## Sample Mean: 1.7215

Evaluate whether the sample mean is a good approximation of the true mean (here shape / rate = 2) and discuss its consistency.
- Consistency of the Sample Mean:
  - Check whether the sample mean moves closer to the true mean as the sample size increases.
  - Perform hypothesis tests or construct confidence intervals using samples of different sizes, and compare how tight they are.
- Law of Large Numbers:
  - Discuss the Law of Large Numbers: for independent draws from a distribution with a finite mean, the sample mean converges to the population mean as the sample size grows.
  - The law does not require normality. Even when the data come from a skewed distribution such as the gamma, larger samples give a sample mean that is, on average, closer to the population mean.

# Consistency check using samples of different sizes
sample_sizes <- c(10, 100, 1000)
for (size in sample_sizes) {
  sample_data <- rgamma(size, shape = 2, rate = 1)
  sample_mean <- mean(sample_data)
  
  cat("Sample Size:", size, "\t Sample Mean:", sample_mean, "\n")
}

## Sample Size: 10   Sample Mean: 2.187421 
## Sample Size: 100      Sample Mean: 1.905998 
## Sample Size: 1000     Sample Mean: 1.937552
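
The confidence-interval suggestion above can be sketched as follows. This is a minimal illustration, not part of the original lab: the names true_mean, ci_sizes and ci_table are hypothetical, the extra sample size of 10000 is an added assumption, and the t-based interval from t.test is only approximate for skewed data at small sample sizes.

```r
# Sketch: 95% t-based confidence intervals for the mean at several sample sizes.
# Assumption: data are Gamma(shape = 2, rate = 1), so the true mean is shape / rate = 2.
set.seed(123)
true_mean <- 2
ci_sizes <- c(10, 100, 1000, 10000)  # hypothetical choice of sizes

# One row per sample size: estimate, interval bounds, and interval width
ci_table <- t(sapply(ci_sizes, function(n) {
  x <- rgamma(n, shape = 2, rate = 1)
  ci <- t.test(x)$conf.int
  c(n = n, mean = mean(x), lower = ci[1], upper = ci[2], width = ci[2] - ci[1])
}))
print(ci_table)

# Does each interval cover the true mean?
covers <- ci_table[, "lower"] <= true_mean & true_mean <= ci_table[, "upper"]
print(covers)
```

The interval width should shrink roughly in proportion to 1 / sqrt(n), which is one practical way to see consistency at work.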

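How can we estimate the variance of the sample average? Discuss the concept of the standard error and how to calculate it: for n independent observations with sample standard deviation s, the standard error of the mean is s / sqrt(n), and the estimated variance of the sample mean is its square, s^2 / n.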

standard_error <- sd(simulated_data) / sqrt(length(simulated_data))
cat("Standard Error of the Mean:", standard_error, "\n")

## Standard Error of the Mean: 0.1116929

# Variance of the sample average is the square of the standard error
variance_of_sample_mean <- standard_error^2
cat("Variance of the Sample Mean:", variance_of_sample_mean, "\n")

## Variance of the Sample Mean: 0.01247531
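
As a cross-check, the estimate from a single sample can be compared with the theoretical value. A gamma distribution with shape 2 and rate 1 has variance shape / rate^2 = 2, so the variance of the mean of 100 observations is 2 / 100 = 0.02. The sketch below is an illustrative addition: the variable names and the choice of 5000 replications are assumptions, not part of the original lab.

```r
# Sketch: theoretical vs. simulated variance of the sample mean.
# Assumption: Gamma(shape = 2, rate = 1), whose variance is shape / rate^2.
shape <- 2
rate <- 1
n <- 100

theoretical_variance <- (shape / rate^2) / n   # sigma^2 / n = 0.02
theoretical_se <- sqrt(theoretical_variance)

# Empirical check: variance across many independent sample means
set.seed(123)
n_reps <- 5000  # hypothetical number of replications
means <- replicate(n_reps, mean(rgamma(n, shape = shape, rate = rate)))
empirical_variance <- var(means)

cat("Theoretical variance of the mean:", theoretical_variance, "\n")
cat("Theoretical standard error:", theoretical_se, "\n")
cat("Simulated variance of the mean:", empirical_variance, "\n")
```

A standard error computed from one sample of 100 is itself an estimate, so it can differ noticeably from the theoretical value.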

Explore the sampling distribution of the mean:
- Repeat the sampling process multiple times.
- Create a histogram of the sample means.

num_samples <- 1000
sample_means <- replicate(num_samples, mean(rgamma(100, shape = 2, rate = 1)))

# Create a histogram of the sample means
hist(sample_means, main = "Sampling Distribution of the Mean", xlab = "Sample Mean", col = "skyblue", border = "black")
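# Mark the true mean; the line-width argument is lwd (lw is not a graphical parameter for abline)
abline(v = 2, col = "red", lty = 2, lwd = 2)
legend("topright", legend = c("True Mean=2"), col = c("red"), lty = c(2), lwd = c(2))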

Discuss the shape and properties of the sampling distribution.
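
One way to support that discussion is to compare the skewness of the raw gamma draws with the skewness of the sample means, and to inspect a normal Q-Q plot. This is a rough sketch under the same Gamma(shape = 2, rate = 1) assumption: the skewness helper is a hand-written moment estimator, and the names raw_draws and sample_means_sketch are hypothetical.

```r
# Sketch: how close is the sampling distribution of the mean to a normal shape?
set.seed(123)

# Simple moment-based skewness estimator (hypothetical helper, not from a package)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

raw_draws <- rgamma(100000, shape = 2, rate = 1)
sample_means_sketch <- replicate(1000, mean(rgamma(100, shape = 2, rate = 1)))

cat("Skewness of raw gamma draws:", skewness(raw_draws), "\n")
cat("Skewness of sample means:   ", skewness(sample_means_sketch), "\n")
cat("SD of sample means:         ", sd(sample_means_sketch), "\n")

# Points close to the reference line suggest an approximately normal shape
qqnorm(sample_means_sketch, main = "Normal Q-Q Plot of Sample Means")
qqline(sample_means_sketch, col = "red")
```

The raw draws are clearly right-skewed, while the sample means should be far more symmetric and centered near 2, which is what the Central Limit Theorem predicts for means of 100 observations.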

Lab 2: Exercise

Objective

Instructions