Overview

This report has two parts. The first part simulates the distribution of sample means drawn from an exponential distribution and uses it to demonstrate the Central Limit Theorem (CLT) in R. The second part performs an inferential analysis of the ToothGrowth dataset that ships with R.

Part 1: Simulation Exercise – Exponential Distribution & CLT

Simulation Setup

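lambda <- 0.2   # rate parameter of the exponential distribution
n <- 40         # observations per sample
sims <- 1000    # number of simulated samples
set.seed(123)   # fix the seed so the results are reproducible
means <- replicate(sims, mean(rexp(n, lambda)))  # one mean per simulated sample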

Sample Mean vs Theoretical Mean

sample_mean <- mean(means)
theoretical_mean <- 1 / lambda

hist(means, main="Distribution of Sample Means", xlab="Sample Means", col="lightblue", breaks=40)
abline(v=sample_mean, col="blue", lwd=2)
abline(v=theoretical_mean, col="red", lwd=2, lty=2)
legend("topright", legend=c("Sample Mean", "Theoretical Mean"), col=c("blue", "red"), lty=c(1,2))

Sample Mean: 5.012
Theoretical Mean: 5

Sample Variance vs Theoretical Variance

sample_variance <- var(means)
theoretical_variance <- (1 / lambda)^2 / n

sample_variance
## [1] 0.6004928
theoretical_variance
## [1] 0.625

Sample Variance: 0.6
Theoretical Variance: 0.625
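
The theoretical values used above follow from standard properties of the exponential distribution. The derivation below is a short sketch, assuming independent draws X_1, ..., X_n with rate lambda = 0.2 and n = 40; the symbols \mu, \sigma, and \bar{X} are notation introduced here and do not appear in the original code.

```latex
\begin{align*}
\mu &= E[X_i] = \frac{1}{\lambda} = \frac{1}{0.2} = 5,
  \qquad \sigma^2 = \operatorname{Var}(X_i) = \frac{1}{\lambda^2} = 25, \\
E[\bar{X}] &= \mu = 5, \\
\operatorname{Var}(\bar{X}) &= \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n} = \frac{25}{40} = 0.625, \\
\operatorname{sd}(\bar{X}) &= \frac{\sigma}{\sqrt{n}}
  = \frac{5}{\sqrt{40}} \approx 0.79 .
\end{align*}
```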

Distribution Normality Check

qqnorm(means)
qqline(means, col = "red")

In the QQ plot, the quantiles of the 1000 sample means fall close to the reference line, which indicates that their distribution is approximately normal, as the Central Limit Theorem predicts for averages of 40 exponential draws.
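
A complementary visual check is to overlay the normal density implied by the CLT on a histogram of the sample means. The sketch below re-creates the simulation so that it runs on its own; the names clt_mean and clt_sd are introduced here for illustration and are not part of the original analysis.

```r
# Re-create the simulated means (same parameters as the Simulation Setup)
lambda <- 0.2
n <- 40
sims <- 1000
set.seed(123)
means <- replicate(sims, mean(rexp(n, lambda)))

# Normal approximation implied by the CLT:
# mean 1/lambda, standard deviation (1/lambda)/sqrt(n)
clt_mean <- 1 / lambda
clt_sd <- (1 / lambda) / sqrt(n)

# Draw the histogram on the density scale so the curve is comparable
hist(means, freq = FALSE, breaks = 40, col = "lightblue",
     main = "Sample Means with CLT Normal Density", xlab = "Sample Means")
curve(dnorm(x, mean = clt_mean, sd = clt_sd), add = TRUE, col = "red", lwd = 2)
```

If the histogram bars track the red curve closely, that supports the same conclusion as the QQ plot.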

Part 2: ToothGrowth Dataset Analysis

Load and Explore Data

data("ToothGrowth")
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
boxplot(len ~ supp * dose, data = ToothGrowth, main = "Tooth Length by Supplement and Dose", col="orange")

Compare Tooth Growth by Supplement

t.test(len ~ supp, data = ToothGrowth)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

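The OJ supplement shows a higher average tooth length than VC (20.66 vs 16.96), but the difference is not statistically significant at the 5% level: the p-value is 0.061 and the 95% confidence interval (-0.17, 7.57) includes zero.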
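
The significance decision can also be read directly from the object that t.test returns, rather than from the printed output. The following is a minimal sketch; the 5% level stored in alpha and the variable names res, significant, and ci_excludes_zero are assumptions made for this example.

```r
data("ToothGrowth")

alpha <- 0.05  # assumed significance level
res <- t.test(len ~ supp, data = ToothGrowth)  # Welch test, as above

# Reject the null hypothesis only if the p-value is below alpha
significant <- res$p.value < alpha

# Equivalent check: does the 95% confidence interval exclude zero?
ci_excludes_zero <- res$conf.int[1] > 0 || res$conf.int[2] < 0

cat("p-value:", signif(res$p.value, 4), "\n")
cat("significant at alpha:", significant, "\n")
cat("CI excludes zero:", ci_excludes_zero, "\n")
```

For the test output shown above (p-value 0.06063, interval from -0.171 to 7.571), both checks evaluate to FALSE.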

Compare Tooth Growth by Dose Levels

dose_0.5 <- subset(ToothGrowth, dose == 0.5)
dose_1.0 <- subset(ToothGrowth, dose == 1.0)
dose_2.0 <- subset(ToothGrowth, dose == 2.0)

t.test(dose_0.5$len, dose_1.0$len)
## 
##  Welch Two Sample t-test
## 
## data:  dose_0.5$len and dose_1.0$len
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean of x mean of y 
##    10.605    19.735
t.test(dose_1.0$len, dose_2.0$len)
## 
##  Welch Two Sample t-test
## 
## data:  dose_1.0$len and dose_2.0$len
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean of x mean of y 
##    19.735    26.100
t.test(dose_0.5$len, dose_2.0$len)
## 
##  Welch Two Sample t-test
## 
## data:  dose_0.5$len and dose_2.0$len
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean of x mean of y 
##    10.605    26.100

Tooth length increases with dose. All three pairwise differences between dose levels are statistically significant (p < 0.001 in each test), and none of the 95% confidence intervals includes zero.
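
Because three pairwise tests are run on the same data, it can be worth adjusting the p-values for multiple comparisons. The sketch below uses pairwise.t.test from the stats package with a Bonferroni correction; the choice of Bonferroni and the name adj are assumptions for illustration, not part of the original analysis.

```r
data("ToothGrowth")

# Pairwise Welch t-tests between dose levels (pool.sd = FALSE keeps
# separate variances per group), with Bonferroni-adjusted p-values
adj <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                       pool.sd = FALSE, p.adjust.method = "bonferroni")

# Lower-triangular matrix of adjusted p-values
print(adj$p.value)
```

Bonferroni multiplies each of the three p-values by 3, so the very small p-values reported above remain well below 0.05 after adjustment.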

Assumptions

  • The observations are independent.
  • Tooth length is approximately normally distributed in each group.
  • The variances of the groups are not assumed to be equal: every test above is a Welch t-test, which allows unequal variances.
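
These assumptions can be examined informally in R. The sketch below compares group variances and runs a Shapiro-Wilk test in each supplement-by-dose cell; the choice of Shapiro-Wilk and the names var_by_supp, var_by_dose, cell, and shapiro_p are assumptions made for this example. With only 10 observations per cell, such tests have limited power, so the results are a rough guide only.

```r
data("ToothGrowth")

# Sample variance of tooth length within each supplement and each dose
var_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp, var)
var_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, var)

# Shapiro-Wilk p-value for each supplement-by-dose cell (10 values each)
cell <- interaction(ToothGrowth$supp, ToothGrowth$dose)
shapiro_p <- tapply(ToothGrowth$len, cell,
                    function(x) shapiro.test(x)$p.value)

print(var_by_supp)
print(var_by_dose)
print(round(shapiro_p, 3))
```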

Conclusion

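This report demonstrated how the Central Limit Theorem applies to averages of exponential data: the simulated sample means had a mean and variance close to their theoretical values (5.012 vs 5 and 0.600 vs 0.625) and were approximately normally distributed. In the ToothGrowth analysis, tooth length increased significantly with dose, while the difference between the OJ and VC supplements was not statistically significant at the 5% level.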