This report contains two parts. The first part investigates the behavior of the sample mean of exponential distributions and demonstrates the Central Limit Theorem (CLT) using simulations in R. The second part performs inferential data analysis on the ToothGrowth dataset available in R.
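lambda <- 0.2   # rate of the exponential distribution
n <- 40         # observations per sample
sims <- 1000    # number of simulated samples
set.seed(123)   # make the simulation reproducible
# Mean of n exponential draws, repeated sims times
means <- replicate(sims, mean(rexp(n, lambda)))
sample_mean <- mean(means)        # centre of the simulated distribution
theoretical_mean <- 1 / lambda    # expected value of an exponential with rate lambda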
hist(means, main="Distribution of Sample Means", xlab="Sample Means", col="lightblue", breaks=40)
abline(v=sample_mean, col="blue", lwd=2)
abline(v=theoretical_mean, col="red", lwd=2, lty=2)
legend("topright", legend=c("Sample Mean", "Theoretical Mean"), col=c("blue", "red"), lty=c(1,2))
Sample Mean: 5.012 (mean of the 1000 simulated sample means)
Theoretical Mean: 5 (1 / lambda)
sample_variance <- var(means)
theoretical_variance <- (1 / lambda)^2 / n
sample_variance
## [1] 0.6004928
theoretical_variance
## [1] 0.625
Sample Variance: 0.600 (variance of the simulated sample means)
Theoretical Variance: 0.625 ((1 / lambda)^2 / n = 25 / 40)
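The theoretical value comes from the fact that the mean of n independent draws has variance sigma^2 / n, with sigma = 1 / lambda for the exponential distribution. As a minimal sketch, the simulation can be repeated for a few sample sizes to show the variance of the mean shrinking in proportion to 1 / n. The sizes 10 and 160 and the helper name `sim_var` are illustrative assumptions, not part of the original analysis.

```r
lambda <- 0.2
sims <- 1000
sizes <- c(10, 40, 160)   # 40 is the report's n; 10 and 160 are illustrative

# Hypothetical helper: variance of `sims` simulated sample means of size n
sim_var <- function(n) var(replicate(sims, mean(rexp(n, lambda))))

set.seed(123)
simulated <- sapply(sizes, sim_var)
theoretical <- (1 / lambda)^2 / sizes   # sigma^2 / n with sigma = 1 / lambda

print(data.frame(n = sizes, simulated = simulated, theoretical = theoretical))
```

Because the random number stream is consumed in a different order here, the n = 40 row is not expected to reproduce the sample variance reported above exactly.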
qqnorm(means)
qqline(means, col = "red")
The points in the QQ plot lie close to the reference line, indicating that the distribution of the sample means is approximately normal, as the Central Limit Theorem predicts for samples of size n = 40.
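A QQ plot is a visual check. As a rough numerical complement, one could compare the sample skewness of individual exponential draws with that of the simulated means. An exponential distribution has skewness 2, and the mean of n independent draws has skewness 2 / sqrt(n), about 0.32 for n = 40. The `skewness` helper below is defined here for illustration and is not part of the original report.

```r
lambda <- 0.2
n <- 40
sims <- 1000

# Illustrative helper: moment-based sample skewness
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

set.seed(123)
means <- replicate(sims, mean(rexp(n, lambda)))  # same simulation as the report
raw <- rexp(sims, lambda)                        # individual draws, for contrast

skew_means <- skewness(means)
skew_raw <- skewness(raw)
print(c(raw = skew_raw, means = skew_means, expected_means = 2 / sqrt(n)))
```

A skewness for the means that is much closer to zero than that of the raw draws is consistent with the averaging pulling the distribution toward normality, although it does not by itself establish normality.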
data("ToothGrowth")
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
boxplot(len ~ supp * dose, data = ToothGrowth, main = "Tooth Length by Supplement and Dose", col="orange")
t.test(len ~ supp, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
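The OJ group has a higher mean tooth length than the VC group (20.66 vs 16.96), but the difference is not statistically significant at the 5% level: p = 0.061, and the 95% confidence interval (-0.17, 7.57) includes zero.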
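The decision for the supplement comparison can be read directly off the test object instead of the printed output. The sketch below assumes the conventional significance level of 0.05, which the report does not state explicitly, and checks both the p-value and whether the confidence interval covers zero.

```r
data("ToothGrowth")
alpha <- 0.05   # conventional significance level, assumed here

res <- t.test(len ~ supp, data = ToothGrowth)   # Welch test, as in the report

significant <- res$p.value < alpha
ci_contains_zero <- res$conf.int[1] < 0 && res$conf.int[2] > 0

cat("p-value:", signif(res$p.value, 4), "\n")
cat("significant at alpha =", alpha, ":", significant, "\n")
cat("95% CI contains zero:", ci_contains_zero, "\n")
```

The two checks agree by construction for a two-sided test: a 95% interval that contains zero corresponds to a p-value above 0.05.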
dose_0.5 <- subset(ToothGrowth, dose == 0.5)
dose_1.0 <- subset(ToothGrowth, dose == 1.0)
dose_2.0 <- subset(ToothGrowth, dose == 2.0)
t.test(dose_0.5$len, dose_1.0$len)
##
## Welch Two Sample t-test
##
## data: dose_0.5$len and dose_1.0$len
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean of x mean of y
## 10.605 19.735
t.test(dose_1.0$len, dose_2.0$len)
##
## Welch Two Sample t-test
##
## data: dose_1.0$len and dose_2.0$len
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean of x mean of y
## 19.735 26.100
t.test(dose_0.5$len, dose_2.0$len)
##
## Welch Two Sample t-test
##
## data: dose_0.5$len and dose_2.0$len
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean of x mean of y
## 10.605 26.100
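Tooth length increases with dose: mean length rises from 10.605 at dose 0.5 to 19.735 at dose 1.0 and 26.100 at dose 2.0. All three pairwise differences are statistically significant (p < 0.001 in each Welch t-test), and none of the 95% confidence intervals includes zero.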
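Running three pairwise tests raises the chance of at least one false positive. One common safeguard, which the report does not apply, is to adjust the p-values for multiple comparisons. The sketch below recomputes the three Welch p-values and applies a Bonferroni correction with `p.adjust`; the names `dose_pairs` and `pair_p` are introduced here for illustration.

```r
data("ToothGrowth")
dose_pairs <- list(c(0.5, 1), c(1, 2), c(0.5, 2))

# Illustrative helper: Welch p-value for one pair of dose levels
pair_p <- function(d) {
  x <- ToothGrowth$len[ToothGrowth$dose == d[1]]
  y <- ToothGrowth$len[ToothGrowth$dose == d[2]]
  t.test(x, y)$p.value
}

p_raw <- sapply(dose_pairs, pair_p)
names(p_raw) <- sapply(dose_pairs, paste, collapse = " vs ")

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")
print(signif(cbind(raw = p_raw, bonferroni = p_bonf), 4))
```

Since the largest unadjusted p-value is on the order of 1e-05, the adjusted values remain far below 0.05, so the conclusion about dose does not depend on the correction.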
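This report demonstrated how the Central Limit Theorem applies to the distribution of averages from exponential data: the simulated mean (5.012) and variance (0.600) of the sample means are close to the theoretical values of 5 and 0.625, and the QQ plot indicates approximate normality. For the ToothGrowth dataset, Welch t-tests showed that tooth length increases with dose at every step (p < 0.001), while the comparison of OJ with VC gave a mean difference of 3.70 with p = 0.061 and a 95% confidence interval that includes zero.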