This project investigates the distribution of the sample mean of 40 exponential random variables using a simulation in R. It compares the simulated sample mean and variance to their theoretical values and demonstrates that the sampling distribution of the mean is approximately normal, as predicted by the Central Limit Theorem.
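For reference, the theoretical values used in the code below follow from the standard properties of the exponential distribution and of the mean of independent draws. The derivation is a sketch under the assumption that the 40 draws are independent and identically distributed; the notation $\bar{X}_n$ for the sample mean is introduced here and does not appear in the original analysis.

```latex
\begin{aligned}
X_i &\sim \mathrm{Exp}(\lambda), \qquad E[X_i] = \frac{1}{\lambda}, \qquad \mathrm{Var}(X_i) = \frac{1}{\lambda^2} \\
E[\bar{X}_n] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5 \\
\mathrm{Var}(\bar{X}_n) &= \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625 \\
\mathrm{SD}(\bar{X}_n) &= \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.79
\end{aligned}
```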
#given parameters from course
lambda <- 0.2
n <- 40
simulations <- 1000
# Simulate the mean of n exponential draws, once per simulation
set.seed(42)
means <- numeric(simulations)  # preallocate instead of growing the vector
for (i in seq_len(simulations)) {
  means[i] <- mean(rexp(n, lambda))
}
#get theoretical values
theoretical_mean <- 1 / lambda
theoretical_sd <- (1 / lambda) / sqrt(n)
theoretical_var <- theoretical_sd^2
#get sample results
sample_mean <- mean(means)
sample_var <- var(means)
paste("Theoretical mean:", theoretical_mean)
[1] "Theoretical mean: 5"
paste("Simulated mean via sample:", round(sample_mean,2))
[1] "Simulated mean via sample: 4.99"
hist(means, breaks = 40, prob = TRUE,
main = "Distribution of sample means",
xlab = "Sample Mean",
xlim = c(1.5,8.5))
abline(v = theoretical_mean, col = "red", lwd = 2, lty = 2)
abline(v = sample_mean, col = "blue", lwd = 2)
legend("topright",
legend = c("Theoretical mean", "Sample mean"),
col = c("red", "blue"),
lwd = 2,
lty = c(2,1))
paste("Theoretical variance:", theoretical_var)
[1] "Theoretical variance: 0.625"
paste("Simulated variance via sample:", round(sample_var,2))
[1] "Simulated variance via sample: 0.63"
hist(means, breaks = 40, prob = TRUE,
main = "Distribution of means",
xlab = "Mean",
xlim = c(1.5,8.5))
curve(dnorm(x, mean = theoretical_mean, sd = theoretical_sd),
col = "red", lwd = 2, add = TRUE)
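The red curve is the normal density with the theoretical mean (5) and the theoretical standard deviation (about 0.79). The histogram of the simulated sample means follows this curve closely, which is what the Central Limit Theorem predicts.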
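As a rough numerical complement to the visual comparison, one could check what share of the simulated means falls within 1.96 theoretical standard deviations of the theoretical mean; under an approximately normal sampling distribution this share should be close to 95%. The sketch below re-creates the simulation with the same parameters and seed. The names `inside` and `coverage` are introduced here for illustration only, and the Q-Q plot is an additional check that is not part of the original analysis.

```r
# Re-create the simulation with the parameters used above
lambda <- 0.2
n <- 40
simulations <- 1000
set.seed(42)
means <- replicate(simulations, mean(rexp(n, lambda)))

theoretical_mean <- 1 / lambda
theoretical_sd <- (1 / lambda) / sqrt(n)

# Share of sample means within 1.96 theoretical SDs of the theoretical mean
# (illustrative names; a value near 0.95 is expected under normality)
inside <- abs(means - theoretical_mean) <= 1.96 * theoretical_sd
coverage <- mean(inside)
coverage

# Normal Q-Q plot: points close to the line indicate approximate normality
qqnorm(means)
qqline(means, col = "red", lwd = 2)
```

A coverage well below or above 95%, or a strongly curved Q-Q plot, would suggest that the normal approximation is poor for the chosen n.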
According to the ToothGrowth documentation:
The Effect of Vitamin C on Tooth Growth in Guinea Pigs
Description:
The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).
Format:
A data frame with 60 observations on 3 variables.
[,1] len numeric Tooth length
[,2] supp factor Supplement type (VC or OJ)
[,3] dose numeric Dose in milligrams/day
summary(ToothGrowth)
len supp dose
Min. : 4.20 OJ:30 Min. :0.500
1st Qu.:13.07 VC:30 1st Qu.:0.500
Median :19.25 Median :1.000
Mean :18.81 Mean :1.167
3rd Qu.:25.27 3rd Qu.:2.000
Max. :33.90 Max. :2.000
Tabulating dose separately for each supplement shows a balanced design: OJ and VC were each given at the same three doses, with 10 animals per dose:
table(ToothGrowth$dose[which(ToothGrowth$supp == "OJ")])
0.5 1 2
10 10 10
table(ToothGrowth$dose[which(ToothGrowth$supp == "VC")])
0.5 1 2
10 10 10
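The same check can be done in one step with a two-way table. This is a minimal sketch using the built-in `ToothGrowth` data; the name `design` is introduced here for illustration.

```r
# Cross-tabulate supplement type against dose in a single call
design <- with(ToothGrowth, table(supp, dose))
design

# TRUE if every supplement/dose cell holds the same number of animals
all(design == design[1, 1])
```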
If we plot the density of tooth length by supplement, we get two similar curves that do not coincide.
plot(density(ToothGrowth$len[ToothGrowth$supp == "OJ"]),
  xlim = c(-10, 50),
  col = "red",
  lwd = 2,
  main = "Density plot depending on supp")
lines(density(ToothGrowth$len[ToothGrowth$supp == "VC"]),
  col = "blue",
  lwd = 2)
# Legend colors match the plotted curves: OJ in red, VC in blue
legend("topright",
  legend = c("OJ", "VC"),
  lty = 1,
  col = c("red", "blue"),
  lwd = 2)
If we split tooth length by both dose and supplement, we see that the difference in length between the supplements is clearer at the lower dose levels, where OJ tends to give longer teeth than VC.
boxplot(ToothGrowth$len ~ ToothGrowth$supp*ToothGrowth$dose,
frame = FALSE,
col = c("white", "steelblue"),
names = c('OJ 0.5', 'VC 0.5', 'OJ 1', 'VC 1', 'OJ 2', 'VC 2'),
xlab = "",
ylab = "Length")
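To put numbers next to the boxplot, one could tabulate the mean length for every supplement/dose combination. This is a sketch only; `group_means` and `oj_minus_vc` are names introduced here and are not part of the original analysis.

```r
# Mean tooth length for each supplement (rows) and dose (columns)
group_means <- with(ToothGrowth,
                    tapply(len, list(supp = supp, dose = dose), mean))
round(group_means, 2)

# Difference in mean length, OJ minus VC, at each dose
oj_minus_vc <- group_means["OJ", ] - group_means["VC", ]
round(oj_minus_vc, 2)
```

Positive differences indicate longer teeth under OJ at that dose. These are descriptive values only and say nothing about significance.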
t.test(len ~ supp, data = ToothGrowth)
Welch Two Sample t-test
data: len by supp
t = 1.9153, df = 55.309, p-value = 0.06063
alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
95 percent confidence interval:
-0.1710156 7.5710156
sample estimates:
mean in group OJ mean in group VC
20.66333 16.96333
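The printed result above can also be stored and inspected programmatically, which is convenient when the decision has to be reused later. A minimal sketch, with `tt` and `ci_contains_zero` as illustrative names:

```r
# Welch two-sample t-test of length by supplement, stored as an object
tt <- t.test(len ~ supp, data = ToothGrowth)

tt$p.value    # p-value of the test
tt$conf.int   # 95% confidence interval for mean(OJ) - mean(VC)
tt$estimate   # group means

# TRUE if the interval contains 0, i.e. no significant difference at alpha = 0.05
ci_contains_zero <- tt$conf.int[1] < 0 && tt$conf.int[2] > 0
ci_contains_zero
```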
We do not see a statistically significant difference between the two supplement groups OJ and VC at alpha = 0.05 (p = 0.061, and the 95% confidence interval includes 0). But as shown above, the effect might depend on dose. To check this, we run one test per dose level and adjust the resulting p-values for multiple testing.
t1 <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == 0.5, ])
t2 <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == 1.0, ])
t3 <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == 2.0, ])
p_raw <- c(t1$p.value, t2$p.value, t3$p.value)
# Benjamini-Hochberg (BH) correction for multiple testing
p_adj <- p.adjust(p_raw, method = "BH")
# Summary table of raw and adjusted p-values per dose
data.frame(
Dose = c(0.5, 1.0, 2.0),
Raw_p = round(p_raw, 4),
Adjusted_p_BH = round(p_adj, 4)
)
With BH-adjusted p-values of 0.0095 (0.5 mg/day) and 0.0031 (1 mg/day), the difference in tooth length between the two supplements is statistically significant at the two lower doses. At 0.5 and 1 mg/day, orange juice appears to enhance tooth growth compared to ascorbic acid.
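To make the adjustment step more transparent, the BH correction can be reproduced by hand: sort the p-values, multiply each by m / rank (m being the number of tests), and enforce monotonicity from the largest p-value downwards. The helper `bh_manual` below is a hypothetical illustration written for this purpose, not part of the original analysis; it is compared against `p.adjust` as a sanity check.

```r
# Raw p-values from the three per-dose Welch t-tests, as in the analysis above
doses <- c(0.5, 1.0, 2.0)
p_raw <- sapply(doses, function(d) {
  t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])$p.value
})

# Manual BH adjustment (illustrative helper)
bh_manual <- function(p) {
  m <- length(p)
  ord <- order(p, decreasing = TRUE)   # indices from largest to smallest p-value
  ranks <- m:1                         # rank of each sorted p-value
  adj_sorted <- pmin(1, cummin(p[ord] * m / ranks))
  adj <- numeric(m)
  adj[ord] <- adj_sorted               # restore the original order
  adj
}

p_manual <- bh_manual(p_raw)
data.frame(
  Dose = doses,
  Raw_p = round(p_raw, 4),
  Manual_BH = round(p_manual, 4),
  p_adjust_BH = round(p.adjust(p_raw, method = "BH"), 4)
)
```

The two adjusted columns should agree, since the helper follows the same steps as the built-in method.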