Overview
In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
n <- 40
lambda <- 0.2
simulations <- 1000 # Do 1000 simulations
sampleMeans = NULL # Sample means
for(i in 1:simulations) {
sampleMeans <- c(sampleMeans, mean(rexp(n, lambda)))
}
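The loop above grows sampleMeans one element at a time, and without a seed every run gives slightly different numbers. As a minimal sketch of an equivalent, reproducible approach, all draws can be generated at once and averaged row by row. The seed value 2024 and the names draws and sampleMeansVec are arbitrary choices for illustration, not part of the original analysis, so the results will not match the figures printed in this report exactly.

```r
n <- 40
lambda <- 0.2
simulations <- 1000

set.seed(2024)  # arbitrary seed, assumed here only for reproducibility

# One row per simulation, one column per exponential draw
draws <- matrix(rexp(simulations * n, rate = lambda),
                nrow = simulations, ncol = n)

# Mean of each row = one simulated average of 40 exponentials
sampleMeansVec <- rowMeans(draws)

length(sampleMeansVec)  # one mean per simulation
mean(sampleMeansVec)    # should be close to 1/lambda = 5
```

Preallocating through a matrix avoids repeatedly copying the result vector, which matters little at 1000 simulations but scales better.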
This is the sample mean.
mean1 = mean(sampleMeans) # Mean of sample means
mean1
## [1] 5.003032
This is the theoretical mean.
mean2 = 1/lambda # Theoretical mean
mean2
## [1] 5
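The theoretical mean (5) and the sample mean (5.003032) are close. In the histogram, the blue line marks the sample mean and the red line marks the theoretical mean.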
hist(sampleMeans,
main = "Distribution of Sample Means",
xlab = "Sample Mean",
nclass = 50,
col = "lightblue")
abline(v = mean1, col = "blue", lwd = 2)
abline(v = mean2, col = "red", lwd = 2)
This is the sample variance.
var1 = var(sampleMeans); var1
## [1] 0.6699872
This is the theoretical variance of the mean of 40 exponentials, (1/lambda)^2 / n.
var2 = (1/lambda)^2 / n; var2
## [1] 0.625
The sample variance is a little larger than the theoretical variance.
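For reference, the variance of the mean of n independent exponentials follows from the standard result for independent, identically distributed variables. The short derivation here is added as a check, writing \bar{X} for the sample mean and \sigma = 1/\lambda for the standard deviation of a single exponential:

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^{2}}{n}
  = \frac{(1/\lambda)^{2}}{n}
  = \frac{25}{40}
  = 0.625
```

With n = 40 and lambda = 0.2, the simulated variance 0.6699872 is about 7% above this value.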
The blue curve is a normal density whose mean equals the sample mean and whose standard deviation equals the sample standard deviation. Together with the normal Q-Q plot, it shows that the distribution of the sample means is approximately normal.
par(mfrow = c(1,2))
hist(sampleMeans,
probability = TRUE,
main = "Distribution of Sample Means",
xlab = "Sample Mean",
nclass = 50,
col = "lightblue")
abline(v = mean1, col = "blue", lwd = 2)
curve(dnorm(x, mean1, sd=var1^0.5),
add=TRUE, col="blue", lwd=2) # Add normal distribution line
qqnorm(sampleMeans, col="lightblue"); qqline(sampleMeans, col="blue", lwd=2)
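The plots give a visual check of normality. A simple numeric companion, sketched here under the assumption of an arbitrary seed (2024) and illustrative names (simMeans, z, coverage), is to standardize the simulated means with the theoretical mean and standard error and see what share falls within the central 95% range of a standard normal. If the Central Limit Theorem approximation is good, that share should be near 0.95.

```r
n <- 40
lambda <- 0.2

set.seed(2024)  # arbitrary seed, assumed for reproducibility
simMeans <- replicate(1000, mean(rexp(n, lambda)))

# Standardize with the theoretical mean and standard error of the mean
mu <- 1 / lambda
se <- (1 / lambda) / sqrt(n)
z <- (simMeans - mu) / se

# Share of standardized means inside the central 95% normal range
coverage <- mean(abs(z) <= qnorm(0.975))
coverage
```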
Overview
Now in the second portion of the project, we’re going to analyze the ToothGrowth data in the R datasets package.
Dataset Description
The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).
head(ToothGrowth)
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
par(mfrow = c(1,2))
hist(ToothGrowth$len, nclass = 50)
hist(ToothGrowth$dose, nclass = 50)
Dose and tooth length are strongly positively correlated, both overall and within each supplement type.
ToothGrowth_VC <- subset(ToothGrowth, ToothGrowth[,"supp"] == "VC")
ToothGrowth_OJ <- subset(ToothGrowth, ToothGrowth[,"supp"] == "OJ")
cor(ToothGrowth$len, ToothGrowth$dose)
## [1] 0.8026913
cor(ToothGrowth_VC$len, ToothGrowth_VC$dose)
## [1] 0.8989722
cor(ToothGrowth_OJ$len, ToothGrowth_OJ$dose)
## [1] 0.7500585
layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
plot(ToothGrowth$dose, ToothGrowth$len,
main = "Length vs. Dose, Overall", xlab = "Dose", ylab = "Length")
abline(lm(ToothGrowth$len ~ ToothGrowth$dose), col="red")
#par(mfrow = c(1,2))
plot(ToothGrowth_VC$dose, ToothGrowth_VC$len,
main = "Length vs. Dose, VC", xlab = "Dose", ylab = "Length")
abline(lm(ToothGrowth_VC$len ~ ToothGrowth_VC$dose), col="red")
plot(ToothGrowth_OJ$dose, ToothGrowth_OJ$len,
main = "Length vs. Dose, OJ", xlab = "Dose", ylab = "Length")
abline(lm(ToothGrowth_OJ$len ~ ToothGrowth_OJ$dose), col="red")
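The scatterplots treat dose as a continuous variable, but only three dose levels were used, so a table of group means is a useful complement. The sketch below uses aggregate() from base R; the name groupMeans is an illustrative choice. The six means it prints should agree with the sample estimates reported in the per-dose t-test outputs later in this report.

```r
# Mean tooth length for each supplement/dose combination
groupMeans <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
groupMeans
```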
Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)
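H0: Across all doses, mean tooth growth is the same with orange juice as with ascorbic acid (two-sided Welch t-test). Not rejected at the 5% level: p = 0.06063 and the 95% confidence interval contains 0.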
t.test(len ~ supp, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
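H0: At dose = 0.5, mean tooth growth is the same with orange juice as with ascorbic acid. Rejected at the 5% level: p = 0.006359 and the 95% confidence interval lies entirely above 0, so orange juice is associated with more growth at this dose.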
t.test(len ~ supp, data = subset(ToothGrowth, dose == 0.5))
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
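H0: At dose = 1, mean tooth growth is the same with orange juice as with ascorbic acid. Rejected at the 5% level: p = 0.001038 and the 95% confidence interval lies entirely above 0, so orange juice is associated with more growth at this dose.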
t.test(len ~ supp, data = subset(ToothGrowth, dose == 1))
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
H0: At dose = 2, mean tooth growth is the same with orange juice as with ascorbic acid. Not rejected at the 5% level: p = 0.9639 and the 95% confidence interval contains 0.
t.test(len ~ supp, data = subset(ToothGrowth, dose == 2))
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
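The tests above are two-sided. If the question of interest is specifically whether orange juice produces more growth than ascorbic acid, a one-sided Welch test is an option. The following is a sketch only: it relies on OJ being the first level of supp (as the output above indicates), so that alternative = "greater" tests mean(OJ) - mean(VC) > 0, and the names doses, results, and p_one_sided are illustrative.

```r
doses <- sort(unique(ToothGrowth$dose))

# One Welch t-test per dose level.
# OJ is the first factor level of supp, so alternative = "greater"
# tests whether mean(OJ) - mean(VC) is greater than 0.
results <- do.call(rbind, lapply(doses, function(d) {
  tt <- t.test(len ~ supp,
               data = subset(ToothGrowth, dose == d),
               alternative = "greater")
  data.frame(dose = d,
             t = unname(tt$statistic),
             p_one_sided = tt$p.value)
}))
results
```

Because three tests are run, a multiple-comparison adjustment such as p.adjust() may be worth considering before drawing conclusions.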