We simulate 1000 iterations of 40 exponentials by using replicate and then calculating the mean of the simulation. In comparison to the theoretical mean we derive with the constants lambda <- 0.2; noSim <- 1000; and n <- 40 (lambda is set to 0.2, The number of simulations we want is 1000, and the the number of exponents we want in each simulation is 40). Our experiment shows there is not much difference in the theoretical mean of the constants provided and the simulated mean. We also show that the variances are not much different and that the distribution of the simulation is approximately normal.
# Set seed for reproducibility of random number generation
# Set relevant variables: lambda, noSim (number of simulations), and n(number of exponentials)
# per above instructions
set.seed(999); lambda <- 0.2; noSim <- 1000; n <- 40
# Simulation: using replicate() we iterate 1000 simulations, take the means of each of the vectors,
# and then the mean of that resulting (mExp) vector.
sExp <- replicate(noSim, rexp(n,lambda))
mExp <- colMeans(sExp)
sMean <- mean(mExp)
#set theoretical mean
tMean <- 1/lambda
q1Answer <- data.frame('Theoretical Mean' = tMean,
'Simulated Mean' = sMean)
print(q1Answer)
## Theoretical.Mean Simulated.Mean
## 1 5 5.029028
From the above, we see there is not a large difference between the theoretical mean and the simulated mean.
hist(mExp, breaks = 40, xlab = "Mean", main = "1000 simulations of 40 exponentials", col = "orange")
abline(v = sMean, col = "blue")
abline(v = tMean, col = "red")
# Simulated Standard Deviation and Variance
sSD <- sd(mExp)
sVariance <- sSD^2
# Theoretical Standard Deviation and Variance
tSD <- (1/lambda)/sqrt(n)
tVariance <- tSD^2
kable(data.frame("Simulated Variance" = sVariance,
"Theoretical Variance" = tVariance))
| Simulated.Variance | Theoretical.Variance |
|---|---|
| 0.605616 | 0.625 |
We see here that the simulated variance is 0.605616 versus the theoretical variance of 0.625.
# Plot Histogram
hist(mExp, breaks = 40, xlab = "Mean", main = "Comparison to a Normal Distribution", col = "green")
xfit <- seq(min(mExp), max(mExp), length = 100)
yfit <- dnorm(xfit, mean = tMean, sd = tSD)
lines(xfit, yfit*200, pch = 20, col = 6, lwd = 1.5)
Taking our same distribution histogram from Q# 2.) we fit a line representing the normal distribution and can visually see that our simulation distribution is approximately normal shaped.
We conclude that the mean of the vector means for 1000 simulations is close to the theoretical mean we derive under the assumption of lambda equaling 0.2.
Now in the second portion of the project, we’re going to analyze the ToothGrowth data in the R datasets package.
The ToothGrowth dataset contains 60 observations of 3 variables on The Effects of Vitamin C on Tooth Growth in Guinea Pigs. We first use boxplots to visually explore the dataset, and then we conduct Hypothesis tests on the null Hypothesis that the mean of len (length of odontoblasts) for two samples is the same, where the two samples are the Guinea Pigs givensupp == 'VC' (Vitamin C) and those given supp == 'OJ' (Orange Juice).
Our Exploratory charting below reveals that the means look different for the two groups; however, when grouping by dose (dosage of 0.5, 1.0, or 2.0), our graphs reveal a similarity in mean for the higher dose of 2.0. Our hypothesis testing confirms our results as we see p-values that are significantly less than .05 for our 95% confidence interval when looking at just the two groups (supp == 'VC' vs supp == 'OJ') and for the two groups for the lower doses of dose == 0.5 & dose == 1.0. We get a higher p-value for the test on the two groups where dose == 2.0. Thus we reject the null Hypothesis in all cases except when dose == 2.0 where we fail to reject the null hypothesis.
data("ToothGrowth")
TG <- ToothGrowth
summary(TG)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
str(TG)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
ggplot(TG,aes(x=supp,y=len,fill=supp))+
geom_boxplot(alpha = 0.88)
We can see from the above, that the means for OJ vs VC as the supplement look very different. If we were going on look alone, we would reject our null Hypothesis and conclude the means as dissimilar.
ggplot(TG, aes(x = supp, y = len, fill = supp))+
geom_boxplot(alpha = 0.76)+
facet_grid(.~dose)
When we look at the means for OJ vs VC while considering dosage as a factor, we visually conclude the means as dissimilar for the 0.5 and 1.0 dosage levels; however, we can see there is more similarity in the groups’ means at the higher dosage level of 2.0.
supp.test <- t.test(len ~ supp, data= TG, var.equal = FALSE, paired=FALSE ,conf.level = .95)
testprint <- kable(data.frame(row.names = "By Supplement",
"t.stat" = test$statistic,
"df" = test$parameter,
"p-value" = test$p.value,
"lower" = test$conf.int[1],
"upper" = test$conf.int[2],
"OJ mean" = test$estimate[1],
"VC mean" = test$estimate[2]), caption = test$method)
testprint
| t.stat | df | p.value | lower | upper | OJ.mean | VC.mean | |
|---|---|---|---|---|---|---|---|
| By Supplement | 1.915268 | 55.30943 | 0.0606345 | -0.1710156 | 7.571016 | 20.66333 | 16.96333 |
In this case, we reject the null Hypothesis as our p-value: 0.0606345 is < 0.5.
test1 <- t.test(len ~ supp, data=TG[TG$dose ==0.5,], var.equal = FALSE, paired=FALSE ,conf.level = .95)
test2 <- t.test(len ~ supp, data=TG[TG$dose ==1.0,], var.equal = FALSE, paired=FALSE ,conf.level = .95)
test3 <- t.test(len ~ supp, data=TG[TG$dose ==2.0,], var.equal = FALSE, paired=FALSE ,conf.level = .95)
test1print <- kable(data.frame(row.names = "By Supplement, dose = 0.5",
"t.stat" = test1$statistic,
"df" = test1$parameter,
"p-value" = test1$p.value,
"lower" = test1$conf.int[1],
"upper" = test1$conf.int[2],
"OJ mean" = test1$estimate[1],
"VC mean" = test1$estimate[2]), caption = test$method)
test2print <- kable(data.frame(row.names = "By Supplement, dose = 1.0",
"t.stat" = test2$statistic,
"df" = test2$parameter,
"p-value" = test2$p.value,
"lower" = test2$conf.int[1],
"upper" = test2$conf.int[2],
"OJ mean" = test2$estimate[1],
"VC mean" = test2$estimate[2]), caption = test$method)
test3print <- kable(data.frame(row.names = "By Supplement, dose = 2.0",
"t.stat" = test3$statistic,
"df" = test3$parameter,
"p-value" = test3$p.value,
"lower" = test3$conf.int[1],
"upper" = test3$conf.int[2],
"OJ mean" = test3$estimate[1],
"VC mean" = test3$estimate[2]), caption = test$method)
Results
| t.stat | df | p.value | lower | upper | OJ.mean | VC.mean | |
|---|---|---|---|---|---|---|---|
| By Supplement, dose = 0.5 | 3.169733 | 14.96875 | 0.0063586 | 1.719057 | 8.780943 | 13.23 | 7.98 |
In this case, we reject the null Hypothesis as our p-value: 0.0063586 is < 0.5.
| t.stat | df | p.value | lower | upper | OJ.mean | VC.mean | |
|---|---|---|---|---|---|---|---|
| By Supplement, dose = 1.0 | 4.03277 | 15.35767 | 0.0010384 | 2.802148 | 9.057852 | 22.7 | 16.77 |
In this case, we reject the null Hypothesis as our p-value: 0.0010384 is < 0.5.
| t.stat | df | p.value | lower | upper | OJ.mean | VC.mean | |
|---|---|---|---|---|---|---|---|
| By Supplement, dose = 2.0 | -0.0461361 | 14.03982 | 0.9638516 | -3.798071 | 3.63807 | 26.06 | 26.14 |
In this case, we FAIL to reject the null Hypothesis as our p-value: 0.9638516 is > 0.5.
We conclude that our null Hypothesis is one we would reject when comparing the sample means of two groups OJ vs. VC and confirm our visual analysis above. Furthermore, we would also reject our null Hypothesis when segmenting the VC and OJ groups by dosage level administered WHEN dosage = 0.5 or dosage = 1.0. Only in the case of dosage = 2.0 do we fail to reject our null Hypothesis.
Our assumptions in these tests are that the groups (by supplement, and by supplement/dose) of individual subjects (guinea pigs) are independent and also have different variances (note we setvar.equal = FALSE in the above ttests.)