Analyze the ToothGrowth dataset in R by performing exploratory data analysis and confidence-interval tests for the different supplements and dosages.
library(datasets)
df <- ToothGrowth
boxplot(len~dose+supp,data=df,main="Tooth Length by Dosage and Supplement")
At first glance, higher doses of both supplements appear to be associated with longer teeth. For VC, the three dosage levels are clearly separated from one another, while the OJ groups are less distinct, with heavily overlapping whiskers in the boxplot above (especially between the 1.0 and 2.0 doses).
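To put numbers behind the visual comparison of boxes and whiskers, the sketch below prints the five statistics that boxplot() draws for each group. It relies on the fact that boxplot() computes its boxes and whiskers with boxplot.stats() (default coef = 1.5); the names `groups` and `five_num` are illustrative only, and the rows are ordered by split() rather than in the left-to-right order of the plot.

```r
# Sketch: the numbers behind each box in the boxplot.
# boxplot.stats()$stats holds: lower whisker, lower hinge, median,
# upper hinge, upper whisker. `groups` and `five_num` are hypothetical names.
df <- datasets::ToothGrowth

# Split tooth lengths by supplement and dose; names look like "OJ.0.5"
groups <- split(df$len, list(df$supp, df$dose))

five_num <- t(sapply(groups, function(x) boxplot.stats(x)$stats))
colnames(five_num) <- c("lower_whisker", "lower_hinge", "median",
                        "upper_hinge", "upper_whisker")
print(five_num)
```

Comparing the whisker columns of neighbouring doses within a supplement shows directly how much their ranges overlap.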
Means and standard deviations for each supplement/dosage combination are provided here…
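library(dplyr)
# Mean and standard deviation of tooth length for each supplement/dose group
# (named group_stats rather than summary to avoid masking base::summary())
group_stats <- df %>%
  group_by(supp, dose) %>%
  summarize(mean = mean(len), sd = sd(len), .groups = "drop")
as.data.frame(group_stats)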
##   supp dose  mean       sd
## 1   OJ  0.5 13.23 4.459709
## 2   OJ  1.0 22.70 3.910953
## 3   OJ  2.0 26.06 2.655058
## 4   VC  0.5  7.98 2.746634
## 5   VC  1.0 16.77 2.515309
## 6   VC  2.0 26.14 4.797731
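The means and standard deviations can also be turned into a 95% t-based confidence interval for each group's mean, which gives a rough sense of how precisely each group mean is estimated. This is a minimal base-R sketch (no dplyr); `group_ci` is a hypothetical name, and the interval uses the usual mean ± qt(0.975, n - 1) × sd / sqrt(n) formula.

```r
# Sketch: per-group 95% confidence interval for mean tooth length.
# `group_ci` is a hypothetical name; only base R is used.
df <- datasets::ToothGrowth

group_ci <- aggregate(len ~ supp + dose, data = df, FUN = function(x) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)          # standard error of the mean
  tq <- qt(0.975, df = n - 1)    # two-sided 95% critical value
  c(mean = mean(x), lower = mean(x) - tq * se, upper = mean(x) + tq * se)
})
print(group_ci)
```

aggregate() stores the three returned values as a matrix column, so they print as len.mean, len.lower and len.upper.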
The largest mean tooth lengths occur at the 2.0 mg/day dose for both supplements. I will compare the two supplements at this highest dose to see whether one is significantly more effective than the other.
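# Tooth lengths for the 2.0 mg/day dose of each supplement
topOJ <- df$len[df$dose == 2.0 & df$supp == "OJ"]
topVC <- df$len[df$dose == 2.0 & df$supp == "VC"]
# Welch two-sample t-test (R's default); 95% CI for mean(OJ) - mean(VC)
t.test(topOJ, topVC, paired = FALSE)$conf.int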
## [1] -3.79807 3.63807
## attr(,"conf.level")
## [1] 0.95
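This interval rests on the following assumptions:
- The data are independent and identically distributed (i.i.d.).
- The variances differ across supplement/dosage combinations (hence Welch's t-test).
- The subjects are not paired: each subject belongs to exactly one group.
- The same tooth (or set of teeth) was measured on every subject.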
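Because the variances are not assumed equal, t.test() uses Welch's interval: the difference in means plus or minus a t quantile times the combined standard error, with degrees of freedom from the Welch-Satterthwaite approximation. The sketch below reproduces that calculation by hand for the 2.0 mg/day groups; `manual_ci` and the intermediate names are illustrative, not part of any library.

```r
# Sketch: reproduce the Welch interval from t.test() by hand.
# Variable names (v1, v2, dof, manual_ci) are hypothetical.
df <- datasets::ToothGrowth
topOJ <- df$len[df$dose == 2.0 & df$supp == "OJ"]
topVC <- df$len[df$dose == 2.0 & df$supp == "VC"]

v1 <- var(topOJ) / length(topOJ)   # squared standard error, OJ
v2 <- var(topVC) / length(topVC)   # squared standard error, VC
se <- sqrt(v1 + v2)                # standard error of the difference

# Welch-Satterthwaite approximation to the degrees of freedom
dof <- (v1 + v2)^2 /
  (v1^2 / (length(topOJ) - 1) + v2^2 / (length(topVC) - 1))

diff_means <- mean(topOJ) - mean(topVC)
manual_ci <- diff_means + c(-1, 1) * qt(0.975, dof) * se
print(manual_ci)

# For comparison: the interval reported by t.test()
print(t.test(topOJ, topVC)$conf.int)
```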
Since the confidence interval contains 0, we cannot conclude that tooth growth at the highest dose differs significantly between the two supplements at the 5% level.
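For a two-sided t-test, the 95% interval contains 0 exactly when the p-value is at least 0.05, so the "interval contains 0" rule and the p-value give the same decision. A minimal sketch checking both for the 2.0 mg/day comparison; `res` and `contains_zero` are illustrative names.

```r
# Sketch: the "CI contains 0" rule versus the p-value at the 5% level.
# `res` and `contains_zero` are hypothetical names.
df <- datasets::ToothGrowth
topOJ <- df$len[df$dose == 2.0 & df$supp == "OJ"]
topVC <- df$len[df$dose == 2.0 & df$supp == "VC"]

res <- t.test(topOJ, topVC, paired = FALSE)
contains_zero <- res$conf.int[1] <= 0 && res$conf.int[2] >= 0

cat("CI contains 0:", contains_zero, "\n")
cat("p-value:", res$p.value, "\n")
cat("p-value >= 0.05:", res$p.value >= 0.05, "\n")
```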
Next, I will test whether there is a difference within each supplement between its two highest doses (2.0 and 1.0 mg/day). As before, I will not assume equal variances between the two dosage levels, so t.test() uses its default Welch test.
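Whether equal variances are assumed changes the interval. The sketch below contrasts R's default Welch interval (var.equal = FALSE) with the pooled Student interval (var.equal = TRUE) for the two highest OJ doses, and adds var.test(), an F test of equal variances that itself assumes normally distributed data. The names `intervals` and `f_test` are illustrative.

```r
# Sketch: Welch (unequal variances) vs pooled (equal variances) intervals
# for OJ at 2.0 vs 1.0 mg/day. `intervals` and `f_test` are hypothetical names.
df <- datasets::ToothGrowth
highDose <- df$len[df$dose == 2.0 & df$supp == "OJ"]
midDose  <- df$len[df$dose == 1.0 & df$supp == "OJ"]

welch  <- t.test(highDose, midDose, var.equal = FALSE)$conf.int
pooled <- t.test(highDose, midDose, var.equal = TRUE)$conf.int

intervals <- rbind(welch = as.numeric(welch), pooled = as.numeric(pooled))
colnames(intervals) <- c("lower", "upper")
print(intervals)

# F test of equal variances (assumes normality within each group)
f_test <- var.test(highDose, midDose)
print(f_test)
```

With equal group sizes the two standard errors coincide, so the pooled interval can only be narrower, because it uses more degrees of freedom.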
Confidence interval for OJ…
highDose <- df$len[df$dose == 2.0 & df$supp == "OJ"]
midDose  <- df$len[df$dose == 1.0 & df$supp == "OJ"]
# Welch t-test; 95% CI for mean(2.0 mg/day) - mean(1.0 mg/day)
t.test(highDose, midDose, paired = FALSE)$conf.int
## [1] 0.1885575 6.5314425
## attr(,"conf.level")
## [1] 0.95
Confidence interval for VC…
highDose <- df$len[df$dose == 2.0 & df$supp == "VC"]
midDose  <- df$len[df$dose == 1.0 & df$supp == "VC"]
# Welch t-test; 95% CI for mean(2.0 mg/day) - mean(1.0 mg/day)
t.test(highDose, midDose, paired = FALSE)$conf.int
## [1] 5.685733 13.054267
## attr(,"conf.level")
## [1] 0.95
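The OJ and VC blocks differ only in the supplement, so a single helper can produce both intervals in one table. This is a minimal sketch; `dose_ci` is a hypothetical function, not part of any package, and it uses the same default Welch test as the code above.

```r
# Sketch: one hypothetical helper, dose_ci(), for both supplements.
df <- datasets::ToothGrowth

dose_ci <- function(data, supplement, high = 2.0, low = 1.0) {
  high_len <- data$len[data$dose == high & data$supp == supplement]
  low_len  <- data$len[data$dose == low  & data$supp == supplement]
  # Welch 95% CI for mean(high dose) - mean(low dose)
  as.numeric(t.test(high_len, low_len, paired = FALSE)$conf.int)
}

cis <- t(sapply(c("OJ", "VC"), function(s) dose_ci(df, s)))
colnames(cis) <- c("lower", "upper")
print(cis)
```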
The same assumptions apply as for the first comparison: i.i.d. data, unequal variances, unpaired subjects, and the same tooth (or set of teeth) measured on every subject.
Both differences are significant at the 5% level, since neither interval contains zero: for both supplements, the 2.0 mg/day dose produced longer teeth than the 1.0 mg/day dose. However, the VC interval (5.69 to 13.05) lies much farther from zero than the OJ interval (0.19 to 6.53), whose lower bound is barely above zero. So while the conclusion is the same for both supplements, the evidence is much stronger for VC.
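One way to quantify "much stronger" is to compare the t statistic and two-sided p-value of each within-supplement comparison. This sketch reuses the default Welch test; `compare_doses` and `evidence` are hypothetical names chosen for illustration.

```r
# Sketch: t statistic and p-value for 2.0 vs 1.0 mg/day within each supplement.
# `compare_doses` and `evidence` are hypothetical names.
df <- datasets::ToothGrowth

compare_doses <- function(supplement) {
  res <- t.test(df$len[df$dose == 2.0 & df$supp == supplement],
                df$len[df$dose == 1.0 & df$supp == supplement])
  c(t = unname(res$statistic), p_value = res$p.value)
}

evidence <- t(sapply(c("OJ", "VC"), compare_doses))
print(evidence)
```

A larger t statistic and smaller p-value for VC correspond to its interval sitting farther from zero.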