Xiangzhu Long
This report analyzes the ToothGrowth data from the R datasets package. It gives a basic summary of the data and then uses confidence intervals and hypothesis tests to compare tooth growth by supplement type (supp) and dose.
# Load dependencies
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
set.seed(7)
# Load the ToothGrowth data
data(ToothGrowth)
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
# Comparing the data sets
ggplot(ToothGrowth, aes(x = len, fill = supp)) +
  facet_wrap(~dose) + scale_fill_brewer(palette = 'Set1') +
  xlab('tooth length') + ggtitle('Distribution of Guinea Pig Tooth Lengths') +
  geom_histogram()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
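The summary above pools both supplements and all three doses. As a minimal sketch of a per-group summary, the block below computes the mean tooth length for each supplement and dose combination using base R's `aggregate`. The name `group.means` is illustrative and does not appear in the original analysis.

```r
# Load the data so the block runs on its own
data(ToothGrowth)

# Mean tooth length for each supp/dose combination (6 groups of 10 animals)
# `group.means` is a hypothetical name chosen for this sketch
group.means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group.means
```

Reading the resulting table row by row makes it easier to see where the two supplements diverge before running any tests.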
# Compare tooth growth by supp and dose
# Two-sample t-tests of len by supp (unpaired is the default, so `paired` is not passed)
supp.t1 <- t.test(len ~ supp, var.equal = TRUE, data = ToothGrowth)
supp.t2 <- t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)
# Collect the p-values and 95% confidence intervals (OJ minus VC)
supp.result <- data.frame(p.value = c(supp.t1$p.value, supp.t2$p.value),
                          Conf.Low = c(supp.t1$conf.int[1], supp.t2$conf.int[1]),
                          Conf.High = c(supp.t1$conf.int[2], supp.t2$conf.int[2]),
                          row.names = c("Equal Var", "Unequal Var"))
supp.result
##                p.value   Conf.Low Conf.High
## Equal Var   0.06039337 -0.1670064  7.567006
## Unequal Var 0.06063451 -0.1710156  7.571016
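The tests above pool all three doses. One way to compare the supplements within each dose is sketched below: it splits the data by dose and runs a Welch (unequal variance) t-test of `len` by `supp` in each subset. This is an illustrative sketch rather than the original analysis, and the name `by.dose` is an assumption introduced here.

```r
# Load the data so the block runs on its own
data(ToothGrowth)

# One Welch t-test (OJ vs VC) per dose level
# `by.dose` is a hypothetical name chosen for this sketch
by.dose <- do.call(rbind, lapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  tt <- t.test(len ~ supp, data = d, var.equal = FALSE)
  data.frame(dose = d$dose[1],
             p.value = tt$p.value,
             Conf.Low = tt$conf.int[1],    # lower bound of the OJ minus VC difference
             Conf.High = tt$conf.int[2])   # upper bound of the OJ minus VC difference
}))
by.dose
```

A confidence interval that excludes zero at a given dose indicates a supplement difference at that dose. Because this runs three tests, a multiple-comparison adjustment such as `p.adjust(by.dose$p.value)` may be worth considering.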
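Pooled across all doses, the difference between the two supplements is not significant at the 5% level (p ≈ 0.06, and the 95% confidence interval contains zero). Based on the sample data provided:
1. At the lower dosages (0.5 mg and 1 mg), orange juice (OJ) produces more tooth growth than ascorbic acid (VC).
2. At the highest dosage (2 mg), tooth growth is not statistically different between the two supplements.
3. Regardless of the supplement, dosage is a key factor in tooth growth.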
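The dose effect in the last point can be checked in the same style as the supplement tests. The sketch below pools both supplements and runs a Welch t-test for each pair of dose levels. It is one possible approach, not necessarily the one used to reach the conclusions above, and `dose.pairs` and `dose.result` are names introduced here for illustration.

```r
# Load the data so the block runs on its own
data(ToothGrowth)

# Pairs of dose levels to compare, pooling both supplements
# `dose.pairs` and `dose.result` are hypothetical names chosen for this sketch
dose.pairs <- list(c(0.5, 1), c(1, 2), c(0.5, 2))

dose.result <- do.call(rbind, lapply(dose.pairs, function(p) {
  d <- subset(ToothGrowth, dose %in% p)          # keep only the two doses in this pair
  tt <- t.test(len ~ dose, data = d, var.equal = FALSE)
  data.frame(comparison = paste(p, collapse = " vs "),
             p.value = tt$p.value,
             Conf.Low = tt$conf.int[1],          # lower dose minus higher dose
             Conf.High = tt$conf.int[2])
}))
dose.result
```

Each interval is the lower dose minus the higher dose, so an interval lying entirely below zero indicates longer teeth at the higher dose.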