data("ToothGrowth")
library("knitr")
library("ggplot2")
library("fitdistrplus")
## Loading required package: MASS
Overview
In this project we perform a basic analysis to determine whether supplement type (orange juice or ascorbic acid) and dose influence tooth growth. The null hypothesis, \(H_0\), states that mean tooth growth does not differ between supplement types or between dose levels.
summary(ToothGrowth); str(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
#hist(ToothGrowth$len)
Analysis
Length vs Supplement
We first tested whether mean tooth growth differs between the two supplements. Because the p-value exceeds 0.05 (p = 0.061) and the 95% confidence interval contains zero, we fail to reject the null hypothesis that mean tooth growth is the same for both supplements. However, when comparing tooth growth across dosage levels, the findings are different.
#test by 'supp'
oj <- ToothGrowth[ToothGrowth$supp %in% 'OJ', ]
vc <- ToothGrowth[ToothGrowth$supp %in% 'VC', ]
t.test(len ~ supp, paired = FALSE, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
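The decision rule described above can be sketched in code by reading the p-value and confidence interval directly from the object that `t.test` returns. This is a minimal sketch, not part of the original analysis; the names `res`, `alpha`, and `reject_h0` are introduced here for illustration only.

```r
data("ToothGrowth")

# Welch two-sample t-test (the default when var.equal is not set)
res <- t.test(len ~ supp, data = ToothGrowth)

alpha <- 0.05                        # significance level assumed in the text
reject_h0 <- res$p.value < alpha     # TRUE would mean "reject H0"

cat("p-value:", signif(res$p.value, 4), "\n")
cat("95% CI :", round(res$conf.int, 3), "\n")
cat("Reject H0 at alpha = 0.05?", reject_h0, "\n")
```

Because the interval spans zero, the code reaches the same conclusion as the printed output: the null hypothesis is not rejected for supplement type.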
Length vs Dosage - 0.5 vs 1.0
As the very small p-value shows (p < 0.05), we reject the null hypothesis. Mean tooth growth differs between the 0.5 and 1.0 doses.
#test by 'dose = 0.5' and 1.0
t.test(len ~ dose, paired = FALSE, data = subset(ToothGrowth, ToothGrowth$dose %in% c(.5, 1)))
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
Length vs Dosage - 1.0 vs 2.0
Once again we observe a small p-value (p < 0.05) and reject the null hypothesis. Mean tooth growth differs between the 1.0 and 2.0 doses.
t.test(len ~ dose, paired = FALSE, data = subset(ToothGrowth, ToothGrowth$dose %in% c(1, 2)))
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
Length vs Dosage - 2.0 vs 0.5
Finally, we observe another small p-value (p < 0.05) and reject the null hypothesis. The difference in mean tooth growth is largest between the smallest and the largest dose, which differ by a factor of four.
t.test(len ~ dose, paired = FALSE, data = subset(ToothGrowth, ToothGrowth$dose %in% c(.5, 2)))
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
Lastly, we present a boxplot of tooth growth by delivery method and dose. It shows tooth growth increasing with dose: mean growth nearly doubles from the 0.5 to the 1.0 dose and rises more modestly from 1.0 to 2.0. This is consistent with the t-test results above.
boxplot(len~supp*dose, data = ToothGrowth, col=c("blue", "goldenrod"), xlab = "Delivery Method by Dose", ylab = "Tooth Growth (mm)")
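To put numbers behind what the boxplot shows, the group means can be tabulated with base R's `aggregate`. This is an illustrative sketch; `dose_means`, `cell_means`, and `growth_ratio` are names introduced here, not part of the original analysis.

```r
data("ToothGrowth")

# Mean tooth growth at each dose, pooled over both supplements
dose_means <- aggregate(len ~ dose, data = ToothGrowth, FUN = mean)

# Mean tooth growth for each supplement-by-dose cell (the boxes in the plot)
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Ratio of each dose's mean to the mean at the next lower dose
growth_ratio <- dose_means$len[-1] / dose_means$len[-nrow(dose_means)]

print(dose_means)
print(cell_means)
print(round(growth_ratio, 2))
```

The first ratio compares the 1.0 dose with the 0.5 dose and the second compares 2.0 with 1.0, so a smaller second ratio indicates that the gain from doubling the dose tapers off.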

Conclusion
The goal of the exercise was to explore a data set and test for relationships between variables. We found no statistically significant difference in tooth growth between the two supplements at the 0.05 level, but we did find significant differences in tooth growth between dose levels for the pooled supplements.
Appendix
ggplot(data = ToothGrowth, aes(x = len)) + geom_density() + xlim(-5,45) + geom_vline(xintercept = median(ToothGrowth$len), size = .75, color = "blue")

descdist(ToothGrowth$len)

## summary statistics
## ------
## min: 4.2 max: 33.9
## median: 19.25
## mean: 18.81333
## estimated sd: 7.649315
## estimated skewness: -0.1499519
## estimated kurtosis: 2.045018
#fit.uniform <- fitdist(ToothGrowth$len, distr = "unif")
#plot(fit.uniform)