In this project we analyze the ToothGrowth dataset, which ships with R in the datasets package, with the following goals:
Load the ToothGrowth data and perform some basic exploratory data analyses.
Provide a basic summary of the data.
Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.
State conclusions and the assumptions needed for the conclusions.
library(ggplot2)
data(ToothGrowth)
Data Summary
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
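Because the comparisons later in this report are made per supplement and per dose, a per-group summary can complement the overall one. The following is a minimal sketch using base R's aggregate(); the name group_stats is a hypothetical helper introduced only for this illustration.

```r
# Per-group summary of tooth length: mean, standard deviation and count
# for every supplement/dose combination.
# `group_stats` is a hypothetical name, not part of the original analysis.
data(ToothGrowth)

group_stats <- aggregate(
  len ~ supp + dose,
  data = ToothGrowth,
  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)

# aggregate() stores the three statistics in a single matrix column;
# do.call(data.frame, ...) flattens it into len.mean, len.sd and len.n.
group_stats <- do.call(data.frame, group_stats)
print(group_stats)
```

Each of the six rows describes one supplement/dose cell, which makes it easy to see how many observations stand behind each of the t-tests that follow.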
Plotting
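# Scatterplot of tooth length against dose, one panel per supplement,
# with a fitted linear trend line in each panel
ggplot(data = ToothGrowth, aes(x = dose, y = len)) +
  geom_point(aes(colour = supp)) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_grid(. ~ supp) +
  xlab("Dose in milligrams") +
  ylab("Tooth length") +
  # the points are mapped to colour, so the legend title must be set on colour
  guides(colour = guide_legend(title = "Supplement type"))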
The scatterplots above suggest that at the highest dose orange juice (OJ) and ascorbic acid (VC) produce comparable tooth lengths, while at lower doses orange juice is associated with longer teeth than ascorbic acid. To check this impression, we now compare the supplements first across the whole dataset and then at each dose.
s_OJ <- subset(ToothGrowth, supp == "OJ")
s_VC <- subset(ToothGrowth, supp == "VC")
s_test <- t.test(s_OJ$len, s_VC$len)
print(s_test)
##
## Welch Two Sample t-test
##
## data: s_OJ$len and s_VC$len
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
Because the p-value of 0.0606 is greater than 0.05, we fail to reject the null hypothesis: across the entire dataset there is no statistically significant difference in tooth growth between the two supplements. The 95% confidence interval [-0.171, 7.571] includes 0, which supports the same conclusion.
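The decision above is read off the printed output, but the same numbers can also be pulled out of the test object directly. The sketch below uses the formula interface of t.test(), which should be equivalent to comparing the two subsets because OJ is the first factor level; s_test_formula, alpha and reject_null are hypothetical names chosen for this example.

```r
data(ToothGrowth)

# Formula interface: compares len between the two levels of supp (OJ, VC).
# `s_test_formula`, `alpha` and `reject_null` are hypothetical names.
s_test_formula <- t.test(len ~ supp, data = ToothGrowth)

# Components of the returned "htest" object
print(s_test_formula$p.value)   # two-sided p-value
print(s_test_formula$conf.int)  # 95% CI for mean(OJ) - mean(VC)
print(s_test_formula$estimate)  # the two group means

# Decision at the 5% significance level
alpha <- 0.05
reject_null <- s_test_formula$p.value < alpha
print(reject_null)
```

Working with the components rather than the printed text avoids copying numbers by hand into the write-up.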
low_t_test <- t.test(subset(s_OJ, dose == 0.5)$len, subset(s_VC, dose == 0.5)$len)
print(low_t_test)
##
## Welch Two Sample t-test
##
## data: subset(s_OJ, dose == 0.5)$len and subset(s_VC, dose == 0.5)$len
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean of x mean of y
## 13.23 7.98
Because the p-value of 0.006359 is lower than 0.05, we reject the null hypothesis of equal means at a dose of 0.5 mg. The 95% confidence interval for the difference in means (orange juice minus ascorbic acid), [1.719057, 8.780943], lies entirely above 0, so orange juice is associated with significantly greater tooth growth than ascorbic acid at this dose.
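To make clear where the t statistic, degrees of freedom and confidence interval come from, the 0.5 mg comparison can be recomputed by hand. This is a sketch of the standard Welch formulas (unequal variances, Welch-Satterthwaite degrees of freedom), not a replacement for t.test(); oj_low, vc_low and the other variable names are hypothetical.

```r
data(ToothGrowth)

# Tooth lengths at the 0.5 mg dose for each supplement (hypothetical names)
oj_low <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
vc_low <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

# Squared standard error of each group mean: s^2 / n
se2_oj <- var(oj_low) / length(oj_low)
se2_vc <- var(vc_low) / length(vc_low)

# Welch t statistic: difference in means over the combined standard error
mean_diff <- mean(oj_low) - mean(vc_low)
se_diff   <- sqrt(se2_oj + se2_vc)
t_stat    <- mean_diff / se_diff

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj_low) - 1) + se2_vc^2 / (length(vc_low) - 1))

# Two-sided p-value and 95% confidence interval for the mean difference
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci_95   <- mean_diff + c(-1, 1) * qt(0.975, df_welch) * se_diff

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci_95)
```

The values should agree with the t.test() output printed above, which is a useful sanity check that the test being run is the one intended.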
medium_t_test <- t.test(subset(s_OJ, dose == 1.0)$len, subset(s_VC, dose == 1.0)$len)
print(medium_t_test)
##
## Welch Two Sample t-test
##
## data: subset(s_OJ, dose == 1)$len and subset(s_VC, dose == 1)$len
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean of x mean of y
## 22.70 16.77
Because the p-value of 0.001038 is lower than 0.05, we reject the null hypothesis of equal means at a dose of 1.0 mg. The 95% confidence interval for the difference in means (orange juice minus ascorbic acid), [2.802148, 9.057852], lies entirely above 0, so orange juice is associated with significantly greater tooth growth than ascorbic acid at this dose.
high_t_test <- t.test(subset(s_OJ, dose == 2.0)$len, subset(s_VC, dose == 2.0)$len)
print(high_t_test)
##
## Welch Two Sample t-test
##
## data: subset(s_OJ, dose == 2)$len and subset(s_VC, dose == 2)$len
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean of x mean of y
## 26.06 26.14
Because the p-value of 0.9639 is greater than 0.05, we fail to reject the null hypothesis: at a dose of 2.0 mg there is no statistically significant difference in tooth growth between the two supplements. The 95% confidence interval [-3.79807, 3.63807] includes 0, which supports the same conclusion.
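The three per-dose tests above repeat the same call with a different dose each time. One possible way to run them in a single pass and collect the results in a table is sketched below; dose_tests and its column names are hypothetical and chosen only for this illustration.

```r
data(ToothGrowth)

# Run the Welch two-sample t-test (OJ vs VC) once per dose level and
# collect the key numbers in one data frame.
# `dose_tests` and its column names are hypothetical.
dose_tests <- do.call(rbind, lapply(sort(unique(ToothGrowth$dose)), function(d) {
  res <- t.test(len ~ supp, data = subset(ToothGrowth, dose == d))
  data.frame(
    dose     = d,
    t        = unname(res$statistic),
    df       = unname(res$parameter),
    p_value  = res$p.value,
    ci_lower = res$conf.int[1],
    ci_upper = res$conf.int[2]
  )
}))

print(dose_tests)
```

Having all three comparisons side by side makes the pattern easier to read: the interval excludes 0 at the two lower doses and straddles it at the highest one.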
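After plotting the data and running Welch two-sample t-tests by supplement and by dose, we conclude that at the lower doses (0.5 mg and 1.0 mg) orange juice is associated with significantly greater tooth growth than ascorbic acid, while at a dose of 2.0 mg there is no significant difference between orange juice and ascorbic acid. These conclusions assume that the observations are independent and unpaired, and that tooth length is approximately normally distributed within each group; the Welch test does not assume equal variances between groups.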