This is an assignment for the Coursera course Statistical Inference. The goal of this assignment is to use exploratory data analysis and basic t tests to evaluate the ToothGrowth data set from the R datasets package.
The data from this experiment come from the R datasets package. Each of the 60 observations records the tooth length of a guinea pig that received one of three dose levels of Vitamin C (0.5, 1, or 2 mg) by one of two delivery methods (orange juice, coded OJ, or ascorbic acid, coded VC), with 10 animals in each dose-and-method combination.
This analysis shows that mean tooth length differs between guinea pigs that receive different doses of Vitamin C, at a significance level of 0.05. However, there is insufficient evidence to conclude that orange juice works differently than ascorbic acid supplements.
The official assignment:
Now in the second portion of the class, we’re going to analyze the ToothGrowth data in the R datasets package.
This data set is part of the base R installation. Let’s load the data and check it out. There appear to be some interesting interactions between the delivery method and the dose. For the purposes of this course I will analyze the variables separately.
data(ToothGrowth)
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
library(ggplot2)
ggplot(ToothGrowth, aes(x=dose, y=len)) +
  ggtitle("ToothGrowth Length vs Dose by Delivery Method") +
  geom_boxplot(aes(fill=factor(dose))) +
  geom_jitter() +
  facet_grid(.~supp)
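To put numbers behind the plot, here is a minimal sketch that tabulates the mean tooth length for each combination of delivery method and dose. The name `cell_means` is my own choice for illustration; everything else is base R.

```r
data(ToothGrowth)

# Mean tooth length for every supp/dose cell.
# Rows are delivery methods (OJ, VC), columns are doses (0.5, 1, 2).
cell_means <- with(ToothGrowth,
                   tapply(len, list(supp = supp, dose = dose), mean))

print(round(cell_means, 2))
```

Comparing the two rows column by column is a quick way to see whether the gap between delivery methods looks the same at every dose.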
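How do the different treatments affect tooth length? We can use a two-sample t test to compare the delivery methods. R's t.test defaults to the Welch version, which does not assume equal variances in the two groups. In this case, the 95% confidence interval for the difference in means (OJ minus VC) is -0.17 to 7.57. This contains 0 (p = 0.061), so we cannot reject the null hypothesis that there is no difference between the methods at the 0.05 level.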
## Comparing len by supp
dat_OJ <- ToothGrowth[ToothGrowth$supp=="OJ","len"]
dat_VC <- ToothGrowth[ToothGrowth$supp=="VC","len"]
t.test(dat_OJ,dat_VC)
##
## Welch Two Sample t-test
##
## data: dat_OJ and dat_VC
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
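For readers who want to see where the interval comes from, the sketch below rebuilds the Welch 95% confidence interval by hand: the standard error combines the two per-group variances, and the degrees of freedom come from the Welch-Satterthwaite approximation. The variable names (`oj`, `vc`, `ci_manual`, and so on) are mine and only for illustration.

```r
data(ToothGrowth)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Variance of each sample mean
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Standard error of the difference in means (no equal-variance assumption)
se <- sqrt(v_oj + v_vc)

# Welch-Satterthwaite degrees of freedom
df_welch <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# 95% interval: difference in means plus or minus t quantile times se
diff_means <- mean(oj) - mean(vc)
ci_manual <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se

print(df_welch)
print(ci_manual)
```

The result should agree with the `conf.int` component returned by `t.test(oj, vc)`.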
Next let’s look at the various doses. I will assume that 1 mg is the standard dose achieved in the guinea pig’s natural diet. We want to know whether an increase or decrease in dose will change tooth growth.
First, we compare the standard dose of 1 mg with the lower dose of 0.5 mg. The 95% confidence interval for the difference in mean tooth length between these two doses (1 mg minus 0.5 mg) is 6.28 to 11.98. This does not contain 0, so we can reject the null hypothesis that the two doses produce the same mean tooth length. It appears that a dose of 1 mg causes more tooth growth than a dose of 0.5 mg.
## Comparing len by dose
dat_std_dose <- ToothGrowth[ToothGrowth$dose==1,"len"]
dat_low_dose <- ToothGrowth[ToothGrowth$dose==0.5,"len"]
dat_high_dose <- ToothGrowth[ToothGrowth$dose==2,"len"]
t.test(dat_std_dose,dat_low_dose)
##
## Welch Two Sample t-test
##
## data: dat_std_dose and dat_low_dose
## t = 6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 6.276219 11.983781
## sample estimates:
## mean of x mean of y
## 19.735 10.605
Finally, we compare the standard dose of 1 mg with the higher dose of 2 mg. The 95% confidence interval for the difference in mean tooth length between these two doses (1 mg minus 2 mg) is -9.00 to -3.73. This does not contain 0, so we can reject the null hypothesis that the two doses produce the same mean tooth length. Because the interval is entirely negative, it appears that a dose of 2 mg causes more tooth growth than the standard dose of 1 mg.
t.test(dat_std_dose,dat_high_dose)
##
## Welch Two Sample t-test
##
## data: dat_std_dose and dat_high_dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean of x mean of y
## 19.735 26.100
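Since the two dose comparisons follow the same recipe, they can be collected into one small table. This is only a sketch: `compare_doses` is a hypothetical helper I wrote for illustration, and the Holm adjustment via `p.adjust` is an optional extra check for running two tests at once, not part of the assignment.

```r
data(ToothGrowth)

# Hypothetical helper: Welch t test of tooth length at dose a versus dose b
compare_doses <- function(a, b) {
  x <- ToothGrowth$len[ToothGrowth$dose == a]
  y <- ToothGrowth$len[ToothGrowth$dose == b]
  tt <- t.test(x, y)
  data.frame(dose_x = a, dose_y = b,
             lower = tt$conf.int[1], upper = tt$conf.int[2],
             p_value = tt$p.value)
}

# Same two comparisons as above: standard vs low, standard vs high
dose_tests <- rbind(compare_doses(1, 0.5), compare_doses(1, 2))

# Optional: Holm-adjusted p-values for the pair of tests
dose_tests$p_holm <- p.adjust(dose_tests$p_value, method = "holm")

print(dose_tests)
```

The `lower` and `upper` columns should reproduce the confidence intervals printed by the individual `t.test` calls.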
This analysis assumes that the experiment was suitably randomized to prevent biasing the results. If, for example, guinea pigs with faster tooth growth were pre-selected for one group of the experiment, then these results would be biased. The t tests also treat the groups as independent samples and, being Welch tests, do not assume equal variances. Finally, when analyzing the impact of delivery method, I pooled all three doses, which amounts to assuming that dose does not matter. This is obviously false, as the dose comparisons show. However, I analyzed the data this way to show how other factors can influence the test results.
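One way to relax the pooling assumption is to compare the delivery methods separately at each dose. The sketch below is an illustration only, and I have not folded its output into the conclusions of this report. The name `supp_within_dose` is mine, and each test compares just 10 animals per group.

```r
data(ToothGrowth)

# Split the data by dose and run a Welch t test of len by supp in each piece.
# The difference is OJ minus VC because OJ is the first factor level.
by_dose <- lapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  tt <- t.test(len ~ supp, data = d)
  c(lower = tt$conf.int[1], upper = tt$conf.int[2], p_value = tt$p.value)
})

# One row per dose (0.5, 1, 2), columns: lower, upper, p_value
supp_within_dose <- do.call(rbind, by_dose)

print(supp_within_dose)
```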
This simple analysis shows that mean tooth length differs between guinea pigs that receive different doses of Vitamin C, at a significance level of 0.05. However, there is insufficient evidence to conclude that orange juice works differently than ascorbic acid supplements.