Examining Tooth Growth Data in R
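One of the standard learning data sets included in R is the “ToothGrowth” data set. It records the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three Vitamin C dosage levels (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (VC). That gives 10 animals for each combination of dosage and delivery method.
The data set contains 60 observations of 3 variables: tooth length (len), supplement type (supp), and dosage (dose).
We set out to explore and summarize the data, and then to use confidence intervals to compare tooth growth by supplement type and dosage.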
Let’s first request a summary of the data:
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
The summary shows the minimum and maximum tooth length (4.20 and 33.90), the minimum and maximum dosage (0.5 and 2.0 mg), and that half of the cases received the dose via the VC method and the other half via the OJ method.
Let’s plot tooth length as a function of dosage, color coded by the type of supplement used.
From the graph it appears that as dosage increases, tooth length also increases. It also appears that at the lower dosages (0.5 and 1.0 mg) OJ delivery leads to more tooth growth than VC delivery.
We need to use confidence intervals to investigate if that is actually true.
Each summary shows the mean length (lenmean), the standard deviation of the length (lensd), and the number of observations (count). We will summarize the data three ways.
First let’s examine by dosage and supplement type.
## Source: local data frame [6 x 5]
## Groups: supp
##
##   supp dose lenmean    lensd count
## 1   OJ  0.5   13.23 4.459709    10
## 2   OJ  1.0   22.70 3.910953    10
## 3   OJ  2.0   26.06 2.655058    10
## 4   VC  0.5    7.98 2.746634    10
## 5   VC  1.0   16.77 2.515309    10
## 6   VC  2.0   26.14 4.797731    10
Second, let’s summarize by supplement type only (ignoring dosage).
## Source: local data frame [2 x 4]
##
##   supp  lenmean    lensd count
## 1   OJ 20.66333 6.605561    30
## 2   VC 16.96333 8.266029    30
Third, let’s summarize by dosage only (ignoring supplement type).
## Source: local data frame [3 x 4]
##
##   dose lenmean    lensd count
## 1  0.5  10.605 4.499763    20
## 2  1.0  19.735 4.415436    20
## 3  2.0  26.100 3.774150    20
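Mean tooth length rises with dosage (10.6, 19.7, and 26.1 at 0.5, 1.0, and 2.0 mg), so vitamin C looks to be related to tooth growth generally. It also appears that OJ is a better supplement method than VC overall (a mean of 20.66 versus 16.96), although the summaries alone cannot tell us whether these differences are significant.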
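To see where the OJ advantage comes from, the group means can be laid out as a small supplement-by-dosage table. The following is a minimal sketch in base R; the names `mean_len` and `oj_minus_vc` are made up for illustration and are not part of the original analysis.

```r
data(ToothGrowth)

# Cross-tabulate mean tooth length: rows are supplement types, columns are doses.
mean_len <- with(ToothGrowth, tapply(len, list(supp = supp, dose = dose), mean))
print(round(mean_len, 2))

# Difference in means (OJ minus VC) at each dose; positive values favor OJ.
oj_minus_vc <- mean_len["OJ", ] - mean_len["VC", ]
print(round(oj_minus_vc, 2))
```

The differences are sizable at 0.5 and 1.0 mg and close to zero at 2.0 mg, which is the pattern the confidence intervals are meant to check.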
For all confidence intervals, we will use a 95% confidence level.
First let’s see if there is a difference between OJ and VC with all dosage levels pooled together.
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
When all the data is lumped together, we cannot conclude that the two delivery methods differ, since the 95% confidence interval for the difference in means (-0.17 to 7.57) includes 0. That is not the same as showing the two methods are equal: the p-value here is 0.06, so the result was very close to being significant. But close only counts in horseshoes and hand grenades.
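For readers curious where this interval comes from, here is a rough sketch that rebuilds it by hand: the difference in means, plus or minus a t quantile times the standard error, with the degrees of freedom taken from the Welch-Satterthwaite approximation. The variable names (`v_oj`, `welch_df`, `manual_ci`, and so on) are my own and only illustrative.

```r
data(ToothGrowth)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Each group's contribution to the squared standard error: s^2 / n
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

diff_means <- mean(oj) - mean(vc)
se <- sqrt(v_oj + v_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# 95% interval: estimate +/- t quantile * standard error
manual_ci <- diff_means + c(-1, 1) * qt(0.975, welch_df) * se
print(manual_ci)

# Compare against the interval reported by t.test()
builtin_ci <- as.numeric(t.test(len ~ supp, data = ToothGrowth)$conf.int)
print(all.equal(manual_ci, builtin_ci))
```

The hand-built interval should match the t.test() output, which confirms that the reported interval does not assume equal variances in the two groups.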
Let’s subdivide the data and see if there’s a difference between OJ and VC at different dosage levels. We will see if there is a difference at the 0.5, 1.0, and 2.0 mg levels.
At the 0.5 mg dosage is there a difference between VC and OJ?
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
Yes, there is! The confidence interval does not include zero, so we can conclude that there is a significant difference between VC and OJ supplement methods at the 0.5 mg dosage.
At the 1.0 mg dosage is there a difference between VC and OJ?
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
Yes, there is! The confidence interval does not include zero, so we can conclude that there is a significant difference between VC and OJ supplement methods at the 1.0 mg dosage.
At the 2.0 mg dosage is there a difference between VC and OJ?
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
No, there is not! The confidence interval includes zero, so we can conclude that there is no significant difference between VC and OJ supplement methods at the 2.0 mg dosage.
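The three per-dosage comparisons can also be collected into a single table, which makes them easier to read side by side. This is only a sketch of one way to do it in base R; `by_dose` and its column names are hypothetical, not part of the original code.

```r
data(ToothGrowth)

# Run the OJ-vs-VC Welch test within each dosage level and keep the key numbers.
doses <- sort(unique(ToothGrowth$dose))
by_dose <- do.call(rbind, lapply(doses, function(d) {
  res <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
  data.frame(dose = d,
             ci_lower = res$conf.int[1],
             ci_upper = res$conf.int[2],
             p_value = res$p.value)
}))

# An interval that excludes zero marks a significant difference at the 95% level.
by_dose$significant <- by_dose$ci_lower > 0 | by_dose$ci_upper < 0
print(by_dose)
```

Each row repeats one of the tests above, so the `significant` column should read TRUE at 0.5 and 1.0 mg and FALSE at 2.0 mg.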
Now let’s see if there is a significant difference between dosage levels. We will examine OJ and VC separately.
We will test whether a 0.5 mg dose via OJ is significantly different from a 1.0 mg dose via OJ.
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.415634 -5.524366
## sample estimates:
## mean in group 0.5 mean in group 1
## 13.23 22.70
The confidence interval does not include zero, so we conclude there is a significant difference between a 0.5 mg dose and a 1.0 mg dose via OJ.
We will test whether a 1.0 mg dose via OJ is significantly different from a 2.0 mg dose via OJ.
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.5314425 -0.1885575
## sample estimates:
## mean in group 1 mean in group 2
## 22.70 26.06
The confidence interval does not include zero, so we conclude there is a significant difference between a 1.0 mg dose and a 2.0 mg dose via OJ, though only narrowly: the upper bound is -0.19 and the p-value is 0.039.
We will test whether a 0.5 mg dose via VC is significantly different from a 1.0 mg dose via VC.
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.265712 -6.314288
## sample estimates:
## mean in group 0.5 mean in group 1
## 7.98 16.77
The confidence interval does not include zero, so we conclude there is a significant difference between a 0.5 mg dose and a 1.0 mg dose via VC.
We will test whether a 1.0 mg dose via VC is significantly different from a 2.0 mg dose via VC.
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.054267 -5.685733
## sample estimates:
## mean in group 1 mean in group 2
## 16.77 26.14
The confidence interval does not include zero, so we conclude there is a significant difference between a 1.0 mg dose and a 2.0 mg dose via VC.
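The four dosage-step comparisons follow the same recipe, so they can be gathered in one loop. The sketch below is one possible arrangement, with hypothetical names (`steps`, `dose_steps`). Note that each interval is for the lower dose minus the higher dose, so a fully negative interval means growth increased with dosage.

```r
data(ToothGrowth)

# Pairs of adjacent dosage levels to compare within each supplement type.
steps <- list(c(0.5, 1), c(1, 2))

results <- list()
for (s in levels(ToothGrowth$supp)) {
  for (step in steps) {
    sub <- ToothGrowth[ToothGrowth$supp == s & ToothGrowth$dose %in% step, ]
    res <- t.test(len ~ dose, data = sub)
    results[[length(results) + 1]] <- data.frame(
      supp = s, from = step[1], to = step[2],
      ci_lower = res$conf.int[1], ci_upper = res$conf.int[2],
      p_value = res$p.value)
  }
}
dose_steps <- do.call(rbind, results)
print(dose_steps)
```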
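Based on this data we can conclude the following: tooth length increases significantly with each step up in dosage (0.5 to 1.0 mg and 1.0 to 2.0 mg) for both OJ and VC; OJ produces significantly more growth than VC at the 0.5 and 1.0 mg dosages; at the 2.0 mg dosage there is no significant difference between OJ and VC; and with all dosages pooled, the difference between OJ and VC is not significant at the 95% level.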
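Assumptions
The t-tests above treat the groups being compared as independent (unpaired) samples and do not assume equal variances (Welch's test).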
# Load package for plotting, and for data analysis
library(ggplot2)
library(dplyr)
# Load data
data(ToothGrowth)
# Get data summary
summary(ToothGrowth)
# Plot the length (y) by the dosage (x), colored by supplement type
g <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_point(aes(color = supp))
print(g)
# Summarize by dose and supp: mean length, standard deviation, and count.
by_supp_dose <- ToothGrowth %>%
  group_by(supp, dose) %>%
  summarize(lenmean = mean(len), lensd = sd(len), count = n())
print(by_supp_dose)
# Summarize by supp only.
by_supp <- ToothGrowth %>%
  group_by(supp) %>%
  summarize(lenmean = mean(len), lensd = sd(len), count = n())
print(by_supp)
# Summarize by dose only.
# (Named by_dose rather than c, so that base R's c() is not masked.)
by_dose <- ToothGrowth %>%
  group_by(dose) %>%
  summarize(lenmean = mean(len), lensd = sd(len), count = n())
print(by_dose)
# Compare OJ to VC at all dosage levels.
# Unpaired is t.test's default; the paired argument is left out because
# newer R releases reject it in the formula interface.
t.test(len ~ supp, var.equal=FALSE, data=ToothGrowth)
# Compare low dosage OJ and VC
lowdose <- ToothGrowth[ToothGrowth$dose==0.5, ]
t.test(len ~ supp, var.equal=FALSE, data=lowdose)
# Compare mid dosage OJ and VC
middose <- ToothGrowth[ToothGrowth$dose==1.0, ]
t.test(len ~ supp, var.equal=FALSE, data=middose)
# Compare high dosage OJ and VC
highdose <- ToothGrowth[ToothGrowth$dose==2.0, ]
t.test(len ~ supp, var.equal=FALSE, data=highdose)
# Compare 0.5 to 1.0 via OJ
OJlowtomid <- filter(ToothGrowth, dose < 2, supp=="OJ")
t.test(len ~ dose, var.equal=FALSE, data=OJlowtomid)
# Compare 1.0 to 2.0 via OJ
OJmidtohigh <- filter(ToothGrowth, dose > 0.5, supp=="OJ")
t.test(len ~ dose, var.equal=FALSE, data=OJmidtohigh)
# Compare 0.5 to 1.0 via VC
VClowtomid <- filter(ToothGrowth, dose < 2, supp=="VC")
t.test(len ~ dose, var.equal=FALSE, data=VClowtomid)
# Compare 1.0 to 2.0 via VC
VCmidtohigh <- filter(ToothGrowth, dose > 0.5, supp=="VC")
t.test(len ~ dose, var.equal=FALSE, data=VCmidtohigh)