This analysis explores the ToothGrowth data from the R datasets package.
library(datasets); data(ToothGrowth) #Loading the data
library(dplyr); library(ggplot2)
Structure of data.
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
Head and tail of data.
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
## 55 24.8 OJ 2
## 56 30.9 OJ 2
## 57 26.4 OJ 2
## 58 27.3 OJ 2
## 59 29.4 OJ 2
## 60 23.0 OJ 2
Summary of data.
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
As can be gleaned from the statistics above, this data set has 60 observations of 3 variables. R’s help page for this dataset gives some better context for the dataset on the whole.
Since the dose variable takes only one of three possible values (0.5, 1, or 2), I will convert it from a num to a factor.
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
The first thing I will do is make a couple of boxplots to explore the potential relationship between length (len) and the other two variables: supplement (supp) and dose.
par(mfrow = c(1,2))
boxplot(len ~ dose, ToothGrowth, col = 2:4, xlab = "Dose", ylab = "Length")
boxplot(len ~ supp, ToothGrowth, col = 2:3, xlab = "Supplement", ylab = "Length")
The first plot indicates that larger doses of vitamin C may be associated with longer odontoblast length.
The second plot indicates that orange juice (OJ) as a supplement may be associated with larger length of odontoblasts compared to ascorbic acid (VC) as a supplement.
Digging a little deeper, let’s look at the average lengths for each pair of dose and supp.
as.data.frame(ToothGrowth %>% group_by(supp,dose) %>% summarize(MeanLength = mean(len)))
## supp dose MeanLength
## 1 OJ 0.5 13.23
## 2 OJ 1 22.70
## 3 OJ 2 26.06
## 4 VC 0.5 7.98
## 5 VC 1 16.77
## 6 VC 2 26.14
Let’s take a look at these same data points with a plot.
ggplot(ToothGrowth, aes(supp, len)) + geom_boxplot(aes(fill = dose)) + facet_grid(.~dose) +
labs(x = "Supplement", y = "Length")
According to this data, OJ appears to be associated with longer length than VC for dose levels 0.5 and 1.0, but this association no longer seems to hold for dose level 2.0. In fact, for dose level 2.0, VC has a higher mean length (26.14) than OJ (26.06), though the difference is negligible. It’s possible that OJ has a stronger association with length than VC only at lower dose levels, and that VC has a stronger association with length than OJ at higher dose levels. Repeating this experiment with an additional, higher dose level would help test this hypothesis.
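The dose-by-dose comparison described above can be sketched in code. This is a minimal illustration rather than part of the original analysis: it runs one equal-variance t-test of OJ against VC within each dose level, and the names `tg`, `by_dose`, and `supp_by_dose` are hypothetical helpers introduced here. Because three tests are run on small groups (10 animals per cell), the resulting p-values are best read as exploratory.

```r
library(datasets)
data(ToothGrowth)

# Work on a copy so the result does not depend on whether `dose`
# has already been converted to a factor elsewhere.
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# One two-sample Student's t-test (OJ vs. VC) per dose level,
# using the same settings as the tests in this report.
by_dose <- lapply(split(tg, tg$dose), function(d) {
  t.test(len ~ supp, data = d, var.equal = TRUE, conf.level = 0.99)
})

# Collect the group means and the p-value for each dose level.
supp_by_dose <- data.frame(
  dose    = names(by_dose),
  mean_OJ = sapply(by_dose, function(tt) unname(tt$estimate[1])),
  mean_VC = sapply(by_dose, function(tt) unname(tt$estimate[2])),
  p_value = sapply(by_dose, function(tt) tt$p.value),
  row.names = NULL
)
print(supp_by_dose)
```

The `mean_OJ` and `mean_VC` columns should reproduce the MeanLength table above, which is a quick check that the split was done correctly.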
As was seen in the exploratory section above, higher doses of vitamin C appeared to be associated with longer odontoblast length. To further investigate whether the average difference in lengths by dose levels is significant, I will compare the average lengths for the largest dose (2) with those for the smallest dose (.5) using a two-sided Student’s t-test, assuming equal variances and independence.
dose.5 <- ToothGrowth$len[ToothGrowth$dose == .5]
dose2 <- ToothGrowth$len[ToothGrowth$dose == 2]
t.test(dose2, dose.5, var.equal = TRUE, conf.level = .99)
##
## Two Sample t-test
##
## data: dose2 and dose.5
## t = 11.799, df = 38, p-value = 2.838e-14
## alternative hypothesis: true difference in means is not equal to 0
## 99 percent confidence interval:
## 11.93407 19.05593
## sample estimates:
## mean of x mean of y
## 26.100 10.605
The results of the t-test suggest that dose 2 is associated with longer lengths than dose .5: the 99% confidence interval for the difference in means (11.93 to 19.06) lies entirely above 0. Further, the very low p-value (2.838e-14) and large absolute t-statistic (11.799) indicate that the difference is statistically significant at the 1% level.
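To make the equal-variance test less of a black box, the sketch below recomputes the pooled-variance t statistic, degrees of freedom, p-value, and 99% confidence interval by hand. The variable names (`x`, `y`, `sp2`, `se`, `t_stat`, `ci_99`) are illustrative only; the numbers should agree with the `t.test` output above.

```r
library(datasets)
data(ToothGrowth)

# Lengths for the largest and smallest dose (dose is numeric after data()).
x <- ToothGrowth$len[ToothGrowth$dose == 2]
y <- ToothGrowth$len[ToothGrowth$dose == 0.5]
n_x <- length(x)
n_y <- length(y)

# Pooled variance: a weighted average of the two sample variances.
sp2 <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)

# Standard error of the difference in means under equal variances.
se <- sqrt(sp2 * (1 / n_x + 1 / n_y))

# Test statistic, degrees of freedom, and two-sided p-value.
t_stat  <- (mean(x) - mean(y)) / se
df      <- n_x + n_y - 2
p_value <- 2 * pt(-abs(t_stat), df)

# 99% confidence interval for the difference in means.
ci_99 <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.995, df) * se

print(c(t = t_stat, df = df, p = p_value))
print(ci_99)
```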
As was seen in the exploratory section above, vitamin C intake via orange juice (OJ) appeared to be associated with longer odontoblast length compared to vitamin C intake via ascorbic acid (VC). To further investigate whether the difference in lengths by supplement type is significant, I will compare the average lengths for the two supplement types (OJ and VC) using a two-sided Student’s t-test, assuming equal variances and independence.
doseOJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
doseVC <- ToothGrowth$len[ToothGrowth$supp == "VC"]
t.test(doseOJ, doseVC, var.equal = TRUE, conf.level = .99)
##
## Two Sample t-test
##
## data: doseOJ and doseVC
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 99 percent confidence interval:
## -1.445056 8.845056
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
The results of the t-test indicate that one cannot conclude that supp OJ is associated with longer lengths than supp VC, since the 99% confidence interval (-1.45 to 8.85) includes 0. The p-value (0.06039) is also above .05, and therefore above the .01 level implied by the 99% interval, so the observed difference is not unusual enough to reject the null hypothesis that the difference in means is 0.
The analysis of odontoblast length by dose levels leads one to the conclusion that higher dosage levels of vitamin C are associated with longer odontoblast length. Data for only three dose levels were included, however, and the relevant t-test looked only at the largest and smallest; it’s unclear whether this association would hold up at even larger dose levels (5, 10, etc.). It’s worth noting that the difference in mean length between dose level .5 and dose level 1 is larger than the difference in mean length between dose level 1 and dose level 2, suggesting a diminishing increase in length at larger dose levels.
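The diminishing increase can be checked directly from the per-dose means. The short sketch below is illustrative (the names `mean_by_dose`, `increments`, and `per_unit` are introduced here); it computes the mean length at each dose, the step-to-step increase, and the increase per unit of dose, since the second step spans twice the dose range of the first.

```r
library(datasets)
data(ToothGrowth)

# Mean length at each dose level, pooled over both supplement types.
mean_by_dose <- sapply(split(ToothGrowth$len, ToothGrowth$dose), mean)

# Increase in mean length from one dose level to the next.
increments <- diff(mean_by_dose)

# Increase per unit of dose: the steps are 0.5 -> 1 and 1 -> 2.
per_unit <- increments / diff(c(0.5, 1, 2))

print(mean_by_dose)  # 10.605, 19.735, 26.100
print(increments)    # 9.130 (0.5 -> 1), 6.365 (1 -> 2)
print(per_unit)      # 18.260, 6.365
```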
The analysis of odontoblast length by supplement type does not support the conclusion that OJ is associated with longer odontoblast length than VC.
Both t-tests above assume equal variances and unpaired (independent) observations. The conclusions also assume that the effects of dose and supp on odontoblast length hold for teeth in general, and not merely the teeth of guinea pigs.
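One way to probe the equal-variance assumption is to rerun a test without it. The sketch below is an illustration rather than part of the original analysis: it repeats the supplement comparison with Welch's t-test (R's default, `var.equal = FALSE`) alongside the Student's version used above. The names `student` and `welch` are hypothetical. With equal group sizes the two t statistics coincide, so any difference shows up only in the degrees of freedom, the p-value, and the width of the interval.

```r
library(datasets)
data(ToothGrowth)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Student's t-test (equal variances assumed), as used in this report.
student <- t.test(oj, vc, var.equal = TRUE, conf.level = 0.99)

# Welch's t-test (no equal-variance assumption).
welch <- t.test(oj, vc, var.equal = FALSE, conf.level = 0.99)

# Side-by-side comparison of the two tests.
print(data.frame(
  test    = c("Student", "Welch"),
  t       = c(unname(student$statistic), unname(welch$statistic)),
  df      = c(unname(student$parameter), unname(welch$parameter)),
  p_value = c(student$p.value, welch$p.value)
))
```

If the two rows lead to the same conclusion, the result is not sensitive to the equal-variance assumption for this comparison.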