Objective

The data and results presented below are a brief exploratory analysis of the ToothGrowth dataset. The goal is to summarize the data quickly and highlight any notable patterns. Welch two-sample t-tests are then used to test whether supplement type or dose is associated with tooth length.

Loading and Processing the Data

The following code chunk loads the required libraries and the “ToothGrowth” dataset.

library(datasets); library(ggplot2); library(dplyr)
tooth <- ToothGrowth

A quick look shows a relatively clean dataset with 60 observations of 3 variables: tooth length (len), supplement type (supp), and dose in milligrams per day (dose).

head(tooth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
str(tooth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
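Before comparing group means, it can help to confirm that every supplement/dose combination has the same number of observations. The sketch below uses base R's table() on the built-in ToothGrowth data; the object name counts is illustrative.

```r
library(datasets)

# Cross-tabulate supplement type against dose; each cell counts observations
counts <- table(ToothGrowth$supp, ToothGrowth$dose)
counts

# TRUE when every supplement/dose cell holds the same number of observations
length(unique(as.vector(counts))) == 1
```

A balanced layout like this means each group mean in the following sections is based on the same sample size.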

Exploring the Data

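Let’s briefly explore the data by computing the mean tooth length for each combination of supplement and dose, then plotting those means.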

#Mean tooth length for each supplement/dose combination, used for the plot
mean_len <- aggregate(len ~ supp + dose, data = tooth, mean)
mean_len <- rename(mean_len, length = len, supplement = supp)
mean_len
##   supplement dose length
## 1         OJ  0.5  13.23
## 2         VC  0.5   7.98
## 3         OJ  1.0  22.70
## 4         VC  1.0  16.77
## 5         OJ  2.0  26.06
## 6         VC  2.0  26.14
#Bar chart of mean tooth length by supplement, with one panel per dose
p1 <- ggplot(mean_len, aes(x = supplement, y = length, fill = supplement)) +
  geom_col(show.legend = FALSE) +
  facet_grid(. ~ dose, labeller = label_both) +
  ggtitle("Mean Tooth Length by Supplement Type") +
  xlab("Supplement") +
  ylab("Mean Tooth Length")
p1

The plot suggests two points to investigate. First, dose appears to play a key role: mean tooth length rises with dose for both supplements. Second, tooth length may be associated with supplement type, since the OJ means exceed the VC means at 0.5 and 1 mg/day, although the two are nearly equal at 2 mg/day.
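To put numbers on the supplement pattern in the plot, the sketch below takes the difference between the OJ and VC means at each dose with base R's tapply(). It recomputes the means from ToothGrowth directly rather than reusing mean_len, and the names cell_means and oj_minus_vc are illustrative.

```r
library(datasets)

# Mean tooth length with doses as rows and supplements as columns
cell_means <- with(ToothGrowth, tapply(len, list(dose, supp), mean))
cell_means

# Difference in means (OJ minus VC) at each dose
oj_minus_vc <- cell_means[, "OJ"] - cell_means[, "VC"]
round(oj_minus_vc, 2)
```

From the table of means above, the differences are 5.25, 5.93 and -0.08 at 0.5, 1 and 2 mg/day, so any supplement effect appears concentrated at the lower doses.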

Numerical Summary of the Data

The first numerical summary covers all tooth lengths, regardless of supplement or dose.

#Overall summary of tooth lengths regardless of dose or supplement
summary(tooth$len)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20   13.07   19.25   18.81   25.27   33.90

Next, tooth length is summarized separately for each combination of supplement type and dose.

#Summary of lengths by supplement type and dosage
with(tooth, by(len, list(supp, dose), summary))
## : OJ
## : 0.5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.20    9.70   12.25   13.23   16.18   21.50 
## -------------------------------------------------------- 
## : VC
## : 0.5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20    5.95    7.15    7.98   10.90   11.50 
## -------------------------------------------------------- 
## : OJ
## : 1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   14.50   20.30   23.45   22.70   25.65   27.30 
## -------------------------------------------------------- 
## : VC
## : 1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   13.60   15.28   16.50   16.77   17.30   22.50 
## -------------------------------------------------------- 
## : OJ
## : 2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   22.40   24.57   25.95   26.06   27.07   30.90 
## -------------------------------------------------------- 
## : VC
## : 2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.50   23.38   25.95   26.14   28.80   33.90
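The six summaries above can be condensed into a single table of group means and standard deviations. The sketch below passes a small anonymous function to aggregate(); the name cell_stats is illustrative. Comparing the standard deviations gives a quick sense of whether the groups have similar spread, which is one reason a t-test that does not assume equal variances is a reasonable default.

```r
library(datasets)

# Mean and standard deviation of tooth length for each supplement/dose cell.
# Because FUN returns two values, aggregate() stores them as a two-column
# matrix inside the `len` column.
cell_stats <- aggregate(len ~ supp + dose, data = ToothGrowth,
                        FUN = function(x) c(mean = mean(x), sd = sd(x)))
cell_stats
```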

Statistical Analysis

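We want to test whether tooth length differs by supplement type or by dose. Each comparison involves two groups, so we use two-sample t-tests; R’s t.test() performs the Welch version by default, which does not assume that the two groups have equal variances.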

Supplement as a Factor

First, let’s investigate supplement type and its relationship to tooth length. The null hypothesis is that mean tooth length is the same for OJ (orange juice) and VC (ascorbic acid). Note: before running the test, dose is converted to a factor and the columns are renamed for readability.

#Convert dose to a factor and give the columns descriptive names
tooth$dose <- as.factor(tooth$dose)
colnames(tooth) <- c("length", "supplement", "dose")
#Length vs. Supplement
t.test(formula = length ~ supplement, data = tooth)
## 
##  Welch Two Sample t-test
## 
## data:  length by supplement
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

The 95% confidence interval for the difference in means, [-0.171, 7.571], includes 0, and the p-value (0.061) is above 0.05, so we do not reject the null hypothesis. With the doses pooled, the data do not provide convincing evidence that supplement type affects tooth length.
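To show what t.test() computes in this comparison, the sketch below reproduces the Welch t statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value and the 95% confidence interval from the group means and variances. It works on the built-in ToothGrowth columns (len, supp); the variable names are illustrative.

```r
library(datasets)

# Tooth lengths for each supplement type (30 observations each)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean, and the unpooled standard error
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se <- sqrt(se2_oj + se2_vc)

# Welch t statistic: difference in means divided by the unpooled standard error
diff_means <- mean(oj) - mean(vc)
t_stat <- diff_means / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se

c(t = t_stat, df = df_welch, p = p_value)
ci
```

The printed values should agree with the t.test() output for this comparison (t = 1.9153, df = 55.309, p-value = 0.06063, interval [-0.171, 7.571]).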

Dosage as a Factor

Now, let’s investigate how dose relates to tooth length. Dose has three levels (0.5, 1, and 2 mg/day), so we run one test for each pair of doses. The hypothesis has the same form as before: mean tooth length is equal for doses A and B, where A and B are specified in each test.
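Selecting the rows for a pair of doses calls for `%in%` rather than `==`. Comparing a column with a length-2 vector using `==` recycles that vector row by row, so only rows whose dose happens to line up with the alternating pattern are kept. The sketch below contrasts the two on a copy of the built-in data; the names tg, recycled and matched are illustrative.

```r
library(datasets)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# `==` recycles c("0.5", "1") along the 60 rows: row 1 is compared with "0.5",
# row 2 with "1", row 3 with "0.5", and so on
recycled <- subset(tg, dose == c("0.5", "1"))

# `%in%` keeps every row whose dose is either 0.5 or 1
matched <- subset(tg, dose %in% c("0.5", "1"))

nrow(recycled)  # 20
nrow(matched)   # 40
```

With `==`, each dose group shrinks from 20 observations to 10, which roughly halves the degrees of freedom of any t-test run on the subset.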

test1 <- subset(tooth, dose %in% c("0.5", "1"))
test2 <- subset(tooth, dose %in% c("1", "2"))
test3 <- subset(tooth, dose %in% c("0.5", "2"))

t.test(length ~ dose, data = test1, alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  length by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735
t.test(length ~ dose, data = test2, alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  length by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100
t.test(length ~ dose, data = test3, alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  length by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

In all three comparisons the p-value is well below 0.05 and the 95% confidence interval for the difference in means excludes 0, so we reject the two-sided null hypothesis each time: higher doses are associated with longer teeth.
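The three comparisons can also be run in one call with base R’s pairwise.t.test(). In this sketch, pool.sd = FALSE makes it call t.test() separately for each pair (Welch by default), and p.adjust.method = "none" leaves the p-values unadjusted so they are comparable to individual t.test() calls. The object name pw is illustrative.

```r
library(datasets)

# All pairwise Welch t-tests of tooth length between dose levels
pw <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                      pool.sd = FALSE, p.adjust.method = "none")
pw

# TRUE if every pairwise p-value is below 0.05
all(pw$p.value < 0.05, na.rm = TRUE)
```

Switching to p.adjust.method = "bonferroni" would account for making three comparisons; since even the largest p-value reported for these comparisons stays well below 0.05 after multiplying by three, the conclusion would not change.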

Conclusion

In this brief analysis, we found no compelling evidence that supplement type affects tooth length when the doses are pooled (p = 0.061). In contrast, all three pairwise dose comparisons showed a significant difference, with longer teeth at higher doses. This matches the bar chart from the exploratory analysis, where mean tooth length increases steadily with dose for both supplements.
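The increasing trend can be checked directly by pooling both supplements and computing the mean tooth length at each dose. This is a minimal sketch using base R’s tapply(); the name dose_means is illustrative.

```r
library(datasets)

# Mean tooth length at each dose, pooling both supplements
dose_means <- with(ToothGrowth, tapply(len, dose, mean))
dose_means

# TRUE if the mean strictly increases from one dose to the next
all(diff(dose_means) > 0)
```

From the supplement-by-dose means reported earlier, these pooled means work out to 10.605, 19.735 and 26.100.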