ToothGrowth Analysis

Exploratory data analysis

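We’ll start by loading the data and inspecting its structure. The code in this analysis relies on dplyr (for glimpse() and the %>% pipelines) and ggplot2 (for the plots).

library(dplyr)
library(ggplot2)

data("ToothGrowth")
glimpse(ToothGrowth)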

## Observations: 60
## Variables: 3
## $ len  <dbl> 4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0, 16....
## $ supp <fctr> VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, ...
## $ dose <dbl> 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1....

table(ToothGrowth$dose)

## 
## 0.5   1   2 
##  20  20  20

levels(ToothGrowth$supp)

## [1] "OJ" "VC"

Dose takes only three distinct values (0.5, 1 and 2), each with 20 observations, so it is more useful as a factor than as a continuous variable.

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
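
One thing to keep in mind after this conversion is that a factor stores its levels as text. The small sketch below works on a copy of the data (the name tg and the other variable names are only for illustration). It shows that as.numeric() on the factor returns level codes rather than doses, while comparisons such as dose == 0.5 or dose %in% c(0.5, 1) still behave as expected because R compares against the level labels.

```r
# Work on a copy so the ToothGrowth object used in the analysis is untouched
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

dose_levels <- levels(tg$dose)                    # level labels are character strings
dose_codes  <- as.numeric(tg$dose)                # internal codes 1, 2, 3 (not doses)
dose_values <- as.numeric(as.character(tg$dose))  # recovers 0.5, 1, 2

# Comparisons against numbers go through the level labels
n_low_mid <- sum(tg$dose %in% c(0.5, 1))
n_half    <- sum(tg$dose == 0.5)

print(dose_levels)
print(unique(dose_codes))
print(unique(dose_values))
print(c(n_low_mid, n_half))
```

This is why the subset() calls later in the analysis can keep using numeric dose values even though dose is a factor by then.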

Let’s look at how the length of odontoblasts is distributed at each dose level.

doselen <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_violin(aes(color = dose), trim = FALSE) +
  ggtitle("Length of odontoblasts vs. Dose Level") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Vitamin C Dose level") +
  ylab("Length of odontoblasts")
doselen

The distributions shift upward as the dose increases: higher doses go with longer odontoblasts. Now let’s bring the delivery method (supp) into the analysis.

doselensupp <- doselen + facet_wrap(~ supp) +
  ggtitle("Length of odontoblasts vs. Dose Level with Delivery methods")
doselensupp

For example, at dose level 2 the length of odontoblasts is much less spread out with OJ (orange juice) than with VC (ascorbic acid), while the VC mean is only marginally higher (26.14 vs. 26.06). Dose levels 0.5 and 1 look different: there OJ gives clearly higher means than VC (13.23 vs. 7.98 and 22.70 vs. 16.77).
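
The visual impression from the faceted plot can be checked numerically. The sketch below uses base R's tapply() on a copy of the data (tg, cell_means and cell_sds are illustrative names) to tabulate the mean and standard deviation of len for every combination of supp and dose.

```r
tg <- datasets::ToothGrowth

# Rows are delivery methods, columns are dose levels
cell_means <- tapply(tg$len, list(supp = tg$supp, dose = tg$dose), mean)
cell_sds   <- tapply(tg$len, list(supp = tg$supp, dose = tg$dose), sd)

print(round(cell_means, 2))
print(round(cell_sds, 2))
```

Base R is used here only so the snippet runs without extra packages; a dplyr group_by(supp, dose) followed by summarize() would give the same numbers in long format.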

Basic summary

summary(ToothGrowth)

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

Let’s check which delivery method has the higher mean and maximum length of odontoblasts, and which has the higher variance.

ToothGrowth %>% group_by(supp) %>% summarize(mean_length = mean(len))

## # A tibble: 2 × 2
##     supp mean_length
##   <fctr>       <dbl>
## 1     OJ    20.66333
## 2     VC    16.96333

ToothGrowth %>% group_by(supp) %>% summarize(max_length = max(len))

## # A tibble: 2 × 2
##     supp max_length
##   <fctr>      <dbl>
## 1     OJ       30.9
## 2     VC       33.9

ToothGrowth %>% group_by(supp) %>% summarize(var_length = var(len))

## # A tibble: 2 × 2
##     supp var_length
##   <fctr>      <dbl>
## 1     OJ   43.63344
## 2     VC   68.32723

So OJ has the higher mean length of odontoblasts (20.66 vs. 16.96), but the single largest value belongs to VC (33.9 vs. 30.9). OJ also has a noticeably smaller variance (43.63 vs. 68.33).
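
The three summaries above could also be collected in a single table. A minimal sketch in base R follows (tg and supp_summary are illustrative names); with dplyr, one summarize() call with three arguments would do the same job.

```r
tg <- datasets::ToothGrowth

# One row per delivery method, one column per statistic
supp_summary <- t(sapply(split(tg$len, tg$supp), function(x) {
  c(mean = mean(x), max = max(x), var = var(x))
}))

print(round(supp_summary, 2))
```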

Hypothesis tests to compare tooth growth by supp and dose.

Now we’ll run a series of Welch two-sample t-tests across the levels of supp and dose. In each test the null hypothesis is that the two group means are equal, and we then determine whether it can be rejected. We’ll start with the dose levels.

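# Unpaired Welch test; `paired` stays at its default (FALSE) because recent R releases reject it in the formula interface
t.test(len ~ dose, var.equal = FALSE, data = subset(ToothGrowth, dose %in% c(0.5, 1)))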

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735
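
To make this output less of a black box, the Welch statistic, its degrees of freedom (Welch-Satterthwaite approximation), the two-sided p-value and the confidence interval can be recomputed by hand. This is a sketch on a copy of the data, with illustrative variable names, and it should agree with the t.test() output for dose 0.5 vs. dose 1.

```r
tg <- datasets::ToothGrowth
x <- tg$len[tg$dose == 0.5]
y <- tg$len[tg$dose == 1]

# Squared standard errors of the two group means
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)
se    <- sqrt(se2_x + se2_y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(x) - mean(y)) / se
df     <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value and 95% confidence interval for the mean difference
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df) * se

print(c(t = t_stat, df = df, p = p_value))
print(ci)
```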

t.test(len ~ dose, var.equal = FALSE, data = subset(ToothGrowth, dose %in% c(1, 2)))

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

t.test(len ~ dose, var.equal = FALSE, data = subset(ToothGrowth, dose %in% c(0.5, 2)))

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

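In all three tests the p-value is far below 0.05 and the 95% confidence interval excludes 0, so we reject the null hypothesis of equal means for every pair of dose levels: each increase in dose is associated with longer odontoblasts. Now we turn our attention to the delivery method (supp), compared separately at each dose level.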
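
Because three comparisons are made on the same data, one may want to adjust the p-values for multiple testing. The following sketch uses pairwise.t.test() from the stats package with a Bonferroni correction; pool.sd = FALSE keeps the unpooled (Welch-type) tests used above. The correction is an addition for illustration and is not part of the original analysis; tg, dose_pairs and max_adjusted_p are illustrative names.

```r
tg <- datasets::ToothGrowth

# All pairwise dose comparisons, unpooled variances, Bonferroni-adjusted
dose_pairs <- pairwise.t.test(tg$len, tg$dose,
                              p.adjust.method = "bonferroni",
                              pool.sd = FALSE)

print(dose_pairs$p.value)

# Largest adjusted p-value among the three comparisons
max_adjusted_p <- max(dose_pairs$p.value, na.rm = TRUE)
print(max_adjusted_p)
```

Since the largest unadjusted p-value is 1.906e-05, multiplying by three still leaves every comparison far below 0.05, so the conclusion does not change.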

t.test(len ~ supp, var.equal = FALSE, data = subset(ToothGrowth, dose == 0.5))

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

t.test(len ~ supp, var.equal = FALSE, data = subset(ToothGrowth, dose == 1))

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

t.test(len ~ supp, var.equal = FALSE, data = subset(ToothGrowth, dose == 2))

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

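For dose levels 0.5 and 1 we reject the null hypothesis: the p-values are 0.006 and 0.001, the confidence intervals lie entirely above 0, and OJ gives the higher mean. For dose level 2 we can’t reject it: the p-value is 0.96 and the confidence interval (-3.80 to 3.64) contains 0.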
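
The three supp comparisons follow the same pattern, so they can also be run in a loop and collected into one table. A sketch in base R (tg, by_dose and supp_tests are illustrative names, and the 0.05 threshold is the conventional choice assumed here):

```r
tg <- datasets::ToothGrowth

# One Welch test of len by supp within each dose level
by_dose <- lapply(split(tg, tg$dose), function(d) {
  tt <- t.test(len ~ supp, data = d)
  data.frame(dose      = d$dose[1],
             p_value   = tt$p.value,
             ci_low    = tt$conf.int[1],
             ci_high   = tt$conf.int[2],
             reject_h0 = tt$p.value < 0.05)
})
supp_tests <- do.call(rbind, by_dose)

print(supp_tests)
```

Having the results side by side makes it easy to see that the OJ advantage is present at the two lower doses and disappears at dose level 2.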

Conclusions and assumptions

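Looking at the conducted t-tests, we can conclude that tooth growth is primarily affected by the dose level of vitamin C: every increase in dose gives a significantly higher mean length. The delivery method matters only at the lower doses, where OJ outperforms VC; at dose level 2 the two are statistically indistinguishable. For these conclusions we assume that all observations are independent of each other, that the groups are unpaired with possibly unequal variances (as in the Welch tests used), and that there are no other “latent” factors actually influencing tooth growth and masquerading as dose level.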

Amulya Bhatia
