An Interesting Analysis of Tooth Growth

Overview

We analyze tooth growth (the ToothGrowth data in the R datasets package): we perform a basic exploratory analysis, summarize the data, use a model to compare tooth growth by supp and dose, and finally provide conclusions.

Data Exploration

First, we load the data and the necessary libraries, and perform an exploratory analysis.

data("ToothGrowth")
library(tidyverse)

## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr

## Conflicts with tidy packages ----------------------------------------------

## filter(): dplyr, stats
## lag():    dplyr, stats

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

head(ToothGrowth,10)

##     len supp dose
## 1   4.2   VC  0.5
## 2  11.5   VC  0.5
## 3   7.3   VC  0.5
## 4   5.8   VC  0.5
## 5   6.4   VC  0.5
## 6  10.0   VC  0.5
## 7  11.2   VC  0.5
## 8  11.2   VC  0.5
## 9   5.2   VC  0.5
## 10  7.0   VC  0.5

We plan to model len on supp (a 2-level factor) and dose (which takes only 3 values), so we convert dose to a factor.

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
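As a quick check on the conversion, the sketch below works on a separate copy of the data (the name `tg` is an assumption made here so that the analysis data frame is left untouched) and tabulates how many observations fall in each supp and dose combination.

```r
# Work on a copy (hypothetical name `tg`) so ToothGrowth itself is not modified.
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# The three distinct dose values become the factor levels.
levels(tg$dose)

# Cross-tabulate supplement by dose to see the size of each group.
cell_counts <- table(tg$supp, tg$dose)
cell_counts
```

Equal counts in every cell would mean that each t-test later in the analysis compares groups of the same size.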

Data Summary

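The plots below show that tooth length increases with dosage for both supplements, and that within each dosage the tooth lengths vary slightly between supplements.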

ggplot(ToothGrowth, aes(dose, len)) +
        geom_violin(aes(fill = dose)) +
        facet_grid(.~ supp) +
        labs(x = "Dosage", y = "Length", title = "Length vs Dosage by Supp", caption = "Figure - 1: higher dosage (dose) leads to higher tooth length (len), irrespective of supplement (supp)")

ggplot(ToothGrowth, aes(supp, len)) +
        geom_violin(aes(fill = supp)) +
        facet_grid(.~ dose) +
        labs(x = "Supplement", y = "Length", title = "Length by Supplement for Dosage", caption = "Figure - 2: For each dosage (dose), slightly varying tooth lengths (len) per supplement (supp)")
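The violin plots can be complemented with a numeric summary. The following is a minimal sketch, not part of the original analysis, that computes the mean and standard deviation of len for each supp and dose combination using base R's `aggregate()`. The names `tg`, `group_means` and `group_sds` are assumptions introduced for illustration.

```r
# Work on a copy (hypothetical name `tg`) of the built-in data set.
tg <- datasets::ToothGrowth

# Mean tooth length for every supplement/dose combination (6 groups).
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
group_means

# Standard deviation of tooth length for the same groups.
group_sds <- aggregate(len ~ supp + dose, data = tg, FUN = sd)
group_sds
```

Reading the means down the dose column for a fixed supplement gives the same trend that the first figure shows graphically.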

Model

Now we move to modeling the data through confidence intervals and hypothesis tests.

t.test(len ~ supp, ToothGrowth)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

We find that the confidence interval includes 0 and the p-value is 0.06, which is greater than 0.05. Therefore, we fail to reject the null hypothesis at the 5% level: this test does not show that supp (supplement type) has an impact on tooth growth.
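To make the numbers reported by `t.test()` less opaque, the sketch below recomputes the Welch statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value and the 95% confidence interval by hand. The variable names are assumptions chosen for illustration; the formulas are the standard ones for a two-sample t-test with unequal variances.

```r
# Work on a copy (hypothetical name `tg`) of the built-in data set.
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error of each group mean: s^2 / n.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic: difference in means over its standard error.
mean_diff <- mean(oj) - mean(vc)
t_stat <- mean_diff / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference.
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- mean_diff + c(-1, 1) * qt(0.975, df_welch) * se_diff

c(t = t_stat, df = df_welch, p = p_value)
ci
```

These values should agree with the `t.test(len ~ supp, ToothGrowth)` output shown above, up to rounding.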

Below, we run a series of pairwise comparisons of tooth growth by dosage.

t1 <- subset(ToothGrowth, ToothGrowth$dose %in% c(0.5, 1.0))
t.test(len ~ dose, t1)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

t1 <- subset(ToothGrowth, ToothGrowth$dose %in% c(0.5, 2.0))
t.test(len ~ dose, t1)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

t1 <- subset(ToothGrowth, ToothGrowth$dose %in% c(1.0, 2.0))
t.test(len ~ dose, t1)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

Based on these results, none of the confidence intervals include 0, and the p-values are very small for every test.
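Because three pairwise tests are run on the same data, one might also check that the result holds after a multiple-comparison adjustment. The sketch below is an optional addition rather than part of the original analysis: it uses `pairwise.t.test()` from the stats package with `pool.sd = FALSE`, which runs separate Welch tests for each pair of doses, and applies a Bonferroni correction. The names `tg` and `adj` are assumptions.

```r
# Work on a copy (hypothetical name `tg`) of the built-in data set.
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# Pairwise Welch t-tests between dose levels, Bonferroni-adjusted.
adj <- pairwise.t.test(tg$len, tg$dose,
                       pool.sd = FALSE,
                       p.adjust.method = "bonferroni")

# Lower-triangular matrix of adjusted p-values (NA where no test applies).
adj$p.value
```

With three comparisons, the Bonferroni adjustment multiplies each p-value by 3 (capped at 1), so p-values as small as those reported above remain far below 0.05.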

Conclusion

Therefore, we can reject the null hypothesis, i.e. we find that tooth length increases as dosage increases.

This happens irrespective of the supplement delivery method, i.e. supp was not shown to have a significant effect on tooth growth.