We analyze tooth growth (the ToothGrowth data in the R datasets package): we perform a basic exploratory analysis, summarize the data, compare tooth growth by supplement (supp) and dose using confidence intervals and hypothesis tests, and finally state conclusions.
First, we load the data and the necessary libraries, then perform an exploratory analysis.
data("ToothGrowth")
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
head(ToothGrowth,10)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
## 7 11.2 VC 0.5
## 8 11.2 VC 0.5
## 9 5.2 VC 0.5
## 10 7.0 VC 0.5
We plan to model len on supp (a two-level factor) and dose (which takes only three distinct values), so we convert dose to a factor.
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
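As a quick check on the conversion, the sketch below lists the resulting factor levels and counts the observations in each supp by dose cell. It works on a copy named `tg`, which is an illustrative name not used in the original analysis, so the `ToothGrowth` object in the session is left untouched.

```r
# `tg` is a hypothetical working copy; datasets::ToothGrowth is the pristine data set
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# The three numeric dose values become factor labels
dose_levels <- levels(tg$dose)
dose_levels

# Number of observations in each supplement x dose combination
cell_counts <- table(tg$supp, tg$dose)
cell_counts
```

A balanced table (the same count in every cell) makes the later group comparisons easier to read, because no group mean is dominated by one supplement.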
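From the plots below, we can see that tooth length increases with dose for both supplements, while at each dose the two supplements differ only slightly.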
ggplot(ToothGrowth, aes(dose, len)) +
geom_violin(aes(fill = dose)) +
facet_grid(.~ supp) +
labs(x = "Dosage", y = "Length", title = "Length vs Dosage by Supp", caption = "Figure - 1: higher dosage (dose) leads to higher tooth length (len), irrespective of supplement (supp)")
ggplot(ToothGrowth, aes(supp, len)) +
geom_violin(aes(fill = supp)) +
facet_grid(.~ dose) +
labs(x = "Supplement", y = "Length", title = "Length by Supplement for Each Dosage", caption = "Figure - 2: For each dosage (dose), tooth lengths (len) vary only slightly between supplements (supp)")
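The pattern in the violin plots can also be read off a small table of group means and standard deviations. The following is a minimal sketch using base R's `aggregate()`; the names `tg`, `cell_means`, `cell_sds`, and `cell_stats` are illustrative and not part of the original analysis.

```r
# Hypothetical working copy so the session's ToothGrowth is not modified
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# Mean and standard deviation of len in each supp x dose cell
cell_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
cell_sds   <- aggregate(len ~ supp + dose, data = tg, FUN = sd)

# Both calls group identically, so the rows line up and can be combined
cell_stats <- data.frame(supp = cell_means$supp,
                         dose = cell_means$dose,
                         mean = cell_means$len,
                         sd   = cell_sds$len)
cell_stats
```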
Now we move on to modeling the data through confidence intervals and hypothesis tests.
t.test(len ~ supp, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
The confidence interval includes 0 and the p-value (0.06) is greater than 0.05. Therefore, we fail to reject the null hypothesis at the 5% level: pooled across doses, the data do not show a statistically significant effect of supplement type (supp) on tooth growth.
Below, we run a series of pairwise comparisons of tooth growth by dosage.
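The test above pools all three doses, so it cannot tell whether the supplements differ at some doses and not others. One possible way to look at this, sketched here as an optional extension and not as part of the original analysis, is to repeat the same Welch test within each dose and collect the p-values and confidence intervals. The names `tg`, `by_dose`, `p_by_dose`, and `ci_by_dose` are illustrative.

```r
# Hypothetical working copy so the session's ToothGrowth is not modified
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# One Welch two-sample t-test of len by supp for each dose level
by_dose <- lapply(split(tg, tg$dose),
                  function(d) t.test(len ~ supp, data = d))

# p-value per dose level
p_by_dose <- sapply(by_dose, function(res) res$p.value)

# 95% confidence interval for the OJ minus VC difference, one row per dose
ci_by_dose <- t(sapply(by_dose, function(res) res$conf.int))
colnames(ci_by_dose) <- c("lower", "upper")

p_by_dose
ci_by_dose
```

Running several tests inflates the chance of a false positive, so the per-dose p-values should be read with a multiple-comparison adjustment in mind (for example via `p.adjust()`).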
# Compare dose 0.5 vs 1.0 (Welch two-sample t-test)
t1 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
t.test(len ~ dose, data = t1)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
# Compare dose 0.5 vs 2.0 (Welch two-sample t-test)
t1 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
t.test(len ~ dose, data = t1)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
# Compare dose 1.0 vs 2.0 (Welch two-sample t-test)
t1 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
t.test(len ~ dose, data = t1)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
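Because three pairwise tests are run on the same data, it can be worth checking that the conclusions survive a multiple-comparison adjustment. The sketch below uses `pairwise.t.test()` from the stats package with a Bonferroni correction and `pool.sd = FALSE`, so each pair is compared with a Welch test as in the comparisons above. The adjustment is an addition for illustration, not something the original analysis performed, and `tg` and `adj` are illustrative names.

```r
# Hypothetical working copy so the session's ToothGrowth is not modified
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# All pairwise dose comparisons, Welch t-tests, Bonferroni-adjusted p-values
adj <- pairwise.t.test(tg$len, tg$dose,
                       p.adjust.method = "bonferroni",
                       pool.sd = FALSE)

# Lower-triangular matrix of adjusted p-values (NA above the diagonal)
adj$p.value
```

With only three comparisons, the Bonferroni adjustment multiplies each p-value by 3, so the very small unadjusted p-values reported above remain well below 0.05.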
Based on these results, none of the confidence intervals includes 0, and the p-values are very small for every test.
Therefore, we can reject the null hypothesis of equal means in each comparison, i.e. tooth length increases as dosage increases.
The plots suggest this holds for both supplement delivery methods, and in the pooled comparison supp showed no statistically significant effect on tooth growth at the 5% level.