Introduction

This project uses the ToothGrowth dataset from the R datasets package to compare tooth growth by vitamin C dose and delivery method. In ToothGrowth, the response variable len is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

Source: https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/ToothGrowth

# Load ggplot2 for plotting and data.table for the results table
library(ggplot2)
library(data.table)
options(datatable.verbose=FALSE)

# Load the data ToothGrowth
data(ToothGrowth)

# Look at the structure of the data
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
#Provide a basic summary of the data
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

The summary table shows that the dataset contains 60 observations, of which 30 have supp = OJ (orange juice) and 30 have supp = VC (ascorbic acid).
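The summary above gives totals per supplement type but not per dose. A minimal sketch of a cross-tabulation shows how the 60 observations are spread across the supplement and dose combinations; the name cell_counts is only illustrative.

```r
# Cross-tabulate supplement type against dose to count subjects per cell.
cell_counts <- table(datasets::ToothGrowth$supp, datasets::ToothGrowth$dose)
cell_counts
```

Each of the six cells should contain 10 guinea pigs, so every t-test further down compares two groups of 10.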

The box plots below compare tooth growth across dose levels, with one panel for each supplement type (OJ and VC).

# Box plot of tooth growth by dose, one panel per supplement type
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
p <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_boxplot(aes(fill = dose), position = position_dodge(0.9)) +
  facet_grid(cols = vars(supp),
             labeller = as_labeller(c("OJ" = "Orange juice", "VC" = "Ascorbic acid"))) +
  labs(title = "Tooth Growth by supplement type and dose",
       y = "Tooth Growth", x = "Dose (mg/day)") +
  # Center the title over the whole plot (hjust belongs in element_text(), not labs())
  theme(plot.title.position = "plot",
        plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Dark2")
p
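As a numeric companion to the box plots, the group means can be tabulated directly. This is a minimal sketch that works on a fresh copy of the data (tg, an illustrative name) so that it does not depend on the factor conversion above.

```r
# Mean tooth growth for each supplement type and dose combination.
tg <- datasets::ToothGrowth   # fresh copy; dose is still numeric here
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
group_means
```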

Hypothesis tests to compare tooth growth by supplement type and dose

The null hypothesis (H0) for this exercise is that, at each dose level, there is no difference in mean odontoblast length between the subjects who received OJ and the subjects who received VC. For simplicity, we will refer to this outcome variable (length of odontoblasts) as Tooth Growth.

Comparison by delivery method for the same dosage

We will use two-sample t-tests (Welch's version, which does not assume equal variances) to test for statistically significant differences between guinea pigs that received the same daily dose of vitamin C, delivered either as orange juice (OJ) or as ascorbic acid (VC).
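Because the t.test() calls in this report set var.equal = FALSE, R runs Welch's t-test. The sketch below computes the Welch statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand for the 0.5 mg/day groups, to show what the function does internally. The variable names (oj, vc, se2_oj, and so on) are illustrative and not part of the original analysis.

```r
# Manual Welch t-test for OJ vs VC at dose 0.5 mg/day.
tg <- datasets::ToothGrowth   # fresh copy; dose is numeric here
oj <- tg$len[tg$supp == "OJ" & tg$dose == 0.5]
vc <- tg$len[tg$supp == "VC" & tg$dose == 0.5]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_stat), df_welch)

# Should agree with the built-in test
all.equal(p_manual, t.test(oj, vc, var.equal = FALSE)$p.value)
```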

Evaluate the statistical significance of difference in mean values of tooth growth between the OJ and VC subject groups at each dose level

#check for statistically significant difference in tooth growth (len) at dose 0.5 mg/day

#get the length for each supplement type at dose=0.5
supp_OJ_dose_05 = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 0.5]
supp_VC_dose_05 = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 0.5]

#run t-test to calculate p value and 95% confidence intervals
ttest_0_5<-t.test(supp_OJ_dose_05, supp_VC_dose_05, alternative = "two.sided", paired = FALSE, var.equal = FALSE, conf.level = 0.95)

#check for statistically significant difference in tooth growth (len) at dose 1.0 mg/day

#get the length for each supplement type at dose=1.0
supp_OJ_dose_10 = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 1.0]
supp_VC_dose_10 = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 1.0]

#run t-test to calculate p value and 95% confidence intervals
ttest_1_0<-t.test(supp_OJ_dose_10, supp_VC_dose_10, alternative = "two.sided", paired = FALSE, var.equal = FALSE, conf.level = 0.95)

#check for statistically significant difference in tooth growth (len) at dose 2.0 mg/day

#get the length for each supplement type at dose=2.0
supp_OJ_dose_20 = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 2.0]
supp_VC_dose_20 = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 2.0]

#run t-test to calculate p value and 95% confidence intervals
ttest_2_0<-t.test(supp_OJ_dose_20, supp_VC_dose_20, alternative = "two.sided", paired = FALSE, var.equal = FALSE, conf.level = 0.95)
# Summarize the t-tests that compare the delivery methods at each dose:
# collect the p-values and 95% confidence intervals (difference in means, OJ minus VC)
 ttest_results<- data.table(
      "Dose" = c("0.5 mg/day","1.0 mg/day","2.0 mg/day"),
      "p-value" = c(ttest_0_5$p.value, ttest_1_0$p.value, ttest_2_0$p.value),
      "CI_lower" = c(ttest_0_5$conf.int[1],ttest_1_0$conf.int[1], ttest_2_0$conf.int[1]),
      "CI_higher" = c(ttest_0_5$conf.int[2],ttest_1_0$conf.int[2], ttest_2_0$conf.int[2])
      )

Results of two-sided t-tests of the difference in mean values between the subjects who received OJ vs VC at three dosage levels

#print table showing output of t-tests
ttest_results
##          Dose     p-value  CI_lower CI_higher
## 1: 0.5 mg/day 0.006358607  1.719057  8.780943
## 2: 1.0 mg/day 0.001038376  2.802148  9.057852
## 3: 2.0 mg/day 0.963851589 -3.798070  3.638070
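The same table can be built more compactly with the formula interface of t.test(), looping over the dose levels instead of repeating the subsetting code three times. This is a sketch of an alternative, not the approach used above; welch_by_dose and its column names are illustrative.

```r
# One Welch t-test per dose level, collected into a single data frame.
tg <- datasets::ToothGrowth   # fresh copy; dose is numeric here
welch_by_dose <- do.call(rbind, lapply(sort(unique(tg$dose)), function(d) {
  # len ~ supp compares the two levels of supp (OJ first, then VC)
  tt <- t.test(len ~ supp, data = tg[tg$dose == d, ],
               var.equal = FALSE, conf.level = 0.95)
  data.frame(dose = d,
             p_value = tt$p.value,
             ci_lower = tt$conf.int[1],
             ci_upper = tt$conf.int[2])
}))
welch_by_dose
```

Because OJ is the first level of supp, the confidence intervals are for the mean of OJ minus the mean of VC, matching the direction used in the table above.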

Conclusions

Mean tooth growth (len) is greater for subjects who received OJ than for subjects who received VC at doses of 0.5 mg/day (p = 0.006) and 1.0 mg/day (p = 0.001), and both 95% confidence intervals for the difference in means (OJ minus VC) lie entirely above zero. We therefore reject the null hypothesis at these doses and conclude that OJ is associated with greater tooth growth at 0.5 and 1.0 mg/day. At 2.0 mg/day, the difference in mean tooth growth between the OJ and VC groups is not statistically significant (p = 0.964), and the confidence interval (-3.80 to 3.64) includes zero. We therefore fail to reject the null hypothesis at 2.0 mg/day: the data provide no evidence of a difference in tooth growth between the two delivery methods at this dose.

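These conclusions rely on the assumptions that the observations in the OJ and VC groups at each dose level are independent of each other and that tooth growth within each group is approximately normally distributed. Because the tests were run with var.equal = FALSE, equal variances across groups are not assumed.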
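The normality assumption can be given a rough check with a Shapiro-Wilk test on each of the six supplement and dose groups. This is a sketch only and was not part of the original analysis; with 10 observations per group the test has little power, so a large p-value is weak evidence at best. The names groups and shapiro_p are illustrative.

```r
# Shapiro-Wilk normality test for each supplement x dose group.
tg <- datasets::ToothGrowth   # fresh copy; dose is numeric here
groups <- split(tg$len, list(tg$supp, tg$dose))   # six vectors of 10 values
shapiro_p <- sapply(groups, function(x) shapiro.test(x)$p.value)
round(shapiro_p, 3)
```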