Part 2: Basic Inferential Data Analysis

Data Loading and Basic Exploratory Data Analyses

In this part, the ToothGrowth data will be loaded and some basic exploratory data analyses will be performed.

knitr::opts_chunk$set(echo = TRUE)
if(!require(tidyverse)){
   install.packages("tidyverse")
}
library(tidyverse)
library(datasets) #load the dataset
data(ToothGrowth)
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
table(ToothGrowth$supp, ToothGrowth$dose) # split of cases between different dose levels and delivery methods
##     
##      0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10

ToothGrowth is a data frame with 60 observations on 3 variables, namely “len”, “supp” and “dose”.

The variable “len” refers to the length of odontoblasts in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day, under the variable “dose”) by one of two delivery methods, orange juice (coded as OJ) or ascorbic acid (a form of vitamin C and coded as VC), under the variable “supp”. There were 10 animal subjects for each dose level under each delivery method.

The basic exploratory data analyses of ToothGrowth are shown below:

ggplot(data = ToothGrowth,
       mapping = aes(x = as.factor(dose), y = len, fill = supp)) +
        geom_boxplot() +
        labs(x = "Dose of vitamin C (mg/day)",
             y = "Length of odontoblasts",
             title = "Effect of vitamin C on the length of odontoblasts in guinea pigs") +
        stat_summary(fun = median,
                     geom = "text",
                     aes(label = after_stat(y)),
                     position = position_dodge(width = 0.75), # align labels with the dodged boxes
                     vjust = -1,    # place the label just above the median line
                     size = 4)      # text size

print(summary(ToothGrowth))
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
print("Summary table of ToothGrowth dataset")
## [1] "Summary table of ToothGrowth dataset"
Meandata<-ToothGrowth |>
        group_by(supp, dose)|>
        summarise(mean_length=mean(len), SD=sd(len),.groups = "drop")
print(Meandata)
## # A tibble: 6 × 4
##   supp   dose mean_length    SD
##   <fct> <dbl>       <dbl> <dbl>
## 1 OJ      0.5       13.2   4.46
## 2 OJ      1         22.7   3.91
## 3 OJ      2         26.1   2.66
## 4 VC      0.5        7.98  2.75
## 5 VC      1         16.8   2.52
## 6 VC      2         26.1   4.80
print("Summary table of the mean length of odontoblasts with different dosages and delivery methods")
## [1] "Summary table of the mean length of odontoblasts with different dosages and delivery methods"
Meandata2<-ToothGrowth |>
        group_by(dose)|>
        summarise(mean_length=mean(len), SD=sd(len), .groups = "drop")
print(Meandata2)
## # A tibble: 3 × 3
##    dose mean_length    SD
##   <dbl>       <dbl> <dbl>
## 1   0.5        10.6  4.50
## 2   1          19.7  4.42
## 3   2          26.1  3.77
print("Summary table of the mean length of odontoblasts")
## [1] "Summary table of the mean length of odontoblasts"
print("with different dosages")
## [1] "with different dosages"
ggplot(data=Meandata2,
       mapping=aes(x=dose, y=mean_length))+
        geom_point(colour="blue")+
        geom_smooth(formula = 'y ~ x', method = 'lm', se=FALSE, colour="grey")+
        geom_errorbar(aes(ymin = mean_length - SD, ymax = mean_length + SD), 
                width = 0.2, position = position_dodge(0.3))+
        labs(x = "Dose of vitamin C (mg/day)",
             y = "Length of odontoblasts",
             title = "Effect of vitamin C on the mean length of odontoblasts in guinea pigs")

Basic Summary of the data

Based on the boxplot and the plot of mean length against dose above, vitamin C supplementation shows a positive, dose-dependent relationship with the length of odontoblasts in guinea pigs: length increased with dose. Orange juice (OJ) appears to be more effective than ascorbic acid (VC) at the lower dose levels (0.5 mg/day and 1 mg/day), as suggested by the higher median values in the boxplot. The same trends appear in the tables of mean length. Statistical analyses are needed to confirm these observations.
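The medians read off the boxplot can also be checked numerically. The following is a minimal sketch using base R's tapply(); the object name median_matrix is introduced here for illustration and is not part of the original analysis.

```r
library(datasets)

# Median tooth length for every delivery method (rows) and dose (columns).
# median_matrix is an illustrative name, not an object from the report.
median_matrix <- with(ToothGrowth, tapply(len, list(supp, dose), median))
print(median_matrix)

# Difference in medians (OJ minus VC) at each dose level
print(median_matrix["OJ", ] - median_matrix["VC", ])
```

A positive difference at a given dose means the OJ group has the higher median at that dose.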

Data Subsetting and Statistical Analyses

Data Subsetting

To prepare the data for statistical analyses, the ToothGrowth dataset was subset as follows:

oj_data <- filter(ToothGrowth, supp == "OJ")
vc_data <- filter(ToothGrowth, supp == "VC")
dose0.5 <- ToothGrowth[ToothGrowth$dose == 0.5, ]$len
dose1 <- ToothGrowth[ToothGrowth$dose == 1, ]$len
dose2 <- ToothGrowth[ToothGrowth$dose == 2, ]$len
oj_dose0.5 <- oj_data[oj_data$dose == 0.5, ]$len
oj_dose1 <- oj_data[oj_data$dose == 1, ]$len
oj_dose2 <- oj_data[oj_data$dose == 2, ]$len
vc_dose0.5 <- vc_data[vc_data$dose == 0.5, ]$len
vc_dose1 <- vc_data[vc_data$dose == 1, ]$len
vc_dose2 <- vc_data[vc_data$dose == 2, ]$len
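The per-group vectors above can also be produced in a single step. The following is a sketch using base R's split(); the names len_by_group and vc_dose1_alt are assumptions made for this example and do not appear elsewhere in the report.

```r
library(datasets)

# One numeric vector of lengths per supp x dose combination.
# split() names the elements by joining the factor levels with ".",
# for example "OJ.0.5" or "VC.2".
len_by_group <- split(ToothGrowth$len,
                      list(ToothGrowth$supp, ToothGrowth$dose))

names(len_by_group)
sapply(len_by_group, length)   # number of animals in each group

# Same values as vc_dose1 in the subsetting code above
vc_dose1_alt <- len_by_group[["VC.1"]]
```

Keeping the groups in one list makes it harder to pass the wrong vector to a test by accident, at the cost of slightly longer indexing expressions.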

Hypothesis Tests: Effect of Dose Levels on Length of Odontoblasts

To test whether vitamin C supplementation has a positive relationship with the length of odontoblasts in guinea pigs, the alternative hypothesis is that a higher dose results in a greater mean length, while the null hypothesis is that the mean length is the same regardless of dose level. One-sided t-tests were performed on the element-wise differences between dose groups, which R reports as one-sample t-tests, as follows:
Test 1:
H0: length mean of dose0.5 = length mean of dose1
Ha: length mean of dose0.5 < length mean of dose1

t.test(dose1-dose0.5, paired = FALSE, alt = "greater")
## 
##  One Sample t-test
## 
## data:  dose1 - dose0.5
## t = 6.9669, df = 19, p-value = 6.127e-07
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  6.863996      Inf
## sample estimates:
## mean of x 
##      9.13

Test 2:
H0: length mean of dose1 = length mean of dose2
Ha: length mean of dose1 < length mean of dose2

t.test(dose2-dose1, paired = FALSE, alt = "greater")
## 
##  One Sample t-test
## 
## data:  dose2 - dose1
## t = 4.6046, df = 19, p-value = 9.671e-05
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  3.974821      Inf
## sample estimates:
## mean of x 
##     6.365

In both cases, the lower bound of the one-sided 95 percent confidence interval is above 0 and the p-value is less than 0.05. Thus, the null hypotheses are rejected in favor of the alternative hypotheses, i.e. a higher dose resulted in a greater length of odontoblasts. To investigate whether this also holds within each delivery method, further one-sided t-tests (Tests 3 - 6) were carried out as follows:
Test 3:
H0: length mean of oj_dose0.5 = length mean of oj_dose1
Ha: length mean of oj_dose0.5 < length mean of oj_dose1
Test 4:
H0: length mean of oj_dose1 = length mean of oj_dose2
Ha: length mean of oj_dose1 < length mean of oj_dose2
Test 5:
H0: length mean of vc_dose0.5 = length mean of vc_dose1
Ha: length mean of vc_dose0.5 < length mean of vc_dose1
Test 6:
H0: length mean of vc_dose1 = length mean of vc_dose2
Ha: length mean of vc_dose1 < length mean of vc_dose2

p3<-t.test(oj_dose1-oj_dose0.5, paired = FALSE, alt = "greater")$p.value
p4<-t.test(oj_dose2-oj_dose1, paired = FALSE, alt = "greater")$p.value
p5<-t.test(oj_dose1-oj_dose0.5, paired = FALSE, alt = "greater")$p.value
p6<-t.test(oj_dose2-oj_dose1, paired = FALSE, alt = "greater")$p.value
# Create a data frame with the results
p_values_table <- data.frame(
  Test = c("Test 3", "Test 4", "Test 5", "Test 6"),
  Comparison = c("OJ: Dose 1 vs 0.5", "OJ: Dose 2 vs 1", "VC: Dose 1 vs 0.5", "VC: Dose 2 vs 1"),
  P_Value = c(p3, p4, p5, p6)
)

# Print the table to the console
print(p_values_table)
##     Test        Comparison    P_Value
## 1 Test 3 OJ: Dose 1 vs 0.5 0.00121757
## 2 Test 4   OJ: Dose 2 vs 1 0.04191956
## 3 Test 5 VC: Dose 1 vs 0.5 0.00121757
## 4 Test 6   VC: Dose 2 vs 1 0.04191956

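The p-values for Tests 3 and 4 are below 0.05, so the null hypotheses are rejected at the 0.05 significance level: within the OJ group, a higher dose results in a greater length of odontoblasts. The p-values listed for Tests 5 and 6 are identical to those for Tests 3 and 4 because the calls for p5 and p6 pass the OJ vectors (oj_dose1 - oj_dose0.5 and oj_dose2 - oj_dose1) rather than the VC vectors. These two rows therefore do not test the VC data, and the same conclusion for VC rests on the group means until the tests are rerun with vc_dose0.5, vc_dose1 and vc_dose2.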
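The Test 5 and Test 6 rows in the table above match the Test 3 and Test 4 rows exactly, which is what results when the OJ vectors are passed to all four calls. A minimal sketch of the VC comparisons is given here, assuming the same test form as Tests 3 and 4 (a one-sided t-test on the element-wise differences). The vectors are rebuilt from ToothGrowth so the block runs on its own; the names p5_vc and p6_vc are illustrative.

```r
library(datasets)

# Rebuild the VC subsets so this block is self-contained
vc <- ToothGrowth[ToothGrowth$supp == "VC", ]
vc_dose0.5 <- vc$len[vc$dose == 0.5]
vc_dose1   <- vc$len[vc$dose == 1]
vc_dose2   <- vc$len[vc$dose == 2]

# Test 5: VC, dose 1 vs 0.5 (same form as Test 3, but on the VC vectors)
p5_vc <- t.test(vc_dose1 - vc_dose0.5, alternative = "greater")$p.value

# Test 6: VC, dose 2 vs 1 (same form as Test 4, but on the VC vectors)
p6_vc <- t.test(vc_dose2 - vc_dose1, alternative = "greater")$p.value

print(data.frame(Test = c("Test 5", "Test 6"),
                 Comparison = c("VC: Dose 1 vs 0.5", "VC: Dose 2 vs 1"),
                 P_Value = c(p5_vc, p6_vc)))
```

The resulting p-values would replace the last two rows of the table; they are not reproduced here because they come from a rerun rather than from the original output.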

Hypothesis Tests: Effect of Delivery Methods on Length of Odontoblasts

To evaluate whether OJ is a more effective delivery method than VC at each dose level, the following t-tests were performed (one-sided for Tests 7 to 9; two-sided for Test 10).

Test 7:
H0: length mean of vc_dose0.5 = length mean of oj_dose0.5
Ha: length mean of vc_dose0.5 < length mean of oj_dose0.5
Test 8:
H0: length mean of vc_dose1 = length mean of oj_dose1
Ha: length mean of vc_dose1 < length mean of oj_dose1
Test 9:
H0: length mean of vc_dose2 = length mean of oj_dose2
Ha: length mean of vc_dose2 < length mean of oj_dose2
Test 10:
H0: length mean of vc_dose2 = length mean of oj_dose2
Ha: length mean of vc_dose2 =/= length mean of oj_dose2

p7<-t.test(oj_dose0.5-vc_dose0.5, paired = FALSE, alt = "greater")$p.value
p8<-t.test(oj_dose1-vc_dose1, paired = FALSE, alt = "greater")$p.value
p9<-t.test(oj_dose2-vc_dose2, paired = FALSE, alt = "greater")$p.value
p10<-t.test(oj_dose2-vc_dose2, paired = FALSE, alt = "two.sided")$p.value
# Create a data frame with the results
p_values_table2 <- data.frame(
  Test = c("Test 7", "Test 8", "Test 9", "Test 10"),
  Comparison = c("Dose 0.5: OJ vs VC", "Dose 1: OJ vs VC", "Dose 2: OJ vs VC", "Dose 2: OJ vs VC (two-sided)"),
  P_Value = c(p7, p8, p9, p10)
)

# Print the table to the console
print(p_values_table2)
##      Test                   Comparison     P_Value
## 1  Test 7           Dose 0.5: OJ vs VC 0.007736024
## 2  Test 8             Dose 1: OJ vs VC 0.004114624
## 3  Test 9             Dose 2: OJ vs VC 0.516521648
## 4 Test 10 Dose 2: OJ vs VC (two-sided) 0.966956704

As the p-values for dose levels 0.5 mg/day and 1 mg/day (Tests 7 and 8) are well below 0.05, we reject the null hypotheses in favor of the alternative hypotheses: OJ is a more effective delivery method at dose levels 0.5 mg/day and 1 mg/day. However, at dose level 2 mg/day, the p-values for both the one-sided (Test 9) and the two-sided (Test 10) t-tests are much higher than 0.05, so the null hypothesis cannot be rejected. Thus, there is no significant evidence of a difference between OJ and VC in stimulating the growth of odontoblasts at the highest dose level.
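As a cross-check on Tests 7 to 9, the same comparison can be made with a two-sample test that treats the OJ and VC animals as independent groups. The following is a sketch under that assumption, using the formula interface of t.test(); the helper name p_oj_gt_vc is hypothetical. The p-values it returns will differ from those in the table above because the test is different.

```r
library(datasets)

# One-sided Welch two-sample t-test (OJ > VC) at a single dose level.
# With len ~ supp, the difference is taken as the first level of supp
# minus the second level, i.e. OJ minus VC.
p_oj_gt_vc <- function(d) {
  t.test(len ~ supp,
         data = ToothGrowth[ToothGrowth$dose == d, ],
         alternative = "greater")$p.value
}

doses <- c(0.5, 1, 2)
p_by_dose <- sapply(doses, p_oj_gt_vc)
names(p_by_dose) <- doses
print(p_by_dose)
```

Writing the comparison as a function of the dose keeps the three tests identical apart from the subset they use.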

Conclusions and assumptions

*Conclusions:
- Length of odontoblasts increased with the dose of vitamin C, regardless of the delivery method, suggesting a positive influence of vitamin C on the growth of odontoblasts.
- OJ was a more effective delivery method than VC at dose levels 0.5 mg/day and 1 mg/day.

*Assumptions:
- The experiment was done with random assignment of guinea pigs to different dose levels and delivery methods to control for confounders that might affect the outcome.
- Members of the sample population, i.e. the 60 guinea pigs, are representative of the entire population of guinea pigs. This assumption allows us to generalize the results.
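- The t-tests were run on element-wise differences between two groups, so R performed one-sample t-tests that implicitly pair observations by row order; in this form the paired = FALSE argument and the default var.equal = FALSE have no effect. Because the animals in the two groups being compared are independent, a two-sample Welch t-test (unequal variances, var.equal = FALSE) would match the design more closely.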
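The unequal-variance setting only comes into play when the two groups are passed to t.test() as separate arguments. The following is a minimal sketch of Tests 1 and 2 in that two-sample form, assuming the dose groups are independent samples; the names welch_1 and welch_2 are illustrative and the results are not taken from the original output.

```r
library(datasets)

# Rebuild the pooled dose vectors so this block is self-contained
dose0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5]
dose1   <- ToothGrowth$len[ToothGrowth$dose == 1]
dose2   <- ToothGrowth$len[ToothGrowth$dose == 2]

# Two-sample, one-sided tests. var.equal = FALSE (the default) requests
# the Welch approximation for the degrees of freedom.
welch_1 <- t.test(dose1, dose0.5, alternative = "greater", var.equal = FALSE)
welch_2 <- t.test(dose2, dose1,   alternative = "greater", var.equal = FALSE)

# The method element records which test was actually run
print(welch_1$method)
print(c(test1 = welch_1$p.value, test2 = welch_2$p.value))
```

Checking the method element (or the header of the printed test) is a quick way to confirm that the intended test was performed.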