Synopsis

This document presents a basic exploratory analysis of the ToothGrowth data from the R datasets package. It then uses hypothesis tests based on confidence intervals to compare tooth growth across delivery methods and dose levels.

Assumptions

  • The sample of observed guinea pigs is representative of the entire population of guinea pigs.
  • The guinea pigs were randomly selected and observed.
  • The variances of the groups compared in each T-test are assumed to be unequal.
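
The unequal-variance assumption can be examined informally before any test is run. The following is a minimal sketch, not part of the original analysis. It works on the untouched ToothGrowth data frame, whose columns in the datasets package are len, supp and dose; the names group_var and var_ratio are illustrative.

```r
library(datasets)

# Sample variance of tooth length in each supplement-by-dose group.
# ToothGrowth columns: len (length), supp (OJ or VC), dose (0.5, 1, 2).
group_var <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = var)
names(group_var)[3] <- "variance"
print(group_var)

# Ratio of the largest to the smallest group variance. A ratio well above 1
# is one informal argument for not pooling the variances.
var_ratio <- max(group_var$variance) / min(group_var$variance)
print(var_ratio)
```

This is only a descriptive check. Assuming unequal variances is the cautious choice whenever the groups cannot be shown to share a common variance.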

Data processing

The ToothGrowth R dataset measures the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

Loading

library(datasets)
data(ToothGrowth)

toothGrowth <- ToothGrowth
names(toothGrowth) <- c("length", "method", "dose")
toothGrowth$dose <- as.factor(toothGrowth$dose)

Exploratory data analysis

str(toothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ length: num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ method: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose  : Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
head(toothGrowth)
##   length method dose
## 1    4.2     VC  0.5
## 2   11.5     VC  0.5
## 3    7.3     VC  0.5
## 4    5.8     VC  0.5
## 5    6.4     VC  0.5
## 6   10.0     VC  0.5
summary(toothGrowth)
##      length      method   dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

The following box plots show the median, quartiles, range and outliers of tooth length by dose level and by delivery method.

library(ggplot2)
library(grid)
library(gridExtra)

p1 <- ggplot(toothGrowth, aes(x=dose, y=length, color=dose)) + 
    geom_boxplot() +
    theme(legend.position = "none")

p2 <- ggplot(toothGrowth, aes(x=dose, y=length, color=dose)) + 
    geom_boxplot() +
    facet_grid(. ~ method)

p3 <- ggplot(toothGrowth, aes(x=method, y=length, color=method)) + 
    geom_boxplot() +
    theme(legend.position = "none")

p4 <- ggplot(toothGrowth, aes(x=method, y=length, color=method)) + 
    geom_boxplot() +
    facet_grid(. ~ dose)

grid.arrange(p1, p2, p3, p4, ncol = 2, nrow = 2, widths=c(1.5, 2.5))

The exploratory analysis suggests:

  • A positive correlation between the dosage level and the tooth growth.
  • No clear overall correlation between the delivery method and the tooth growth.
  • A greater tooth length with the orange juice delivery method than with ascorbic acid at the 0.5 and 1.0 dosage levels.
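
The patterns read off the box plots can also be checked numerically. The sketch below is an illustration rather than part of the original analysis: it rebuilds the toothGrowth data frame as in the loading step and tabulates the mean length for each dose and delivery method. The name mean_table is illustrative.

```r
library(datasets)

# Same renaming as in the loading step of this document.
toothGrowth <- ToothGrowth
names(toothGrowth) <- c("length", "method", "dose")
toothGrowth$dose <- as.factor(toothGrowth$dose)

# Mean tooth length: one row per dose level, one column per delivery method.
mean_table <- with(
    toothGrowth,
    tapply(length, list(dose = dose, method = method), mean)
)
print(round(mean_table, 2))

# Difference in means (OJ minus VC) at each dose level.
print(round(mean_table[, "OJ"] - mean_table[, "VC"], 2))
```

A difference close to zero at a given dose level suggests that the delivery method matters little at that dose. The hypothesis tests in the next section quantify this.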

Hypothesis testing with confidence intervals

This section compares tooth growth by delivery method and dose using hypothesis tests based on confidence intervals. The following correlations are analysed:

Delivery method and tooth growth

This section tests whether the delivery method is correlated with the tooth length.

The null hypothesis is that there is no correlation between the delivery method and the tooth length.

T-test

The grouping factor for the T-test is the delivery method. Since it has only two levels, no data preparation is required. The test assumes unequal variances between the two groups of observations.

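# Welch T-test on two independent groups. 'paired' is omitted: unpaired is
# the default, and recent R versions reject 'paired' in the formula interface.
test <- t.test(length ~ method, var.equal = FALSE, data = toothGrowth)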
test$conf.int[1:2]
## [1] -0.1710156  7.5710156
test$p.value
## [1] 0.06063451

The 95% confidence interval contains zero and the p-value is greater than 0.05. Consequently, the null hypothesis cannot be rejected at the 5% significance level.
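
To make the link between the confidence interval and the test explicit, the interval can be recomputed by hand. The following sketch is an illustration, not part of the original analysis. It applies the standard formulas of the unequal-variance (Welch) T-test and cross-checks them against t.test(); the names oj, vc, manual_ci and manual_p are illustrative.

```r
library(datasets)

toothGrowth <- ToothGrowth
names(toothGrowth) <- c("length", "method", "dose")

oj <- toothGrowth$length[toothGrowth$method == "OJ"]
vc <- toothGrowth$length[toothGrowth$method == "VC"]

# Squared standard error of each group mean (variances are not pooled).
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation of the degrees of freedom.
df_welch <- (se2_oj + se2_vc)^2 /
    (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# 95% confidence interval for mean(OJ) - mean(VC) and two-sided p-value.
mean_diff <- mean(oj) - mean(vc)
manual_ci <- mean_diff + c(-1, 1) * qt(0.975, df_welch) * se_diff
manual_p <- 2 * pt(-abs(mean_diff / se_diff), df_welch)

# Cross-check against t.test(), which does not pool variances by default.
reference <- t.test(oj, vc)
print(manual_ci)
print(all.equal(manual_ci, as.numeric(reference$conf.int)))
print(all.equal(manual_p, reference$p.value))
```

The interval is centred on the observed difference in means, so it contains zero exactly when that difference is smaller in absolute value than the margin of error. This is the same condition as a two-sided p-value above 0.05.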

Dosage and tooth growth

This section tests whether the dosage level is correlated with the tooth length.

The null hypothesis is that there is no correlation between the dose level and the tooth length.

T-test

The grouping factor for the T-test is the dosage level. Because it has more than two levels, the data must be prepared before running a T-test: one subset of unpaired observations is built for each pair of dosage levels.

dose.level.5_10 <- subset(toothGrowth, dose %in% c(.5, 1.0))
dose.level.5_20 <- subset(toothGrowth, dose %in% c(.5, 2.0))
dose.level.10_20 <- subset(toothGrowth, dose %in% c(1.0, 2.0))

Each test assumes unequal variances between the two groups of observations.

test <- t.test(length ~ dose, var.equal = FALSE, data = dose.level.5_10)
test$conf.int[1:2]
## [1] -11.983781  -6.276219
test$p.value
## [1] 1.268301e-07

The 95% confidence interval does not contain zero and the p-value is smaller than 0.05. Consequently, the null hypothesis is rejected at the 5% significance level.

test <- t.test(length ~ dose, var.equal = FALSE, data = dose.level.5_20)
test$conf.int[1:2]
## [1] -18.15617 -12.83383
test$p.value
## [1] 4.397525e-14

The 95% confidence interval does not contain zero and the p-value is smaller than 0.05. Consequently, the null hypothesis is rejected at the 5% significance level.

test <- t.test(length ~ dose, var.equal = FALSE, data = dose.level.10_20)
test$conf.int[1:2]
## [1] -8.996481 -3.733519
test$p.value
## [1] 1.90643e-05

The 95% confidence interval does not contain zero and the p-value is smaller than 0.05. Consequently, the null hypothesis is rejected at the 5% significance level.

These T-tests show a statistically significant correlation between the dosage level and the tooth length: in every comparison, the higher dose yields the greater mean length.
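
The three pairwise comparisons can also be run in a single pass instead of building each subset by hand. The sketch below is one possible way to do it, under the same unequal-variance assumption; it is not the approach used above. The helper compare_pair and the result column names are hypothetical.

```r
library(datasets)

toothGrowth <- ToothGrowth
names(toothGrowth) <- c("length", "method", "dose")
toothGrowth$dose <- as.factor(toothGrowth$dose)

# All unordered pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2).
dose_pairs <- combn(levels(toothGrowth$dose), 2, simplify = FALSE)

# Run one unequal-variance T-test per pair (lower dose minus higher dose)
# and keep the confidence interval and the p-value.
compare_pair <- function(pair) {
    low <- toothGrowth$length[toothGrowth$dose == pair[1]]
    high <- toothGrowth$length[toothGrowth$dose == pair[2]]
    test <- t.test(low, high, var.equal = FALSE)
    data.frame(
        comparison = paste(pair[1], "vs", pair[2]),
        ci_lower = test$conf.int[1],
        ci_upper = test$conf.int[2],
        p_value = test$p.value
    )
}

dose_results <- do.call(rbind, lapply(dose_pairs, compare_pair))
print(dose_results)
```

Collecting the results in one data frame makes it easier to compare the intervals side by side and to check that each one lies entirely below zero.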

Delivery method and tooth length inside dose levels

This section takes a closer look at the correlation between the delivery method and the tooth length, this time within each dosage level.

The null hypothesis is that, for a given dosage level, there is no correlation between the delivery method and the tooth length.

T-test

The grouping factor for the T-test is the delivery method, which has two levels. The dosage level, however, has more than two levels, so the data must be prepared before running a T-test: one subset of unpaired observations is built for each dosage level.

dose.level.5 <- subset(toothGrowth, dose %in% c(.5))
dose.level.10 <- subset(toothGrowth, dose %in% c(1.0))
dose.level.20 <- subset(toothGrowth, dose %in% c(2.0))

Each test assumes unequal variances between the two groups of observations.

test <- t.test(length ~ method, var.equal = FALSE, data = dose.level.5)
test$conf.int[1:2]
## [1] 1.719057 8.780943
test$p.value
## [1] 0.006358607

The 95% confidence interval does not contain zero and the p-value is smaller than 0.05. Consequently, the null hypothesis is rejected at the 5% significance level.

test <- t.test(length ~ method, var.equal = FALSE, data = dose.level.10)
test$conf.int[1:2]
## [1] 2.802148 9.057852
test$p.value
## [1] 0.001038376

The 95% confidence interval does not contain zero and the p-value is smaller than 0.05. Consequently, the null hypothesis is rejected at the 5% significance level.

test <- t.test(length ~ method, var.equal = FALSE, data = dose.level.20)
test$conf.int[1:2]
## [1] -3.79807  3.63807
test$p.value
## [1] 0.9638516

The 95% confidence interval contains zero and the p-value is greater than 0.05. Consequently, the null hypothesis cannot be rejected at the 5% significance level.

The T-tests indicate:

  • a significant correlation between the delivery method and the tooth length at the 0.5 and 1.0 dosage levels, where orange juice yields the greater mean length.
  • no significant correlation between the delivery method and the tooth length at the 2.0 dosage level.
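
The per-dose comparisons can likewise be produced in one pass by splitting the data on the dose level. This is a minimal sketch under the same unequal-variance assumption, not the code used above. The names by_dose and method_results are illustrative.

```r
library(datasets)

toothGrowth <- ToothGrowth
names(toothGrowth) <- c("length", "method", "dose")
toothGrowth$dose <- as.factor(toothGrowth$dose)

# One data frame per dose level, named "0.5", "1" and "2".
by_dose <- split(toothGrowth, toothGrowth$dose)

# Unequal-variance T-test of length by delivery method (OJ minus VC)
# within each dose level.
method_results <- do.call(rbind, lapply(names(by_dose), function(level) {
    test <- t.test(length ~ method, data = by_dose[[level]], var.equal = FALSE)
    data.frame(
        dose = level,
        ci_lower = test$conf.int[1],
        ci_upper = test$conf.int[2],
        p_value = test$p.value
    )
}))
print(method_results)
```

Each row can be read like the individual tests above: an interval that excludes zero corresponds to a p-value below 0.05 for that dose level.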

Conclusion

This analysis leads to the following conclusions, based on the above assumptions: