Overview

The purpose of this project is to find out how the delivery method of vitamin C affects tooth growth in guinea pigs. The study was carried out on 60 guinea pigs, each of which received a vitamin C dose of 0.5, 1 or 2 mg/day through one of two delivery methods: orange juice (OJ) or an aqueous solution of ascorbic acid (VC).
To answer this question, I will carry out an exploratory data analysis and then compute 95% confidence intervals.

Exploratory data analysis

The study's data set, “ToothGrowth”, is included in the R package “datasets”. Let’s load the required packages and the data:

library(datasets)
library(ggplot2)
library(dplyr)

data("ToothGrowth")
data <- ToothGrowth

Let’s see how the data is structured and the number of variables we have:

summary(data)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

The variable len is the tooth length of each guinea pig, measured after the animals were sacrificed and their teeth extracted. (The R help page for ToothGrowth describes the response more precisely as the length of the odontoblasts, the cells responsible for tooth growth.)
supp is a factor variable that represents the delivery method: OJ or VC.
dose is the amount of vitamin C supplied: 0.5, 1 or 2 mg/day.

So we have 60 guinea pigs divided into 6 groups of 10. Three groups received OJ and the other three received VC. Within each delivery method, each group received a different dose, so every combination of supp and dose corresponds to exactly one group.
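The group sizes can be checked directly with a contingency table of supp against dose. The sketch below uses base R's table() and is an illustrative check, not part of the original analysis.

```r
# Count how many guinea pigs fall into each supp x dose combination
library(datasets)

data <- ToothGrowth
counts <- table(data$supp, data$dose)
print(counts)
```

If the design is as described, the table has 2 rows (OJ, VC) and 3 columns (0.5, 1, 2), and every cell equals 10.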

Let’s do some exploratory data analysis:

# dose only has 3 distinct values, so I convert it to a factor for better visualization
data$dose <- as.factor(data$dose)

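# Points: one per guinea pig; lines: mean len per dose for each delivery method
# (the `fun` argument of stat_summary replaces the deprecated `fun.y` since ggplot2 3.3.0)
q <- ggplot(data, aes(dose, len)) +
  geom_point(aes(colour = supp)) +
  stat_summary(aes(group = supp, colour = supp), fun = mean, geom = "line") +
  ggtitle("Tooth Growth vs. Vitamin C dose by delivery method")
print(q)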

Each point represents the tooth length (y axis) of one guinea pig at a given dose (x axis), coloured by delivery method. The lines join the mean value of len for each combination of dose and supp.
On average, tooth length increases as the dose of vitamin C increases.
OJ appears to be more effective than VC at doses of 0.5 and 1 mg, but at 2 mg the difference is unclear.
To simplify this exercise, I take as given that tooth length increases with the vitamin C dose, so the objective of this study is to determine, for each dose, which delivery method produces the greater tooth growth.
To achieve this goal, I will use 95% confidence intervals.
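The averages drawn as lines in the plot can also be printed as a table, which makes the comparison at each dose explicit. This is a minimal sketch that uses base R's tapply() instead of dplyr; the object name means is only illustrative.

```r
# Mean tooth length for each delivery method (rows) and dose (columns)
library(datasets)

data <- ToothGrowth
means <- tapply(data$len, list(data$supp, data$dose), mean)
print(round(means, 2))

# Difference OJ - VC at each dose, the quantity studied in the next section
print(round(means["OJ", ] - means["VC", ], 2))
```

The sample means only describe the data; whether the OJ - VC differences are distinguishable from 0 is what the confidence intervals below address.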

Confidence Interval Method

For each dose, this study determines which delivery method is better at a 95% confidence level. The procedure is to select one dose and compare the OJ and VC groups with the R function t.test. Because the sample size is small (10 guinea pigs per group), the interval is built from the t distribution rather than the normal distribution.
For a given dose, t.test computes the difference between the mean tooth lengths of the OJ and VC groups, estimates the standard error of that difference from the standard deviation of each group, and builds an interval around the observed difference using a t quantile. The interpretation is that, if the experiment were repeated many times, 95% of the intervals built this way would contain the true difference between the population mean tooth lengths.
All results are expressed as OJ - VC.
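To make the procedure concrete, the interval that t.test returns with var.equal = FALSE (Welch's interval) can be reproduced by hand: the difference in sample means plus or minus a t quantile times the standard error, with the degrees of freedom given by the Welch-Satterthwaite approximation. The helper welch_ci() below is hypothetical and written only for illustration; its result should agree with t.test up to floating-point error.

```r
library(datasets)

# Hypothetical helper: Welch two-sample confidence interval for mean(x) - mean(y)
welch_ci <- function(x, y, level = 0.95) {
  vx <- var(x) / length(x)          # squared standard error of mean(x)
  vy <- var(y) / length(y)          # squared standard error of mean(y)
  se <- sqrt(vx + vy)               # standard error of the difference
  # Welch-Satterthwaite approximation of the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  t_q <- qt(1 - (1 - level) / 2, df) # 0.975 quantile for a 95% interval
  (mean(x) - mean(y)) + c(-1, 1) * t_q * se
}

g1 <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
g2 <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

manual  <- welch_ci(g1, g2)
builtin <- as.vector(t.test(g1, g2, var.equal = FALSE)$conf.int)
print(round(manual, 2))
```

Rounded to two decimals, manual should match the Dose-0.5 interval printed by the loop below.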

dose.v <- levels(data$dose)   # "0.5", "1", "2"
# Create a list to save the results, one entry per dose
result <- list()

for (d in dose.v) {
  # Tooth lengths of the OJ and VC groups that received dose d
  g1 <- filter(data, supp == "OJ" & dose == d)$len
  g2 <- filter(data, supp == "VC" & dose == d)$len

  # Welch two-sample t-test (independent groups, unequal variances); keep the 95% CI for OJ - VC
  test <- as.vector(t.test(g1, g2, paired = FALSE, var.equal = FALSE)$conf.int)

  result[[paste0("Dose-", d)]] <- round(test, 2)
}

print(result)
## $`Dose-0.5`
## [1] 1.72 8.78
## 
## $`Dose-1`
## [1] 2.80 9.06
## 
## $`Dose-2`
## [1] -3.80  3.64

Conclusion

Dose 0.5 mg: confidence interval (1.72, 8.78)
The interval is quite wide, but it is entirely positive and excludes 0, so we can say with 95% confidence that OJ produces greater tooth growth than VC at a dose of 0.5 mg.

Dose 1 mg: confidence interval (2.80, 9.06)
As for the previous dose, the interval is entirely positive and excludes 0, so we can say with 95% confidence that OJ produces greater tooth growth than VC at a dose of 1 mg.

Dose 2 mg: confidence interval (-3.80, 3.64)
This interval contains 0 and is roughly centered on it, so we cannot say which delivery method is better at a dose of 2 mg.

The conclusion of this study is that for doses of 0.5 and 1 mg, OJ produces greater tooth growth than VC. For a dose of 2 mg, the data show no evidence of a difference between OJ and VC, so either method appears to give similar results.
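The reading of each interval above follows a simple rule: an interval entirely above 0 favours OJ, one entirely below 0 favours VC, and one containing 0 is inconclusive. The sketch below applies that rule to all three doses and also reports the Welch test p-value; for a two-sided test at the same 95% level, the interval excludes 0 exactly when p < 0.05. The function summarise_dose() is a hypothetical helper, not part of the original analysis.

```r
library(datasets)

# Hypothetical helper: Welch t-test of OJ vs VC at one dose, with a verdict read off the 95% CI
summarise_dose <- function(d) {
  oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == d]
  vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == d]
  tt <- t.test(oj, vc, paired = FALSE, var.equal = FALSE)
  ci <- tt$conf.int
  verdict <- if (ci[1] > 0) {
    "OJ larger"
  } else if (ci[2] < 0) {
    "VC larger"
  } else {
    "no clear difference"
  }
  data.frame(dose = d, lower = round(ci[1], 2), upper = round(ci[2], 2),
             p.value = signif(tt$p.value, 3), verdict = verdict,
             stringsAsFactors = FALSE)
}

verdicts <- do.call(rbind, lapply(c(0.5, 1, 2), summarise_dose))
print(verdicts)
```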

Assumptions

  1. The 60 guinea pigs are different animals, so the OJ and VC groups are independent samples (paired = FALSE in the t.test function).
  2. Nothing is known about the equality of the variances, so I set var.equal = FALSE in the t.test function (Welch's t-test).
  3. The sample size is small, so I use a t confidence interval, computed by the t.test function in R with t quantiles, rather than a normal-based one. As seen in theory, when n is large the t distribution approaches the normal distribution.
  4. From the exploratory analysis I concluded that tooth length increases with the supply of vitamin C, so I focused this study on determining which delivery method produces larger tooth growth.
  5. The observations within each group are iid, and tooth length is approximately normally distributed in each group, which the t interval relies on.
  6. A 95% confidence level is used, as is common practice; since the interval is two-sided, the t quantile is taken at 0.975.
  7. All results are expressed as OJ - VC.
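Assumptions 3 and 6 can be illustrated numerically: the 0.975 quantile of the t distribution is larger than the corresponding normal quantile for small degrees of freedom and approaches it as the degrees of freedom grow. With 10 guinea pigs per group, the Welch degrees of freedom lie between 9 and 18, where the difference is still noticeable. The df values in the sketch below are arbitrary examples.

```r
# 0.975 quantiles (two-sided 95% level) of the t distribution vs the standard normal
dfs <- c(5, 9, 18, 30, 100, 1000)
t_q <- qt(0.975, df = dfs)
z_q <- qnorm(0.975)

print(data.frame(df = dfs, t_quantile = round(t_q, 3), normal_quantile = round(z_q, 3)))
```

The larger t quantile at small df widens the interval, which compensates for estimating the standard deviations from only 10 observations per group.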