Overview of the project

We’re going to analyze the ToothGrowth data for guinea pigs from the R datasets package. We focus on the following hypotheses. H0: Vitamin C has no effect on tooth length. H1: A higher dosage of Vitamin C has a positive effect on tooth length. H2: Orange juice (natural vitamin C) has a bigger effect on tooth length than ascorbic acid (synthetic vitamin C).

Data Description

The response variable is tooth length, measured in 10 guinea pigs for each combination of three Vitamin C dosage levels (0.5, 1, and 2 mg) and two delivery methods (orange juice or ascorbic acid). The data format is a data frame with 60 observations on 3 variables (len renamed Length, supp renamed SupplementType, and dose renamed Dose).
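The layout described above can be checked with a cross-tabulation. This is a minimal sketch that works on a local copy with the original column names (supp, dose); the names tg and cell_counts are introduced here for illustration only.

```r
# Work on a local copy so the data set used elsewhere is left untouched.
tg <- datasets::ToothGrowth

# Count observations for each delivery method (rows) and dose level (columns).
cell_counts <- table(tg$supp, tg$dose)
print(cell_counts)

# Total number of observations across all cells.
print(sum(cell_counts))
```

If the design is balanced as described, every cell of the table holds the same count and the total matches the 60 observations reported by str().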

Data Processing

# Loading the dataset and looking at the data frame content.
library(datasets)
data(ToothGrowth)
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# Renaming variables
names(ToothGrowth) <- c("Length", "SupplementType", "Dose")
levels(ToothGrowth$SupplementType) <- c("OrangeJuice", "AscorbicAcid")

Hypothesis testing, t-tests and p-values.

## Investigating Hypothesis 1
# Dose 0.5 mg versus 1 mg: Performing a t-test
Length1 <- subset(ToothGrowth, Dose %in% c(1,0.5))
t.test(Length ~ Dose, var.equal = FALSE, data = Length1)  # groups are unpaired (the t.test default)
## 
##  Welch Two Sample t-test
## 
## data:  Length by Dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735
# Dose 1 mg versus 2 mg: Performing a t-test
Length2 <- subset(ToothGrowth, Dose %in% c(1, 2))
t.test(Length ~ Dose, var.equal = FALSE, data = Length2)  # groups are unpaired (the t.test default)
## 
##  Welch Two Sample t-test
## 
## data:  Length by Dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

Conclusion: For both tests above, the confidence interval does not include 0 and the p-value is below 0.05. We can therefore reject the null hypothesis and conclude that the level of Vitamin C intake has an effect on tooth length. The t-tests also show that the mean tooth lengths at adjacent dosages differ significantly from each other. Given the group means and standard deviations (see appendix), we conclude that the higher the Vitamin C dosage, the longer the teeth.
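H1 is directional, whereas the tests above are two-sided. As a minimal sketch, a one-sided Welch test for the 0.5 mg versus 1 mg comparison could look as follows. It uses a local copy with the original column names (len, dose); the names tg, low_vs_mid and one_sided are illustrative and not part of the original analysis.

```r
# Local copy with the original column names.
tg <- datasets::ToothGrowth

# Keep only the two dose levels being compared.
low_vs_mid <- subset(tg, dose %in% c(0.5, 1))

# The formula interface orders the groups by level (0.5, then 1), so
# alternative = "less" tests mean(dose 0.5) - mean(dose 1) < 0.
one_sided <- t.test(len ~ dose, data = low_vs_mid,
                    alternative = "less", var.equal = FALSE)
print(one_sided$p.value)
```

Because the observed t statistic is negative, the one-sided p-value is half the two-sided one, so the conclusion does not change.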

## Investigating Hypothesis 2
# Ascorbic acid versus orange Juice at 0.5 mg: Performing a t-test
Length3 <- subset(ToothGrowth, Dose %in% c(0.5))
t.test(Length ~ SupplementType, var.equal = FALSE, data = Length3)  # groups are unpaired (the t.test default)
## 
##  Welch Two Sample t-test
## 
## data:  Length by SupplementType
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
##  mean in group OrangeJuice mean in group AscorbicAcid 
##                      13.23                       7.98
# Ascorbic acid versus orange Juice at 1 mg: Performing a t-test
Length4 <- subset(ToothGrowth, Dose %in% c(1))
t.test(Length ~ SupplementType, var.equal = FALSE, data = Length4)  # groups are unpaired (the t.test default)
## 
##  Welch Two Sample t-test
## 
## data:  Length by SupplementType
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
##  mean in group OrangeJuice mean in group AscorbicAcid 
##                      22.70                      16.77
# Ascorbic acid versus orange Juice at 2 mg: Performing a t-test
Length5 <- subset(ToothGrowth, Dose %in% c(2))
t.test(Length ~ SupplementType, var.equal = FALSE, data = Length5)  # groups are unpaired (the t.test default)
## 
##  Welch Two Sample t-test
## 
## data:  Length by SupplementType
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
##  mean in group OrangeJuice mean in group AscorbicAcid 
##                      26.06                      26.14

Conclusion: For 0.5 and 1 mg, the confidence interval does not include 0 and the p-value is below 0.05. We can therefore reject the null hypothesis and conclude that orange juice has a bigger effect on tooth length than ascorbic acid at Vitamin C intakes of 0.5 mg and 1 mg. For 2 mg, we cannot reject the null hypothesis and cannot conclude that orange juice has a bigger effect on tooth length than ascorbic acid. We previously found that 2 mg has the biggest effect on tooth length, but at that dosage the delivery method appears to make no significant difference.
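The three per-dose comparisons can also be run in one pass. The following is a sketch of that idea, not the code used for the output above; it assumes a local copy with the original column names (len, supp, dose), and tg and p_by_dose are illustrative names.

```r
# Local copy with the original column names.
tg <- datasets::ToothGrowth

# Split the data by dose level and run one two-sided Welch test of
# tooth length by delivery method within each dose.
p_by_dose <- sapply(split(tg, tg$dose), function(d) {
  t.test(len ~ supp, data = d, var.equal = FALSE)$p.value
})

# One p-value per dose level, named "0.5", "1" and "2".
print(signif(p_by_dose, 4))
```

The resulting p-values should agree with the three test outputs shown above.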

Appendix

Plots

# Plotting the dataset with a boxplot and a coplot (code ideas taken from R tutorials)
library(graphics)
# Plot 1
boxplot(Length ~ Dose, data = ToothGrowth,boxwex = 0.25, at = 1:3 - 0.2,
        subset = SupplementType == "AscorbicAcid", col = "yellow",main = "ToothGrowth data: length vs dose",
        xlab = "Vitamin C dosage (mg)", ylab = "Tooth Length", xlim = c(0.5, 3.5), ylim = c(0, 40), yaxs = "i")
boxplot(Length ~ Dose, data = ToothGrowth, add = TRUE, boxwex = 0.25, at = 1:3 + 0.2,
        subset = SupplementType == "OrangeJuice", col = "orange")
legend(2.8, 11, c("Ascorbic Acid", "Orange Juice"), fill = c("yellow", "orange"))

# Plot 2
coplot(Length ~ Dose | SupplementType, data = ToothGrowth, panel = panel.smooth,
       col = "red", bg = "orange", pch = 21, bar.bg = c(fac = "orange"),
       xlab = "ToothGrowth data: length vs dose", ylab = "Tooth Length")

Assumptions

Our analysis assumes the following: 1. The samples are unpaired. Even though we lack information about the study design, it is fair to assume that tooth growth was measured in independent guinea pigs, given the nature of tooth growth and the time it takes. 2. We assume the variances of the groups to be unequal, because guinea pigs may differ in how well they absorb ascorbic acid from natural and synthetic sources.
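To see how much the unequal-variance assumption matters, the Welch test can be compared with the pooled-variance test on one of the comparisons. This is only a sketch on the 0.5 mg subset, using a local copy with the original column names; tg, half_mg, welch and pooled are illustrative names.

```r
# Local copy with the original column names.
tg <- datasets::ToothGrowth

# Compare delivery methods at the 0.5 mg dose under both variance assumptions.
half_mg <- subset(tg, dose == 0.5)
welch  <- t.test(len ~ supp, data = half_mg, var.equal = FALSE)
pooled <- t.test(len ~ supp, data = half_mg, var.equal = TRUE)

# Degrees of freedom and p-values side by side.
print(c(welch = unname(welch$parameter), pooled = unname(pooled$parameter)))
print(c(welch = welch$p.value, pooled = pooled$p.value))
```

Since both groups contain the same number of animals, the two versions share the same t statistic and differ only in their degrees of freedom, so the choice has a limited effect here.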

Further t-tests

# Orange juice versus ascorbic acid with all doses combined (not really needed)
t.test(Length ~ SupplementType, var.equal = FALSE, data = ToothGrowth)  # groups are unpaired (the t.test default)
## 
##  Welch Two Sample t-test
## 
## data:  Length by SupplementType
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
##  mean in group OrangeJuice mean in group AscorbicAcid 
##                   20.66333                   16.96333

Conclusion: When all dosages are combined into the same samples, the t-test shows no significant difference between the delivery methods at the 0.05 level (p = 0.06063, and the confidence interval includes 0). The research question needs to be investigated in more detail.

Means and standard deviations

d1 <- ToothGrowth$Length[ToothGrowth$SupplementType=="AscorbicAcid" & ToothGrowth$Dose==0.5 ]; mean(d1); sd(d1);
## [1] 7.98
## [1] 2.746634
d2 <- ToothGrowth$Length[ToothGrowth$SupplementType=="OrangeJuice" & ToothGrowth$Dose==0.5 ]; mean(d2); sd(d2);
## [1] 13.23
## [1] 4.459709
d3 <- ToothGrowth$Length[ToothGrowth$SupplementType=="AscorbicAcid" & ToothGrowth$Dose==1 ]; mean(d3); sd(d3);
## [1] 16.77
## [1] 2.515309
d4 <- ToothGrowth$Length[ToothGrowth$SupplementType=="OrangeJuice" & ToothGrowth$Dose==1 ]; mean(d4); sd(d4);
## [1] 22.7
## [1] 3.910953
d5 <- ToothGrowth$Length[ToothGrowth$SupplementType=="AscorbicAcid" & ToothGrowth$Dose==2 ]; mean(d5); sd(d5);
## [1] 26.14
## [1] 4.797731
d6 <- ToothGrowth$Length[ToothGrowth$SupplementType=="OrangeJuice" & ToothGrowth$Dose==2 ]; mean(d6); sd(d6);
## [1] 26.06
## [1] 2.655058
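The six mean and standard deviation calculations above can be condensed into two tables. This is a sketch of an equivalent computation using tapply() on a local copy with the original column names and factor levels ("OJ", "VC"); tg, group_mean and group_sd are illustrative names.

```r
# Local copy with the original column names and factor levels.
tg <- datasets::ToothGrowth

# Mean and standard deviation of tooth length for each
# delivery method (rows) and dose level (columns).
group_mean <- tapply(tg$len, list(tg$supp, tg$dose), mean)
group_sd   <- tapply(tg$len, list(tg$supp, tg$dose), sd)

print(round(group_mean, 2))
print(round(group_sd, 3))
```

Each cell corresponds to one of the d1 to d6 subsets above; for example, group_mean["VC", "0.5"] matches mean(d1).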

Source

C. I. Bliss (1952) The Statistics of Bioassay. Academic Press.

References

McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.

Boxplots: Using ‘at =’ and adding boxplots – example idea by Roger Bivand.