Synopsis

We’re going to analyze the ToothGrowth data in the R datasets package. The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).

A data frame with 60 observations on 3 variables.

Source: C. I. Bliss (1952). The Statistics of Bioassay Academic Press.

Tasks for this project

  1. Load the ToothGrowth data and perform some basic exploratory data analyses
  2. Provide a basic summary of the data.
  3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering)
  4. State your conclusions and the assumptions needed for your conclusions.
Loading ggplot2 library
library("ggplot2")

Task 1: Load the ToothGrowth data and perform some basic exploratory data analyses

Load in TootGrowth data
data("ToothGrowth")
View ToothGrowth structure
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Task 2: Provide a basic summary of the ToothGrowth data.

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
Convert dose column from numeric to factor
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
Length of Tooth by DOses Using “OJ” vs “VC” Supplements
#Convert dose column from numeric to factor
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

g <- ggplot(ToothGrowth, aes(x = dose, y = len, fill = supp)) 
g <- g + geom_col() 
g <- g + labs(title = "Length of Tooth by Doses for Both Supplements", x = "Doses", y = "Tooth Length", fill="Supplements" ) 
g <- g + facet_grid(~supp, scales = "free")
g

We can see that the doses given through orange juice are more effective. This can be noticed since greater growth through the teeth’ length is observed when the dose is administered via orange juice and less when it is administered via ascorbic acid.

We could also notice that when 2mg doses are administered per day, the teeth length is the same regardless of the delivery methods.

All these are only initial guesses. We need to verify if all these guesses are true by carrying out hypothesis tests.

Task 3: Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

Hypothesis 1: Variation of Tooth Length by Supplements(OJ and VC)

Our null hypothesis is that the length of the tooth does not vary when we use either of the two delivery methods (VC or OJ) while our alternative hypothesis would be that tooth length varies depending on the delivery methods.

Hypothesis test on the effect of the supplements
t.test(len ~ supp, data = ToothGrowth)$conf.int
## [1] -0.1710156  7.5710156
## attr(,"conf.level")
## [1] 0.95
t.test(len ~ supp, data = ToothGrowth)$p.value
## [1] 0.06063451

Since the p-value is 0.06063 just a little bit greater than the significant level of 0.05, and the confidence interval contains zero there is not enough evidence to reject the null hypothesis.

We cannot assume the delivery type has a significant effect on tooth growth. So, we will fail to reject the null hypothesis which claimed that, supplements have no effect on tooth length.

Hypothesis 2: Variation of Tooth Length by Different Doses

Our null hypothesis here is that tooth length does not vary between methods when we use different doses while our alternative hypothesis would be that the length of the teeth varies according to the method and dose delivered.

Subset of ToothGrowth data by 0.5mg and 1.0mg
dose0.5_1.0 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
Hypothesis test on the subset
t.test(len ~ dose, data = dose0.5_1.0)$conf.int
## [1] -11.983781  -6.276219
## attr(,"conf.level")
## [1] 0.95
t.test(len ~ dose, data = dose0.5_1.0)$p.value
## [1] 1.268301e-07
Subset of ToothGrowth data by 0.5mg and 2.0mg
dose0.5_2.0 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
Hypothesis test on the subset
t.test(len ~ dose, data = dose0.5_2.0)$conf.int
## [1] -18.15617 -12.83383
## attr(,"conf.level")
## [1] 0.95
t.test(len ~ dose, data = dose0.5_2.0)$p.value
## [1] 4.397525e-14
Subset of ToothGrowth data by 1.0mg and 2.0mg
dose1.0_2.0 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
Hypothesis test on the subset
t.test(len ~ dose, data = dose1.0_2.0)$conf.int
## [1] -8.996481 -3.733519
## attr(,"conf.level")
## [1] 0.95
t.test(len ~ dose, data = dose1.0_2.0)$p.value
## [1] 1.90643e-05

From all the t.tests above, we can see that the p-values are very small and therefore significant.

We will reject the null hypothesis and establish that increasing the dose level leads to an increase in tooth length since the p-values are far less than the significant level and the confidence intervals don’t contain zero.

Task 4: State your conclusions and the assumptions needed for your conclusions

Conclusions

  1. Supplement type has no significant effect on tooth growth
  2. Increasing the dose level leads to an increase in tooth growth

Assumptions

  1. The experiment was done with random assignment of guinea pigs to different dose level categories and supplement types to control for confounders that might affect the outcome.
  2. Members of the sample population, i.e. the 60 guinea pigs, are representative of the entire population of guinea pigs. This assumption allows us to generalize the results.
  3. For the t-tests, the variances are assumed to be different for the two groups being compared. This assumption is less strong than the case in which the variances are assumed to be equal.