Overview

This report provides a basic analysis and summary of the ToothGrowth data in the R datasets package. In particular, confidence intervals and hypothesis tests are used to compare tooth growth by supplement type (supp) and dose.

Exploratory Data Analysis

A tabular summary of the data can be found in the appendix. To save space, we only plot the data here.

library(ggplot2)
# Tooth length by supplement type; the fill colour encodes the dose
g <- ggplot(ToothGrowth, aes(x = supp, y = len, group = factor(dose)))
g <- g + geom_point(size = 10, pch = 21, alpha = 0.5, aes(fill = dose))
g

Tooth length is greater at larger doses, as expected. The VC supplement also appears to have a larger spread (more variability) than the OJ supplement.
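To check these visual impressions numerically, the mean and standard deviation of len can be tabulated for each combination of supp and dose. The following is a minimal sketch using base R only; the object names (cell_mean, cell_sd, cell_summary) are introduced here for illustration and are not part of the original analysis.

```r
# Mean and standard deviation of tooth length for each supplement/dose cell
cell_mean <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
cell_sd   <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Combine the two tables; the suffixes give columns len_mean and len_sd
cell_summary <- merge(cell_mean, cell_sd,
                      by = c("supp", "dose"),
                      suffixes = c("_mean", "_sd"))
cell_summary
```

Each row of cell_summary describes one of the six supplement/dose groups, so the means and spreads can be compared within each dose.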

Hypothesis Testing / Confidence Interval

We’ll treat the data as two independent groups and apply the t-test to the data.

Let’s build our null and alternative hypotheses.

The status quo is that there is no difference in tooth growth between the two supplement methods (and hence tooth growth depends only on the dose). \(H_0\): The true difference in means between the two groups (OJ and VC) equals zero.

Our alternative hypothesis is that one treatment method is more effective than the other. \(H_1\): The true difference in means between the two groups (OJ and VC) is not zero.

We’ll do three t-tests, one for each dose level. The full output of the t-tests is in the appendix.
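For reference, the quantities behind each test can be written out as follows. This is a sketch of the standard Welch (unequal variance) two-sample procedure, which is what t.test reports when var.equal = FALSE; the symbols \(\bar{x}\), \(s^2\) and \(n\) (sample mean, sample variance and sample size of each group) are notation introduced here and do not appear elsewhere in the report.

\[
t = \frac{\bar{x}_{VC} - \bar{x}_{OJ}}{\sqrt{\dfrac{s_{VC}^2}{n_{VC}} + \dfrac{s_{OJ}^2}{n_{OJ}}}},
\qquad
\nu = \frac{\left(\dfrac{s_{VC}^2}{n_{VC}} + \dfrac{s_{OJ}^2}{n_{OJ}}\right)^2}{\dfrac{\left(s_{VC}^2/n_{VC}\right)^2}{n_{VC} - 1} + \dfrac{\left(s_{OJ}^2/n_{OJ}\right)^2}{n_{OJ} - 1}}
\]

\[
\text{95\% CI:}\quad \left(\bar{x}_{VC} - \bar{x}_{OJ}\right) \pm t_{0.975,\,\nu}\,\sqrt{\dfrac{s_{VC}^2}{n_{VC}} + \dfrac{s_{OJ}^2}{n_{OJ}}}
\]

Under \(H_0\) the statistic \(t\) approximately follows a t distribution with \(\nu\) degrees of freedom, and \(H_0\) is rejected at the 5% level exactly when the 95% interval excludes zero.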

Dose = 0.5

id1 <- with(ToothGrowth, supp == "VC" & dose == 0.5)
g1 <- ToothGrowth$len[id1]
id2 <- with(ToothGrowth, supp == "OJ" & dose == 0.5)
g2 <- ToothGrowth$len[id2]

# difference is g1 - g2 (VC minus OJ)
t.test(g1, g2, paired = FALSE, var.equal = FALSE)$conf.int

## [1] -8.780943 -1.719057
## attr(,"conf.level")
## [1] 0.95

The 95% confidence interval for the difference in means (VC minus OJ) does not contain zero, so we reject the null hypothesis in favor of the alternative hypothesis (the two treatment methods are not equally effective) at the 5% level. Furthermore, since the interval lies entirely below zero, it suggests that orange juice (OJ) is the more effective treatment at this dose.
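To see where this interval comes from, it can be rebuilt by hand from the group means, variances and sizes. The sketch below assumes the usual Welch formula with the Welch-Satterthwaite degrees of freedom; the variable names (vc, oj, a, b, se, df, ci) are chosen here for illustration.

```r
# Tooth lengths for the two supplement groups at dose 0.5
vc <- with(ToothGrowth, len[supp == "VC" & dose == 0.5])
oj <- with(ToothGrowth, len[supp == "OJ" & dose == 0.5])

# Per-group variance of the sample mean (variances are not pooled)
a <- var(vc) / length(vc)
b <- var(oj) / length(oj)

# Standard error of the difference in means
se <- sqrt(a + b)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (a + b)^2 / (a^2 / (length(vc) - 1) + b^2 / (length(oj) - 1))

# 95% confidence interval for mean(vc) - mean(oj)
ci <- (mean(vc) - mean(oj)) + c(-1, 1) * qt(0.975, df) * se
ci
```

The resulting ci should agree with the interval returned by t.test(vc, oj, paired = FALSE, var.equal = FALSE)$conf.int.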

Dose = 1.0

id1 <- with(ToothGrowth, supp == "VC" & dose == 1.0)
g1 <- ToothGrowth$len[id1]
id2 <- with(ToothGrowth, supp == "OJ" & dose == 1.0)
g2 <- ToothGrowth$len[id2]

# difference is g1 - g2 (VC minus OJ)
t.test(g1, g2, paired = FALSE, var.equal = FALSE)$conf.int

## [1] -9.057852 -2.802148
## attr(,"conf.level")
## [1] 0.95

Again, the 95% confidence interval for the difference in means (VC minus OJ) does not contain zero, so we reject the null hypothesis in favor of the alternative hypothesis (the two treatment methods are not equally effective) at the 5% level. Since the interval lies entirely below zero, it suggests that orange juice (OJ) is the more effective treatment at this dose as well.

Dose = 2.0

id1 <- with(ToothGrowth, supp == "VC" & dose == 2.0)
g1 <- ToothGrowth$len[id1]
id2 <- with(ToothGrowth, supp == "OJ" & dose == 2.0)
g2 <- ToothGrowth$len[id2]

#difference is g1 - g2
t.test(g1, g2, paired = FALSE, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  g1 and g2
## t = 0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.63807  3.79807
## sample estimates:
## mean of x mean of y 
##     26.14     26.06

Here the 95% confidence interval contains zero and the p-value (0.9639) is far above 0.05, so we fail to reject the null hypothesis at this dose.
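The three tests above repeat the same steps with a different dose each time. As one possible way to avoid the repetition, the sketch below wraps the steps in a helper function and collects the interval and p-value for every dose in a single table. The function name welch_by_dose and the object results are hypothetical names introduced here, not part of the original analysis.

```r
# Run the Welch t-test (VC minus OJ) for a single dose level
welch_by_dose <- function(d) {
  vc <- with(ToothGrowth, len[supp == "VC" & dose == d])
  oj <- with(ToothGrowth, len[supp == "OJ" & dose == d])
  tt <- t.test(vc, oj, paired = FALSE, var.equal = FALSE)
  c(dose = d,
    lower = tt$conf.int[1],
    upper = tt$conf.int[2],
    p.value = tt$p.value)
}

# One row per dose: confidence limits and p-value
results <- t(sapply(c(0.5, 1, 2), welch_by_dose))
results
```

Rows whose interval excludes zero correspond to the doses at which the null hypothesis is rejected.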

Conclusions & Assumptions

Here’s a summary of the assumptions made:

We assume that there are 60 guinea pigs in total, divided into 6 groups of 10 (one group per combination of supplement and dose).
We assume unequal variances in the two groups being compared.
We use 95% confidence intervals (a 5% probability of Type I error for each test).
We assume the two groups being compared are independent.
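The first assumption can be checked directly against the data. The following is a minimal sketch that counts the observations in each supplement/dose cell; group_sizes is a name introduced here for illustration.

```r
# Number of observations for every combination of supplement and dose
group_sizes <- with(ToothGrowth, table(supp, dose))
group_sizes
```

A balanced design would show the same count in every cell of this table.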

Our results show that at the lower doses (0.5 and 1.0) the two treatment methods are not equally effective: orange juice (OJ) is associated with longer odontoblasts than ascorbic acid (VC). At the highest dose (2.0), however, we find no evidence of a difference between the two methods, since the 95% confidence interval contains zero. Note that failing to reject the null hypothesis does not by itself prove that the two methods are equally effective at this dose.

Appendix

Basic summary of the data

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

The data consist of 3 columns: length of tooth (len), supplement method used (supp), and dose of the supplement (dose). We know from str that there are two types of supplement and a few levels of dose. The complete description of the data is available at: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ToothGrowth.html

A further sneak peek at the data:

head(ToothGrowth)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

tail(ToothGrowth)

##     len supp dose
## 55 24.8   OJ    2
## 56 30.9   OJ    2
## 57 26.4   OJ    2
## 58 27.3   OJ    2
## 59 29.4   OJ    2
## 60 23.0   OJ    2

Full output of the t-tests

For dose = 0.5

id1 <- with(ToothGrowth, supp == "VC" & dose == 0.5)
g1 <- ToothGrowth$len[id1]
id2 <- with(ToothGrowth, supp == "OJ" & dose == 0.5)
g2 <- ToothGrowth$len[id2]

#difference is g1 - g2
t.test(g1, g2, paired = FALSE, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  g1 and g2
## t = -3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.780943 -1.719057
## sample estimates:
## mean of x mean of y 
##      7.98     13.23

For dose = 1.0

id1 <- with(ToothGrowth, supp == "VC" & dose == 1.0)
g1 <- ToothGrowth$len[id1]
id2 <- with(ToothGrowth, supp == "OJ" & dose == 1.0)
g2 <- ToothGrowth$len[id2]

#difference is g1 - g2
t.test(g1, g2, paired = FALSE, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  g1 and g2
## t = -4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.057852 -2.802148
## sample estimates:
## mean of x mean of y 
##     16.77     22.70

For dose = 2.0

id1 <- with(ToothGrowth, supp == "VC" & dose == 2.0)
g1 <- ToothGrowth$len[id1]
id2 <- with(ToothGrowth, supp == "OJ" & dose == 2.0)
g2 <- ToothGrowth$len[id2]

#difference is g1 - g2
t.test(g1, g2, paired = FALSE, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  g1 and g2
## t = 0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.63807  3.79807
## sample estimates:
## mean of x mean of y 
##     26.14     26.06

Inferential Exercise

Kevin Siswandi

24 May 2015