Coursera Statistical Inference Project - Tooth Growth

This project investigates the ToothGrowth data set in R, using summary statistics, confidence intervals, and hypothesis tests. The data record the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each guinea pig received one of three doses of Vitamin C (0.5, 1.0, or 2.0 mg/day) by one of two delivery methods: orange juice (coded OJ) or ascorbic acid (coded VC).

1-2. Load Data and Summary Statistics

data(ToothGrowth)
rows <- nrow(ToothGrowth)
str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

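# Create the output directory if it does not already exist, then open the PNG device
dir.create("figure", showWarnings = FALSE)
png("figure/len_dose_histogram.png", width = 800, height = 480)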

par(mfrow=c(1,2))

a <- hist(ToothGrowth$len,
     main = "Histogram of Tooth Growth Length",
     xlab = "Length",
     ylab = "Frequency")

b <- hist(ToothGrowth$dose,
     main = "Histogram of Tooth Growth Dose",
     xlab = "Dose",
     ylab = "Frequency")

dev.off()

## quartz_off_screen 
##                 2

[Figure: side-by-side histograms of tooth length (len) and dose]

The data set has 60 records with three fields: len, supp, and dose. Thirty records have OJ as the supp value and thirty have VC. Dose ranges from 0.5 to 2.0, and len ranges from 4.2 to 33.9.

We will also break down mean tooth length by each of the two factors, supp and dose, using aggregate.

aggregate(len ~ supp, data = ToothGrowth, FUN = "mean")

##   supp      len
## 1   OJ 20.66333
## 2   VC 16.96333

aggregate(len ~ dose, data = ToothGrowth, FUN = "mean")

##   dose    len
## 1  0.5 10.605
## 2  1.0 19.735
## 3  2.0 26.100
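The two one-way summaries above can also be combined into a single breakdown by supplement and dose together. The following is a minimal sketch rather than part of the original analysis; the variable name cell_means is an illustrative choice.

```r
data(ToothGrowth)

# Mean tooth length for every supplement/dose combination (2 x 3 = 6 cells)
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)
```

Looking at the six cells side by side can help show whether the gap between OJ and VC is similar at every dose or concentrated at particular doses.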

3. Compare Tooth Growth by Supp and Dose

In this section, we compare tooth growth across supp and dose using confidence intervals and hypothesis tests. The group means in the summary above look quite different (20.66 for OJ versus 16.96 for VC, and 10.61, 19.74, and 26.10 across the three doses), but we need formal tests to judge whether those differences could be due to chance.

First, we test whether mean tooth length differs between the two supplements. The null hypothesis is that the difference in mean len between OJ and VC is zero.

t.test(len ~ supp, paired = FALSE, var.equal = FALSE, data = ToothGrowth)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

Based on the Welch two-sample t-test, the p-value is 0.06 and the 95% confidence interval contains zero, so at the 5% significance level we cannot reject the null hypothesis that the two supplements have the same mean tooth length.
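To make the Welch test reported above less of a black box, the statistic, degrees of freedom, confidence interval, and p-value can be reproduced by hand. This is a sketch for illustration; the variable names (oj, vc, se, t_stat, welch_df, ci, p_value) are not from the original analysis.

```r
data(ToothGrowth)

# Split tooth length by supplement
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean (variances are NOT pooled)
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se <- sqrt(se2_oj + se2_vc)

# Welch t statistic for the difference in means (OJ minus VC)
diff_means <- mean(oj) - mean(vc)
t_stat <- diff_means / se

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided 95% confidence interval and p-value
ci <- diff_means + c(-1, 1) * qt(0.975, welch_df) * se
p_value <- 2 * pt(-abs(t_stat), welch_df)

print(c(t = t_stat, df = welch_df, p = p_value))
print(ci)
```

These values should agree with the t.test output above (t = 1.9153, df = 55.309), which is a useful check that the formula has been applied correctly.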

Next, we subset the data into each pair of doses and run the same test on each pair.

dose_0.5_1.0 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
dose_0.5_2.0 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
dose_1.0_2.0 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))

t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = dose_0.5_1.0)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = dose_0.5_2.0)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = dose_1.0_2.0)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

For each of these comparisons, the p-value is well below 0.05 (in fact close to zero) and the 95% confidence interval lies entirely below zero, so we reject the null hypothesis of equal means. In every pair the higher dose has the greater mean length, so increasing the dose is associated with increased tooth length.
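The three pairwise comparisons above can also be run in a loop and collected into one table, which makes the intervals easier to compare. The sketch below is one possible way to do this; the names dose_pairs and results are illustrative, and the Bonferroni-adjusted column is an addition that the original analysis does not include. It is shown only as a conservative check when several tests are run on the same data.

```r
data(ToothGrowth)

# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

results <- do.call(rbind, lapply(dose_pairs, function(pair) {
  # Keep only the rows for the two doses being compared
  pair_data <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  test <- t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = pair_data)
  data.frame(
    low_dose  = pair[1],
    high_dose = pair[2],
    ci_lower  = test$conf.int[1],  # CI for mean(low dose) - mean(high dose)
    ci_upper  = test$conf.int[2],
    p_value   = test$p.value
  )
}))

# Assumption: Bonferroni correction for the three tests (not in the original)
results$p_bonferroni <- p.adjust(results$p_value, method = "bonferroni")
print(results)
```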

4. Conclusions

We can conclude that higher doses lead to increased tooth growth over the range tested, but at the 5% significance level we cannot conclude that the delivery method makes any difference in tooth growth.

For this analysis, we assume that the sample of 60 guinea pigs is representative of the population of guinea pigs and that the animals were randomly assigned to the given doses and delivery methods. We also treat the groups as independent (unpaired) samples, and for the t-tests we do not assume that the group variances are equal, which is why the Welch version of the test is used.