Exploring the ToothGrowth data in the R datasets package

For this second part of the project we're going to explore the ToothGrowth dataset in R and use hypothesis tests to compare tooth growth in guinea pigs by supplement and dose.

Some useful information about the ToothGrowth data can be obtained directly from the R help (?ToothGrowth). The response, len, is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1.0, and 2.0 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (coded as VC).

Analysis

1. Load the required R libraries

library(ggplot2)

2. Load the ToothGrowth data

data("ToothGrowth")

3. Explore the data

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

head(ToothGrowth)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

unique(ToothGrowth$dose)

## [1] 0.5 1.0 2.0
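The help page describes 60 animals split across 2 supplements and 3 doses. A quick cross-tabulation is one way to confirm that the design is balanced, with 10 guinea pigs in every supplement/dose cell. This is a small sketch; the name cell_counts is illustrative and not part of the original analysis.

```r
# Count observations in each supplement x dose cell (rows = supp, columns = dose)
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)
```

A balanced design matters later: every pairwise test below compares two groups of equal size.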

4. Compute basic summary statistics

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

5. Plot the information

plot1 <- ggplot(ToothGrowth, aes(x=as.factor(dose), y=len, fill=supp)) +
    geom_boxplot() +
    xlab("Dose (mg/day)") +
    ylab("Tooth length") +
    guides(fill=guide_legend(title="Supplement type")) +
    labs(title="Tooth Growth in Guinea Pigs")
print(plot1)

Tooth length appears to increase with dose for both the OJ and VC supplements.
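To back the visual impression from the boxplot with numbers, the group means can be laid out as a supplement-by-dose table. This sketch uses base R's tapply(); the name mean_table is illustrative. Because every cell has 10 animals, the column averages of this table reproduce the per-dose means reported by the t-tests further down.

```r
# Mean tooth length for every supplement/dose combination:
# rows = supplement (OJ, VC), columns = dose (0.5, 1, 2 mg/day)
mean_table <- tapply(ToothGrowth$len,
                     list(ToothGrowth$supp, ToothGrowth$dose),
                     mean)
print(round(mean_table, 2))

# Average over the two supplements to get the overall mean per dose
print(colMeans(mean_table))
```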

Hypothesis

Let's now use t-tests to compare tooth growth by supplement and by dose. For this evaluation we assume that the guinea pigs in the sample are representative of the population, that the observations are independent, and we do not assume that the groups being compared have equal variances.
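The unequal-variance assumption is what makes these Welch t-tests rather than pooled Student t-tests. In R, t.test() uses var.equal = FALSE by default, which selects Welch's test. The sketch below (names sds, welch and student are illustrative) prints the per-supplement standard deviations and runs both variants side by side so the effect of the assumption can be seen.

```r
# Sample standard deviation of tooth length for each supplement
sds <- tapply(ToothGrowth$len, ToothGrowth$supp, sd)
print(round(sds, 3))

# Default t.test(): var.equal = FALSE, i.e. Welch's unequal-variance test
welch <- t.test(len ~ supp, data = ToothGrowth)

# Pooled-variance (Student) test, shown only for comparison
student <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

print(c(welch = welch$p.value, student = student$p.value))
```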

The test statistic for this unequal-variance (Welch) t-test is:

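\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/N_1 + S_2^2/N_2}} \]

where \(\bar{X}_i\), \(S_i^2\) and \(N_i\) are the sample mean, sample variance and sample size of group \(i\).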
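As a check on the formula, the statistic can be computed by hand for the supplement comparison. The sketch below also applies the Welch-Satterthwaite approximation for the degrees of freedom, which is what t.test() reports as df; the variable names (oj, vc, se2_oj, and so on) are illustrative.

```r
# Split tooth length by supplement
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Per-group squared standard errors, S_i^2 / N_i
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the unpooled standard error
t_manual <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution with df_manual degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

print(round(c(t = t_manual, df = df_manual, p = p_manual), 5))
```

These values should agree with the t, df and p-value printed by t.test(len ~ supp, data = ToothGrowth).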

In R, we use t.test(). First, we evaluate tooth length by supplement:

Length by supplement

t.test(len ~ supp, data = ToothGrowth)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

Here the p-value (0.06063) is larger than the 0.05 significance threshold, and the 95% confidence interval (-0.171, 7.571) contains zero. We therefore cannot reject the null hypothesis that mean tooth length is the same for both supplements.
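The same decision can be read off programmatically, since t.test() returns an htest object whose p.value and conf.int fields hold the numbers printed above. A minimal sketch, assuming the 0.05 threshold used in the text (names supp_test, alpha, reject_by_p and ci_has_zero are illustrative):

```r
# Keep the htest object so its fields can be inspected
supp_test <- t.test(len ~ supp, data = ToothGrowth)

alpha <- 0.05                   # significance level used in the text
p_val <- supp_test$p.value
ci    <- supp_test$conf.int     # 95% CI for mean(OJ) - mean(VC)

reject_by_p <- p_val < alpha
ci_has_zero <- ci[1] <= 0 && 0 <= ci[2]

cat(sprintf("p = %.5f, 95%% CI = [%.3f, %.3f], reject H0: %s\n",
            p_val, ci[1], ci[2], reject_by_p))
```

The two criteria always agree here because the default conf.level of 0.95 matches 1 - alpha; with a different alpha, conf.level would need to be changed to match.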

Now, let's apply a t-test to each of the three pairs of doses.
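The three subsets below follow an identical pattern, so they can also be generated in a loop. This sketch builds every pair of dose levels with combn(), runs the same Welch t.test() on each subset, and collects the key numbers into one data frame; the names dose_pairs and pair_results are illustrative.

```r
# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

# Run one Welch t-test per pair and keep the statistic, df, p-value and CI
pair_results <- do.call(rbind, lapply(dose_pairs, function(pair) {
  sub <- subset(ToothGrowth, dose %in% pair)
  res <- t.test(len ~ dose, data = sub)
  data.frame(
    pair    = paste(pair, collapse = " vs "),
    t       = unname(res$statistic),
    df      = unname(res$parameter),
    p.value = res$p.value,
    ci.low  = res$conf.int[1],
    ci.high = res$conf.int[2]
  )
}))
print(pair_results)
```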

0.5 and 1.0

dose_05_10 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
t.test(len ~ dose, data = dose_05_10)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

0.5 and 2.0

dose_05_20 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
t.test(len ~ dose, data = dose_05_20)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

1.0 and 2.0

dose_10_20 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
t.test(len ~ dose, data = dose_10_20)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

In all three cases the p-value is very small and the confidence interval does not contain zero; therefore, we reject the null hypothesis and conclude that dose affects tooth length.
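Because three tests are run on the same data, one optional refinement (not part of the original analysis) is to adjust the p-values for multiple comparisons. Base R's pairwise.t.test() with pool.sd = FALSE runs a separate Welch t-test for each pair of doses, matching the tests above, and p.adjust.method selects the correction. This sketch uses Holm's method; the names pw_raw and pw_holm are illustrative.

```r
# Unadjusted pairwise Welch t-tests (pool.sd = FALSE => separate t.test per pair)
pw_raw <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                          pool.sd = FALSE, p.adjust.method = "none")

# Same comparisons with Holm's correction for three simultaneous tests
pw_holm <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                           pool.sd = FALSE, p.adjust.method = "holm")

print(pw_raw)
print(pw_holm)
```

The unadjusted matrix should match the three p-values above. Since the largest of them is about 1.9e-05, even a conservative correction leaves every dose comparison well below 0.05.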

Conclusions

Based on the results presented above, we can conclude that:

There is no statistically significant evidence (at the 5% level, with all doses pooled) that the supplement type affects tooth growth.
Increased dose level leads to increased tooth growth.

Statistical Inference Course Project Part 2

adanlp

March 25, 2018