Overview

This is the second part of the Statistical Inference Course Project. It consists of a basic inferential analysis of the ToothGrowth data in the R datasets package. This dataset records the effect of vitamin C on tooth growth in guinea pigs.


Loading the Data & Basic Exploratory Analysis

library(plyr)     # provides mapvalues(); load before dplyr so dplyr's verbs are not masked
library(dplyr)
library(ggplot2)
library(datasets) # open datasets package
data("ToothGrowth") # load data

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

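Using the str function, we see that this dataset consists of 60 observations (n = 60) and three variables:

  1. len: tooth length (numeric).
  2. supp: supplement type, a factor with two levels, OJ (orange juice) and VC (ascorbic acid).
  3. dose: vitamin C dose in mg/day (numeric).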
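The design can also be checked directly. The short sketch below is an addition, not part of the original analysis; it cross-tabulates supplement against dose on the raw ToothGrowth data to show how the 60 observations are spread over the six supplement and dose combinations. The name cell_counts is introduced here for illustration only.

```r
library(datasets)

# Count observations in each supplement x dose cell of the raw data
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)
```

Each cell holds the same number of animals, so the per-dose comparisons later in the report are made on equally sized groups.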

Let’s do a quick comparison of supplement type at each dose level.

data <- ToothGrowth #work on a copy so the original dataset is left unchanged

data$dose <- as.factor(data$dose) #make dose a factor
data$supp <- mapvalues(data$supp, from = c("OJ", "VC"), to = c("Orange Juice", "Ascorbic Acid")) #change supp level names

g <- ggplot(data, aes(x=dose, y=len))
g + facet_wrap(.~supp) + 
        geom_boxplot(aes(fill=dose)) +
        labs(x="Dose (mg/day)", y="Tooth Length", 
             title="Guinea Pig Tooth Length vs. Vitamin C Dose")

Basic Summary of Data

Looking at the conditional boxplots, tooth length increases as vitamin C dosage increases for both supplements. At the 0.5 and 1 mg/day doses, orange juice (OJ) yields longer tooth length than ascorbic acid (VC). However, at 2 mg/day, tooth length seems unaffected by supplement type.

To support the summary, the table below shows the mean tooth length for each supplement at each dose level.

#calculate average length per supp per dose
data %>% 
        group_by(supp, dose) %>% 
        summarise(average_length = mean(len))
## # A tibble: 6 x 3
## # Groups:   supp [2]
##   supp          dose  average_length
##   <fct>         <fct>          <dbl>
## 1 Orange Juice  0.5            13.2 
## 2 Orange Juice  1              22.7 
## 3 Orange Juice  2              26.1 
## 4 Ascorbic Acid 0.5             7.98
## 5 Ascorbic Acid 1              16.8 
## 6 Ascorbic Acid 2              26.1

Hypothesis Testing

We will now compare the supplements across the dataset and at each dosage level.

Test 1: OJ and VC result in equal tooth length across doses.

Let m1 = mean len given OJ and m2 = mean len given VC.

H_o: m1 = m2 vs. H_a: m1 ≠ m2

OJ <- data$len[data$supp=="Orange Juice"] #subset OJ lengths
VC <- data$len[data$supp=="Ascorbic Acid"] #subset VC lengths

test1 <- t.test(OJ, VC, paired = FALSE, var.equal = FALSE)
print(test1)
## 
##  Welch Two Sample t-test
## 
## data:  OJ and VC
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

With a p-value of 0.0606, we do not reject H_o at an alpha level of 0.05. The 95% confidence interval for the difference in means (-0.17 to 7.57) includes 0, which is consistent with this result.
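To make the test above less of a black box, the following sketch recomputes the Welch statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value and the 95% confidence interval by hand. It is an illustration added here, not part of the original analysis; it works on the raw ToothGrowth data (level names "OJ" and "VC"), and the variable names are chosen for this example only.

```r
library(datasets)

# Tooth lengths by supplement, using the raw level names "OJ" and "VC"
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Estimated variance of each sample mean: s^2 / n
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over its standard error
diff_means <- mean(oj) - mean(vc)
se_diff <- sqrt(v_oj + v_vc)
t_stat <- diff_means / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- diff_means + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```

The values should agree with the t.test output shown above (t = 1.9153, df = 55.309, p-value = 0.06063), since this is the same calculation that t.test performs when var.equal = FALSE.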

Test 2: OJ and VC result in equal tooth length at 0.5 mg/day dose.

Let m1 = mean len given OJ and m2 = mean len given VC.

H_o: m1 = m2 vs. H_a: m1 ≠ m2

OJ_5 <- data$len[data$supp=="Orange Juice" & data$dose==0.5]
VC_5 <- data$len[data$supp=="Ascorbic Acid" & data$dose==0.5] 

test2 <- t.test(OJ_5, VC_5, paired = FALSE, var.equal = FALSE)
print(test2)
## 
##  Welch Two Sample t-test
## 
## data:  OJ_5 and VC_5
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean of x mean of y 
##     13.23      7.98

With a p-value of 0.0064, we reject H_o at an alpha level of 0.05. The 95% confidence interval for the difference in means (1.72 to 8.78) does not include 0, which is consistent with this result.

Test 3: OJ and VC result in equal tooth length at 1 mg/day dose.

Let m1 = mean len given OJ and m2 = mean len given VC.

H_o: m1 = m2 vs. H_a: m1 ≠ m2

OJ_1 <- data$len[data$supp=="Orange Juice" & data$dose==1]
VC_1 <- data$len[data$supp=="Ascorbic Acid" & data$dose==1] 

test3 <- t.test(OJ_1, VC_1, paired = FALSE, var.equal = FALSE)
print(test3)
## 
##  Welch Two Sample t-test
## 
## data:  OJ_1 and VC_1
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean of x mean of y 
##     22.70     16.77

With a p-value of 0.0010, we reject H_o at an alpha level of 0.05. The 95% confidence interval for the difference in means (2.80 to 9.06) does not include 0, which is consistent with this result.

Test 4: OJ and VC result in equal tooth length at 2 mg/day dose.

Let m1 = mean len given OJ and m2 = mean len given VC.

H_o: m1 = m2 vs. H_a: m1 ≠ m2

OJ_2 <- data$len[data$supp=="Orange Juice" & data$dose==2]
VC_2 <- data$len[data$supp=="Ascorbic Acid" & data$dose==2] 

test4 <- t.test(OJ_2, VC_2, paired = FALSE, var.equal = FALSE)
print(test4)
## 
##  Welch Two Sample t-test
## 
## data:  OJ_2 and VC_2
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean of x mean of y 
##     26.06     26.14

With a p-value of 0.9639, we do not reject H_o at an alpha level of 0.05. The 95% confidence interval for the difference in means (-3.80 to 3.64) includes 0, which is consistent with this result.
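Tests 2 to 4 repeat the same comparison at each dose, so they can be run in a single pass. The sketch below is one possible way to do that, not the approach used in the report; it splits the raw ToothGrowth data by dose and applies the formula interface of t.test, which defaults to the unpaired Welch test. The names by_dose and results are introduced for this example.

```r
library(datasets)

# One data frame per dose level; names are "0.5", "1" and "2"
by_dose <- split(ToothGrowth, ToothGrowth$dose)

# Run the Welch test of len by supp (OJ vs. VC) within each dose
results <- do.call(rbind, lapply(names(by_dose), function(d) {
  tt <- t.test(len ~ supp, data = by_dose[[d]])
  data.frame(
    dose     = d,
    t        = unname(tt$statistic),
    df       = unname(tt$parameter),
    p_value  = tt$p.value,
    ci_lower = tt$conf.int[1],
    ci_upper = tt$conf.int[2]
  )
}))

print(results, digits = 4)
```

The three rows should match the individual test outputs above, which makes it easier to compare the doses side by side.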


Conclusion

Given the following assumptions:

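  1. The data are iid normal and representative of the population.
  2. The two samples in each test are independent (unpaired), and the test statistic follows a t-distribution; variances are not assumed equal.
  3. Dose and supplement are the only variables affecting tooth length.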

We conclude that, across doses, there is no significant difference at the 0.05 level between the two supplements, orange juice (OJ) and ascorbic acid (VC).

However, looking at each dose level, there is a significant difference between OJ and VC at dose levels 0.5 mg/day and 1 mg/day. In both cases, OJ yielded a higher mean tooth length.

At a dose level of 2 mg/day there is no significant difference between supplements. Pooling this dose with the other two dilutes the overall difference, which may account for the failure to reject the Test 1 null hypothesis.
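The normality assumption listed above can be probed informally. The sketch below is an addition and was not part of the original analysis; it applies the Shapiro-Wilk test (shapiro.test from the stats package) to the tooth lengths in each supplement and dose group of the raw ToothGrowth data. The names groups and shapiro_p are introduced for this example.

```r
library(datasets)

# Tooth lengths for each supplement x dose combination (six groups)
groups <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))

# Shapiro-Wilk p-value per group; small values suggest departure from normality
shapiro_p <- sapply(groups, function(x) shapiro.test(x)$p.value)

print(round(shapiro_p, 3))
```

With so few observations per group, the test has little power, so these p-values are best read as a rough sanity check rather than as confirmation that the assumption holds.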