Overview

This report presents an inferential analysis of the effect of vitamin C on tooth growth in guinea pigs, as recorded in the R ToothGrowth dataset. According to the R data documentation, “The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).”

Our analysis examines the ToothGrowth data with the following objectives:

  1. Load the ToothGrowth data and perform basic exploratory data analysis.
  2. Provide a basic summary of the data.
  3. Use confidence intervals and / or hypothesis tests to compare tooth growth by supp and dose.
  4. State conclusions and the necessary supporting assumptions.

1. Load R Packages, Data and Perform Exploratory Data Analysis

library(datasets)
library(ggplot2)
data("ToothGrowth")

Structure of Dataset

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Summary of Dataset

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

First Six Rows of Dataset

head(ToothGrowth)

2. Basic Summary of Data

Exploratory analysis confirms there are 60 observations of 3 variables: len (tooth length), supp (supplement type), and dose (dose in mg per day).

The structure of the dataset shows that the dose variable is numeric.

The first task is to convert it from numeric to factor, to facilitate plotting.

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
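As a small illustration of what this conversion does, the sketch below inspects the resulting factor on a copy of the data. The names tg, dose_factor, codes and dose_numeric are introduced here for illustration only and are not part of the original analysis.

```r
# Work on a copy so the session's ToothGrowth object is left untouched.
tg <- datasets::ToothGrowth

dose_factor <- as.factor(tg$dose)
levels(dose_factor)   # "0.5" "1" "2"

# as.numeric() on a factor returns the integer level codes (1, 2, 3),
# not the doses themselves.
codes <- as.numeric(dose_factor)

# Going through as.character() recovers the original numeric doses.
dose_numeric <- as.numeric(as.character(dose_factor))
```

Comparisons such as dose == 2 still work on the factor, because R compares the value against the level labels.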

Visualize tooth growth per delivery method and respective dosage:

ggplot(data=ToothGrowth, aes(x=supp, y=len)) +
  geom_boxplot(aes(fill=supp)) + xlab("Supplement Delivery Method") +
  ylab("Tooth Length") + facet_grid(~ dose) +
  ggtitle("Tooth Length per Delivery Method and Dose Amount (mg/day)") + 
  labs(fill='Supplement') +     
  theme(plot.title = element_text(lineheight=.5, hjust=0.5, face="bold"))

Visual inspection of the above plot suggests that tooth growth is greater when vitamin C is administered as orange juice rather than ascorbic acid at the 0.5 and 1 mg/day dosages. At 2 mg/day the median tooth growth (the centre line of each box) is roughly equivalent between the two delivery methods. Notably, at the 2 mg/day dosage the data show a wider range of growth for ascorbic acid.
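The visual impression can be cross-checked numerically. The following is a minimal sketch, not part of the original analysis, that tabulates the group size, mean and median of tooth length for each supplement and dose combination. The names tg, cell_n, cell_mean and cell_median are illustrative.

```r
# Work on a copy so the session's ToothGrowth object is left untouched.
tg <- datasets::ToothGrowth

# Number of animals in each supplement x dose cell.
cell_n <- table(tg$supp, tg$dose)

# Mean and median tooth length per cell (rows: supp, columns: dose).
cell_mean   <- with(tg, tapply(len, list(supp = supp, dose = dose), mean))
cell_median <- with(tg, tapply(len, list(supp = supp, dose = dose), median))

cell_n
round(cell_mean, 2)
cell_median
```

Because the design is balanced, averaging each row of cell_mean gives the overall mean for that supplement.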

Can we rely solely on visually interesting data representation? Let’s put these statistics to the test!

3. Confidence Intervals and / or Hypothesis Tests: Tooth Growth by Supp and Dose

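Null hypothesis: tooth growth is NOT influenced by delivery method (OJ vs. VC); that is, the true difference in mean tooth length between the two groups is 0.

The alternative hypothesis is that the true difference in means per supp (OJ minus VC) is < 0.

First we’ll use Gosset’s t-test to evaluate tooth growth by supplement. Because t.test() defaults to var.equal = FALSE, R reports the Welch variant, which does not assume equal variances in the two groups.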

t.test(len ~ supp, data = ToothGrowth, alternative = 'less')
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.9697
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf 6.931731
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

Resulting sample estimates show OJ: mean 20.66333 vs. VC: mean 16.96333.
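To make the reported statistic less of a black box, the sketch below recomputes the Welch t statistic and its degrees of freedom by hand, then shows how the p-value depends on the direction of the alternative. It is an illustration rather than part of the original analysis; the variable names are introduced here.

```r
# Work on a copy so the session's ToothGrowth object is left untouched.
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error contributed by each group.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic for the difference in means (OJ minus VC).
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# p-values under the three possible alternatives.
p_less    <- pt(t_stat, df_welch)         # alternative = "less"
p_greater <- 1 - p_less                   # alternative = "greater"
p_two     <- 2 * min(p_less, p_greater)   # alternative = "two.sided"

c(t = t_stat, df = df_welch, less = p_less, greater = p_greater, two.sided = p_two)
```

The "less" and "greater" p-values always sum to 1, so a very high p-value in one direction signals that the observed difference points in the other direction.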

Next, Gosset’s t-test to evaluate tooth growth by dose: 2 mg/day vs. 1 mg/day.

len <- ToothGrowth$len
dose <- ToothGrowth$dose
t.test(len[dose==2], len[dose==1], paired = FALSE, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  len[dose == 2] and len[dose == 1]
## t = 4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.735613 8.994387
## sample estimates:
## mean of x mean of y 
##    26.100    19.735

Null hypothesis: tooth growth is NOT influenced by dose: 2 mg/day vs. 1 mg/day.

The alternative hypothesis is that the true difference in means is not equal to 0.

Resulting sample estimates show a mean of 26.100 for 2 mg/day vs. a mean of 19.735 for 1 mg/day, with p-value = 1.811e-05 and 95 percent confidence interval: 3.735613 8.994387.
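The confidence interval above comes from the pooled-variance (equal variance) form of the t-test. As a hedged sketch of the arithmetic behind it, the block below rebuilds the t statistic and the 95 percent interval by hand; the variable names are illustrative and not part of the original analysis.

```r
# Work on a copy so the session's ToothGrowth object is left untouched.
tg <- datasets::ToothGrowth
x <- tg$len[tg$dose == 2]   # 2 mg/day group
y <- tg$len[tg$dose == 1]   # 1 mg/day group
n_x <- length(x)
n_y <- length(y)

# Pooled variance: group variances weighted by their degrees of freedom.
sp2 <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)

# Standard error of the difference in means.
se <- sqrt(sp2 * (1 / n_x + 1 / n_y))

diff_means <- mean(x) - mean(y)
t_stat <- diff_means / se

# Two-sided 95% interval: estimate plus or minus t quantile times SE.
ci <- diff_means + c(-1, 1) * qt(0.975, df = n_x + n_y - 2) * se

c(t = t_stat, lower = ci[1], upper = ci[2])
```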

Assumptions and Conclusions

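  1. Guinea pigs were randomly assigned to supplement type and dose level
  2. The sample of guinea pigs is representative of the overall guinea pig population
  3. For the dose comparison, the two dose groups have equal variances (the test was run with var.equal = TRUE)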
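The dose comparison was run with var.equal = TRUE, which adds an equal-variance assumption to the list. One informal way to gauge how much the result depends on it, sketched here as an illustration rather than as part of the original analysis, is to compare the group standard deviations and rerun the test in its Welch form. The object names are hypothetical.

```r
# Work on a copy so the session's ToothGrowth object is left untouched.
tg <- datasets::ToothGrowth
x <- tg$len[tg$dose == 2]   # 2 mg/day group
y <- tg$len[tg$dose == 1]   # 1 mg/day group

# Sample standard deviations of the two dose groups.
group_sd <- c(sd_2mg = sd(x), sd_1mg = sd(y))

# Same comparison with and without the equal-variance assumption.
pooled <- t.test(x, y, var.equal = TRUE)
welch  <- t.test(x, y, var.equal = FALSE)

group_sd
c(pooled_p = pooled$p.value, welch_p = welch$p.value)
```

With equal group sizes the two forms give the same t statistic and differ only in their degrees of freedom, so the conclusion here is not sensitive to this assumption.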

When measuring tooth growth by delivery method (OJ vs. VC), the sample means are OJ: 20.66333 vs. VC: 16.96333, with p-value = 0.9697 and a one-sided 95 percent confidence interval of -Inf to 6.931731. Because the p-value is far above 5% and the confidence interval includes 0, we fail to reject the null hypothesis that delivery method does not influence tooth growth. Note that the alternative tested was that OJ yields less growth than VC; the observed difference points the other way (the OJ mean exceeds the VC mean by 3.7), so this test does not assess whether OJ yields greater growth.
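The boxplots suggested that the effect of delivery method may depend on dose, which a single comparison across all doses can mask. As a hedged sketch that goes beyond the original analysis, the same Welch test can be run separately within each dose level. Here a two-sided alternative is assumed, and the names tg and p_by_dose are introduced for illustration.

```r
# Work on a copy so the session's ToothGrowth object is left untouched.
tg <- datasets::ToothGrowth

# Split the data by dose and run a two-sided Welch t-test of
# len by supp within each dose level, keeping only the p-value.
p_by_dose <- sapply(split(tg, tg$dose), function(d) {
  t.test(len ~ supp, data = d)$p.value
})

# One p-value per dose level, named "0.5", "1" and "2".
p_by_dose
```

Running three tests raises the chance of a false positive, so a multiple-comparison adjustment such as p.adjust(p_by_dose, method = "bonferroni") may be worth considering.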

When measuring tooth growth by dosage (2 mg/day vs. 1 mg/day), the sample means are 26.100 for 2 mg/day vs. 19.735 for 1 mg/day, with p-value = 1.811e-05 and 95 percent confidence interval: 3.735613 8.994387. Because the p-value of this test is far below 0.05 and the confidence interval lies entirely above 0, we can reject the null hypothesis and conclude that a dosage change from 1 mg/day to 2 mg/day positively influences tooth length.