Statistical Inference Course Project Part 2

Title

Examining Tooth Growth Data in R

Overview

One of the standard learning data sets included in R is the “ToothGrowth” data set. It records the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three vitamin C dose levels (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (VC). That gives 10 animals for each combination of dose and delivery method.

The data set contains 60 observations of 3 variables:

len : Tooth length
supp : Supplement type (VC or OJ)
dose : Dose in milligrams per day
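
As a quick check of this structure, the following minimal sketch (base R only; the name cell_counts is just an illustrative choice) confirms the dimensions and counts the observations in each supplement-by-dose cell:

```r
# Load the built-in data set and confirm its shape
data(ToothGrowth)
dim(ToothGrowth)   # number of rows, then number of columns
str(ToothGrowth)   # column types for len, supp, and dose

# Count observations in each supplement-by-dose cell
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)
```

If the design is balanced as described, every cell of the table should hold the same count.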

Procedure

We set out to do the following:

Load the ToothGrowth data and perform some basic exploratory data analyses
Provide a basic summary of the data.
Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)
State your conclusions and the assumptions needed for your conclusions.

Load the ToothGrowth data and perform some basic exploratory data analyses

Let’s first request a summary of the data:

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

The summary shows the minimum and maximum tooth length, the minimum and maximum dosage, and that half of the cases received vitamin C via the VC method and the other half via the OJ method.

Let’s plot tooth length as a function of dosage, color-coded by the type of supplement used.

From the graph it appears that as dosage increases, tooth length also increases. It also appears that for the lower dosages (0.5 and 1.0 mg) OJ delivery leads to more tooth growth than VC delivery.

We need to use confidence intervals to investigate whether that is actually true.

Provide a basic summary of the data

In this summary we show the mean length (lenmean), the standard deviation of length (lensd), and the number of observations (count). We will summarize the data three ways.

First, let’s examine by dosage and supplement type.

## Source: local data frame [6 x 5]
## Groups: supp
## 
##   supp dose lenmean    lensd count
## 1   OJ  0.5   13.23 4.459709    10
## 2   OJ  1.0   22.70 3.910953    10
## 3   OJ  2.0   26.06 2.655058    10
## 4   VC  0.5    7.98 2.746634    10
## 5   VC  1.0   16.77 2.515309    10
## 6   VC  2.0   26.14 4.797731    10
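
The same per-cell summary can also be produced without dplyr. The sketch below is one base-R alternative, not the code used for this report; the name cell_summary and the flattening step with do.call are illustrative choices.

```r
data(ToothGrowth)

# Mean, standard deviation, and count of len for each supp/dose cell
cell_summary <- aggregate(
  len ~ supp + dose,
  data = ToothGrowth,
  FUN = function(x) c(lenmean = mean(x), lensd = sd(x), count = length(x))
)

# aggregate() stores the three statistics in one matrix column;
# flatten it into ordinary columns (len.lenmean, len.lensd, len.count)
cell_summary <- do.call(data.frame, cell_summary)
print(cell_summary)
```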

Second, let’s summarize by supplement type only (ignoring dosage).

## Source: local data frame [2 x 4]
## 
##   supp  lenmean    lensd count
## 1   OJ 20.66333 6.605561    30
## 2   VC 16.96333 8.266029    30

Third, let’s summarize by dosage only (ignoring supplement type).

## Source: local data frame [3 x 4]
## 
##   dose lenmean    lensd count
## 1  0.5  10.605 4.499763    20
## 2  1.0  19.735 4.415436    20
## 3  2.0  26.100 3.774150    20

It looks like a higher vitamin C dosage goes with more tooth growth in general. OJ also shows a higher mean length than VC, but these summaries alone do not tell us whether that difference is significant.

Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

For all confidence intervals, we will use a 95% confidence level.

First, let’s see whether there is a difference between OJ and VC across all dosage levels at once.

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

When all the data are lumped together, we cannot conclude that the two delivery methods differ: the 95% confidence interval for the difference in means includes 0, so we fail to reject the null hypothesis. The p-value here is 0.06, so it was very close to being significant. But close only counts in horseshoes and hand grenades.
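
To make clear where this interval comes from, the sketch below rebuilds it by hand from the group means and variances, using the standard Welch-Satterthwaite formula for the degrees of freedom. The variable names are illustrative and not part of the original analysis.

```r
data(ToothGrowth)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Variance of each sample mean (s^2 / n)
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Point estimate and standard error of the difference in means
diff_means <- mean(oj) - mean(vc)
se <- sqrt(v_oj + v_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# 95% confidence interval: estimate +/- t quantile * standard error
ci_manual <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se
print(df_welch)
print(ci_manual)
```

The printed values should agree with the df and confidence interval in the t.test output above.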

Let’s subdivide the data and see if there’s a difference between OJ and VC at different dosage levels. We will see if there is a difference at the 0.5, 1.0, and 2.0 mg levels.
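
The three per-dose comparisons that follow could also be run in one pass. This is a minimal sketch assuming base R only; the names dose_groups and supp_by_dose are hypothetical and do not appear in the appendix.

```r
data(ToothGrowth)

# One data frame per dose level, named "0.5", "1", and "2"
dose_groups <- split(ToothGrowth, ToothGrowth$dose)

# Welch t-test of len by supp within each dose; keep the interval and p-value
supp_by_dose <- t(sapply(dose_groups, function(d) {
  tt <- t.test(len ~ supp, data = d, var.equal = FALSE)
  c(lower = tt$conf.int[1], upper = tt$conf.int[2], p.value = tt$p.value)
}))
print(round(supp_by_dose, 4))
```

Each row corresponds to one dose level, so the intervals can be read side by side.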

At the 0.5 mg dosage is there a difference between VC and OJ?

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

Yes, there is! The confidence interval does not include zero, so we can conclude that there is a significant difference between VC and OJ supplement methods at the 0.5 mg dosage.

At the 1.0 mg dosage is there a difference between VC and OJ?

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

Yes, there is! The confidence interval does not include zero, so we can conclude that there is a significant difference between VC and OJ supplement methods at the 1.0 mg dosage.

At the 2.0 mg dosage is there a difference between VC and OJ?

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

No, there is not! The confidence interval includes zero, so we find no significant difference between the VC and OJ supplement methods at the 2.0 mg dosage.

Now let’s see whether there is a significant difference between dosage levels. We will examine OJ and VC separately.
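
The four comparisons that follow share one pattern: subset to one supplement and two adjacent doses, then run a Welch t-test of len by dose. A small helper makes that pattern explicit. The function compare_doses and the table dose_results are hypothetical names introduced only for this sketch.

```r
data(ToothGrowth)

# Welch t-test between two dose levels within one supplement type
compare_doses <- function(supp_type, low, high) {
  d <- ToothGrowth[ToothGrowth$supp == supp_type &
                     ToothGrowth$dose %in% c(low, high), ]
  tt <- t.test(len ~ dose, data = d, var.equal = FALSE)
  c(lower = tt$conf.int[1], upper = tt$conf.int[2], p.value = tt$p.value)
}

# Adjacent dose pairs for each supplement
dose_results <- rbind(
  "OJ 0.5 vs 1" = compare_doses("OJ", 0.5, 1),
  "OJ 1 vs 2"   = compare_doses("OJ", 1, 2),
  "VC 0.5 vs 1" = compare_doses("VC", 0.5, 1),
  "VC 1 vs 2"   = compare_doses("VC", 1, 2)
)
print(signif(dose_results, 4))
```

Because the lower dose is the first group, an interval that lies entirely below zero means the higher dose has the larger mean length.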

We will test whether a 0.5 mg dose via OJ is significantly different from a 1.0 mg dose via OJ.

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.415634  -5.524366
## sample estimates:
## mean in group 0.5   mean in group 1 
##             13.23             22.70

The confidence interval does not include zero, so we conclude there is a significant difference between a 0.5 mg dose and a 1.0 mg dose via OJ.

We will test whether a 1.0 mg dose via OJ is significantly different from a 2.0 mg dose via OJ.

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.5314425 -0.1885575
## sample estimates:
## mean in group 1 mean in group 2 
##           22.70           26.06

The confidence interval does not include zero, so we conclude there is a significant difference between a 1.0 mg dose and a 2.0 mg dose via OJ.

We will test whether a 0.5 mg dose via VC is significantly different from a 1.0 mg dose via VC.

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.265712  -6.314288
## sample estimates:
## mean in group 0.5   mean in group 1 
##              7.98             16.77

The confidence interval does not include zero, so we conclude there is a significant difference between a 0.5 mg dose and a 1.0 mg dose via VC.

We will test whether a 1.0 mg dose via VC is significantly different from a 2.0 mg dose via VC.

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.054267  -5.685733
## sample estimates:
## mean in group 1 mean in group 2 
##           16.77           26.14

The confidence interval does not include zero, so we conclude there is a significant difference between a 1.0 mg dose and a 2.0 mg dose via VC.

Conclusions and Assumptions

Based on this data we can conclude the following:

As dosage increases, tooth length increases regardless of supplement method.
At the 0.5 mg and 1.0 mg dosage the OJ supplement method leads to more tooth growth than the VC method.
At the 2.0 mg dosage, there is no significant difference between the OJ and VC supplement methods.

Assumptions

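We assume that the measurements are not paired.
We do not assume that the variances are equal (var.equal=FALSE).
We assume the groups are independent, meaning there was no crossover of subjects between dosages or supplement types.
We assume that the guinea pigs were assigned to groups at random, so that no confounding factors influence the results.
We assume that tooth length within each group is roughly normally distributed, which the t intervals rely on with only 10 observations per group.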
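
To see what the unequal-variance choice changes in practice, the sketch below runs the overall OJ versus VC comparison both ways. It is an illustration only; the pooled test is not part of the analysis above, and the names welch, pooled, and variance_check are hypothetical.

```r
data(ToothGrowth)

# Same comparison with and without the equal-variance assumption
welch  <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)
pooled <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

# Side-by-side view of the test statistic, degrees of freedom, and interval
variance_check <- rbind(
  welch  = c(t = unname(welch$statistic), df = unname(welch$parameter),
             lower = welch$conf.int[1], upper = welch$conf.int[2]),
  pooled = c(t = unname(pooled$statistic), df = unname(pooled$parameter),
             lower = pooled$conf.int[1], upper = pooled$conf.int[2])
)
print(variance_check)
```

Because the two groups are the same size, both versions give the same t statistic; only the degrees of freedom, and therefore the interval width, differ slightly.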

Appendix of Code Run

# Load package for plotting, and for data analysis
library(ggplot2)
library(dplyr)
# Load data
data(ToothGrowth)
# Get data summary
summary(ToothGrowth)

# Plot the length (y) by the dosage (x)
g <- ggplot(ToothGrowth, aes(x= dose, y= len)) +
    geom_point(aes(color=supp))
print(g)

# Summarize by dose and supp, the mean length of growth.
a <- ToothGrowth %>% 
    group_by(supp,dose) %>%
    summarize(lenmean=mean(len), lensd=sd(len), count = n())
print(a)

# Summarize by supp only.
b <- ToothGrowth %>% 
    group_by(supp) %>%
    summarize(lenmean=mean(len), lensd=sd(len), count = n())
print(b)

# Summarize by dose only (named by_dose so it does not mask base::c).
by_dose <- ToothGrowth %>% 
    group_by(dose) %>%
    summarize(lenmean=mean(len), lensd=sd(len), count = n())
print(by_dose)

# Note: paired is not passed below. The formula interface already treats the
# two groups as independent samples, and recent R releases reject the paired
# argument in formula calls.

# Compare OJ to VC at all dosage levels
t.test(len ~ supp, var.equal=FALSE, data=ToothGrowth)

# Compare low dosage OJ and VC
lowdose <- ToothGrowth[ToothGrowth$dose==0.5, ]
t.test(len ~ supp, var.equal=FALSE, data=lowdose)

# Compare mid dosage OJ and VC
middose <- ToothGrowth[ToothGrowth$dose==1.0, ]
t.test(len ~ supp, var.equal=FALSE, data=middose)

# Compare high dosage OJ and VC
highdose <- ToothGrowth[ToothGrowth$dose==2.0, ]
t.test(len ~ supp, var.equal=FALSE, data=highdose)

# Compare 0.5 to 1.0 via OJ
OJlowtomid <- filter(ToothGrowth, dose < 2, supp=="OJ")
t.test(len ~ dose, var.equal=FALSE, data=OJlowtomid)

# Compare 1.0 to 2.0 via OJ
OJmidtohigh <- filter(ToothGrowth, dose > 0.5, supp=="OJ")
t.test(len ~ dose, var.equal=FALSE, data=OJmidtohigh)

# Compare 0.5 to 1.0 via VC
VClowtomid <- filter(ToothGrowth, dose < 2, supp=="VC")
t.test(len ~ dose, var.equal=FALSE, data=VClowtomid)

# Compare 1.0 to 2.0 via VC
VCmidtohigh <- filter(ToothGrowth, dose > 0.5, supp=="VC")
t.test(len ~ dose, var.equal=FALSE, data=VCmidtohigh)