Tooth Growth Analysis using Confidence Intervals and Random Normal Distributions

Tooth Growth Data Analysis

Now in the second portion of the project, we’re going to analyze the ToothGrowth data in the R datasets package.

library(datasets)
library(ggplot2)

#Perform exploratory analysis of the dataset to better understand its contents
summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

unique(ToothGrowth$dose)

## [1] 0.5 1.0 2.0

Initial exploration shows that the dataset holds 60 observations covering two supplement types (OJ and VC) and three dose levels (0.5, 1.0, and 2.0). Dosage appears to be a factor in tooth length, but the relationship between supplement and length is less clear.

# Initial plot of ToothGrowth data
ggplot(aes(x=dose, y = len), data = ToothGrowth) + geom_point(aes(color = supp))

The points overlap heavily in the plot above, which makes any pattern difficult to read, so let’s compare the two supplements side by side at each dose.

# Tooth growth by supplement and dose
ggplot(aes(x=supp, y=len), data= ToothGrowth) + geom_boxplot(aes(fill=supp)) + facet_wrap(~dose)

It now appears that the lower doses show a large discrepancy between supplements, but at the highest dosage of 2.0 the supplements’ results are comparable.

Confidence Interval Testing

We can use confidence intervals and/or hypothesis tests to compare tooth growth by supplement and dose. We will start by calculating the mean and standard deviation of length for each combination.

library(plyr)
tooth_means <- ddply(ToothGrowth, .(dose, supp), summarize, mean=mean(len), sd=sd(len))
print(tooth_means)

##   dose supp  mean       sd
## 1  0.5   OJ 13.23 4.459709
## 2  0.5   VC  7.98 2.746634
## 3  1.0   OJ 22.70 3.910953
## 4  1.0   VC 16.77 2.515309
## 5  2.0   OJ 26.06 2.655058
## 6  2.0   VC 26.14 4.797731
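
The size of each group matters for any interval built from these means and standard deviations. As a quick sanity check, which is not part of the original analysis, a cross-tabulation with base R's `table()` shows how many observations fall into each dose and supplement combination. The variable name `group_sizes` is only illustrative.

```r
library(datasets)

# Cross-tabulate supplement by dose to count the observations in each group
group_sizes <- table(ToothGrowth$supp, ToothGrowth$dose)
print(group_sizes)

# TRUE if every dose/supplement combination has the same number of rows
all(group_sizes == group_sizes[1, 1])
```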

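Our dataset has 10 observations for each combination of dose and supplement, so we will compute a 95% confidence interval for the mean of each one using the t distribution with 9 degrees of freedom: mean ± qt(0.975, df=9) * sd / sqrt(10).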

OJ5error <- qt(0.975,df=9)*4.459709/sqrt(10)
OJ5left <- 13.23-OJ5error
OJ5right <- 13.23+OJ5error

VC5error <- qt(0.975,df=9)*2.746634/sqrt(10)
VC5left <- 7.98-VC5error
VC5right <- 7.98+VC5error

OJ1error <- qt(0.975,df=9)*3.910953/sqrt(10)
OJ1left <- 22.7-OJ1error
OJ1right <- 22.7+OJ1error

VC1error <- qt(0.975,df=9)*2.515309/sqrt(10)
VC1left <- 16.77-VC1error
VC1right <- 16.77+VC1error

OJ2error <- qt(0.975,df=9)*2.655058/sqrt(10)
OJ2left <- 26.06-OJ2error
OJ2right <- 26.06+OJ2error

VC2error <- qt(0.975,df=9)*4.797731/sqrt(10)
VC2left <- 26.14-VC2error
VC2right <- 26.14+VC2error
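
The six calculations above all apply the same formula, with the summary values typed in by hand. As a sketch of how the same bounds could be derived directly from the data, the helper `ci_t` below (a hypothetical name, not part of the original analysis) computes a t interval for one group, and `aggregate` applies it to every dose and supplement combination.

```r
library(datasets)

# Hypothetical helper: two-sided t confidence interval for the mean of x
ci_t <- function(x, level = 0.95) {
  n <- length(x)
  margin <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

# Apply the helper to tooth length within each dose/supplement group;
# the result has a matrix column `len` with `lower` and `upper` bounds
intervals <- aggregate(len ~ dose + supp, data = ToothGrowth, FUN = ci_t)
print(intervals)
```

For a single group, `t.test(x)$conf.int` should give the same bounds, which is a convenient cross-check on the helper.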

Now let’s simulate a normal distribution for each supplement at each dose by drawing a thousand random values with the respective mean and standard deviation, and mark the 95% confidence interval for the mean on each histogram.

# Fix an arbitrary seed so the simulated draws are reproducible
set.seed(1)
OJ_half <- rnorm(1000, 13.23, 4.459709)
VC_half <- rnorm(1000, 7.98, 2.746634)
OJ_1 <- rnorm(1000, 22.70, 3.910953)
VC_1 <- rnorm(1000, 16.77, 2.515309)
OJ_2 <- rnorm(1000, 26.06, 2.655058)
VC_2 <- rnorm(1000, 26.14, 4.797731)

par(mfrow=c(3, 2))
hist(OJ_half, col="red", breaks=40)
abline(v = OJ5left, col="green")
abline(v = OJ5right, col="green")

hist(VC_half, col="blue", breaks=40)
abline(v = VC5left, col="green")
abline(v = VC5right, col="green")

hist(OJ_1, col="red", breaks=40)
abline(v = OJ1left, col="green")
abline(v = OJ1right, col="green")

hist(VC_1, col="blue", breaks=40)
abline(v = VC1left, col="green")
abline(v = VC1right, col="green")

hist(OJ_2, col="red", breaks=40)
abline(v = OJ2left, col="green")
abline(v = OJ2right, col="green")

hist(VC_2, col="blue", breaks=40)
abline(v = VC2left, col="green")
abline(v = VC2right, col="green")

Because each set of a thousand values is drawn from a normal distribution, the histograms are roughly Gaussian, as expected. The two green lines on each plot mark the 95% confidence interval for the group mean: if the experiment were repeated many times, about 95% of intervals built this way would contain the true mean. These are intervals for the mean tooth length of each group, not ranges that 95% of individual test subjects would fall into. The intervals are as follows.
OJ 0.5: 10.04 to 16.42
VC 0.5: 6.02 to 9.94
OJ 1.0: 19.90 to 25.50
VC 1.0: 14.97 to 18.57
OJ 2.0: 24.16 to 27.96
VC 2.0: 22.71 to 29.57
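
One way to read these ranges is to check, dose by dose, whether the OJ and VC ranges overlap. The sketch below simply re-enters the bounds listed above into a data frame (the column names are illustrative) and compares them. Overlap is only an informal guide here and not a substitute for a formal two-sample test.

```r
# Interval bounds copied from the list above
ci <- data.frame(
  dose     = c(0.5, 1, 2),
  oj_lower = c(10.04, 19.90, 24.16),
  oj_upper = c(16.42, 25.50, 27.96),
  vc_lower = c(6.02, 14.97, 22.71),
  vc_upper = c(9.94, 18.57, 29.57)
)

# Two intervals overlap when each one starts before the other ends
ci$overlap <- ci$oj_lower <= ci$vc_upper & ci$vc_lower <= ci$oj_upper
print(ci[, c("dose", "overlap")])
```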

Conclusion

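We can see clearly that OJ and VC are most similar, and in fact nearly identical, at the dosage of 2.0, where the intervals overlap almost entirely. At dosages 0.5 and 1.0 the intervals do not overlap at all, with OJ showing the higher mean length in both cases. This still doesn’t tell us clearly whether the effects of dosage and supplement are independent of each other or whether they interact. Right now, the data indicates that a high dosage of either supplement is beneficial for tooth growth, whereas at low dosage supplement OJ is more beneficial. Our hypothesis then is that the supplement does not impact tooth growth but the dosage does, although the separation between OJ and VC at the two lower dosages is evidence against the first part of that hypothesis and would need a formal test.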
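
The confidence interval section mentions hypothesis tests as another way to make this comparison. A minimal sketch of that approach, assuming a Welch two-sample t-test (unpaired, unequal variances) is appropriate for these groups, compares OJ and VC separately at each dose. The choice of test is an assumption made here for illustration, not something the analysis above specifies.

```r
library(datasets)

# Welch two-sample t-test of len by supplement, run separately for each dose
doses <- sort(unique(ToothGrowth$dose))
p_values <- sapply(doses, function(d) {
  t.test(len ~ supp, data = subset(ToothGrowth, dose == d))$p.value
})
names(p_values) <- doses
print(round(p_values, 4))
```

A small p-value at a given dose would support a difference between supplements at that dose, while a large one would be consistent with the comparable results seen at 2.0.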

Assumptions:

The test population was randomly selected and independent.
The test subjects were all of similar health and would otherwise have experienced similar tooth growth.
Measurements were unbiased and performed randomly.

Denver Durham

11/17/2016