Exploratory Data Analysis Utilizing ToothGrowth Data Set in R

Data Analysis Series: Statistical Inference

Courtney D. Shelley

August 22, 2014

Purpose

The purpose of this assignment is to demonstrate an exploratory data analysis suitable for presentation. The data come from the ToothGrowth dataset in R's datasets package.

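library(datasets)           # ToothGrowth ships with R's datasets package
data <- ToothGrowth         # working copy used throughout
observations <- nrow(data)  # 60
variables <- ncol(data)     # 3
var.names <- names(data)    # "len" "supp" "dose" (avoids masking base::names)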

Description

The effect of vitamin C on tooth growth in guinea pigs was studied using 60 guinea pigs, each receiving one of three dose levels of vitamin C (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (VC). The resulting dataset consists of 60 observations on 3 variables: tooth (odontoblast) length (len), supplement type (supp: VC or OJ), and dose in milligrams per day (dose).
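As a quick check of the design described above, the sketch below cross-tabulates supplement type against dose; if the layout is balanced as described, every cell should hold 10 observations. It uses only base R (`str()`, `table()`); the object name `cells` is illustrative.

```r
library(datasets)  # ToothGrowth ships with R's datasets package

str(ToothGrowth)   # 60 obs. of 3 variables: len (num), supp (factor), dose (num)

# Count observations in each supplement x dose cell
cells <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cells)
```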

Exploratory Analysis

low  <- subset(data, dose == 0.5)   # 0.5 mg doses
med  <- subset(data, dose == 1.0)   # 1.0 mg doses
high <- subset(data, dose == 2.0)   # 2.0 mg doses

VC <- subset(data, supp == "VC")    # ascorbic acid
OJ <- subset(data, supp == "OJ")    # orange juice


c(min(low$len), max(low$len), mean(low$len))    # Low dosage min, max, mean
## [1]  4.20 21.50 10.61
c(min(med$len), max(med$len), mean(med$len))    # Med dosage min, max, mean
## [1] 13.60 27.30 19.73
c(min(high$len), max(high$len), mean(high$len)) # High dosage min, max, mean
## [1] 18.5 33.9 26.1
c(min(VC$len), max(VC$len), mean(VC$len))       # Ascorbic acid (VC) min, max, mean
## [1]  4.20 33.90 16.96
c(min(OJ$len), max(OJ$len), mean(OJ$len))       # Orange juice (OJ) min, max, mean
## [1]  8.20 30.90 20.66
t.test(VC$len, OJ$len, paired = FALSE)$conf.int   # 95% CI for mean(VC) - mean(OJ)
## [1] -7.571  0.171
## attr(,"conf.level")
## [1] 0.95
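The five min/max/mean summary lines above can also be produced with one call per grouping variable. A minimal sketch using base R's `aggregate()` with a formula; the helper `summ` and the objects `by.dose` and `by.supp` are hypothetical names introduced here.

```r
library(datasets)

# Hypothetical helper: min, max and mean of a numeric vector
summ <- function(x) c(min = min(x), max = max(x), mean = mean(x))

# One row per dose level, and one row per supplement type
by.dose <- aggregate(len ~ dose, data = ToothGrowth, FUN = summ)
by.supp <- aggregate(len ~ supp, data = ToothGrowth, FUN = summ)
print(by.dose)
print(by.supp)
```

Because `summ` returns a named vector, the `len` column of each result is a matrix, so individual statistics can be pulled out as, for example, `by.dose$len[, "mean"]`.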
par(mfrow = c(1, 2))
plot(density(low$len), col = "blue", xlim = c(0, 40), ylim = c(0, 0.15), lwd = 2,
     main = "Dosage Amount", xlab = "Tooth Length (mm)")
lines(density(med$len), col = "red", lwd = 2)
lines(density(high$len), col = "green", lwd = 2)
legend("topright", lwd = 2, col = c("blue", "red", "green"),
       legend = c("Low (0.5 mg)", "Med (1.0 mg)", "High (2.0 mg)"))

plot(density(OJ$len), col = "blue", lwd = 2, main = "Delivery Method",
     xlab = "Tooth Length (mm)")
lines(density(VC$len), col = "red", lwd = 2)
legend("topright", lwd = 2, col = c("blue", "red"),
       legend = c("Orange Juice", "Ascorbic Acid"))

Figure: density of tooth length by dosage amount (left) and by delivery method (right).

A difference in tooth growth appears between the two delivery methods of vitamin C. Guinea pigs receiving orange juice had tooth lengths ranging from 8.20 mm to 30.90 mm, with a mean of 20.66 mm. Guinea pigs receiving ascorbic acid had tooth lengths ranging from 4.20 mm to 33.90 mm, with a mean of 16.96 mm. However, a two-sample (Welch) t-test does not support a true difference in means between the two groups: the 95% confidence interval for the difference (VC minus OJ), -7.571 to 0.171 mm, spans 0. We therefore look more closely at differences between dosage levels.
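The interval quoted above can be checked directly. The sketch below (base R only) runs the same two-sample comparison through the formula interface of `t.test()`, whose default `var.equal = FALSE` gives Welch's test. Note that the formula interface subtracts in factor-level order (OJ minus VC), so its interval has the opposite sign to the `t.test(VC$len, OJ$len)` call. The names `tt`, `ci` and `covers.zero` are illustrative.

```r
library(datasets)

# Welch two-sample t-test of tooth length by supplement (OJ minus VC)
tt <- t.test(len ~ supp, data = ToothGrowth)

ci <- tt$conf.int                      # 95% CI for the difference in means
covers.zero <- ci[1] < 0 && ci[2] > 0  # does the interval contain 0?
print(ci)
print(tt$p.value)                      # two-sided p-value
print(covers.zero)
```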

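# Two-way ANOVA with interaction; columns: 1 = len, 2 = supp, 3 = dose
# (named tg.aov to avoid masking the stats::anova function)
tg.aov <- aov(data[,1] ~ as.factor(data[,2]) * as.factor(data[,3]))
summary(tg.aov)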
##                                           Df Sum Sq Mean Sq F value
## as.factor(data[, 2])                       1    205     205   15.57
## as.factor(data[, 3])                       2   2426    1213   92.00
## as.factor(data[, 2]):as.factor(data[, 3])  2    108      54    4.11
## Residuals                                 54    712      13        
##                                            Pr(>F)    
## as.factor(data[, 2])                      0.00023 ***
## as.factor(data[, 3])                      < 2e-16 ***
## as.factor(data[, 2]):as.factor(data[, 3]) 0.02186 *  
## Residuals                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow = c(1, 2))
lowVC  <- subset(VC, dose == 0.5)
medVC  <- subset(VC, dose == 1.0)
highVC <- subset(VC, dose == 2.0)
plot(density(highVC$len), col = "blue", lwd = 2, xlim = c(0, 40), ylim = c(0, 0.25),
     main = "Ascorbic Acid Dosage Amount", xlab = "Tooth Length (mm)")
lines(density(medVC$len), col = "red", lwd = 2)
lines(density(lowVC$len), col = "green", lwd = 2)
legend("topright", lwd = 2, col = c("blue", "red", "green"),
       legend = c("High (2.0 mg)", "Med (1.0 mg)", "Low (0.5 mg)"))

lowOJ  <- subset(OJ, dose == 0.5)
medOJ  <- subset(OJ, dose == 1.0)
highOJ <- subset(OJ, dose == 2.0)
plot(density(highOJ$len), col = "blue", lwd = 2, xlim = c(0, 40), ylim = c(0, 0.25),
     main = "Orange Juice Dosage Amount", xlab = "Tooth Length (mm)")
lines(density(medOJ$len), col = "red", lwd = 2)
lines(density(lowOJ$len), col = "green", lwd = 2)
legend("topright", lwd = 2, col = c("blue", "red", "green"),
       legend = c("High (2.0 mg)", "Med (1.0 mg)", "Low (0.5 mg)"))

Figure: density of tooth length by dosage level for ascorbic acid (left) and orange juice (right).

Analysis of variance shows that supplement type significantly predicts tooth length (p = 0.00023), as does dosage amount (p < 2e-16). The interaction between supplement type and dosage amount is also significant at the 5% level, though less strongly (p = 0.0219).
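The same two-way model can be written more readably by naming the columns in a formula. The sketch below (base R; the names `fit`, `tab`, `p` and `cell.means` are illustrative) refits the model with `factor(dose)`, pulls the p-values out of the `summary()` table instead of reading them off the printout, and tabulates the six supplement-by-dose cell means, which is where any interaction becomes visible.

```r
library(datasets)

# Two-way ANOVA with interaction; dose treated as a 3-level factor
fit <- aov(len ~ supp * factor(dose), data = ToothGrowth)

tab <- summary(fit)[[1]]   # ANOVA table as a data frame
p <- tab[["Pr(>F)"]]       # supp, dose, interaction, then NA for residuals
print(signif(p, 3))

# Mean tooth length in each supplement x dose cell
cell.means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
print(round(cell.means, 2))
```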

Null Hypothesis Significance Testing to Verify Analysis of Variance

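To check these ANOVA conclusions, tooth length was regressed on dose separately within each supplement type and the slope was tested against zero; 95% confidence intervals were then computed for the difference in mean tooth length between each pair of dose levels.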
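The two per-supplement regressions can also be run in one pass over the supplement groups. This sketch (base R; `fits` and `slope.test` are illustrative names) fits `len ~ dose` within each supplement and computes the two-sided p-value for the slope as 2 * P(T > |t|) using the model's residual degrees of freedom, which should reproduce the `Pr(>|t|)` column of `summary()`.

```r
library(datasets)

# Fit a separate linear trend in dose for each supplement type
fits <- lapply(split(ToothGrowth, ToothGrowth$supp),
               function(d) lm(len ~ dose, data = d))

# Slope estimate, t statistic and two-sided p-value for each fit
slope.test <- t(sapply(fits, function(m) {
  co <- summary(m)$coefficients["dose", ]
  t.obs <- co[["t value"]]
  c(slope     = co[["Estimate"]],
    t         = t.obs,
    p.manual  = 2 * pt(-abs(t.obs), df = df.residual(m)),
    p.summary = co[["Pr(>|t|)"]])
}))
print(signif(slope.test, 4))
```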

VC.lm<-lm(VC$len~VC$dose)
summary(VC.lm)
## 
## Call:
## lm(formula = VC$len ~ VC$dose)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.226 -2.603  0.081  2.229  7.489 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     3.29       1.43    2.31    0.029 *  
## VC$dose        11.72       1.08   10.86  1.5e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.68 on 28 degrees of freedom
## Multiple R-squared:  0.808,  Adjusted R-squared:  0.801 
## F-statistic:  118 on 1 and 28 DF,  p-value: 1.51e-11
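t.obs <- coef(summary(VC.lm))[2, "t value"]     # slope t statistic (10.86 in the table above)
t.star <- qt(0.975, df = 28)                     # two-sided 5% critical value
if (abs(t.obs) > t.star) {
  print("Conclude H_a: Significant dosage level effect")
} else {
  print("Fail to reject H_0: no dosage level effect")
}
## [1] "Conclude H_a: Significant dosage level effect"
p.value <- 2 * pt(-abs(t.obs), df = 28)          # two-sided p-value, matches Pr(>|t|)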
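The decision rule coded above is the usual t-test on a regression slope, with n = 30 guinea pigs per supplement and hence n - 2 = 28 residual degrees of freedom. Written out with the ascorbic acid numbers from the `summary(VC.lm)` output (rounded as printed), the arithmetic is:

```latex
t = \frac{\hat\beta_1}{\operatorname{SE}(\hat\beta_1)} = \frac{11.72}{1.08} \approx 10.85,
\qquad
t^{*} = t_{0.975,\,n-2} = t_{0.975,\,28} \approx 2.048,
\qquad
p = 2\,P\left(T_{28} > |t|\right) \approx 1.5 \times 10^{-11}.
```

Since |t| > t*, H_0: beta_1 = 0 is rejected; the small gap between 10.85 and the printed 10.86 comes from rounding the estimate and its standard error.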

OJ.lm<-lm(OJ$len~OJ$dose)
summary(OJ.lm)
## 
## Call:
## lm(formula = OJ$len ~ OJ$dose)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.256 -3.798 -0.064  3.352  7.939 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    11.55       1.72    6.71  2.8e-07 ***
## OJ$dose         7.81       1.30    6.00  1.8e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.45 on 28 degrees of freedom
## Multiple R-squared:  0.563,  Adjusted R-squared:  0.547 
## F-statistic:   36 on 1 and 28 DF,  p-value: 1.82e-06
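t.obs <- coef(summary(OJ.lm))[2, "t value"]     # slope t statistic (6.00 in the table above)
t.star <- qt(0.975, df = 28)                     # two-sided 5% critical value
if (abs(t.obs) > t.star) {
  print("Conclude H_a: Significant dosage level effect")
} else {
  print("Fail to reject H_0: no dosage level effect")
}
## [1] "Conclude H_a: Significant dosage level effect"
p.value <- 2 * pt(-abs(t.obs), df = 28)          # two-sided p-value, matches Pr(>|t|)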
t.test(low$len, med$len, paired = FALSE)$conf.int    # 95% CI for mean(low) - mean(med)
## [1] -11.984  -6.276
## attr(,"conf.level")
## [1] 0.95
t.test(low$len, high$len, paired = FALSE)$conf.int   # 95% CI for mean(low) - mean(high)
## [1] -18.16 -12.83
## attr(,"conf.level")
## [1] 0.95
t.test(med$len, high$len, paired = FALSE)$conf.int   # 95% CI for mean(med) - mean(high)
## [1] -8.996 -3.734
## attr(,"conf.level")
## [1] 0.95
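The three intervals above are each computed at 95% confidence, with no allowance for making three comparisons. As a complement that is not part of the original analysis, base R's `pairwise.t.test()` runs the same unpooled comparisons between dose levels and applies a Holm correction to the p-values; the object name `pw` is illustrative.

```r
library(datasets)

# All pairwise dose comparisons with unpooled SDs (Welch-type t-tests),
# p-values adjusted for multiple comparisons by Holm's method
pw <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                      pool.sd = FALSE, p.adjust.method = "holm")
print(pw)
```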

Conclusions and Assumptions

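Although the ANOVA indicates that tooth growth differs significantly by treatment type (orange juice vs. ascorbic acid), a simple two-sample t-test that ignores dose does not support this conclusion at the 5% level. Separating the data by dosage amount and retesting does show significant differences in tooth growth attributable to dosage amount. These results assume independent observations and approximately normal tooth lengths within each group; the ANOVA and regressions also assume equal variances, which the Welch t-tests do not. Reanalysis that emphasizes dosage amount over treatment type is recommended.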
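One way to make this recommendation concrete, sketched here as an assumption about what favoring dosage could mean rather than as the author's method, is to compare how much of the variation in tooth length each factor explains on its own. The sketch fits two one-factor linear models in base R and compares their R-squared values; the names `fit.dose`, `fit.supp` and `r2` are illustrative.

```r
library(datasets)

# One-factor models: dose alone (as a 3-level factor) vs. supplement alone
fit.dose <- lm(len ~ factor(dose), data = ToothGrowth)
fit.supp <- lm(len ~ supp, data = ToothGrowth)

# Share of the variation in tooth length explained by each factor on its own
r2 <- c(dose = summary(fit.dose)$r.squared,
        supp = summary(fit.supp)$r.squared)
print(round(r2, 3))
```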