Statistical Inference Part 2: Data Analysis for ToothGrowth

Synopsis

This project analyzes the ToothGrowth data in the R datasets package to explore, and test statistically, the relationship between tooth growth and both supplement type and dose. It proceeds in the following steps:

1. Load the ToothGrowth data and perform some basic exploratory data analyses.
2. Provide a basic summary of the data.
3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.
4. State conclusions and assumptions.

1. Load the ToothGrowth data and perform some basic exploratory data analyses.

According to the R documentation for ToothGrowth (via help(ToothGrowth)), the dataset records the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs: 10 animals at each of three dose levels of vitamin C (0.5, 1, and 2 mg) for each of two delivery methods (OJ for orange juice, VC for ascorbic acid). Below is a summary of the data:

data(ToothGrowth)  # ToothGrowth dataset
summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
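
To complement the summary, the design can be checked with a quick cross-tabulation. The sketch below is an illustrative addition rather than part of the original analysis; it uses only base R and the datasets package, and the name cell_counts is chosen for this example. It shows the column types and counts the observations in each supplement-by-dose cell.

```r
# Illustrative check of the data structure and design balance
data(ToothGrowth)

# Column types: len (numeric), supp (factor), dose (numeric)
str(ToothGrowth)

# Number of observations for each supplement/dose combination
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)
```

If every cell holds the same number of observations, the design is balanced, which makes the group comparisons in the later sections easier to interpret.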

2. Provide a basic summary of the data.

In Figure 1, the boxplot suggests that orange juice is generally more effective for tooth growth than ascorbic acid. The conditional plot (below the boxplot) supports this: the average tooth length by dosage (red line) is greater for orange juice at doses of 0.5 mg and 1 mg of vitamin C. At 2 mg of vitamin C, the average tooth length is about the same for orange juice and ascorbic acid.

Hence, at this exploratory stage, tooth growth appears to increase with dosage. Orange juice seems more effective than ascorbic acid at the lower doses (0.5 mg and 1 mg; there are no test data for 1.5 mg), and 2 mg may be close to the maximum effective dose for both supplements.
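
The visual impression can be backed by the group means themselves. The following is a minimal sketch, added for illustration and not taken from the original analysis; the names cell_means and mean_gap are assumptions of this example. It computes the mean tooth length for each supplement and dose, then the difference between the two supplements at each dose.

```r
# Mean tooth length for each supplement/dose combination
data(ToothGrowth)
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)

# Difference in mean length (OJ minus VC) at each dose.
# aggregate() returns rows ordered by dose within each supplement,
# so both vectors below are in the order 0.5, 1, 2.
oj <- cell_means$len[cell_means$supp == "OJ"]
vc <- cell_means$len[cell_means$supp == "VC"]
mean_gap <- setNames(oj - vc, sort(unique(ToothGrowth$dose)))
print(round(mean_gap, 2))
```

A positive gap means orange juice has the larger mean at that dose; a gap near zero at 2 mg would be consistent with the reading of the plots above.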

3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

This section uses 95% confidence intervals and hypothesis tests (Reference A.1), computed with two-sample t-tests, to examine how tooth growth is affected by:
(a) Dosage Alone
(b) Supplement Alone
(c) Supplement and Dosage Combined

# Copy the data and give the columns descriptive names
df <- data.frame(ToothGrowth)
names(df) <- c("Length", "Supplement", "Dose")

# Add 'Dosage': 0.5mg as Small Dosage (SD), 1mg as Medium Dosage (MD), 2mg as Large Dosage (LD).
# Store it as a column of df so that subset() finds it in the data frame itself.
df$Dosage <- factor(df$Dose, levels = c(0.5, 1, 2), labels = c("SD", "MD", "LD"))

Test (a.1): Impact by Dosage Alone - Increase from 0.5mg to 1mg

set1<-subset(df,Dosage=='SD')$Length
set2<-subset(df,Dosage=='MD')$Length
t<-t.test(set1,set2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]

## [1] -11.983781  -6.276219

Result: The 95% confidence interval for the difference in mean tooth length (0.5 mg minus 1 mg) lies entirely below zero. Hence, we reject the null hypothesis that the mean tooth length is the same at both doses: increasing the dosage from 0.5 mg to 1 mg is associated with greater tooth length.

Test (a.2): Impact by Dosage Alone - Increase from 1mg to 2mg

set1<-subset(df,Dosage=='MD')$Length
set2<-subset(df,Dosage=='LD')$Length
t<-t.test(set1,set2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]

## [1] -8.996481 -3.733519

Result: The 95% confidence interval for the difference in mean tooth length (1 mg minus 2 mg) lies entirely below zero. Hence, we reject the null hypothesis that the mean tooth length is the same at both doses. Taken together, the two dosage-alone tests indicate that a higher dosage is associated with greater tooth length.
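
Because both dosage tests ask whether a higher dose increases tooth length, a one-sided alternative is also a natural choice. The sketch below is an illustrative variation, not the original analysis: it reruns the same Welch t-tests with alternative = "less", which tests whether the mean length at the lower dose is smaller than at the higher dose. The variable names are chosen for this example.

```r
# One-sided Welch t-tests for the dosage-alone comparisons
data(ToothGrowth)
len_small  <- ToothGrowth$len[ToothGrowth$dose == 0.5]
len_medium <- ToothGrowth$len[ToothGrowth$dose == 1]
len_large  <- ToothGrowth$len[ToothGrowth$dose == 2]

# H0: the lower-dose mean is not smaller; H1: the lower-dose mean is smaller
small_vs_medium <- t.test(len_small, len_medium, alternative = "less",
                          paired = FALSE, var.equal = FALSE)
medium_vs_large <- t.test(len_medium, len_large, alternative = "less",
                          paired = FALSE, var.equal = FALSE)

print(c(small_vs_medium = small_vs_medium$p.value,
        medium_vs_large = medium_vs_large$p.value))
```

A p-value below 0.05 in either comparison supports the one-sided claim that the higher dose yields longer teeth on average.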

Test (b.1): Impact by Supplement Alone - OJ vs VC

set1<-subset(df,Supplement=='VC')$Length
set2<-subset(df,Supplement=='OJ')$Length
t<-t.test(set1,set2,paired=FALSE,var.equal=FALSE)
t$p.value

## [1] 0.06063451

t$conf.int[1:2]

## [1] -7.5710156  0.1710156

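Result: In this test, the p-value (0.061) is greater than 0.05 and the 95% confidence interval contains zero. Hence, we fail to reject the null hypothesis of no difference in mean tooth length between the two supplements. At the 5% level, the evidence is not sufficient to conclude that the type of supplement alone affects tooth growth.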

Test (c.1): Impact by Supplement and Dosage - Small Dosage of VC vs OJ

set1<-subset(df,Supplement=='VC' & Dosage=='SD')$Length
set2<-subset(df,Supplement=='OJ' & Dosage=='SD')$Length
t<-t.test(set1,set2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]

## [1] -8.780943 -1.719057

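Result: The 95% confidence interval for the difference in means (VC minus OJ) lies entirely below zero. Hence, we reject the null hypothesis that the two supplements give the same mean tooth length at the small dosage: at 0.5 mg, orange juice is associated with greater tooth length than ascorbic acid.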

Test (c.2): Impact by Supplement and Dosage - Medium Dosage of VC vs OJ

set1<-subset(df,Supplement=='VC' & Dosage=='MD')$Length
set2<-subset(df,Supplement=='OJ' & Dosage=='MD')$Length
t<-t.test(set1,set2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]

## [1] -9.057852 -2.802148

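Result: The 95% confidence interval for the difference in means (VC minus OJ) lies entirely below zero. Hence, we reject the null hypothesis that the two supplements give the same mean tooth length at the medium dosage: at 1 mg, orange juice is associated with greater tooth length than ascorbic acid.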

Test (c.3): Impact by Supplement and Dosage - Large Dosage of VC vs OJ

set1<-subset(df,Supplement=='VC' & Dosage=='LD')$Length
set2<-subset(df,Supplement=='OJ' & Dosage=='LD')$Length
t<-t.test(set1,set2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]

## [1] -3.63807  3.79807

Result: The 95% confidence interval for the difference in means (VC minus OJ) contains zero. Hence, we fail to reject the null hypothesis that the two supplements give the same mean tooth length at the large dosage (2 mg).
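
The three supplement-by-dosage tests share the same structure, so they can also be run in one pass. The following is a hedged sketch of that idea, working directly on ToothGrowth; the name by_dose and the layout of the result are assumptions of this example, not part of the original analysis.

```r
# Welch t-test of VC vs OJ within each dose level
data(ToothGrowth)

by_dose <- sapply(sort(unique(ToothGrowth$dose)), function(d) {
  at_dose <- ToothGrowth[ToothGrowth$dose == d, ]
  vc <- at_dose$len[at_dose$supp == "VC"]
  oj <- at_dose$len[at_dose$supp == "OJ"]
  res <- t.test(vc, oj, paired = FALSE, var.equal = FALSE)
  # Keep the dose, the 95% confidence limits (VC minus OJ) and the p-value
  c(dose = d, lower = res$conf.int[1], upper = res$conf.int[2],
    p_value = res$p.value)
})

# One row per dose after transposing
print(round(t(by_dose), 4))
```

Each row can be read like the individual results above: an interval entirely below zero favours orange juice at that dose, while an interval that spans zero shows no detectable difference.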

4. State conclusions and assumptions.

An increase in dosage (from 0.5 mg to 1 mg, and from 1 mg to 2 mg) is associated with an increase in tooth length.
Across all doses combined, the type of supplement alone shows no statistically significant effect on tooth growth at the 5% level.
Orange juice (OJ) has a greater impact on tooth growth than ascorbic acid (VC) at dosages of 0.5 mg and 1 mg.
However, when the dosage reaches 2 mg, the impact of orange juice and ascorbic acid is similar.
The above conclusions assume that the data are not paired and that the group variances may differ (paired=FALSE and var.equal=FALSE in t.test).

Appendix A - References

A.1. Confidence Intervals & Hypothesis Testing http://davidmlane.com/hyperstat/B15183.html

There is an extremely close relationship between confidence intervals and hypothesis testing. When a 95% confidence interval is constructed, all values in the interval are considered plausible values for the parameter being estimated. Values outside the interval are rejected as relatively implausible. If the value of the parameter specified by the null hypothesis is contained in the 95% interval then the null hypothesis cannot be rejected at the 0.05 level. If the value specified by the null hypothesis is not in the interval then the null hypothesis can be rejected at the 0.05 level. If a 99% confidence interval is constructed, then values outside the interval are rejected at the 0.01 level.
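
This relationship can be illustrated with the supplement-alone comparison from Test (b.1). The sketch below is an illustrative addition; the helper contains_zero is hypothetical and defined only for this example. It builds confidence intervals at several levels and compares "interval contains zero" with "p-value exceeds the matching significance level".

```r
# Link between confidence intervals and two-sided tests (VC vs OJ, all doses)
data(ToothGrowth)
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

# Hypothetical helper: TRUE when the interval contains the null value of zero
contains_zero <- function(ci) ci[1] <= 0 && 0 <= ci[2]

for (level in c(0.90, 0.95, 0.99)) {
  res <- t.test(vc, oj, paired = FALSE, var.equal = FALSE, conf.level = level)
  alpha <- 1 - level
  cat(sprintf("conf.level %.2f: CI contains 0 = %s, p-value > alpha = %s\n",
              level, contains_zero(res$conf.int), res$p.value > alpha))
}
```

For a two-sided t-test the two columns agree at every level. With the p-value of about 0.061 reported in Test (b.1), the 90% interval excludes zero while the 95% and 99% intervals contain it.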

Appendix B - R Code

The R code for the figures plotted in this project:

# Plot Figure 1
# Boxplot
boxplot(len ~ supp * dose, data=ToothGrowth,
        col=c("orange","green"),
        main="Figure 1\nBoxplot of Tooth Growth for 10 Guinea Pigs",
        xlab="Supplement and Dose",ylab="Tooth Length")
legend('bottomright', c("OJ: Orange juice", "VC: Ascorbic acid"),
       fill = c("orange","green"),bty = "n")

# Conditional Plot
coplot(len ~ dose|supp, data=ToothGrowth, panel=panel.smooth, col=par("fg"),
       xlab="Dosage", ylab="Tooth Length")

By CA

Jan 2015