StatInfProject Part2

Part 2 of Class Project

Part 1: Load the ToothGrowth data and perform some basic exploratory data analyses.

So, here we will load in the dataset and check out the structure of the file.

library(datasets)
data(ToothGrowth)
head(ToothGrowth)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Looks like we are going to have to change the dose to be a factor instead of a number so we can do some plotting.

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
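One caveat with the conversion above is worth a quick sketch: once dose is a factor, `as.numeric()` returns the internal level codes rather than the doses themselves. The copy `tg` and the names `codes` and `doses` are hypothetical, used only so the main data frame is left alone.

```r
# Work on a fresh copy (hypothetical name `tg`) so ToothGrowth itself is untouched
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# as.numeric() on a factor returns the internal level codes, not the doses
codes <- as.numeric(tg$dose)

# Going through as.character() first recovers the original numeric doses
doses <- as.numeric(as.character(tg$dose))

unique(codes)  # level codes: 1, 2, 3
unique(doses)  # actual doses: 0.5, 1, 2
```

If the numeric dose is needed again later (for a regression, say), the `as.character()` route is the safe one.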

First, let’s just take a look at the data points when segmented by supplement method and dosage:

library(ggplot2)
# Histograms of length, with supplement in rows and dose in columns
qplot(len, fill=supp, facets=supp~dose, data=ToothGrowth, binwidth=1)

Looking at the plot above, I’d say that the highest dose of 2 milligrams is the most effective, but which is better, OJ or VC? We’ll need some confidence intervals to determine that. In any case, does the dosage amount matter? Let’s look at a box-and-whisker plot to compare dosage amounts.

ggplot(ToothGrowth, aes(x=dose, y=len)) + 
  geom_boxplot(aes(fill=dose)) + xlab("Milligrams") + ylab("Length")

Based on the box-and-whisker plot, tooth length clearly increases with the dose of vitamin C.

Across the dosage amounts, which method seems to yield more tooth growth?

ggplot(aes(x=supp, y=len), data=ToothGrowth) + geom_boxplot(aes(fill=supp)) + xlab("Method") + ylab("Length")

Looks like OJ might be the best delivery method across the 3 dosage levels, but we’ll need some CIs to determine that for sure.

Part 2: Perform a basic summary of the data.

summary(ToothGrowth)

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

We have two supplement methods: OJ (orange juice) and VC (ascorbic acid). And we have three dosage levels: 0.5, 1, and 2 milligrams. We want to know whether the supplement method has any impact on tooth growth, and whether the dosage level does as well.
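Since the questions are about groups, a per-group summary may be more informative than the overall one. Here is a minimal sketch using `aggregate()` from base R; the names `tg`, `group_means`, and `group_sds` are hypothetical.

```r
# Fresh copy of the data (hypothetical name `tg`)
tg <- datasets::ToothGrowth

# Mean and standard deviation of length for each supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
group_sds   <- aggregate(len ~ supp + dose, data = tg, FUN = sd)

group_means
group_sds
```

Each result has one row per supplement and dose combination, which lines up with the six panels of the faceted histogram.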

Part 3: Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

The first question is whether delivering vitamin C via OJ is better than straight ascorbic acid at growing guinea pigs’ teeth. Let’s test whether mean length differs by supplement method, and compare the equal-variance test to the unequal-variance test. (Recall that the default confidence level for t.test in R is 95%, so I will not specify it in my t.tests below.)

t.test(len~supp, data=ToothGrowth, paired=FALSE, var.equal=TRUE)

## 
##  Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1670064  7.5670064
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

t.test(len~supp, data=ToothGrowth, paired=FALSE, var.equal=FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

Okay, so the results of the two tests are very similar. Since both p-values are greater than 5% and both confidence intervals contain 0, we cannot reject the null hypothesis of no difference in means. The alternative would be that the difference in means is not zero, i.e., that there is a difference between supplement methods.
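To see where the Welch numbers come from, the statistic, degrees of freedom, and interval can be rebuilt by hand. This is only a sketch of the standard Welch-Satterthwaite calculation; all variable names are hypothetical.

```r
# Fresh copy of the data (hypothetical name `tg`)
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Difference in sample means (OJ minus VC, the order t.test uses here)
diff_means <- mean(oj) - mean(vc)

# Per-group variance of the mean, and the standard error of the difference
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)
se   <- sqrt(v_oj + v_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# t statistic, two-sided p-value, and 95% confidence interval
t_stat  <- diff_means / se
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci      <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se
```

The values of `t_stat`, `df_welch`, `p_value`, and `ci` should agree with the Welch output printed above.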

Now let’s look at the dosage levels.

# dose is a factor at this point, so compare against its level labels
dose_05 <- ToothGrowth$len[ToothGrowth$dose == "0.5"]
dose_1 <- ToothGrowth$len[ToothGrowth$dose == "1"]
dose_2 <- ToothGrowth$len[ToothGrowth$dose == "2"]

There is probably a better way to test all the combinations of doses, but I’m just going to slug it out this way and assume unequal variances:

dose_05to1 <- t.test(dose_05, dose_1, paired=FALSE, var.equal=FALSE)
dose_05to2 <- t.test(dose_05, dose_2, paired=FALSE, var.equal=FALSE)
dose_1to2 <- t.test(dose_1, dose_2, paired=FALSE, var.equal=FALSE)

Now let’s look at the results:

## 
##  Welch Two Sample t-test
## 
## data:  dose_05 and dose_1
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean of x mean of y 
##    10.605    19.735

## 
##  Welch Two Sample t-test
## 
## data:  dose_05 and dose_2
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean of x mean of y 
##    10.605    26.100

## 
##  Welch Two Sample t-test
## 
## data:  dose_1 and dose_2
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean of x mean of y 
##    19.735    26.100

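In all of these t.tests the confidence intervals are entirely negative (they do not contain zero) and the p-values are very small. Therefore, we can reject the null hypothesis in favor of the alternative that the difference in means is not equal to 0. The negative intervals mean that the lower dose in each pair has the smaller mean length.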
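One possible shortcut for running all the dose pairs at once is `pairwise.t.test()` from the stats package. The sketch below sets `pool.sd = FALSE` so each pair gets its own unequal-variance test; the Bonferroni adjustment is my own assumption, since the tests above are unadjusted. The names `tg` and `pw` are hypothetical.

```r
# Fresh copy of the data (hypothetical name `tg`)
tg <- datasets::ToothGrowth

# All pairwise dose comparisons in one call.
# pool.sd = FALSE runs a separate Welch t-test for each pair;
# the Bonferroni adjustment is an assumption, not part of the analysis above.
pw <- pairwise.t.test(tg$len, tg$dose,
                      p.adjust.method = "bonferroni",
                      pool.sd = FALSE)

# Lower-triangular matrix of adjusted p-values (rows "1", "2"; columns "0.5", "1")
pw$p.value
```

With three comparisons, each Bonferroni-adjusted p-value is the unadjusted one multiplied by 3 (capped at 1).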

Let’s look at the dosage levels in reverse order, comparing from larger dose to smaller dose.

dose_1to05 <- t.test(dose_1, dose_05, paired=FALSE, var.equal=FALSE)
dose_2to05 <- t.test(dose_2, dose_05, paired=FALSE, var.equal=FALSE)
dose_2to1 <- t.test(dose_2, dose_1, paired=FALSE, var.equal=FALSE)

And the results:

## 
##  Welch Two Sample t-test
## 
## data:  dose_1 and dose_05
## t = 6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.276219 11.983781
## sample estimates:
## mean of x mean of y 
##    19.735    10.605

## 
##  Welch Two Sample t-test
## 
## data:  dose_2 and dose_05
## t = 11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.83383 18.15617
## sample estimates:
## mean of x mean of y 
##    26.100    10.605

## 
##  Welch Two Sample t-test
## 
## data:  dose_2 and dose_1
## t = 4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.733519 8.996481
## sample estimates:
## mean of x mean of y 
##    26.100    19.735

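In all of these results, the p-values are very small and the intervals do not contain zero. As expected, swapping the order of the two groups only flips the signs of the t statistics and the interval endpoints. Therefore, we can reject the null in favor of the alternative that the difference in means is not equal to zero.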
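The forward and reversed comparisons are really the same test, which a short check makes visible. This is a sketch for one pair of doses only; the names `low`, `high`, `fwd`, and `rev_test` are hypothetical.

```r
# Fresh copy of the data (hypothetical name `tg`)
tg <- datasets::ToothGrowth
low  <- tg$len[tg$dose == 0.5]
high <- tg$len[tg$dose == 2]

# Same Welch test with the two groups in either order
fwd      <- t.test(low, high)
rev_test <- t.test(high, low)

# The p-value is unchanged by the order of the groups
same_p <- isTRUE(all.equal(fwd$p.value, rev_test$p.value))

# The interval is mirrored: endpoints swap places and change sign
mirrored_ci <- isTRUE(all.equal(as.numeric(fwd$conf.int),
                                -rev(as.numeric(rev_test$conf.int))))

c(same_p = same_p, mirrored_ci = mirrored_ci)
```

Both checks come back `TRUE`, so running the comparisons in one direction is enough.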

Conclusions and Assumptions

My conclusion is that we cannot say that there is a difference between delivery methods (OJ vs. VC) in tooth growth. However, we can say that there is a difference between dosage levels, and that the higher the dose, the longer the teeth will grow. For the t.tests on dosage, I assumed unequal variances and treated the groups as independent (unpaired) samples.