Confidence Interval and Hypothesis Testing

Overview

In this project, we look at the “ToothGrowth” data set from R's datasets package. The data come from a study of the effect of vitamin C on tooth growth in guinea pigs.

The response is the length of the teeth in each of 10 guinea pigs at each of three dose levels of vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice, coded OJ, or ascorbic acid, coded VC), giving 60 observations in total.

We will summarize the data set and look at the confidence intervals and hypothesis tests to compare tooth growth by supp and dose.

Exploring the Data

Let’s load the data into R first and look at the summary.

data("ToothGrowth")
## changing the dose into factor
ToothGrowth$dose <- factor(ToothGrowth$dose)
summary(ToothGrowth)

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

Before doing any sort of analysis and manipulation, let’s visualize the data and try to understand the features of the data.

Box Plot: Length Vs. Supplement type

Here we want to see how the length of the teeth differs between the two supplement types at each dose level.

## Exploratory data analysis

library(ggplot2)
p1<- ggplot(ToothGrowth, aes(supp,len))+
     geom_boxplot(aes(fill=supp))+
     facet_grid(.~ dose)+
     xlab("Supplement Type") + ylab("Length (mm)")+
     ggtitle("Length Vs. supplement type for each dose level")

p1

Looking at the box plot above, we can see that at 0.5mg and 1.0mg the OJ group has a higher median tooth length than the ascorbic acid (VC) group. At the 2.0mg dose, however, the two groups have similar medians, and VC shows a wider spread that includes the longest teeth in the data.
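The visual impression from the box plot can be backed up with per-group numbers. The following is a minimal sketch using base R's aggregate(); the names tg, group_means, and group_medians are illustrative and do not appear elsewhere in this report.

```r
# Work on a fresh copy so the factor conversion above is not disturbed
tg <- datasets::ToothGrowth

# Mean and median tooth length for each supplement-by-dose group
group_means   <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
group_medians <- aggregate(len ~ supp + dose, data = tg, FUN = median)

group_means
group_medians
```

Each table has one row per combination of supp and dose, so the six rows line up with the six boxes in the plot.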

Density Function

Now let’s look at the distribution of the tooth length in the data.

# Distribution of the Length (the dashed red line marks the mean)
plot(density(ToothGrowth$len), col="blue", xlab="Length (mm)", lwd=2,
     main="Distribution of Tooth Length")
abline(v=mean(ToothGrowth$len),col="red", lty=2)

Looking at the density function, the tooth lengths appear roughly bell-shaped. For the tests below we therefore assume that the observations are iid Normal random variables. The plot makes this assumption plausible but does not prove it.
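The density plot pools all six supplement-by-dose groups, so one way to probe the Normality assumption a little further is to test each group separately. The following is a rough sketch using shapiro.test() from the stats package; the names tg, groups, and shapiro_p are illustrative, and with only 10 observations per group the test has limited power, so it is a sanity check rather than a proof.

```r
# Fresh copy of the data; dose is numeric here
tg <- datasets::ToothGrowth

# Split tooth length into the six supp-by-dose groups
groups <- split(tg$len, list(tg$supp, tg$dose))

# Shapiro-Wilk p-value per group (small values suggest non-Normality)
shapiro_p <- sapply(groups, function(x) shapiro.test(x)$p.value)
round(shapiro_p, 3)
```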

95% Confidence Intervals and Hypothesis Testing

Interpretation: if we were to repeatedly perform the experiment on independent samples, about 95% of the intervals we obtain would contain the true mean difference that we are estimating.
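For reference, the interval and test statistic that a paired t-test reports can be written out as follows. The symbols are introduced here for illustration: d_i are the n paired differences, \bar{d} is their mean, and s_d is their sample standard deviation.

```latex
\[
\bar{d} \;\pm\; t_{0.975,\,n-1}\,\frac{s_d}{\sqrt{n}},
\qquad
t \;=\; \frac{\bar{d}}{s_d/\sqrt{n}}
\]
```

The two-sided test at the 5% level rejects the null hypothesis of zero mean difference when |t| exceeds t_{0.975, n-1}, which is the same as the interval excluding 0.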

Average difference in tooth lengths between the two Supplement Types

First let’s look at the confidence intervals for differences between the two supplement types. We will first subset the variables and group them accordingly. Then we will use t.test() to do the t-test.

g_OJ <- subset(ToothGrowth, supp=="OJ")
g_VC <- subset(ToothGrowth, supp=="VC")

## we want to compare the mean tooth length between treatments with OJ and VC.
## paired=TRUE matches the i-th OJ row with the i-th VC row.
test_supp <- t.test(g_OJ$len, g_VC$len, paired=TRUE)
test_supp

## 
##  Paired t-test
## 
## data:  g_OJ$len and g_VC$len
## t = 3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.408659 5.991341
## sample estimates:
## mean of the differences 
##                     3.7

## rejection region for the two-sided test at the 5% level
qt(0.975, 29) ## critical value

## [1] 2.04523

So the 95% confidence interval for the mean difference in tooth length between the two supplement types (OJ minus VC) is [1.4086586, 5.9913414], and the mean of the differences is 3.7. Since the t-statistic of 3.3026 is greater than the critical value of 2.0452296 (equivalently, the p-value of 0.00255 is below 0.05), we can reject the Null Hypothesis that the true difference in means is equal to 0 for the two supplement type groups.
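The paired test above matches the i-th OJ row with the i-th VC row. If the animals in the two supplement groups are in fact different individuals (an assumption about the study design that this report does not settle), an unpaired Welch test is the usual alternative. The following is a minimal sketch; the names tg and welch_supp are illustrative.

```r
# Fresh copy of the data
tg <- datasets::ToothGrowth

# Welch two-sample t-test: unpaired, unequal variances (the t.test default)
welch_supp <- t.test(len ~ supp, data = tg)

welch_supp$conf.int   # 95% CI for mean(OJ) - mean(VC)
welch_supp$p.value
```

Comparing this interval and p-value with the paired result shows how much the conclusion about supplement type depends on the pairing assumption.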

Pair-wise average difference in tooth lengths among different dose levels

Let’s start by taking the subsets and grouping them into different doses.

## we want to compare the mean tooth length between treatments with different doses.
## dose is a factor at this point, so we compare against its level labels.
g_0.5 <- subset(ToothGrowth, dose=="0.5")
g_1.0 <- subset(ToothGrowth, dose=="1")
g_2.0 <- subset(ToothGrowth, dose=="2")

0.5mg Vs. 1.0mg

test_dose_1 <- t.test(g_1.0$len,g_0.5$len,paired=TRUE)
test_dose_1

## 
##  Paired t-test
## 
## data:  g_1.0$len and g_0.5$len
## t = 6.9669, df = 19, p-value = 1.225e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.387121 11.872879
## sample estimates:
## mean of the differences 
##                    9.13

CI_1 <- test_dose_1$conf.int

qt(0.975, 19) ## critical value

## [1] 2.093024

So the 95% confidence interval for the mean difference in tooth length between doses 1.0mg and 0.5mg is [6.3871212, 11.8728788]. The mean of the differences is 9.13. Given the t-statistic of 6.97 and the critical value of 2.093, we can reject the Null Hypothesis that the true difference in the average tooth length is equal to 0 for the dose levels 0.5mg and 1.0mg.
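To see where this interval comes from, it can be rebuilt by hand from the paired differences. This is a sketch under the same pairing assumption as the test above; the names tg, len_05, len_10, d, ci_manual, and ci_ttest are illustrative.

```r
# Fresh copy of the data; dose is numeric here
tg <- datasets::ToothGrowth

len_05 <- tg$len[tg$dose == 0.5]
len_10 <- tg$len[tg$dose == 1]

# Row-by-row differences, as used by the paired test
d <- len_10 - len_05
n <- length(d)

# mean difference +/- critical value * standard error
ci_manual <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * sd(d) / sqrt(n)

# Interval reported by t.test() for comparison
ci_ttest <- t.test(len_10, len_05, paired = TRUE)$conf.int

all.equal(ci_manual, as.numeric(ci_ttest))
```

The same recipe applies to the other dose comparisons by swapping in the relevant pair of groups.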

1.0mg Vs. 2.0mg

test_dose_2 <- t.test(g_2.0$len,g_1.0$len, paired=TRUE)
test_dose_2

## 
##  Paired t-test
## 
## data:  g_2.0$len and g_1.0$len
## t = 4.6046, df = 19, p-value = 0.0001934
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.471814 9.258186
## sample estimates:
## mean of the differences 
##                   6.365

CI_2 <- test_dose_2$conf.int

So the 95% confidence interval for the mean difference in tooth length between doses 2.0mg and 1.0mg is [3.4718143, 9.2581857]. The mean of the differences is 6.365. Given the t-statistic of 4.6046 and the critical value of 2.093, we can reject the Null Hypothesis that the true difference in the average tooth length is equal to 0 for the dose levels 1.0mg and 2.0mg.

0.5mg Vs. 2.0mg

test_dose_3 <- t.test(g_2.0$len,g_0.5$len, paired=TRUE)
test_dose_3

## 
##  Paired t-test
## 
## data:  g_2.0$len and g_0.5$len
## t = 11.291, df = 19, p-value = 7.19e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.6228 18.3672
## sample estimates:
## mean of the differences 
##                  15.495

CI_3 <- test_dose_3$conf.int

Likewise, the 95% confidence interval for the mean difference in tooth length between doses 2.0mg and 0.5mg is [12.6228, 18.3672]. The mean of the differences is 15.495. Given the t-statistic of 11.291 and the critical value of 2.093, we can reject the Null Hypothesis that the true difference in the average tooth length is equal to 0 for the dose levels 0.5mg and 2.0mg.
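Because three pairwise dose comparisons are made, the chance of at least one false rejection is somewhat higher than 5%. One common safeguard, shown here as a sketch and not as part of the original analysis, is a Bonferroni adjustment with p.adjust() from the stats package. The p-values are copied from the three test outputs above; the names p_raw and p_bonf are illustrative.

```r
# p-values reported by the three paired dose comparisons above
p_raw <- c("1.0 vs 0.5" = 1.225e-06,
           "2.0 vs 1.0" = 0.0001934,
           "2.0 vs 0.5" = 7.19e-10)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

p_bonf
p_bonf < 0.05
```

All three adjusted p-values stay below 0.05, so the dose conclusions do not change under this correction.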

Conclusion

Assuming that the observations are iid Normal, the t-tests imply that the true population mean tooth lengths are not equal for the supplement types OJ and VC. Similarly, the true population mean tooth lengths differ between each pair of the three dose levels, with the higher dose giving the longer mean length in every comparison.