In this project, we look at the “ToothGrowth” data set from the R datasets package. The data come from a study of the effect of vitamin C on tooth growth in guinea pigs.
The response is the tooth length in 60 guinea pigs, 10 for each combination of three dose levels of vitamin C (0.5, 1, and 2 mg/day) and two delivery methods (orange juice, coded OJ, or ascorbic acid, coded VC).
We will summarize the data set and use confidence intervals and hypothesis tests to compare tooth growth by supplement type (supp) and dose.
Let’s first load the data into R and look at the summary.
data("ToothGrowth")
## convert dose to a factor so that summary() counts the observations per level
ToothGrowth$dose <- factor(ToothGrowth$dose)
summary(ToothGrowth)
##       len        supp     dose
##  Min.   : 4.20   OJ:30   0.5:20
##  1st Qu.:13.07   VC:30   1  :20
##  Median :19.25           2  :20
##  Mean   :18.81
##  3rd Qu.:25.27
##  Max.   :33.90
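The summary shows 30 observations per supplement type and 20 per dose. As a quick check of the study design, the sketch below cross-tabulates the two factors. It works on a local copy named `tg` (a name introduced here for illustration) so that the `ToothGrowth` object used in the rest of the report is left untouched.

```r
# Local copy of the data set; `tg` is an illustrative name, not part of the report
tg <- datasets::ToothGrowth

# Cross-tabulate delivery method against dose level
counts <- with(tg, table(supp, dose))
counts
```

Each cell of the table is one supplement/dose combination, so equal counts across cells indicate a balanced design.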
Before doing any analysis, let’s visualize the data to understand its main features.
Here we want to see how tooth length differs between the two supplement types at each dose level.
## Exploratory data analysis
library(ggplot2)
## box plots of tooth length by supplement type, one panel per dose
p1 <- ggplot(ToothGrowth, aes(supp, len)) +
  geom_boxplot(aes(fill = supp)) +
  facet_grid(. ~ dose) +
  xlab("Supplement Type") + ylab("Length (mm)") +
  ggtitle("Length vs. supplement type for each dose level")
p1
Looking at the box plot above, tooth length tends to be greater with the OJ supplement than with ascorbic acid (VC) at the 0.5 mg and 1.0 mg doses. At the 2.0 mg dose, however, the two supplement types look similar, with VC showing the wider spread and the longest observed teeth.
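The visual impression from the box plot can be cross-checked numerically. The following is a minimal sketch that computes the mean and median tooth length for every supplement/dose combination. The object names `tg`, `grp_mean`, and `grp_median` are introduced here for illustration only.

```r
# Local copy of the data set (illustrative name)
tg <- datasets::ToothGrowth

# Mean and median length for each supplement (rows) and dose (columns)
grp_mean   <- with(tg, tapply(len, list(supp = supp, dose = dose), mean))
grp_median <- with(tg, tapply(len, list(supp = supp, dose = dose), median))

round(grp_mean, 2)
grp_median
```

Comparing the two rows within each column shows where the supplement types differ and where they are close.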
Now let’s look at the distribution of tooth length in the data.
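# Distribution of the length
## plot() is called for its side effect, so its return value is not stored
plot(density(ToothGrowth$len), col = "blue", xlab = "Length (mm)", lwd = 2,
     main = "Distribution of Tooth Length")
## dashed red line marks the sample mean
abline(v = mean(ToothGrowth$len), col = "red", lty = 2)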
Looking at the density estimate, the distribution of tooth length is roughly bell-shaped. This is an informal visual check rather than a formal test, but on that basis we will assume that the observations are iid Normal random variables.
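A density plot alone is a weak basis for a normality assumption. As a hedged supplement, the sketch below applies the Shapiro-Wilk test from base R (`shapiro.test()`) to all lengths and to each dose group separately. This is one possible check, not the method used in the report, and the names `tg`, `sw_all`, and `sw_by_dose` are illustrative.

```r
# Local copy of the data set (illustrative name)
tg <- datasets::ToothGrowth

# Shapiro-Wilk test on all 60 lengths pooled together
sw_all <- shapiro.test(tg$len)

# The same test within each dose group; returns one p-value per dose
sw_by_dose <- tapply(tg$len, tg$dose, function(x) shapiro.test(x)$p.value)

sw_all$p.value
sw_by_dose
```

A small p-value would be evidence against normality. With only 20 observations per dose group, the test has limited power, so a large p-value does not prove that the data are Normal.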
First, let’s look at the confidence interval for the difference between the two supplement types. We subset the data by supplement type and then use t.test() to run the t-test.
require(dplyr)
g_OJ <- subset(ToothGrowth, supp == "OJ")
g_VC <- subset(ToothGrowth, supp == "VC")
## compare the mean tooth length between the OJ and VC treatments
test_supp <- t.test(g_OJ$len, g_VC$len, paired = TRUE)
test_supp
##
## Paired t-test
##
## data: g_OJ$len and g_VC$len
## t = 3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.408659 5.991341
## sample estimates:
## mean of the differences
## 3.7
## rejection region for a two-sided test at the 5% level
qt(0.975, 29) ## critical value
## [1] 2.04523
So the 95% confidence interval for the difference in mean tooth length between the two supplement types is [1.41, 5.99], and the mean of the differences is 3.7. Since the t-statistic of 3.30 exceeds the critical value of 2.045, we reject the null hypothesis that the true difference in means between the two supplement groups is 0.
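To make the link between the t-statistic, the critical value, and the confidence interval explicit, the paired interval can be rebuilt by hand as mean(d) ± t_crit × sd(d)/√n, where d is the vector of pairwise differences. The sketch below does this and compares the result with t.test(). The variable names are illustrative.

```r
# Local copy of the data set (illustrative name)
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

d      <- oj - vc                  # pairwise differences, as used by the paired test
n      <- length(d)                # 30 pairs
se     <- sd(d) / sqrt(n)          # standard error of the mean difference
t_stat <- mean(d) / se             # t-statistic under H0: mean difference = 0
t_crit <- qt(0.975, df = n - 1)    # two-sided critical value at the 5% level
ci     <- mean(d) + c(-1, 1) * t_crit * se

t_stat
ci

# Should agree with the interval reported by t.test()
all.equal(ci, as.numeric(t.test(oj, vc, paired = TRUE)$conf.int))
```

The null hypothesis is rejected exactly when |t_stat| exceeds t_crit, which is equivalent to the interval excluding 0.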
Next, we compare tooth length across the dose levels. Let’s start by splitting the data into one subset per dose.
## compare the mean tooth length between treatments with different doses
## dose is a factor, so its levels are matched as strings
g_0.5 <- subset(ToothGrowth, dose == "0.5")
g_1.0 <- subset(ToothGrowth, dose == "1")
g_2.0 <- subset(ToothGrowth, dose == "2")
test_dose_1 <- t.test(g_1.0$len, g_0.5$len, paired = TRUE)
test_dose_1
##
## Paired t-test
##
## data: g_1.0$len and g_0.5$len
## t = 6.9669, df = 19, p-value = 1.225e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 6.387121 11.872879
## sample estimates:
## mean of the differences
## 9.13
CI_1 <- test_dose_1$conf.int
qt(0.975, 19) ## critical value
## [1] 2.093024
So the 95% confidence interval for the difference in mean tooth length between the 0.5 mg and 1.0 mg doses is [6.39, 11.87], and the mean of the differences is 9.13. Since the t-statistic of 6.97 exceeds the critical value of 2.093, we reject the null hypothesis that the true difference in average tooth length between the 0.5 mg and 1.0 mg dose levels is 0.
test_dose_2 <- t.test(g_2.0$len, g_1.0$len, paired = TRUE)
test_dose_2
##
## Paired t-test
##
## data: g_2.0$len and g_1.0$len
## t = 4.6046, df = 19, p-value = 0.0001934
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.471814 9.258186
## sample estimates:
## mean of the differences
## 6.365
CI_2 <- test_dose_2$conf.int
So the 95% confidence interval for the difference in mean tooth length between the 1.0 mg and 2.0 mg doses is [3.47, 9.26], and the mean of the differences is 6.365. Since the t-statistic of 4.60 exceeds the critical value of 2.093, we reject the null hypothesis that the true difference in average tooth length between the 1.0 mg and 2.0 mg dose levels is 0.
test_dose_3 <- t.test(g_2.0$len, g_0.5$len, paired = TRUE)
test_dose_3
##
## Paired t-test
##
## data: g_2.0$len and g_0.5$len
## t = 11.291, df = 19, p-value = 7.19e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 12.6228 18.3672
## sample estimates:
## mean of the differences
## 15.495
CI_3 <- test_dose_3$conf.int
Likewise, the 95% confidence interval for the difference in mean tooth length between the 0.5 mg and 2.0 mg doses is [12.62, 18.37], and the mean of the differences is 15.495. Since the t-statistic of 11.29 exceeds the critical value of 2.093, we reject the null hypothesis that the true difference in average tooth length between the 0.5 mg and 2.0 mg dose levels is 0.
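The three dose comparisons repeat the same steps, so they can be collected in one table. The following is a minimal sketch that loops over all pairs of dose levels and gathers the estimate, confidence interval, t-statistic, and p-value. It mirrors the paired tests used in this report, and the names `tg`, `dose_pairs`, and `dose_tests` are introduced here for illustration.

```r
# Local copy of the data set (illustrative name); dose is numeric in this copy
tg <- datasets::ToothGrowth

# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(c(0.5, 1, 2), 2)

# One row per comparison, always higher dose minus lower dose
dose_tests <- do.call(rbind, lapply(seq_len(ncol(dose_pairs)), function(i) {
  lo <- dose_pairs[1, i]
  hi <- dose_pairs[2, i]
  tt <- t.test(tg$len[tg$dose == hi], tg$len[tg$dose == lo], paired = TRUE)
  data.frame(comparison = paste(hi, "vs", lo),
             mean_diff  = unname(tt$estimate),
             ci_lower   = tt$conf.int[1],
             ci_upper   = tt$conf.int[2],
             t          = unname(tt$statistic),
             p_value    = tt$p.value)
}))
dose_tests
```

Keeping the results in a single data frame makes it harder to mix up the interval of one comparison with another when writing up the conclusions.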
Assuming that the observations are iid Normal, the t-tests imply that the true population mean tooth lengths are not equal for the supplement types OJ and VC. Similarly, the true population mean tooth lengths are not equal across the three dose levels.
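These conclusions also rest on the pairing assumption: a paired test treats the i-th observation in one group and the i-th observation in the other as a matched pair. The data set documentation describes 60 animals that each received a single dose by a single delivery method, so an unpaired comparison is a reasonable sensitivity check. The sketch below runs Welch's two-sample t-test for the supplement types, which is what t.test() does by default (paired = FALSE, var.equal = FALSE). It is offered as an additional check, not as the analysis performed above, and the names `tg` and `welch_supp` are illustrative.

```r
# Local copy of the data set (illustrative name)
tg <- datasets::ToothGrowth

# Welch two-sample t-test: groups treated as independent, variances not pooled
welch_supp <- t.test(len ~ supp, data = tg)

welch_supp$estimate   # group means for OJ and VC
welch_supp$conf.int   # 95% interval for the difference in means
welch_supp$p.value
```

The point estimate of the difference is the same in both analyses, but the standard error and degrees of freedom differ, so the interval and p-value should be compared with the paired results before relying on either.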