As the second portion of the project, we’re going to analyze the ToothGrowth data in the R datasets package.
Running help(ToothGrowth) shows the description of the data: The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid). Lucky pigs!
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
data(ToothGrowth)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
ggplot(ToothGrowth, aes(x = dose, y = len, colour = supp)) +
  geom_point(alpha = .5, size = 5) +
  scale_size_area() +
  scale_colour_brewer(palette = "Set1") +
  stat_smooth(method = lm) +
  theme(legend.position = c(1, 0), legend.justification = c(1, 0)) +
  ggtitle("Tooth length in relation to dose by supp") +
  facet_grid(. ~ supp)
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
ToothGrowth %>%
  group_by(dose, supp) %>%
  summarise(count = n(), mean = mean(len), sum = sum(len), S = sd(len), SE = S / sqrt(n()))
## Source: local data frame [6 x 7]
## Groups: dose
##
## dose supp count mean sum S SE
## 1 0.5 OJ 10 13.23 132.3 4.459709 1.4102837
## 2 0.5 VC 10 7.98 79.8 2.746634 0.8685620
## 3 1.0 OJ 10 22.70 227.0 3.910953 1.2367520
## 4 1.0 VC 10 16.77 167.7 2.515309 0.7954104
## 5 2.0 OJ 10 26.06 260.6 2.655058 0.8396031
## 6 2.0 VC 10 26.14 261.4 4.797731 1.5171757
ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = supp)) +
  geom_boxplot() +
  theme(legend.position = c(1, 0), legend.justification = c(1, 0)) +
  ggtitle("Tooth length in relation to dose by supp")
From the plots above, OJ appears to produce more tooth growth than VC. Let’s test this formally. The null hypothesis H0 is that the mean tooth lengths for the two delivery methods are equal (u1 = u2, where u1 is the VC mean and u2 is the OJ mean); the alternative Ha is that they differ (u1 != u2). Let’s do a pooled-variance two-sample t-test step by step:
oj <- ToothGrowth %>% filter(supp=='OJ') %>% select(len)
vc <- ToothGrowth %>% filter(supp=='VC') %>% select(len)
x1 <- vc$len; x2 <- oj$len
n1 <-length(x1); n2 <-length(x2)
u1 <- mean(x1); u2 <- mean(x2)
var1 <- var(x1); var2 <- var(x2)
se <- sqrt((var1*(n1-1) + var2*(n2-1))/(n1+n2-2)*(1/n1+1/n2))
alpha <- .05
t <- (u2 - u1) / se
ci <- (u2 - u1) + c(-1, 1) * se * qt(1 - alpha/2, n1 + n2 - 2)  # CI for the difference in means (OJ - VC)
p.value <- pt(abs(t), n1 + n2 - 2, lower.tail = FALSE) * 2      # two-sided p-value
t; p.value; ci; u1; u2
## [1] 1.915268
## [1] 0.06039337
## [1] -0.1670064  7.5670064
## [1] 16.96333
## [1] 20.66333
The 95% confidence interval for the difference in means (OJ minus VC), [-0.167, 7.567], narrowly contains zero, and the p-value of 0.060 is just above alpha = 0.05. So we fail to reject the null hypothesis of no difference between delivery methods. The steps above can be done in R with a single call:
t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE, alternative = "two.sided")
##
## Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1670064 7.5670064
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
Since zero lies inside the confidence interval, we cannot reject H0, the hypothesis of no difference.
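As a cross-check, the manual pooled-variance calculation can be wrapped in a small function and compared against t.test. This is only a sketch: the helper name pooled_t and its return structure are hypothetical, not part of the original analysis, and it assumes independent samples with equal variances.

```r
# Hypothetical helper: pooled-variance two-sample t-test (equal variances assumed)
pooled_t <- function(x, y, alpha = 0.05) {
  nx <- length(x); ny <- length(y)
  df <- nx + ny - 2
  # Pooled variance, then the standard error of the difference in means
  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / df
  se  <- sqrt(sp2 * (1 / nx + 1 / ny))
  diff   <- mean(x) - mean(y)
  t_stat <- diff / se
  list(
    statistic = t_stat,
    df        = df,
    p.value   = 2 * pt(abs(t_stat), df, lower.tail = FALSE),
    conf.int  = diff + c(-1, 1) * qt(1 - alpha / 2, df) * se
  )
}

data(ToothGrowth)
oj_len <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc_len <- ToothGrowth$len[ToothGrowth$supp == "VC"]

manual  <- pooled_t(oj_len, vc_len)
builtin <- t.test(oj_len, vc_len, var.equal = TRUE)

# Both approaches should agree up to floating-point error
all.equal(manual$statistic, unname(builtin$statistic))
all.equal(manual$conf.int, as.numeric(builtin$conf.int))
```

Because the confidence interval is built around the difference in means, checking whether it contains zero is equivalent to the two-sided test at the same alpha.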
Now let’s compare the dose effect:
t.test(len ~ dose, data = ToothGrowth, subset = dose %in% c(0.5, 1.0), var.equal = TRUE, alternative = "two.sided")
##
## Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983748 -6.276252
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
t.test(len ~ dose, data = ToothGrowth, subset = dose %in% c(1.0, 2.0), var.equal = TRUE, alternative = "two.sided")
##
## Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.994387 -3.735613
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
Both null hypotheses can be rejected: the t statistics are large in magnitude (-6.48 and -4.90), the p-values are far below 0.05, and neither confidence interval contains zero. This indicates that increasing the dose level is associated with greater tooth growth.
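Since several dose comparisons are made on the same data, it may be worth adjusting the p-values for multiple testing. The sketch below runs the same equal-variance t-test on every pair of dose levels (including 0.5 vs 2, which is not tested above) and applies a Holm correction. The choice of Holm and the variable names are assumptions made for illustration, not part of the original analysis.

```r
data(ToothGrowth)

# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

# Unadjusted two-sided p-value for each pair, equal variances assumed
raw_p <- sapply(dose_pairs, function(d) {
  sub <- ToothGrowth[ToothGrowth$dose %in% d, ]
  t.test(len ~ dose, data = sub, var.equal = TRUE)$p.value
})
names(raw_p) <- sapply(dose_pairs, paste, collapse = " vs ")

# Holm adjustment controls the family-wise error rate across the three tests
adj_p <- p.adjust(raw_p, method = "holm")
cbind(raw = raw_p, holm = adj_p)
```

The unadjusted values for 0.5 vs 1 and 1 vs 2 should match the two t.test outputs above; the adjusted values can only be equal or larger.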