echo = TRUE
library(ggplot2)
This report deals with analysis of the ToothGrowth data in the R data sets package. The data is set of 60 observations, length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1 and 2 mg) with each of two delivery methods (orange juice or ascorbic acid).
Data ToothGrowth will be loaded and analysed here.
data(ToothGrowth)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
plot(ToothGrowth)
The structure of the data set suggests the dose variable is numeric, but in fact, we have seen that actually there are only three dose levels of Vitamin C (0.5, 1 and 2 mg). A more clear view of this, is the third row (or the third column, they are equivalent) of the scatter plot matrix, provided above. We will transform the variable dose to a factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
summary(ToothGrowth$dose)
## 0.5 1 2
## 20 20 20
So, the mean of the len variable is 18.8133, but this refers to all the 60 observations of the data set. The mean for the two levels of the supply method (supp) are described as below.
meansupp = split(ToothGrowth$len, ToothGrowth$supp)
sapply(meansupp, mean)
## OJ VC
## 20.66333 16.96333
ggplot(aes(x=supp, y=len), data=ToothGrowth) + geom_boxplot(aes(fill=supp))+
xlab("Supplement type") +ylab("Tooth length")
e will now check the effect of vitamin C dose on tooth length. So first we will calculate the means of the three levels of dosage.
meandose = split(ToothGrowth$len, ToothGrowth$dose)
sapply(meandose, mean)
## 0.5 1 2
## 10.605 19.735 26.100
ggplot(aes(x=dose, y=len), data=ToothGrowth) + geom_boxplot(aes(fill=dose)) +
xlab("Dose in miligrams") +ylab("Tooth length")
Can we tell if the tooth length of the guinea pigs is greater when the delivery methods is the orange juice compared to the tooth length when the delivery method is the ascorbic acid? To answer that question we will use the provided data and perform a statistical t test for the difference if the means of two independent groups.
len<-ToothGrowth$len
supp<-ToothGrowth$supp
dose<-ToothGrowth$dose
Let Ho be the null hypotheses of equal means between the two groups, versus the alternative hypothesis that the two means are different. This means that a two sided test will be performed, and so the confidence interval is absolutely equivalent to the t-test.
First we will check the variances to see if they are close enough to be assumed to be equal
sapply(meansupp, var)
## OJ VC
## 43.63344 68.32723
Variances of the two groups seem pretty far, so in the test we will not assume equality of variances.
t.test(len[supp=="OJ"], len[supp=="VC"], paired = FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: len[supp == "OJ"] and len[supp == "VC"]
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
As we can see from the results provided the p-value of this test is 0.06, which at a significance level of 5%, we have not enough evidence to reject the null hypothesis, which subsequently means that we cannot assume that the means of delivery of the vitamin has en effect on tooth length.
The confidence interval of the test is: (-0.171, 7.571), so we can see that the confidence interval contains zero (0), as the test proved.
We will test of the tooth length of the group with vitamin C dosage We will test all three combinations, but let us start with the test whether the mean tooth length of the of the group with vitamin C dose of 2mg, is equal to the group with vitamin C dosage of 1mg.
Let Ho be the null hypotheses of equal means between the two groups, versus the alternative hypothesis that the two means are different. This means that a two sided test will be performed, and so the confidence interval is absolutely equivalent to the t-test.
t.test(len[dose==2], len[dose==1], paired = FALSE, var.equal = TRUE)
##
## Two Sample t-test
##
## data: len[dose == 2] and len[dose == 1]
## t = 4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.735613 8.994387
## sample estimates:
## mean of x mean of y
## 26.100 19.735
As we can see from the results provided the p-value of this test is 0, which at a significance level of 5%, we have enough evidence to reject the null hypothesis, which subsequently means that we can assume that the means of dosage change from 1mg to 2mg creates an positive effect on tooth length, tooth length increases.
The confidence interval of the test is: (3.7356, 8.9944), so we can see that the confidence interval does not contain zero (0), as the test proved.
We will go on with the test whether the mean tooth length of the of the group with vitamin C dose of 1mg, is equal to the group with vitamin C dosage of 0.5mg.
Let Ho be the null hypotheses of equal means between the two groups, versus the alternative hypothesis that the two means are different. This means that a two sided test will be performed, and so the confidence interval is absolutely equivalent to the t-test.
t.test(len[dose==1], len[dose==0.5], paired = FALSE, var.equal = TRUE)
##
## Two Sample t-test
##
## data: len[dose == 1] and len[dose == 0.5]
## t = 6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 6.276252 11.983748
## sample estimates:
## mean of x mean of y
## 19.735 10.605
As we can see from the results provided the p-value of this test is 0, which at a significance level of 5%, we have enough evidence to reject the null hypothesis, which subsequently means that we can assume that the means of dosage change from 1mg to 2mg creates an positive effect on tooth length, tooth length increases.
The confidence interval of the test is: (6.2763, 11.9837), so we can see that the confidence interval does not contain zero (0), as the test proved.
We thing that is totally redundant to check if the mean tooth length differs between the groups of vitamin C dosage of 0.5 and 2mg, so we will just mention the 95% confidence interval, to prove that the value of zero is not included. (12.8365, 18.1535)
Up to now we have made 4 tests, and thus we have calculated 4 different pvalues for those tests.
We will now check the multiple comparisons by adjusting for
# Store pvalues in a vector
pValues<-c(t.test(len[supp=="OJ"], len[supp=="VC"], paired = FALSE, var.equal = FALSE)$p.value,
t.test(len[dose==2], len[dose==1], paired = FALSE, var.equal = TRUE)$p.value,
t.test(len[dose==1], len[dose==0.5], paired = FALSE, var.equal = TRUE)$p.value,
t.test(len[dose==2], len[dose==0.5], paired = FALSE, var.equal = TRUE)$p.value)
# Controls FWER
sum(p.adjust(pValues, method = "bonferroni") < 0.05)
## [1] 3
round(p.adjust(pValues, method = "bonferroni"),3)
## [1] 0.243 0.000 0.000 0.000
# Controls FDR
sum(p.adjust(pValues, method = "BH") < 0.05)
## [1] 3
round(p.adjust(pValues, method = "BH"),3)
## [1] 0.061 0.000 0.000 0.000
We can see that only the first p-value is considered big, but that was logical as even in the 5% significance level it was not small enough to reject the null hypothesis, so in the plots below, it is the p-value of the first test we made that shows up in the top right corner of the plots, and the other three falling one on another at the bottom left corner.
par(mfrow = c(1, 2))
plot(pValues, p.adjust(pValues, method = "bonferroni"), pch = 19)
plot(pValues, p.adjust(pValues, method = "BH"), pch = 19)
The results were very logical and quite predictable I would say, as it was shown from the box plots at the beginning.