author: Daria Alekseeva
In this report I present an analysis of the ToothGrowth dataset from the R datasets package.
The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of Vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods (orange juice, coded as OJ, or ascorbic acid, coded as VC).
ToothGrowth Format
A data frame with 60 observations on 3 variables.
[,1] len numeric Tooth length
[,2] supp factor Supplement type (VC or OJ).
[,3] dose numeric Dose in milligrams.
# load data
library(datasets)
data(ToothGrowth)
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
library(ggplot2)
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  geom_boxplot() +
  facet_grid(~supp) +
  ggtitle("Analyzing ToothGrowth data")
The plot shows that tooth length increases with dose for both supplement types.
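To back up the visual impression with numbers, the group means can be tabulated directly. This is a minimal sketch using base R's `aggregate`; the name `group_means` is introduced here for illustration and is not part of the original analysis.

```r
library(datasets)

# Mean tooth length for every supplement/dose combination
# (group_means is an illustrative name, not from the original report)
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

Each row of the result corresponds to one box in the plot, so the upward trend with dose can be read off the `len` column.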
t.test(len ~ supp, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
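The values printed above do not have to be copied by hand. `t.test` returns an object whose components can be read individually, which is convenient when quoting them in the text. A small sketch, where `supp_test` is a name chosen here for illustration:

```r
library(datasets)

# Store the Welch test result instead of only printing it
# (supp_test is an illustrative name, not from the original report)
supp_test <- t.test(len ~ supp, data = ToothGrowth)

supp_test$statistic  # t statistic
supp_test$parameter  # Welch-adjusted degrees of freedom
supp_test$p.value    # two-sided p-value
supp_test$conf.int   # 95% confidence interval for mean(OJ) - mean(VC)
supp_test$estimate   # the two group means
```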
Let’s create three subsets, one for each dose.
a<-subset(ToothGrowth, dose==0.5)
b<-subset(ToothGrowth, dose==1.0)
c<-subset(ToothGrowth, dose==2.0)
Now let’s run a hypothesis test on each of them.
t.test(len ~ supp, data=a, paired = FALSE)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
t.test(len ~ supp, data=b, paired = FALSE)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
t.test(len ~ supp, data=c, paired = FALSE)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
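The three per-dose tests above can also be collected into one table, which makes the comparison across doses easier to read. The following is a sketch under the same settings (Welch test, unpaired); the names `doses` and `by_dose` and the column names are assumptions made for this example.

```r
library(datasets)

doses <- sort(unique(ToothGrowth$dose))

# One Welch t-test of len by supp per dose, gathered into a data frame
# (by_dose and its column names are illustrative)
by_dose <- do.call(rbind, lapply(doses, function(d) {
  res <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
  data.frame(
    dose     = d,
    mean_OJ  = unname(res$estimate[1]),
    mean_VC  = unname(res$estimate[2]),
    ci_lower = res$conf.int[1],
    ci_upper = res$conf.int[2],
    p_value  = res$p.value
  )
}))
print(by_dose)
```

Each row should reproduce the means, confidence interval, and p-value of the corresponding test output shown above.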
Null hypothesis #1: there is no difference in mean tooth length between the OJ and VC groups.
Null hypothesis #2: mean tooth length does not change with dose.
For the test on the whole dataset, the 95% confidence interval for the difference in means (OJ minus VC) runs from -0.17 to 7.57. This assumes that tooth length is approximately normally distributed in each group and that the samples are independent.
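To make the origin of this interval explicit, the Welch statistic, its degrees of freedom, and the interval can be recomputed by hand from the group means and variances. This is a sketch of the standard Welch formulas, not code from the original report; all variable names are chosen here for illustration.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

diff_means <- mean(oj) - mean(vc)
se_diff    <- sqrt(se2_oj + se2_vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- diff_means / se_diff
welch_df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# 95% confidence interval for the difference in means
ci <- diff_means + c(-1, 1) * qt(0.975, welch_df) * se_diff

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), welch_df)

print(c(t = t_stat, df = welch_df, lower = ci[1], upper = ci[2], p = p_value))
```

The printed values should agree with the `t.test(len ~ supp, data = ToothGrowth)` output earlier in the report.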
The t-value is 1.92, the p-value is 0.06, and the confidence interval contains zero, so we fail to reject null hypothesis #1 at the 5% level. In other words, with all doses pooled, the data do not show a significant difference between the VC and OJ treatments.
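Looking at each dose separately, at doses 0.5 and 1.0 the difference in means between the OJ and VC groups is significant (p = 0.006 and p = 0.001, and neither confidence interval contains zero), so at these doses we reject the hypothesis that the two supplements give the same tooth length. At dose 2.0 this does not happen: the mean difference is very small (26.06 vs 26.14, p = 0.96), so we fail to reject it. Note that these tests compare the supplements within each dose. They do not compare doses with each other, so they do not test null hypothesis #2 directly.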
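Null hypothesis #2 is about the dose itself, so one way to examine it is to compare tooth length between pairs of dose levels, pooling both supplements. The sketch below is an addition to the original analysis, not part of it: it runs a Welch t-test for each pair of doses, applies no multiple-comparison correction, and uses the illustrative names `dose_pairs` and `dose_tests`.

```r
library(datasets)

# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

# Welch t-test of len between the two doses in each pair
# (dose_tests and its column names are illustrative)
dose_tests <- do.call(rbind, lapply(dose_pairs, function(pair) {
  pair_data <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  res <- t.test(len ~ dose, data = pair_data)
  data.frame(
    dose_low  = pair[1],
    dose_high = pair[2],
    ci_lower  = res$conf.int[1],  # interval for mean(low) - mean(high)
    ci_upper  = res$conf.int[2],
    p_value   = res$p.value
  )
}))
print(dose_tests)
```

A confidence interval that lies entirely below zero for a pair indicates that the higher dose is associated with longer teeth, which would be evidence against null hypothesis #2.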