This report analyzes the ToothGrowth dataset. We first summarize the basic properties of the data and carry out an exploratory analysis, and then perform statistical inference on the data with the t.test() function.
First, we load the data and print a summary to see what it looks like:
library(plyr)
library(datasets)
data("ToothGrowth")
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
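As the summary shows, the dataset has three variables. The first, len, is the length of odontoblasts (the cells responsible for tooth growth) measured in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, or 2 mg/day), recorded in the dose variable, by one of two delivery methods (orange juice, OJ, or ascorbic acid, VC), recorded in the supp variable. This gives 10 animals for each combination of dose and delivery method. We can look at the histogram of the len variable: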
hist(ToothGrowth$len,col = "green",xlab = "len")
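The design described above (10 animals for each combination of delivery method and dose) can be checked directly with a cross-tabulation. The following is a minimal sketch; the object name design is introduced here for illustration only and does not appear in the original analysis.

```r
library(datasets)
data("ToothGrowth")

# Count the observations in each supp x dose cell
# (the name `design` is an illustrative assumption)
design <- with(ToothGrowth, table(supp, dose))
design
```

Every cell of the table should contain the same count if the experiment is balanced.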
Next, we calculate the mean of len by supp and by dose. To do that, we use the ddply function from plyr and then plot the results. The code is in the appendix, Code one.
As figure a shows, the average tooth length of the animals that received OJ is higher than that of the animals that received VC (20.66 versus 16.96):
supp.growth
## supp growth
## 1 OJ 20.66333
## 2 VC 16.96333
Figure b shows that the average tooth length increases with the dose of vitamin C:
dose.growth
## dose growth
## 1 0.5 10.605
## 2 1.0 19.735
## 3 2.0 26.100
Moreover, grouping the len values by both dose and supp gives six groups; figure c shows the average of each group:
supp.dose.growth
## supp dose growth
## 1 OJ 0.5 13.23
## 2 OJ 1.0 22.70
## 3 OJ 2.0 26.06
## 4 VC 0.5 7.98
## 5 VC 1.0 16.77
## 6 VC 2.0 26.14
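The same six group means can also be obtained without plyr. The sketch below uses aggregate() from base R as an alternative to the ddply call in Code one; it is not the method used in this report, and the name group.means is an assumption made for the example. Note that the resulting column is called len rather than growth, and the row order may differ from the table above.

```r
library(datasets)
data("ToothGrowth")

# Mean of len for every combination of supp and dose, using base R only
# (alternative to ddply; `group.means` is an illustrative name)
group.means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group.means
```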
Running t.test() on each of the six groups from the previous paragraph (figure c) gives the 95 percent confidence interval for the mean of each group. The rows below follow the order OJ at doses 0.5, 1, and 2, then VC at doses 0.5, 1, and 2. The code is in the appendix, Code two.
## [,1] [,2]
## [1,] 10.039717 16.420283
## [2,] 19.902273 25.497727
## [3,] 24.160686 27.959314
## [4,] 6.015176 9.944824
## [5,] 14.970657 18.569343
## [6,] 22.707910 29.572090
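Each interval above is the one-sample t interval, mean ± t(0.975, n - 1) × s / sqrt(n), where s is the sample standard deviation and n the group size. As a hedged illustration, the sketch below recomputes the first row (OJ at dose 0.5) by hand; the variable names x, n, and ci are assumptions made for the example.

```r
library(datasets)
data("ToothGrowth")

# Tooth lengths for the OJ, dose 0.5 group
x <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
n <- length(x)

# 95% t interval: mean +/- t quantile * standard error
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
ci

# Should agree with the interval reported by t.test()
all.equal(ci, as.numeric(t.test(x)$conf.int))
```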
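We can compare delivery methods and dose levels by running t.test() on the difference in means between two groups. We do this for six cases. The tests are run with paired = TRUE, which pairs the observations of the two groups by row order; because every measurement comes from a different animal, an unpaired test would be the more conventional choice and would give different p-values. The code is in the appendix, Code three.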
## Group Pvalues
## 1 OJ vs VC 2.549842e-03
## 2 0.5 vs 1 1.225437e-06
## 3 1 vs 2 1.934186e-04
## 4 0.5 OJ vs 1 OJ 2.435140e-03
## 5 1 OJ vs 2 OJ 8.383912e-02
## 6 1 VC vs 2 VC 4.647951e-04
The null hypothesis in each case is that the mean tooth length does not differ between the two delivery methods or dose levels being compared.
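To make the decision rule explicit, the sketch below flags which of the reported p-values lead to rejecting the null hypothesis. The p-values are copied from the table above. The significance level of 0.05 is an assumption: the report does not state one, and 0.05 is chosen here only to match the 95 percent confidence intervals. The names pvalues, alpha, and RejectNull are illustrative.

```r
# p-values copied from the table reported above
pvalues <- data.frame(
  Group = c("OJ vs VC", "0.5 vs 1", "1 vs 2",
            "0.5 OJ vs 1 OJ", "1 OJ vs 2 OJ", "1 VC vs 2 VC"),
  Pvalue = c(2.549842e-03, 1.225437e-06, 1.934186e-04,
             2.435140e-03, 8.383912e-02, 4.647951e-04)
)

# Assumed significance level (not stated in the report)
alpha <- 0.05

# TRUE where the null hypothesis of equal means is rejected
pvalues$RejectNull <- pvalues$Pvalue < alpha
pvalues
```

Under this assumed threshold, the only comparison that does not reject the null hypothesis is 1 OJ vs 2 OJ.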
From these results we draw two conclusions. First, tooth growth is greater in the animals that received a higher dose than in those that received a lower dose. Second, the animals that received OJ showed more tooth growth than those that received VC.
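The p-value for VC-1 (dose 1 with delivery method VC) versus VC-2 is far lower than the p-value for OJ-1 versus OJ-2 (4.6e-04 versus 8.4e-02), so the evidence for an increase in growth from dose 1 to dose 2 is stronger for VC than for OJ. This agrees with the group means: the mean rises by 9.37 for VC (16.77 to 26.14) but only by 3.36 for OJ (22.70 to 26.06). In the same way, by modifying Code three, we can compare other pairs of groups through their p-values.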
# Code one: group means of len and the three plots (figures a, b, and c)
supp.growth <- ddply(ToothGrowth, .(supp), summarise, growth = mean(len))
dose.growth <- ddply(ToothGrowth, .(dose), summarise, growth = mean(len))
supp.dose.growth <- ddply(ToothGrowth, .(supp, dose), summarise, growth = mean(len))
# Stack the three plots in a single column
par(mfcol = c(3, 1))
plot(x = supp.growth$supp, y = supp.growth$growth, xlab = "Supp", ylab = "Len", main = "a")
plot(x = dose.growth$dose, y = dose.growth$growth, col = "blue", xlab = "Dose", ylab = "Len", main = "b", pch = 16)
# The labels follow the row order of supp.dose.growth (OJ first, then VC)
plot(x = factor(c("OJ-.5", "OJ-1", "OJ-2", "VC-.5", "VC-1", "VC-2")), y = supp.dose.growth$growth, main = "c", xlab = "Supp-Dose")
# Code two: 95% confidence interval for the mean of len in each supp/dose group
OJ..5 <- ToothGrowth[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5, ]
OJ.1 <- ToothGrowth[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 1, ]
OJ.2 <- ToothGrowth[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2, ]
VC..5 <- ToothGrowth[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5, ]
VC.1 <- ToothGrowth[ToothGrowth$supp == "VC" & ToothGrowth$dose == 1, ]
VC.2 <- ToothGrowth[ToothGrowth$supp == "VC" & ToothGrowth$dose == 2, ]
# One row per group: lower and upper bound of the interval
rbind(
  t.test(OJ..5$len)$conf.int,
  t.test(OJ.1$len)$conf.int,
  t.test(OJ.2$len)$conf.int,
  t.test(VC..5$len)$conf.int,
  t.test(VC.1$len)$conf.int,
  t.test(VC.2$len)$conf.int
)
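# Code three: p-values from t-tests on the difference in mean len between two groups
# paired = TRUE pairs the observations of the two groups by row order
Pvalues <- data.frame(
  Group = c("OJ vs VC", "0.5 vs 1", "1 vs 2", "0.5 OJ vs 1 OJ", "1 OJ vs 2 OJ", "1 VC vs 2 VC"),
  Pvalues = c(
    # Two-vector form, because newer R versions reject paired = TRUE in the formula interface
    t.test(ToothGrowth$len[ToothGrowth$supp == "OJ"],
           ToothGrowth$len[ToothGrowth$supp == "VC"], paired = TRUE)$p.value,
    t.test(ToothGrowth$len[ToothGrowth$dose == 0.5],
           ToothGrowth$len[ToothGrowth$dose == 1], paired = TRUE)$p.value,
    t.test(ToothGrowth$len[ToothGrowth$dose == 1],
           ToothGrowth$len[ToothGrowth$dose == 2], paired = TRUE)$p.value,
    t.test(ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == "OJ"],
           ToothGrowth$len[ToothGrowth$dose == 1 & ToothGrowth$supp == "OJ"], paired = TRUE)$p.value,
    t.test(ToothGrowth$len[ToothGrowth$dose == 1 & ToothGrowth$supp == "OJ"],
           ToothGrowth$len[ToothGrowth$dose == 2 & ToothGrowth$supp == "OJ"], paired = TRUE)$p.value,
    # Sixth comparison, added to match the "1 VC vs 2 VC" row of the reported table
    t.test(ToothGrowth$len[ToothGrowth$dose == 1 & ToothGrowth$supp == "VC"],
           ToothGrowth$len[ToothGrowth$dose == 2 & ToothGrowth$supp == "VC"], paired = TRUE)$p.value
  ))
Pvalues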