This report analyzes the ToothGrowth dataset.
The code below loads the ToothGrowth data.
library(datasets)
data(ToothGrowth)
The code below draws boxplots of tooth length by dose, faceted by supplement, using ggplot2.
library(ggplot2)
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose)))+geom_boxplot()+facet_grid(.~supp)+ggtitle("Analyzing ToothGrowth data")
The boxplots show that tooth length tends to increase with dose for both supplements.
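To put numbers on what the boxplots show, the mean tooth length per group can be tabulated. The following is a minimal sketch using aggregate from base R; the name group_means is introduced here for illustration only.

```r
library(datasets)
data(ToothGrowth)

# Mean tooth length for each supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

Each row of the result corresponds to one panel-and-box combination in the plot.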
The code below summarizes the data set. It contains 2 supplements (OJ and VC) and 3 doses (0.5, 1, and 2).
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.2   OJ:30   Min.   :0.50
##  1st Qu.:13.1   VC:30   1st Qu.:0.50
##  Median :19.2           Median :1.00
##  Mean   :18.8           Mean   :1.17
##  3rd Qu.:25.3           3rd Qu.:2.00
##  Max.   :33.9           Max.   :2.00
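The summary reports dose as a numeric column, so the three dose levels are only implied by the quartiles. As a small sketch, a cross-tabulation makes the design explicit; cell_counts is an illustrative name, not part of the original analysis.

```r
library(datasets)
data(ToothGrowth)

# Count observations for each supplement/dose combination
cell_counts <- with(ToothGrowth, table(supp, dose))
print(cell_counts)
```

A balanced table here means each supplement/dose group contributes the same number of observations to the tests that follow.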
The 95% confidence interval for the difference in mean tooth length is computed as yBar - xBar + c(-1, 1) * t * sqrt(xVar/30 + yVar/30), where 30 is the number of observations in each group, xBar and xVar are the mean and variance of the first 30 rows (supplement VC), yBar and yVar are the mean and variance of the last 30 rows (supplement OJ), and t is the 0.975 quantile of the t distribution with q degrees of freedom, q being given by the Welch-Satterthwaite approximation.
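# Rows 1-30 are supplement VC; rows 31-60 are supplement OJ
xBar <- mean(ToothGrowth$len[1:30])
yBar <- mean(ToothGrowth$len[31:60])
xVar <- var(ToothGrowth$len[1:30])
yVar <- var(ToothGrowth$len[31:60])
# Welch-Satterthwaite degrees of freedom
q <- (((xVar + yVar) / 30)^2) / ((((xVar / 30)^2) + ((yVar / 30)^2)) / 29)
# 0.975 quantile of the t distribution, for a two-sided 95% interval
t <- qt(0.975, q)
yBar - xBar + c(-1, 1) * t * sqrt(xVar / 30 + yVar / 30)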
## [1] -0.171 7.571
The same interval is returned by t.test, which performs Welch's two-sample t-test:
t.test(len~supp, data=ToothGrowth, paired=FALSE)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.915, df = 55.31, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.171 7.571
## sample estimates:
## mean in group OJ mean in group VC
## 20.66 16.96
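The agreement between the hand-computed interval and the t.test output can also be checked programmatically. The sketch below wraps the manual calculation in a hypothetical helper, welch_ci (not part of base R or of the original analysis), and selects the groups by supplement label instead of by row position, on the assumption that this is less fragile if the row order ever changes.

```r
library(datasets)
data(ToothGrowth)

# Hypothetical helper: Welch confidence interval for mean(y) - mean(x)
welch_ci <- function(x, y, level = 0.95) {
  nx <- length(x)
  ny <- length(y)
  vx <- var(x) / nx  # squared standard error of mean(x)
  vy <- var(y) / ny  # squared standard error of mean(y)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))
  tq <- qt(1 - (1 - level) / 2, df)
  mean(y) - mean(x) + c(-1, 1) * tq * sqrt(vx + vy)
}

# Select the two groups by label rather than by row index
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

manual  <- welch_ci(vc, oj)
builtin <- as.numeric(t.test(len ~ supp, data = ToothGrowth)$conf.int)

print(manual)
print(builtin)
print(all.equal(manual, builtin))
```

With the formula interface, t.test reports the difference as the first factor level minus the second (OJ minus VC), which is why the helper is called as welch_ci(vc, oj).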
The code below splits the data set into 3 subsets, one per dose (0.5, 1.0, and 2.0), and performs the hypothesis test on each subset.
a<-subset(ToothGrowth, dose==0.5)
b<-subset(ToothGrowth, dose==1.0)
c<-subset(ToothGrowth, dose==2.0)
t.test(len~supp, data=a, paired=FALSE)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.17, df = 14.97, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719 8.781
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
t.test(len~supp, data=b, paired=FALSE)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.033, df = 15.36, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802 9.058
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
t.test(len~supp, data=c, paired=FALSE)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.798 3.638
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
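The three per-dose tests can be collected into a single table for easier comparison. This is one possible sketch, not the original author's code; per_dose and its column names are illustrative.

```r
library(datasets)
data(ToothGrowth)

# Run Welch's t-test of len by supp within each dose level
doses <- sort(unique(ToothGrowth$dose))
per_dose <- do.call(rbind, lapply(doses, function(d) {
  res <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
  data.frame(
    dose     = d,
    ci_lower = res$conf.int[1],
    ci_upper = res$conf.int[2],
    p_value  = res$p.value
  )
}))
print(per_dose)
```

Looping over the dose levels avoids repeating the subset and t.test calls, and avoids naming a variable c, which masks the name of the base function c().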
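The boxplots indicate that tooth length increases with dose. Across all doses, the 95% confidence interval for the difference in mean tooth length between OJ and VC is (-0.171, 7.571); because it contains 0 (p-value = 0.06063), the difference between supplements is not significant at the 5% level. Tested by dose, OJ gives significantly longer teeth than VC at doses 0.5 and 1.0, while there is no significant difference at dose 2.0. All hypothesis tests were performed as unpaired (paired=FALSE) Welch two-sample t-tests.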