This report analyzes the ToothGrowth data from the R datasets package.
library(knitr)
# Setting the global options.
opts_chunk$set(fig.width=6, fig.height=4, warning=FALSE)
The dataset has the following columns:
[1] len numeric Tooth length
[2] supp factor Supplement type
[3] dose numeric Dose in milligrams/day
library(datasets)
data(ToothGrowth)
# Print the first few rows of the data frame
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
# Check for outliers and compare tooth length across doses.
# A simple way to show the relationship between several variables
# is a set of side-by-side box plots.
# Group the data by two crossed factors (supplement and dose) and
# use colour for ease of interpretation.
boxplot(len ~ supp * dose, data = ToothGrowth,
        notch = FALSE,
        col = c("blue", "red"),
        main = "Tooth Growth Dataset",
        xlab = "Supplement and Dose",
        ylab = "Length")
suppressMessages(library(dplyr))
# Mean tooth length for each supplement/dose combination
ToothGrowth %>% group_by(supp, dose) %>% summarise(len = mean(len))
## Source: local data frame [6 x 3]
## Groups: supp
##
## supp dose len
## 1 OJ 0.5 13.23
## 2 OJ 1.0 22.70
## 3 OJ 2.0 26.06
## 4 VC 0.5 7.98
## 5 VC 1.0 16.77
## 6 VC 2.0 26.14
The boxplot shows a single point beyond the whiskers (22.5, in the VC group at dose 1.0) and no other outliers. Mean tooth length is higher for the OJ supplement than for VC at doses 0.5 and 1.0, while at dose 2.0 the two are nearly equal (26.06 vs. 26.14). For both supplements, a higher dose goes with longer teeth.
The boxplot and the table of group means above show the groups we are comparing.
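The outlier reading of the boxplot can be checked numerically. The sketch below assumes the default 1.5 * IQR whisker rule used by boxplot() and lists the points flagged in each supplement/dose group. The names groups and outliers_by_group are introduced here for illustration only.

```r
library(datasets)

# Split tooth length into the six supplement/dose groups shown in the boxplot.
groups <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))

# boxplot.stats() returns, in $out, the points lying beyond the whiskers
# (more than 1.5 * IQR from the hinges by default).
outliers_by_group <- lapply(groups, function(x) boxplot.stats(x)$out)
outliers_by_group
```

A group with no flagged points shows up as numeric(0).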
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
unique(ToothGrowth$supp)
## [1] VC OJ
## Levels: OJ VC
unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0
Next, we compute 95% t confidence intervals for the difference in mean tooth length between each pair of doses. For each pair we show two intervals: one assuming equal variances in the two groups and one not assuming it (Welch).
# Keep only the two doses being compared (0.5 vs. 1.0).
TGDose0p5 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
rbind(
  t.test(len ~ dose, paired = FALSE, var.equal = TRUE, data = TGDose0p5)$conf.int,
  t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = TGDose0p5)$conf.int
)
## [,1] [,2]
## [1,] -11.98375 -6.276252
## [2,] -11.98378 -6.276219
t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = TGDose0p5)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
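To make the difference between the two intervals concrete, the sketch below rebuilds both by hand for the 0.5 vs. 1.0 comparison, using the standard pooled-variance and Welch (Satterthwaite degrees of freedom) formulas. The variable names are illustrative and not part of the original analysis.

```r
library(datasets)

x <- ToothGrowth$len[ToothGrowth$dose == 0.5]
y <- ToothGrowth$len[ToothGrowth$dose == 1.0]
nx <- length(x)
ny <- length(y)
diff_means <- mean(x) - mean(y)

# Equal-variance interval: pooled standard deviation, nx + ny - 2 df.
sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
pooled_ci <- diff_means + c(-1, 1) * qt(0.975, nx + ny - 2) * sp * sqrt(1 / nx + 1 / ny)

# Welch interval: separate variances, Satterthwaite approximation for df.
vx <- var(x) / nx
vy <- var(y) / ny
se <- sqrt(vx + vy)
welch_df <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))
welch_ci <- diff_means + c(-1, 1) * qt(0.975, welch_df) * se

rbind(pooled_ci, welch_ci)
```

With equal group sizes the two standard errors coincide, so the intervals differ only through the degrees of freedom, which is why they are so close here.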
# Keep only the two doses being compared (1.0 vs. 2.0).
TGDose1p0 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
rbind(
  t.test(len ~ dose, paired = FALSE, var.equal = TRUE, data = TGDose1p0)$conf.int,
  t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = TGDose1p0)$conf.int
)
## [,1] [,2]
## [1,] -8.994387 -3.735613
## [2,] -8.996481 -3.733519
t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = TGDose1p0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
# Keep only the two doses being compared (0.5 vs. 2.0).
TGDose2p0 <- subset(ToothGrowth, dose %in% c(2.0, 0.5))
rbind(
  t.test(len ~ dose, paired = FALSE, var.equal = TRUE, data = TGDose2p0)$conf.int,
  t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = TGDose2p0)$conf.int
)
## [,1] [,2]
## [1,] -18.15352 -12.83648
## [2,] -18.15617 -12.83383
t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = TGDose2p0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
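The three dose comparisons above repeat the same steps, so they can also be written as one loop over all pairs of doses. This is only a compact sketch of the same Welch intervals. The names dose_pairs and dose_cis are made up for this example.

```r
library(datasets)

# All unordered pairs of the three dose levels: (0.5, 1), (0.5, 2), (1, 2).
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

# Welch 95% confidence interval for each pair, one row per comparison.
dose_cis <- t(sapply(dose_pairs, function(p) {
  pair_data <- ToothGrowth[ToothGrowth$dose %in% p, ]
  t.test(len ~ dose, data = pair_data, var.equal = FALSE)$conf.int
}))
rownames(dose_cis) <- sapply(dose_pairs, paste, collapse = " vs ")
colnames(dose_cis) <- c("lower", "upper")
dose_cis
```

Each row is the interval for (mean at the lower dose) minus (mean at the higher dose), matching the sign convention of the outputs above.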
Next, we compare the two supplement types, pooling all doses.
rbind(
  t.test(len ~ supp, paired = FALSE, var.equal = TRUE, data = ToothGrowth)$conf.int,
  t.test(len ~ supp, paired = FALSE, var.equal = FALSE, data = ToothGrowth)$conf.int
)
## [,1] [,2]
## [1,] -0.1670064 7.567006
## [2,] -0.1710156 7.571016
t.test(len ~ supp, paired = FALSE, var.equal = FALSE, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
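If we repeatedly performed the experiment on independent samples, about 95% of the intervals obtained this way would contain the true difference in means being estimated. Refer to the confidence interval results above: the intervals for all three dose comparisons exclude zero, whereas the interval for the supplement comparison (-0.17 to 7.57) includes zero, in line with its p-value of 0.06063.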
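The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below uses made-up normal populations rather than the ToothGrowth data: the true difference, standard deviation, group size, and number of repetitions are all assumptions chosen for illustration.

```r
set.seed(1)

true_diff <- 2    # hypothetical true difference in means
n_per_group <- 20 # hypothetical group size
n_reps <- 2000    # number of simulated experiments

# For each simulated experiment, record whether the Welch 95% interval
# contains the true difference.
covered <- replicate(n_reps, {
  x <- rnorm(n_per_group, mean = 10 + true_diff, sd = 4)
  y <- rnorm(n_per_group, mean = 10, sd = 4)
  ci <- t.test(x, y, var.equal = FALSE)$conf.int
  ci[1] <= true_diff && true_diff <= ci[2]
})

# Proportion of intervals that captured the true difference.
coverage <- mean(covered)
coverage
```

The proportion should land close to 0.95, up to simulation noise.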