We will analyze the ToothGrowth data in the R datasets package. As part of the analysis, we will:
- Perform basic exploratory analyses
- Summarize the data
- Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose
library(datasets)
data("ToothGrowth")
# Number of variables, column names and number of observations
varlen <- ncol(ToothGrowth)
colnames <- colnames(ToothGrowth)
obsv <- nrow(ToothGrowth)
var1 <- colnames(ToothGrowth)[1]
var2 <- colnames(ToothGrowth)[2]
var3 <- colnames(ToothGrowth)[3]
# Distinct supplement types and dose levels
unisupp <- unique(ToothGrowth$supp)
unidose <- unique(ToothGrowth$dose)
The dataset records the effect of vitamin C on tooth growth in 60 guinea pigs.
There are 3 variables (len, supp, dose) with 60 observations.
The first variable, len, is the length of the odontoblasts (the cells responsible for tooth growth) and is numeric.
The second variable, supp, is the supplement delivery method and is a factor with two levels: VC (ascorbic acid) and OJ (orange juice).
The third variable, dose, is the dosage in mg/day and is numeric with 3 levels: 0.5, 1, 2.
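The description above can be checked directly against the data. The following is a minimal sketch; the name `design` is introduced here for illustration only and is not part of the original analysis.

```r
library(datasets)
data("ToothGrowth")

# Column types: len and dose are numeric, supp is a factor
str(ToothGrowth)

# Cross-tabulate delivery method against dose to count observations per cell
# (`design` is an illustrative name, not used elsewhere in the report)
design <- table(ToothGrowth$supp, ToothGrowth$dose)
design
```

If the dataset is balanced as described, every cell of the table should hold the same number of observations.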
The dataset can be summarized as follows:
knitr::kable(summary(ToothGrowth))
| len | supp | dose |
|---|---|---|
| Min. : 4.20 | OJ:30 | Min. :0.500 |
| 1st Qu.:13.07 | VC:30 | 1st Qu.:0.500 |
| Median :19.25 | NA | Median :1.000 |
| Mean :18.81 | NA | Mean :1.167 |
| 3rd Qu.:25.27 | NA | 3rd Qu.:2.000 |
| Max. :33.90 | NA | Max. :2.000 |
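The overall summary does not show how tooth length varies between groups. A per-group summary could be sketched with base R's `aggregate()` as below; the name `grpstats` is an assumption made for this example and does not appear in the original analysis.

```r
library(datasets)
data("ToothGrowth")

# Mean and standard deviation of tooth length for each
# combination of supplement type and dose
# (`grpstats` is an illustrative name)
grpstats <- aggregate(len ~ supp + dose, data = ToothGrowth,
                      FUN = function(x) c(mean = mean(x), sd = sd(x)))
grpstats
```

The result has one row per supplement/dose combination, which makes it easier to see where the groups differ before running any tests.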
We will explore the relationship between dosage and tooth length for each supplement type.
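library(ggplot2)
# Boxplots of tooth length at each dose, one panel per supplement type
ggplot(ToothGrowth, aes(dose, len)) + geom_point() + geom_boxplot(aes(fill = factor(dose))) +
  facet_wrap(~supp) + labs(title = "Tooth growth by Dosage", x = "Dosage Levels", y = "Tooth length")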
From the plot “Tooth growth by Dosage”, we see that tooth length increases with the dosage level for both supplement types.
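The visual impression of a positive relationship can be quantified with a correlation coefficient. The sketch below computes the Pearson correlation between dose and tooth length separately for each supplement type; `dosecor` is an illustrative name, and because dose takes only three levels the coefficient should be read as a rough indication rather than a formal test.

```r
library(datasets)
data("ToothGrowth")

# Pearson correlation between dose and tooth length, per supplement type
# (`dosecor` is an illustrative name, not part of the original analysis)
dosecor <- sapply(split(ToothGrowth, ToothGrowth$supp),
                  function(d) cor(d$dose, d$len))
dosecor
```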
In the next plot, we will explore the relationship between the supplement delivery type and tooth length.
library(ggplot2)
# Boxplots of tooth length for each supplement delivery type
ggplot(ToothGrowth, aes(supp, len)) + geom_point() + geom_boxplot(aes(fill = factor(supp))) +
  labs(title = "Tooth growth by Supplement Delivery", x = "Supplement Delivery Type", y = "Tooth length")
From the plot “Tooth growth by Supplement Delivery”, the OJ delivery type appears to be associated with greater tooth length than VC. The plot alone does not establish that OJ is more effective, so the difference is tested formally with a t-test.
The VC supplement occupies the first 30 records and OJ the last 30, so we separate the two groups. The groups are not paired, and we do not assume equal variances.
# Split tooth length by supplement type rather than by row position
vcgrp <- ToothGrowth$len[ToothGrowth$supp == "VC"]
ojgrp <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
# Welch two-sample t-test, run once and reused
supptest <- t.test(vcgrp, ojgrp, paired = FALSE, var.equal = FALSE)
conflevel <- supptest$conf.int
pval <- supptest$p.value
supptest
##
## Welch Two Sample t-test
##
## data: vcgrp and ojgrp
## t = -1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7.5710156 0.1710156
## sample estimates:
## mean of x mean of y
## 16.96333 20.66333
From the above test, the p-value of 0.0606345 is above 0.05 and the 95% confidence interval (-7.5710156, 0.1710156) contains zero, so we fail to reject the null hypothesis that the mean tooth length is the same for VC and OJ.
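To make the reported numbers less of a black box, the Welch test can be reproduced by hand. The sketch below recomputes the t statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value and the 95% confidence interval from group means and variances. All variable names (`vc`, `oj`, `se2_vc`, `se2_oj`, `tstat`, `dfw`, `pvalw`, `ciw`) are illustrative and not part of the original analysis.

```r
library(datasets)
data("ToothGrowth")

vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

# Squared standard error of each group mean
se2_vc <- var(vc) / length(vc)
se2_oj <- var(oj) / length(oj)

# Welch t statistic: difference in means over the combined standard error
tstat <- (mean(vc) - mean(oj)) / sqrt(se2_vc + se2_oj)

# Welch-Satterthwaite approximation to the degrees of freedom
dfw <- (se2_vc + se2_oj)^2 /
  (se2_vc^2 / (length(vc) - 1) + se2_oj^2 / (length(oj) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
pvalw <- 2 * pt(-abs(tstat), dfw)
ciw <- (mean(vc) - mean(oj)) + c(-1, 1) * qt(0.975, dfw) * sqrt(se2_vc + se2_oj)

c(t = tstat, df = dfw, p.value = pvalw)
ciw
```

These values should agree with the `t.test()` output shown above, which is a useful sanity check that the test being run is the unequal-variance (Welch) version.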
We will conduct Welch t-tests between the three dosage levels (0.5, 1, 2), comparing two dosages in each test.
dosecomp1 <- ToothGrowth$len[ToothGrowth$dose == 0.5]
dosecomp2 <- ToothGrowth$len[ToothGrowth$dose == 1.0]
dosecomp3 <- ToothGrowth$len[ToothGrowth$dose == 2.0]
Welch t-test for dosecomp1 and dosecomp2 (dosages 0.5 and 1.0)
# Run the test once and reuse the result
dosetest1 <- t.test(dosecomp1, dosecomp2, paired = FALSE, var.equal = FALSE)
conflevel1 <- dosetest1$conf.int
pval1 <- dosetest1$p.value
dosetest1
##
## Welch Two Sample t-test
##
## data: dosecomp1 and dosecomp2
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean of x mean of y
## 10.605 19.735
Welch t-test for dosecomp2 and dosecomp3 (dosages 1.0 and 2.0)
# Run the test once and reuse the result
dosetest2 <- t.test(dosecomp2, dosecomp3, paired = FALSE, var.equal = FALSE)
conflevel2 <- dosetest2$conf.int
pval2 <- dosetest2$p.value
dosetest2
##
## Welch Two Sample t-test
##
## data: dosecomp2 and dosecomp3
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean of x mean of y
## 19.735 26.100
Welch t-test for dosecomp3 and dosecomp1 (dosages 2.0 and 0.5)
# Run the test once and reuse the result
dosetest3 <- t.test(dosecomp3, dosecomp1, paired = FALSE, var.equal = FALSE)
conflevel3 <- dosetest3$conf.int
pval3 <- dosetest3$p.value
dosetest3
##
## Welch Two Sample t-test
##
## data: dosecomp3 and dosecomp1
## t = 11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 12.83383 18.15617
## sample estimates:
## mean of x mean of y
## 26.100 10.605
From the t-tests performed on the three pairs of dosage levels:
- for dosages 0.5 and 1.0, the 95% confidence interval is (-11.9837813, -6.2762187) and the p-value is \(1.2683007\times 10^{-7}\)
- for dosages 1.0 and 2.0, the 95% confidence interval is (-8.9964805, -3.7335195) and the p-value is \(1.9064295\times 10^{-5}\)
- for dosages 2.0 and 0.5, the 95% confidence interval is (12.8338335, 18.1561665) and the p-value is \(4.397525\times 10^{-14}\)
None of the confidence intervals contains zero and every p-value is far below 0.05, so the null hypothesis of equal mean tooth length can be rejected for all three comparisons.
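Because three tests are run on the same data, it can be worth checking that the conclusion survives a multiple-comparison correction. The sketch below repeats the three Welch tests in a loop and applies a Bonferroni adjustment with `p.adjust()`. The correction is an addition for illustration and not part of the original analysis, and the names `doses`, `dosepairs`, `rawp` and `adjp` are assumptions made for this example.

```r
library(datasets)
data("ToothGrowth")

# All unordered pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
doses <- sort(unique(ToothGrowth$dose))
dosepairs <- combn(doses, 2)

# Welch t-test p-value for each pair of dose levels
rawp <- apply(dosepairs, 2, function(p) {
  t.test(ToothGrowth$len[ToothGrowth$dose == p[1]],
         ToothGrowth$len[ToothGrowth$dose == p[2]],
         paired = FALSE, var.equal = FALSE)$p.value
})
names(rawp) <- apply(dosepairs, 2, paste, collapse = " vs ")

# Bonferroni adjustment: each p-value is multiplied by the number of tests
adjp <- p.adjust(rawp, method = "bonferroni")
adjp
```

Bonferroni is the most conservative of the common adjustments, so comparisons that remain below 0.05 after it can be reported with some confidence.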
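Each guinea pig is assumed to have received a single dose level through a single supplement delivery method, so the groups being compared are unpaired.
The guinea pig populations are assumed to be independent and identically distributed for the tests.
The 60 guinea pigs are assumed to be a random sample from the population, with 10 animals for each combination of dose and delivery method.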
Tooth length increases with dosage level: all three pairwise dose comparisons show a significant difference in mean tooth length.
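Supplement delivery type OJ had a higher mean tooth length than VC (20.66 vs 16.96), but the difference is not statistically significant at the 5% level (p-value 0.0606), so we cannot conclude from this test that OJ is the more effective delivery type.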