The goal of this document is to study the ToothGrowth dataset: perform a basic exploratory analysis, summarize the data, use confidence intervals or hypothesis tests to compare groups, and state some conclusions.
The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
Variables:
len: numeric, tooth length.
supp: factor, supplement type (VC or OJ).
dose: numeric, dose in milligrams/day.
library(datasets)
data(ToothGrowth)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
dim(ToothGrowth)
## [1] 60 3
Let us explore «len» with respect to the factors «supp» and «dose»:
par(mfrow=c(1,4))
boxplot(ToothGrowth$len ~ ToothGrowth$supp, ylab="len", xlab="Supplement type")
boxplot(ToothGrowth$len ~ ToothGrowth$dose, ylab="len", xlab="Dose of vitamin C (mg/day)")
# Histograms of tooth length per supplement: x-axis is len, y-axis is the count
hist(ToothGrowth$len[ToothGrowth$supp == "OJ"], xlab="len", xlim=c(0,40), ylab="Frequency", ylim=c(0,10), main="OJ")
hist(ToothGrowth$len[ToothGrowth$supp == "VC"], xlab="len", xlim=c(0,40), ylab="Frequency", ylim=c(0,10), main="VC")
par(mfrow=c(1,1))
boxplot(len ~ supp * dose, data=ToothGrowth, col=c("gold","darkgreen"), ylab="len", xlab="Supplement & dose", main="Tooth Growth")
This visual preliminary analysis shows the following:
1- There seems to be a positive relationship between dose and length.
2- Orange juice seems more effective than ascorbic acid at low (0.5) and medium (1.0) doses.
3- At the high dose (2.0), the median length is identical for OJ and VC.
4- Orange juice seems to have a diminishing effect: in terms of median length, the increase from 0.5 to 1.0 is greater than the increase from 1.0 to 2.0. This attenuation is not observed for ascorbic acid.
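Observations 3 and 4 can be checked numerically. The following is a minimal sketch using base R; the names `med` and `increments` are introduced here for illustration and are not part of the original analysis.

```r
library(datasets)

# Median tooth length for each supplement (rows) and dose (columns)
med <- tapply(ToothGrowth$len,
              list(ToothGrowth$supp, ToothGrowth$dose),
              median)
med

# Increase in median length between consecutive dose levels:
# column "1" is the step 0.5 -> 1.0, column "2" is the step 1.0 -> 2.0
increments <- med[, -1] - med[, -ncol(med)]
increments
```

A smaller value in the second column than in the first for the OJ row, with similar values in both columns for the VC row, would correspond to the attenuation described in point 4.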
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
by(ToothGrowth$len,INDICES = list(ToothGrowth$supp,ToothGrowth$dose),summary)
## : OJ
## : 0.5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.20 9.70 12.25 13.23 16.18 21.50
## --------------------------------------------------------
## : VC
## : 0.5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.20 5.95 7.15 7.98 10.90 11.50
## --------------------------------------------------------
## : OJ
## : 1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14.50 20.30 23.45 22.70 25.65 27.30
## --------------------------------------------------------
## : VC
## : 1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.60 15.27 16.50 16.77 17.30 22.50
## --------------------------------------------------------
## : OJ
## : 2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22.40 24.58 25.95 26.06 27.08 30.90
## --------------------------------------------------------
## : VC
## : 2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.50 23.38 25.95 26.14 28.80 33.90
We now analyze the influence of both factors, dosage and delivery method, on tooth growth. As the samples are small, we use t-tests.
We consider the following hypotheses for both factors (dosage and delivery method):
\(H_0\): null hypothesis, there is no influence of the factor on tooth length.
\(H_A\): research hypothesis, there is an influence of the factor on tooth length.
We split the data according to dosage level:
# dose is numeric, so compare against numbers rather than strings
ToothGrowth_0.5 <- ToothGrowth[ToothGrowth$dose == 0.5, ]
ToothGrowth_1.0 <- ToothGrowth[ToothGrowth$dose == 1, ]
ToothGrowth_2.0 <- ToothGrowth[ToothGrowth$dose == 2, ]
mean_ToothGrowth_0.5<-round(mean(ToothGrowth_0.5$len), 2)
mean_ToothGrowth_1.0<-round(mean(ToothGrowth_1.0$len), 2)
mean_ToothGrowth_2.0<-round(mean(ToothGrowth_2.0$len), 2)
The means for each dose level (0.5, 1.0, and 2.0 mg/day) are 10.61, 19.73, and 26.1. We are going to perform two upper-tailed t-tests on the mean with \(\alpha\)=5%, each one taking the mean of the next lower dose group as the reference value \(\mu_0\).
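As a sketch of the procedure, the upper-tailed one-sample t-test used in the code below can be written as follows. Here \(\bar{x}\) is the mean of the higher-dose group, \(\mu_0\) is the reference value (the mean of the lower-dose group, treated as fixed), and \(s\) and \(n\) are the standard deviation and group size that enter the standard error. The symbols \(\bar{x}\), \(s\), and \(n\) are introduced here to mirror the variable names `xbar`, `s`, and `n` in the code.

```latex
\[
H_0:\ \mu = \mu_0 \qquad H_A:\ \mu > \mu_0
\]
\[
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad
\text{reject } H_0 \text{ if } t > t_{1-\alpha,\,n-1}
\]
```

Treating \(\mu_0\) as a fixed number ignores the sampling variability of the reference group, which is a simplification of this approach.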
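# Upper-tailed test: is the 1.0 mg/day mean greater than the 0.5 mg/day mean?
xbar1 <- mean_ToothGrowth_1.0      # mean of the 1.0 mg/day group (already rounded)
mu01 <- mean_ToothGrowth_0.5       # reference value: mean of the 0.5 mg/day group
s1 <- sd(ToothGrowth_0.5$len)      # standard deviation, taken from the 0.5 mg/day group
n1 <- length(ToothGrowth_0.5$len)  # group size
t1 <- round((xbar1-mu01)/(s1/sqrt(n1)), 2)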
t1
## [1] 9.06
We compute the critical value at the \(\alpha\)=5% significance level:
alpha <- 0.05
t.alpha1 <- round(qt(1-alpha,df=n1-1), 2)
t.alpha1
## [1] 1.73
As the test statistic 9.06 > 1.73, we reject the null hypothesis at a significance level of 5%. Therefore, we conclude that the dose level has an influence on tooth length: the guinea pigs that received 1.0 mg/day have longer teeth on average than those that received only 0.5 mg/day.
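The same decision can be expressed through a p-value instead of a critical value. The sketch below assumes the statistic reported above (9.06) and 19 degrees of freedom (20 animals per dose level); the names `t_stat`, `dof`, and `p_value` are illustrative.

```r
# Values taken from the text above (assumed here, not recomputed)
t_stat <- 9.06   # test statistic
dof    <- 19     # degrees of freedom, n - 1 with n = 20

# Upper-tail p-value: probability of a t value at least this large under H0
p_value <- pt(t_stat, df = dof, lower.tail = FALSE)

# TRUE means H0 is rejected at the 5% level, which matches the
# comparison of the statistic with the critical value
p_value < 0.05
```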
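# Upper-tailed test: is the 2.0 mg/day mean greater than the 1.0 mg/day mean?
xbar2 <- mean_ToothGrowth_2.0      # mean of the 2.0 mg/day group (already rounded)
mu02 <- mean_ToothGrowth_1.0       # reference value: mean of the 1.0 mg/day group
s2 <- sd(ToothGrowth_1.0$len)      # standard deviation, taken from the 1.0 mg/day group
n2 <- length(ToothGrowth_1.0$len)  # group size
t2 <- round((xbar2-mu02)/(s2/sqrt(n2)), 2)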
t2
## [1] 6.45
We compute the critical value at the \(\alpha\)=5% significance level:
alpha <- 0.05
t.alpha2 <- round(qt(1-alpha,df=n2-1), 2)
t.alpha2
## [1] 1.73
As the test statistic 6.45 > 1.73, we reject the null hypothesis at a significance level of 5%. Therefore, we conclude that the dose level has an influence on tooth length: the guinea pigs that received 2.0 mg/day have longer teeth on average than those that received 1.0 mg/day.
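As a cross-check, the two dose comparisons can also be run as two-sample tests, which account for the variability of both groups. The sketch below uses Welch's t-test through the base R function `t.test()`. This is an alternative to the approach used in this document, not the method the analysis is based on, and the variable names are illustrative.

```r
library(datasets)

# Tooth lengths for each dose level
len_0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5]
len_1.0 <- ToothGrowth$len[ToothGrowth$dose == 1]
len_2.0 <- ToothGrowth$len[ToothGrowth$dose == 2]

# Welch two-sample t-tests (var.equal = FALSE is the default), upper-tailed:
# is the mean of the higher-dose group greater than that of the lower-dose group?
test_1_vs_0.5 <- t.test(len_1.0, len_0.5, alternative = "greater")
test_2_vs_1   <- t.test(len_2.0, len_1.0, alternative = "greater")

test_1_vs_0.5$p.value
test_2_vs_1$p.value
```

Because the standard error of the Welch statistic includes the variance of both groups, its value is typically smaller than the one-sample statistics reported above.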
We split the data according to delivery method:
ToothGrowth_OJ <- ToothGrowth[which(ToothGrowth$supp=='OJ'),]
ToothGrowth_VC <- ToothGrowth[which(ToothGrowth$supp=='VC'),]
mean_ToothGrowth_OJ<-round(mean(ToothGrowth_OJ$len), 2)
mean_ToothGrowth_VC<-round(mean(ToothGrowth_VC$len), 2)
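The means for each delivery method (orange juice and ascorbic acid) are 20.66 and 16.96. We are going to perform an upper-tailed t-test on the mean with \(\alpha\)=5%, testing whether the ascorbic acid mean exceeds the orange juice mean, which is taken as the reference value \(\mu_0\).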
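# Upper-tailed test: is the VC mean greater than the OJ mean?
xbar3 <- mean_ToothGrowth_VC      # mean of the VC group (already rounded)
mu03 <- mean_ToothGrowth_OJ       # reference value: mean of the OJ group
s3 <- sd(ToothGrowth_VC$len)      # standard deviation of the VC group
n3 <- length(ToothGrowth_VC$len)  # group size
t3 <- round((xbar3-mu03)/(s3/sqrt(n3)), 2)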
t3
## [1] -2.45
We compute the critical value at the \(\alpha\)=5% significance level:
alpha <- 0.05
t.alpha3 <- round(qt(1-alpha,df=n3-1), 2)
t.alpha3
## [1] 1.7
As the test statistic -2.45 < 1.7, we do not reject the null hypothesis at a significance level of 5%. Therefore, on the basis of this test, we cannot conclude that the delivery method (orange juice versus ascorbic acid) has a significant impact on tooth length. Note that the alternative tested here is that the ascorbic acid mean exceeds the orange juice mean, so this test does not directly address whether the guinea pigs that received orange juice have longer teeth than those that received ascorbic acid.
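The direction of the alternative matters for this comparison. As a hedged sketch, the delivery methods can be compared with Welch's two-sample t-test via `t.test()`, both two-sided and in the direction OJ > VC. This is an alternative to the one-sample approach above, and the names `two_sided` and `oj_greater` are illustrative.

```r
library(datasets)

# Welch two-sample t-test, two-sided: do the OJ and VC means differ?
two_sided <- t.test(len ~ supp, data = ToothGrowth)

# One-sided version: is the OJ mean greater than the VC mean?
# With the formula interface the difference is first level minus second
# level, i.e. OJ - VC, because the factor levels of supp are "OJ", "VC".
oj_greater <- t.test(len ~ supp, data = ToothGrowth, alternative = "greater")

two_sided$conf.int    # 95% confidence interval for mean(OJ) - mean(VC)
two_sided$p.value
oj_greater$p.value
```

The two-sided and one-sided versions can lead to different decisions at the 5% level, so the direction of the alternative should be fixed before looking at the data.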