Overview

The goal of this document is to study the Tooth Growth dataset: perform some basic exploratory analysis, provide a basic summary of the data, use confidence intervals or hypothesis tests to compare tooth growth by supplement and dose, and state some conclusions.

Description of the data set

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

Variables:

len: numeric, tooth length.
supp: factor, supplement type (VC or OJ).
dose: numeric, dose in milligrams/day.

Exploratory Analysis

Visual Exploration

library(datasets)
data(ToothGrowth)
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
dim(ToothGrowth)
## [1] 60  3
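The description above (60 animals, three doses, two supplements) implies 10 animals in each supplement and dose combination. A quick cross-tabulation with base R's `table()` is one way to confirm that the design is balanced; this is a small sketch, and the variable names `design` and `is_balanced` are illustrative.

```r
library(datasets)
data(ToothGrowth)

# Count the animals in each supplement x dose cell
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)

# A balanced design has the same count in every cell
is_balanced <- length(unique(as.vector(design))) == 1
is_balanced
```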

Let us explore «len» with respect to the factors «supp» and «dose»:

# Four panels side by side: two boxplots and one histogram per supplement
par(mfrow=c(1,4))
boxplot(len ~ supp, data=ToothGrowth, ylab="len", xlab="Supplement type")
boxplot(len ~ dose, data=ToothGrowth, ylab="len", xlab="Vitamin C dose (mg/day)")
# Histograms: tooth length on the x axis, counts on the y axis
hist(ToothGrowth$len[ToothGrowth$supp == "OJ"], xlab="len", xlim=c(0,40), ylab="Frequency", ylim=c(0,10), main="OJ")
hist(ToothGrowth$len[ToothGrowth$supp == "VC"], xlab="len", xlim=c(0,40), ylab="Frequency", ylim=c(0,10), main="VC")

par(mfrow=c(1,1))

boxplot(len ~ supp * dose, data=ToothGrowth, col=c("gold","darkgreen"), ylab="len", xlab="Supplement & dose", main="Tooth Growth")

This preliminary visual analysis suggests the following:

1- There seems to be a positive association between dose and length.
2- Orange juice seems more effective than ascorbic acid at the low (0.5) and medium (1.0) doses.
3- At the high dose (2.0), the median length is identical for OJ and VC.
4- Orange juice seems to show diminishing returns: comparing median lengths, the increase from 0.5 to 1.0 is greater than the increase from 1.0 to 2.0. This attenuation is not observed for ascorbic acid.
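Observation 4 can be checked numerically. The sketch below (base R only; the names `med` and `increments` are illustrative) tabulates the median length by supplement and dose, then takes differences between consecutive dose levels. Note that the step from 1.0 to 2.0 mg/day is twice as large as the step from 0.5 to 1.0 mg/day, so these increments are not per-mg rates.

```r
library(datasets)
data(ToothGrowth)

# Median tooth length for each supplement (rows) and dose level (columns)
med <- tapply(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose), median)
print(med)

# Increase in median length from one dose level to the next, per supplement
increments <- t(apply(med, 1, diff))
print(increments)
```

Using the medians reported in the numeric summary below, the OJ increments are 11.2 and 2.5, while the VC increments are 9.35 and 9.45.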

Numeric exploration

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
by(ToothGrowth$len,INDICES = list(ToothGrowth$supp,ToothGrowth$dose),summary)
## : OJ
## : 0.5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.20    9.70   12.25   13.23   16.18   21.50 
## -------------------------------------------------------- 
## : VC
## : 0.5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20    5.95    7.15    7.98   10.90   11.50 
## -------------------------------------------------------- 
## : OJ
## : 1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   14.50   20.30   23.45   22.70   25.65   27.30 
## -------------------------------------------------------- 
## : VC
## : 1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   13.60   15.27   16.50   16.77   17.30   22.50 
## -------------------------------------------------------- 
## : OJ
## : 2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   22.40   24.58   25.95   26.06   27.08   30.90 
## -------------------------------------------------------- 
## : VC
## : 2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.50   23.38   25.95   26.14   28.80   33.90
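The `by()` output gives quartiles but not standard deviations, which the t statistics in the next section depend on. One compact way to get the mean and standard deviation of each cell is base R's `aggregate()` with a small summary function; this is a sketch, and in the result the `len` column is a matrix with columns `mean` and `sd` (the name `cell_stats` is illustrative).

```r
library(datasets)
data(ToothGrowth)

# Mean and standard deviation of tooth length for each supplement/dose cell
cell_stats <- aggregate(len ~ supp + dose, data = ToothGrowth,
                        FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(cell_stats)
```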

Hypothesis Testing

We are going to analyze the influence of both factors, dosage and delivery method, on tooth growth. As the samples are small (20 animals per dose level, 30 per delivery method) and the population variances are unknown, we use t-tests.

We consider the following hypotheses for both factors (dosage and delivery method):

\(H_0\): the factor has no influence on tooth length.
\(H_A\) (research hypothesis): the factor has an influence on tooth length.

Dosage factor

We split the data according to dosage level:

# Compare dose numerically rather than against strings such as '0.5'
ToothGrowth_0.5 <- ToothGrowth[ToothGrowth$dose == 0.5, ]
ToothGrowth_1.0 <- ToothGrowth[ToothGrowth$dose == 1, ]
ToothGrowth_2.0 <- ToothGrowth[ToothGrowth$dose == 2, ]
# Mean tooth length per dose level, rounded to 2 decimals
mean_ToothGrowth_0.5 <- round(mean(ToothGrowth_0.5$len), 2)
mean_ToothGrowth_1.0 <- round(mean(ToothGrowth_1.0$len), 2)
mean_ToothGrowth_2.0 <- round(mean(ToothGrowth_2.0$len), 2)

The means for the three dose levels (0.5, 1.0 and 2.0 mg/day) are 10.61, 19.73 and 26.1. We are going to perform two upper-tailed t-tests on the mean at \(\alpha\) = 5%.
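The three subsets above can also be summarised in one step with `tapply()`, which groups `len` by `dose`. This is a sketch with illustrative names (`dose_means`, `dose_sds`, `dose_n`); the unrounded means it returns (10.605, 19.735 and 26.1) are what the rounded values in the text come from.

```r
library(datasets)
data(ToothGrowth)

# Mean tooth length for each dose level, without splitting the data by hand
dose_means <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
print(dose_means)

# Standard deviation and sample size per dose level (inputs to a t statistic)
dose_sds <- tapply(ToothGrowth$len, ToothGrowth$dose, sd)
dose_n <- tapply(ToothGrowth$len, ToothGrowth$dose, length)
print(dose_sds)
print(dose_n)
```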

First test (1.0 versus 0.5 mg/day): \(\mu=19.73 > \mu_0 = 10.61\)

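# Sample mean: 1.0 mg/day group; hypothesised mean mu0: 0.5 mg/day group
xbar1 <- mean_ToothGrowth_1.0
mu01 <- mean_ToothGrowth_0.5
# Standard deviation and size taken from the 0.5 mg/day group
s1 <- sd(ToothGrowth_0.5$len)
n1 <- length(ToothGrowth_0.5$len)
# One-sample t statistic
t1 <- round((xbar1 - mu01) / (s1 / sqrt(n1)), 2)
t1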
## [1] 9.06
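For reference, the statistic computed above is the one-sample t statistic, with the 1.0 mg/day mean as \(\bar{x}\), the 0.5 mg/day mean as \(\mu_0\), and the standard deviation \(s\) (about 4.50) and size \(n = 20\) of the 0.5 mg/day group:

```latex
t_1 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
    = \frac{19.73 - 10.61}{4.50/\sqrt{20}}
    \approx \frac{9.12}{1.006}
    \approx 9.06
```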

We compute the critical value at the \(\alpha\) = 5% significance level:

alpha <- 0.05
t.alpha1 <- round(qt(1-alpha,df=n1-1), 2)
t.alpha1
## [1] 1.73

As the test statistic 9.06 > 1.73, we reject the null hypothesis at the 5% significance level. We therefore conclude that a dose of 1.0 mg/day influences tooth length: the guinea pigs that received 1.0 mg/day have longer teeth than those that received only 0.5 mg/day.
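Equivalently, the decision can be expressed through a p-value: reject \(H_0\) when the upper-tail probability of the observed statistic is below \(\alpha\). A short sketch with base R's `pt()`, reusing the rounded statistic 9.06 and the 19 degrees of freedom of the first test (the names `df1` and `p1` are illustrative):

```r
# Test statistic and degrees of freedom taken from the first test
t1 <- 9.06
df1 <- 19

# Upper-tailed p-value: probability of a t value at least this large under H0
p1 <- pt(t1, df = df1, lower.tail = FALSE)
p1

# Same decision as comparing t1 with the critical value qt(1 - alpha, df1)
alpha <- 0.05
p1 < alpha
```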

Second test (2.0 versus 1.0 mg/day): \(\mu=26.1 > \mu_0 = 19.73\)

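# Sample mean: 2.0 mg/day group; hypothesised mean mu0: 1.0 mg/day group
xbar2 <- mean_ToothGrowth_2.0
mu02 <- mean_ToothGrowth_1.0
# Standard deviation and size taken from the 1.0 mg/day group
s2 <- sd(ToothGrowth_1.0$len)
n2 <- length(ToothGrowth_1.0$len)
# One-sample t statistic
t2 <- round((xbar2 - mu02) / (s2 / sqrt(n2)), 2)
t2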
## [1] 6.45

We compute the critical value at the \(\alpha\) = 5% significance level:

alpha <- 0.05
t.alpha2 <- round(qt(1-alpha,df=n2-1), 2)
t.alpha2
## [1] 1.73

As the test statistic 6.45 > 1.73, we reject the null hypothesis at the 5% significance level. We therefore conclude that a dose of 2.0 mg/day influences tooth length: the guinea pigs that received 2.0 mg/day have longer teeth than those that received 1.0 mg/day.
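The two tests above plug one group's mean in as a fixed \(\mu_0\). A different technique, which accounts for sampling variability in both groups, is the two-sample Welch t-test available in base R's `t.test()`. The sketch below applies it to the same two dose comparisons with the same upper-tailed alternative; it is offered as a cross-check rather than as the calculation used in the text, and the variable names are illustrative.

```r
library(datasets)
data(ToothGrowth)

# Tooth length for each dose level
len_0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5]
len_1.0 <- ToothGrowth$len[ToothGrowth$dose == 1]
len_2.0 <- ToothGrowth$len[ToothGrowth$dose == 2]

# Welch two-sample t-tests (unequal variances), H_A: higher dose > lower dose
test_1_vs_0.5 <- t.test(len_1.0, len_0.5, alternative = "greater", var.equal = FALSE)
test_2_vs_1.0 <- t.test(len_2.0, len_1.0, alternative = "greater", var.equal = FALSE)

# p-values and one-sided 95% confidence bounds for the difference in means
c(test_1_vs_0.5$p.value, test_2_vs_1.0$p.value)
test_1_vs_0.5$conf.int
test_2_vs_1.0$conf.int
```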

Delivery method factor

We split the data according to delivery method:

ToothGrowth_OJ <- ToothGrowth[which(ToothGrowth$supp=='OJ'),]
ToothGrowth_VC <- ToothGrowth[which(ToothGrowth$supp=='VC'),]

mean_ToothGrowth_OJ<-round(mean(ToothGrowth_OJ$len), 2)
mean_ToothGrowth_VC<-round(mean(ToothGrowth_VC$len), 2)

The means for the two delivery methods (orange juice and ascorbic acid) are 20.66 and 16.96. We are going to perform one upper-tailed t-test on the mean at \(\alpha\) = 5%.

Third test: \(\mu=20.66 > \mu_0 = 16.96\)

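# Sample mean: VC group; hypothesised mean mu0: OJ group
xbar3 <- mean_ToothGrowth_VC
mu03 <- mean_ToothGrowth_OJ
# Standard deviation and size taken from the VC group
s3 <- sd(ToothGrowth_VC$len)
n3 <- length(ToothGrowth_VC$len)
# One-sample t statistic of the VC mean relative to the OJ mean
t3 <- round((xbar3 - mu03) / (s3 / sqrt(n3)), 2)
t3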
## [1] -2.45

We compute the critical value at the \(\alpha\) = 5% significance level:

alpha <- 0.05
t.alpha3 <- round(qt(1-alpha,df=n3-1), 2)
t.alpha3
## [1] 1.7

As the test statistic -2.45 < 1.7, we do not reject the null hypothesis at the 5% significance level. Therefore, we cannot conclude that the delivery method (orange juice versus ascorbic acid) has a significant impact on tooth length, nor that the guinea pigs that received orange juice have longer teeth than those that received ascorbic acid.
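The hypotheses stated at the start of this section are non-directional (\(H_A\): the factor has some influence on tooth length). A two-sided Welch two-sample t-test is one way to test that form of \(H_A\) for the delivery method directly, comparing the OJ and VC groups rather than fixing one of the means as \(\mu_0\). This is a different technique from the one-sample calculation above, sketched here with base R's `t.test()` (the name `supp_test` is illustrative).

```r
library(datasets)
data(ToothGrowth)

# Two-sided Welch t-test of len between the supplement groups (OJ minus VC)
supp_test <- t.test(len ~ supp, data = ToothGrowth,
                    alternative = "two.sided", var.equal = FALSE)

# p-value and 95% confidence interval for mean(OJ) - mean(VC)
supp_test$p.value
supp_test$conf.int

# Decision at the 5% level
supp_test$p.value < 0.05
```

On this data the two-sided p-value comes out slightly above 0.05 (about 0.06) and the interval contains 0, consistent with not rejecting \(H_0\) at the 5% level. A one-sided alternative (OJ > VC) would halve that p-value, so the choice between a directional and a non-directional \(H_A\) matters for this factor.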