Synopsis - This document reads the ToothGrowth data and performs an exploratory analysis on it. It then uses confidence intervals to compare tooth growth across supplement types and doses.
The data consist of observations from an experiment on 60 guinea pigs that studied the effect of vitamin C on tooth growth at different doses and with two supplements (orange juice, OJ, vs. ascorbic acid, VC, a direct form of vitamin C). Each animal received only one treatment, so the groups are treated as independent rather than paired.
Load the ToothGrowth data and summarize it.
data(ToothGrowth)
summary(ToothGrowth)
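The summary shows 30 observations per supplement. A minimal sketch that cross-tabulates supplement and dose, to confirm that every supplement/dose combination has the same number of animals, is:

```r
# ToothGrowth ships with base R (datasets package)
data(ToothGrowth)

# Number of observations in each supplement x dose cell
cell_counts <- with(ToothGrowth, table(supp, dose))
print(cell_counts)
```

Each of the six cells contains 10 animals, so the design is balanced across supplements and doses.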
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000  
The plot below shows the relationship between dose and tooth length, broken down by supplement.
library(ggplot2)
ggplot(ToothGrowth, aes(x = dose, y = len, colour = supp)) + geom_point() + geom_smooth() + labs(title = "Dose vs Tooth Growth", x = "Supplement Dose (mg/day)", y = "Tooth Length")
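With only three dose levels, the smoothed curves can be hard to read precisely. A minimal sketch that tabulates the mean tooth length for each supplement/dose cell, as a numeric complement to the plot, is:

```r
data(ToothGrowth)

# Matrix of mean tooth length: rows = supplement (OJ, VC), columns = dose (0.5, 1, 2)
cell_means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
print(round(cell_means, 2))
```

Reading across each row shows how mean tooth length changes with dose for a given supplement; reading down a column compares supplements at a fixed dose.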
Compute the quantities needed for the confidence interval.
## Tooth lengths for each supplement
OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]
## Sample means
MeanOJ <- mean(OJ)
MeanVC <- mean(VC)
## Sample variances
VarOJ <- var(OJ)
VarVC <- var(VC)
## Sample sizes (X = VC and Y = OJ in the formula below)
nx <- length(VC)
ny <- length(OJ)
R's built-in function t.test(OJ, VC, paired = FALSE, var.equal = TRUE) returns the following 95% confidence interval for the difference in means (OJ minus VC):
-0.1670064, 7.5670064
To verify this result, we use the t confidence interval for two independent groups with equal variances:
\[ \bar{Y} - \bar{X} \pm t_{n_x+n_y-2,\,1-\alpha/2}\, S_p \left(\frac{1}{n_x} + \frac{1}{n_y}\right)^{1/2} \]
where the pooled standard deviation is \[ S_p = \left(\frac{(n_x - 1)S_x^2 + (n_y - 1)S_y^2}{n_x + n_y - 2}\right)^{1/2} \]
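The formula can be wrapped in a small function so that it is not retyped for each comparison. The sketch below uses a hypothetical helper, `pooled_t_ci` (not part of base R), and checks its result against `t.test(..., var.equal = TRUE)`:

```r
data(ToothGrowth)

# Hypothetical helper: equal-variance (pooled) t interval for mean(y) - mean(x)
pooled_t_ci <- function(y, x, conf.level = 0.95) {
  nx <- length(x)
  ny <- length(y)
  df <- nx + ny - 2
  # Pooled standard deviation S_p
  sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / df)
  # Quantile t_{df, 1 - alpha/2}
  tq <- qt(1 - (1 - conf.level) / 2, df)
  mean(y) - mean(x) + c(-1, 1) * tq * sp * sqrt(1 / nx + 1 / ny)
}

OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]

manual  <- pooled_t_ci(OJ, VC)
builtin <- as.numeric(t.test(OJ, VC, var.equal = TRUE)$conf.int)
print(rbind(manual, builtin))
```

The two rows should agree to floating-point precision, since `t.test` with `var.equal = TRUE` uses the same pooled variance and degrees of freedom.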
Substituting the values computed above:
## df = nx + ny - 2 = 58
MeanOJ - MeanVC + c(-1, 1) * qt(0.975, nx + ny - 2) * sqrt(((nx - 1) * VarVC + (ny - 1) * VarOJ) / (nx + ny - 2)) * sqrt(1/ny + 1/nx)
which yields
-0.1670064, 7.5670064
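Both calculations give the same interval. Strictly, this means that if the experiment were repeated many times, about 95% of intervals constructed this way would contain the true difference in mean tooth length between the orange juice and ascorbic acid groups. Because this interval contains 0, the data do not show a statistically significant difference between the two supplements at the 5% level. The interval lies mostly above 0, which hints that orange juice may be associated with somewhat longer teeth than ascorbic acid, but the evidence is not conclusive.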
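Whether the interval supports a difference between supplements depends on whether it contains 0. A minimal check, together with the two-sided p-value from the same equal-variance test, can be sketched as:

```r
data(ToothGrowth)

OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]

res <- t.test(OJ, VC, paired = FALSE, var.equal = TRUE)

# Does the 95% interval for mean(OJ) - mean(VC) contain 0?
contains_zero <- res$conf.int[1] <= 0 && 0 <= res$conf.int[2]
print(contains_zero)

# Two-sided p-value for H0: mean(OJ) = mean(VC)
print(res$p.value)
```

A 95% interval that contains 0 corresponds to a two-sided p-value above 0.05 for the same test.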
Next, we compare tooth length for guinea pigs given a dose of 0.5 mg/day with those given 2.0 mg/day, pooling both supplements. The required values are derived below.
Compute the quantities needed for the confidence interval.
## Tooth lengths for doses of 0.5 and 2.0 mg/day
Len_Dose0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5]
Len_Dose2.0 <- ToothGrowth$len[ToothGrowth$dose == 2.0]
## Means and variances of the two dose groups
Mean_Dose0.5 <- mean(Len_Dose0.5)
Mean_Dose2.0 <- mean(Len_Dose2.0)
Var_Dose0.5 <- var(Len_Dose0.5)
Var_Dose2.0 <- var(Len_Dose2.0)
Using t.test(Len_Dose2.0, Len_Dose0.5, paired = FALSE, var.equal = TRUE), the 95% confidence interval for the difference in means (2.0 minus 0.5) is
12.836481, 18.153519
Applying the same pooled-variance formula to the two dose groups, with the 2.0 mg/day group as Y and the 0.5 mg/day group as X, gives
## 20 observations per dose group, so df = 20 + 20 - 2 = 38
Mean_Dose2.0 - Mean_Dose0.5 + c(-1, 1) * qt(0.975, 38) * sqrt((19 * Var_Dose0.5 + 19 * Var_Dose2.0) / 38) * sqrt(1/20 + 1/20)
which yields the same interval:
12.836481, 18.153519
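The hand calculation hard-codes 19, 20 and 38, which assumes 20 animals per dose group. A sketch that derives these numbers from the data instead, so the same code would also work for unequal group sizes, is:

```r
data(ToothGrowth)

low  <- ToothGrowth$len[ToothGrowth$dose == 0.5]
high <- ToothGrowth$len[ToothGrowth$dose == 2.0]

# Group sizes and degrees of freedom taken from the data
n_low  <- length(low)
n_high <- length(high)
df <- n_low + n_high - 2

# Pooled SD and 95% interval for mean(high) - mean(low)
sp <- sqrt(((n_low - 1) * var(low) + (n_high - 1) * var(high)) / df)
ci <- mean(high) - mean(low) + c(-1, 1) * qt(0.975, df) * sp * sqrt(1 / n_low + 1 / n_high)

print(c(n_low = n_low, n_high = n_high, df = df))
print(ci)
```

The derived sizes confirm the hard-coded values, and the interval matches the `t.test` result above.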
Comparing tooth length at doses of 2.0 and 0.5 mg/day, the 95% confidence interval for the difference in means (2.0 minus 0.5) lies entirely above 0. This indicates that, at the 5% significance level, increasing the vitamin C dose from 0.5 to 2.0 mg/day (pooled over both supplements) is associated with greater tooth growth, by roughly 12.8 to 18.2 in mean tooth length.