Synopsis - This document reads the ToothGrowth data and performs an exploratory analysis of it. Furthermore, it evaluates the relationship between tooth growth and the supplement type and dose.

1. Data Analysis, Exploration and Assumptions

The data consists of observations from an experiment on 60 different guinea pigs, run to study the effect of vitamin C on tooth growth at different doses and with different supplements (orange juice vs. direct doses of vitamin C). Because every observation comes from a different animal, the groups are treated as independent rather than paired.

1.1 Data Analysis

Get the ToothGrowth data and provide a summary of it.

data(ToothGrowth)
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

The plot below shows the relationship between dose and tooth growth, broken down by supplement.

library(ggplot2)
qplot(data=ToothGrowth, x=len, y=dose, colour=supp, main="Dose vs Tooth Growth") + geom_smooth() + xlab("Tooth Length") + ylab("Supplement Dose")

1.2 Key Observations
  • The data has 60 observations of tooth length subject to various doses of vitamin C supplements.
  • Half of the subjects received OJ (orange juice) and the remaining half received VC (vitamin C).
  • The population was divided into three equal groups, which received different doses of the supplement: 0.5, 1.0 or 2.0.
  • The plot indicates that tooth growth has a positive relationship with the dose.
  • The plot also suggests that the OJ supplement tends to have a somewhat greater effect on tooth growth than VC.
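
These observations can be checked numerically. The following is a minimal sketch using only base R; the names `counts`, `mean_by_dose` and `mean_by_supp` are illustrative and are not part of the original analysis.

```r
# Sketch: numerical check of the observations listed above.
data(ToothGrowth)

# Number of subjects for each supplement/dose combination
counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(counts)

# Mean tooth length by dose and by supplement
mean_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
mean_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
print(mean_by_dose)
print(mean_by_supp)
```

The column labels of `counts` list the doses that were actually administered, and the two vectors of means show the direction of the dose and supplement effects.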
1.3 Key Assumptions
  • Data points are iid normal
  • Variances are constant across groups
  • The groups are independent, as each observation comes from a different subject
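
The constant-variance assumption can be examined informally by comparing the sample variances of the groups. This is only a rough sketch, not a formal test; `var_by_supp`, `var_by_dose` and `ratio_supp` are illustrative names.

```r
# Sketch: informal look at the constant-variance assumption.
data(ToothGrowth)

# Sample variance of tooth length within each supplement group
var_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp, var)

# Sample variance of tooth length within each dose group
var_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, var)

print(var_by_supp)
print(var_by_dose)

# Ratio of the larger to the smaller variance; a value close to 1
# is consistent with (but does not prove) equal variances
ratio_supp <- max(var_by_supp) / min(var_by_supp)
print(ratio_supp)
```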

2. Compare tooth growth by supplement (OJ or VC)

2.1 Get Required Information using R

Get information required to calculate confidence interval

## Get vectors of tooth length for OJ and VC
OJ     <- ToothGrowth[ToothGrowth$supp=="OJ",]$len
VC     <- ToothGrowth[ToothGrowth$supp=="VC",]$len

## Get the mean of each vector
MeanOJ <- mean(OJ)
MeanVC <- mean(VC)

## Get the variances and sample sizes (x = VC, y = OJ)
VarOJ  <- var(OJ)
VarVC  <- var(VC)
nx     <- length(VC)
ny     <- length(OJ)

2.2 Calculate Confidence Intervals and Verify

Using R's built-in function t.test for independent groups with equal variances, the following 95% confidence interval for the difference in means (OJ minus VC) is returned

-0.1670064, 7.5670064
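
The call that produced this interval is not shown in the report. A plausible sketch, assuming the pooled-variance form (`var.equal = TRUE`, as in the call shown in section 3.2), is given here; `res_supp` is an illustrative name.

```r
# Sketch: assumed t.test call for the supplement comparison.
data(ToothGrowth)
OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Assumption: independent groups with a common (pooled) variance
res_supp <- t.test(OJ, VC, paired = FALSE, var.equal = TRUE)

# 95% confidence interval for mean(OJ) - mean(VC)
print(res_supp$conf.int)
```

Under this assumption the printed interval should match the one reported above.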

Now, to verify this result, let us use the t confidence interval formula for independent groups with a pooled variance. The formula is

\[ \bar{Y} - \bar{X} \pm t_{n_x+n_y-2,\,1-\alpha/2} \, S_p \left(1/n_x + 1/n_y\right)^{1/2} \]

where \[ S_p = \left(\frac{(n_x - 1)S_x^2 + (n_y - 1)S_y^2}{n_x + n_y - 2}\right)^{1/2} \]

Substituting the values defined in the R code above (with X = VC and Y = OJ) and calculating

MeanOJ - MeanVC + c(-1, 1) * qt(0.975,58) * sqrt(((nx-1)*VarVC+(ny-1)*VarOJ)/58) * sqrt((1/ny)+(1/nx))

yields

-0.1670064, 7.5670064
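
The same formula can also be wrapped in a small function so that the sample sizes and degrees of freedom are derived from the data instead of being typed in. This is a sketch only; `pooled_ci` is a hypothetical helper and not part of the original analysis.

```r
# Hypothetical helper (not in the original report): pooled-variance
# t confidence interval for mean(y) - mean(x), independent groups.
pooled_ci <- function(y, x, conf = 0.95) {
  ny <- length(y)
  nx <- length(x)
  df <- nx + ny - 2
  # Pooled standard deviation S_p
  sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / df)
  alpha <- 1 - conf
  mean(y) - mean(x) + c(-1, 1) * qt(1 - alpha / 2, df) * sp * sqrt(1 / nx + 1 / ny)
}

data(ToothGrowth)
OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Interval for mean(OJ) - mean(VC)
ci_manual <- pooled_ci(OJ, VC)
print(ci_manual)
```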

2.3 Inference

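The confidence intervals obtained from both calculations are the same. We are therefore 95% confident that the true difference in mean tooth growth between the two supplements (OJ minus VC) lies in this interval. Because the interval contains zero, the data do not give strong support, at the 5% level, to the claim that one supplement has more effect on tooth growth than the other. The interval lies mostly above zero, which hints at a somewhat larger effect for orange juice.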
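
One way to judge how strong the support is: for a two-sided test at the 5% level, the 95% interval excludes zero exactly when the p-value is below 0.05. The sketch below assumes the pooled-variance t.test; `res`, `contains_zero` and `significant` are illustrative names.

```r
# Sketch: relate the confidence interval to the two-sided test.
data(ToothGrowth)
OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Assumption: pooled-variance test for independent groups
res <- t.test(OJ, VC, paired = FALSE, var.equal = TRUE)
ci  <- res$conf.int

# Does the 95% interval for mean(OJ) - mean(VC) contain zero?
contains_zero <- ci[1] < 0 && ci[2] > 0

# Is the difference significant at the 5% level?
significant <- res$p.value < 0.05

print(contains_zero)
print(significant)
```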

3. Compare tooth growth by dose (0.5 and 2.0)

We will compare the cases where the dose administered was 0.5 with the cases where the dose administered was 2.0. For the comparison, derive the required values as shown below.

3.1 Get Required Information using R

Get information required to calculate confidence interval

## Get tooth length values corresponding to doses of 0.5 and 2.0
Len_Dose0.5 <- ToothGrowth[ToothGrowth$dose == 0.5,]$len
Len_Dose2.0 <- ToothGrowth[ToothGrowth$dose == 2.0,]$len

## Get mean and variance of the above lengths
Mean_Dose0.5 <- mean(Len_Dose0.5)
Mean_Dose2.0 <- mean(Len_Dose2.0)
Var_Dose0.5 <- var(Len_Dose0.5)
Var_Dose2.0 <- var(Len_Dose2.0)

3.2 Calculate Intervals and Verify

Using the call t.test(Len_Dose2.0, Len_Dose0.5, paired=FALSE, var.equal=TRUE), the 95% confidence interval obtained for the difference in means (dose 2.0 minus dose 0.5) is

12.836481, 18.153519

Now, using the same formula as for the supplements and substituting the values for the doses (each dose group has 20 observations, giving 38 degrees of freedom), the calculation looks like

Mean_Dose2.0 - Mean_Dose0.5 + c(-1, 1) * qt(0.975,38) * sqrt( (19*Var_Dose0.5 + 19*Var_Dose2.0)/38) * sqrt((1/20)+(1/20))

and yields the confidence interval

12.836481, 18.153519
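
The constants 19, 20 and 38 in the manual calculation follow from the group sizes. As a sketch, they can be derived from the data and the result compared with t.test; the names `low`, `high`, `n_low`, `n_high`, `df`, `sp` and `ci_dose` are illustrative.

```r
# Sketch: derive the sample sizes and degrees of freedom from the data.
data(ToothGrowth)
low  <- ToothGrowth$len[ToothGrowth$dose == 0.5]
high <- ToothGrowth$len[ToothGrowth$dose == 2.0]

n_low  <- length(low)       # observations at dose 0.5
n_high <- length(high)      # observations at dose 2.0
df     <- n_low + n_high - 2

# Pooled standard deviation
sp <- sqrt(((n_low - 1) * var(low) + (n_high - 1) * var(high)) / df)

# 95% interval for mean(high) - mean(low)
ci_dose <- mean(high) - mean(low) +
  c(-1, 1) * qt(0.975, df) * sp * sqrt(1 / n_low + 1 / n_high)
print(ci_dose)
```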

3.3 Inference

Comparing tooth length by dose (2.0 vs. 0.5), we are 95% confident that the true difference in mean tooth length (dose 2.0 minus dose 0.5) lies within the interval derived above. Because this interval lies entirely above zero, it suggests that an increased dose of ascorbic acid results in increased tooth growth.