Overview

We analyze the ToothGrowth data from the R datasets package. The report covers the following four points.

1. Load the ToothGrowth data and perform some basic exploratory data analysis.
2. Provide a basic summary of the data.
3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supplement and dose.
4. State the conclusions and the assumptions needed for them.

Exploring ToothGrowth Data

The ToothGrowth data set records the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (the cells responsible for tooth growth) in each of 10 guinea pigs at each of three dose levels of vitamin C (0.5, 1, and 2 mg) with each of two delivery methods: orange juice (OJ) or ascorbic acid (VC).

data(ToothGrowth)
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

The data set has 60 observations of 3 variables:

[1] len - Tooth length (numeric)
[2] supp - Supplement type (VC or OJ) (factor)
[3] dose - Dose in milligrams (numeric)

The table below shows that every combination of delivery method and dose has the same number of observations (10). The data set carries no animal identifier, so it is not clear whether the two delivery methods were tested on the same guinea pigs. We therefore assume the samples are independent and use paired=FALSE in our t.test calls.

table(ToothGrowth$supp,ToothGrowth$dose)
##     
##      0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10
Group_OJ = ToothGrowth[ToothGrowth$supp == 'OJ',]
Group_VC = ToothGrowth[ToothGrowth$supp == 'VC',]
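
As a basic summary of the data, the mean tooth length in each supplement/dose cell can be tabulated. The snippet below is a minimal sketch using base R only; the object name cell_means is introduced here for illustration and is not part of the original analysis.

```r
# Load the data and summarise tooth length overall and per supplement/dose cell.
data(ToothGrowth)

# Five-number summary plus mean of the response.
summary(ToothGrowth$len)

# Mean tooth length for each of the six supplement/dose combinations.
# 'cell_means' is an illustrative name, not one used elsewhere in this report.
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)
```

The six cell means are the same quantities that the t.test estimates report in the sections that follow.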

Next, let’s compare the variance of tooth length in the two supplement groups.

var(Group_OJ$len)
## [1] 43.63344
var(Group_VC$len)
## [1] 68.32723

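The two sample variances look quite different (about 43.6 for OJ versus 68.3 for VC), although we have not formally tested the difference. To avoid assuming equal variances, we use the var.equal=FALSE option (Welch’s t-test) when calling t.test.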
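
If a formal check of the variances is wanted, one option is an F test via var.test from the stats package. This is only a sketch and not part of the original analysis: the F test assumes roughly normal data within each group, and here each group pools three doses, so the result should be read with caution. The object name vt is illustrative.

```r
data(ToothGrowth)

# F test of the null hypothesis that the OJ and VC groups have equal variance.
# Assumption: tooth lengths are approximately normal within each group.
vt <- var.test(len ~ supp, data = ToothGrowth)

vt$estimate  # ratio of sample variances, OJ over VC
vt$conf.int  # 95% confidence interval for that ratio
vt$p.value
```

Whatever this test shows, Welch’s t-test remains a reasonable default because it does not rely on equal variances.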

Hypothesis and Confidence Interval

We want to test whether supplement type and dose level make a significant difference to tooth growth. For easier notation, from here on A denotes dose 0.5, B denotes dose 1.0 and C denotes dose 2.0.

Let’s split the hypothesis testing into two categories: first, the effect of the supplement (OJ versus VC) at each dose; second, the effect of different doses for a given supplement.

(I) : Effect of supplement on tooth growth at each dose.

We can form one hypothesis per dose, comparing the two supplements at each of the three doses. Let’s take them one by one.

[H1]: The difference in mean tooth length (MD) between the OJ and VC supplements at dose = 0.5, where MD is the OJ mean minus the VC mean.

\(H_0 : MD = 0\)

\(H_a : MD \neq 0\)

To test the null hypothesis against the alternative, we run a two-sided Welch t-test on the dose 0.5 subset.

#Dose A = 0.5

Dose_A = ToothGrowth[ToothGrowth$dose == 0.5, ]
A = t.test(len ~ supp, paired=FALSE, var.equal=FALSE, data=Dose_A)

Now, let’s look at the confidence interval and the mean estimates.

A$conf.int
## [1] 1.719057 8.780943
## attr(,"conf.level")
## [1] 0.95
A$estimate
## mean in group OJ mean in group VC 
##            13.23             7.98

Since the 95% confidence interval does not contain zero, we reject the null hypothesis at the 5% level. The mean estimates (13.23 for OJ versus 7.98 for VC) show that, at the 0.5 dose level, OJ is associated with significantly greater tooth length than VC.
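
To make clear where this interval comes from, the Welch confidence interval can be rebuilt by hand from the group means, variances and sizes. The sketch below uses the standard Welch-Satterthwaite degrees of freedom; the variable names (oj, vc, welch_ci and so on) are illustrative and not used elsewhere in this report.

```r
data(ToothGrowth)

# Tooth lengths for the two supplements at dose 0.5.
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

# Per-group variance of the sample mean.
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Standard error of the difference and Welch-Satterthwaite degrees of freedom.
se <- sqrt(v_oj + v_vc)
df <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided 95% interval for mean(OJ) - mean(VC).
welch_ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se
welch_ci
```

The result should agree with the conf.int element returned by t.test with var.equal=FALSE.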

[H2]: The difference in mean tooth length (MD) between the OJ and VC supplements at dose = 1.0, where MD is the OJ mean minus the VC mean.

\(H_0 : MD = 0\)

\(H_a : MD \neq 0\)

To test the null hypothesis against the alternative, we run a two-sided Welch t-test on the dose 1.0 subset.

#Dose B = 1.0

Dose_B = ToothGrowth[ToothGrowth$dose == 1.0, ]
B = t.test(len ~ supp, paired=FALSE, var.equal=FALSE, data=Dose_B)

Now, let’s look at the confidence interval and the mean estimates.

B$conf.int
## [1] 2.802148 9.057852
## attr(,"conf.level")
## [1] 0.95
B$estimate
## mean in group OJ mean in group VC 
##            22.70            16.77

Once again the 95% confidence interval does not contain zero, so we reject the null hypothesis at the 5% level. The mean estimates (22.70 for OJ versus 16.77 for VC) show that, at the 1.0 dose level, OJ is associated with significantly greater tooth length than VC.

[H3]: The difference in mean tooth length (MD) between the OJ and VC supplements at dose = 2.0, where MD is the OJ mean minus the VC mean.

\(H_0 : MD = 0\)

\(H_a : MD \neq 0\)

To test the null hypothesis against the alternative, we run a two-sided Welch t-test on the dose 2.0 subset.

#Dose C = 2.0

Dose_C = ToothGrowth[ToothGrowth$dose == 2.0, ]
C = t.test(len ~ supp, paired=FALSE, var.equal=FALSE, data=Dose_C)

Now, let’s look at the confidence interval and the mean estimates.

C$conf.int
## [1] -3.79807  3.63807
## attr(,"conf.level")
## [1] 0.95
C$estimate
## mean in group OJ mean in group VC 
##            26.06            26.14

This time the 95% confidence interval contains zero, so we fail to reject the null hypothesis. The mean estimates (26.06 for OJ versus 26.14 for VC) are nearly identical, so at the 2.0 dose level the data show no evidence of a difference in tooth growth between OJ and VC.
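
The three supplement comparisons can also be run in one pass, which makes the intervals and p-values easy to read side by side. This is a compact sketch of the same Welch tests; the names by_dose, supp_tests and supp_summary are illustrative only.

```r
data(ToothGrowth)

# One data frame per dose level (0.5, 1 and 2).
by_dose <- split(ToothGrowth, ToothGrowth$dose)

# Welch two-sample t-test of OJ versus VC within each dose.
supp_tests <- lapply(by_dose, function(d) {
  t.test(len ~ supp, data = d, var.equal = FALSE)
})

# Collect the 95% interval bounds and the p-value, one row per dose.
supp_summary <- t(sapply(supp_tests, function(tt) {
  c(lower = tt$conf.int[1], upper = tt$conf.int[2], p.value = tt$p.value)
}))
supp_summary
```

A row whose interval excludes zero corresponds to a p-value below 0.05, which matches the reject or fail-to-reject decisions made above.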

(II) : Effect of different dosages on tooth growth for a given supplement

Now let’s turn to the other side of the story: for a given supplement, we can form hypotheses that compare adjacent dose levels. Let’s take them one by one.

[H1]: The difference in mean tooth length (MD) between dose A and dose B for supplement OJ, where MD is the mean at dose A minus the mean at dose B.

\(H_0 : MD = 0\)

\(H_a : MD \neq 0\)

To test the null hypothesis against the alternative, we run a two-sided Welch t-test on the two OJ dose groups.

#Dose A = 0.5 and B = 1.0 for Supplement OJ. 

OJ_A <- ToothGrowth[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 0.5, ]
OJ_B <- ToothGrowth[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 1.0, ]
OJ_C <- ToothGrowth[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 2.0, ]

OJ_A_B = t.test(OJ_A$len, OJ_B$len, paired=FALSE, var.equal=FALSE)

Now, let’s look at the confidence interval and the mean estimates.

OJ_A_B$conf.int
## [1] -13.415634  -5.524366
## attr(,"conf.level")
## [1] 0.95
OJ_A_B$estimate
## mean of x mean of y 
##     13.23     22.70

Since the 95% confidence interval does not contain zero, we reject the null hypothesis at the 5% level. The mean estimates (13.23 at dose 0.5 versus 22.70 at dose 1.0) show that, for OJ, tooth length is significantly greater at the 1.0 dose level than at the 0.5 dose level.
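
Because each within-supplement comparison follows the same pattern, it can be wrapped in a small helper. The function compare_doses below is a hypothetical convenience written for this sketch, not something provided by R or used in the original analysis.

```r
data(ToothGrowth)

# Hypothetical helper: Welch t-test of tooth length between two doses
# of a single supplement (x = lower dose, y = higher dose).
compare_doses <- function(supplement, dose_low, dose_high, data = ToothGrowth) {
  low  <- data$len[data$supp == supplement & data$dose == dose_low]
  high <- data$len[data$supp == supplement & data$dose == dose_high]
  t.test(low, high, paired = FALSE, var.equal = FALSE)
}

# Example: OJ at dose 0.5 versus OJ at dose 1.0.
oj_a_b <- compare_doses("OJ", 0.5, 1.0)
oj_a_b$conf.int
oj_a_b$p.value
```

Calling compare_doses("VC", 1.0, 2.0) and the like covers the remaining comparisons in this section with the same settings.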

[H2]: The difference in mean tooth length (MD) between dose B and dose C for supplement OJ, where MD is the mean at dose B minus the mean at dose C.

\(H_0 : MD = 0\)

\(H_a : MD \neq 0\)

To test the null hypothesis against the alternative, we run a two-sided Welch t-test on the two OJ dose groups.

#Dose B = 1.0 and C = 2.0 for Supplement OJ.

OJ_B_C = t.test(OJ_B$len, OJ_C$len, paired=FALSE, var.equal=FALSE)

Now, let’s look at the confidence interval and the mean estimates.

OJ_B_C$conf.int
## [1] -6.5314425 -0.1885575
## attr(,"conf.level")
## [1] 0.95
OJ_B_C$estimate
## mean of x mean of y 
##     22.70     26.06

Since the 95% confidence interval does not contain zero, we reject the null hypothesis at the 5% level. The mean estimates (22.70 at dose 1.0 versus 26.06 at dose 2.0) show that, for OJ, tooth length is significantly greater at the 2.0 dose level than at the 1.0 dose level, although the upper end of the interval (about -0.19) is close to zero, so the evidence is weaker than in the previous comparison. One more observation from the mean estimates: the increase in mean tooth length is larger when the OJ dose rises from 0.5 to 1.0 (9.47) than when it rises from 1.0 to 2.0 (3.36).

[H3]: The difference in mean tooth length (MD) between dose A and dose B for supplement VC, where MD is the mean at dose A minus the mean at dose B.

\(H_0 : MD = 0\)

\(H_a : MD \neq 0\)

To test the null hypothesis against the alternative, we run a two-sided Welch t-test on the two VC dose groups.

#Dose A = 0.5 and B = 1.0 for Supplement VC.

VC_A <- ToothGrowth[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 0.5, ]
VC_B <- ToothGrowth[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 1.0, ]
VC_C <- ToothGrowth[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 2.0, ]
VC_A_B = t.test(VC_A$len, VC_B$len, paired=FALSE, var.equal=FALSE)

Now, let’s look at the confidence interval and the mean estimates.

VC_A_B$conf.int
## [1] -11.265712  -6.314288
## attr(,"conf.level")
## [1] 0.95
VC_A_B$estimate
## mean of x mean of y 
##      7.98     16.77

Since the 95% confidence interval does not contain zero, we reject the null hypothesis at the 5% level. The mean estimates (7.98 at dose 0.5 versus 16.77 at dose 1.0) show that, for VC, tooth length is significantly greater at the 1.0 dose level than at the 0.5 dose level.

[H4]: The difference in mean tooth length (MD) between dose B and dose C for supplement VC, where MD is the mean at dose B minus the mean at dose C.

\(H_0 : MD = 0\)

\(H_a : MD \neq 0\)

To test the null hypothesis against the alternative, we run a two-sided Welch t-test on the two VC dose groups.

#Dose B = 1.0 and C = 2.0 for Supplement VC.

VC_B_C = t.test(VC_B$len, VC_C$len, paired=FALSE, var.equal=FALSE)

Now, let’s look at the confidence interval and the mean estimates.

VC_B_C$conf.int
## [1] -13.054267  -5.685733
## attr(,"conf.level")
## [1] 0.95
VC_B_C$estimate
## mean of x mean of y 
##     16.77     26.14

Since the 95% confidence interval does not contain zero, we reject the null hypothesis at the 5% level. The mean estimates (16.77 at dose 1.0 versus 26.14 at dose 2.0) show that, for VC, tooth length is significantly greater at the 2.0 dose level than at the 1.0 dose level. One more observation from the mean estimates: the increase in mean tooth length is slightly larger when the VC dose rises from 1.0 to 2.0 (9.37) than when it rises from 0.5 to 1.0 (8.79). This is the reverse of the pattern seen with OJ, although the gap between the two VC increments is small and has not been tested.
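
The dose-to-dose increments quoted for both supplements can be reproduced directly from the cell means. The snippet below is a minimal sketch; the names means and increments are illustrative only.

```r
data(ToothGrowth)

# 2 x 3 matrix of mean tooth length: rows are supplements, columns are doses.
means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))

# Increase in mean length from 0.5 to 1 (column "1") and from 1 to 2 (column "2").
increments <- means[, c("1", "2")] - means[, c("0.5", "1")]
increments
```

These are differences of sample means only; they describe the pattern in this sample and do not by themselves test whether the two increments differ.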

Concluding notes