Overview

This report analyzes the ToothGrowth data in the R datasets package, comparing tooth length across supplement types and dose levels.

library(knitr)
# Setting the global options.
opts_chunk$set(fig.width=6, fig.height=4, warning=FALSE)

Basic Exploratory Data Analyses of ToothGrowth Dataset

The dataset has the following columns:

[1] len numeric Tooth length
[2] supp factor Supplement type
[3] dose numeric Dose in milligrams/day

library(datasets)
data(ToothGrowth)

# Print the first few rows of the data frame
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
# Let's check for outliers and the general tooth length at each dosage.
# One simple way to show the relationships between multiple
# variables is to use multiple box plots.

# Let's use 2 crossed factors (supp and dose) to group the dataset,
# with colour for ease of interpretation.
boxplot(len~supp*dose, data=ToothGrowth, 
        notch=FALSE, 
        col = (c("blue","red")),
        main = "Tooth Growth Dataset", 
        xlab = "Supplement and Dose",
        ylab = "Length")

suppressMessages(library(dplyr))
ToothGrowth %>% group_by(supp, dose) %>% summarise(len = mean(len))
## Source: local data frame [6 x 3]
## Groups: supp
## 
##   supp dose   len
## 1   OJ  0.5 13.23
## 2   OJ  1.0 22.70
## 3   OJ  2.0 26.06
## 4   VC  0.5  7.98
## 5   VC  1.0 16.77
## 6   VC  2.0 26.14

Observations

The boxplot flags a single point as a possible outlier (a VC observation of 22.5 at the 1.0 dose); the other groups show none. Mean tooth length is higher with the OJ supplement than with VC at the 0.5 and 1.0 doses, while at the 2.0 dose the two means are almost equal (26.06 vs. 26.14). For both supplements, mean tooth length increases with dose.
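The outlier reading of the boxplot can be cross-checked numerically. The sketch below assumes the default boxplot rule (points more than 1.5 times the interquartile range beyond the hinges) and uses boxplot.stats, the function boxplot itself relies on. The name outliers_by_group is hypothetical.

```r
library(datasets)
data(ToothGrowth)

# Hypothetical helper object: the values flagged as outliers in each
# supp-by-dose group under the default 1.5 * IQR rule.
outliers_by_group <- lapply(
    split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose)),
    function(x) boxplot.stats(x)$out
)

# Number of flagged points per group (names look like "OJ.0.5", "VC.1", ...).
sapply(outliers_by_group, length)
```

Any group with a non-zero count corresponds to a point drawn beyond the whiskers in the plot.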

Provide a basic summary of the data.

The boxplot and the table of group means above highlight the groups we are comparing.

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
unique(ToothGrowth$supp)
## [1] VC OJ
## Levels: OJ VC
unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0
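Since the later comparisons are between group means, it may help to see the size, mean, and standard deviation of each group side by side. This is a base-R sketch using aggregate; the column names n, mean, and sd are chosen here for illustration.

```r
data(ToothGrowth)

# Summarise len within each supp-by-dose group.
group_summary <- aggregate(
    len ~ supp + dose,
    data = ToothGrowth,
    FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

# aggregate() stores the three statistics as a single matrix column;
# flatten it so the result is an ordinary data frame.
group_summary <- do.call(data.frame, group_summary)
names(group_summary) <- c("supp", "dose", "n", "mean", "sd")
group_summary
```

The mean column should match the dplyr table of group means shown earlier.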

Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

T-Test using ToothGrowth Dosage Measure

Now let’s do a t-interval comparing doses. We’ll show the two intervals, one assuming that the variances are equal and one assuming otherwise.

# Keep only the 0.5 and 1.0 mg/day groups for the first pairwise comparison.
TGDose0p5 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
# Row 1: equal-variance interval; row 2: Welch (unequal-variance) interval.
rbind(
    t.test(len ~ dose, paired = FALSE, var.equal = TRUE, data = TGDose0p5)$conf.int,
    t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = TGDose0p5)$conf.int
    )
##           [,1]      [,2]
## [1,] -11.98375 -6.276252
## [2,] -11.98378 -6.276219
t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = TGDose0p5)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735
# Compare the 1.0 and 2.0 mg/day groups.
TGDose1p0 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
rbind(
    t.test(len ~ dose, paired = FALSE, var.equal = TRUE, data = TGDose1p0)$conf.int,
    t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = TGDose1p0)$conf.int
    )
##           [,1]      [,2]
## [1,] -8.994387 -3.735613
## [2,] -8.996481 -3.733519
t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = TGDose1p0)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100
# Compare the 0.5 and 2.0 mg/day groups.
TGDose2p0 <- subset(ToothGrowth, dose %in% c(2.0, 0.5))
rbind(
    t.test(len ~ dose, paired = FALSE, var.equal = TRUE, data = TGDose2p0)$conf.int,
    t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = TGDose2p0)$conf.int
    )
##           [,1]      [,2]
## [1,] -18.15352 -12.83648
## [2,] -18.15617 -12.83383
t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = TGDose2p0)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100
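To make the Welch interval less of a black box, the sketch below rebuilds the 0.5 versus 1.0 mg/day interval from its ingredients: the difference in sample means, its standard error, and the Welch-Satterthwaite degrees of freedom. The variable names are illustrative, and the result should agree with the interval reported by t.test with var.equal = FALSE.

```r
data(ToothGrowth)

x <- ToothGrowth$len[ToothGrowth$dose == 0.5]
y <- ToothGrowth$len[ToothGrowth$dose == 1.0]

# Per-group variance of the sample mean.
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Standard error of the difference in means (no equal-variance assumption).
se <- sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom.
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# 95% interval: estimate plus or minus the t quantile times the standard error.
welch_ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df) * se
welch_ci
```

The equal-variance interval differs only in using a pooled variance and length(x) + length(y) - 2 degrees of freedom, which is why the two rows of each rbind output are so close when the group variances are similar.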

T-Test using Supplement Type

# Row 1: equal-variance interval; row 2: Welch (unequal-variance) interval.
rbind(
    t.test(len ~ supp, paired = FALSE, var.equal = TRUE, data = ToothGrowth)$conf.int,
    t.test(len ~ supp, paired = FALSE, var.equal = FALSE, data = ToothGrowth)$conf.int
    )
##            [,1]     [,2]
## [1,] -0.1670064 7.567006
## [2,] -0.1710156 7.571016
t.test(len ~ supp, paired = FALSE, var.equal = FALSE, data = ToothGrowth)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

If we repeated the experiment many times on independent samples, about 95% of the intervals obtained would contain the true difference in means being estimated. In the results above, all three dose intervals exclude zero, whereas the supplement interval (-0.171 to 7.571) includes it.
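The coverage statement can be illustrated with a small simulation. The sketch below assumes normally distributed data with made-up parameters (group means 10 and 19, standard deviation 4.5, 20 observations per group). These values are assumptions chosen for illustration, not estimates from ToothGrowth.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

true_diff <- 10 - 19   # assumed true difference in means
n_per_group <- 20
n_sims <- 2000

# For each simulated experiment, record whether the Welch 95% interval
# contains the true difference.
covered <- replicate(n_sims, {
    x <- rnorm(n_per_group, mean = 10, sd = 4.5)
    y <- rnorm(n_per_group, mean = 19, sd = 4.5)
    ci <- t.test(x, y, paired = FALSE, var.equal = FALSE)$conf.int
    ci[1] <= true_diff && true_diff <= ci[2]
})

# Proportion of simulated intervals that contain the true difference.
mean(covered)
```

The proportion should land close to 0.95, with some run-to-run variation that depends on the seed and the number of simulations.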

Conclusions and the Assumptions

Assumption(s)

  1. The basic assumption is that the sample is representative of the population, with observations independent between the groups being compared (the tests use paired = FALSE).

Conclusion(s)

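  • In the hypothesis tests comparing doses, all three p-values are very small (the largest is 1.906e-05), so we reject the null hypothesis that mean tooth length is the same at the two doses being compared. A p-value is computed under the assumption that the null hypothesis is true, so a very small value means the observed difference would be very unlikely if dose had no effect.
  • Higher doses are associated with longer teeth: every pairwise 95% confidence interval for the difference in mean length between doses excludes zero.
  • Pooled over all doses, the difference between OJ and VC is not significant at the 5% level (p-value = 0.06063, and the interval from -0.171 to 7.571 contains zero). The group means suggest that OJ has more effect than VC at 0.5 and 1.0 mg/day and that the two supplement types perform about equally at 2.0 mg/day (26.06 vs. 26.14), although this reading comes from the group means rather than from a per-dose test.
  • For the t-tests on dosage, the two intervals (one assuming equal variances and one not) are nearly identical, so the conclusions do not depend on that assumption.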
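The per-dose comparison of supplement types rests on group means. A sketch of how it could be tested directly is to run a Welch t-test of len on supp within each dose level; supp_by_dose is a hypothetical name.

```r
data(ToothGrowth)

# One Welch t-test of len by supp for each dose level.
supp_by_dose <- lapply(
    split(ToothGrowth, ToothGrowth$dose),
    function(d) {
        tt <- t.test(len ~ supp, paired = FALSE, var.equal = FALSE, data = d)
        c(lower = tt$conf.int[1], upper = tt$conf.int[2], p.value = tt$p.value)
    }
)

# One row per dose: 95% interval for mean(OJ) - mean(VC) and the p-value.
supp_result <- do.call(rbind, supp_by_dose)
supp_result
```

A row whose interval excludes zero supports a supplement difference at that dose; with only 10 observations per group and three tests, the results are best read as indicative.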