An HTML version is available at RPubs: http://rpubs.com/svicente99/Inference_Peer_Assesment_2


Synopsis

In this project we analyse the ToothGrowth data from the R datasets package. After a quick exploratory analysis of the data, we use confidence intervals and/or hypothesis tests to compare tooth growth by supplement and dosage. Lastly, some conclusions are drawn.

Introduction

The data object of this study

The data analysed are the ToothGrowth data provided as part of the R {datasets} package. Before beginning the analysis, it is necessary to clarify the nature of the data, because the information provided in R is in places misleading or incorrect. The data consist of measurements of the mean length of the odontoblast cells harvested from the incisor teeth of 60 guinea pigs. These animals were divided into 6 groups of 10, and each group was fed a diet with one of 6 Vitamin C supplement regimes for 42 days. The Vitamin C was administered either as Orange Juice (OJ) or as chemically pure Vitamin C (VC) in aqueous solution, and each animal received the same daily dose throughout (0.5, 1.0 or 2.0 milligrams). With 10 animals for each combination of supplement type and dosage, the study required 60 animals in total. After 42 days, the animals were euthanized, and their incisor teeth were harvested and examined by optical microscopy to determine the length (in microns) of the odontoblast cells (the layer between the pulp and the dentine).

Main Reference
Follow this link in JN - The Journal of Nutrition.

Fig.1

Nomenclature

The ToothGrowth data set consists of 60 observations of 3 variables:
  1. len => mean length of odontoblasts (in microns);
  2. supp => supplement type (OJ or VC);
  3. dose => dosage of vitamin C (in milligrams/day).

Data Processing

Loading ToothGrowth data to be processed:

library(datasets)
data(ToothGrowth)
TG <- ToothGrowth
head(TG)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

We have 60 rows in total, one for each animal used in this study. A brief summary of the data:

summary(TG)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
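
The balanced design described in the introduction (6 groups of 10 animals) can be checked directly against the data. The following is a minimal sketch; the name `design` is only illustrative and does not appear in the original analysis.

```r
library(datasets)
TG <- ToothGrowth

# Cross-tabulate supplement type (rows) against dose (columns)
design <- table(TG$supp, TG$dose)
print(design)

# TRUE if every supplement/dose combination holds exactly 10 animals
all(design == 10)
```

If any cell differed from 10, the per-dose subsets used later would no longer split evenly between OJ and VC.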

Exploratory Data Analysis

boxplot(len ~ supp * dose, data=TG, ylab="Tooth Length", main="Tooth Growth - BOX PLOTS")
box("outer", col="maroon", lwd=3) 

Looking at these boxplots, we note that tooth length tends to increase with the dosage of vitamin C (from 0.5 to 2.0 mg), whether it is supplied as Orange Juice (OJ) or as synthetic ascorbic acid (VC).

However, the difference between the supplementation methods is visible only at the two lower dosages (0.5 and 1.0 mg). At the highest dosage (2.0 mg) it is almost imperceptible: both groups sit at roughly 26 microns, which suggests that the delivery method has no effect at that dose. This presumption ought to be validated later by a hypothesis test.
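
The visual reading of the boxplots can be cross-checked numerically. The sketch below computes the mean length for each supplement/dose combination with `aggregate()`; the name `group_means` is an illustrative choice, and the table is a descriptive aid rather than part of the original report.

```r
library(datasets)
TG <- ToothGrowth

# Mean odontoblast length for every supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = TG, FUN = mean)
print(group_means)
```

Comparing the OJ and VC rows within each dose shows how large the gap between delivery methods is at 0.5, 1.0 and 2.0 mg/day.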


Statistical Inference

Subsetting observations

TG_0.5 <- TG[ TG$dose==0.5, ]
TG_1.0 <- TG[ TG$dose==1.0, ]
TG_2.0 <- TG[ TG$dose==2.0, ]

Thus, we obtain three subsets:
    1. TG_0.5 => 20 animals that received 0.5 mg/day of vitamin C;
    2. TG_1.0 => 20 animals that received 1.0 mg/day; and
    3. TG_2.0 => 20 animals that received 2.0 mg/day.

Premises

  • All animals used in this study are considered equivalent in weight, age and feeding.
  • The samples are unpaired and their variances are not assumed to be equal.

T-tests

Now we can perform hypothesis tests within each of these subsets, comparing the ‘len’ variable between the two supplement types.

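## The formula interface already treats the two groups as unpaired, so 'paired'
## is omitted (recent R versions raise an error if it is passed here).
## var.equal = FALSE requests Welch's unequal-variance t-test.
Ttest_0.5 <- t.test(len ~ supp, var.equal = FALSE, data = TG_0.5)
Ttest_1.0 <- t.test(len ~ supp, var.equal = FALSE, data = TG_1.0)
Ttest_2.0 <- t.test(len ~ supp, var.equal = FALSE, data = TG_2.0)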
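
To make explicit what the unequal-variance (Welch) test computes, the statistic for the 0.5 mg/day subset can be reproduced by hand and compared with `t.test()`. This is a sketch for illustration; the names `oj`, `vc`, `se2_oj`, `se2_vc`, `df_welch`, `p_manual` and `p_builtin` are introduced here and are not part of the original analysis.

```r
library(datasets)
TG <- ToothGrowth
d <- TG[TG$dose == 0.5, ]

oj <- d$len[d$supp == "OJ"]
vc <- d$len[d$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_stat), df_welch)

# Same test through the built-in function
p_builtin <- t.test(len ~ supp, var.equal = FALSE, data = d)$p.value
all.equal(p_manual, p_builtin)
```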

The associated p-values are:

pValue0.5 <- Ttest_0.5$p.value  ## 0.5 mg/day
pValue1.0 <- Ttest_1.0$p.value  ## 1.0 mg/day
pValue2.0 <- Ttest_2.0$p.value  ## 2.0 mg/day
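
Each test object also carries a 95% confidence interval for the difference in means (OJ minus VC), which complements the p-values. The sketch below recomputes the three tests in one pass so that it runs on its own; `tests` and `ci_table` are illustrative names, not objects from the original report.

```r
library(datasets)
TG <- ToothGrowth

# Welch two-sample t-test (OJ vs. VC) within each dose level
tests <- lapply(split(TG, TG$dose), function(d) {
  t.test(len ~ supp, var.equal = FALSE, data = d)
})

# 95% confidence interval for mean(OJ) - mean(VC) at each dose
ci_table <- data.frame(
  dose     = as.numeric(names(tests)),
  ci_lower = sapply(tests, function(tt) tt$conf.int[1]),
  ci_upper = sapply(tests, function(tt) tt$conf.int[2])
)
print(ci_table, row.names = FALSE)
```

An interval that excludes zero corresponds to rejecting the null hypothesis at the 5% level, while an interval that contains zero corresponds to not rejecting it.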

Decision about these tests:

Null hypothesis (H0):
no difference between treatments (OJ, orange juice, vs. VC, synthetic vitamin C),
i.e. a bilateral (two-sided) test.

Thus, we may decide:
    1. Dose=0.5: p-value = 0.0063586 < 0.05 (5%) --> strong evidence against the null hypothesis,
      i.e. there is a difference between treatments OJ x VC [H0 REJECTED].

    2. Dose=1.0: p-value = 0.0010384 < 0.05 (5%) --> strong evidence against the null hypothesis,
      i.e. there is a difference between treatments OJ x VC [H0 REJECTED].

    3. Dose=2.0: p-value = 0.9638516 > 0.05 (5%) --> little evidence against the null hypothesis,
      i.e. there is no significant difference between treatments OJ x VC [H0 NOT REJECTED].
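
The decision rule applied above (reject H0 when the p-value falls below the 5% level) can be written out as a small table. This is a sketch under the same assumptions as the tests in this section; `alpha`, `p_values` and `decisions` are names introduced for illustration.

```r
library(datasets)
TG <- ToothGrowth
alpha <- 0.05  # significance level used in the text

# p-value of the Welch test (OJ vs. VC) at each dose
p_values <- sapply(split(TG, TG$dose), function(d) {
  t.test(len ~ supp, var.equal = FALSE, data = d)$p.value
})

# Apply the rule: reject H0 when p-value < alpha
decisions <- data.frame(
  dose     = names(p_values),
  p_value  = signif(p_values, 5),
  decision = ifelse(p_values < alpha, "reject H0", "do not reject H0")
)
print(decisions, row.names = FALSE)
```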

Conclusion

  • The graphs and values shown above lead us to conclude that tooth growth increases as the dose increases (from 0.5 to 2.0 mg/day).
  • Judging by the boxplots and the test results, and under the premise that the animals are otherwise equivalent, dose is the factor that influences growth most strongly. Supplement type and dosage do not act independently, however: the difference between OJ and VC is significant at 0.5 and 1.0 mg/day but not at 2.0 mg/day.
  • The OJ and VC delivery methods show no significant difference at the highest dosage of vitamin C (2.0 mg/day).
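
The tests in this report compare supplements within each dose, so the dose effect itself rests on the boxplots. As a hedged supplement, one way to check it is a Welch t-test between the lowest and highest doses, pooling both supplements. This comparison is an assumption added for illustration and was not part of the original analysis; `low_high` and `dose_test` are illustrative names.

```r
library(datasets)
TG <- ToothGrowth

# Keep only the lowest and highest doses, pooling OJ and VC
low_high <- TG[TG$dose %in% c(0.5, 2.0), ]

# Welch t-test of length between the two dose levels
dose_test <- t.test(len ~ dose, var.equal = FALSE, data = low_high)

dose_test$p.value   # two-sided p-value
dose_test$conf.int  # 95% CI for mean(dose 0.5) - mean(dose 2.0)
```

A confidence interval lying entirely below zero indicates that mean length at 0.5 mg/day is lower than at 2.0 mg/day.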