This report analyzes the ToothGrowth data in the R data sets package. The data is the result of measuring the affect of different dosage amounts of Vitamin C on the length of odontoblasts (teeth) of ten guinea pigs. There are two supplement types of Vitamin C tested, Orange Juice and Ascorbic Acid, and they are given in three three different milligram dosage amount, 0.5, 1.0, and 2.0. The report goes through the process of cleansing the dataset, conducting exploratory analysis, and a completing statistical inference around different categorizations to the length of the teeth.
Source code for this entire report can be found here:
http://github.com/mjfii/Statistical-Inference
More information regarding the source dataset can be found here:
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ToothGrowth.html
From the source, we will load the ToothGrowth data into a data.table object, change the column names to something more meaningful, and declare a join key. In order to make categorizing a little more simple, we will add an additional column for Dosage by converting the 0.5 dose to ‘SM’, the 1.0 to ‘MD’, and the 2.0 to ‘LG’. A single observation is shown below.
# load data and make column names meaningful
dt<-data.table(ToothGrowth)
setnames(dt,c('len','supp','dose'),c('Length','Supplement','Dose'))
# add 'Dosage'and set the join key
dt<-dt[,Dosage:=sapply(as.character(dt$Dose),function(x) as.factor(switch(x,'0.5'='SM','1'='MD','2'='LG')))]
setkey(dt,Supplement,Dosage)
head(dt,1)
## Length Supplement Dose Dosage
## 1: 15.2 OJ 0.5 SM
The following result sets are two a simple exploratory methods to understand the content and the structure of the data.table that we will continue to analyze in later sections of the report.
summary(dt)
## Length Supplement Dose Dosage
## Min. : 4.20 OJ:30 Min. :0.500 SM:20
## 1st Qu.:13.07 VC:30 1st Qu.:0.500 MD:20
## Median :19.25 Median :1.000 LG:20
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
str(dt)
## Classes 'data.table' and 'data.frame': 60 obs. of 4 variables:
## $ Length : num 15.2 21.5 17.6 9.7 14.5 10 8.2 9.4 16.5 9.7 ...
## $ Supplement: Factor w/ 2 levels "OJ","VC": 1 1 1 1 1 1 1 1 1 1 ...
## $ Dose : num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
## $ Dosage : Factor w/ 3 levels "SM","MD","LG": 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, ".internal.selfref")=<externalptr>
## - attr(*, "sorted")= chr "Supplement" "Dosage"
To further conduct the exploratory analysis, we can plot Length against both Dosage and Supplement. When we do this we see that the larger the Dosage, the longer the tooth Length. However, it is slightly unclear as to which supplement is more effective, Orange Juice OJ or Ascorbic Acid VC.
In order to understand Vitamin C’s affect on tooth growth, we will conduct the following confidence interval testing scenarios:
For each of the comparisons, we will subset dt appropriately and utilize the t.test R function to determine each scenarios confidence interval, subset means, and p-value.
t1<-subset(dt,Dosage=='SM')$Length
t2<-subset(dt,Dosage=='MD')$Length
t<-t.test(t1,t2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]
## [1] -11.983781 -6.276219
If we increase the Vitamin C dose from 0.5 to 1.0 milligrams, the confidence interval does not contain zero, so we can reject the null hypothesis that this dose increase does not increase tooth length.
t1<-subset(dt,Dosage=='MD')$Length
t2<-subset(dt,Dosage=='LG')$Length
t<-t.test(t1,t2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]
## [1] -8.996481 -3.733519
Next, if we increase the Vitamin C dose from 1.0 to 2.0 milligrams, the confidence interval againg does not contain zero, so we can reject the null hypothesis that this dose increase does not increase tooth length.
In both of these scenarios, an increased dose amount leads to an increased tooth length.
t1<-subset(dt,Supplement=='VC')$Length
t2<-subset(dt,Supplement=='OJ')$Length
t<-t.test(t1,t2,paired=FALSE,var.equal=FALSE)
t$p.value
## [1] 0.06063451
t$conf.int[1:2]
## [1] -7.5710156 0.1710156
In this single comparison, the p-value is 0.061 and the confidence interval contains zero; so, here we do not reject the null hypothesis and conclude that the type of Vitamin C supplement alone does not affect tooth growth.
t1<-subset(dt,Supplement=='VC' & Dosage=='SM')$Length
t2<-subset(dt,Supplement=='OJ' & Dosage=='SM')$Length
t<-t.test(t1,t2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]
## [1] -8.780943 -1.719057
When we continue the analysis, and compare a ‘SM’ dosage of Ascorbic Acid to a’SM’ dosage of Orange Juice, we see the confidence interval does not contain zero, so we can reject the null hypothesis that supplement type with a ‘SM’ dosage does not affect tooth growth.
t1<-subset(dt,Supplement=='VC' & Dosage=='MD')$Length
t2<-subset(dt,Supplement=='OJ' & Dosage=='MD')$Length
t<-t.test(t1,t2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]
## [1] -9.057852 -2.802148
Next, we compare a ‘MD’ dosage of Ascorbic Acid to a ‘MD’ dosage of Orange Juice, and, again, we see the confidence interval does not contain zero; so, we can reject the null hypothesis that supplement type with a ‘MD’ dosage does not affect tooth growth.
t1<-subset(dt,Supplement=='VC' & Dosage=='LG')$Length
t2<-subset(dt,Supplement=='OJ' & Dosage=='LG')$Length
t<-t.test(t1,t2,paired=FALSE,var.equal=FALSE)
t$p.value
## [1] 0.9638516
t$conf.int[1:2]
## [1] -3.63807 3.79807
Lastly, we compare a ‘LG’ dosage of Ascorbic Acid to a ‘LG’ dosage of Orange Juice; this time, however, we observer the confidence interval contains zero and there is a p-value of almost 1.0. In turn, we do not reject the null hypothesis that supplement type with a ‘LG’ dosage does not affect tooth growth. Meaning, with a ‘LG’ Dosage, we cannot conclude which supplement type has a greater affect on tooth growth.