Josh Katz 02.06.16

Overview

The goal of Part 2 of this project is to analyze and draw inferences from the R dataset ToothGrowth. First, the dataset's attributes are described with numerical and graphical techniques. Next, stated assumptions and statistical techniques (confidence intervals and t-tests) are used to make inferences about the population. Finally, a conclusion section summarizes the insights derived from those inferences.

Data Loading and Numerical Analysis

The code below loads and inspects the data:

data(ToothGrowth)   # load the ToothGrowth dataset
?ToothGrowth        # dataset background and attribute definitions
summary(ToothGrowth)
str(ToothGrowth)
head(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

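The values of all three attributes are consistent with the descriptions in ?ToothGrowth: 60 observations of tooth length (len), supplement type (supp, 30 each of OJ and VC), and dose (0.5, 1, or 2 mg/day).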
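A quick cross-tabulation confirms that the design is balanced. This is a minimal sketch; `tg` and `counts` are illustrative names, and `tg` is a local copy taken from `datasets::ToothGrowth` so that the ToothGrowth object modified later in this report is left untouched.

```r
# Illustrative local copy (hypothetical name tg) of the original dataset
tg <- datasets::ToothGrowth

# Count observations for each supplement type (rows) and dose (columns)
counts <- table(tg$supp, tg$dose)
counts
```

Every supp/dose cell should hold 10 guinea pigs, so the pooled dose comparisons below use 20 animals per group and the within-dose supplement comparisons use 10.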

Graphical Analysis

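library(ggplot2)
# Boxplots of tooth length for each supplement type
ggplot(ToothGrowth, aes(x = supp, y = len, fill = supp)) +
  geom_boxplot() +
  labs(title = "Growth of Guinea Pig Teeth by Supplement Type\n(Orange Juice or Ascorbic Acid)",
       x = "Supplement type", y = "Tooth length") +
  scale_fill_brewer(palette = "Greens") +
  theme_classic()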

# Convert dose to a factor so each dose level gets its own box
ToothGrowth$dose <- factor(ToothGrowth$dose)
ggplot(ToothGrowth, aes(x = dose, y = len, fill = dose)) +
  geom_boxplot() +
  labs(title = "Growth of Guinea Pig Teeth by Dose (0.5/1/2 mg/day)",
       x = "Dose (mg/day)", y = "Tooth length") +
  scale_fill_brewer(palette = "Reds") +
  theme_classic()

# Same dose boxplots, split into one panel per supplement type
ggplot(ToothGrowth, aes(x = dose, y = len, fill = dose)) +
  geom_boxplot() +
  facet_wrap(~ supp) +
  labs(title = "Growth of Guinea Pig Teeth by Dose (0.5/1/2 mg/day)\nand Supplement Type (Orange Juice or Ascorbic Acid)",
       x = "Dose (mg/day)", y = "Tooth length") +
  scale_fill_brewer(palette = "Oranges") +
  theme_classic()
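The boxplots above show medians and quartiles. As a numeric complement, here is a sketch of a table of each group's mean and standard deviation; `tg` and `group_stats` are illustrative names, with `tg` a local copy of `datasets::ToothGrowth`.

```r
tg <- datasets::ToothGrowth

# Mean and standard deviation of len for every supplement/dose combination
group_stats <- aggregate(len ~ supp + dose, data = tg,
                         FUN = function(x) c(mean = mean(x), sd = sd(x)))
group_stats
```

The group means should match the sample estimates reported by the per-dose t-tests later in this report (for example, 13.23 for OJ and 7.98 for VC at 0.5 mg/day).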

Statistical Inference Testing

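Confidence intervals and t-tests are used below to compare guinea pig tooth growth by type of vitamin C delivery (orange juice, OJ, or ascorbic acid, VC) and by dose level (0.5, 1, or 2 mg/day). R's t.test() performs a Welch two-sample t-test by default, which does not assume equal variances in the two groups.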
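To make the Welch test concrete, the sketch below computes the statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value, and the 95% confidence interval by hand for the OJ vs. VC comparison. The names `tg`, `x`, `y`, `se2_x`, `se2_y`, `t_stat`, `df_w`, `p_val`, and `ci` are illustrative, not part of the original analysis.

```r
tg <- datasets::ToothGrowth
x <- tg$len[tg$supp == "OJ"]
y <- tg$len[tg$supp == "VC"]

# Squared standard errors of each group mean (variances are not pooled)
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch statistic: difference in means over the unpooled standard error
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df_w <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value and 95% confidence interval for mean(x) - mean(y)
p_val <- 2 * pt(-abs(t_stat), df_w)
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_w) * sqrt(se2_x + se2_y)

c(t = t_stat, df = df_w, p = p_val)
ci
```

The results should agree with the t.test() output in the next subsection (t = 1.9153, df = 55.309, p-value = 0.06063). The unpooled standard error is what distinguishes Welch's test from the classic pooled-variance t-test (t.test(..., var.equal = TRUE)).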

Tooth Growth comparison based on Vitamin C delivery

vitc <- subset(ToothGrowth, supp == "VC")  # ascorbic acid (VC) group
oj <- subset(ToothGrowth, supp == "OJ")    # orange juice (OJ) group

t.test(oj$len,vitc$len)
## 
##  Welch Two Sample t-test
## 
## data:  oj$len and vitc$len
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

Pooled across doses, guinea pig tooth length did not differ significantly between the orange juice (OJ) and ascorbic acid (VC) groups: p = 0.061 > 0.05, and the 95% confidence interval for the difference in means (OJ minus VC), -0.17 to 7.57, contains 0, the value under the null hypothesis of no difference in tooth growth.
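The same comparison can be run without creating subsets by using the formula interface of t.test(). A minimal sketch; `tg`, `res`, and `ci_contains_zero` are illustrative names.

```r
tg <- datasets::ToothGrowth

# Formula interface: compare len across the two levels of supp (OJ first, then VC)
res <- t.test(len ~ supp, data = tg)
res$estimate
res$conf.int

# Check whether the null value 0 lies inside the 95% confidence interval
ci_contains_zero <- res$conf.int[1] < 0 && res$conf.int[2] > 0
ci_contains_zero
```

Because supp has levels OJ and VC in that order, the interval is for OJ minus VC, matching t.test(oj$len, vitc$len). For a two-sided test, the 95% interval contains 0 if and only if the p-value is at least 0.05, so the two criteria used above always agree.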

T-tests comparing dose levels

# Subset the 0.5, 1, and 2 mg/day dose groups
# (dose is a factor here, so == matches its level labels)
dose_half <- subset(ToothGrowth, dose == 0.5)
dose_one <- subset(ToothGrowth, dose == 1)
dose_two <- subset(ToothGrowth, dose == 2)

t.test(dose_one$len,dose_half$len)
## 
##  Welch Two Sample t-test
## 
## data:  dose_one$len and dose_half$len
## t = 6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.276219 11.983781
## sample estimates:
## mean of x mean of y 
##    19.735    10.605
t.test(dose_two$len,dose_half$len)
## 
##  Welch Two Sample t-test
## 
## data:  dose_two$len and dose_half$len
## t = 11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.83383 18.15617
## sample estimates:
## mean of x mean of y 
##    26.100    10.605
t.test(dose_two$len,dose_one$len)
## 
##  Welch Two Sample t-test
## 
## data:  dose_two$len and dose_one$len
## t = 4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.733519 8.996481
## sample estimates:
## mean of x mean of y 
##    26.100    19.735

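All three pairwise dose comparisons were significant (p < 0.05): mean tooth length rose from 10.605 at 0.5 mg/day to 19.735 at 1 mg/day and 26.1 at 2 mg/day, and none of the 95% confidence intervals for the differences contains 0.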
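Three tests were run on the same data, which raises the chance of at least one false positive. One common safeguard, not applied in the original analysis, is a Bonferroni adjustment with base R's p.adjust(). A sketch; `tg`, `pairs`, `raw_p`, and `adj_p` are illustrative names.

```r
tg <- datasets::ToothGrowth

# All pairs of dose levels, one pair per column: (0.5, 1), (0.5, 2), (1, 2)
pairs <- combn(c(0.5, 1, 2), 2)

# Welch t-test p-value for each pair (higher dose vs. lower dose)
raw_p <- apply(pairs, 2, function(d) {
  t.test(tg$len[tg$dose == d[2]], tg$len[tg$dose == d[1]])$p.value
})
names(raw_p) <- paste(pairs[2, ], "vs", pairs[1, ])

# Bonferroni: multiply each p-value by the number of tests, capped at 1
adj_p <- p.adjust(raw_p, method = "bonferroni")
adj_p
```

Given the raw p-values shown above (all below 2e-05), all three dose comparisons remain well below 0.05 after the adjustment.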

T-tests comparing Orange Juice and Ascorbic Acid delivery at each dose

# Subset each delivery type (VC = ascorbic acid, OJ = orange juice) at each dose
vc_dose_half <- subset(ToothGrowth, supp == "VC" & dose == 0.5)
oj_dose_half <- subset(ToothGrowth, supp == "OJ" & dose == 0.5)
vc_dose_one <- subset(ToothGrowth, supp == "VC" & dose == 1)
oj_dose_one <- subset(ToothGrowth, supp == "OJ" & dose == 1)
vc_dose_two <- subset(ToothGrowth, supp == "VC" & dose == 2)
oj_dose_two <- subset(ToothGrowth, supp == "OJ" & dose == 2)

t.test(oj_dose_half$len,vc_dose_half$len)
## 
##  Welch Two Sample t-test
## 
## data:  oj_dose_half$len and vc_dose_half$len
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean of x mean of y 
##     13.23      7.98
t.test(oj_dose_one$len,vc_dose_one$len)
## 
##  Welch Two Sample t-test
## 
## data:  oj_dose_one$len and vc_dose_one$len
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean of x mean of y 
##     22.70     16.77
t.test(oj_dose_two$len,vc_dose_two$len)
## 
##  Welch Two Sample t-test
## 
## data:  oj_dose_two$len and vc_dose_two$len
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean of x mean of y 
##     26.06     26.14

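Orange juice produced significantly greater tooth length than ascorbic acid at 0.5 mg/day (p = 0.006) and 1 mg/day (p = 0.001), but not at 2 mg/day (p = 0.96), where the two group means were nearly identical (26.06 vs. 26.14).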
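The six subsets and three tests above can also be written as a loop over dose levels. A minimal sketch; `tg` and `by_dose` are illustrative names. It collects the OJ minus VC difference in means, its 95% confidence interval, and the p-value for each dose.

```r
tg <- datasets::ToothGrowth

# One Welch t-test of OJ vs. VC per dose; keep the key results as a row
by_dose <- t(sapply(c(0.5, 1, 2), function(d) {
  tt <- t.test(tg$len[tg$supp == "OJ" & tg$dose == d],
               tg$len[tg$supp == "VC" & tg$dose == d])
  c(dose  = d,
    diff  = unname(tt$estimate[1] - tt$estimate[2]),
    lower = tt$conf.int[1],
    upper = tt$conf.int[2],
    p     = tt$p.value)
}))
round(by_dose, 4)
```

Each row should reproduce the corresponding t.test() output above; for example, the 2 mg/day row has a difference of -0.08 and a p-value of about 0.96.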

Conclusion

The conclusions below rest on the following assumptions: 1) each observation was sampled without bias, 2) the observations are independent of each other (each of the 60 guinea pigs appears in only one group, so unpaired tests are appropriate), 3) tooth length within each group is approximately normally distributed, 4) the sample is less than 10% of the guinea pig population, and 5) the study was conducted double-blind. Equal variances were not assumed, so Welch two-sample t-tests were used throughout.
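Assumption 3 can be checked informally for each group. A sketch using base R's shapiro.test() on each of the six supp/dose groups; `tg` and `sw_p` are illustrative names. With only 10 observations per group the test has little power, so a large p-value is weak evidence of normality rather than proof.

```r
tg <- datasets::ToothGrowth

# Shapiro-Wilk p-value for len within each supplement (rows) x dose (columns) group
sw_p <- tapply(tg$len, list(tg$supp, tg$dose),
               function(x) shapiro.test(x)$p.value)
round(sw_p, 3)
```

Small p-values (for example, below 0.05) would flag groups where the normality assumption is questionable; normal Q-Q plots from qqnorm() are a useful visual companion.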

Pooled across delivery types, each increase in dose (0.5 to 1, 0.5 to 2, and 1 to 2 mg/day) produced significantly greater guinea pig tooth growth (p < 0.05). Orange juice produced significantly greater tooth growth than ascorbic acid at 0.5 and 1 mg/day (p < 0.05), but not at 2 mg/day (p > 0.05); pooled across doses, the difference between the two delivery types was not significant (p = 0.061).

Operating system and R version used to run the code above:

R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
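These details can be regenerated on any machine with base R's sessionInfo(), which also lists locale settings and attached packages such as ggplot2; R.version.string gives just the version line. A minimal sketch (`info` is an illustrative name):

```r
# Full environment report: R version, platform, OS, locale, and attached packages
info <- sessionInfo()
print(info)

# Just the R version line, e.g. for a report footer
R.version.string
```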