Project Description

Now in the second portion of the class, we’re going to analyze the ToothGrowth data in the R datasets package.

  1. Load the ToothGrowth data and perform some basic exploratory data analyses
  2. Provide a basic summary of the data.
  3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)
  4. State your conclusions and the assumptions needed for your conclusions.

Some criteria that you will be evaluated on

Project Simulation

The ToothGrowth dataset records tooth growth in guinea pigs that received one of three dose levels of vitamin C (0.5, 1 and 2 mg) by one of two delivery methods (orange juice or ascorbic acid).

Load the ToothGrowth data and perform some basic exploratory data analyses

Load the data, then inspect the first records (head) and the structure (str):

library(datasets)
data(ToothGrowth)
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

The data frame has 60 observations of 3 variables: len (numeric), supp (factor with levels OJ and VC) and dose (numeric).
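As a further exploratory check that is not part of the original output, the observations can be counted per supplement and dose. This is a minimal sketch using base R only; the name `design` is introduced here for illustration.

```r
library(datasets)

# Count observations in each supplement-by-dose cell
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)
```

Equal counts in every cell would indicate a balanced design, which makes the per-dose and per-supplement comparisons directly comparable in sample size.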

Provide a basic summary of the data

We use the summary command to describe the data:

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

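The ToothGrowth dataset consists of 60 rows, 30 per supplement (OJ and VC). A boxplot shows tooth length for each supplement and dose combination: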

boxplot(len ~  supp * dose, data=ToothGrowth, ylab="Tooth Length", main="Tooth Growth Data")
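The boxplot can be complemented with the mean tooth length of each group. The sketch below assumes base R's `aggregate`; the name `group.means` is illustrative and does not appear in the original analysis.

```r
library(datasets)

# Mean tooth length for each supplement/dose combination
group.means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group.means)
```

These six means are the same sample estimates that the t-tests report for each group.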

Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose

I run one Welch two-sample t-test per dose, comparing the two supplements:

1. Dose = 0.5

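# Subset to the lowest dose and compare supplements with a Welch t-test
# (unpaired is the default; recent R versions reject 'paired' in the formula interface)
lowest.dose <- ToothGrowth[ToothGrowth$dose == 0.5, ]
t.test(len ~ supp, var.equal=FALSE, data=lowest.dose)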
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

The 95 percent confidence interval (1.7190573, 8.7809427) does not contain 0, and the OJ mean (13.23) is higher than the VC mean (7.98). At this dose, tooth growth is significantly greater with OJ than with VC.
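To make explicit what the Welch test computes, the interval for dose 0.5 can be reproduced by hand from the group means, the variances and the Welch-Satterthwaite degrees of freedom. This is a sketch for illustration only; the variable names (`oj`, `vc`, `welch.df`, `manual.ci`) are not part of the original analysis.

```r
library(datasets)

# Tooth lengths at dose 0.5 for each supplement
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

# Variance of each sample mean
v.oj <- var(oj) / length(oj)
v.vc <- var(vc) / length(vc)

# Standard error of the difference in means
se <- sqrt(v.oj + v.vc)

# Welch-Satterthwaite approximation to the degrees of freedom
welch.df <- (v.oj + v.vc)^2 /
  (v.oj^2 / (length(oj) - 1) + v.vc^2 / (length(vc) - 1))

# 95% confidence interval for mean(OJ) - mean(VC)
manual.ci <- mean(oj) - mean(vc) + c(-1, 1) * qt(0.975, welch.df) * se
print(welch.df)
print(manual.ci)
```

The result should agree with the degrees of freedom and the interval printed by t.test above.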

2. Dose = 1.0

# Same comparison at the medium dose (unpaired is the default for the formula interface)
medium.dose <- ToothGrowth[ToothGrowth$dose == 1.0, ]
t.test(len ~ supp, var.equal=FALSE, data=medium.dose)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

The 95 percent confidence interval (2.8021482, 9.0578518) does not contain 0, and the OJ mean (22.70) is higher than the VC mean (16.77). At this dose, tooth growth is again significantly greater with OJ than with VC.

3. Dose = 2.0

# Same comparison at the highest dose (unpaired is the default for the formula interface)
high.dose <- ToothGrowth[ToothGrowth$dose == 2.0, ]
t.test(len ~ supp, var.equal=FALSE, data=high.dose)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

The 95 percent confidence interval (-3.7980705, 3.6380705) contains 0 and the group means are nearly identical (26.06 and 26.14), so there is no evidence that the two supplements differ at this dose.
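The three per-dose tests can also be collected into a single table, which makes the intervals and p-values easier to compare side by side. The sketch below is one possible way to do this with `sapply`; the names `doses` and `by.dose` are illustrative.

```r
library(datasets)

# One Welch test (OJ vs VC) per dose, collected into a small table
doses <- sort(unique(ToothGrowth$dose))
by.dose <- t(sapply(doses, function(d) {
  test <- t.test(len ~ supp, var.equal = FALSE,
                 data = ToothGrowth[ToothGrowth$dose == d, ])
  c(dose = d,
    lower = test$conf.int[1],
    upper = test$conf.int[2],
    p.value = test$p.value)
}))
print(by.dose)
```

Each row corresponds to one of the tests above: the interval excludes 0 at doses 0.5 and 1.0 and includes it at dose 2.0.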

Now we repeat these tests by supplement, comparing consecutive dose levels:

OJ Supplement:

OJ.lowest <- ToothGrowth[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 0.5, ]
OJ.medium <- ToothGrowth[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 1.0, ]
OJ.largest <- ToothGrowth[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 2.0, ]

t.test(OJ.lowest$len, OJ.medium$len, paired=FALSE, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  OJ.lowest$len and OJ.medium$len
## t = -5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.415634  -5.524366
## sample estimates:
## mean of x mean of y 
##     13.23     22.70

Comparison

1. Low with Medium dose for OJ

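The confidence interval (-13.415634, -5.524366) does not contain 0 and the mean is lower at the low dose (13.23 vs 22.70), so increasing the OJ dose from 0.5 to 1.0 is associated with greater tooth growth.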

t.test(OJ.medium$len, OJ.largest$len, paired=FALSE, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  OJ.medium$len and OJ.largest$len
## t = -2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.5314425 -0.1885575
## sample estimates:
## mean of x mean of y 
##     22.70     26.06

2. Medium with High dose for OJ

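The confidence interval (-6.5314425, -0.1885575) does not contain 0 and the mean is lower at the medium dose (22.70 vs 26.06), so increasing the OJ dose from 1.0 to 2.0 is associated with greater tooth growth, although the evidence is weaker here (p-value = 0.0392).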

VC Supplement:

VC.lowest <- ToothGrowth[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 0.5, ]
VC.medium <- ToothGrowth[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 1.0, ]
VC.largest <- ToothGrowth[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 2.0, ]

t.test(VC.lowest$len, VC.medium$len, paired=FALSE, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  VC.lowest$len and VC.medium$len
## t = -7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.265712  -6.314288
## sample estimates:
## mean of x mean of y 
##      7.98     16.77

3. Low with Medium dose for VC

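The confidence interval (-11.265712, -6.314288) does not contain 0 and the mean is lower at the low dose (7.98 vs 16.77), so increasing the VC dose from 0.5 to 1.0 is associated with greater tooth growth.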

t.test(VC.medium$len, VC.largest$len, paired=FALSE, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  VC.medium$len and VC.largest$len
## t = -5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.054267  -5.685733
## sample estimates:
## mean of x mean of y 
##     16.77     26.14

4. Medium with High dose for VC

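The confidence interval (-13.054267, -5.685733) does not contain 0 and the mean is lower at the medium dose (16.77 vs 26.14), so increasing the VC dose from 1.0 to 2.0 is associated with greater tooth growth.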
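The four dose comparisons repeat the same steps, so they could also be expressed with a small helper. The function `dose.ci` and the table `dose.cis` below are hypothetical names introduced for this sketch; they are not part of the original analysis.

```r
library(datasets)

# Hypothetical helper: Welch 95% CI for mean(len at `low`) - mean(len at `high`)
# within a single supplement
dose.ci <- function(supplement, low, high) {
  rows <- ToothGrowth$supp == supplement
  x <- ToothGrowth$len[rows & ToothGrowth$dose == low]
  y <- ToothGrowth$len[rows & ToothGrowth$dose == high]
  as.numeric(t.test(x, y, paired = FALSE, var.equal = FALSE)$conf.int)
}

# One row per comparison made in this section
dose.cis <- rbind(
  OJ.low.vs.medium  = dose.ci("OJ", 0.5, 1.0),
  OJ.medium.vs.high = dose.ci("OJ", 1.0, 2.0),
  VC.low.vs.medium  = dose.ci("VC", 0.5, 1.0),
  VC.medium.vs.high = dose.ci("VC", 1.0, 2.0)
)
colnames(dose.cis) <- c("lower", "upper")
print(dose.cis)
```

An interval that lies entirely below 0 means the lower dose has the smaller mean tooth length in that comparison.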

Conclusions

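Summarizing the previous analysis:

  1. OJ supplements produce greater tooth growth than VC at doses 0.5 and 1.0.
  2. There is no evidence of a difference between supplements at dose 2.0.
  3. For the same supplement, larger doses increase tooth growth.

We have used the following assumptions:

  1. The variances of the compared groups are not assumed to be equal (Welch's t-test).
  2. The data are not paired: each guinea pig received a single supplement and dose, so the groups are treated as independent samples.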