Part 2: Basic Inferential Data Analysis Instructions

Now in the second portion of the project, we’re going to analyze the ToothGrowth data in the R datasets package.

Part 2.1: Load the ToothGrowth data and perform some basic exploratory data analyses

data("ToothGrowth")
dim(ToothGrowth)
## [1] 60  3
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
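
Before summarizing, it can help to confirm how the 60 observations are spread across the design. The following is a minimal sketch using base R only; the name `cell_counts` is introduced here for illustration and is not part of the original analysis.

```r
data("ToothGrowth")

# Count observations in each supplement-by-dose cell
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)
```

If every cell holds the same number of observations, the design is balanced and the group means can be compared on an equal footing.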

Part 2.2: Provide a basic summary of the data

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

I will split the data by supp and calculate the mean len for each dose.

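library(dplyr)
# as_tibble() replaces tbl_df(), which is deprecated in current dplyr
ToothGrowth <- as_tibble(ToothGrowth)
# Split the data by supplement type
OJ <- filter(ToothGrowth, supp == "OJ")
VC <- filter(ToothGrowth, supp == "VC")
# Mean tooth length at each dose, per supplement
aggregate(len ~ dose, OJ, mean)
aggregate(len ~ dose, VC, mean)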
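
As an alternative to splitting the data first, the same group means can probably be obtained in a single call by putting both factors on the right-hand side of the formula. This is only a sketch in base R; `means_by_group` is a name assumed for illustration.

```r
data("ToothGrowth")

# One call returning the mean length for every supp/dose combination
means_by_group <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(means_by_group)
```

This keeps all six means in one data frame, which may be easier to scan or plot than two separate tables.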

Part 2.3: Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose

Now we will compare tooth growth by supplement using a t-test.

t.test(len~supp,data=ToothGrowth)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

Since the p-value (0.06063) is greater than 0.05 and the 95% confidence interval contains zero, we fail to reject the null hypothesis: this test provides no significant evidence at the 5% level that supplement type affects tooth growth.
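
To make clear where the statistic, degrees of freedom, and interval in the output above come from, the Welch test can be reproduced by hand. The sketch below assumes the default settings of `t.test` (unequal variances, two-sided, 95% level); all variable names are illustrative.

```r
data("ToothGrowth")
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Per-group variance of the mean (no pooling of variances)
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Standard error of the difference in means and the t statistic
se <- sqrt(v_oj + v_vc)
mean_diff <- mean(oj) - mean(vc)
t_stat <- mean_diff / se

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval
p_value <- 2 * pt(-abs(t_stat), df)
ci <- mean_diff + c(-1, 1) * qt(0.975, df) * se

print(c(t = t_stat, df = df, p = p_value))
print(ci)
```

The values should agree with the `t.test` output, which shows that the interval contains zero only narrowly.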

Now we will compare tooth growth by dose, using a t-test for each pair of dose levels.

Analyze dose = 0.5 vs. dose = 1.0

td_subset1 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
t.test(len ~ dose, data = td_subset1)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

Analyze dose = 1.0 vs. dose = 2.0

td_subset2 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
t.test(len ~ dose, data = td_subset2)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

Analyze dose = 0.5 vs. dose = 2.0

td_subset3 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
t.test(len ~ dose, data = td_subset3)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

The p-value of each test is far below 0.05 (the largest is 1.906e-05), and none of the 95% confidence intervals contains zero.

Based on these results, we conclude that average tooth length increases with increasing dose.
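
Because three pairwise tests were run on the same data, one might also adjust the p-values for multiple comparisons. The sketch below applies a Bonferroni correction to the p-values reported above; the choice of Bonferroni and the names `p_raw` and `p_bonf` are assumptions made for illustration, not part of the original analysis.

```r
# p-values as reported by the three Welch t-tests above
p_raw <- c("0.5 vs 1" = 1.268e-07,
           "1 vs 2"   = 1.906e-05,
           "0.5 vs 2" = 4.398e-14)

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
print(p_bonf)
```

If the adjusted values remain below 0.05, the dose conclusion does not depend on ignoring the multiple comparisons.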

Conclusions and assumptions

Given the following assumptions:

  1. The sample is representative of the population
  2. The sample means are approximately normally distributed, as the Central Limit Theorem suggests

In reviewing our t-test analysis from above, we can conclude that supplement delivery method has no statistically significant effect on tooth growth/length at the 5% level; however, increased dosages do result in increased tooth length.
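
The normality assumption listed above can be probed informally. The sketch below runs a Shapiro-Wilk test on tooth length within each dose group; note that this examines the raw observations rather than the sample means themselves, so it is only a rough check, and `shapiro_p` is a name assumed for illustration.

```r
data("ToothGrowth")

# Shapiro-Wilk normality test on tooth length within each dose group
shapiro_p <- tapply(ToothGrowth$len, ToothGrowth$dose,
                    function(x) shapiro.test(x)$p.value)
print(round(shapiro_p, 3))
```

Small p-values here would suggest a departure from normality in that group, in which case the t-test results should be read with more caution.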