Overview

We’re going to analyze the ToothGrowth data in the R datasets package.

Load the ToothGrowth data and perform some basic exploratory data analyses

Load the data

tg <-  ToothGrowth
str(tg)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

The structure shows that ‘supp’ has only two levels, OJ and VC. Let’s check the distinct values of ‘dose’.

unique(tg$dose)
## [1] 0.5 1.0 2.0

There are only three dose levels (0.5, 1.0 and 2.0), so the rest of the analysis compares tooth length across these three doses and the two supplement types.
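Before comparing groups, it can help to confirm how many observations fall into each supplement and dose combination. The sketch below uses base R's `table()`; the name `counts` is only illustrative and does not appear in the original analysis.

```r
# Cross-tabulate supplement type against dose level to see the cell sizes.
tg <- ToothGrowth
counts <- table(supp = tg$supp, dose = tg$dose)
print(counts)
```

If every cell holds the same number of observations, the design is balanced and the group comparisons later on are not distorted by unequal group sizes.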

Visual plots for exploratory analysis

Plot the basic data

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
g <- ggplot(aes(x=dose, y = len), data = tg) + 
        geom_point(aes(color = supp)) 
print(g)

Box plot of len by dose

g <- ggplot(aes(x = factor(dose), y = len), data = tg) + 
        geom_boxplot(aes(fill = factor(dose)))
print(g)

Box plot of len by supp

g <- ggplot(aes(x = factor(supp), y = len), data = tg) + 
        geom_boxplot(aes(fill = factor(supp)))
print(g)
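The two box plots above look at dose and supplement type separately. As a possible complement, the sketch below puts both factors in one plot by grouping the boxes by supplement within each dose. It uses only standard ggplot2 calls; the axis labels are simply the column names.

```r
library(ggplot2)

tg <- ToothGrowth

# One box per supplement type within each dose level.
g <- ggplot(tg, aes(x = factor(dose), y = len, fill = supp)) +
        geom_boxplot() +
        labs(x = "dose", y = "len", fill = "supp")
print(g)
```

Placing the two supplement types side by side at each dose makes it easier to see whether any difference between them depends on the dose.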

A basic summary of the data.

head(tg)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
summary(tg)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
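`summary()` describes each column on its own. To summarise tooth length within each supplement and dose combination, one option is base R's `aggregate()`. This is a minimal sketch; `cell_stats` is a name introduced here for illustration.

```r
tg <- ToothGrowth

# Mean and standard deviation of len for each supp x dose cell.
cell_stats <- aggregate(len ~ supp + dose, data = tg,
                        FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(cell_stats)
```

The `len` column of the result is a two-column matrix (mean and sd), so a single cell can be read with, for example, `cell_stats$len[1, "mean"]`.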

Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose

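# Each subset keeps exactly two dose levels, so that len ~ dose compares two groups
Lower <- subset(tg, dose %in% c(0.5, 1.0))   # 0.5 vs 1.0
Middle <- subset(tg, dose %in% c(0.5, 2.0))  # 0.5 vs 2.0
Upper <- subset(tg, dose %in% c(1.0, 2.0))   # 1.0 vs 2.0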

Now we run a Welch two-sample t-test for each pair of dose levels

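# Welch two-sample t-test (unequal variances); the two dose groups are independent, so the default unpaired test applies
t.test(len ~ dose, var.equal = FALSE, data = Lower)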
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735
t.test(len ~ dose, var.equal = FALSE, data = Middle)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100
t.test(len ~ dose, var.equal = FALSE, data = Upper)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100
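Running three separate tests raises the chance of at least one false positive. One way to account for this, which is not part of the original analysis, is to adjust the p-values for multiple comparisons. The sketch below uses `pairwise.t.test()` from the stats package with `pool.sd = FALSE`, so each pair is compared with a Welch test as above. The Holm adjustment is an assumption chosen here for illustration.

```r
tg <- ToothGrowth

# All pairwise dose comparisons with Holm-adjusted p-values.
# pool.sd = FALSE makes each comparison a separate Welch t-test.
pw <- pairwise.t.test(tg$len, tg$dose,
                      pool.sd = FALSE,
                      p.adjust.method = "holm")
print(pw)
```

The adjusted p-values are returned as a lower-triangular matrix in `pw$p.value`, with one entry per pair of dose levels.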

t-test by supplement type

t.test(len ~ supp, var.equal = FALSE, data = tg)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

State your conclusions and the assumptions needed for your conclusions

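  1. We cannot conclude that supplement type affects tooth growth: the 95% confidence interval for the OJ minus VC difference (-0.17 to 7.57) includes 0 and p = 0.061, so we fail to reject the null hypothesis at the 5% level. Failing to reject is not evidence that there is no effect.
  2. Increasing the dose is associated with increased tooth growth: all three pairwise 95% confidence intervals lie entirely below 0 (p < 0.001 in each case), so we reject the null hypothesis of equal means for every pair of doses.

These conclusions assume that the observations are independent and the groups unpaired, that tooth length is approximately normally distributed within each group, and that the group variances may differ (hence the Welch test).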