Overview

This exercise analyzes the ToothGrowth data in the R datasets package. It summarizes the data and compares tooth growth (len) by supplement type (supp) and dose (dose).

Data Summary

The histogram of the len variable is not perfectly normal, but len is assumed to be approximately normal for this analysis. The two supplement groups (VC and OJ) are assumed to be independent of each other.
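The normality assumption can also be checked more formally. The following is a minimal sketch, not part of the original analysis, that uses the Shapiro-Wilk test and a normal Q-Q plot from base R. The names sw and sw_by_supp are introduced here for illustration only.

```r
library(datasets)

# Shapiro-Wilk test on the pooled len values.
# Null hypothesis: the values come from a normal distribution.
sw <- shapiro.test(ToothGrowth$len)
print(sw)

# The t-test assumes normality within each group, so the same test is
# repeated separately for each supplement and the p-values are collected.
sw_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp,
                     function(x) shapiro.test(x)$p.value)
print(sw_by_supp)

# Normal Q-Q plot: points lying close to the reference line suggest
# that the normal approximation is reasonable.
qqnorm(ToothGrowth$len, main = "Normal Q-Q plot of len")
qqline(ToothGrowth$len)
```

A large p-value here does not prove normality; it only means the test found no strong evidence against it.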

Statistical inference

Growth and supp

Student’s t-test (two-sample, assuming equal variances) is used to determine whether OJ and VC differ in their effect on tooth growth.

## 
##  Two Sample t-test
## 
## data:  ToothGrowth$len by ToothGrowth$supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1670064  7.5670064
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

With p = 0.06039 and a 95 percent confidence interval that includes 0, we fail to reject the null hypothesis at the 5% level. The difference in growth between the two supplements is not statistically significant.
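To make the reported numbers easier to trace, the pooled-variance t statistic can be recomputed by hand. This is a sketch under the same equal-variance assumption used in the test output; the variable names (oj, vc, sp2, se, t_stat, p_value, ci) are chosen here for illustration.

```r
library(datasets)

# Split len by supplement
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
n1 <- length(oj)
n2 <- length(vc)

# Pooled variance under the equal-variance assumption
sp2 <- ((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / (n1 + n2 - 2)

# Standard error of the difference in means
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

# t statistic, degrees of freedom, and two-sided p-value
t_stat  <- (mean(oj) - mean(vc)) / se
df      <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df)

# 95 percent confidence interval for the difference in means (OJ minus VC)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se

print(c(t = t_stat, df = df, p = p_value))
print(ci)
```

These values should agree with the t, df, p-value, and confidence interval printed by t.test above.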

Growth and dose

The dose of the supplement takes three values: 0.5, 1, and 2. Two dose levels are compared at a time to determine whether increasing the dose affects growth. The dataset is divided into groups according to the supplement.

Group OJ

The data for OJ at dose = 0.5 are compared with dose = 1.

## 
##  Two Sample t-test
## 
## data:  t1$len by t1$dose
## t = -5.0486, df = 18, p-value = 8.358e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.410814  -5.529186
## sample estimates:
## mean in group 0.5   mean in group 1 
##             13.23             22.70

The mean increases when the dose is increased (the mean of group 0.5 is less than that of group 1), and the difference is significant at the 5% level (p = 8.358e-05).

The data for OJ at dose = 1 are compared with dose = 2.

## 
##  Two Sample t-test
## 
## data:  t2$len by t2$dose
## t = -2.2478, df = 18, p-value = 0.03736
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.5005017 -0.2194983
## sample estimates:
## mean in group 1 mean in group 2 
##           22.70           26.06

The mean again increases when the dose is increased (the mean of group 1 is less than that of group 2), and the difference is significant at the 5% level (p = 0.03736).

Group VC

The data for VC at dose = 0.5 are compared with dose = 1.

## 
##  Two Sample t-test
## 
## data:  t3$len by t3$dose
## t = -7.4634, df = 18, p-value = 6.492e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.264346  -6.315654
## sample estimates:
## mean in group 0.5   mean in group 1 
##              7.98             16.77

The mean increases when the dose is increased (the mean of group 0.5 is less than that of group 1), and the difference is significant at the 5% level (p = 6.492e-07).

The data for VC at dose = 1 are compared with dose = 2.

## 
##  Two Sample t-test
## 
## data:  t4$len by t4$dose
## t = -5.4698, df = 18, p-value = 3.398e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -12.96896  -5.77104
## sample estimates:
## mean in group 1 mean in group 2 
##           16.77           26.14

The mean again increases when the dose is increased (the mean of group 1 is less than that of group 2), and the difference is significant at the 5% level (p = 3.398e-05).
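The four dose comparisons above follow the same pattern, so they can be run in one pass. The sketch below uses a hypothetical helper, compare_doses, that is not part of the original analysis. It also adds a Holm adjustment for the four tests, which the original report does not apply and which is shown only as an optional extra.

```r
library(datasets)

# Hypothetical helper: pooled-variance two-sample t-test of len between two
# dose levels within one supplement; returns the two-sided p-value.
compare_doses <- function(supplement, low, high) {
  d <- subset(ToothGrowth, supp == supplement & dose %in% c(low, high))
  t.test(len ~ dose, data = d, var.equal = TRUE)$p.value
}

# The four comparisons made in the text
comparisons <- data.frame(
  supp = c("OJ", "OJ", "VC", "VC"),
  low  = c(0.5, 1, 0.5, 1),
  high = c(1, 2, 1, 2),
  stringsAsFactors = FALSE
)

comparisons$p_value <- mapply(compare_doses, comparisons$supp,
                              comparisons$low, comparisons$high,
                              USE.NAMES = FALSE)

# Holm adjustment for multiple testing (an addition, not in the original)
comparisons$p_holm <- p.adjust(comparisons$p_value, method = "holm")

print(comparisons)
```

The p_value column should reproduce the p-values reported for the four tests above; the p_holm column is at least as large in every row.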

Results

Orange juice shows a higher sample mean growth than ascorbic acid (20.66 versus 16.96), but this difference is not statistically significant at the 5% level (p = 0.06039). For both supplements, increasing the dose significantly increases growth.

Appendix

library(datasets)
library(ggplot2)
library(dplyr)
data(ToothGrowth)

hist(ToothGrowth$len, xlab="Length",main="histogram of length")

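# Per-supplement summary. Bare column names are used inside summarise() so
# that the statistics are computed within each group, not over the whole data.
data1 <- group_by(ToothGrowth, supp) %>%
  summarise(
    count = n(),
    mean = mean(len, na.rm = TRUE),
    sd = sd(len, na.rm = TRUE)
  )
# Pooled-variance two-sample t-test on the raw data (not on the summary table)
res <- t.test(ToothGrowth$len ~ ToothGrowth$supp, var.equal = TRUE)
res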

# Subsets for the pairwise dose comparisons within each supplement:
# t1, t3: doses 0.5 and 1; t2, t4: doses 1 and 2
t1 <- ToothGrowth[which(ToothGrowth$supp == "OJ" & ToothGrowth$dose <= 1), ]
t2 <- ToothGrowth[which(ToothGrowth$supp == "OJ" & ToothGrowth$dose >= 1), ]
t3 <- ToothGrowth[which(ToothGrowth$supp == "VC" & ToothGrowth$dose <= 1), ]
t4 <- ToothGrowth[which(ToothGrowth$supp == "VC" & ToothGrowth$dose >= 1), ]

# OJ, dose 0.5 vs 1: per-dose summary, then pooled-variance t-test
dat1 <- group_by(t1, dose) %>%
  summarise(
    count = n(),
    mean = mean(len, na.rm = TRUE),
    sd = sd(len, na.rm = TRUE)
  )
res1 <- t.test(t1$len ~ t1$dose, var.equal = TRUE)
res1

# OJ, dose 1 vs 2
dat2 <- group_by(t2, dose) %>%
  summarise(
    count = n(),
    mean = mean(len, na.rm = TRUE),
    sd = sd(len, na.rm = TRUE)
  )
res2 <- t.test(t2$len ~ t2$dose, var.equal = TRUE)
res2

# VC, dose 0.5 vs 1
dat3 <- group_by(t3, dose) %>%
  summarise(
    count = n(),
    mean = mean(len, na.rm = TRUE),
    sd = sd(len, na.rm = TRUE)
  )
res3 <- t.test(t3$len ~ t3$dose, var.equal = TRUE)
res3

# VC, dose 1 vs 2
dat4 <- group_by(t4, dose) %>%
  summarise(
    count = n(),
    mean = mean(len, na.rm = TRUE),
    sd = sd(len, na.rm = TRUE)
  )
res4 <- t.test(t4$len ~ t4$dose, var.equal = TRUE)
res4

Exploring and comparing the ToothGrowth R dataset

Eashani Deorukhkar

October 6, 2017