First, I set echo = TRUE for the whole document so that everyone can see the R code.

knitr::opts_chunk$set(echo = TRUE, results = "asis", cache = TRUE)

This project is part 2 of the course project for the Statistical Inference course that I took on Coursera. It is course 6 of the 10 courses I am taking for the Data Science Certificate from Johns Hopkins University through Coursera.

In this project, I am going to analyze the ToothGrowth data in the R datasets package.

First, let's take a look at the data:

library(knitr)
kable(ToothGrowth[1:6,])
len supp dose
4.2 VC 0.5
11.5 VC 0.5
7.3 VC 0.5
5.8 VC 0.5
6.4 VC 0.5
10.0 VC 0.5

Summary of the ToothGrowth data:

summary(ToothGrowth)
      len        supp         dose
 Min.   : 4.20   OJ:30   Min.   :0.500
 1st Qu.:13.07   VC:30   1st Qu.:0.500
 Median :19.25           Median :1.000
 Mean   :18.81           Mean   :1.167
 3rd Qu.:25.27           3rd Qu.:2.000
 Max.   :33.90           Max.   :2.000

Boxplots of tooth length by supplement and by dose:

par(mfrow = c(1,2))
boxplot(len~supp, ToothGrowth, xlab = "Supplement", ylab = "Tooth Length")
boxplot(len~dose, ToothGrowth, xlab = "Dose", ylab = "Tooth Length")
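
Each boxplot pools over the other variable, so as a numeric companion here is a minimal sketch of the mean tooth length for every supplement and dose combination. The name `group_means` is just something I introduce for illustration; it is not used elsewhere in this analysis.

```r
# Mean tooth length for each supplement/dose combination.
# ToothGrowth ships with R's datasets package, so nothing extra needs loading.
# `group_means` is an illustrative name, not part of the original analysis.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_means
```

The result has one row per combination (2 supplements x 3 doses), which makes it easy to read alongside the boxplots.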

Hypotheses:

  1. Supplement OJ results in higher tooth length than supplement VC
  2. Dose 1 results in higher tooth length than dose 0.5
  3. Dose 2 results in higher tooth length than dose 1

Hypothesis 1

For Hypothesis 1, I do a t test comparing OJ and VC and find the 95% confidence interval for the difference in mean tooth length (OJ minus VC)

t.test(ToothGrowth$len[ToothGrowth$supp == "OJ"], ToothGrowth$len[ToothGrowth$supp == "VC"])$conf.int

[1] -0.1710156  7.5710156
attr(,"conf.level")
[1] 0.95

Because 0 lies inside the confidence interval, the difference between OJ and VC is not significant at the 95% level, so I am rejecting hypothesis 1.
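
To make it clearer where this interval comes from, here is a rough sketch that rebuilds it by hand. It assumes the default behaviour of `t.test`, which is the Welch test (unequal variances). The names `oj`, `vc`, `se`, `df` and `manual_ci` are my own and only serve this illustration.

```r
# Tooth lengths for each supplement
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Standard error of the difference in means, without assuming equal variances
se <- sqrt(var(oj) / length(oj) + var(vc) / length(vc))

# Welch-Satterthwaite approximation to the degrees of freedom
df <- se^4 / ((var(oj) / length(oj))^2 / (length(oj) - 1) +
              (var(vc) / length(vc))^2 / (length(vc) - 1))

# 95% interval: difference in means plus or minus the t quantile times the SE
manual_ci <- mean(oj) - mean(vc) + c(-1, 1) * qt(0.975, df) * se
manual_ci
```

If the assumption holds, `manual_ci` should agree with the interval printed by `t.test` above.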

Hypothesis 2

For Hypothesis 2, I do a t test comparing dose 0.5 and dose 1 and find the 95% confidence interval (dose 0.5 minus dose 1)

t.test(ToothGrowth$len[ToothGrowth$dose == 0.5], ToothGrowth$len[ToothGrowth$dose == 1])$conf.int

[1] -11.983781  -6.276219
attr(,"conf.level")
[1] 0.95

Because 0 is NOT in the confidence interval, and the whole interval for dose 0.5 minus dose 1 is negative, I am accepting hypothesis 2.

Hypothesis 3

For Hypothesis 3, I do a t test comparing dose 1 and dose 2 and find the 95% confidence interval (dose 1 minus dose 2)

t.test(ToothGrowth$len[ToothGrowth$dose == 1], ToothGrowth$len[ToothGrowth$dose == 2])$conf.int

[1] -8.996481 -3.733519
attr(,"conf.level")
[1] 0.95

Because 0 is NOT in the confidence interval, and the whole interval for dose 1 minus dose 2 is negative, I am accepting hypothesis 3.
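
The two dose comparisons follow the same pattern, so they could be wrapped in a small helper. This is only a sketch: `dose_ci` is a hypothetical function name I made up for illustration, and it assumes the same default Welch t test used above.

```r
# Hypothetical helper: 95% confidence interval for
# mean(len at dose `low`) minus mean(len at dose `high`)
dose_ci <- function(low, high, data = ToothGrowth) {
  x <- data$len[data$dose == low]
  y <- data$len[data$dose == high]
  t.test(x, y)$conf.int
}

dose_ci(0.5, 1)  # same comparison as Hypothesis 2
dose_ci(1, 2)    # same comparison as Hypothesis 3
```

Comparing the dose as a number (0.5, 1, 2) rather than as a string avoids relying on R converting the numeric column to text before the comparison.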

Conclusion:

  1. Supplement OJ does NOT result in significantly higher tooth length than supplement VC at the 95% confidence level
  2. Dose 1 results in higher tooth length than dose 0.5
  3. Dose 2 results in higher tooth length than dose 1