First, to make sure that everyone can see the R code, I set echo = TRUE for the whole document.
knitr::opts_chunk$set(echo = TRUE, results = "asis", cache = TRUE)
This project is part 2 of the course project for the Statistical Inference course that I took on Coursera. It is course 6 of the 10 courses I am taking for the Data Science Certificate from Johns Hopkins University through Coursera.
In this project, I am going to analyze the ToothGrowth data in the R datasets package.
First, let's take a look at the data:
library(knitr)
kable(ToothGrowth[1:6,])
| len | supp | dose |
|---|---|---|
| 4.2 | VC | 0.5 |
| 11.5 | VC | 0.5 |
| 7.3 | VC | 0.5 |
| 5.8 | VC | 0.5 |
| 6.4 | VC | 0.5 |
| 10.0 | VC | 0.5 |
summary(ToothGrowth)
| len | supp | dose |
|---|---|---|
| Min. : 4.20 | OJ:30 | Min. :0.500 |
| 1st Qu.:13.07 | VC:30 | 1st Qu.:0.500 |
| Median :19.25 | | Median :1.000 |
| Mean :18.81 | | Mean :1.167 |
| 3rd Qu.:25.27 | | 3rd Qu.:2.000 |
| Max. :33.90 | | Max. :2.000 |
par(mfrow = c(1,2))
boxplot(len~supp, ToothGrowth, xlab = "Supplement", ylab = "Tooth Length")
boxplot(len~dose, ToothGrowth, xlab = "Dose", ylab = "Tooth Length")
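For Hypothesis 1, I run a two-sample t-test comparing tooth length for OJ and VC and take the 95% confidence interval for the difference in means (OJ minus VC):
t.test(ToothGrowth$len[ToothGrowth$supp == "OJ"], ToothGrowth$len[ToothGrowth$supp == "VC"])$conf.int
[1] -0.1710156 7.5710156
attr(,"conf.level")
[1] 0.95
Because 0 lies inside the confidence interval, the difference in mean tooth length between OJ and VC is not statistically significant at the 5% level, so I reject Hypothesis 1.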
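To make the interval above less of a black box, here is a minimal sketch that rebuilds it by hand. It assumes the default behaviour of t.test (the Welch test, which does not assume equal variances); the variable names oj, vc, se, df, and manual_ci are introduced here for illustration and are not part of the original analysis.

```r
# Split tooth length by supplement type (ToothGrowth ships with R's datasets package).
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Per-group variance of the sample mean, and the standard error of the
# difference in means without assuming equal variances.
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)
se <- sqrt(v_oj + v_vc)

# Welch-Satterthwaite approximation to the degrees of freedom.
df <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided 95% interval for mean(OJ) - mean(VC).
manual_ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se

# The built-in test should report the same interval.
builtin_ci <- as.numeric(t.test(oj, vc)$conf.int)
print(manual_ci)
print(builtin_ci)
```

If the two printed intervals agree, the by-hand calculation is the same one t.test performs, and the check "does the interval contain 0?" is just asking whether the observed difference is smaller than about two standard errors.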
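For Hypothesis 2, I run a t-test comparing tooth length for dose 0.5 and dose 1:
t.test(ToothGrowth$len[ToothGrowth$dose == 0.5], ToothGrowth$len[ToothGrowth$dose == 1])$conf.int
[1] -11.983781 -6.276219
attr(,"conf.level")
[1] 0.95
Because 0 is not in the confidence interval, mean tooth length differs significantly between the two doses. The interval is entirely negative, so dose 1 is associated with longer teeth than dose 0.5, and I accept Hypothesis 2.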
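For Hypothesis 3, I run a t-test comparing tooth length for dose 1 and dose 2:
t.test(ToothGrowth$len[ToothGrowth$dose == 1], ToothGrowth$len[ToothGrowth$dose == 2])$conf.int
[1] -8.996481 -3.733519
attr(,"conf.level")
[1] 0.95
Because 0 is not in the confidence interval, mean tooth length differs significantly between the two doses. The interval is entirely negative, so dose 2 is associated with longer teeth than dose 1, and I accept Hypothesis 3.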
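The decision rule used for all three comparisons (does the 95% interval contain 0?) can be written once and reused. The sketch below is only an illustration: ci_excludes_zero is a hypothetical helper, not something from the original analysis, and it relies on the default Welch t-test in t.test.

```r
# Hypothetical helper (not part of the original analysis): TRUE when the
# two-sided confidence interval lies entirely on one side of zero.
ci_excludes_zero <- function(x, y, level = 0.95) {
  ci <- t.test(x, y, conf.level = level)$conf.int
  ci[1] > 0 || ci[2] < 0
}

# Tooth lengths grouped by supplement and by dose; split() names the
# dose groups "0.5", "1", and "2".
len_by_supp <- split(ToothGrowth$len, ToothGrowth$supp)
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)

# TRUE means the interval excludes 0, i.e. a significant difference at the 5% level.
results <- c(
  "OJ vs VC"      = ci_excludes_zero(len_by_supp[["OJ"]], len_by_supp[["VC"]]),
  "dose 0.5 vs 1" = ci_excludes_zero(len_by_dose[["0.5"]], len_by_dose[["1"]]),
  "dose 1 vs 2"   = ci_excludes_zero(len_by_dose[["1"]], len_by_dose[["2"]])
)
print(results)
```

Keeping the rule in one function makes it harder to mix up which comparison a conclusion belongs to, and changing the level argument shows how sensitive each conclusion is to the chosen confidence level.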