Overview
This report is part 2 of the course project for the Statistical Inference course on Coursera. Its goal is an exploratory and basic inferential analysis of the ToothGrowth dataset that ships with R.
Load and Read the data
library(datasets)
data("ToothGrowth")
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
Summary of the data
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
Exploratory data analysis
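library(ggplot2)
# Treat dose as a categorical variable for plotting
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Tooth length by supplement type
ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_boxplot(aes(fill = supp))
# Stacked bars: each bar's height is the sum of len at that dose
ggplot(ToothGrowth, aes(x = dose, y = len, fill = supp)) +
  geom_bar(stat = "identity") +
  facet_grid(. ~ supp)
# Tooth length by supplement type within each dose level
ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_boxplot(aes(fill = supp)) +
  facet_wrap(~dose)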
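The plots can be complemented with a table of group means. The following is a minimal sketch, assuming a fresh copy of the dataset stored under the illustrative name `tg` (not part of the original analysis), so that `dose` stays numeric:

```r
library(datasets)

# Fresh copy of the dataset; `tg` is an illustrative name
tg <- datasets::ToothGrowth

# Mean tooth length for every supplement/dose combination (6 groups)
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
group_means
```

Each row of `group_means` corresponds to one box in the faceted box plot.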
Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose
t.test(len ~ supp, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == 0.5, ])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == 1.0, ])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == 2.0, ])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
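The tests above compare the two supplements. A comparison between dose levels is not part of the original output. One way it could be approached is sketched below, assuming Welch t-tests on each pair of doses with both supplements pooled. The helper `compare_doses` and the copy `tg` are hypothetical names introduced for illustration, and no multiple-testing correction is applied.

```r
library(datasets)

# Fresh copy so that dose is still numeric (illustrative name)
tg <- datasets::ToothGrowth

# Hypothetical helper: Welch t-test of len between two dose levels
compare_doses <- function(low, high, data = tg) {
  pair_data <- data[data$dose %in% c(low, high), ]
  t.test(len ~ dose, data = pair_data)
}

# All three pairs of dose levels
dose_pairs <- list(c(0.5, 1), c(0.5, 2), c(1, 2))
dose_tests <- lapply(dose_pairs, function(p) compare_doses(p[1], p[2]))
names(dose_tests) <- sapply(dose_pairs, paste, collapse = " vs ")

# p-value for each pairwise comparison
sapply(dose_tests, function(res) res$p.value)
```

Each element of `dose_tests` is a regular `t.test` result, so its confidence interval is available as `dose_tests[["0.5 vs 1"]]$conf.int`.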
Conclusions
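Every test above compares the two supplements (OJ and VC), overall and within each dose level. At doses 0.5 and 1.0 the p-values (0.006 and 0.001) are below 0.05 and the 95% confidence intervals exclude zero, so we reject the null hypothesis of equal means: at these doses, mean tooth length is significantly greater with OJ than with VC. At dose 2.0 the p-value (0.964) is large and the confidence interval contains zero, so we fail to reject the null hypothesis; there is no evidence that the supplements differ at this dose. Pooled over all doses, the p-value (0.061) is slightly above 0.05 and the confidence interval contains zero, so the overall difference between supplements is not significant at the 5% level. These tests do not by themselves assess the effect of dose, although the group means rise with dose for both supplements (OJ: 13.23, 22.70, 26.06; VC: 7.98, 16.77, 26.14).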
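The decision rule behind these conclusions can be written out explicitly. The sketch below copies the p-values and confidence limits from the test output above into a data frame and applies a 5% significance level. The name `results` and its column names are illustrative, and the choice of 0.05 is an assumption, since the report does not state a level.

```r
# p-values and 95% confidence limits copied from the t-test output above
results <- data.frame(
  comparison = c("all doses", "dose 0.5", "dose 1.0", "dose 2.0"),
  p_value    = c(0.06063, 0.006359, 0.001038, 0.9639),
  ci_low     = c(-0.1710156, 1.719057, 2.802148, -3.79807),
  ci_high    = c(7.5710156, 8.780943, 9.057852, 3.63807)
)

alpha <- 0.05  # assumed significance level

# Reject the null hypothesis of equal means when p < alpha
results$reject_null <- results$p_value < alpha

# Equivalent check: the 95% interval does not contain zero
results$ci_excludes_zero <- results$ci_low > 0 | results$ci_high < 0

results
```

For a two-sided test at the 5% level, the two checks agree in every row: the null hypothesis is rejected exactly when the 95% confidence interval excludes zero.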
Assumptions