This is Part 2 of the course project for the Statistical Inference course on Coursera. In this part, I analyze the ToothGrowth data from the R datasets package.
1.0 Tooth Data Load
library(datasets)
library(ggplot2)
data(ToothGrowth)
1.1 Data Summary and basic exploratory data analyses
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot(aes(fill=factor(dose))) +
geom_point() + facet_grid(.~supp) +
ggtitle("Tooth Growth by Supplement and Dosage")
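The pattern shown in the plot can also be checked numerically. The following is a minimal sketch, not part of the original analysis, that tabulates the mean tooth length for each supplement and dose combination. The names `tg` and `group_means` are illustrative, and a local copy of the data is used so the `ToothGrowth` object in the session is left untouched.

```r
library(datasets)

# Work on a local copy so the session's ToothGrowth object is not modified
tg <- datasets::ToothGrowth

# Mean tooth length for every supplement/dose combination (6 groups of 10)
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
print(group_means)
```

Reading the table row by row makes it easy to see how the gap between OJ and VC changes from one dose level to the next.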
1.2 Hypothesis tests to compare tooth growth by supp.
#H0 : Mean_VC = Mean_OJ
#H1 : Mean_VC is not equal to Mean_OJ
t.test(len ~ supp, data=ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
Conclusion: The p-value of the test is 0.06, which is greater than the 5% significance level, and the 95% confidence interval (-0.17, 7.57) contains zero. We therefore fail to reject the null hypothesis: pooled across all doses, the data do not provide sufficient evidence that the supplement type affects tooth growth. Failing to reject is not proof that the supplements have identical effects.
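The decision rule used in this conclusion can be written out in code. This is a small sketch under the same settings as the test above (Welch two-sample t-test, two-sided, 5% level). The variable names `tg`, `supp_test`, `alpha`, `reject_h0`, and `ci_contains_zero` are illustrative and not part of the original analysis.

```r
# Local copy of the data; the Welch test is the t.test() default
tg <- datasets::ToothGrowth
supp_test <- t.test(len ~ supp, data = tg)

alpha <- 0.05
p_value <- supp_test$p.value   # two-sided p-value
ci <- supp_test$conf.int       # 95% CI for mean(OJ) - mean(VC)

# For a two-sided test these two checks always agree:
# reject H0 at level alpha  <=>  the (1 - alpha) CI excludes zero
reject_h0 <- p_value < alpha
ci_contains_zero <- ci[1] < 0 && ci[2] > 0

cat("p-value:", round(p_value, 4), "\n")
cat("95% CI:", round(ci[1], 3), "to", round(ci[2], 3), "\n")
cat("Reject H0 at 5%:", reject_h0, "\n")
```

Pulling the values from the test object, rather than reading them off the printed output, avoids transcription errors when the numbers are reused in the text.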
1.3 Hypothesis tests to compare tooth growth by dose.
#H0: Mean tooth length is the same at every dose level.
#H1: Mean tooth length differs between dose levels.
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
dose_0.5_1.0 <- subset (ToothGrowth, dose %in% c(0.5, 1.0))
dose_0.5_2.0 <- subset (ToothGrowth, dose %in% c(0.5, 2.0))
dose_1.0_2.0 <- subset (ToothGrowth, dose %in% c(1.0, 2.0))
# assuming unequal variances between the two groups
t.test(len ~ dose, data = dose_0.5_1.0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
t.test(len ~ dose, data = dose_0.5_2.0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
t.test(len ~ dose, data = dose_1.0_2.0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
Conclusion: For all three pairs of dose levels, the p-value is far below 0.05 and the 95% confidence interval excludes zero, so we reject the null hypothesis in each case. Based on the analysis, we can conclude that higher doses are associated with greater tooth length.
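Because three pairwise tests are run on the same data, one optional check is to adjust the p-values for multiple comparisons. The sketch below is an addition to the original analysis, not part of it. It repeats the three Welch tests and applies a Bonferroni correction with `p.adjust()`. The names `tg`, `dose_pairs`, `raw_p`, and `adj_p` are illustrative.

```r
# Local copy with dose as a factor, as in the analysis above
tg <- datasets::ToothGrowth
tg$dose <- factor(tg$dose)

# All pairs of dose levels: 0.5 vs 1, 0.5 vs 2, 1 vs 2
dose_pairs <- combn(levels(tg$dose), 2, simplify = FALSE)

# Welch two-sample t-test for each pair, keeping only the p-value
raw_p <- sapply(dose_pairs, function(pair) {
  pair_data <- droplevels(subset(tg, dose %in% pair))
  t.test(len ~ dose, data = pair_data)$p.value
})
names(raw_p) <- sapply(dose_pairs, paste, collapse = " vs ")

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
adj_p <- p.adjust(raw_p, method = "bonferroni")
print(data.frame(raw_p, adj_p))
```

With only three comparisons the Bonferroni adjustment multiplies each p-value by three, so p-values as small as those reported above remain well below 0.05.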
Based on the above analyses, and assuming the observations are independent and identically distributed within each group, the data do not show a statistically significant difference in tooth growth between the two supplements (OJ and VC) at the 5% level. Dose, however, has a significant effect on tooth growth: each increase in dose is associated with a greater mean tooth length.