There are two parts to this project.
I will create a report to answer the questions presented in the project rubric. Given the nature of the series, I’ll use ‘knitr’ to create the report and convert to a pdf. Each pdf report will be no more than 3 pages with 3 pages of supporting appendix material if needed (code, figures, etc.).
In the second portion of this project, I am going to analyze the ToothGrowth data in the R datasets package.
Load the ToothGrowth data in the R datasets package and gain insight into the data by using the ‘head’, ‘tail’, and ‘str’ functions.
data("ToothGrowth")
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
tail(ToothGrowth)
## len supp dose
## 55 24.8 OJ 2
## 56 30.9 OJ 2
## 57 26.4 OJ 2
## 58 27.3 OJ 2
## 59 29.4 OJ 2
## 60 23.0 OJ 2
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
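As a complement to the structure check above, a short sketch of how the experimental design could be inspected. It works on a local copy (the name `tg` is an assumption introduced here, not part of the original analysis) so the session's `ToothGrowth` object is left untouched.

```r
# Local copy so the session's ToothGrowth object is not modified
tg <- datasets::ToothGrowth

# Cross-tabulate supplement type against dose level to see
# how many observations fall in each combination
design <- table(supp = tg$supp, dose = tg$dose)
print(design)

# Count missing values per column
na_counts <- colSums(is.na(tg))
print(na_counts)
```

With 60 observations, two supplement types, and three dose levels, each cell of the table should hold 10 observations if the design is balanced.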
Summarize the ToothGrowth data by using the ‘summary’ function.
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
First, load the appropriate package. Convert the dose variable from numeric to a factor, then visualize tooth growth as a function of dose.
library(ggplot2)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
ggplot(aes(x=dose, y=len), data=ToothGrowth) + geom_boxplot(aes(fill=dose))
Visualize tooth growth as a function of supplement type.
ggplot(aes(x=supp, y=len), data=ToothGrowth) + geom_boxplot(aes(fill=supp))
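The two plots above look at dose and supplement type separately. As a possible extension, the sketch below shows both factors in one figure by faceting on dose. It uses a local copy named `tg` (an assumption made for this example) so the original data frame is not altered.

```r
library(ggplot2)

# Local copy with dose as a factor, mirroring the conversion above
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# One panel per dose level, one box per supplement type
p <- ggplot(tg, aes(x = supp, y = len, fill = supp)) +
  geom_boxplot() +
  facet_wrap(~ dose, labeller = label_both)
print(p)
```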
Check for group differences due to different supplement type. Assume unequal variances between the two groups.
t.test(len ~ supp, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
The above results indicate that the p-value is approximately 0.06, which is above the conventional 0.05 significance level, and that the 95 percent confidence interval contains zero. Thus, we fail to reject the null hypothesis that the different supplement types have no effect on tooth length.
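To make the Welch test less of a black box, the statistic, the degrees of freedom (Welch-Satterthwaite approximation), the two-sided p-value, and the 95 percent confidence interval can be computed by hand. This is a sketch for illustration only; the variable names (`tg`, `oj`, `vc`, `welch_df`, and so on) are assumptions introduced here.

```r
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Each group's contribution to the squared standard error: s^2 / n
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic for the difference in means (OJ minus VC)
mean_diff <- mean(oj) - mean(vc)
t_stat <- mean_diff / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95 percent confidence interval
p_value <- 2 * pt(-abs(t_stat), welch_df)
ci <- mean_diff + c(-1, 1) * qt(0.975, welch_df) * se_diff

print(c(t = t_stat, df = welch_df, p = p_value))
print(ci)
```

These values should agree with the `t.test` output reported above (t = 1.9153, df = 55.309, p-value = 0.06063).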
Create three subsets, one for each pair of dose levels, in order to check for group differences.
ToothGrowth.doses_0.5_1.0 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
ToothGrowth.doses_0.5_2.0 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
ToothGrowth.doses_1.0_2.0 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
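The three subsets above are written out by hand. As an alternative sketch, the same pairs could be generated programmatically with `combn`, which scales better if more dose levels were present. The names `tg`, `dose_pairs`, and `dose_tests` are assumptions introduced for this example.

```r
# Local copy with dose as a factor, mirroring the conversion made earlier
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# All unordered pairs of dose levels
dose_pairs <- combn(levels(tg$dose), 2, simplify = FALSE)

# Run a Welch two-sample t-test for each pair
dose_tests <- lapply(dose_pairs, function(pair) {
  pair_data <- droplevels(subset(tg, dose %in% pair))
  t.test(len ~ dose, data = pair_data)
})
names(dose_tests) <- sapply(dose_pairs, paste, collapse = " vs ")

# Collect the p-values in one named vector
dose_p_values <- sapply(dose_tests, function(x) x$p.value)
print(dose_p_values)
```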
Check for group differences due to different dose levels of (0.5, 1.0). Assume unequal variances between the two groups.
t.test(len ~ dose, data = ToothGrowth.doses_0.5_1.0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
Check for group differences due to different dose levels of (0.5, 2.0). Assume unequal variances between the two groups.
t.test(len ~ dose, data = ToothGrowth.doses_0.5_2.0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
Check for group differences due to different dose levels of (1.0, 2.0). Assume unequal variances between the two groups.
t.test(len ~ dose, data = ToothGrowth.doses_1.0_2.0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
For all three of the above t-tests, the resulting p-value is less than 0.05 and the confidence intervals do not contain zero. Thus, we reject the null hypothesis in each case and conclude that a higher dose level is associated with greater tooth length.
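Because three pairwise tests are run on the same data, one might also adjust the p-values for multiple comparisons. The sketch below applies a Bonferroni correction with `p.adjust` to the three p-values reported above. The choice of Bonferroni is an assumption made for illustration; it is not part of the original analysis.

```r
# p-values copied from the three t-test outputs above
p_raw <- c("0.5 vs 1" = 1.268e-07,
           "0.5 vs 2" = 4.398e-14,
           "1 vs 2"   = 1.906e-05)

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
print(p_bonf)

# Check which comparisons remain below the 0.05 level after adjustment
print(p_bonf < 0.05)
```

With p-values this small, the adjustment does not change which comparisons fall below 0.05.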
Conclusions
Assumptions