This report uses the ToothGrowth dataset in R to perform a basic inferential analysis in four parts:
1. Load the ToothGrowth data and perform some basic exploratory data analyses.
2. Provide a basic summary of the data.
3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.
4. Conclusions and assumptions.
Load the ToothGrowth dataset and perform some basic exploratory data analysis
Load the ToothGrowth dataset.
library(datasets)
library(ggplot2)
data(ToothGrowth)
Display the dataset’s structure.
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
Display a summary of the dataset.
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
The dataset has 60 observations. The variable len is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1 and 2 mg/day) by one of two delivery methods (the variable supp: orange juice, coded OJ, or ascorbic acid, coded VC).
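The design described above is balanced, with 10 animals in every supplement/dose combination. A quick cross-tabulation with base R's table() confirms this; the sketch works on a local copy named tg (a name chosen here for illustration) so the ToothGrowth object used in the rest of the report is not modified.

```r
# Work on a local copy so the session's ToothGrowth object is left untouched
tg <- datasets::ToothGrowth

# Count observations for each supplement/dose combination
design <- table(supp = tg$supp, dose = tg$dose)
print(design)
```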
Display the unique values for the variable len.
unique(ToothGrowth$len)
## [1] 4.2 11.5 7.3 5.8 6.4 10.0 11.2 5.2 7.0 16.5 15.2 17.3 22.5 13.6
## [15] 14.5 18.8 15.5 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5 23.3 29.5 17.6
## [29] 9.7 8.2 9.4 19.7 20.0 25.2 25.8 21.2 27.3 22.4 24.5 24.8 30.9 29.4
## [43] 23.0
Display the unique values for the variable dose.
unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0
Display the unique values for the variable supp.
unique(ToothGrowth$supp)
## [1] VC OJ
## Levels: OJ VC
Convert the numeric dose variable into a factor so that ggplot2 treats it as a discrete grouping variable.
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
Plot tooth length (len) vs. supplement delivery method (supp) by the dose amount (dose).
ggplot(data=ToothGrowth, aes(x=supp, y=len)) +
geom_boxplot(aes(fill=supp)) + xlab("Supplement Delivery") +
ylab("Tooth Length") + facet_grid(~ dose) +
ggtitle("Tooth Length vs. Delivery Method by Dose Amount") +
theme(plot.title = element_text(lineheight=.5, hjust=0.5, face="bold"))
Plot the tooth length (len) vs. the dose amount (dose) by the supplement delivery method (supp).
ggplot(data=ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot(aes(fill=dose)) + xlab("Dose Amount of Vitamin C") +
ylab("Tooth Length") + facet_grid(~ supp) +
ggtitle("Tooth Length vs. Dose Amount by Delivery Method") +
theme(plot.title = element_text(lineheight=.5, hjust=0.5, face="bold"))
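The boxplots can be complemented by a numeric summary of each supplement/dose cell, which also serves as the basic summary of the data (part 2 of the report). A minimal sketch using base R's tapply() on a local copy tg (an illustrative name) computes the cell means and standard deviations:

```r
tg <- datasets::ToothGrowth

# Mean and standard deviation of tooth length for each supp/dose cell
cell_means <- with(tg, tapply(len, list(supp = supp, dose = dose), mean))
cell_sds   <- with(tg, tapply(len, list(supp = supp, dose = dose), sd))

print(round(cell_means, 2))
print(round(cell_sds, 2))
```

Because every cell has 10 observations, averaging the cell means over supplements gives the dose-group means reported by the t-tests below.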
Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose
Use a t-test to compare tooth growth by supplement type.
t.test(len ~ supp, data=ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
The p-value of this test is 0.06063, which is greater than 0.05, and the 95% confidence interval contains zero. The null hypothesis of equal means therefore cannot be rejected at the 5% level: with all doses pooled, the data do not show a statistically significant difference in tooth growth between orange juice and ascorbic acid.
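Both criteria used above (p-value against 0.05, and whether the 95% interval contains zero) come from the same Welch statistic. The sketch below recomputes it by hand, assuming the standard Welch formulas: t = (mean_OJ - mean_VC) / sqrt(s_OJ^2/n_OJ + s_VC^2/n_VC), with Welch-Satterthwaite degrees of freedom. Its results should agree with the t.test() output above.

```r
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard errors of each group mean (no equal-variance assumption)
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)
se <- sqrt(v_oj + v_vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(oj) - mean(vc)) / se
welch_df <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for mean(OJ) - mean(VC)
p_value <- 2 * pt(-abs(t_stat), welch_df)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, welch_df) * se

print(c(t = t_stat, df = welch_df, p = p_value))
print(ci)
```

Since the interval is built from the same t quantile as the p-value, "p < 0.05" and "the 95% interval excludes zero" always lead to the same conclusion for this two-sided test.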
Use t-tests to compare tooth growth across dose levels, testing each pair of dose values in turn.
dose = 0.5 vs. dose = 1.0
tg.subset <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
t.test(len ~ dose, data=tg.subset)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
The p-value of this test is 1.268e-07, which is far smaller than 0.05, and the 95% confidence interval does not contain zero. Increasing the dose from 0.5 to 1.0 therefore produces a statistically significant increase in tooth length.
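The same two checks are repeated for every dose pair in this section. As a sketch, they can be wrapped in a small helper; compare_doses() is a hypothetical name introduced here, not part of the original analysis or of any package.

```r
# Hypothetical helper: Welch t-test between two dose levels, returning the
# p-value, the confidence interval and the resulting significance decision
compare_doses <- function(data, d1, d2, alpha = 0.05) {
  pair <- subset(data, dose %in% c(d1, d2))
  res <- t.test(len ~ dose, data = pair, conf.level = 1 - alpha)
  ci <- as.numeric(res$conf.int)
  data.frame(
    comparison = paste(d1, "vs", d2),
    p_value = res$p.value,
    ci_lower = ci[1],
    ci_upper = ci[2],
    significant = res$p.value < alpha,
    # For a two-sided Welch test this always agrees with 'significant'
    ci_excludes_zero = ci[1] > 0 || ci[2] < 0
  )
}

tg <- datasets::ToothGrowth
print(compare_doses(tg, 0.5, 1.0))
```

The helper works whether dose is numeric or has been converted to a factor, because %in% compares factor values through their character labels ("0.5", "1", "2").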
dose = 1.0 vs. dose = 2.0
tg.subset <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
t.test(len ~ dose, data=tg.subset)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
The p-value of this test is 1.906e-05, which is far smaller than 0.05, and the 95% confidence interval does not contain zero. Increasing the dose from 1.0 to 2.0 therefore produces a statistically significant increase in tooth length.
dose = 0.5 vs. dose = 2.0
tg.subset <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
t.test(len ~ dose, data=tg.subset)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
The p-value of this test is 4.398e-14, which is far smaller than 0.05, and the 95% confidence interval does not contain zero. Increasing the dose from 0.5 to 2.0 therefore produces a statistically significant increase in tooth length.
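The three pairwise tests above can also be run in a single call with pairwise.t.test() from the stats package. Setting pool.sd = FALSE makes it run a separate Welch t-test for each pair, and p.adjust.method = "none" leaves the p-values unadjusted, so they should match the three separate tests.

```r
tg <- datasets::ToothGrowth

# Welch t-test for every pair of dose levels, with no multiple-comparison adjustment
pw <- with(tg, pairwise.t.test(len, dose, pool.sd = FALSE,
                               p.adjust.method = "none"))
print(pw$p.value)
```

Leaving p.adjust.method at its default ("holm") adjusts the p-values for the three comparisons; given how small they are, all three stay well below 0.05.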
The analysis assumed that
1. the sample is representative of the population,
2. guinea pigs are randomly assigned to different dose level categories and supplement type, and
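3. the sample means are approximately normally distributed, as justified by the Central Limit Theorem. Equal variances between groups are not assumed, since Welch's t-test was used.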
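With only 30 animals per supplement group and 20 per dose group, the Central Limit Theorem gives limited support on its own, so a rough check of the raw data can help. One option, not part of the original analysis, is a Shapiro-Wilk test within each group that the t-tests compare; sw_p() below is a hypothetical helper name.

```r
tg <- datasets::ToothGrowth

# Hypothetical helper: Shapiro-Wilk p-value for one group of tooth lengths
sw_p <- function(x) shapiro.test(x)$p.value

# Small p-values would suggest departures from normality in that group
by_supp <- with(tg, tapply(len, supp, sw_p))
by_dose <- with(tg, tapply(len, dose, sw_p))

print(round(by_supp, 3))
print(round(by_dose, 3))
```

A normal Q-Q plot of each group (qqnorm() and qqline()) is a useful visual companion, since formal normality tests have little power at these sample sizes.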
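Based on the t-test analysis, it is concluded that
1. mean tooth length increases with the dose of vitamin C, with every pairwise dose difference statistically significant, and
2. the supplement delivery method has no statistically significant effect on tooth length at the 5% level when all doses are pooled.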
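Conclusion 1 rests on pairwise comparisons of means. As an optional cross-check of the positive dose relationship (an addition, not part of the original analysis), a rank correlation between numeric dose and tooth length can be computed with cor.test(). Spearman's method is used here because dose takes only three values and the relationship need not be linear; exact = FALSE avoids the exact-p-value warning caused by tied ranks.

```r
tg <- datasets::ToothGrowth

# Convert explicitly in case dose was turned into a factor earlier in the session
dose_num <- as.numeric(as.character(tg$dose))

ct <- cor.test(dose_num, tg$len, method = "spearman", exact = FALSE)
print(ct$estimate)
print(ct$p.value)
```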