This project investigates the ToothGrowth data set in R using summary statistics, confidence intervals, and hypothesis tests. The response is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each guinea pig received one of three doses of Vitamin C (0.5, 1.0, or 2.0 mg/day) by one of two delivery methods: orange juice (coded OJ) or ascorbic acid (coded VC).
data(ToothGrowth)
rows <- nrow(ToothGrowth)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
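# Save side-by-side histograms of tooth length and dose to a PNG file
dir.create("figure", showWarnings = FALSE)  # png() fails if the folder is missing
png("figure/len_dose_histogram.png", width = 800, height = 480)
par(mfrow = c(1, 2))
hist(ToothGrowth$len,
     main = "Histogram of Tooth Growth Length",
     xlab = "Length",
     ylab = "Frequency")
hist(ToothGrowth$dose,
     main = "Histogram of Tooth Growth Dose",
     xlab = "Dose",
     ylab = "Frequency")
invisible(dev.off())  # close the device without printing its name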
As the str() and summary() output shows, there are 60 records with three fields: len, supp, and dose. Of these, 30 records have OJ as the supp value and 30 have VC. Dose ranges from 0.5 to 2.0, and len ranges from 4.2 to 33.9.
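Because the two factors are crossed, a cross-tabulation is a quick way to check how the 60 animals are spread over the six supplement/dose combinations. This sketch uses only base R, and the object name `design` is illustrative. If the design is balanced, every cell should hold 10 animals.

```r
# ToothGrowth ships with base R's datasets package; no extra packages needed.
data(ToothGrowth)

# Count animals in each supplement (rows) x dose (columns) combination.
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)
```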
Next, we break down mean tooth length by each of the two fields using aggregate().
aggregate(len ~ supp, data = ToothGrowth, FUN = "mean")
## supp len
## 1 OJ 20.66333
## 2 VC 16.96333
aggregate(len ~ dose, data = ToothGrowth, FUN = "mean")
## dose len
## 1 0.5 10.605
## 2 1.0 19.735
## 3 2.0 26.100
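The means above look at one factor at a time. A possible extension is to average tooth length within each of the six supplement/dose cells, which shows whether the supplement gap is similar at every dose. This is a base R sketch, and the name `cell_means` is illustrative.

```r
data(ToothGrowth)

# Mean tooth length for every supplement/dose combination (6 rows).
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)

# The same means laid out as a dose-by-supplement grid.
print(with(ToothGrowth, tapply(len, list(dose = dose, supp = supp), mean)))
```

Because every cell has the same number of animals, averaging the three OJ cell means reproduces the overall OJ mean reported above, and likewise for VC.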
In this section, we investigate tooth length by supplement and by dose using confidence intervals and hypothesis tests. The group means above suggest sizable differences, but a test tells us whether those differences are larger than sampling variation alone would plausibly produce.
First, we test the null hypothesis that mean tooth length is the same for both supplement types (OJ and VC), against the alternative that the two means differ.
# Welch two-sample t-test (unequal variances); independent groups are the default
t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
Welch's two-sample t-test gives a p-value of 0.06, which is above the conventional 0.05 threshold, and the 95% confidence interval (-0.17 to 7.57) contains zero. We therefore cannot reject the null hypothesis that mean tooth length is the same for the two supplements.
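To make the Welch test less of a black box, the statistic, degrees of freedom, p-value, and interval can be computed by hand from the standard Welch formulas: the difference in means divided by the unpooled standard error, with Welch-Satterthwaite degrees of freedom. This is a sketch, not the internals of `t.test()`; the variable names are illustrative, and the results should match the output above up to rounding.

```r
data(ToothGrowth)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean (no pooling of variances).
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the unpooled standard error.
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for mean(OJ) - mean(VC).
p_value <- 2 * pt(-abs(t_stat), df_welch)
margin  <- qt(0.975, df_welch) * sqrt(se2_oj + se2_vc)
ci      <- (mean(oj) - mean(vc)) + c(-1, 1) * margin

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```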
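Next, we compare the dose levels two at a time. The formula interface of t.test() needs a grouping variable with exactly two levels, so we first subset the data into each pair of doses.
dose_0.5_1.0 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
dose_0.5_2.0 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
dose_1.0_2.0 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))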
t.test(len ~ dose, var.equal = FALSE, data = dose_0.5_1.0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
t.test(len ~ dose, var.equal = FALSE, data = dose_0.5_2.0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
t.test(len ~ dose, var.equal = FALSE, data = dose_1.0_2.0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
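The three dose comparisons repeat the same call on different subsets. One way to avoid the copy-and-paste is to loop over every pair of dose levels with `combn()` and collect the key numbers in one data frame. This sketch assumes only the `statistic`, `parameter`, `p.value`, and `conf.int` fields of the `t.test()` result are needed; the names `dose_pairs` and `dose_tests` are illustrative.

```r
data(ToothGrowth)

# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2).
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

# One Welch t-test per pair; each row holds the summary of one comparison.
dose_tests <- do.call(rbind, lapply(dose_pairs, function(pair) {
  tt <- t.test(len ~ dose, var.equal = FALSE,
               data = subset(ToothGrowth, dose %in% pair))
  data.frame(dose_low = pair[1], dose_high = pair[2],
             t = unname(tt$statistic), df = unname(tt$parameter),
             p_value = tt$p.value,
             ci_low = tt$conf.int[1], ci_high = tt$conf.int[2])
}))
print(dose_tests)
```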
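For each of these comparisons, the p-value is far below 0.05 (the largest is about 1.9e-05), and none of the 95% confidence intervals contain zero. Because each interval estimates the lower dose's mean minus the higher dose's mean and lies entirely below zero, we can reject the null hypothesis in every case and conclude that increasing the dose increases tooth length.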
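Three dose comparisons are made at the 0.05 level, so one might also check that the conclusion survives a multiple-comparison correction. This is an optional robustness check, not part of the original analysis. The sketch uses base R's `pairwise.t.test()` with Holm-adjusted p-values and `pool.sd = FALSE`, so each pair gets its own two-sample test (using `t.test()`'s default unequal-variance setting) rather than a pooled standard deviation.

```r
data(ToothGrowth)

# Holm-adjusted p-values for every pair of dose levels.
# pool.sd = FALSE: separate two-sample t-tests instead of a pooled SD.
adj <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                       p.adjust.method = "holm", pool.sd = FALSE)
print(adj)
```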
We conclude that higher doses of Vitamin C lead to greater tooth growth, but at the 5% significance level we cannot conclude that the delivery method (OJ vs. VC) makes a difference.
For this analysis, we assume that the 60 guinea pigs are representative of the wider guinea pig population and that the animals were randomly assigned to doses and delivery methods. For the t-tests, we do not assume equal variances between groups, which is why each test uses Welch's version (var.equal = FALSE).
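A quick way to see what the unequal-variance choice changes is to look at the group standard deviations and to rerun the supplement test with a pooled variance. This is an illustrative sketch; `student` and `welch` are placeholder names. With 30 animals in each supplement group the two t statistics are identical, so only the degrees of freedom (58 for the pooled test versus about 55.3 for Welch) and hence the p-value differ slightly.

```r
data(ToothGrowth)

# Sample standard deviation of tooth length within each group compared above.
print(aggregate(len ~ supp, data = ToothGrowth, FUN = sd))
print(aggregate(len ~ dose, data = ToothGrowth, FUN = sd))

# Pooled-variance (Student) test versus the Welch test for supplement.
student <- t.test(len ~ supp, var.equal = TRUE,  data = ToothGrowth)
welch   <- t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)
print(rbind(student = c(t = unname(student$statistic),
                        df = unname(student$parameter),
                        p = student$p.value),
            welch   = c(t = unname(welch$statistic),
                        df = unname(welch$parameter),
                        p = welch$p.value)))
```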