The ToothGrowth data set from the datasets package records the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
Sys.setlocale("LC_TIME", "English")
library(datasets)
library(ggplot2)
library(dplyr)
We first inspect the data. It has 60 rows and 3 variables:
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
len is the length, a numeric variable.
supp is the delivery method, a factor variable: OJ stands for orange juice and VC for ascorbic acid.
dose is the dose level of vitamin C: 0.5, 1, or 2 mg/day.
table(ToothGrowth$supp, ToothGrowth$dose)
##
## 0.5 1 2
## OJ 10 10 10
## VC 10 10 10
Each supp/dose combination was tested on 10 guinea pigs.
Let us now consider the effect of the dose and the delivery method on
the length variable. Figure 1 shows six box plots of the
len variable, comparing the two delivery methods at each
dose level. Just from looking at the box plots, we can draw some
conclusions:
ggplot(data = ToothGrowth, mapping = aes(y = len, colour = supp)) +
  geom_boxplot() +
  facet_grid(. ~ dose) +
  theme_bw() +
  labs(title = "Figure 1: Boxplot of the length variable for \neach delivery method/dose level pair")
For supp = OJ, the length moves upwards as the dose increases.
For supp = VC, the length moves upwards as the dose increases.
We store the summary statistics shown in the box plots, together with the
mean, the sample standard deviation, and the sample size, in the data frame
summ. For each dose-supp pair, we also calculate the 95% t confidence
interval for the respective mean.
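summ <- ToothGrowth %>%
  # One row of summary statistics per dose/supp pair (.by needs dplyr >= 1.1.0)
  summarize(min = min(len),
            median = median(len),
            mean = mean(len),
            max = max(len),
            std = sd(len),
            n = length(len),
            .by = c(dose, supp)) %>%
  # Two-sided 95% t interval: mean +/- t_{0.975, n-1} * std / sqrt(n)
  mutate(stderr = std/sqrt(n),
         t_score = qt(.975, n-1),
         ci_lower = mean - t_score*stderr,
         ci_upper = mean + t_score*stderr)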
summ[order(summ$dose),]
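As a sanity check on the interval columns, the one-sample interval reported by `t.test()` should agree with the hand-computed bounds. The sketch below does this for a single cell (VC at dose 2). The names `vc2`, `manual_ci`, and `builtin_ci` are introduced here for illustration only and are not part of the analysis above.

```r
library(datasets)

# Lengths for one dose/supp cell: ascorbic acid (VC) at dose 2
vc2 <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 2]

# Hand-computed 95% t confidence interval: mean +/- t * s / sqrt(n)
n <- length(vc2)
half_width <- qt(0.975, df = n - 1) * sd(vc2) / sqrt(n)
manual_ci <- c(mean(vc2) - half_width, mean(vc2) + half_width)

# Interval from the built-in one-sample t-test (95% is the default level)
builtin_ci <- as.numeric(t.test(vc2)$conf.int)

# TRUE if the two intervals agree up to numerical tolerance
isTRUE(all.equal(manual_ci, builtin_ci))
```

The same comparison can be repeated for any other cell by changing the filter on supp and dose.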
We see that the two confidence intervals for dose = 2 overlap, and the sample means are close to each other. It is worth performing a hypothesis test to check whether, at such a high dose, the data give any evidence that the delivery methods differ in mean growth length.
\(H_0\): For dose = 2, the two means are equal: \[E(len | supp=VC, dose=2) = E(len | supp=OJ, dose=2)\].
\(H_a\): For dose = 2, the mean for VC is greater than the mean for OJ: \[E(len | supp=VC, dose=2) > E(len | supp=OJ, dose=2)\].
We perform a one-sided Welch two-sample t-test, which does not assume equal variances.
with(ToothGrowth[ToothGrowth$dose == 2.0, ],
     t.test(len[supp == "VC"], len[supp == "OJ"], alternative = "greater"))
##
## Welch Two Sample t-test
##
## data: len[supp == "VC"] and len[supp == "OJ"]
## t = 0.046136, df = 14.04, p-value = 0.4819
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -2.9735 Inf
## sample estimates:
## mean of x mean of y
## 26.14 26.06
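Because the confidence interval contains 0 and the p-value is 0.48 > 0.05, we fail to reject \(H_0\). Based on the data, there is no evidence that, for dose = 2, ascorbic acid leads to a greater mean growth length than orange juice; with sample means of 26.14 and 26.06, the two delivery methods look practically equivalent at this dose.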
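To see where the statistic and the fractional degrees of freedom in the Welch test come from, the test can be reproduced by hand. The following is a minimal sketch, assuming the standard Welch statistic and the Welch-Satterthwaite approximation for the degrees of freedom; the variable names (`d2`, `vx`, `vy`, `t_stat`, `df_welch`, `p_value`) are chosen here for illustration.

```r
library(datasets)

# Restrict to dose = 2 and split by delivery method
d2 <- ToothGrowth[ToothGrowth$dose == 2, ]
x <- d2$len[d2$supp == "VC"]
y <- d2$len[d2$supp == "OJ"]

# Squared standard errors of the two sample means
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over its estimated standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# One-sided p-value for H_a: mean(VC) > mean(OJ)
p_value <- pt(t_stat, df = df_welch, lower.tail = FALSE)

c(t = t_stat, df = df_welch, p = p_value)
```

The three values should match the t, df, and p-value lines of the `t.test()` output shown above.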
We also test whether, for the orange juice delivery method, increasing the dose from 1 to 2 increases the mean length.
\(H_0\): For supp = OJ, the means for dose = 1 and dose = 2 are equal: \[E(len | supp=OJ, dose=2) = E(len | supp=OJ, dose=1)\].
\(H_a\): For supp = OJ, the mean for dose = 2 is greater than the mean for dose = 1: \[E(len | supp=OJ, dose=2) > E(len | supp=OJ, dose=1)\].
Again, we perform a one-sided Welch t-test, which does not assume equal variances.
oj_data <- subset(ToothGrowth, supp == "OJ" & dose %in% c(1, 2))
oj_data$dose <- factor(oj_data$dose, levels = c(2, 1))
t.test(len ~ dose, data = oj_data, alternative = "greater")
##
## Welch Two Sample t-test
##
## data: len by dose
## t = 2.2478, df = 15.842, p-value = 0.0196
## alternative hypothesis: true difference in means between group 2 and group 1 is greater than 0
## 95 percent confidence interval:
## 0.7486236 Inf
## sample estimates:
## mean in group 2 mean in group 1
## 26.06 22.70
Because the confidence interval does not contain 0 and the p-value is less than 0.05, we reject \(H_0\). This means that, for orange juice, the mean growth length at dose 2 is significantly greater than at dose 1.
So we can conclude that, for both delivery methods, a higher dose leads to greater tooth growth length: the box plots show this for both methods, and the test confirms it for orange juice between doses 1 and 2. Comparing the two delivery methods, a dose as high as 2 mg/day does not result in a statistically significant difference in growth length. However, at lower doses, orange juice seems more effective than ascorbic acid.
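The statement about lower doses rests on the box plots and confidence intervals rather than on a formal test. A minimal sketch of how it could be checked with the same one-sided Welch test is given below. The helper `oj_vs_vc()` is a hypothetical name introduced here, and no output is shown because these tests were not part of the analysis above.

```r
library(datasets)

# Hypothetical helper: one-sided Welch test of H_a: mean(OJ) > mean(VC) at one dose
oj_vs_vc <- function(d) {
  cell <- ToothGrowth[ToothGrowth$dose == d, ]
  t.test(cell$len[cell$supp == "OJ"],
         cell$len[cell$supp == "VC"],
         alternative = "greater")
}

# p-values for the two lower doses, named by dose
low_doses <- c(0.5, 1)
p_values <- sapply(low_doses, function(d) oj_vs_vc(d)$p.value)
names(p_values) <- low_doses
p_values
```

Because this runs two tests, one might also look at adjusted values, for example `p.adjust(p_values, method = "bonferroni")`, before drawing a conclusion.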