The analyzed dataset comes from a study of the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three vitamin C dose levels (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice or ascorbic acid.
First, I load the data, give the columns and the supplement levels descriptive names, convert dose to a factor, and summarize the basic properties of the data frame.
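data(ToothGrowth)
# Rename the columns (originally len, supp, dose)
names(ToothGrowth) <- c("length", "supplement", "dose")
# Rename the supplement levels (originally "OJ" and "VC", in that order)
levels(ToothGrowth$supplement) <- c("orange juice", "ascorbic acid")
# Treat dose as a categorical variable with three levels
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
summary(ToothGrowth)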
##      length              supplement  dose
##  Min.   : 4.20   orange juice :30   0.5:20
##  1st Qu.:13.07   ascorbic acid:30   1  :20
##  Median :19.25                      2  :20
##  Mean   :18.81
##  3rd Qu.:25.27
##  Max.   :33.90
I prepare a figure showing the effects of both the delivery method (orange juice or ascorbic acid) and the vitamin C dose level. Each box in the figure summarizes 10 data points.
library(ggplot2)
# One box per combination of dose level (x axis) and delivery method (fill colour)
ggplot(data = ToothGrowth, aes(x = dose, y = length, fill = supplement)) +
  geom_boxplot() +
  ggtitle("Tooth Growth for the Effect of Vitamin C") +
  xlab("dose levels (mg/day)") +
  ylab("length of odontoblasts")
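A quick way to check that every box in the figure is based on 10 observations is to cross-tabulate delivery method against dose. This sketch reloads a fresh copy of the data under the illustrative name `tg` (not used in the original analysis) and repeats the renaming steps above, so that it runs on its own.

```r
# Fresh copy of the built-in dataset; the renaming mirrors the preparation steps above
tg <- datasets::ToothGrowth
names(tg) <- c("length", "supplement", "dose")
levels(tg$supplement) <- c("orange juice", "ascorbic acid")
tg$dose <- as.factor(tg$dose)

# One cell per box in the figure: delivery method (rows) by dose level (columns)
group_sizes <- table(tg$supplement, tg$dose)
print(group_sizes)
```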
The dataset consists of 60 rows and 3 variables. The exploratory analysis suggests that odontoblast length increases with vitamin C dose. At dose levels of 0.5 and 1 mg/day, a difference between the two delivery methods can also be observed: orange juice appears more effective than ascorbic acid. Averaged over both delivery methods, the odontoblasts at 2 mg/day are about 2.5 times longer than at 0.5 mg/day, and at this dose the two delivery methods give similar lengths.
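The dose effect and the roughly 2.5-fold increase can be checked numerically from the group means. This sketch (again using an illustrative copy `tg`) computes the mean length for each delivery method and dose, and the ratio of the 2 mg/day mean to the 0.5 mg/day mean, both pooled over the delivery methods and separately for each method.

```r
tg <- datasets::ToothGrowth
names(tg) <- c("length", "supplement", "dose")
levels(tg$supplement) <- c("orange juice", "ascorbic acid")
tg$dose <- as.factor(tg$dose)

# Mean odontoblast length per delivery method (rows) and dose (columns)
group_means <- tapply(tg$length, list(tg$supplement, tg$dose), mean)
print(round(group_means, 2))

# Mean length per dose, pooling both delivery methods
dose_means <- tapply(tg$length, tg$dose, mean)

# Ratio of the 2 mg/day mean to the 0.5 mg/day mean: pooled, then per method
pooled_ratio <- unname(dose_means["2"] / dose_means["0.5"])
method_ratios <- group_means[, "2"] / group_means[, "0.5"]
print(round(pooled_ratio, 2))
print(round(method_ratios, 2))
```

With the standard ToothGrowth data the pooled ratio is about 2.46, while the per-method ratios are roughly 2.0 for orange juice and 3.3 for ascorbic acid, so the 2.5-fold figure describes the average over both delivery methods.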
Statistical hypothesis testing can check (1) whether vitamin C has an effect on tooth growth, (2) whether the two delivery methods, orange juice and ascorbic acid, differ in effectiveness, and (3) whether 1 mg/day of orange juice is as effective as 2 mg/day of ascorbic acid, as the figure suggests.
Because each group is small (10 animals), two-sample t-tests are used to compare pairs of groups. These tests assume that both populations are normally distributed (so that the test statistic approximately follows a t distribution under the null hypothesis) and that the two samples are independent. The two population variances are not assumed to be equal, so Welch's t-test is used, which is the default in R's t.test().
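For reference, Welch's statistic is t = (mean(x) - mean(y)) / sqrt(s_x^2/n_x + s_y^2/n_y), with the degrees of freedom given by the Welch-Satterthwaite approximation. The sketch below computes both by hand for the orange juice 0.5 vs. 1 mg/day comparison so they can be compared with `t.test()`; the helper name `welch_t` is hypothetical.

```r
# Hypothetical helper: Welch's two-sample t statistic, df and two-sided p-value
welch_t <- function(x, y) {
  vx <- var(x) / length(x)  # squared standard error of mean(x)
  vy <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation of the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_value <- 2 * pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p = p_value)
}

tg <- datasets::ToothGrowth
names(tg) <- c("length", "supplement", "dose")
levels(tg$supplement) <- c("orange juice", "ascorbic acid")
tg$dose <- as.factor(tg$dose)

x <- tg$length[tg$dose == "0.5" & tg$supplement == "orange juice"]
y <- tg$length[tg$dose == "1" & tg$supplement == "orange juice"]
manual <- welch_t(x, y)
builtin <- t.test(x, y)  # Welch's test by default (var.equal = FALSE)
print(manual)
```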
When the p-value is ≥ 0.05, the null hypothesis (equal means) cannot be rejected. When the p-value is < 0.05, the null hypothesis is rejected in favor of the alternative hypothesis, “the true difference in means is not equal to 0”.
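The decision rule can be wrapped in a small helper, which would also shorten the repetitive `t.test()` calls that follow. `compare_groups` is a hypothetical name, not part of the original analysis; it relies on `t.test()` defaulting to `var.equal = FALSE`, i.e. Welch's test.

```r
# Hypothetical helper: Welch's t-test plus the p < alpha decision rule
compare_groups <- function(x, y, alpha = 0.05) {
  res <- t.test(x, y)  # var.equal = FALSE by default, i.e. Welch's t-test
  list(p_value = res$p.value,
       reject_null = res$p.value < alpha,
       conf_int = as.numeric(res$conf.int))
}

tg <- datasets::ToothGrowth
names(tg) <- c("length", "supplement", "dose")
levels(tg$supplement) <- c("orange juice", "ascorbic acid")
tg$dose <- as.factor(tg$dose)

# Example: the two delivery methods at 2 mg/day
oj_2 <- tg$length[tg$dose == "2" & tg$supplement == "orange juice"]
aa_2 <- tg$length[tg$dose == "2" & tg$supplement == "ascorbic acid"]
result <- compare_groups(oj_2, aa_2)
str(result)
```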
Orange juice, 0.5 vs. 1 mg/day: p-value = 8.785e-05, so there is a statistically significant difference.
t.test(ToothGrowth$length[ToothGrowth$dose==0.5 & ToothGrowth$supplement=="orange juice"], ToothGrowth$length[ToothGrowth$dose==1 & ToothGrowth$supplement=="orange juice"])
##
## Welch Two Sample t-test
##
## data: ToothGrowth$length[ToothGrowth$dose == 0.5 & ToothGrowth$supplement == "orange juice"] and ToothGrowth$length[ToothGrowth$dose == 1 & ToothGrowth$supplement == "orange juice"]
## t = -5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.415634 -5.524366
## sample estimates:
## mean of x mean of y
## 13.23 22.70
Orange juice, 1 vs. 2 mg/day: p-value = 0.0392, so there is a statistically significant difference, although the p-value is close to the 0.05 threshold.
t.test(ToothGrowth$length[ToothGrowth$dose==1 & ToothGrowth$supplement=="orange juice"], ToothGrowth$length[ToothGrowth$dose==2 & ToothGrowth$supplement=="orange juice"])
##
## Welch Two Sample t-test
##
## data: ToothGrowth$length[ToothGrowth$dose == 1 & ToothGrowth$supplement == "orange juice"] and ToothGrowth$length[ToothGrowth$dose == 2 & ToothGrowth$supplement == "orange juice"]
## t = -2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.5314425 -0.1885575
## sample estimates:
## mean of x mean of y
## 22.70 26.06
Ascorbic acid, 0.5 vs. 1 mg/day: p-value = 6.811e-07, so there is a statistically significant difference.
t.test(ToothGrowth$length[ToothGrowth$dose==0.5 & ToothGrowth$supplement=="ascorbic acid"], ToothGrowth$length[ToothGrowth$dose==1 & ToothGrowth$supplement=="ascorbic acid"])
##
## Welch Two Sample t-test
##
## data: ToothGrowth$length[ToothGrowth$dose == 0.5 & ToothGrowth$supplement == "ascorbic acid"] and ToothGrowth$length[ToothGrowth$dose == 1 & ToothGrowth$supplement == "ascorbic acid"]
## t = -7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.265712 -6.314288
## sample estimates:
## mean of x mean of y
## 7.98 16.77
Ascorbic acid, 1 vs. 2 mg/day: p-value = 9.156e-05, so there is a statistically significant difference.
t.test(ToothGrowth$length[ToothGrowth$dose==1 & ToothGrowth$supplement=="ascorbic acid"], ToothGrowth$length[ToothGrowth$dose==2 & ToothGrowth$supplement=="ascorbic acid"])
##
## Welch Two Sample t-test
##
## data: ToothGrowth$length[ToothGrowth$dose == 1 & ToothGrowth$supplement == "ascorbic acid"] and ToothGrowth$length[ToothGrowth$dose == 2 & ToothGrowth$supplement == "ascorbic acid"]
## t = -5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.054267 -5.685733
## sample estimates:
## mean of x mean of y
## 16.77 26.14
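The four dose comparisons above can also be produced with one call per delivery method using `pairwise.t.test()` from the stats package. With `pool.sd = FALSE` it runs `t.test()` on each pair of dose levels (Welch's test by default), and `p.adjust.method = "none"` leaves the p-values unadjusted, so they should match the ones reported above; it additionally reports the 0.5 vs. 2 mg/day comparison. Because several tests are run on the same data, a multiple-comparison correction such as `p.adjust.method = "holm"` may be worth considering; this is a suggestion, not part of the original analysis.

```r
tg <- datasets::ToothGrowth
names(tg) <- c("length", "supplement", "dose")
levels(tg$supplement) <- c("orange juice", "ascorbic acid")
tg$dose <- as.factor(tg$dose)

# Welch t-tests between every pair of dose levels, separately for each delivery method.
# Each result is a lower-triangular matrix: rows "1", "2"; columns "0.5", "1".
dose_pvalues <- lapply(split(tg, tg$supplement), function(d)
  pairwise.t.test(d$length, d$dose, pool.sd = FALSE,
                  p.adjust.method = "none")$p.value)
print(dose_pvalues)
```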
At 0.5 mg/day, orange juice vs. ascorbic acid: p-value = 0.006359, so there is a statistically significant difference.
t.test(ToothGrowth$length[ToothGrowth$dose==0.5 & ToothGrowth$supplement=="orange juice"], ToothGrowth$length[ToothGrowth$dose==0.5 & ToothGrowth$supplement=="ascorbic acid"])
##
## Welch Two Sample t-test
##
## data: ToothGrowth$length[ToothGrowth$dose == 0.5 & ToothGrowth$supplement == "orange juice"] and ToothGrowth$length[ToothGrowth$dose == 0.5 & ToothGrowth$supplement == "ascorbic acid"]
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean of x mean of y
## 13.23 7.98
At 1 mg/day, orange juice vs. ascorbic acid: p-value = 0.001038, so there is a statistically significant difference.
t.test(ToothGrowth$length[ToothGrowth$dose==1 & ToothGrowth$supplement=="orange juice"], ToothGrowth$length[ToothGrowth$dose==1 & ToothGrowth$supplement=="ascorbic acid"])
##
## Welch Two Sample t-test
##
## data: ToothGrowth$length[ToothGrowth$dose == 1 & ToothGrowth$supplement == "orange juice"] and ToothGrowth$length[ToothGrowth$dose == 1 & ToothGrowth$supplement == "ascorbic acid"]
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean of x mean of y
## 22.70 16.77
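At 2 mg/day, orange juice vs. ascorbic acid: p-value = 0.9639, so the null hypothesis cannot be rejected; the data show no evidence of a difference between the two means (the 95% confidence interval, -3.80 to 3.64, contains 0).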
t.test(ToothGrowth$length[ToothGrowth$dose==2 & ToothGrowth$supplement=="orange juice"], ToothGrowth$length[ToothGrowth$dose==2 & ToothGrowth$supplement=="ascorbic acid"])
##
## Welch Two Sample t-test
##
## data: ToothGrowth$length[ToothGrowth$dose == 2 & ToothGrowth$supplement == "orange juice"] and ToothGrowth$length[ToothGrowth$dose == 2 & ToothGrowth$supplement == "ascorbic acid"]
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean of x mean of y
## 26.06 26.14
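The three delivery-method comparisons can be summarized in a single table by looping over the dose levels. This sketch collects, for each dose, the difference in means (orange juice minus ascorbic acid), the 95% confidence interval and the p-value; `by_dose` is an illustrative name.

```r
tg <- datasets::ToothGrowth
names(tg) <- c("length", "supplement", "dose")
levels(tg$supplement) <- c("orange juice", "ascorbic acid")
tg$dose <- as.factor(tg$dose)

# One column per dose level; rows hold the summary of each Welch t-test
by_dose <- sapply(levels(tg$dose), function(d) {
  oj <- tg$length[tg$dose == d & tg$supplement == "orange juice"]
  aa <- tg$length[tg$dose == d & tg$supplement == "ascorbic acid"]
  res <- t.test(oj, aa)
  c(diff = mean(oj) - mean(aa),
    lower = res$conf.int[1], upper = res$conf.int[2],
    p = res$p.value)
})
print(round(by_dose, 4))
```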
Orange juice at 1 mg/day vs. ascorbic acid at 2 mg/day: p-value = 0.09653, so the null hypothesis cannot be rejected; the two means are not statistically significantly different.
t.test(ToothGrowth$length[ToothGrowth$dose==1 & ToothGrowth$supplement=="orange juice"], ToothGrowth$length[ToothGrowth$dose==2 & ToothGrowth$supplement=="ascorbic acid"])
##
## Welch Two Sample t-test
##
## data: ToothGrowth$length[ToothGrowth$dose == 1 & ToothGrowth$supplement == "orange juice"] and ToothGrowth$length[ToothGrowth$dose == 2 & ToothGrowth$supplement == "ascorbic acid"]
## t = -1.7574, df = 17.297, p-value = 0.09653
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7.5643336 0.6843336
## sample estimates:
## mean of x mean of y
## 22.70 26.14
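A non-significant p-value, as in this last comparison, does not by itself show that two means are equal. One way to test claim (3) more directly is the two one-sided tests (TOST) equivalence procedure, which is not part of the original analysis. The sketch below uses a margin of 5 length units that is purely hypothetical; the conclusion depends entirely on which margin is considered biologically negligible.

```r
tg <- datasets::ToothGrowth
names(tg) <- c("length", "supplement", "dose")
levels(tg$supplement) <- c("orange juice", "ascorbic acid")
tg$dose <- as.factor(tg$dose)

oj_1 <- tg$length[tg$dose == "1" & tg$supplement == "orange juice"]
aa_2 <- tg$length[tg$dose == "2" & tg$supplement == "ascorbic acid"]

margin <- 5  # HYPOTHETICAL equivalence margin (length units), not from the original study

# Two one-sided Welch t-tests on diff = mean(oj_1) - mean(aa_2):
# H0a: diff <= -margin, and H0b: diff >= +margin
p_lower <- t.test(oj_1, aa_2, mu = -margin, alternative = "greater")$p.value
p_upper <- t.test(oj_1, aa_2, mu = margin, alternative = "less")$p.value
tost_p <- max(p_lower, p_upper)  # equivalence is claimed only if tost_p < 0.05

# Equivalent check: the 90% CI must lie entirely inside (-margin, margin)
ci90 <- t.test(oj_1, aa_2, conf.level = 0.90)$conf.int
cat("TOST p-value:", signif(tost_p, 3), "\n")
cat("90% CI:", round(ci90, 2), "\n")
```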
Both the exploratory data analysis and the statistical hypothesis tests support the following statements: