In this project we will analyze the effect of vitamin C on tooth growth in guinea pigs.
Let’s examine our dataset. (See code in Appendix 1)
## Observations: 60
## Variables: 3
## $ len (dbl) 4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0, 16....
## $ supp (fctr) VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, ...
## $ dose (dbl) 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1....
len | supp | dose |
---|---|---|
Min. : 4.20 | OJ:30 | Min. :0.500 |
1st Qu.:13.07 | VC:30 | 1st Qu.:0.500 |
Median :19.25 | NA | Median :1.000 |
Mean :18.81 | NA | Mean :1.167 |
3rd Qu.:25.27 | NA | 3rd Qu.:2.000 |
Max. :33.90 | NA | Max. :2.000 |
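We have 60 observations on 3 variables: `len` (tooth length), `supp` (delivery method: OJ for orange juice or VC for ascorbic acid), and `dose` (dose of vitamin C in mg/day: 0.5, 1, or 2).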
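As a quick aside, a cross-tabulation shows how the 60 animals are split across delivery method and dose. This is only a sketch using base R; the object name `design` is ours and does not appear in the original analysis.

```r
# Count the animals in each delivery-method-by-dose cell
data("ToothGrowth")
design <- table(supp = ToothGrowth$supp, dose = ToothGrowth$dose)
print(design)
```

Each of the six cells contains 10 animals, so the design is balanced.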
The length of the teeth has the following distribution:
We want to analyze the effect of delivery method and dose of vitamin C on tooth growth. First, let’s check the effect of delivery method. Null hypothesis (H0): the mean tooth lengths of the two delivery-method groups are equal. Alternative hypothesis (H1): the means are different.
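Each group has 30 observations. Although tooth length is not normally distributed, groups of this size are usually large enough for the t-test to be a reasonable approximation, so we use Welch’s two-sample t-test, which does not assume equal variances. (See code in Appendix 2)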
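To make the test concrete, the Welch statistic, its degrees of freedom, and the two-sided p-value can be recomputed by hand and compared with `t.test`. This is a minimal sketch for illustration; the variable names (`oj`, `vc`, `t_stat`, `df_welch`, `p_value`) are ours and are not part of the original analysis.

```r
# Recompute Welch's two-sample t-test by hand
data("ToothGrowth")

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# Cross-check against the built-in test (Welch is the default)
builtin <- t.test(len ~ supp, data = ToothGrowth)
print(c(t = t_stat, df = df_welch, p = p_value))
print(all.equal(unname(builtin$statistic), t_stat))
```

The hand-computed values should agree with the test statistic, df, and p-value reported for this comparison.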
Test statistic | df | P value | Alternative hypothesis |
---|---|---|---|
1.915 | 55.31 | 0.06063 | two.sided |
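The p-value (0.061) is greater than 0.05, so at the 5% significance level we do not have enough evidence to reject the null hypothesis. This does not prove that delivery method has no effect; it only means that this test did not detect a difference in mean tooth length between the two delivery methods. Let’s visualize the differences.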
Now let’s compare the different doses of vitamin C. Again, we compare the group means pairwise using t-tests. (See code in Appendix 3)
Comparison | lower | upper | p.value |
---|---|---|---|
0.5mg - 1mg | -11.98 | -6.28 | 1.268e-07 |
1mg - 2mg | -9 | -3.73 | 1.906e-05 |
0.5mg - 2mg | -18.16 | -12.83 | 4.4e-14 |
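Because three tests are run on the same data, it may be worth checking that the conclusions survive a multiple-comparison correction. The sketch below applies a Bonferroni adjustment (our choice for illustration, not part of the original analysis) to the p-values reported in the table above, using base R's `p.adjust`. The names `p_raw` and `p_adj` are ours.

```r
# p-values copied from the pairwise comparison table
p_raw <- c("0.5mg - 1mg" = 1.268e-07,
           "1mg - 2mg"   = 1.906e-05,
           "0.5mg - 2mg" = 4.4e-14)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_adj <- p.adjust(p_raw, method = "bonferroni")
print(p_adj)

# TRUE if every comparison stays significant at the 5% level
print(all(p_adj < 0.05))
```

Even after the adjustment, all three p-values remain far below 0.05.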
In each pair, the 95% confidence interval for the difference in means does not include zero, so we can reject the null hypothesis of equal means. The very low p-values (all below 0.001) indicate that the differences are statistically significant. Hence mean tooth length differs between doses of vitamin C, and because every interval is negative (lower dose minus higher dose), tooth length increases with dose. Let’s visualize it.
# Appendix 1: load libraries and data
library(dplyr)
library(ggplot2)
library(pander)
library(datasets)
data("ToothGrowth")
# examine dataset (glimpse() prints its own output, so no print() is needed)
glimpse(ToothGrowth)
# basic statistics
pander(summary(ToothGrowth), caption = 'Basic statistics', justify='right')
# distribution of tooth length
g <- ggplot(ToothGrowth, aes(x = len)) +
  geom_histogram(alpha = .4, binwidth = 2, colour = "black", fill = 'blue') +
  scale_x_continuous(breaks = seq(from = 0, to = 50, by = 5),
                     name = 'tooth length') +
  scale_y_continuous(breaks = seq(from = 0, to = 10, by = 1),
                     name = 'count') +
  ggtitle('Distribution of tooth length of 60 guinea pigs')
print(g)
# Appendix 2: Welch two-sample t-test for delivery method
# (unpaired is the default, so the paired argument is omitted)
pander(t.test(len ~ supp, data = ToothGrowth))
# distribution of tooth length by delivery method (one panel per method)
g <- ggplot(ToothGrowth, aes(x = len)) +
  geom_histogram(alpha = .4, binwidth = 2, colour = "black", fill = 'blue') +
  facet_wrap(~ supp) +
  scale_x_continuous(breaks = seq(from = 0, to = 50, by = 5),
                     name = 'tooth length') +
  scale_y_continuous(breaks = seq(from = 0, to = 10, by = 1),
                     name = 'count') +
  ggtitle('Distribution of tooth length by delivery method')
print(g)
# Appendix 3: pairwise Welch t-tests between doses
# (unpaired is the default, so the paired argument is omitted)
df1 <- subset(ToothGrowth, dose %in% c(0.5, 1))
df2 <- subset(ToothGrowth, dose %in% c(1, 2))
df3 <- subset(ToothGrowth, dose %in% c(0.5, 2))
test1 <- t.test(len ~ factor(dose), data = df1)
test2 <- t.test(len ~ factor(dose), data = df2)
test3 <- t.test(len ~ factor(dose), data = df3)
# collect the confidence limits and p-values in one table
stat <- data.frame(
  lower = c(test1$conf.int[1], test2$conf.int[1], test3$conf.int[1]),
  upper = c(test1$conf.int[2], test2$conf.int[2], test3$conf.int[2]),
  p.value = c(test1$p.value, test2$p.value, test3$p.value),
  row.names = c('0.5mg - 1mg', '1mg - 2mg', '0.5mg - 2mg'))
pander(stat, caption = '95% confidence interval of differences in group means',
       justify = 'right', round = c(2, 2, 15))
# distribution of tooth length by dose of vitamin C
g <- ggplot(ToothGrowth, aes(factor(dose), len)) +
  geom_violin(aes(fill = factor(dose))) +
  scale_y_continuous(breaks = seq(from = 0, to = 50, by = 5),
                     name = 'tooth length') +
  scale_x_discrete(name = 'dose of vitamin C, mg') +
  ggtitle('Distribution of tooth length by dose of vitamin C') +
  # hide the fill legend; "none" replaces the deprecated FALSE
  guides(fill = "none")
print(g)