Overview

In this project we will analyze the effect of vitamin C on tooth growth in guinea pigs.

Exploratory Data Analyses and Data Summary

Let’s examine our dataset. (See code in Appendix 1)

## Observations: 60
## Variables: 3
## $ len  (dbl) 4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0, 16....
## $ supp (fctr) VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, ...
## $ dose (dbl) 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1....

Basic statistics
len	supp	dose
Min. : 4.20	OJ:30	Min. :0.500
1st Qu.:13.07	VC:30	1st Qu.:0.500
Median :19.25	NA	Median :1.000
Mean :18.81	NA	Mean :1.167
3rd Qu.:25.27	NA	3rd Qu.:2.000
Max. :33.90	NA	Max. :2.000

We have 60 observations on 3 variables:

len - tooth length (from 4.2 to 33.9 with mean 18.81 and median 19.25).
supp - delivery method: orange juice (coded OJ) or ascorbic acid (coded VC).
dose - dose of vitamin C at three levels (0.5, 1, and 2 mg/day).
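
The variable descriptions above can be complemented with a per-group summary. The following is a minimal sketch, not part of the original analysis; the name `cell_stats` is introduced here only for illustration. It computes the mean, standard deviation, and number of observations of `len` for each combination of `supp` and `dose`.

```r
library(datasets)
data("ToothGrowth")

# Mean, standard deviation and group size of tooth length
# for every combination of delivery method and dose
cell_stats <- aggregate(
    len ~ supp + dose, data = ToothGrowth,
    FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)
print(cell_stats)
```

The result has one row per supp/dose cell, which makes it easy to see that the design is balanced before any tests are run.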

The distribution of tooth length is shown by the histogram produced by the code in Appendix 1.

Tooth growth by delivery method and dose

We want to analyze the effect of delivery method and dose of vitamin C on tooth growth. First, let’s check the effect of delivery method. Null hypothesis (H0): the mean tooth lengths of the two supplement groups are equal. Alternative hypothesis (H1): the means are different.
Each group has 30 observations. Tooth length is not exactly normally distributed, but with samples of this size the t-test is reasonably robust to moderate departures from normality, so we use Welch's two-sample t-test, which does not assume equal variances. (See code in Appendix 2)
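
One possible way to look at the normality question raised above is a Shapiro-Wilk test within each delivery-method group. This is only an illustrative sketch and was not part of the original analysis; the name `normality` is an assumption made for this example.

```r
library(datasets)
data("ToothGrowth")

# Shapiro-Wilk normality test within each delivery-method group;
# a small p-value suggests a departure from normality
normality <- tapply(ToothGrowth$len, ToothGrowth$supp,
                    function(x) shapiro.test(x)$p.value)
print(round(normality, 3))
```

A small p-value in either group would not by itself rule out the t-test at this sample size, but it is worth keeping in mind when interpreting borderline results.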

Welch Two Sample t-test: `len` by `supp`
Test statistic	df	P value	Alternative hypothesis
1.915	55.31	0.06063	two.sided
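
To make the numbers in this table less of a black box, the sketch below recomputes the Welch statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand. The variable names (`oj`, `vc`, `t_stat`, `df_welch`, `p_value`) are chosen for this illustration only; the result should agree with the output of `t.test` shown above.

```r
library(datasets)
data("ToothGrowth")

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch test statistic: difference in means over its standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
    (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```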

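The p-value (0.061) is greater than 0.05, so we do not have enough evidence at the 5% significance level to reject the null hypothesis. In other words, we cannot conclude that delivery method affects tooth length, although failing to reject H0 does not prove that there is no effect. Let’s visualize the differences.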

Now let’s compare the different doses of vitamin C. Again, we compare the group means pairwise using Welch's t-test. (See code in Appendix 3)

95% confidence interval of differences in group means
	lower	upper	p.value
0.5mg - 1mg	-11.98	-6.28	1.268e-07
1mg - 2mg	-9	-3.73	1.906e-05
0.5mg - 2mg	-18.16	-12.83	4.4e-14

In each pair, the 95% confidence interval does not include zero, so we can reject the null hypothesis of equal means. The very low p-values (< 0.001) indicate that the differences are statistically significant. Every interval lies entirely below zero, meaning that mean tooth length at the lower dose is smaller than at the higher dose; that is, tooth length increases with the dose of vitamin C. Let’s visualize it.
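
Because three pairwise tests are run on the same data, one may also want to check that the conclusion survives a correction for multiple comparisons. The sketch below is an optional addition, not part of the original analysis: it repeats the three Welch tests in a loop over dose pairs and applies a Bonferroni adjustment with `p.adjust`. The names `dose_pairs`, `p_raw`, and `p_adj` are introduced here for illustration.

```r
library(datasets)
data("ToothGrowth")

doses <- sort(unique(ToothGrowth$dose))
dose_pairs <- combn(doses, 2)   # columns: (0.5, 1), (0.5, 2), (1, 2)

# Welch t-test p-value for each pair of dose levels
p_raw <- apply(dose_pairs, 2, function(d) {
    pair_data <- subset(ToothGrowth, dose %in% d)
    t.test(len ~ factor(dose), data = pair_data)$p.value
})
names(p_raw) <- apply(dose_pairs, 2,
                      function(d) paste0(d[1], "mg - ", d[2], "mg"))

# Bonferroni adjustment for the three comparisons
p_adj <- p.adjust(p_raw, method = "bonferroni")
print(signif(cbind(raw = p_raw, bonferroni = p_adj), 4))
```

With only three comparisons the Bonferroni adjustment multiplies each p-value by three, so the adjusted values stay well below 0.001 and the conclusion about dose is unchanged.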

Summary

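We found no statistically significant effect of delivery method on tooth length at the 5% level (p = 0.061).
Higher doses of vitamin C are associated with significantly greater tooth length (p < 0.001 for every pairwise comparison).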

Appendix 1

# load library and data
library(dplyr)
library(ggplot2)
library(pander)
library(datasets)
data("ToothGrowth")

# examine dataset
# glimpse() prints the structure itself; wrapping it in print() adds a stray extra line
glimpse(ToothGrowth)

# basic statistics
pander(summary(ToothGrowth), caption = 'Basic statistics', justify='right')

# distribution of tooth length
g <- ggplot(ToothGrowth, aes(x = len)) +
    geom_histogram(alpha = .4, binwidth = 2, colour = "black", fill = 'blue') +
    scale_x_continuous(breaks = seq(from=0, to=50, by=5),
                       name = 'tooth length') +
    scale_y_continuous(breaks = seq(from=0, to=10, by=1),
                       name = 'count') +
    ggtitle('Distribution of tooth length of 60 guinea pigs')
print(g)

Appendix 2

# t-test
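# Welch two-sample t-test (unpaired by default); recent R versions
# reject the 'paired' argument in the formula interface, so it is omitted
pander(t.test(len ~ supp, data = ToothGrowth))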

# distribution of tooth length by delivery method
g <- ggplot(ToothGrowth, aes(x = len)) +
    geom_histogram(alpha = .4, binwidth = 2, colour = "black", fill = 'blue') +
    facet_wrap(~ supp) +  # one panel per delivery method (OJ, VC)
    scale_x_continuous(breaks = seq(from=0, to=50, by=5),
                       name = 'tooth length') +
    scale_y_continuous(breaks = seq(from=0, to=10, by=1),
                       name = 'count') +
    ggtitle('Distribution of tooth length by delivery method')
print(g)

Appendix 3

# t-test
df1 <- subset(ToothGrowth, dose %in% c(0.5, 1))
df2 <- subset(ToothGrowth, dose %in% c(1, 2))
df3 <- subset(ToothGrowth, dose %in% c(0.5, 2))

# Welch two-sample t-tests (unpaired by default); recent R versions
# reject the 'paired' argument in the formula interface, so it is omitted
test1 <- t.test(len ~ factor(dose), data = df1)
test2 <- t.test(len ~ factor(dose), data = df2)
test3 <- t.test(len ~ factor(dose), data = df3)

# collect confidence limits and p-values of the three tests in one table
stat <- data.frame(
    lower = c(test1$conf.int[1], test2$conf.int[1], test3$conf.int[1]),
    upper = c(test1$conf.int[2], test2$conf.int[2], test3$conf.int[2]),
    p.value = c(test1$p.value, test2$p.value, test3$p.value),
    row.names = c('0.5mg - 1mg', '1mg - 2mg', '0.5mg - 2mg')
)
pander(stat, caption = '95% confidence interval of differences in group means',
       justify='right', round=c(2,2,15))


# distribution of tooth length by dose of vitamin C
g <- ggplot(ToothGrowth, aes(factor(dose), len)) +
    geom_violin(aes(fill = factor(dose))) +
    scale_y_continuous(breaks = seq(from=0, to=50, by=5),
                       name = 'tooth length') +
    scale_x_discrete(name = 'dose of vitamin C, mg') +
    ggtitle('Distribution of tooth length by dose of vitamin C') +
    guides(fill = "none")  # hide the legend; fill = FALSE is deprecated in current ggplot2
print(g)

Effect of vitamin C on tooth growth in guinea pigs

Roman Shmyrev