Overview

In this project we will analyze the effect of vitamin C on tooth growth in guinea pigs.

Exploratory Data Analyses and Data Summary

Let’s examine our dataset. (See code in Appendix 1)

## Observations: 60
## Variables: 3
## $ len  (dbl) 4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0, 16....
## $ supp (fctr) VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, ...
## $ dose (dbl) 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1....
Basic statistics
      len         supp        dose
Min.   : 4.20    OJ:30    Min.   :0.500
1st Qu.:13.07    VC:30    1st Qu.:0.500
Median :19.25             Median :1.000
Mean   :18.81             Mean   :1.167
3rd Qu.:25.27             3rd Qu.:2.000
Max.   :33.90             Max.   :2.000
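
The summary above does not show how the observations are spread across the groups. A minimal sketch of that check, using base R only (the names `design` and `group_means` are ours, not part of the original analysis):

```r
# Sketch: check the experimental design and the per-group means.
data("ToothGrowth")

# Number of animals for each delivery method / dose combination
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)

# Mean tooth length for each delivery method / dose combination
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

If the design is balanced, every cell of `design` holds the same count, which makes the group comparisons later in the report easier to interpret.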

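We have 60 observations of 3 variables: len (tooth length, numeric), supp (delivery method, a factor with levels OJ, orange juice, and VC, ascorbic acid) and dose (dose of vitamin C in mg, taking the values 0.5, 1 and 2).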

The length of the teeth has the following distribution:

Tooth growth by delivery method and dose

We want to analyze the effect of delivery method and dose of vitamin C on tooth growth. First, let’s check the effect of delivery method. Null hypothesis (H0): the mean tooth lengths of the two supplement groups are equal. Alternative hypothesis (H1): the means are different.
Each group has 30 observations. The length of the teeth is not normally distributed, but with groups of this size the t-test is reasonably robust to moderate departures from normality, so we use Welch’s two-sample t-test, which does not assume equal variances. (See code in Appendix 2)

Welch Two Sample t-test: len by supp
Test statistic      df    P value    Alternative hypothesis
         1.915   55.31    0.06063    two.sided

The p-value (0.061) is greater than 0.05, so we do not have enough evidence at the 5% significance level to reject the null hypothesis. We therefore cannot conclude that delivery method has an effect on tooth length. Let’s visualize the differences.
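
As a cross-check, the test statistic, degrees of freedom and p-value in the Welch table can be reproduced by hand. The sketch below is illustrative only; the variable names (`oj`, `vc`, `t_stat`, `dof`, `p_value`) are ours and it uses base R functions only.

```r
# Sketch: Welch two-sample t-test for len by supp, computed step by step.
data("ToothGrowth")

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard errors of the two group means
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over its unpooled standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
dof <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), dof)

# Built-in test for comparison (Welch is the default: var.equal = FALSE)
builtin <- t.test(oj, vc)
print(c(t = t_stat, df = dof, p = p_value))
```

The hand-computed values should agree with `builtin$statistic`, `builtin$parameter` and `builtin$p.value`, and with the table above after rounding.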

Now let’s compare the different doses of vitamin C. Again, we compare the group means pairwise using Welch’s t-test. (See code in Appendix 3)

95% confidence interval of differences in group means
                lower     upper      p.value
0.5mg - 1mg    -11.98     -6.28    1.268e-07
1mg - 2mg       -9.00     -3.73    1.906e-05
0.5mg - 2mg    -18.16    -12.83      4.4e-14

In each pair, the 95% confidence interval does not include zero, so we can reject the null hypothesis that the means are equal. The very low p-values (all < 0.001) indicate that the differences are statistically significant well below the 5% level. Every interval lies entirely below zero, meaning the lower dose in each pair has the smaller mean tooth length. Hence the mean length of the teeth increases with the dose of vitamin C. Let’s visualize it.
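
The three pairwise comparisons can also be generated in a loop instead of writing out each subset. This is a sketch under our own naming (`compare_doses` and `dose_tests` are hypothetical helpers, not part of the original analysis), using base R only; the rows come out in a different order from the table above.

```r
# Sketch: Welch t-tests for every pair of doses, collected in one data frame.
data("ToothGrowth")

doses <- sort(unique(ToothGrowth$dose))      # 0.5, 1, 2
pairs <- combn(doses, 2, simplify = FALSE)   # all unordered pairs of doses

# Run a Welch t-test for one pair (lower dose minus higher dose)
compare_doses <- function(pair) {
  low  <- ToothGrowth$len[ToothGrowth$dose == pair[1]]
  high <- ToothGrowth$len[ToothGrowth$dose == pair[2]]
  test <- t.test(low, high)
  data.frame(
    comparison = paste0(pair[1], "mg - ", pair[2], "mg"),
    lower      = test$conf.int[1],
    upper      = test$conf.int[2],
    p.value    = test$p.value
  )
}

dose_tests <- do.call(rbind, lapply(pairs, compare_doses))
print(dose_tests)
```

An interval whose `upper` bound is below zero means the lower dose in that pair has the smaller mean tooth length.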

Summary

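  1. We found no statistically significant effect of delivery method on tooth length at the 5% level (p = 0.061).
  2. The dose of vitamin C has a positive effect on tooth length: mean tooth length increases significantly with each increase in dose.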

Appendix 1

# load library and data
library(dplyr)
library(ggplot2)
library(pander)
library(datasets)
data("ToothGrowth")

# examine dataset
glimpse(ToothGrowth)  # glimpse() prints by itself; wrapping it in print() adds a stray extra value

# basic statistics
pander(summary(ToothGrowth), caption = 'Basic statistics', justify='right')

# distribution of tooth length
g <- ggplot(ToothGrowth, aes(x = len)) +
    geom_histogram(alpha = .4, binwidth = 2, colour = "black", fill = 'blue') +
    scale_x_continuous(breaks = seq(from=0, to=50, by=5),
                       name = 'tooth length') +
    scale_y_continuous(breaks = seq(from=0, to=10, by=1),
                       name = 'count') +
    ggtitle('Distribution of tooth length of 60 guinea pigs')
print(g)

Appendix 2

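# Welch t-test; paired = FALSE is the default, so it is omitted
# (recent R versions reject `paired` in the formula interface)
pander(t.test(len ~ supp, data = ToothGrowth))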

# distribution of tooth length by delivery method
# (one histogram panel per supplement type)
g <- ggplot(ToothGrowth, aes(x = len)) +
    geom_histogram(alpha = .4, binwidth = 2, colour = "black", fill = 'blue') +
    facet_wrap(~ supp) +
    scale_x_continuous(breaks = seq(from=0, to=50, by=5),
                       name = 'tooth length') +
    scale_y_continuous(breaks = seq(from=0, to=10, by=1),
                       name = 'count') +
    ggtitle('Distribution of tooth length by delivery method')
print(g)

Appendix 3

# t-test
df1 <- subset(ToothGrowth, dose %in% c(0.5, 1))
df2 <- subset(ToothGrowth, dose %in% c(1, 2))
df3 <- subset(ToothGrowth, dose %in% c(0.5, 2))

# Welch t-tests; paired = FALSE is the default, so it is omitted
# (recent R versions reject `paired` in the formula interface)
test1 <- t.test(len ~ factor(dose), data = df1)
test2 <- t.test(len ~ factor(dose), data = df2)
test3 <- t.test(len ~ factor(dose), data = df3)

# collect confidence limits and p-values in one table
stat <- data.frame(
    lower = c(test1$conf.int[1], test2$conf.int[1], test3$conf.int[1]),
    upper = c(test1$conf.int[2], test2$conf.int[2], test3$conf.int[2]),
    p.value = c(test1$p.value, test2$p.value, test3$p.value),
    row.names = c('0.5mg - 1mg', '1mg - 2mg', '0.5mg - 2mg'))
pander(stat, caption = '95% confidence interval of differences in group means',
       justify='right', round=c(2,2,15))


# distribution of tooth length by dose of vitamin C
g <- ggplot(ToothGrowth, aes(factor(dose), len)) +
    geom_violin(aes(fill = factor(dose))) +
    scale_y_continuous(breaks = seq(from=0, to=50, by=5),
                       name = 'tooth length') +
    scale_x_discrete(name = 'dose of vitamin C, mg') +
    ggtitle('Distribution of tooth length by dose of vitamin C') +
    guides(fill = "none")  # hide the legend; fill = FALSE is deprecated in newer ggplot2
print(g)