by Richard Bagnall
July 2015
Six groups of ten guinea pigs were fed a diet supplemented with vitamin C at 0.5, 1.0 and 2.0 milligrams per day in the form of orange juice (OJ) or chemically pure vitamin C (VC). After 42 days, the length of the odontoblast cells was measured using optical microscopy. The tooth growth data therefore consist of 60 observations of 3 variables: mean length of odontoblasts (microns), supplement type (OJ or VC) and vitamin C dose (milligrams/day).
Aim: Use R to explore the structure of the tooth growth data
# load data
df <- ToothGrowth
# show structure of the data
str(df)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
Change ‘dose’ class from numeric to factor and show a summary of the data, split by supplement type.
# change dose to factor
df$dose <- as.factor(df$dose)
# summary of OJ and VC data
lapply(split(df$len, df$supp), function(x) {summary(x)})
## $OJ
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.20 15.52 22.70 20.66 25.72 30.90
##
## $VC
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.20 11.20 16.50 16.96 23.10 33.90
# compare variance of the OJ and VC data
lapply(split(df$len, df$supp), function(x) {var(x)})
## $OJ
## [1] 43.63344
##
## $VC
## [1] 68.32723
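The summaries above pool all three doses within each supplement type, which hides how the two supplements compare at a given dose. As a minimal sketch (the name `group_means` is illustrative and not part of the original analysis), the mean length for each supplement and dose combination can be tabulated with base R's `aggregate()`:

```r
# Mean odontoblast length for each supplement/dose combination.
# ToothGrowth ships with base R (datasets package), so no extra packages are needed.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

Reading the result row by row shows whether the gap between OJ and VC is similar at every dose or changes with dose, which is the question the t tests later address.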
Plot the data as a box plot, with a dot plot overlay, split by supplement dose and coloured by supplement type (R code is shown in Appendix 1).
Figure 1. Boxplot of tooth growth data split by vitamin C dose. The black horizontal line in each box shows the median; red circles show orange juice supplement data points; blue circles show chemically pure vitamin C supplement data points.
CONCLUSIONS OF EXPLORATORY DATA ANALYSIS.
Perform a t test to compare the odontoblast length between OJ and VC treated guinea pigs, both across the full data set and at each dose (see Appendix 2 for R code).
Table 1. Comparison of Tooth Growth by Supplement Type
Comparison (difference in mean length) | 95% CI (microns) | p-value |
---|---|---|
VC minus OJ (0.5 mg) | -8.781, -1.719 | 0.006 |
VC minus OJ (1.0 mg) | -9.058, -2.802 | 0.001 |
VC minus OJ (2.0 mg) | -3.638, 3.798 | 0.964 |
OJ minus VC (full data set) | -0.171, 7.571 | 0.061 |
A 95% confidence interval is a range constructed so that, across repeated experiments, 95% of such intervals would contain the true difference in means. The dose-specific intervals are computed as VC minus OJ (the order in which the groups are passed to t.test), whereas the full data set interval is computed as OJ minus VC. At doses of 0.5 and 1.0 milligrams per day the interval lies entirely below zero, so chemically pure vitamin C yields a shorter odontoblast length than orange juice; equivalently, orange juice yields a longer length at these doses. At a dose of 2.0 milligrams per day the interval contains zero, so no significant difference in odontoblast length is detected between the two forms of delivery. In the overall data set the interval also contains zero (p = 0.061), so the difference between orange juice and chemically pure vitamin C is not significant at the 5% level.
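Because `t.test(x, y)` reports an interval for mean(x) minus mean(y), the sign of each interval in Table 1 depends on the order of the arguments. The following sketch illustrates this for the 0.5 mg groups; the names `tg`, `vc_low`, `oj_low`, `res` and `res_swapped` are illustrative, and the groups are selected by label rather than by row number:

```r
# Select the 0.5 mg/day groups by label instead of by row index.
tg <- ToothGrowth
vc_low <- tg$len[tg$supp == "VC" & tg$dose == 0.5]
oj_low <- tg$len[tg$supp == "OJ" & tg$dose == 0.5]

# Welch two-sample t test; the interval estimates mean(vc_low) - mean(oj_low).
res <- t.test(vc_low, oj_low, paired = FALSE, var.equal = FALSE)
print(mean(vc_low) - mean(oj_low))  # point estimate of the difference
print(res$conf.int)

# Swapping the arguments flips the sign of the interval but not the p-value.
res_swapped <- t.test(oj_low, vc_low, paired = FALSE, var.equal = FALSE)
print(res_swapped$conf.int)
```

A negative interval in the first call therefore means that the VC group has the smaller mean, not the OJ group.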
Perform a t test to compare the odontoblast length between different doses of supplement (see Appendix 3 for R code).
Table 2. Comparison of Tooth Growth by Vitamin C Dose
Dose Comparison (higher dose minus lower dose) | 95% CI (microns) | p-value |
---|---|---|
1.0 mg minus 0.5 mg (VC) | 6.314, 11.266 | 0.0000007 |
2.0 mg minus 0.5 mg (VC) | 14.418, 21.902 | 0.00000005 |
2.0 mg minus 1.0 mg (VC) | 5.686, 13.054 | 0.0000916 |
1.0 mg minus 0.5 mg (OJ) | 5.524, 13.416 | 0.0000878 |
2.0 mg minus 0.5 mg (OJ) | 9.325, 16.335 | 0.0000013 |
2.0 mg minus 1.0 mg (OJ) | 0.189, 6.531 | 0.039 |
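Since every interval lies entirely above zero, each increase in the dose of vitamin C (0.5 to 1.0, 0.5 to 2.0 and 1.0 to 2.0 milligrams per day), delivered in either chemically pure form or as orange juice, is associated with a significantly greater odontoblast length. The weakest evidence is for the increase from 1.0 to 2.0 milligrams per day of orange juice (p = 0.039).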
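The intervals in Tables 1 and 2 come from t tests run with `var.equal = FALSE`, that is, Welch's unequal-variance t test. As a rough sketch of what that calculation involves (the variable names below are illustrative), the interval for the 1.0 mg versus 0.5 mg VC comparison can be rebuilt by hand from the group means, variances and the Welch-Satterthwaite degrees of freedom:

```r
# Groups selected by label: VC at 1.0 mg/day (x) and VC at 0.5 mg/day (y).
x <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 1]
y <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

# Squared standard error contributed by each group.
vx <- var(x) / length(x)
vy <- var(y) / length(y)
se <- sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# 95% interval for mean(x) - mean(y).
ci_manual <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_welch) * se

# The same interval as reported by t.test.
ci_builtin <- as.numeric(t.test(x, y, paired = FALSE, var.equal = FALSE)$conf.int)
print(ci_manual)
print(all.equal(ci_manual, ci_builtin))
```

Setting `var.equal = TRUE` would instead pool the two variances and use `length(x) + length(y) - 2` degrees of freedom.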
The following assumptions were made in the t tests:
# Appendix 1: box plot with dot plot overlay (Figure 1)
# load ggplot2
suppressMessages(suppressWarnings(library(ggplot2)))
# box plot of data points
p <- ggplot(df, aes(x = dose, y = len, fill = supp)) + geom_boxplot()
# overlay dot points, then add title, axis labels and legend;
# each `+` must end its line so that R continues the expression
p + geom_dotplot(binaxis = 'y',
                 stackdir = 'center',
                 position = position_dodge(0.75),
                 dotsize = 0.5, binwidth = 1) +
  ggtitle("Distribution of Tooth Growth Data by Supplement Type") +
  xlab("Vitamin C dose (milligrams/day)") +
  ylab("Odontoblast length (microns)") +
  theme(plot.title = element_text(size = 14.0),
        axis.title = element_text(size = 10.0)) +
  guides(fill = guide_legend(title = "Supplement type")) +
  scale_fill_discrete(labels = c("Orange Juice", "Pure Vitamin C"))
# Appendix 2: Welch t tests by supplement type (Table 1)
# rows 1-30 are VC and rows 31-60 are OJ, so each difference below is VC - OJ
low_supp <- t.test(df[1:10, 1], df[31:40, 1], paired = FALSE, var.equal = FALSE)
low_supp_conf <- low_supp$conf.int
low_supp_p <- low_supp$p.value
mid_supp <- t.test(df[11:20, 1], df[41:50, 1], paired = FALSE, var.equal = FALSE)
mid_supp_conf <- mid_supp$conf.int
mid_supp_p <- mid_supp$p.value
high_supp <- t.test(df[21:30, 1], df[51:60, 1], paired = FALSE, var.equal = FALSE)
high_supp_conf <- high_supp$conf.int
high_supp_p <- high_supp$p.value
# full data set via the formula interface: difference is OJ - VC (first factor level first);
# 'paired' is left at its default (FALSE) because newer R versions may reject it in the formula method
full_supp <- t.test(len ~ supp, var.equal = FALSE, data = df)
full_supp_conf <- full_supp$conf.int
full_supp_p <- full_supp$p.value
# Appendix 3: Welch t tests by dose (Table 2)
# print small p-values in fixed rather than scientific notation
options(scipen = 999)
# VC groups (rows 1-30); each difference is higher dose - lower dose
vc_low_mid <- t.test(df[11:20, 1], df[1:10, 1], paired = FALSE, var.equal = FALSE)
vc_low_mid_conf <- vc_low_mid$conf.int
vc_low_mid_p <- vc_low_mid$p.value
vc_low_high <- t.test(df[21:30, 1], df[1:10, 1], paired = FALSE, var.equal = FALSE)
vc_low_high_conf <- vc_low_high$conf.int
vc_low_high_p <- vc_low_high$p.value
vc_mid_high <- t.test(df[21:30, 1], df[11:20, 1], paired = FALSE, var.equal = FALSE)
vc_mid_high_conf <- vc_mid_high$conf.int
vc_mid_high_p <- vc_mid_high$p.value
# OJ groups (rows 31-60); each difference is higher dose - lower dose
oj_low_mid <- t.test(df[41:50, 1], df[31:40, 1], paired = FALSE, var.equal = FALSE)
oj_low_mid_conf <- oj_low_mid$conf.int
oj_low_mid_p <- oj_low_mid$p.value
oj_low_high <- t.test(df[51:60, 1], df[31:40, 1], paired = FALSE, var.equal = FALSE)
oj_low_high_conf <- oj_low_high$conf.int
oj_low_high_p <- oj_low_high$p.value
oj_mid_high <- t.test(df[51:60, 1], df[41:50, 1], paired = FALSE, var.equal = FALSE)
oj_mid_high_conf <- oj_mid_high$conf.int
oj_mid_high_p <- oj_mid_high$p.value
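The row ranges used above depend on the fixed ordering of the ToothGrowth rows. As an alternative sketch, the same six dose comparisons can be assembled by selecting groups by label; `dose_ci`, `supps`, `lowers`, `highers` and `table2` are hypothetical names introduced for illustration and do not appear in the original analysis:

```r
# Hypothetical helper: Welch t test of a higher dose against a lower dose
# within one supplement type, returned as a one-row data frame.
dose_ci <- function(supp, lower, higher, data = ToothGrowth) {
  hi <- data$len[data$supp == supp & data$dose == higher]
  lo <- data$len[data$supp == supp & data$dose == lower]
  res <- t.test(hi, lo, paired = FALSE, var.equal = FALSE)  # difference = hi - lo
  data.frame(supp = supp, lower = lower, higher = higher,
             ci_low = res$conf.int[1], ci_high = res$conf.int[2],
             p_value = res$p.value)
}

# The six comparisons of Table 2, in the same order.
supps   <- rep(c("VC", "OJ"), each = 3)
lowers  <- rep(c(0.5, 0.5, 1), times = 2)
highers <- rep(c(1, 2, 2), times = 2)

# Run every comparison and stack the one-row results into a single table.
table2 <- do.call(rbind, unname(Map(dose_ci, supps, lowers, highers)))
print(table2, digits = 4)
```

Selecting by label keeps the comparisons correct even if the rows are reordered or filtered beforehand.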