The ToothGrowth
dataset represents a study of the effect
of Vitamin C on tooth growth in 60 guinea pigs.
The length of odontoblasts (cells responsible for tooth growth) is the response variable. Each animal was given one of three vitamin C dose levels (0.5, 1, and 2 mg/day) via one of two delivery methods: orange juice or ascorbic acid (a form of vitamin C and coded as VC), making 10 observations for each dosage/supplement combination.
The report finds that dosage levels have a significant impact on teeth length. Looking at all dosage levels, the supplement type could not be determined to be significant, however, breaking the data down to dosage levels, it finds that there is significant impact at dosage levels of 0.5 and 1.0 mg/day.
Where not included inline, all code for the following report can be found in the Appendix.
The ToothGrowth
dataset can be accessed via the
data
function in R and consists of 60 observations on 3
variables.
len numeric Tooth length
supp factor Supplement type (VC or OJ)
dose numeric Dose in milligrams
data(ToothGrowth); str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
For this study, we will convert the dose variable to a discrete factor.
<- ToothGrowth %>% mutate(dose = as.factor(dose)) ToothGrowth
We start with a look at how the teeth length is distributed with respect to dosage levels and supplement types.
supp | dose | count | min | q25 | median | q75 | max | mean | sd | skew |
---|---|---|---|---|---|---|---|---|---|---|
OJ | 0.5 | 10 | 8.2 | 9.700 | 12.25 | 16.175 | 21.5 | 13.23 | 4.46 | 0.51 |
OJ | 1 | 10 | 14.5 | 20.300 | 23.45 | 25.650 | 27.3 | 22.70 | 3.91 | -0.80 |
OJ | 2 | 10 | 22.4 | 24.575 | 25.95 | 27.075 | 30.9 | 26.06 | 2.66 | 0.43 |
VC | 0.5 | 10 | 4.2 | 5.950 | 7.15 | 10.900 | 11.5 | 7.98 | 2.75 | 0.16 |
VC | 1 | 10 | 13.6 | 15.275 | 16.50 | 17.300 | 22.5 | 16.77 | 2.52 | 1.08 |
VC | 2 | 10 | 18.5 | 23.375 | 25.95 | 28.800 | 33.9 | 26.14 | 4.80 | 0.19 |
Next, we will look at the distribution of teeth length for each variable to check for normality:
While none of the results show a classic Gaussian curve, distributions are close enough to normal for a T-analysis.
We will test two hypotheses: that average teeth length is equal for dosages 0.5 & 1.0, and similarly for dosages 1.0 & 2.0.
\(H_{01}: \mu_{0.5} = \mu_{1.0}\), \(H_{A1}: \mu_{0.5} \ne \mu_{1.0}\)
\(H_{02}: \mu_{0.5} = \mu_{1.0}\), \(H_{A2}: \mu_{0.5} \ne \mu_{1.0}\)
P-value | CI Lower | CI Upper | |
---|---|---|---|
0.5:1.0 | 1.00e-07 | -11.983781 | -6.276219 |
1.0:2.0 | 1.91e-05 | -8.996481 | -3.733519 |
Comparing teeth lengths for dosages of 0.5 & 1.0mg/day, and also for 1.0 & 2.0mg/day, there is significant difference on both intervals to reject \(H_{01}\) & \(H_{02}\).
Both tests report very low P-values and 95% confidence intervals that do not include zero.
We will test four hypotheses:
\(H_{01}: \mu_{OJ} = \mu_{VC}\), \(H_{A1}: \mu_{OJ} \ne \mu_{VC}\)
\(H_{02}: \mu_{OJ0.5} = \mu_{VC0.5}\), \(H_{A2}: \mu_{OJ0.5} \ne \mu_{VC0.5}\)
\(H_{03}: \mu_{OJ1.0} = \mu_{VC1.0}\), \(H_{A3}: \mu_{OJ1.0} \ne \mu_{VC1.0}\)
\(H_{04}: \mu_{OJ2.0} = \mu_{VC2.0}\), \(H_{A4}: \mu_{OJ2.0} \ne \mu_{VC2.0}\)
P-value | CI Lower | CI Upper | |
---|---|---|---|
All dosages | 0.0606345 | -0.1710156 | 7.571016 |
Dosage 0.5 | 0.0063586 | 1.7190573 | 8.780943 |
Dosage 1.0 | 0.0010384 | 2.8021482 | 9.057852 |
Dosage 2.0 | 0.9638516 | -3.7980705 | 3.638070 |
Across all dosages, although there is probable cause to believe that supplement type has an affect on teeth length, at the 95% confidence level, we are unable to reject the null hypothesis with a p-value of 0.06 and a confidence interval that includes zero.
Drilling down into the dosage levels however, we see that dosage levels at 0.5 and 1.0 mg/day do show a significant difference and only the higher dosage of 2.0 shows insignificant difference between supplements.
Therefore, we reject \(H_{02}\) & \(H_{03}\) while failing to reject \(H_{01}\) & \(H_{04}\).
Based on this analysis, there is statistically significant difference for test groups receiving 0.5, 1.0 and 2.0 mg/day of supplement to say that in similar samples, by increasing dosage from 0.5 to 1.0 mg/day and from 1.0 to 2.0mg/day we would expect to see an increase in teeth length in at least 95% of test subjects.
While there was strong suggestion of a similar relationship between supplement types across all subjects, with 94% seeing an increase for Orange Juice, we fail to reject the null hypothesis that there is no effect.
Examining effects of different supplement types at each dosage level, a clearer picture emerges, where we see that orange juice is significantly different at 0.5 and 1.0 mg/day (99.4% and 99.0% respectively) but almost no difference at 2.0 mg/day (only 4%). We would therefore reject the null hypothesis at dosage levels of 0.5 and 1.0 mg/day and fail to reject at 2.0 mg/day. In other words, orange juice is considered more effective at lower dosage levels, but at 2.0 mg/day, there is no significant difference.
There are a lot of unknowns about the test subjects:
Additionally, with only 10 subjects per vector, it’s very difficult to make any generalizations about the population.
Future studies would include a control group and greater sample size or many more sample sets.
This report uses the following libraries: dplyr
,
tidyr
, ggplot2
, kableExtra
,
moments
# boxplot of len, dose on x axis, split by supp
<- ToothGrowth %>%
p ggplot(aes(y=len, x=as.factor(dose), fill=supp)) +
geom_boxplot() +
ggtitle("Tooth Length by Supplement Type and Dosage") +
xlab("Dosage (mg/day)") +
ylab("Tooth Length (mm)") +
guides(fill=guide_legend(title="Supplement")) +
theme_minimal()
# create kable of stat summaries
<- ToothGrowth %>% group_by(supp, dose) %>%
tg_summary summarise(
count=n(),
min=min(len),
q25=quantile(len, 0.25),
median=median(len),
q75=quantile(len, 0.75),
max=max(len),
mean=round(mean(len),2),
sd=round(sd(len),2),
skew=round(skewness(len),2)) %>%
kbl() %>%
kable_styling(bootstrap_options = c("condensed"))
# histogram of len by supp with overlaid density curve
<- ToothGrowth %>%
gsupp ggplot(aes(x=len)) +
geom_histogram(
alpha = 0.75,
bins = 30,
aes(y = stat(density)),
position="identity",
fill="orange",
col="darkgrey"
+
) geom_line(
aes(y = ..density..),
colour = 'red',
stat = 'density',
size = 1.5,
alpha = 0.6
+
) facet_wrap(~supp) +
ggtitle ("Distribution of Teeth Length by Supplement Type") +
xlab("Teeth Length") +
ylab("Density") +
theme_minimal()
# histogram of len by dose with overlaid density curve
<- ToothGrowth %>%
gdose ggplot(aes(x=len)) +
geom_histogram(
alpha = 0.75,
bins = 30,
aes(y = stat(density)),
position="identity",
fill="orange",
col="darkgrey"
+
) geom_line(
aes(y = ..density..),
colour = 'red',
stat = 'density',
size = 1.5,
alpha = 0.6
+
) facet_wrap(~dose) +
ggtitle ("Distribution of Teeth Length by Dosage Level (mg/day)") +
xlab("Teeth Length") +
ylab("Density") +
theme_minimal()
len
for dose
and
supp
# get variation of len by dose and supp to test if vars are equal or not
<- ToothGrowth %>%
varDose group_by(dose) %>%
summarise(round(var(len),2))
<- ToothGrowth %>%
varSupp group_by(supp) %>%
summarise(round(var(len),2))
dose
# run 2 t-tests for each dose interval (0.5:1, 1:2)
<- t.test(
td1 ~ dose,
len paired = FALSE,
var.equal = FALSE,
data = subset(ToothGrowth, dose %in% c(0.5, 1))
)<- t.test(
td2 ~ dose,
len paired = FALSE,
var.equal = FALSE,
data = subset(ToothGrowth, dose %in% c(1, 2))
)
# create dataframe of useful stats
<- data.frame(
df_dose $p.value,
td1$conf.int[1],
td1$conf.int[2]
td1
)<- rbind(
df_dose c(td2$p.value, td2$conf.int[1], td2$conf.int[2])
df_dose,
)
rownames(df_dose) = c("0.5:1.0","1.0:2.0")
# create kable from summary
%>%
df_dose kbl(col.names = c("P-value", "CI Lower", "CI Upper")) %>%
kable_styling(
ootstrap_options = c("condensed"),
full_width = F,
position = "float_left"
)
supp
# run 4 t-tests for supp: all dosages, dosage=0.5, dosage=1, dosage=2
<- t.test(
tsupp ~ supp,
len paired = FALSE,
var.equal = FALSE,
data = ToothGrowth
)<- t.test(
tsuppd1 ~ supp,
len paired = FALSE,
var.equal = FALSE,
data = subset(ToothGrowth, dose == 0.5)
)<- t.test(
tsuppd2 ~ supp,
len paired = FALSE,
var.equal = FALSE,
data = subset(ToothGrowth, dose == 1)
)<- t.test(
tsuppd3 ~ supp,
len paired = FALSE,
var.equal = FALSE,
data = subset(ToothGrowth, dose == 2)
)
# create dataframe of useful stats
<- data.frame(
df_supp $p.value,
tsupp$conf.int[1],
tsupp$conf.int[2]
tsupp
)<- rbind(
df_supp
df_supp, c(tsuppd1$p.value, tsuppd1$conf.int[1], tsuppd1$conf.int[2])
)<- rbind(
df_supp
df_supp, c(tsuppd2$p.value, tsuppd2$conf.int[1], tsuppd2$conf.int[2])
)<- rbind(
df_supp
df_supp, c(tsuppd3$p.value, tsuppd3$conf.int[1], tsuppd3$conf.int[2])
)
rownames(df_supp) = c("All dosages", "Dosage 0.5", "Dosage 1.0", "Dosage 2.0")
# create kable from summary
%>%
df_supp kbl(col.names = c("P-value", "CI Lower", "CI Upper")) %>%
kable_styling(
bootstrap_options = c("condensed"),
full_width = F,
position = "float_left"
)