The purpose of this data analysis is to analyze the ToothGrowth data set by comparing guinea pig tooth growth by supplement and dose. First, I will do an exploratory analysis of the data set. Then I will compare the groups using confidence intervals and p-values in order to draw conclusions about tooth growth.
library(ggplot2)
library(datasets)
library(gridExtra)
Here is a basic summary of the data in the form of descriptive statistics.
my_data <- ToothGrowth
summary(my_data)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
str(my_data)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
my_data$dose <- as.factor(my_data$dose)
my_data$avg <- ifelse(my_data$len > mean(my_data$len), "Above", "Below")
my_data$avg <- as.factor(my_data$avg)
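Before comparing groups it helps to confirm how the observations are spread over the supplement and dose combinations. The following is a minimal sketch that works on a fresh copy of the data; the names `tg` and `design` are illustrative and not part of the original analysis.

```r
# Work on a fresh copy so the objects used elsewhere in the report are untouched.
tg <- datasets::ToothGrowth

# Cross-tabulate supplement type against dose to see the group sizes.
design <- table(tg$supp, tg$dose)
print(design)
```

Equal cell counts would mean that the comparisons by supplement and by dose are not distorted by unequal group sizes.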
Below we have the only numerical variable in the data set, Tooth Length (dose has been converted to a factor above). We can see that the histogram does not follow the normal curve closely, although the mean and median are very close to each other. Supplement Type does not seem to have a big impact on tooth length, whereas dose does: a dose of 2 tends to occur with high tooth lengths and a dose of 0.5 with low tooth lengths.
a <- ggplot(my_data,aes(x=len))+
geom_histogram(aes(y = after_stat(density)), binwidth=1, colour = "blue", fill = "darkgrey")+ # after_stat() replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
geom_density(colour = "blue")+
geom_vline(aes(xintercept = mean(my_data$len)), linetype = "longdash", color = "blue")+
geom_vline(aes(xintercept = median(my_data$len)), linetype = "longdash", color = "red")+
stat_function(fun=dnorm, args=list(mean=mean(my_data$len), sd=sd(my_data$len)))+
xlab("")+
ggtitle("Distribution of Tooth Length")+
theme(axis.line = element_line(), axis.text=element_text(color='black'), axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text(), legend.key = element_rect(colour = "black"))
b <- ggplot(my_data,aes(x=len))+
geom_histogram(aes(y = after_stat(count), fill = supp), colour = "black", binwidth=1)+ # after_stat() replaces the deprecated ..count.. notation
xlab("")+
theme(axis.line = element_line(), axis.text=element_text(color='black'), axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text(), legend.key = element_rect(colour = "black"))
c <- ggplot(my_data,aes(x=len))+
geom_histogram(aes(y = after_stat(count), fill = dose), colour = "black", binwidth=1)+ # after_stat() replaces the deprecated ..count.. notation
xlab("")+
theme(axis.line = element_line(), axis.text=element_text(color='black'), axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text(), legend.key = element_rect(colour = "black"))
grid.arrange(a, heights = c(2.5/4, 1.5/4), ncol = 1, arrangeGrob(b, c, ncol = 2))
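The visual impression of the distribution can be complemented with a formal check. The following is a minimal sketch, not part of the original analysis, that applies the Shapiro-Wilk test from base R to the tooth lengths; `tg` and `sw` are illustrative names.

```r
# Fresh copy of the data; `tg` is an illustrative name.
tg <- datasets::ToothGrowth

# Shapiro-Wilk test: the null hypothesis is that `len` comes from a normal distribution.
sw <- shapiro.test(tg$len)
print(sw)
```

With only 60 observations the test has limited power, so its p-value is best read alongside the histogram rather than instead of it.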
Below we can see box plots of Tooth Length by Supplement Type and by Dose.
a <- ggplot(my_data, aes(supp, len)) + # refer to columns by name inside aes(), not via my_data$
ylab("Tooth Length") +
xlab("Supplement Type") +
ggtitle("Tooth Length by Supplement Type") +
stat_boxplot(geom ='errorbar')+
geom_boxplot(outlier.shape = NA) +
geom_jitter(aes(colour = len), position = position_jitter(width = 0.2)) +
scale_colour_gradient2(low = "blue", mid = "red",high = "black", midpoint = mean(my_data$len), name="Tooth Length")+
theme(axis.line = element_line(), axis.text=element_text(color='black'), axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text(), legend.key = element_rect(colour = "black"))
b <- ggplot(my_data, aes(dose, len)) + # refer to columns by name inside aes(), not via my_data$
ylab("Tooth Length") +
xlab("Dose in Milligrams") +
ggtitle("Tooth Length by Dose in Milligrams") +
stat_boxplot(geom ='errorbar')+
geom_boxplot(outlier.shape = NA) +
geom_jitter(aes(colour = len), position = position_jitter(width = 0.2)) +
scale_colour_gradient2(low = "blue", mid = "red",high = "black", midpoint = mean(my_data$len), name="Tooth Length")+
theme(axis.line = element_line(), axis.text=element_text(color='black'), axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text(), legend.key = element_rect(colour = "black"))
grid.arrange(a, b, ncol = 2)
In line with what we saw before, the difference between Supplement Types is not statistically significant at the 5% level: with a p-value of 0.06, the null hypothesis of equal mean tooth length cannot be rejected. The pairwise differences between doses are all statistically significant, with Holm-adjusted p-values far below 0.05.
pairwise.t.test(my_data$len, my_data$supp)
##
## Pairwise comparisons using t tests with pooled SD
##
## data: my_data$len and my_data$supp
##
## OJ
## VC 0.06
##
## P value adjustment method: holm
pairwise.t.test(my_data$len, my_data$dose)
##
## Pairwise comparisons using t tests with pooled SD
##
## data: my_data$len and my_data$dose
##
## 0.5 1
## 1 1.3e-08 -
## 2 4.4e-16 1.4e-05
##
## P value adjustment method: holm
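The pairwise tests report p-values only. To put confidence intervals next to them, one option is `t.test()`, sketched here under the assumption that Welch's unpooled test is acceptable. Because the output above uses a pooled SD, the figures may differ slightly. The names `tg`, `supp_test`, `dose_pairs` and `dose_ci` are illustrative.

```r
# Fresh copy of the data with dose as a factor.
tg <- datasets::ToothGrowth
tg$dose <- factor(tg$dose)

# 95% confidence interval for the difference in mean length, OJ minus VC.
# t.test() uses Welch's unequal-variance test by default (var.equal = FALSE).
supp_test <- t.test(len ~ supp, data = tg)
print(supp_test$conf.int)

# Confidence intervals for each pair of dose levels (lower dose minus higher dose).
dose_pairs <- combn(levels(tg$dose), 2, simplify = FALSE)
dose_ci <- lapply(dose_pairs, function(p) {
  pair_data <- droplevels(tg[tg$dose %in% p, ])
  t.test(len ~ dose, data = pair_data)$conf.int
})
names(dose_ci) <- sapply(dose_pairs, paste, collapse = " vs ")
print(dose_ci)
```

An interval that contains zero corresponds to a difference that is not significant at the 5% level, while an interval entirely on one side of zero corresponds to a significant one.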
Below we can see clear differences between doses. Within each dose, Delivery Type could still carry information.
ggplot(my_data, aes(x=supp, y=len)) +
stat_boxplot(geom ='errorbar')+
geom_boxplot(aes(fill=supp), outlier.shape = NA) +
geom_jitter(aes(colour=len), position = position_jitter(width = 0.2)) +
scale_colour_gradient2(low = "white", mid = "gray", high = "black", midpoint = mean(my_data$len), name="Tooth Length")+
xlab("Supplement Type") +
ylab("Tooth Length") +
facet_grid(~ dose) +
ggtitle("Tooth Length vs. Delivery Type by Dose Amount") +
theme(axis.line = element_line(), axis.text=element_text(color='black'), axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text(), legend.key = element_rect(colour = "black"))
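Whether Delivery Type carries information within a dose can also be checked numerically. The following is a minimal sketch, not part of the original analysis, that runs a separate Welch t-test of length by supplement at each dose level; `tg` and `by_dose` are illustrative names, and the p-values are not adjusted for multiple comparisons.

```r
# Fresh copy of the data; dose is still numeric here and is used only for splitting.
tg <- datasets::ToothGrowth

# One t-test of len by supp per dose level, keeping only the p-value.
by_dose <- sapply(split(tg, tg$dose), function(d) {
  t.test(len ~ supp, data = d)$p.value
})
print(round(by_dose, 4))
```

If an adjustment is wanted, `p.adjust(by_dose, method = "holm")` would apply the same correction used by `pairwise.t.test()`.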
Below we can see Tooth Length compared to Dose Amount by Delivery Type.
ggplot(my_data, aes(x=dose, y=len)) +
stat_boxplot(geom ='errorbar')+
geom_boxplot(aes(fill=dose), outlier.shape = NA) +
geom_jitter(aes(colour=len), position = position_jitter(width = 0.2)) +
scale_colour_gradient2(low = "white", mid = "gray", high = "black", midpoint = mean(my_data$len), name="Tooth Length")+
xlab("Dose in Milligrams") + # the x axis shows dose in this plot, not supplement type
ylab("Tooth Length") +
facet_grid(~ supp) +
ggtitle("Tooth Length vs. Dose Amount by Delivery Type") +
theme(axis.line = element_line(), axis.text=element_text(color='black'), axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text(), legend.key = element_rect(colour = "black"))
Next I’d like to compare the Tooth Length values that are above the mean with those that are below it.
Here we can see that Supplement Type does seem to be a differentiator for whether Tooth Length is above or below the mean. Dose in Milligrams is clearly a good predictor of whether the length is above or below the mean.
a <- ggplot(my_data, aes(supp)) +
geom_bar(aes(fill = avg), position = "fill", colour="black") +
geom_hline(aes(yintercept = prop.table(table(my_data$avg))["Above"]),linetype = "dashed") +
xlab("Supplement Type") +
ylab ("Proportion") +
ggtitle("Proportion of Above\nand Below Average Tooth Length\nby Supplement Type")+
theme(axis.line = element_line(), axis.text=element_text(color='black'), axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text())
b <- ggplot(my_data, aes(dose)) +
geom_bar(aes(fill = avg), position = "fill", colour="black") +
geom_hline(aes(yintercept = prop.table(table(my_data$avg))["Above"]),linetype = "dashed") +
xlab("Dose in Milligrams") +
ylab ("Proportion") +
ggtitle("Proportion of Above\nand Below Average Tooth Length\nby Dose in Milligrams")+
theme(axis.line = element_line(), axis.text=element_text(color='black'), axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text())
grid.arrange(a, b, ncol = 2)
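The proportions shown in the bar charts can also be tabulated directly. The following is a minimal sketch that rebuilds the above/below indicator on a fresh copy of the data; `tg`, `prop_supp` and `prop_dose` are illustrative names.

```r
# Fresh copy of the data with the same above/below-mean indicator as in the report.
tg <- datasets::ToothGrowth
tg$avg <- factor(ifelse(tg$len > mean(tg$len), "Above", "Below"))

# Row-wise proportions: the share of Above and Below within each group.
prop_supp <- prop.table(table(tg$supp, tg$avg), margin = 1)
prop_dose <- prop.table(table(tg$dose, tg$avg), margin = 1)

print(round(prop_supp, 2))
print(round(prop_dose, 2))
```

Each row sums to 1, so the "Above" column can be compared directly with the dashed reference line in the plots.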
Dose is also a good predictor of tooth length within the above-average and below-average groups. Supplement Type is not a good predictor within those groups.
a <- ggplot(my_data,aes(dose, len)) +
stat_boxplot(geom ='errorbar')+
geom_boxplot(aes(colour=avg)) +
facet_grid(~avg) +
xlab("") +
ylab ("Tooth Length") +
ggtitle("Above and Below Average\nTooth Length by Dose in Milligrams")+
theme(axis.line = element_line(), axis.text=element_text(color='black'), axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text(), legend.key = element_rect(colour = "black"))
b <- ggplot(my_data,aes(supp, len)) +
stat_boxplot(geom ='errorbar')+
geom_boxplot(aes(colour=avg)) +
facet_grid(~avg) +
xlab("") +
ylab ("Tooth Length") +
ggtitle("Above and Below Average\nTooth Length by Supplement Type")+
theme(axis.line = element_line(), axis.text=element_text(color='black'), axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text(), legend.key = element_rect(colour = "black"))
grid.arrange(a, b, ncol = 2)
Digging a bit deeper, we cannot see a clear difference when controlling for both Supplement Type and Dose in Milligrams.
ggplot(my_data,aes(avg, len)) +
stat_boxplot(geom ='errorbar')+
geom_boxplot(aes(colour=avg)) +
facet_grid(dose~supp) +
xlab("") +
ylab ("Tooth Length") +
ggtitle("Above and Below Average Tooth Length\nby Supplement Type and Dose in Milligrams")+
theme(axis.line = element_line(), axis.text=element_text(color='black'), axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text(), legend.key = element_rect(colour = "black"))
Supplement type seems to have little to no impact on tooth growth.
Increasing the dose level leads to increased tooth growth.
These conclusions assume that the members of the sample, i.e. the 60 observations, are representative of the entire population. If this assumption is true, then we can generalize the results.