In the second portion of the class, we analyze the ToothGrowth data from the R datasets package.
The R help file describes the data as follows:
The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid).
The data frame has 60 observations and 3 variables (len, supp, and dose). Each supplement group has 30 observations.
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
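The summary above can be reproduced with base R alone. The short sketch below is not part of the report's appendix; it also cross-tabulates supplement against dose (the `design` name is only illustrative) to check that the experiment is balanced, with 10 animals in each supplement and dose combination.

```r
# Load the built-in dataset (ships with base R's datasets package)
data("ToothGrowth")

# Number of observations (rows) and variables (columns)
dim(ToothGrowth)

# Per-variable summary, as printed above
summary(ToothGrowth)

# Cross-tabulate supplement by dose to check that the design is balanced
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)
```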
After creating a pairs plot of all the variables and a faceted plot showing tooth growth per supplement and dose combination, it is clear that dose is the strongest influence on tooth growth: dose and tooth length have a correlation of 0.803. Supplement OJ is associated with somewhat more growth than VC at the lower doses, but there is no important difference at the highest dose. Within the OJ group, the correlation between dose and tooth growth is 0.75; within the VC group it is 0.899.
The plots can be found in the Appendix.
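The correlations quoted above can be checked numerically without drawing the pairs plot. The sketch below assumes that the per-supplement figures refer to the correlation between dose and tooth length within each supplement group; the variable names `overall_cor` and `by_supp` are illustrative.

```r
data("ToothGrowth")

# Overall Pearson correlation between dose and tooth length
overall_cor <- cor(ToothGrowth$dose, ToothGrowth$len)

# Correlation between dose and tooth length within each supplement group
by_supp <- sapply(
  split(ToothGrowth, ToothGrowth$supp),
  function(d) cor(d$dose, d$len)
)

print(round(overall_cor, 3))
print(round(by_supp, 3))
```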
##
## Call:
## lm(formula = len ~ dose + supp, data = tooth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.600 -3.700 0.373 2.116 8.800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.2725 1.2824 7.231 1.31e-09 ***
## dose 9.7636 0.8768 11.135 6.31e-16 ***
## suppVC -3.7000 1.0936 -3.383 0.0013 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.236 on 57 degrees of freedom
## Multiple R-squared: 0.7038, Adjusted R-squared: 0.6934
## F-statistic: 67.72 on 2 and 57 DF, p-value: 8.716e-16
From the above results, the intercept of 9.2725 is the estimated average tooth length for the OJ group (the reference level of supp) at a dose of 0. No animal received a zero dose, so this value is an extrapolation rather than an observed "no supplement" baseline.
The coefficient of dose is 9.7636. This indicates that the estimated tooth length increases by 9.7636 units for each one-unit (1 mg) increase in dose, assuming that the supplement type does not change.
The coefficient of suppVC is -3.7. This means that, at the same dose, the estimated tooth length is 3.7 units lower with VC than with OJ.
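One way to make these coefficients concrete is to ask the fitted model for predictions. The sketch below refits the same additive model on the raw data and predicts tooth length for a hypothetical grid of dose and supplement combinations (`fit`, `new_data`, and `gap` are illustrative names, not objects from the appendix). Because the model has no interaction term, the VC prediction sits below the OJ prediction by the same amount at every dose.

```r
data("ToothGrowth")

# Same additive model as in the report: numeric dose plus supplement factor
fit <- lm(len ~ dose + supp, data = ToothGrowth)

# Hypothetical grid of dose/supplement combinations to predict for
new_data <- expand.grid(
  dose = c(0.5, 1, 2),
  supp = factor(c("OJ", "VC"), levels = levels(ToothGrowth$supp))
)

# Predicted mean tooth length for each combination
new_data$predicted <- predict(fit, newdata = new_data)
print(new_data)

# At each dose, VC prediction minus OJ prediction equals the suppVC coefficient
gap <- with(new_data, predicted[supp == "VC"] - predicted[supp == "OJ"])
print(gap)
```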
Running a Welch two-sample t-test of tooth growth by supplement type gives the following results:
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
With p = 0.0606 and a 95 percent confidence interval that includes 0, we cannot reject, at the 5% level, the null hypothesis that mean tooth growth is the same for both supplement types. Let’s look at the effect of supplement type at each dose separately.
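The per-dose comparison can be written compactly by splitting the data by dose and running the same Welch test on each piece. This is a sketch of one way to do it, not the appendix code, which uses three explicit subsets; `per_dose` and `results` are illustrative names.

```r
data("ToothGrowth")

# Welch two-sample t-test of len by supp within each dose level
per_dose <- lapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  t.test(len ~ supp, data = d, var.equal = FALSE)
})

# Collect the p-value and the two group means for each dose (one row per dose)
results <- t(sapply(per_dose, function(tt) {
  c(p_value = tt$p.value,
    mean_OJ = unname(tt$estimate[1]),
    mean_VC = unname(tt$estimate[2]))
}))
print(round(results, 4))
```

Each row of `results` corresponds to one dose, so the values can be compared directly with the per-dose output that follows.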
Dose of 0.5:
## [1] 0.006358607
## mean in group OJ mean in group VC
## 13.23 7.98
Dose of 1.0:
## [1] 0.001038376
## mean in group OJ mean in group VC
## 22.70 16.77
Dose of 2.0:
## [1] 0.9638516
## mean in group OJ mean in group VC
## 26.06 26.14
The t-tests at doses of 0.5 and 1.0 give low p-values (0.0064 and 0.0010). At these doses the difference in mean tooth growth between the supplement types is statistically significant, with OJ showing more growth (13.23 vs 7.98 and 22.70 vs 16.77).
At the 2.0 dose the p-value is 0.964 and the group means are nearly identical (26.06 vs 26.14), so supplement type makes no detectable difference at the highest dose. This can also be seen in the exploratory plots in the appendix.
# Libraries and options
library(datasets)
library(knitr)
library(dplyr)
library(ggplot2)
library(GGally)
# Load the data
data("ToothGrowth")
# as_tibble() replaces the deprecated tbl_df()
tooth <- as_tibble(ToothGrowth)
# Pairsplot
# Colour by supplement; ggpairs expects aesthetics via `mapping`, not `colour =`
plot1 <- ggpairs(data = tooth,
                 mapping = aes(colour = supp),
                 title = 'Pairplot of All Variables from the Tooth Growth Dataset',
                 upper = list(continuous = 'points'),
                 lower = list(continuous = 'cor'),
                 axisLabels = 'show')
# Compare supplements on growth
# One panel per supplement, dose on the x axis
plot2 <- ggplot(tooth, aes(dose, len)) +
  geom_point() +
  facet_grid(. ~ supp) +
  labs(title = 'Tooth Growth by Supplement and Dose Combination',
       x = 'Dose (mg)',
       y = 'Tooth length')
# Fit a linear Model
toothModel <- lm(len ~ dose + supp, data = tooth)
# T-Tests
# Welch (unequal-variance) two-sample tests; the groups are unpaired, which is
# the default, so `paired` is not passed to the formula interface
tSupp <- t.test(len ~ supp, data = tooth, var.equal = FALSE)
# Repeat the comparison within each dose level
dose0.5 <- subset(tooth, dose == 0.5)
dose1.0 <- subset(tooth, dose == 1.0)
dose2.0 <- subset(tooth, dose == 2.0)
tTest0.5 <- t.test(len ~ supp, data = dose0.5, var.equal = FALSE)
tTest1.0 <- t.test(len ~ supp, data = dose1.0, var.equal = FALSE)
tTest2.0 <- t.test(len ~ supp, data = dose2.0, var.equal = FALSE)