The study question was to examine the data for a correlation between the length of certain tooth cells in guinea pigs and two types of supplements that administered to them.
We found a positive relationship between supplementation and tooth growth. We also found that the type of supplement does make a difference (OJ is superior) at 0.5 and 1mg doses, but not at 2mg.
We will mostly rely on the “tidyverse” family of packages for data manipulation.
# Load Packages and get the Data
# Install pacakages
check_and_install_package <- function(package_name){
if(!package_name %in% installed.packages()){
install.packages(package_name)
}
}
check_and_load_package <- function(package_name){
if(!package_name %in% (.packages())){
library(package_name, character.only = TRUE)
}
}
check_and_install_package("tidyverse")
check_and_load_package("tidyverse")
check_and_install_package("GGally")
check_and_load_package("GGally")
check_and_install_package("rstatix")
check_and_load_package("rstatix")
check_and_install_package("ggpubr")
check_and_load_package("ggpubr")
(.packages())
Standrard R description of the data:
The Effect of Vitamin C on Tooth Growth in Guinea Pigs
The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).
tooth_raw <- ToothGrowth
summary(tooth_raw)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
str(tooth_raw)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
We note we only have 60 observations balanced accross both groups. That means only 10 observations per category (i.e. supplement and dose combination). No missing values. For a quick exploration we run the ggpairs() function from the GGally package.
ggpairs(tooth_raw
,ggplot2::aes(colour=supp, alpha = 0.8)#,
#upper = list(continuous = "density", combo = "box_no_facet")
)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
We note the following from the plots above: 1. The OJ supplements seemed
to have resulted in a distribution with higher quantities of longer
cells, 2. The median length is also a bit larger than that produced by
the VC, but since there is a fair amount of overlap of the interquartile
range, we’ll have to rely on our confidence intervals and hypothesis
testing to determine if the difference is meaningful. 3. It seems that
with both supplements there was a bit of diminishing returns in the
sense that doubling the dose from 1.0 to 2.0mg didn’t have as profound
an effect as going from 0.5 to 1. This trend was more intense for the
OJ, hence it has a slightly lower linear correlation of length to
dose.
We draw a heatmap of the data to get a bit of a feel for how they compare, and also look at the means broken down by supplement and dose
data_heat <- tooth_raw %>%
mutate(item = rep(1:10, 6)) %>%
pivot_wider(names_from = c(supp, dose), values_from = len) %>%
select(-item)%>%
sapply(as.numeric)
heatmap(as.matrix(data_heat))
data_means <- tooth_raw %>%
pivot_wider(names_from = c(supp, dose), values_from = len, values_fn = mean)
data_var <- tooth_raw %>%
pivot_wider(names_from = c(supp, dose), values_from = len, values_fn = var)
data_plus <- rbind(data_means, data_var) %>%
mutate(Stat = c("Means", "Variance"))
data_plus
We see that OJ seems significantly better than VC at the 0.5 and 1.0mg doses but at a much larger variance. At 2.0mg the two types of supplements perform very similarly on average, but the VC this time has a much larger variance.
Null hypothesis: There is no difference in in mean cell length as you vary the dose of OJ. I.e. mu_1 - mu_0.5 = 0 Alternate Hypothesis: The cells are longer for a higher dose. I.e. mu_1 - mu_0.5 > 0
data <- tooth_raw %>%
mutate(item = rep(1:10, 6)) %>%
filter(supp == "OJ") %>%
pivot_wider(names_from = dose, values_from = len) %>%
select(-item)
t.test(data$`1`-data$`0.5`, alternative = "greater")
##
## One Sample t-test
##
## data: data$`1` - data$`0.5`
## t = 4.1635, df = 9, p-value = 0.001218
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
## 5.300497 Inf
## sample estimates:
## mean of x
## 9.47
This shows that we will reject the null hypothesis. Similar for the rest of the dose comparisons for both OJ and VC.
Let’s also check for OJ comparing 1 and 2mg doses:
t.test(data$`2`-data$`1`, alternative = "greater")
##
## One Sample t-test
##
## data: data$`2` - data$`1`
## t = 1.9435, df = 9, p-value = 0.04192
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
## 0.1908169 Inf
## sample estimates:
## mean of x
## 3.36
Notice the p-value is almost 5%, but we’ll still reject the null hypothesis.
Now we compare supplements:
Null hypothesis: There is no difference in in mean cell length outcome depending on whether OJ or VC is given for a specific dose. I.e. mu_OJ_1.0 - mu_VC_1.0 = 0 Alternate Hypothesis: The cells are longer for a higher dose. I.e. mu_OJ_1.0 - mu_VC_1.0 > 0
data2 <- tooth_raw %>%
mutate(item = rep(1:10, 6)) %>%
filter(dose == 1.0) %>%
pivot_wider(names_from = supp, values_from = len) %>%
select(-item)
t.test(data2$OJ-data2$VC, alternative = "greater")
##
## One Sample t-test
##
## data: data2$OJ - data2$VC
## t = 3.3721, df = 9, p-value = 0.004115
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
## 2.706401 Inf
## sample estimates:
## mean of x
## 5.93
We see that there is a significant difference between supplements at 1.0 mg dose.
Let’s look at 2 mg:
Null hypothesis: There is no difference in in mean cell length outcome depending on whether OJ or VC is given for a specific dose. I.e. mu_OJ_2.0 - mu_VC_2.0 = 0 Alternate Hypothesis: The cells are longer for a higher dose. I.e. mu_OJ_2.0 - mu_VC_2.0 != 0
data3 <- tooth_raw %>%
mutate(item = rep(1:10, 6)) %>%
filter(dose == 2.0) %>%
pivot_wider(names_from = supp, values_from = len) %>%
select(-item)
t.test(data3$OJ-data3$VC, alternative = "two.sided")
##
## One Sample t-test
##
## data: data3$OJ - data3$VC
## t = -0.042592, df = 9, p-value = 0.967
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -4.328976 4.168976
## sample estimates:
## mean of x
## -0.08
The p=value is very large in this instance, so we cannot reject the null hypothesis, ie. that the type of supplement makes no difference at this dose.
We conclude that dose of supplement does have a positive outcome on tooth cell length. We also conclude that type of supplement does make difference (OJ is superior) at 0.5 and 1mg doses, but not at 2mg.
Assumptions were as follows: I leaned on this (datanovia page)[https://www.datanovia.com/en/lessons/t-test-assumptions/independent-t-test-assumptions/].
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. - O.K.
No significant outliers in the two groups - O.K. no extreme outliers
outliers <- tooth_raw %>%
group_by(supp, dose) %>%
identify_outliers(len)
outliers
normality <- tooth_raw %>%
group_by(supp, dose) %>%
shapiro_test(len)
normality
You can also create QQ plots for each group. QQ plot draws the correlation between a given data and the normal distribution.
#qq_data <- tooth_raw %>%
# pivot_longer(names_to = len, values_to = c(dose, supp))
#qq_data
ggqqplot(tooth_raw, x = "len", facet.by = c("supp", "dose"))
## Warning: The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
All the points fall approximately along the (45-degree) reference line,
for each group. So we can assume normality of the data.
tooth_raw %>% levene_test(len ~ supp)
The p-value of the Levene’s test is not significant, suggesting that there is no significant difference between the variances of the two groups. Therefore we can use the standard t-test.
coplot(len ~ dose | supp, data = tooth_raw, panel = panel.smooth,xlab = "Comparing Tooth Growth
between different supplements and different dosis", col=c("royalblue4"), pch=21)
h1 <- t.test(len ~ supp, data = ToothGrowth)
h1
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
h2 <- t.test(len ~ supp, data = subset(ToothGrowth, dose == 0.5))
h2
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98