In this project we analyze the effect of vitamin C on tooth growth in guinea pigs. We examine whether the supplement used and the dose administered have an impact on growth. To do this we use several statistical techniques, mainly hypothesis tests.
The packages to be used for the project are:
library(datasets)
library(ggplot2)
library(dplyr)
library(viridis)
The “datasets” package will be used to obtain the data.
We list the contents of the package to find the official description of the data we use.
library(help = "datasets")
We will use the “ToothGrowth” data set. Its official description is: The Effect of Vitamin C on Tooth Growth in Guinea Pigs.
data(ToothGrowth)
We inspect the general structure of the data.
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
The first rows of the data frame are:
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
We obtain a statistical summary.
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
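The overall summary does not separate the groups we want to compare. As a minimal sketch (the name `cell_summary` is illustrative and not part of the original analysis), the mean, standard deviation, and sample size of each supplement/dose combination could be obtained with base R:

```r
library(datasets)

# Mean, standard deviation and count of tooth length for every
# supplement/dose combination. `cell_summary` is a hypothetical name.
cell_summary <- aggregate(
  len ~ supp + dose,
  data = ToothGrowth,
  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)
print(cell_summary)
```

Each of the six combinations should contain the same number of animals, which can be read from the `n` column.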
We will make two graphs to observe the behavior of the data and formulate hypotheses.
ggplot(aes(x=factor(dose), y=len), data=ToothGrowth) +
  geom_boxplot(aes(fill=supp), color = "#FAA3F4") +
  scale_fill_viridis(discrete = TRUE, alpha=0.6, option="A") +
  xlab("Dose") +
  ylab("Length")
ggplot(aes(x=factor(dose), y=len), data=ToothGrowth) +
  geom_violin(aes(fill=supp), color = "#FAA3F4") +
  scale_fill_viridis(discrete = TRUE, alpha=0.6, option="A") +
  xlab("Dose") +
  ylab("Length")
From the behavior shown in the graphs, we can form hypotheses about which supplement has the greater impact on length and about what happens to length when the dose is increased.
We will perform two sets of hypothesis tests: The first will focus on analyzing the two supplements and the second on the dose implemented.
The observations are assumed to be independent and identically distributed. We also assume that the variances of the subsets compared in each hypothesis are different, so every test is run with var.equal = FALSE, which is the Welch version of the two-sample Student's t-test. Finally, we check the normality condition that the t-test relies on.
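The unequal-variance assumption can be inspected informally by looking at the sample variance within each group. This is only a descriptive sketch, not a formal test, and the names `var_by_supp` and `var_by_dose` are illustrative:

```r
library(datasets)

# Sample variance of tooth length within each supplement group
var_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp, var)

# Sample variance of tooth length within each dose level
var_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, var)

print(var_by_supp)
print(var_by_dose)
```

Treating the variances as unequal is the conservative choice, since the Welch test remains valid when they happen to be similar.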
We will use the Shapiro-Wilk test to check normality, as it is a good indicator of normality for small data sets.
shapiro.test(ToothGrowth$len)
##
## Shapiro-Wilk normality test
##
## data: ToothGrowth$len
## W = 0.96743, p-value = 0.1091
The p-value (0.1091) is greater than 0.05, so we do not reject the null hypothesis of normality and we can assume that the data are approximately normally distributed.
ggplot(ToothGrowth,aes(len)) +
  geom_histogram(fill="#9F7EC4", color="#FAA3F4", binwidth=6) +
  xlab("Data length") +
  ggtitle("Data distribution")
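The normality check above uses all 60 lengths pooled together, while the t-tests compare subsets of the data. As a hedged sketch (not part of the original analysis, with illustrative variable names), the same Shapiro-Wilk test could be applied within each supplement and each dose level:

```r
library(datasets)

# Shapiro-Wilk p-value for tooth length within each supplement group
shapiro_p_by_supp <- sapply(
  split(ToothGrowth$len, ToothGrowth$supp),
  function(x) shapiro.test(x)$p.value
)

# Same check within each dose level
shapiro_p_by_dose <- sapply(
  split(ToothGrowth$len, ToothGrowth$dose),
  function(x) shapiro.test(x)$p.value
)

print(shapiro_p_by_supp)
print(shapiro_p_by_dose)
```

The per-group results may differ from the pooled result, so they are worth reading before relying on the t-test for a given comparison.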
The null hypothesis is that the two supplements have the same impact on length. The alternative hypothesis would be that the OJ supplement has a greater impact on the length of the teeth.
Supp_VC <- filter(ToothGrowth,supp == "VC")$len
Supp_OJ <- filter(ToothGrowth,supp == "OJ")$len
t.test(Supp_OJ, Supp_VC, alternative = "greater", paired = FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: Supp_OJ and Supp_VC
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.4682687 Inf
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
We reject the null hypothesis because the p-value (0.03032) is smaller than 0.05, and we accept the alternative hypothesis that the OJ supplement has a greater impact on length than the VC supplement.
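To make the reported numbers less of a black box, the Welch statistic, its degrees of freedom, and the one-sided p-value can be recomputed by hand. This is a minimal sketch using only base R; the variable names are illustrative:

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# One-sided p-value for the alternative "OJ mean is greater"
p_value <- pt(t_stat, df = df_welch, lower.tail = FALSE)

print(c(t = t_stat, df = df_welch, p = p_value))
```

The three values should agree with the t, df, and p-value printed by t.test above.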
We will do three hypothesis tests to see if increasing the dose decreases or increases the length of the teeth.
dose_05 <- filter(ToothGrowth,dose == 0.5)$len
dose_1 <- filter(ToothGrowth,dose == 1)$len
dose_2 <- filter(ToothGrowth,dose == 2)$len
The null hypothesis is that the dose of 0.5 and the dose of 1 have the same impact on length. The alternative hypothesis would be that the dose of 1 has a greater impact on the length of the teeth.
t.test(dose_1, dose_05, alternative = "greater", paired = FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: dose_1 and dose_05
## t = 6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 6.753323 Inf
## sample estimates:
## mean of x mean of y
## 19.735 10.605
We reject the null hypothesis because the p-value is less than 0.05, and we accept the alternative hypothesis that the dose of 1 has a greater impact than the dose of 0.5.
The null hypothesis is that the dose of 1 and the dose of 2 have the same impact on length. The alternative hypothesis would be that the dose of 2 has a greater impact on the length of the teeth.
t.test(dose_2, dose_1, alternative = "greater", paired = FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: dose_2 and dose_1
## t = 4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 4.17387 Inf
## sample estimates:
## mean of x mean of y
## 26.100 19.735
We reject the null hypothesis because the p-value is less than 0.05, and we accept the alternative hypothesis that the dose of 2 has a greater impact than the dose of 1.
The null hypothesis is that the dose of 0.5 and the dose of 2 have the same impact on length. The alternative hypothesis would be that the dose of 2 has a greater impact on the length of the teeth.
t.test(dose_2, dose_05, alternative = "greater", paired = FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: dose_2 and dose_05
## t = 11.799, df = 36.883, p-value = 2.199e-14
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 13.27926 Inf
## sample estimates:
## mean of x mean of y
## 26.100 10.605
We reject the null hypothesis because the p-value is less than 0.05, and we accept the alternative hypothesis that the dose of 2 has a greater impact than the dose of 0.5.
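The three dose comparisons above follow the same pattern, so they can also be run in one pass. The sketch below is an assumption-laden convenience, not the original method: the names `lens_by_dose`, `dose_pairs`, `p_values`, and `p_holm` are illustrative, and the Holm adjustment is an optional extra step that accounts for running three tests on the same data.

```r
library(datasets)

# Tooth lengths split by dose; list names are "0.5", "1" and "2"
lens_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)

# Each pair is (higher dose, lower dose), matching the one-sided alternative
dose_pairs <- list(c("1", "0.5"), c("2", "1"), c("2", "0.5"))

# One-sided Welch t-test p-value for every pair
p_values <- sapply(dose_pairs, function(p) {
  t.test(lens_by_dose[[p[1]]], lens_by_dose[[p[2]]],
         alternative = "greater", var.equal = FALSE)$p.value
})
names(p_values) <- sapply(dose_pairs, paste, collapse = " > ")

# Optional: Holm adjustment for the three comparisons
p_holm <- p.adjust(p_values, method = "holm")

print(cbind(raw = p_values, holm = p_holm))
```

The raw p-values should match those reported by the individual tests, and the adjusted column shows whether the conclusions survive the correction.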
We conclude that both the supplement and the administered dose affect the growth of guinea pig teeth. For the supplements, OJ is associated with greater growth than VC. For the dose, each increase (from 0.5 to 1 and from 1 to 2) is associated with greater growth.