Synopsis

This project analyzes the effect of vitamin C on tooth growth in guinea pigs. We examine whether the supplement type and the dose administered affect tooth length, using exploratory plots and a series of hypothesis tests.

Development

Packages

The packages to be used for the project are:

library(datasets)
library(ggplot2)
library(dplyr)
library(viridis)

Exploratory analysis

The “datasets” package will be used to obtain the data.

library(datasets)

We list the package index, which gives the official short description of each data set it contains.

library(help = "datasets")

We will use the “ToothGrowth” data set, which is described as: The Effect of Vitamin C on Tooth Growth in Guinea Pigs.

data(ToothGrowth)

We inspect the structure of the data.

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

The first rows of the data frame:

head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

Summary of data

We obtain a statistical summary.

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
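The summary reports dose as a continuous variable, although the experiment uses only a few distinct dose levels. As a minimal sketch of how to check the design, the cross-tabulation below counts the observations in each supplement/dose combination. The names `cell_counts` and `dose_levels` are illustrative and not part of the original analysis.

```r
library(datasets)

# Count observations for each supplement/dose combination.
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)

# Distinct dose levels present in the data.
dose_levels <- sort(unique(ToothGrowth$dose))
print(dose_levels)
```

A balanced table supports comparing group means directly, since no group dominates the sample.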

We will make two graphs to observe the behavior of the data and formulate hypotheses.

ggplot(aes(x=factor(dose), y=len), data=ToothGrowth) + 
        geom_boxplot(aes(fill=supp), color = "#FAA3F4") +
        scale_fill_viridis(discrete = TRUE, alpha=0.6, option="A") +
        xlab("Dose") +
        ylab("Length")

ggplot(aes(x=factor(dose), y=len), data=ToothGrowth) + 
        geom_violin(aes(fill=supp), color = "#FAA3F4") +
        scale_fill_viridis(discrete = TRUE, alpha=0.6, option="A") +
        xlab("Dose") +
        ylab("Length")

Based on the graphs, we can form hypotheses about which supplement has the greater impact on tooth length and about the effect of increasing the dose.
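The visual impression from the plots can be backed with numbers. The sketch below, which uses base R's `aggregate` rather than the dplyr functions loaded above, computes the mean length for each supplement/dose combination. The name `group_means` is an illustrative choice.

```r
library(datasets)

# Mean tooth length for each supplement/dose combination.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Order rows by supplement, then dose, for easier reading.
group_means <- group_means[order(group_means$supp, group_means$dose), ]
print(group_means)
```

Reading the table row by row shows how the mean changes with dose within each supplement, which is the pattern the hypothesis tests examine.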

Hypothesis testing

We will perform two sets of hypothesis tests: the first compares the two supplements and the second compares the doses administered.

  • Assumptions

The observations are assumed to be independent and identically distributed. The variances of the subsets compared in each test are assumed to be unequal, which is why every test below uses var.equal = FALSE (Welch’s version of the t test). Finally, the normality condition is checked before applying Student’s t test.

We will use the Shapiro-Wilk test to check normality, as it is well suited to small data sets.

shapiro.test(ToothGrowth$len)
## 
##  Shapiro-Wilk normality test
## 
## data:  ToothGrowth$len
## W = 0.96743, p-value = 0.1091

The p-value (0.1091) is greater than 0.05, so we do not reject the null hypothesis of normality and proceed on the assumption that the data are approximately normally distributed.

ggplot(ToothGrowth,aes(len)) + 
        geom_histogram(fill="#9F7EC4", color="#FAA3F4", binwidth=6) +
        xlab("Data length") + 
        ggtitle("Data distribution") 
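The test above pools all 60 observations, whereas the t tests that follow compare subsets. As a hedged complement, and not part of the original analysis, the sketch below applies the same Shapiro-Wilk test within each supplement group and each dose level. The names `p_by_supp` and `p_by_dose` are illustrative.

```r
library(datasets)

# Shapiro-Wilk p-value for tooth length within each supplement group.
p_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp,
                    function(x) shapiro.test(x)$p.value)

# Same check within each dose level.
p_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose,
                    function(x) shapiro.test(x)$p.value)

print(round(p_by_supp, 4))
print(round(p_by_dose, 4))
```

The per-group results can differ from the pooled one, so they are worth reading before relying on the normality assumption for a given comparison.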

  • Supplement hypothesis testing

The null hypothesis is that the two supplements have the same impact on length. The alternative hypothesis is that the OJ supplement has a greater impact on the length of the teeth.

Supp_VC <- filter(ToothGrowth,supp == "VC")$len
Supp_OJ <- filter(ToothGrowth,supp == "OJ")$len

t.test(Supp_OJ, Supp_VC, alternative = "greater", paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  Supp_OJ and Supp_VC
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.4682687       Inf
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

We reject the null hypothesis because the p-value (0.03032) is smaller than 0.05, so we favor the alternative hypothesis that the OJ supplement has a greater impact than the VC supplement.
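To make explicit what the Welch test computes, the sketch below rebuilds the statistic, the Welch-Satterthwaite degrees of freedom, and the one-sided p-value by hand from the group means and variances. The variable names are illustrative, and the results should agree with the values reported by t.test for the same comparison.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Variance of each sample mean: s^2 / n.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over its standard error.
t_manual <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_manual <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# One-sided p-value for the alternative "OJ mean is greater".
p_manual <- pt(t_manual, df = df_manual, lower.tail = FALSE)

print(c(t = t_manual, df = df_manual, p = p_manual))
```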

  • Doses hypothesis testing

We will run three one-sided hypothesis tests, one for each pair of doses, to see whether a higher dose increases the length of the teeth.

dose_05 <- filter(ToothGrowth,dose == 0.5)$len
dose_1 <- filter(ToothGrowth,dose == 1)$len
dose_2 <- filter(ToothGrowth,dose == 2)$len

  • Dose 0.5 vs Dose 1

The null hypothesis is that the dose of 0.5 and the dose of 1 have the same impact on length. The alternative hypothesis would be that the dose of 1 has a greater impact on the length of the teeth.

t.test(dose_1, dose_05, alternative = "greater", paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  dose_1 and dose_05
## t = 6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  6.753323      Inf
## sample estimates:
## mean of x mean of y 
##    19.735    10.605

We reject the null hypothesis because the p-value is less than 0.05, so we favor the alternative hypothesis that the dose of 1 has a greater impact than the dose of 0.5.

  • Dose 1 vs Dose 2

The null hypothesis is that the dose of 1 and the dose of 2 have the same impact on length. The alternative hypothesis would be that the dose of 2 has a greater impact on the length of the teeth.

t.test(dose_2, dose_1, alternative = "greater", paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  dose_2 and dose_1
## t = 4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  4.17387     Inf
## sample estimates:
## mean of x mean of y 
##    26.100    19.735

We reject the null hypothesis because the p-value is less than 0.05, so we favor the alternative hypothesis that the dose of 2 has a greater impact than the dose of 1.

  • Dose 0.5 vs Dose 2

The null hypothesis is that the dose of 0.5 and the dose of 2 have the same impact on length. The alternative hypothesis would be that the dose of 2 has a greater impact on the length of the teeth.

t.test(dose_2, dose_05, alternative = "greater", paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  dose_2 and dose_05
## t = 11.799, df = 36.883, p-value = 2.199e-14
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  13.27926      Inf
## sample estimates:
## mean of x mean of y 
##    26.100    10.605

We reject the null hypothesis because the p-value is less than 0.05, so we favor the alternative hypothesis that the dose of 2 has a greater impact than the dose of 0.5.
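Because three tests are run on the same data, the chance of at least one false rejection is higher than 0.05 for the set as a whole. One common safeguard, shown here as an optional sketch and not as part of the original analysis, is to adjust the p-values with Holm's method via p.adjust. The names `len_by_dose`, `p_raw` and `p_holm` are illustrative.

```r
library(datasets)

# Tooth lengths split into one vector per dose level.
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)

# One-sided Welch tests: higher dose versus lower dose.
p_raw <- c(
  "1 vs 0.5" = t.test(len_by_dose[["1"]], len_by_dose[["0.5"]],
                      alternative = "greater")$p.value,
  "2 vs 1"   = t.test(len_by_dose[["2"]], len_by_dose[["1"]],
                      alternative = "greater")$p.value,
  "2 vs 0.5" = t.test(len_by_dose[["2"]], len_by_dose[["0.5"]],
                      alternative = "greater")$p.value
)

# Holm adjustment for the three comparisons.
p_holm <- p.adjust(p_raw, method = "holm")
print(cbind(raw = p_raw, holm = p_holm))
```

Holm's method multiplies each p-value by at most the number of tests, so with p-values as small as those reported above the three conclusions are unchanged.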

Conclusion

We conclude that both the supplement type and the administered dose affect tooth growth in guinea pigs. For the supplements, OJ is associated with greater growth than VC. For the dose, a higher dose is associated with greater growth across all three comparisons.