In this project I will analyze the ToothGrowth data from the R datasets package.
The analysis is divided into four parts.
# First load the data
data(ToothGrowth)
# See how it is structured
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# See first 5 rows
head(ToothGrowth, 5)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
First, load the libraries
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.3
# convert dose column from a numeric to a factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
T.growth <- ToothGrowth %>%
group_by(supp, dose) %>%
summarise(len = mean(len))
ggplot(ToothGrowth, aes(dose, len, fill = dose)) +
geom_boxplot(size = 1, aes(colour = dose)) + facet_grid(. ~ supp) +
ggtitle("Tooth length by dose and supplement type") +
xlab("Dose (mg/day)") + ylab("Tooth length")
# Create 2 groups of data depending on the type of supplement
group_oj <- ToothGrowth$len[ToothGrowth$supp == 'OJ']
group_vc <- ToothGrowth$len[ToothGrowth$supp == 'VC']
# Separate data depending only on dose
group_dose_0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5]
group_dose_1 <- ToothGrowth$len[ToothGrowth$dose == 1]
group_dose_2 <- ToothGrowth$len[ToothGrowth$dose == 2]
# Separate data depending on dose and type of supplement
group_oj_0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == 'OJ']
group_oj_1 <- ToothGrowth$len[ToothGrowth$dose == 1 & ToothGrowth$supp == 'OJ']
group_oj_2 <- ToothGrowth$len[ToothGrowth$dose == 2 & ToothGrowth$supp == 'OJ']
group_vc_0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == 'VC']
group_vc_1 <- ToothGrowth$len[ToothGrowth$dose == 1 & ToothGrowth$supp == 'VC']
group_vc_2 <- ToothGrowth$len[ToothGrowth$dose == 2 & ToothGrowth$supp == 'VC']
General conditions:
We first test whether there is a relation between the supplement types, that is, whether one supplement produces longer teeth than the other
t.test(group_oj, group_vc, paired = FALSE, alternative = "greater", var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: group_oj and group_vc
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.4682687 Inf
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
Because the p-value is less than alpha = 0.05, we reject the null hypothesis that there is no difference in the means of the two supplements. There therefore appears to be a relationship: OJ results in greater tooth growth than VC.
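The decision rule used here can also be read directly from the object that `t.test()` returns, rather than from the printed output. The following is a minimal sketch; the names `tg`, `oj`, `vc`, `alpha` and `res` are illustrative and not part of the original analysis.

```r
# Work on a local copy so the ToothGrowth object modified earlier is untouched
tg <- datasets::ToothGrowth

oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

alpha <- 0.05

# Same Welch one-sided test as above: H1 is mean(OJ) - mean(VC) > 0
res <- t.test(oj, vc, alternative = "greater", var.equal = FALSE)

res$p.value           # one-sided p-value
res$p.value < alpha   # TRUE means the null hypothesis is rejected at level alpha
res$conf.int          # one-sided confidence interval for the difference in means
```

Note that the conclusion refers to the one-sided alternative chosen before looking at the data; a two-sided test would report a larger p-value.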
Compare dose 0.5 against dose 1
t.test(group_dose_0.5, group_dose_1, paired = FALSE, alternative = "less", var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: group_dose_0.5 and group_dose_1
## t = -6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -6.753323
## sample estimates:
## mean of x mean of y
## 10.605 19.735
Because the p-value is smaller than alpha, we reject the null hypothesis that both doses have the same effect in favour of the alternative, which is that dose 1 has a greater effect on length than dose 0.5.
Compare dose 1 against dose 2
t.test(group_dose_1, group_dose_2, paired = FALSE, alternative = "less", var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: group_dose_1 and group_dose_2
## t = -4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -4.17387
## sample estimates:
## mean of x mean of y
## 19.735 26.100
Same result as in the previous case. We reject the null hypothesis because the p-value is smaller than alpha (0.05) and conclude that dose 2 has a greater effect than dose 1.
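Since several dose levels are compared, one optional check is to run all pairwise comparisons at once and adjust the p-values for multiple testing. The sketch below uses `pairwise.t.test()` from the stats package with a Bonferroni adjustment. This adjustment, and the use of two-sided tests, are assumptions of the sketch and not part of the original analysis; `tg` and `pw` are illustrative names.

```r
# Local copy of the data with the original numeric dose column
tg <- datasets::ToothGrowth

# pool.sd = FALSE makes each comparison a separate Welch t-test;
# the default two-sided alternative is used, then Bonferroni-adjusted
pw <- pairwise.t.test(tg$len, factor(tg$dose),
                      p.adjust.method = "bonferroni",
                      pool.sd = FALSE)

pw$p.value   # lower-triangular matrix of adjusted p-values
```

This also covers the 0.5 versus 2 comparison, which is not tested separately above.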
Compare OJ and VC for dose 0.5
t.test(group_oj_0.5, group_vc_0.5, paired = FALSE, alternative = "greater", var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: group_oj_0.5 and group_vc_0.5
## t = 3.1697, df = 14.969, p-value = 0.003179
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 2.34604 Inf
## sample estimates:
## mean of x mean of y
## 13.23 7.98
Here the small p-value leads us to reject the null hypothesis in favour of the alternative, which is that supplement OJ at dose 0.5 has a greater effect than supplement VC at the same dose.
Compare OJ and VC for dose 1
t.test(group_oj_1, group_vc_1, paired = FALSE, alternative = "greater", var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: group_oj_1 and group_vc_1
## t = 4.0328, df = 15.358, p-value = 0.0005192
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 3.356158 Inf
## sample estimates:
## mean of x mean of y
## 22.70 16.77
This case is similar to the one with dose 0.5. The p-value is small, so we reject the null hypothesis in favour of the alternative, which is that supplement OJ at dose 1 has a greater effect than supplement VC at the same dose.
Compare OJ and VC for dose 2
t.test(group_oj_2, group_vc_2, paired = FALSE, alternative = "greater", var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: group_oj_2 and group_vc_2
## t = -0.046136, df = 14.04, p-value = 0.5181
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -3.1335 Inf
## sample estimates:
## mean of x mean of y
## 26.06 26.14
In this test the p-value is greater than alpha, so we cannot reject the null hypothesis: the data give no evidence that OJ has a greater effect than VC at dose 2.
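The three supplement comparisons can be reproduced more compactly with the formula interface of `t.test()`. This is a sketch rather than the original code; `tg`, `doses` and `p_by_dose` are illustrative names. Because `supp` is a factor with levels "OJ" and "VC", in that order, `alternative = "greater"` tests whether the OJ mean exceeds the VC mean.

```r
# Local copy of the data with the original numeric dose column
tg <- datasets::ToothGrowth

doses <- c(0.5, 1, 2)

# One Welch one-sided test (OJ > VC) per dose level
p_by_dose <- sapply(doses, function(d) {
  sub <- tg[tg$dose == d, ]
  t.test(len ~ supp, data = sub, alternative = "greater")$p.value
})
names(p_by_dose) <- doses

round(p_by_dose, 4)
```

The resulting p-values should match the three outputs shown above for doses 0.5, 1 and 2.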
For more reliable hypothesis tests, the sample size should be increased so that the variance is estimated more precisely and other relations can be explored.
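As a rough illustration of how large a sample might be needed, `power.t.test()` from the stats package can estimate the group size for a given effect. The sketch below rests on several assumptions that are not in the original analysis: a hypothetical true difference of 2 length units between supplements at dose 2, a target power of 0.8, and equal variances and group sizes (which `power.t.test()` assumes). The names `tg`, `d2`, `sd_pooled` and `pw_n` are illustrative.

```r
# Local copy of the data, restricted to dose 2
tg <- datasets::ToothGrowth
d2 <- tg[tg$dose == 2, ]

# Pooled standard deviation across the two supplement groups (equal group sizes)
sd_pooled <- sqrt(mean(tapply(d2$len, d2$supp, var)))

# delta = 2 is an assumed effect size chosen only for illustration
pw_n <- power.t.test(delta = 2, sd = sd_pooled,
                     sig.level = 0.05, power = 0.8,
                     type = "two.sample", alternative = "one.sided")

ceiling(pw_n$n)   # estimated number of animals needed per group
```

A different assumed effect size or target power would change the estimate considerably, so the result should be read as an order of magnitude only.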