First, we load the data into the workspace.
library(ggplot2)
library(datasets)
data(ToothGrowth)
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
Next, we inspect the structure of the ToothGrowth data frame.
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# Convert dose to a factor, since it only takes the values 0.5, 1 and 2 (mg/day)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
table(ToothGrowth$dose, ToothGrowth$supp)
##
##       OJ VC
##   0.5 10 10
##   1   10 10
##   2   10 10
Summary of the three variables in the data frame, plus the standard deviation of len computed separately (it is the only continuous variable).
summary(ToothGrowth)
##       len        supp     dose
##  Min.   : 4.20   OJ:30   0.5:20
##  1st Qu.:13.07   VC:30   1  :20
##  Median :19.25           2  :20
##  Mean   :18.81
##  3rd Qu.:25.27
##  Max.   :33.90
sd(ToothGrowth$len)
## [1] 7.649315
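The overall summary above pools all 60 observations. As a complement, the sketch below computes the mean and standard deviation of len for each supplement/dose combination with base R's aggregate(). The names group_stats, mean_len and sd_len are illustrative and not part of the original analysis.

```r
# ToothGrowth ships with the datasets package, which R attaches by default.
# The code works whether dose is still numeric or already a factor.

# Mean and standard deviation of len for each supplement/dose combination
group_stats <- aggregate(len ~ supp + dose, data = ToothGrowth,
                         FUN = function(x) c(mean = mean(x), sd = sd(x)))

# aggregate() stores the two statistics in a single matrix column;
# flatten it into two ordinary columns and give them readable names
group_stats <- do.call(data.frame, group_stats)
names(group_stats) <- c("supp", "dose", "mean_len", "sd_len")

group_stats
```

Each of the six rows summarizes the 10 observations in one cell of the supplement-by-dose table shown earlier.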
Next, we draw boxplots of tooth length by dose, with one facet per supplement type: orange juice (OJ) or ascorbic acid (VC).
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = factor(dose)))
p + geom_boxplot() + facet_grid(. ~ supp) +
  scale_x_discrete("Dose (Milligrams per day)") +
  scale_y_continuous("Tooth length") +
  ggtitle("Tooth length by supplement type and dose") +
  theme_minimal() +
  scale_fill_brewer(palette = "Spectral")
First, we compare tooth length between each pair of doses using t.test.
t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 0.5)) # For doses 1 and 2
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 1)) # For doses 0.5 and 2
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 2)) # For doses 0.5 and 1
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
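Because three tests are run on the same data, one may want to adjust the p-values for multiple comparisons. The following is a minimal sketch of that idea using pairwise.t.test() from the stats package. The choice of the Holm method is an assumption made here for illustration, not something the original analysis used, and the name dose_tests is hypothetical.

```r
# pool.sd = FALSE makes each comparison a separate two-sample t-test
# (Welch, because var.equal defaults to FALSE), matching the calls above.
# p.adjust.method = "holm" adjusts the three p-values for multiplicity.
dose_tests <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                              pool.sd = FALSE, p.adjust.method = "holm")

# Lower-triangular matrix of adjusted p-values, one entry per dose pair
dose_tests$p.value
```

The Holm adjustment can only increase p-values, so any comparison that stays below the chosen significance level after adjustment was also below it before.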
Second, we compare tooth length between doses again, this time separately for each supplement type. First, for VC (ascorbic acid):
t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 0.5 & ToothGrowth$supp == 'VC')) # For doses 1 and 2
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.054267 -5.685733
## sample estimates:
## mean in group 1 mean in group 2
## 16.77 26.14
t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 1 & ToothGrowth$supp == 'VC')) # For doses 0.5 and 2
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -10.3878, df = 14.327, p-value = 4.682e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -21.90151 -14.41849
## sample estimates:
## mean in group 0.5 mean in group 2
## 7.98 26.14
t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 2 & ToothGrowth$supp == 'VC')) # For doses 0.5 and 1
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.265712 -6.314288
## sample estimates:
## mean in group 0.5 mean in group 1
## 7.98 16.77
Then for OJ (orange juice):
t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 0.5 & ToothGrowth$supp == 'OJ')) # For doses 1 and 2
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.5314425 -0.1885575
## sample estimates:
## mean in group 1 mean in group 2
## 22.70 26.06
t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 1 & ToothGrowth$supp == 'OJ')) # For doses 0.5 and 2
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -7.817, df = 14.668, p-value = 1.324e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -16.335241 -9.324759
## sample estimates:
## mean in group 0.5 mean in group 2
## 13.23 26.06
t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 2 & ToothGrowth$supp == 'OJ')) # For doses 0.5 and 1
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.415634 -5.524366
## sample estimates:
## mean in group 0.5 mean in group 1
## 13.23 22.70
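The six per-supplement tests above all follow the same pattern, so they can also be collected into a single table for easier comparison. The sketch below is one possible way to do that. The helper dose_test() and the names dose_pairs and results are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: Welch t-test of len between two dose levels
# within one supplement type, returned as a one-row data frame.
dose_test <- function(supplement, dose_a, dose_b) {
  d <- ToothGrowth[ToothGrowth$supp == supplement &
                     ToothGrowth$dose %in% c(dose_a, dose_b), ]
  res <- t.test(len ~ dose, data = d)
  data.frame(supp = supplement, dose_a = dose_a, dose_b = dose_b,
             ci_lower = res$conf.int[1], ci_upper = res$conf.int[2],
             p_value = res$p.value)
}

# The three dose pairs compared in the text
dose_pairs <- list(c(0.5, 1), c(0.5, 2), c(1, 2))

# Run every pair for both supplements and stack the rows
results <- do.call(rbind, lapply(c("VC", "OJ"), function(s) {
  do.call(rbind, lapply(dose_pairs, function(p) dose_test(s, p[1], p[2])))
}))

results
```

Each row should reproduce the confidence interval and p-value of the corresponding t.test call above, which makes it easy to see at a glance which intervals exclude zero.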
Based on the exploratory analysis and the t-tests above, I can draw the following conclusions: