In the second part of the project we will analyze the ToothGrowth data in the R datasets package.
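library(dplyr)    # provides %>%, group_by() and summarise()
library(ggplot2)  # provides ggplot() and the geoms used in the plots
data(ToothGrowth)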
df <- ToothGrowth
str(df)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(df)
Basic summary
summary(df)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
Graphing the length distribution
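df %>% ggplot(aes(x = len)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 3, color = "black", fill = "gray") +  # density scale, comparable with the curves
  geom_vline(xintercept = mean(df$len), color = "red", linewidth = 1) +  # sample mean (linewidth needs ggplot2 >= 3.4.0)
  stat_function(fun = dnorm, args = list(mean = mean(df$len), sd = sd(df$len)), color = "blue", linewidth = 1) +  # fitted normal density
  stat_density(geom = "line", color = "red", linewidth = 1)  # kernel density estimate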
Let’s use the Shapiro-Wilk test of normality
stats::shapiro.test(df$len)
##
## Shapiro-Wilk normality test
##
## data: df$len
## W = 0.96743, p-value = 0.1091
Since the p-value (0.1091) is greater than 0.05, we cannot reject the null hypothesis that the data come from a normal distribution. We can therefore assume normality.
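The same decision can be read from the test object rather than from the printed output. The snippet below is a minimal sketch, not part of the original analysis; the names `alpha`, `sw` and `normal_ok` are illustrative, and `alpha` is assumed to be the 0.05 threshold used above.

```r
# Sketch: apply the 0.05 decision rule to the Shapiro-Wilk result in code.
data(ToothGrowth)
alpha <- 0.05  # assumed significance level, matching the threshold in the text

sw <- shapiro.test(ToothGrowth$len)

# We fail to reject normality when the p-value exceeds alpha
normal_ok <- sw$p.value > alpha
cat(sprintf("W = %.5f, p-value = %.4f, normality assumed: %s\n",
            sw$statistic, sw$p.value, normal_ok))
```

Note that this checks the pooled lengths only; it says nothing about each supplement and dose group separately.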
df %>% group_by(supp) %>%
summarise(Mean = mean(len))
df %>% group_by(dose) %>%
summarise(Mean = mean(len))
df %>% group_by(supp, dose) %>%
summarise(Mean = mean(len))
df %>% ggplot(aes(x = factor(supp), y = len)) + geom_boxplot(aes(fill = factor(dose)))
T-test for dose 0.5 mg:
t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == .5, ])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
T-test for dose 1 mg:
t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == 1, ])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
T-test for dose 2 mg:
t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == 2, ])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
The p-values for the smaller dosages (0.5 and 1) are below 0.01, while the p-value for the highest dosage (2) is 0.9639, far above it. From that and from the graphs we can conclude that:
The type of supplement is relevant at the smaller dosages, where orange juice has a greater effect on tooth length than vitamin C.
At the highest dosage there is no significant difference between the two types of supplementation.
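To compare the three tests side by side, they can be run in a loop and collected into one table. This is a sketch under the same settings as above (Welch two-sample t-test, two-sided); the names `doses`, `results` and `below_alpha`, and the 0.01 cut-off taken from the text, are illustrative choices rather than part of the original analysis.

```r
# Sketch: run the OJ vs VC Welch t-test at each dose and tabulate the results.
data(ToothGrowth)
alpha <- 0.01  # assumed cut-off, taken from the conclusion in the text
doses <- sort(unique(ToothGrowth$dose))

results <- do.call(rbind, lapply(doses, function(d) {
  tt <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
  data.frame(
    dose    = d,
    mean_OJ = unname(tt$estimate[1]),  # estimates follow the factor levels: OJ, then VC
    mean_VC = unname(tt$estimate[2]),
    ci_low  = tt$conf.int[1],          # 95% CI for the difference OJ - VC
    ci_high = tt$conf.int[2],
    p_value = tt$p.value
  )
}))

# Flag the doses where the supplement types differ at the chosen cut-off
results$below_alpha <- results$p_value < alpha
print(results)
```

Because three tests are run on the same data set, a multiple-comparison correction such as `p.adjust(results$p_value, method = "holm")` may be worth considering before drawing conclusions.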