This is the second part of the Statistical Inference Course Project. It consists of a basic inferential analysis of the ToothGrowth data in the R datasets package, which records the effect of vitamin C on tooth growth in guinea pigs.
library(plyr)
library(dplyr)
library(ggplot2)
library(datasets) #open datasets package
data("ToothGrowth") #load data
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
Using the str function, we see that this dataset consists of 60 observations (n=60) and three variables:
len: tooth length (numeric)
supp: supplement type (factor; VC or OJ)
dose: dose in mg/day of Vitamin C (numeric; 0.5, 1, 2)
Let’s do a quick comparison of supplement type at each dose level.
data <- ToothGrowth
data$dose <- as.factor(data$dose) #make dose a factor
data$supp <- mapvalues(data$supp, from = c("OJ", "VC"), to = c("Orange Juice", "Ascorbic Acid")) #change supp level names
g <- ggplot(data, aes(x = dose, y = len))
g + facet_wrap(~ supp) + #one panel per supplement
  geom_boxplot(aes(fill = dose)) +
  labs(x = "Dose mg/day", y = "Tooth Length",
       title = "Guinea Pig Tooth Length vs. Vitamin C Dose")
Looking at the conditional boxplots, tooth length increases as vitamin C dosage increases. At the 0.5 and 1 mg/day doses, orange juice (OJ) yields longer teeth than ascorbic acid (VC). However, at 2 mg/day, tooth length seems unaffected by supplement type.
To support this summary, the table below shows the mean tooth length for each supplement at each dose level.
#calculate average length per supp per dose
data %>%
group_by(supp, dose) %>%
summarise(average_length = mean(len))
## # A tibble: 6 x 3
## # Groups: supp [2]
## supp dose average_length
## <fct> <fct> <dbl>
## 1 Orange Juice 0.5 13.2
## 2 Orange Juice 1 22.7
## 3 Orange Juice 2 26.1
## 4 Ascorbic Acid 0.5 7.98
## 5 Ascorbic Acid 1 16.8
## 6 Ascorbic Acid 2 26.1
We will now compare the supplements across the dataset and at each dosage level.
Test 1 (all doses pooled). Let m1 = mean len for OJ and m2 = mean len for VC;
H_o: m1 = m2 vs. H_a: m1 ≠ m2
OJ <- data$len[data$supp=="Orange Juice"] #subset OJ lengths
VC <- data$len[data$supp=="Ascorbic Acid"] #subset VC lengths
test1 <- t.test(OJ, VC, paired = FALSE, var.equal = FALSE)
print(test1)
##
## Welch Two Sample t-test
##
## data: OJ and VC
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
With a p-value of 0.0606345, we do not reject H_o at an alpha level of 0.05. Furthermore, the 95% confidence interval for the difference between the two means includes 0, which supports this result.
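To make the link between the printed t statistic, degrees of freedom, p-value, and confidence interval explicit, the following sketch recomputes the Welch test by hand and compares it with `t.test`. It works from the raw `ToothGrowth` data so that it runs on its own; the variable names (`oj`, `vc`, `se2_oj`, `df_w`, and so on) are illustrative and not part of the original analysis.

```r
library(datasets)

# Tooth lengths by supplement, pooled over all three doses
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(oj) - mean(vc)) / se_diff
df_w <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_val <- 2 * pt(-abs(t_stat), df = df_w)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df = df_w) * se_diff

# Reference result from t.test, for comparison with the hand computation
ref <- t.test(oj, vc, paired = FALSE, var.equal = FALSE)
print(c(t = t_stat, df = df_w, p = p_val))
print(ci)
```

The hand-computed values should agree with `ref$statistic`, `ref$parameter`, `ref$p.value`, and `ref$conf.int`. The interval is for the difference m1 - m2, which is why containing 0 and failing to reject H_o at the 0.05 level go together.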
Test 2 (0.5 mg/day). Let m1 = mean len for OJ and m2 = mean len for VC;
H_o: m1 = m2 vs. H_a: m1 ≠ m2
OJ_5 <- data$len[data$supp=="Orange Juice" & data$dose==0.5]
VC_5 <- data$len[data$supp=="Ascorbic Acid" & data$dose==0.5]
test2 <- t.test(OJ_5, VC_5, paired = FALSE, var.equal = FALSE)
print(test2)
##
## Welch Two Sample t-test
##
## data: OJ_5 and VC_5
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean of x mean of y
## 13.23 7.98
With a p-value of 0.0063586, we reject H_o at an alpha level of 0.05. Furthermore, the 95% confidence interval for the difference between the two means does not include 0, which supports this result.
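The test above is two-sided, as its printed output states. If a directional alternative (m1 > m2) were of interest instead, one option is the `alternative = "greater"` argument of `t.test`. The sketch below is an illustration rather than part of the original analysis; it rebuilds the 0.5 mg/day subsets from the raw `ToothGrowth` data, and the names `low`, `oj_low`, and `vc_low` are hypothetical.

```r
library(datasets)

# Lengths at the 0.5 mg/day dose, taken from the raw dataset
low <- ToothGrowth[ToothGrowth$dose == 0.5, ]
oj_low <- low$len[low$supp == "OJ"]
vc_low <- low$len[low$supp == "VC"]

# Two-sided Welch test (the default) and its one-sided counterpart
two_sided <- t.test(oj_low, vc_low, var.equal = FALSE)
one_sided <- t.test(oj_low, vc_low, var.equal = FALSE, alternative = "greater")

# Because t is positive here, the one-sided p-value is half the two-sided one
print(c(two_sided = two_sided$p.value, one_sided = one_sided$p.value))

# The one-sided interval has a lower bound only; its upper limit is Inf
print(one_sided$conf.int)
```

Either way the decision at alpha 0.05 is the same for this dose. The choice between the two forms should be made before looking at the data.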
Test 3 (1 mg/day). Let m1 = mean len for OJ and m2 = mean len for VC;
H_o: m1 = m2 vs. H_a: m1 ≠ m2
OJ_1 <- data$len[data$supp=="Orange Juice" & data$dose==1]
VC_1 <- data$len[data$supp=="Ascorbic Acid" & data$dose==1]
test3 <- t.test(OJ_1, VC_1, paired = FALSE, var.equal = FALSE)
print(test3)
##
## Welch Two Sample t-test
##
## data: OJ_1 and VC_1
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean of x mean of y
## 22.70 16.77
With a p-value of 0.001038, we reject H_o at an alpha level of 0.05. Furthermore, the 95% confidence interval for the difference between the two means does not include 0, which supports this result.
Test 4 (2 mg/day). Let m1 = mean len for OJ and m2 = mean len for VC;
H_o: m1 = m2 vs. H_a: m1 ≠ m2
OJ_2 <- data$len[data$supp=="Orange Juice" & data$dose==2]
VC_2 <- data$len[data$supp=="Ascorbic Acid" & data$dose==2]
test4 <- t.test(OJ_2, VC_2, paired = FALSE, var.equal = FALSE)
print(test4)
##
## Welch Two Sample t-test
##
## data: OJ_2 and VC_2
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean of x mean of y
## 26.06 26.14
With a p-value of 0.9639, we do not reject H_o at an alpha level of 0.05. Furthermore, the 95% confidence interval for the difference between the two means includes 0, which supports this result.
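The three per-dose comparisons repeat the same steps, so they can also be run in a loop and collected into one table. The following is a minimal sketch under that assumption, using the formula interface of `t.test` on the raw `ToothGrowth` data; the names `doses`, `per_dose`, and `results` are illustrative and do not appear in the original analysis.

```r
library(datasets)

# Run one Welch test per dose; len ~ supp compares OJ (first level) with VC
doses <- sort(unique(ToothGrowth$dose))
per_dose <- lapply(doses, function(d) {
  one_dose <- ToothGrowth[ToothGrowth$dose == d, ]
  t.test(len ~ supp, data = one_dose, var.equal = FALSE)
})

# Collect the key numbers into one data frame for side-by-side reading
results <- data.frame(
  dose     = doses,
  t        = sapply(per_dose, function(x) unname(x$statistic)),
  p_value  = sapply(per_dose, function(x) x$p.value),
  ci_lower = sapply(per_dose, function(x) x$conf.int[1]),
  ci_upper = sapply(per_dose, function(x) x$conf.int[2])
)
print(results)
```

Reading the p-values from the test objects in this way, rather than retyping them, also avoids copy-and-paste slips when the same sentence is reused for several tests.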
Given the assumptions behind the tests above (independent, unpaired samples whose variances are not assumed equal):
We conclude that across doses there is no significant difference between the two supplements, orange juice (OJ) and ascorbic acid (VC).
However, looking at each dose level, there is a significant difference between OJ and VC at dose levels 0.5 mg/day and 1 mg/day. In both cases, OJ yielded a higher mean tooth length.
At a dose level of 2 mg/day there is no significant difference between supplements. This may account for the failure to reject the Test 1 null hypothesis.