In this project, I analyze the ToothGrowth data from the R datasets package, which describes the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs: 10 animals at each of three dose levels of vitamin C (0.5, 1, and 2 mg/day) with each of two delivery methods (orange juice or ascorbic acid).
The ToothGrowth data set is a data frame with 60 observations on 3 variables:
len: Tooth length in millimeters (numeric)
supp: Supplement type (factor variable with levels VC and OJ)
dose: Dose in milligrams (numeric)
Since dose takes only three values (0.5, 1, and 2 mg), I will treat it as a factor in the subsequent analysis.
In total, the data cover 60 guinea pigs. Each animal was assigned to one of the three dose levels of vitamin C and, within each dose level, to one of the two delivery methods (orange juice or ascorbic acid), leaving 10 animals per dose-by-supplement subgroup.
The average guinea pig tooth length is 18.813 with a standard deviation of 7.65.
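The summary figures quoted above can be checked directly from the data. The following is a minimal sketch, assuming the built-in ToothGrowth data frame from the datasets package; the names tg, overall_mean, and overall_sd are illustrative only and do not appear elsewhere in this analysis.

```r
# Work on a local copy of the built-in data set (tg is an illustrative name)
tg <- datasets::ToothGrowth

# Overall mean and standard deviation of tooth length across all 60 animals
overall_mean <- mean(tg$len)
overall_sd <- sd(tg$len)

# Print both values rounded to three decimals
round(c(mean = overall_mean, sd = overall_sd), 3)
```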
Let's take a first look at the dataset.
# Structure of the ToothGrowth data
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
#Top 6 rows of Toothgrowth data
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
#Distribution among dose and supp of Toothgrowth data
with(ToothGrowth, table(dose, supp))
##      supp
## dose  OJ VC
##   0.5 10 10
##   1   10 10
##   2   10 10
# Mean tooth length for each supplement (Group.1) and dose (Group.2) combination
aggregate(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose), mean)
##   Group.1 Group.2     x
## 1      OJ     0.5 13.23
## 2      VC     0.5  7.98
## 3      OJ     1.0 22.70
## 4      VC     1.0 16.77
## 5      OJ     2.0 26.06
## 6      VC     2.0 26.14
# Standard deviation of tooth length for each supplement (Group.1) and dose (Group.2) combination
aggregate(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose), sd)
##   Group.1 Group.2        x
## 1      OJ     0.5 4.459709
## 2      VC     0.5 2.746634
## 3      OJ     1.0 3.910953
## 4      VC     1.0 2.515309
## 5      OJ     2.0 2.655058
## 6      VC     2.0 4.797731
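The two aggregate() calls above can also be combined into one call that keeps the original column names. This is a sketch of an alternative using the formula interface of aggregate(), not the code used to produce the output above; tg and group_stats are illustrative names.

```r
# Work on a local copy of the built-in data set (tg is an illustrative name)
tg <- datasets::ToothGrowth

# Formula interface: one row per supp/dose combination, with the mean and
# standard deviation of len returned together as a two-column matrix
group_stats <- aggregate(
  len ~ supp + dose,
  data = tg,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)

group_stats
```

The result has columns supp, dose, and len, where len holds a mean and an sd sub-column, so the group labels no longer appear as Group.1 and Group.2.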
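library(ggplot2)
# Treat dose as a factor so that each dose level gets its own box
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Box plots of tooth length by dose, one panel per supplement type
ggplot(ToothGrowth, aes(x = dose, y = len, fill = dose)) +
  geom_boxplot() +
  facet_grid(. ~ supp) +
  labs(title = "Tooth Length vs. Dose for supplement type OJ & VC",
       x = "Doses", y = "Tooth Length")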
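The plot above suggests that at doses of 0.5 and 1 mg, orange juice is associated with longer teeth than ascorbic acid, while at 2 mg the two supplements look similar. To test this hypothesis, let's run a two-sample t-test (Welch's test, the default for t.test) comparing OJ with VC at each dose level.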
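library(broom)
# Subset each supplement group at dose 0.5
OJ_w_dose0.5 <- subset(ToothGrowth, supp == "OJ" & dose == "0.5")
VC_w_dose0.5 <- subset(ToothGrowth, supp == "VC" & dose == "0.5")
# Welch two-sample t-test (OJ minus VC), tidied into a one-row data frame
dose0.5 <- tidy(t.test(OJ_w_dose0.5$len, VC_w_dose0.5$len))
dose0.5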
##   estimate estimate1 estimate2 statistic     p.value parameter conf.low
## 1     5.25     13.23      7.98  3.169733 0.006358607  14.96875 1.719057
##   conf.high
## 1  8.780943
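The statistic and parameter columns reported for dose 0.5 can be reproduced by hand, which makes clear what the test computes. The sketch below assumes the Welch formulas that t.test applies by default (unequal variances, Welch-Satterthwaite degrees of freedom); the names tg, oj, vc, se2_oj, se2_vc, t_stat, welch_df, and p_value are illustrative.

```r
# Work on a local copy of the built-in data set (tg is an illustrative name)
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ" & tg$dose == 0.5]
vc <- tg$len[tg$supp == "VC" & tg$dose == 0.5]

# Squared standard error contributed by each group: s^2 / n
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), welch_df)

c(statistic = t_stat, parameter = welch_df, p.value = p_value)
```

The non-integer degrees of freedom (about 14.97 rather than 18) are the signature of the Welch correction, which does not assume equal variances in the two groups.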
OJ_w_dose1 <- subset(ToothGrowth, supp == "OJ" & dose == "1")
VC_w_dose1 <- subset(ToothGrowth, supp == "VC" & dose == "1")
dose1 <- tidy(t.test(OJ_w_dose1$len, VC_w_dose1$len))
dose1
##   estimate estimate1 estimate2 statistic     p.value parameter conf.low
## 1     5.93      22.7     16.77   4.03277 0.001038376  15.35767 2.802148
##   conf.high
## 1  9.057852
OJ_w_dose2 <- subset(ToothGrowth, supp == "OJ" & dose == "2")
VC_w_dose2 <- subset(ToothGrowth, supp == "VC" & dose == "2")
dose2 <- tidy(t.test(OJ_w_dose2$len, VC_w_dose2$len))
dose2
##   estimate estimate1 estimate2  statistic   p.value parameter conf.low
## 1    -0.08     26.06     26.14 -0.0461361 0.9638516  14.03982 -3.79807
##   conf.high
## 1   3.63807
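The three tests above repeat the same steps for each dose, so they can also be written as a single loop. This is a sketch of an equivalent approach using base R only (the formula interface of t.test instead of subset and tidy); tg, tests, and summary_table are illustrative names.

```r
# Work on a local copy of the built-in data set (tg is an illustrative name)
tg <- datasets::ToothGrowth

# Split the data by dose and run a Welch two-sample t-test of len by supp
# within each dose level (OJ is the first factor level, so estimates are OJ, VC)
tests <- lapply(split(tg, tg$dose), function(d) t.test(len ~ supp, data = d))

# Collect the key numbers from each test into one data frame
summary_table <- data.frame(
  dose = names(tests),
  difference = sapply(tests, function(t) unname(t$estimate[1] - t$estimate[2])),
  p.value = sapply(tests, function(t) t$p.value),
  conf.low = sapply(tests, function(t) t$conf.int[1]),
  conf.high = sapply(tests, function(t) t$conf.int[2]),
  row.names = NULL
)

summary_table
```

Each row corresponds to one dose level, with the difference in means (OJ minus VC), the p-value, and the 95% confidence interval side by side.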
The p-values for OJ vs VC at dose 0.5 (p = 0.0064) and at dose 1 (p = 0.0010) are both below 0.05, and the corresponding 95% confidence intervals (1.72 to 8.78 and 2.80 to 9.06) do not contain 0, so we conclude that the mean tooth lengths differ significantly between the two supplements at these doses. For OJ vs VC at dose 2, however, the difference in means is not significant: the p-value (0.964) is far above 0.05 and the confidence interval (-3.80 to 3.64) contains 0.
In conclusion, at the 5% significance level, doses of 0.5 and 1 mg delivered as OJ result in longer tooth length than the same doses delivered as VC. At the highest dose of 2 mg, there is no statistically significant difference between the effects of OJ and VC.