Overview of the assignment

In this project, I analyze the ToothGrowth data from the R datasets package, which describes the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods (orange juice, coded OJ, or ascorbic acid, coded VC).

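The ToothGrowth data set is a data frame with 60 observations on 3 variables: len (tooth length, numeric), supp (supplement type, a factor with levels OJ and VC), and dose (dose in mg/day, numeric).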

Since dose takes only three values (0.5, 1, and 2 mg), I will treat it as a factor in the subsequent analysis.

In total, the data cover 60 unique guinea pigs. Each pig was assigned to one of three groups, each receiving a specific dose level of vitamin C. Within each dose group, the two delivery methods (orange juice or ascorbic acid) were applied to separate animals, leaving 10 pigs per subgroup.

Exploratory data analysis

The average guinea pig tooth length is 18.813 with a standard deviation of 7.65.
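
The two figures above can be reproduced directly from the data. The following is a minimal sketch, assuming a fresh copy of the data set from the datasets package (the names tg and overall are illustrative and are not used elsewhere in this report):

```r
# Fresh copy of the data so this check does not depend on later steps
tg <- datasets::ToothGrowth

# Overall mean and standard deviation of tooth length across all 60 observations
overall <- c(mean = mean(tg$len), sd = sd(tg$len))
print(round(overall, 3))
```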

Let’s take a first look at the data set.

#Structure of the ToothGrowth data
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
#First 6 rows of the ToothGrowth data
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
#Number of observations for each combination of dose and supp
with(ToothGrowth, table(dose, supp))
##      supp
## dose  OJ VC
##   0.5 10 10
##   1   10 10
##   2   10 10
Let’s look at the mean tooth length by dose and supplement type.
aggregate(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose), mean)
##   Group.1 Group.2     x
## 1      OJ     0.5 13.23
## 2      VC     0.5  7.98
## 3      OJ     1.0 22.70
## 4      VC     1.0 16.77
## 5      OJ     2.0 26.06
## 6      VC     2.0 26.14
Let’s look at the standard deviation of tooth length by dose and supplement type.
aggregate(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose), sd)
##   Group.1 Group.2        x
## 1      OJ     0.5 4.459709
## 2      VC     0.5 2.746634
## 3      OJ     1.0 3.910953
## 4      VC     1.0 2.515309
## 5      OJ     2.0 2.655058
## 6      VC     2.0 4.797731
Convert dose to a factor variable.
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
Let’s plot tooth length against dose, with one panel per supplement type.
library(ggplot2)  # provides ggplot(), geom_boxplot(), facet_grid(), labs()
ggplot(ToothGrowth, aes(x = dose, y = len, fill = dose)) +
  geom_boxplot() +
  facet_grid(. ~ supp) +
  labs(title = "Tooth Length vs. Dose for Supplement Types OJ and VC",
       x = "Dose (mg)", y = "Tooth Length")

Hypothesis tests

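From the plot above, orange juice appears to produce more tooth growth than ascorbic acid at doses 0.5 and 1, while the two delivery methods look similar at dose 2. To test this, let’s run a two-sample t-test (Welch, the default of t.test) comparing OJ and VC at each dose level.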

Comparison of Dose 0.5 for OJ and VC

library(broom)  # provides tidy(), used to turn each t.test result into a one-row data frame
OJ_w_dose0.5 <- subset(ToothGrowth, supp == "OJ" & dose == "0.5")
VC_w_dose0.5 <- subset(ToothGrowth, supp == "VC" & dose == "0.5")
dose0.5 <- tidy(t.test(OJ_w_dose0.5$len, VC_w_dose0.5$len))
dose0.5
##   estimate estimate1 estimate2 statistic     p.value parameter conf.low
## 1     5.25     13.23      7.98  3.169733 0.006358607  14.96875 1.719057
##   conf.high
## 1  8.780943

Comparison of Dose 1 for OJ and VC

OJ_w_dose1 <- subset(ToothGrowth, supp == "OJ" & dose == "1") 
VC_w_dose1 <- subset(ToothGrowth, supp == "VC" & dose == "1") 
dose1 <- tidy(t.test(OJ_w_dose1$len, VC_w_dose1$len))
dose1
##   estimate estimate1 estimate2 statistic     p.value parameter conf.low
## 1     5.93      22.7     16.77   4.03277 0.001038376  15.35767 2.802148
##   conf.high
## 1  9.057852

Comparison of Dose 2 for OJ and VC

OJ_w_dose2 <- subset(ToothGrowth, supp == "OJ" & dose == "2") 
VC_w_dose2 <- subset(ToothGrowth, supp == "VC" & dose == "2") 
dose2 <- tidy(t.test(OJ_w_dose2$len, VC_w_dose2$len))
dose2
##   estimate estimate1 estimate2  statistic   p.value parameter conf.low
## 1    -0.08     26.06     26.14 -0.0461361 0.9638516  14.03982 -3.79807
##   conf.high
## 1   3.63807
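
The three comparisons above repeat the same steps, so they can also be run in one pass. The following is a minimal sketch, assuming base R only and a fresh copy of the data (the names tg and results are illustrative and not part of the original analysis). It uses the formula interface of t.test, which, like the calls above, runs a Welch two-sample test by default.

```r
# Fresh copy of the data; dose is still numeric here
tg <- datasets::ToothGrowth

# Welch two-sample t-test (OJ vs VC) within each dose level
results <- do.call(rbind, lapply(sort(unique(tg$dose)), function(d) {
  tt <- t.test(len ~ supp, data = tg[tg$dose == d, ])
  data.frame(
    dose      = d,
    diff      = unname(tt$estimate[1] - tt$estimate[2]),  # mean(OJ) - mean(VC)
    statistic = unname(tt$statistic),
    p.value   = tt$p.value,
    conf.low  = tt$conf.int[1],   # 95% CI for mean(OJ) - mean(VC)
    conf.high = tt$conf.int[2]
  )
}))
print(results)
```

Because supp has the levels OJ and VC in that order, each row reports the OJ minus VC difference, so the p.value, conf.low, and conf.high columns should agree with the tidy() output shown above.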

Conclusion

The p-values for OJ vs. VC at dose 0.5 (0.0064) and at dose 1 (0.0010) are both below 0.05, and the corresponding 95% confidence intervals do not contain 0, so we conclude that mean tooth length differs significantly between the two delivery methods at these doses. For OJ vs. VC at dose 2, however, the difference in means is not significant: the p-value (0.96) is well above 0.05 and the confidence interval contains 0.

In conclusion, at the 5% significance level, OJ results in longer tooth length than VC at doses 0.5 and 1. At the highest dose of 2, however, there is no statistically significant difference between the effects of OJ and VC. This concludes the assignment.