Project Overview

In this project we analyze the ToothGrowth data set. The data set gives the results of an experiment to determine the effect of two supplements (vitamin C and orange juice), each at three doses (0.5, 1, or 2 mg/day), on tooth length in guinea pigs. The len variable gives the tooth length (our measure of tooth growth), the supp variable gives the supplement type (VC or OJ), and the dose variable gives the supplement dose. We compare tooth growth by supp and dose.

Loading Necessary Libraries

For our analysis the following libraries need to be loaded.

library(datasets)
library(ggplot2)
library(dplyr)
library(knitr)
library(printr)

Looking at Data

We first look at the structure of the ToothGrowth data set:

data <- ToothGrowth
str(data)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
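The structure shows 60 observations but not how they are spread across groups. The base-R check below is an addition, not part of the original analysis. It cross-tabulates supplement against dose and looks for missing values. In a balanced design, each supplement/dose cell should hold 10 guinea pigs. The object name design is illustrative.

```r
# Base-R check of the experimental design (ToothGrowth ships with the 'datasets' package)
data <- ToothGrowth

# Count observations in every supplement x dose cell
design <- table(supp = data$supp, dose = data$dose)
print(design)

# TRUE would mean at least one value is missing
anyNA(data)
```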


It is also useful to look at the first few records of the data:

knitr::kable(head(ToothGrowth),align = 'c')
len supp dose
4.2 VC 0.5
11.5 VC 0.5
7.3 VC 0.5
5.8 VC 0.5
6.4 VC 0.5
10.0 VC 0.5

Basic Descriptive Statistics and Exploratory Data Analysis

In this section we provide a basic summary of the data, followed by exploratory data analysis to examine the effects of supplement type and dose on tooth growth.

We start off with a basic summary of the three variables in the data set:

summary(data)
      len          supp          dose
 Min.   : 4.20     OJ:30    Min.   :0.500
 1st Qu.:13.07     VC:30    1st Qu.:0.500
 Median :19.25              Median :1.000
 Mean   :18.81              Mean   :1.167
 3rd Qu.:25.27              3rd Qu.:2.000
 Max.   :33.90              Max.   :2.000


Next, we compute summary statistics of tooth length by supplement:

# Name the result supp_summary so it does not mask base R's summary() function
group_supp <- group_by(data, supp)
supp_summary <- summarise(group_supp, count = n(), mean = mean(len), median = median(len),
                          "standard deviation" = sd(len))
summaryData <- as.data.frame(supp_summary)
kable(summaryData, digits = 3, align = 'c')
supp count mean median standard deviation
OJ 30 20.663 22.7 6.606
VC 30 16.963 16.5 8.266
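As a cross-check that does not depend on dplyr, the same group means and standard deviations can be computed with base R's tapply(). This is a sketch added for illustration. It should reproduce the OJ and VC rows of the table above, and the variable names are hypothetical.

```r
# Same per-supplement summary as the dplyr pipeline, computed with base R's tapply()
data <- ToothGrowth

supp_means <- tapply(data$len, data$supp, mean)
supp_sds   <- tapply(data$len, data$supp, sd)

# One row per statistic, one column per supplement
round(rbind(mean = supp_means, sd = supp_sds), 3)
```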

The summary shows that the mean tooth length for orange juice (20.663) is higher than for vitamin C (16.963). The box plot of tooth length by supplement below is consistent with higher tooth growth for orange juice.

ggplot(data = data, aes(x = supp, y = len)) +
        geom_boxplot(aes(fill = supp)) +
        xlab("Supplement type") +
        ylab("Tooth length") +
        ggtitle("Boxplot of tooth length by supplement type")


Next, we compute summary statistics of tooth length by supplement and dose:

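# Group by both supplement and dose; .groups = "drop" returns an ungrouped result
group_supp_dose <- group_by(data, supp, dose)
supp_dose_summary <- summarise(group_supp_dose, count = n(), mean = mean(len), median = median(len),
                               "standard deviation" = sd(len), .groups = "drop")
summaryData <- as.data.frame(supp_dose_summary)
kable(summaryData, digits = 3, align = 'c')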
supp dose count mean median standard deviation
OJ 0.5 10 13.23 12.25 4.460
OJ 1.0 10 22.70 23.45 3.911
OJ 2.0 10 26.06 25.95 2.655
VC 0.5 10 7.98 7.15 2.747
VC 1.0 10 16.77 16.50 2.515
VC 2.0 10 26.14 25.95 4.798
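A compact way to read this table is to arrange the cell means as a 2 x 3 matrix (supplement by dose) and subtract the VC row from the OJ row. This base-R sketch is an illustrative addition, and its object names are hypothetical. Based on the means in the table above, the OJ minus VC differences should be about 5.25, 5.93, and -0.08 at doses 0.5, 1, and 2.

```r
# Mean tooth length for every supplement/dose cell, as a 2 x 3 matrix
data <- ToothGrowth
cell_means <- tapply(data$len, list(supp = data$supp, dose = data$dose), mean)
print(cell_means)

# Difference OJ - VC at each dose: a positive value means OJ has the larger mean
oj_minus_vc <- cell_means["OJ", ] - cell_means["VC", ]
round(oj_minus_vc, 2)
```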

According to this result, the mean tooth length increases with dose for both OJ and VC. At doses 0.5 and 1, the mean for orange juice is also higher than for vitamin C. At dose 2, however, the two means are nearly identical (26.06 for OJ vs 26.14 for VC). The box plot below of tooth length by supplement type and dose is consistent with this.

ggplot(data = data, aes(x = factor(dose), y = len)) +
        geom_boxplot(aes(fill = factor(dose))) +
        facet_grid(. ~ supp) +
        xlab("Dose") +
        ylab("Tooth length") +
        ggtitle("Boxplot of tooth length by supplement type and dose")

Hypothesis Test for Tooth Growth Comparison

As mentioned in the project overview, the goal of this analysis is to determine the effect of two supplements (vitamin C and orange juice), each at three doses (0.5, 1, or 2 mg/day), on tooth length in guinea pigs. In the previous section we speculated that orange juice (OJ) has a larger overall effect on tooth growth than vitamin C (VC). In this section we test this with several hypothesis tests.

Testing tooth growth by supplement

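First, we explore the effect of supplement type on tooth growth while ignoring dose. We do this with a two-sample t-test for the difference in mean tooth length between the two supplements. The null hypothesis is that the mean tooth length is the same for OJ and VC. The alternative is that the means differ. We use a 95% confidence level (significance level 0.05) and do not assume equal variances, which makes this a Welch two-sample t-test.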

test <- t.test(len ~ supp, data = data, var.equal = FALSE, paired = FALSE, conf.level = 0.95)

result1 <- data.frame("t-statistic" = test$statistic,
                      "df"          = test$parameter,
                      "p-value"     = test$p.value,
                      "lower CL"    = test$conf.int[1],
                      "upper CL"    = test$conf.int[2],
                      "OJ mean"     = test$estimate[1],
                      "VC mean"     = test$estimate[2],
                      row.names     = "OJ vs VC")

kable(x = round(result1, 3), align = 'c',
      caption = "Summary of two sample t-test for tooth growth by supplement")
Summary of two sample t-test for tooth growth by supplement
          t.statistic      df  p.value  lower.CL  upper.CL  OJ.mean  VC.mean
OJ vs VC        1.915  55.309    0.061    -0.171     7.571   20.663   16.963

Therefore, we do not reject the null hypothesis. The p-value of 0.061 is greater than the 0.05 significance level, and the 95% confidence interval (-0.171, 7.571) contains 0. We therefore do not have sufficient statistical evidence that the mean tooth length differs between the two supplements when dose is ignored.
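Reproducing the Welch test by hand shows where the t-statistic, degrees of freedom, p-value, and confidence interval come from, and why the p-value rule and the "interval contains 0" rule agree. This is a base-R sketch added for illustration. It uses the standard Welch statistic and the Welch-Satterthwaite degrees of freedom, and its variable names are hypothetical. It should match t.test() to numerical precision.

```r
# Reproduce the Welch two-sample t-test for OJ vs VC by hand
data <- ToothGrowth
oj <- data$len[data$supp == "OJ"]
vc <- data$len[data$supp == "VC"]

# Estimated variance of each group mean, s^2 / n
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)
se   <- sqrt(v_oj + v_vc)

# Welch t-statistic for the difference in means (OJ - VC)
t_stat <- (mean(oj) - mean(vc)) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_welch) * se

# Decision at the 0.05 level: reject H0 only if p < 0.05 (equivalently, 0 lies outside the CI)
reject_h0 <- p_value < 0.05

# Compare with R's built-in Welch test
test <- t.test(len ~ supp, data = data, var.equal = FALSE)
all.equal(unname(test$statistic), t_stat)
```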

Testing tooth growth by supplement and dosage

Next, we explore the effect of supplement type at each dose. We run a separate two-sample t-test for the difference in tooth length between supplements at each dose. Again, we use a 95% confidence level and unequal variances.

# Subset the data by dose
data_dose0.5 <- filter(data, dose == 0.5)
data_dose1   <- filter(data, dose == 1)
data_dose2   <- filter(data, dose == 2)

# Welch two-sample t-test of OJ vs VC within each dose
test0.5 <- t.test(len ~ supp, data = data_dose0.5, var.equal = FALSE, paired = FALSE, conf.level = 0.95)
test1   <- t.test(len ~ supp, data = data_dose1,   var.equal = FALSE, paired = FALSE, conf.level = 0.95)
test2   <- t.test(len ~ supp, data = data_dose2,   var.equal = FALSE, paired = FALSE, conf.level = 0.95)

result2 <- data.frame("t-statistic" = c(test0.5$statistic, test1$statistic, test2$statistic),
                      "df"          = c(test0.5$parameter, test1$parameter, test2$parameter),
                      "p-value"     = c(test0.5$p.value, test1$p.value, test2$p.value),
                      "lower CL"    = c(test0.5$conf.int[1], test1$conf.int[1], test2$conf.int[1]),
                      "upper CL"    = c(test0.5$conf.int[2], test1$conf.int[2], test2$conf.int[2]),
                      "OJ mean"     = c(test0.5$estimate[1], test1$estimate[1], test2$estimate[1]),
                      "VC mean"     = c(test0.5$estimate[2], test1$estimate[2], test2$estimate[2]),
                      row.names     = c("OJ vs VC at dose = 0.5", "OJ vs VC at dose = 1", "OJ vs VC at dose = 2"))

kable(round(x = result2, 3), align = 'c',
      caption = "Summary of two sample t-test for tooth growth by supplement and dosage")
Summary of two sample t-test for tooth growth by supplement and dosage
                        t.statistic      df  p.value  lower.CL  upper.CL  OJ.mean  VC.mean
OJ vs VC at dose = 0.5        3.170  14.969    0.006     1.719     8.781    13.23     7.98
OJ vs VC at dose = 1          4.033  15.358    0.001     2.802     9.058    22.70    16.77
OJ vs VC at dose = 2         -0.046  14.040    0.964    -3.798     3.638    26.06    26.14
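The three nearly identical t.test() calls above can be written more compactly by splitting the data on dose and applying the same Welch test to each piece. This is a base-R sketch, not the original code. The helper ttest_row and its column names are hypothetical. It should reproduce the values in the table above.

```r
# Split the data by dose and apply the same Welch test to each subset
data <- ToothGrowth

# Hypothetical helper: turn one htest object into a one-row data frame
ttest_row <- function(test) {
  data.frame(
    t_statistic = unname(test$statistic),
    df          = unname(test$parameter),
    p_value     = test$p.value,
    lower_CL    = test$conf.int[1],
    upper_CL    = test$conf.int[2],
    OJ_mean     = unname(test$estimate[1]),
    VC_mean     = unname(test$estimate[2])
  )
}

by_dose <- split(data, data$dose)   # list named "0.5", "1", "2"
tests   <- lapply(by_dose, function(d) t.test(len ~ supp, data = d, var.equal = FALSE))

# Stack the one-row summaries and label each row by dose
result_by_dose <- do.call(rbind, lapply(tests, ttest_row))
rownames(result_by_dose) <- paste("OJ vs VC at dose =", names(by_dose))
round(result_by_dose, 3)
```

Adding a fourth dose level, or changing the test options, then requires editing only one line instead of three.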


At dose = 0.5 and dose = 1, the p-values (0.006 and 0.001) are less than 0.05, and neither confidence interval contains 0. We therefore conclude that mean tooth length differs significantly between OJ and VC at these two doses, with OJ producing the larger mean. At dose = 2, however, the difference in means is not significant: the p-value (0.964) is greater than 0.05, and the confidence interval contains 0.
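Because the per-dose comparison runs three tests at the 0.05 level, it may be worth checking whether the conclusions hold after a multiple-comparison correction. This sketch is not part of the original analysis. It applies base R's p.adjust() with the Holm and Bonferroni methods. Bonferroni multiplies each p-value by 3 and caps the result at 1. Applied to the p-values in the table above, this gives roughly 0.02, 0.003, and 1, so the conclusions at doses 0.5 and 1 are unchanged.

```r
# Adjust the three per-dose p-values for multiple comparisons
data <- ToothGrowth
p_raw <- sapply(split(data, data$dose), function(d)
  t.test(len ~ supp, data = d, var.equal = FALSE)$p.value)

# One row per method, one column per dose
p_adj <- rbind(
  raw        = p_raw,
  holm       = p.adjust(p_raw, method = "holm"),
  bonferroni = p.adjust(p_raw, method = "bonferroni")
)
round(p_adj, 4)
```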

Final Conclusion

Based on the analysis performed in the previous section, we conclude that at the lower doses (0.5 and 1.0 mg/day), orange juice is associated with greater tooth growth than vitamin C. At the highest dose (2.0 mg/day), however, there is no significant difference between the two supplements, so the data do not indicate whether OJ or VC has the greater effect at that dose.