Overview:

In this project we analyze the ToothGrowth data in the R datasets package. In addition to exploratory data analysis, we use Student's t-test, through confidence intervals and hypothesis tests, to compare tooth growth by supplement (vitamin C and orange juice) and dose (0.5, 1.0, and 2.0 mg/day).

Analysis steps:

  1. Load the ToothGrowth data and perform some basic exploratory data analyses
  2. Provide a basic summary of the data.
  3. Use confidence intervals and hypothesis tests to compare tooth growth by supp and dose.
  4. State the conclusions and the assumptions needed for the conclusions.

1. Load the ToothGrowth data and perform some basic exploratory data analyses

Load ToothGrowth Data

data("ToothGrowth")
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Basic exploratory data analysis

len supp dose
4.2 VC 0.5
11.5 VC 0.5
7.3 VC 0.5
5.8 VC 0.5
6.4 VC 0.5
10.0 VC 0.5
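
The listing above shows the first six rows of the data. As a minimal sketch, assuming the listing came from head() (the call itself is not shown in this report), the following reproduces it and also counts the observations in each supplement-by-dose cell. The name cell_counts is introduced here for illustration only.

```r
# Sketch only: head() is assumed to be the call behind the listing above.
data("ToothGrowth")
head(ToothGrowth)

# Cross-tabulate supplement against dose to see how many
# observations fall in each of the six groups.
cell_counts <- with(ToothGrowth, table(supp, dose))
cell_counts
```

A balanced design, with the same count in every cell, is what allows the later dose-by-dose comparisons to use groups of equal size.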

A single plot highlighting basic features of the data

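# Compare tooth length between the two supplements
with(ToothGrowth, boxplot(len[supp == "VC"], len[supp == "OJ"], range=0, names=c('Vitamin C','Orange Juice'), col=c("green","blue"), ylab="Growth Length"))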

2. Provide a basic summary of the data.

Basic summary of the data:

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

The previous boxplot suggests that the overall effect of orange juice (OJ) on tooth growth is larger than that of vitamin C (VC). When we also consider the effect of dose, we obtain the following box plot.

# The rows of ToothGrowth are ordered by supplement (VC, then OJ) and, within each, by dose
VC5 <- ToothGrowth$len[1:10]
VC1 <- ToothGrowth$len[11:20]
VC2 <- ToothGrowth$len[21:30]
OJ5 <- ToothGrowth$len[31:40]
OJ1 <- ToothGrowth$len[41:50]
OJ2 <- ToothGrowth$len[51:60]
boxplot(VC5,VC1,VC2,OJ5,OJ1,OJ2,range=0,col = c("green","gold","royalblue","green","gold","royalblue"),
        names=c("VC0.5","VC1","VC2","OJ0.5","OJ1","OJ2"),ylab="Growth Length")
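
The index ranges above depend on the row order of the dataset. As a hedged alternative sketch, the groups can be selected by their supp and dose values instead, and the group means computed with aggregate(). The names vc_half, rows_match and group_means are illustrative and not part of the original analysis.

```r
data("ToothGrowth")

# Select one group by value rather than by row position.
vc_half <- subset(ToothGrowth, supp == "VC" & dose == 0.5)$len

# Check that the value-based selection matches the index-based slice.
rows_match <- identical(vc_half, ToothGrowth$len[1:10])
rows_match

# Mean tooth length for every supplement-by-dose group.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_means
```

Selecting by value keeps the grouping correct even if the rows were reordered.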

3. Use confidence intervals and hypothesis tests to compare tooth growth by supp and dose.

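Here we use the t confidence interval and the two-sample Student's t-test covered in class. At each dose, we compare the two supplements with an unpaired test that does not assume equal variances.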

data <- ToothGrowth
dose5 <- subset(data, dose == 0.5) 
dose1 <- subset(data, dose == 1) 
dose2 <- subset(data, dose == 2) 

test5 <- t.test(len ~ supp, data= dose5, var.equal = FALSE, paired=FALSE ,conf.level = .95)
test1 <- t.test(len ~ supp, data= dose1, var.equal = FALSE, paired=FALSE ,conf.level = .95)
test2 <- t.test(len ~ supp, data= dose2, var.equal = FALSE, paired=FALSE ,conf.level = .95)


result <- data.frame("t-statistic" = c(test5$statistic, test1$statistic, test2$statistic),
                     "df" = c(test5$parameter, test1$parameter, test2$parameter),
                     "p-value" = c(test5$p.value, test1$p.value, test2$p.value),
                     "lower CL" = c(test5$conf.int[1], test1$conf.int[1], test2$conf.int[1]),
                     "upper CL" = c(test5$conf.int[2], test1$conf.int[2], test2$conf.int[2]),
                     "OJ mean" = c(test5$estimate[1], test1$estimate[1], test2$estimate[1]),
                     "VC mean" = c(test5$estimate[2], test1$estimate[2], test2$estimate[2]),
                     row.names = c("OJ vs VC at dose = 0.5", "OJ vs VC at dose = 1", "OJ vs VC at dose = 2"))

knitr::kable(round (x = result, 3),align = 'c', 
      caption = "Summary of two sample t-test for tooth growth by supplement and dosage")

Summary of two sample t-test for tooth growth by supplement and dosage

                        t.statistic  df      p.value  lower.CL  upper.CL  OJ.mean  VC.mean
OJ vs VC at dose = 0.5  3.170        14.969  0.006    1.719     8.781     13.23    7.98
OJ vs VC at dose = 1    4.033        15.358  0.001    2.802     9.058     22.70    16.77
OJ vs VC at dose = 2    -0.046       14.040  0.964    -3.798    3.638     26.06    26.14

Since the p-values for OJ vs VC at dose = 0.5 and at dose = 1 are less than 0.05, and their 95% confidence intervals do not contain 0, we conclude that mean tooth growth differs significantly between the two supplements at these doses, with OJ giving the larger mean. However, for OJ vs VC at dose = 2 the difference in means is not significant, since the p-value is not less than 0.05 and the confidence interval contains zero.
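
To make the t-statistic, degrees of freedom, and confidence interval concrete, here is a sketch that recomputes them by hand for the dose = 0.5 comparison and checks the result against t.test(). It uses the standard unequal-variance (Welch) formulas, which is what var.equal = FALSE requests. The variable names oj, vc, se2_oj, se2_vc, t_stat, df_welch, p_value, ci and ref are introduced here for illustration.

```r
data("ToothGrowth")

# Tooth lengths for each supplement at dose = 0.5.
oj <- with(ToothGrowth, len[supp == "OJ" & dose == 0.5])
vc <- with(ToothGrowth, len[supp == "VC" & dose == 0.5])

# Squared standard error of each group mean.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t-statistic: difference in means over its standard error.
mean_diff <- mean(oj) - mean(vc)
se_diff <- sqrt(se2_oj + se2_vc)
t_stat <- mean_diff / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference.
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- mean_diff + c(-1, 1) * qt(0.975, df_welch) * se_diff

# Reference result from t.test() for comparison.
ref <- t.test(oj, vc, var.equal = FALSE, paired = FALSE, conf.level = 0.95)
all.equal(t_stat, unname(ref$statistic))
all.equal(ci, as.numeric(ref$conf.int))
```

The interval excludes zero exactly when the two-sided p-value is below 0.05, which is why the two criteria in the paragraph above always agree.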

4. State the conclusions and the assumptions needed for the conclusions.

Conclusions:

Based on the analysis in the previous section, we conclude that at the lower doses (0.5 and 1.0) orange juice is more effective for tooth growth than vitamin C. At the higher dose (2.0), however, the data do not show that either OJ or VC has the greater effect.

Assumptions:

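The t-tests assume that the observations within each group are iid Normal (Gaussian) and that the two supplement groups are independent (unpaired). We do not assume equal variances in the two populations, which is why the tests use var.equal = FALSE.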
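
These assumptions can be examined informally. The sketch below is not part of the original analysis. It applies the Shapiro-Wilk normality test within each supplement-by-dose group and tabulates the group variances. The names shapiro_p and group_var are illustrative. With only 10 observations per group, the normality test has little power, so a large p-value is weak evidence at best.

```r
data("ToothGrowth")

# Shapiro-Wilk p-value for each supplement-by-dose group.
shapiro_p <- with(ToothGrowth,
                  tapply(len, list(supp, dose),
                         function(x) shapiro.test(x)$p.value))
round(shapiro_p, 3)

# Sample variance of each group, to judge whether equal variances
# would have been a reasonable assumption.
group_var <- with(ToothGrowth, tapply(len, list(supp, dose), var))
round(group_var, 2)
```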