======================================

Title: “Statistical Inference - Course Project (Part 2)”

Date: “December 23, 2015”

======================================

Synopsis

This is the project for the statistical inference class (Part 2). In this part, the effect of Vitamin C on tooth growth in guinea pigs will be discussed.

Problem Description

The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid). (Description from the ToothGrowth Documentation)

Question 1 Load the ToothGrowth data and perform some basic exploratory data analyses

Loading Libraries

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.3
library(graphics)

Load the dataset ToothGrowth

data("ToothGrowth")
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Convert the dose variable from numeric to factor

ToothGrowth$dose <- as.factor(ToothGrowth$dose)

Exploratory Data Analyses: Below are scatterplots of tooth length against dose, conditioned on supplement type.

coplot(len ~ dose | supp, data = ToothGrowth, panel = panel.smooth,
       ylab = "Length", las = 2,
       xlab = c("Dose (mg)", "
       Tooth Growth given by Orange Juice (Left) or Ascorbic Acid (Right)"))

Below are boxplots of tooth length by supplement type and dose. The scatterplots and boxplots suggest that Orange Juice (OJ) is associated with greater tooth growth than Ascorbic Acid (VC). This is a visual impression only; it is tested formally in Question 3.

boxplot(len ~ dose * supp, data = ToothGrowth, xlab = "Dose in milligrams & Supplement Type", ylab = "Tooth Length", col = "green")

Question 2 Provide a basic summary of the data

Summary of Tooth Growth

summary(ToothGrowth)
##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

Summary of supplement type Orange Juice (OJ)

summary(ToothGrowth[ToothGrowth$supp=="OJ",])
##       len        supp     dose   
##  Min.   : 8.20   OJ:30   0.5:10  
##  1st Qu.:15.53   VC: 0   1  :10  
##  Median :22.70           2  :10  
##  Mean   :20.66                   
##  3rd Qu.:25.73                   
##  Max.   :30.90

Summary of supplement type Ascorbic Acid (VC)

summary(ToothGrowth[ToothGrowth$supp=="VC",])
##       len        supp     dose   
##  Min.   : 4.20   OJ: 0   0.5:10  
##  1st Qu.:11.20   VC:30   1  :10  
##  Median :16.50           2  :10  
##  Mean   :16.96                   
##  3rd Qu.:23.10                   
##  Max.   :33.90
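
The summaries above split the data by supplement type only. As a rough sketch, the mean and standard deviation of each of the six supplement-by-dose groups can be tabulated as well. The names `tg` and `group_summary` are illustrative and do not appear in the original analysis; `tg` is a fresh copy of the dataset, so the factor conversion made earlier is not affected.

```r
# Work on a fresh copy so the session's ToothGrowth object is left untouched
tg <- datasets::ToothGrowth

# Mean and standard deviation of tooth length per supplement/dose group
group_mean <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
group_sd   <- aggregate(len ~ supp + dose, data = tg, FUN = sd)

# Combine both statistics into one table with columns len_mean and len_sd
group_summary <- merge(group_mean, group_sd,
                       by = c("supp", "dose"),
                       suffixes = c("_mean", "_sd"))
print(group_summary)
```

Because every group has 10 observations, averaging the three OJ group means reproduces the overall OJ mean reported by `summary()` above.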

Question 3 Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

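The following Welch two-sample t-test compares mean tooth length between the two supplement types. The null hypothesis of equal means is not rejected at the 5% level if the p-value exceeds 0.05 or, equivalently, if the 95% confidence interval for the difference in means contains zero.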

t.test(len ~ supp, data = ToothGrowth)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

The p-value is 0.06 (> 0.05) and the 95% confidence interval contains zero. There is therefore not enough evidence to reject the null hypothesis that supplement type has no effect on mean tooth length.
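
To make the numbers in the t.test output less of a black box, the Welch statistic, its degrees of freedom (Welch-Satterthwaite approximation), the two-sided p-value, and the 95% confidence interval can be recomputed by hand. This is a minimal sketch; all variable names (`tg`, `oj`, `vc`, `t_stat`, and so on) are illustrative and not part of the original analysis.

```r
# Fresh copy of the data; does not modify the session's ToothGrowth object
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic for the difference in means (OJ minus VC)
t_stat <- (mean(oj) - mean(vc)) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```

The values should agree with the t.test output above (t = 1.9153, df = 55.309, p-value = 0.06063).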

Perform t-tests to check for differences in mean tooth length between dose levels, assuming equal variances (var.equal = TRUE)

Dose level: (1.0,2.0)

t.test(ToothGrowth$len[ToothGrowth$dose==2], ToothGrowth$len[ToothGrowth$dose==1], paired = FALSE, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 2] and ToothGrowth$len[ToothGrowth$dose == 1]
## t = 4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.735613 8.994387
## sample estimates:
## mean of x mean of y 
##    26.100    19.735

Dose level: (0.5,1.0)

t.test(ToothGrowth$len[ToothGrowth$dose==1], ToothGrowth$len[ToothGrowth$dose==0.5], paired = FALSE, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 1] and ToothGrowth$len[ToothGrowth$dose == 0.5]
## t = 6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.276252 11.983748
## sample estimates:
## mean of x mean of y 
##    19.735    10.605

For both dose-level comparisons, the p-values are far below 0.05 and the 95% confidence intervals do not contain zero. We therefore reject the null hypothesis of equal means: mean tooth length increases as the dose level increases.
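
The two tests above cover the adjacent dose pairs. As a hedged sketch, all three pairs (including 0.5 versus 2) can be run in one loop with the same equal-variance setting. The Bonferroni adjustment at the end is an addition of this sketch, not part of the original analysis, as are the names `tg`, `dose_pairs`, `p_raw`, and `p_adj`.

```r
# Fresh copy of the data with dose still numeric
tg <- datasets::ToothGrowth

# All unordered pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(tg$dose)), 2)

# Pooled-variance two-sample t-test for each pair (higher dose as x)
p_raw <- apply(dose_pairs, 2, function(p) {
  t.test(tg$len[tg$dose == p[2]],
         tg$len[tg$dose == p[1]],
         paired = FALSE, var.equal = TRUE)$p.value
})
names(p_raw) <- apply(dose_pairs, 2, paste, collapse = " vs ")

# Assumption of this sketch: adjust for three comparisons with Bonferroni
p_adj <- p.adjust(p_raw, method = "bonferroni")

print(rbind(raw = p_raw, bonferroni = p_adj))
```

The unadjusted p-values for "0.5 vs 1" and "1 vs 2" should match the two tests reported above.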

Question 4 State your conclusions and the assumptions needed for your conclusions.

Conclusions

  1. With all dose levels pooled, there is not enough evidence at the 5% level to conclude that supplement type affects tooth length.
  2. Increasing the dose level increases tooth growth.
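
The first conclusion rests on a comparison that pools all three dose levels. One way to check whether it holds at each dose separately is to repeat the Welch test within each dose level. The following is only a sketch of that follow-up, not part of the original analysis; `tg` and `p_by_dose` are illustrative names.

```r
# Fresh copy of the data with dose still numeric
tg <- datasets::ToothGrowth

# Welch two-sample t-test of len by supp, run separately for each dose level
p_by_dose <- sapply(split(tg, tg$dose), function(d) {
  t.test(len ~ supp, data = d)$p.value
})

# One p-value per dose level, named "0.5", "1", "2"
print(p_by_dose)
```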

Assumptions

  1. For the dose-level t-tests, a common variance across groups is assumed (var.equal = TRUE). The supplement comparison uses the Welch test, which does not assume equal variances.
  2. The guinea pigs are assumed to have been randomly assigned to a combination of dose and supplement type, so the groups are treated as independent samples (paired = FALSE).
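
The equal-variance assumption in item 1 can be probed informally by looking at the per-dose standard deviations and by rerunning the dose comparisons without it. This is a minimal sketch; the helper `compare_dose` and the other names are hypothetical and not part of the original analysis.

```r
# Fresh copy of the data with dose still numeric
tg <- datasets::ToothGrowth

# Standard deviation of tooth length at each dose level
sd_by_dose <- tapply(tg$len, tg$dose, sd)

# Hypothetical helper: p-values with and without the equal-variance assumption
compare_dose <- function(high, low) {
  x <- tg$len[tg$dose == high]
  y <- tg$len[tg$dose == low]
  c(pooled = t.test(x, y, var.equal = TRUE)$p.value,
    welch  = t.test(x, y, var.equal = FALSE)$p.value)
}

p_2_vs_1   <- compare_dose(2, 1)
p_1_vs_0.5 <- compare_dose(1, 0.5)

print(sd_by_dose)
print(rbind("2 vs 1" = p_2_vs_1, "1 vs 0.5" = p_1_vs_0.5))
```

If the pooled and Welch p-values lead to the same decision, the conclusions do not hinge on the common-variance assumption.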