Statistical Inference - Course Project (part II)

In this report, we’re going to analyze the ToothGrowth data in the R datasets package.

Load the ToothGrowth data and perform some basic exploratory data analyses

   library(datasets)
   data <- ToothGrowth
   head(data)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

ToothGrowth has 60 observations of 3 variables. The breakdown of supplements is as follows:

 summary(ToothGrowth$supp)

## OJ VC 
## 30 30

There are 3 variables in this data set:

 names(ToothGrowth)

## [1] "len"  "supp" "dose"

Distinct values of the variables supp and dose (supp is a factor; dose is numeric):

  unique(ToothGrowth$supp)

## [1] VC OJ
## Levels: OJ VC

  unique(ToothGrowth$dose)

## [1] 0.5 1.0 2.0

The 60 observations cover two kinds of supplement, with doses ranging from 0.5 mg to 2 mg.
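
As a quick check on how the 60 observations are spread over the two variables, the counts can be cross-tabulated. This is a minimal sketch using base R's table(); the name `design` is introduced here for illustration only and is not part of the original analysis.

```r
library(datasets)

# Cross-tabulate supplement type against dose level
design <- table(Supplement = ToothGrowth$supp, Dose = ToothGrowth$dose)
print(design)

# Total number of observations across all cells
sum(design)
```

If every cell shows the same count, the design is balanced across supplement and dose.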

Box plot of supp vs. len

plot(ToothGrowth$supp, ToothGrowth$len, main="supplement vs. length of tooth")

From the above box plot, we may notice:

‘OJ’ has a higher median than ‘VC’.
Lengths associated with ‘OJ’ have smaller variability compared to those associated with ‘VC’.
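
The two observations read off the box plot can also be checked numerically. The sketch below is one possible way to do so, using the median for the center and both the standard deviation and the interquartile range for the spread; the name `spread` is an illustrative choice.

```r
library(datasets)

# Median, standard deviation and interquartile range of length per supplement
spread <- sapply(split(ToothGrowth$len, ToothGrowth$supp), function(x) {
  c(median = median(x), sd = sd(x), IQR = IQR(x))
})
print(round(spread, 2))
```

The result is a small matrix with one column per supplement, so the ‘OJ’ and ‘VC’ values can be compared row by row.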

Density plot for different doses

library(ggplot2)
g <- ggplot(data,aes(x=len))
g + geom_density(aes(x=len,colour=supp)) + 
        facet_grid(. ~ dose,scales="free") +
        labs(title="Density plot for different doses")

Density plot for different supplements

g + geom_density(aes(x=len,colour=as.factor(dose))) + 
        facet_grid(. ~ supp,scales="free") +
        labs(title="Density plot for different supplements")

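# Stacked bar chart of tooth length by dose, one panel per supplement.
# Note: with stat="identity", each bar stacks (sums) the individual lengths in its group.
ggplot(data=ToothGrowth, aes(x=as.factor(dose), y=len, fill=supp)) +
    geom_bar(stat="identity") +
    facet_grid(. ~ supp) +
    xlab("Dose in milligrams") +
    ylab("Tooth length") +
    guides(fill=guide_legend(title="Supplement type"))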

Based on the above figure, tooth length tends to increase with the dose level for both delivery methods.

Analyze median tooth length by dose and supplement:

  aggregate(list(Median.Length = ToothGrowth$len), by = list(Dose = ToothGrowth$dose, Supplement = ToothGrowth$supp), FUN = median)

##   Dose Supplement Median.Length
## 1  0.5         OJ         12.25
## 2  1.0         OJ         23.45
## 3  2.0         OJ         25.95
## 4  0.5         VC          7.15
## 5  1.0         VC         16.50
## 6  2.0         VC         25.95

Generally, increases in dose seem to correlate with increases in tooth length. This is true of both types of supplements. The median tooth length at lower dosages (0.5 mg and 1.0 mg) was lower for VC than for OJ; however, the OJ medians did not increase as much overall across the dosage increases, resulting in the same median tooth length of 25.95 at the 2.0 mg dosage for both delivery methods.
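
To compare how much each supplement's median changes across the dose range, the medians can be arranged as a supplement-by-dose table and differenced. This is a minimal sketch; the names `med` and `gain` are illustrative and do not appear in the original analysis.

```r
library(datasets)

# Median length for every supplement (rows) and dose (columns)
med <- with(ToothGrowth, tapply(len, list(supp, dose), median))
print(med)

# Change in median length from the lowest to the highest dose
gain <- med[, "2"] - med[, "0.5"]
print(gain)
```

The supplement with the larger value in `gain` is the one whose median rose more between 0.5 mg and 2.0 mg.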

Provide a basic summary of the data.

summary(data)

##       len       supp         dose     
##  Min.   : 4.2   OJ:30   Min.   :0.50  
##  1st Qu.:13.1   VC:30   1st Qu.:0.50  
##  Median :19.2           Median :1.00  
##  Mean   :18.8           Mean   :1.17  
##  3rd Qu.:25.3           3rd Qu.:2.00  
##  Max.   :33.9           Max.   :2.00

Use confidence intervals and hypothesis tests to compare tooth growth by supp and dose

Overview of dataset used

This data set contains 3 columns:

- len: tooth length
- supp: supplement type used (VC: ascorbic acid, OJ: orange juice)
- dose: vitamin C dose in milligrams per day

The data set includes 60 observations, one for each of 60 guinea pigs. Each guinea pig received one of the three dose levels of Vitamin C by one of the two delivery methods.

Hypothesis Testing

Test #1:

h0: The difference in mean tooth length when given a Vitamin C dose of 2.0 mg vs 0.5 mg is 0.

h1: The difference in mean tooth length when given a Vitamin C dose of 2.0 mg vs 0.5 mg is different than 0.

   TG1 <- subset(ToothGrowth, ToothGrowth$dose %in% c(0.5, 2.0))
   t.test(len ~ dose, var.equal = FALSE, data = TG1)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.8, df = 36.88, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.16 -12.83
## sample estimates:
## mean in group 0.5   mean in group 2 
##             10.61             26.10

Since the p-value (4.398e-14) is below 0.05 and the 95% confidence interval (-18.16, -12.83) does not contain 0, the null hypothesis h0 can be rejected in favor of the alternative hypothesis h1.
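
To make clear what the Welch test computes, the statistic, degrees of freedom, p-value and confidence interval for Test #1 can be reproduced by hand. This is a sketch of the standard Welch formulas (with the Welch-Satterthwaite approximation for the degrees of freedom); all variable names are illustrative.

```r
library(datasets)

# Tooth lengths at the lowest and highest dose
x <- ToothGrowth$len[ToothGrowth$dose == 0.5]
y <- ToothGrowth$len[ToothGrowth$dose == 2.0]

# Squared standard error of each group mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_w <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_val <- 2 * pt(-abs(t_stat), df_w)
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_w) * sqrt(vx + vy)

print(c(t = t_stat, df = df_w, p = p_val))
print(ci)
```

These values should agree with the t.test() output above up to rounding, since no equal-variance assumption is made in either case.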

Test #2:

h0: The difference in mean tooth length when given a Vitamin C dose of 1.0 mg vs 0.5 mg is 0.

h1: The difference in mean tooth length when given a Vitamin C dose of 1.0 mg vs 0.5 mg is different than 0.

   TG2 <- subset(ToothGrowth, ToothGrowth$dose %in% c(0.5, 1.0))
   t.test(len ~ dose, var.equal = FALSE, data = TG2)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.477, df = 37.99, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.984  -6.276
## sample estimates:
## mean in group 0.5   mean in group 1 
##             10.61             19.73

Since the p-value (1.268e-07) is below 0.05 and the 95% confidence interval (-11.984, -6.276) does not contain 0, the null hypothesis h0 can be rejected in favor of the alternative hypothesis h1.

Test #3:

h0: The difference in mean tooth length when given a Vitamin C dose of 2.0 mg vs 1.0 mg is 0.

h1: The difference in mean tooth length when given a Vitamin C dose of 2.0 mg vs 1.0 mg is different than 0.

   TG3 <- subset(ToothGrowth, ToothGrowth$dose %in% c(2.0, 1.0))
   t.test(len ~ dose, var.equal = FALSE, data = TG3)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.901, df = 37.1, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996 -3.734
## sample estimates:
## mean in group 1 mean in group 2 
##           19.73           26.10

Since the p-value (1.906e-05) is below 0.05 and the 95% confidence interval (-8.996, -3.734) does not contain 0, the null hypothesis h0 can be rejected in favor of the alternative hypothesis h1.
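
The three dose comparisons above follow the same pattern, so they can also be run in one pass and collected in a table. The sketch below is one way to do this with combn() and lapply(); the names `dose_pairs` and `results` are illustrative.

```r
library(datasets)

# All pairs of distinct dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

# Run a Welch t-test for each pair and keep the key numbers
results <- do.call(rbind, lapply(dose_pairs, function(p) {
  res <- t.test(len ~ dose, var.equal = FALSE,
                data = ToothGrowth[ToothGrowth$dose %in% p, ])
  data.frame(low = p[1], high = p[2],
             ci_lower = res$conf.int[1], ci_upper = res$conf.int[2],
             p_value = res$p.value)
}))
print(results)
```

Each row holds the 95% confidence interval for the lower-dose mean minus the higher-dose mean, together with the p-value of that comparison.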

Test #4:

h0: The difference in mean tooth length when given Vitamin C via Orange Juice (OJ) vs. Ascorbic Acid (VC) is 0.

h1: The difference in mean tooth length when given a Vitamin C dose via Orange Juice (OJ) vs. Ascorbic Acid (VC) is different than 0.

   t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.915, df = 55.31, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.171  7.571
## sample estimates:
## mean in group OJ mean in group VC 
##            20.66            16.96

Since the p-value (0.06063) is above 0.05 and the 95% confidence interval (-0.171, 7.571) contains 0, the null hypothesis h0 cannot be rejected. However, failing to reject h0 does not show that h0 is true.
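
The decision for Test #4 can also be read directly from the object returned by t.test() instead of from the printed output. This is a minimal sketch; `alpha`, `reject_h0` and `ci_excludes_zero` are illustrative names, and 0.05 is the significance level used throughout this report.

```r
library(datasets)

# Welch t-test of length by supplement, stored rather than printed
res <- t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)

alpha <- 0.05       # significance level used in this report
ci <- res$conf.int  # 95% interval for mean(OJ) - mean(VC)

# Decision based on the p-value
reject_h0 <- res$p.value < alpha

# Equivalent decision based on whether the interval excludes 0
ci_excludes_zero <- ci[1] > 0 || ci[2] < 0

print(c(reject_h0 = reject_h0, ci_excludes_zero = ci_excludes_zero))
```

For a two-sided test at level 0.05, the p-value rule and the 95% confidence interval rule always lead to the same decision, which is why both flags match.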

Conclusions

Based on both the exploratory data analysis and the first three t-tests above, increasing the dose of Vitamin C is associated with distinct increases in tooth length. We do not have enough evidence to determine (at the 95% confidence level) a difference in tooth length between the two delivery methods (orange juice and ascorbic acid); however, we also cannot conclude that there is no difference.

Assumptions

All of the t-tests performed:

did not assume the same population variance between the two groups being compared, allowing for a more robust comparison.
assumed that the data is normally distributed.
assumed that the data distributions are not skewed.
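
The normality assumption listed above is not checked in this report. One possible check, shown here only as a sketch, is a Shapiro-Wilk test of tooth length within each dose group using shapiro.test() from the stats package; the choice of this test and the name `normality_p` are assumptions of this example.

```r
library(datasets)

# Shapiro-Wilk normality test p-value for tooth length within each dose group
normality_p <- tapply(ToothGrowth$len, ToothGrowth$dose,
                      function(x) shapiro.test(x)$p.value)
print(round(normality_p, 3))
```

A small p-value would suggest a departure from normality in that group; a large p-value does not prove that the data are normal.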

Noha Elprince

October 25, 2014
