Statistical Inference Course Project

In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.

Now in the second portion of the class, we’re going to analyze the ToothGrowth data in the R datasets package.

library(datasets)

1 Load the ToothGrowth data and perform some basic exploratory data analyses

data(ToothGrowth)
ToothGrowth$dose<-as.factor(ToothGrowth$dose)
str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...

?ToothGrowth

2 Provide a basic summary of the data.

#### Summary of ToothGrowth
summary(ToothGrowth)

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

#### Verifying Mean of Len variable by Supply
MeanSupp <- split(ToothGrowth$len, ToothGrowth$dose)
sapply(MeanSupp,mean)

##    0.5      1      2 
## 10.605 19.735 26.100

Displaying Graphs of ToothGrowth as a function of Supplement and/or Dose

library(ggplot2)
library(ggthemes)

Bar Graph

ggplot(data=ToothGrowth, aes(x=dose, y=len, fill=supp)) + geom_bar(stat="identity",) + ggtitle("Tooth Growth as a Factor of Dosage and Vitamin Source") +theme_economist() + scale_colour_economist() + facet_grid(. ~ supp) + xlab("Dose in miligrams") + ylab("Tooth length") + guides(fill=guide_legend(title="Supplement type"))

Scatter Plot

library(lattice)
xyplot(len~dose|supp, ToothGrowth,
       main="Scatterplot by Supplement and Dose",
       ylab="Length", xlab="Dose")

Box Plot

boxplot(len ~ supp * dose,ToothGrowth, col="green", ylab="Length", xlab="Supplement type & Dose",main="Boxplot by Supplement  and Dose")

3 Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

len<-ToothGrowth$len
supp<-ToothGrowth$supp
dose<-ToothGrowth$dose
sapply(MeanSupp, var)

##      0.5        1        2 
## 20.24787 19.49608 14.24421

Note: The variances of the two groups appears to be significant, so in the T-test we will not assume equality of variances, and our Hypothesis is: #### Null Hypothesis: OJ and VC show no significant Tooth growth, thus we’ll first test where both pairs are assumed to be FALSE

t.test(len[supp=="OJ"], len[supp=="VC"], paired = FALSE, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  len[supp == "OJ"] and len[supp == "VC"]
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

Note: The OJ and VC p-value was 0.06, which at a significance level is greater than 5%. In statistics, the p-value is a function of the observed sample results (a statistic) that is used for testing a statistical hypothesis. More specifically, the p-value is defined as the probability of obtaining a result equal to or “more extreme” than what was actually observed, assuming that the hypothesis under consideration is true.

4 State your conclusions and the assumptions needed for your conclusions.

Now we should test variables assumed to be equal, the null hypotheses of equal means between the two groups, versus the alternative hypothesis that the two means are different. This means that a two sided test will be performed, and so the confidence interval is absolutely equivalent to the t-test.

t.test(len[dose==2], len[dose==1], paired = FALSE, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  len[dose == 2] and len[dose == 1]
## t = 4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.735613 8.994387
## sample estimates:
## mean of x mean of y 
##    26.100    19.735

The p-value of this test is 0; which implies that the NULL Hypothesis is FALSE. The 95% confidence interval is (-0.1710156 7.5710156) proves the value of zero is not included.

Let’s check the multiple comparisons by adjusting for FWER and FDR:

pValues<-c(t.test(len[supp=="OJ"], len[supp=="VC"], paired = FALSE, var.equal = FALSE)$p.value,
           t.test(len[dose==2], len[dose==1], paired = FALSE, var.equal = TRUE)$p.value,
           t.test(len[dose==1], len[dose==0.5], paired = FALSE, var.equal = TRUE)$p.value,
           t.test(len[dose==2], len[dose==0.5], paired = FALSE, var.equal = TRUE)$p.value)
# Controls FWER
sum(p.adjust(pValues, method = "bonferroni") < 0.05)

## [1] 3

# Controls FDR
sum(p.adjust(pValues, method = "BH") < 0.05)

## [1] 3

Only the first p-value is considered large and in the plots below, it is the p-value of the first test we made that shows up in the top right corner of the plots, and the other three falling one on another at the bottom left corner.

par(mfrow = c(1, 2))
plot(pValues, p.adjust(pValues, method = "bonferroni"), pch = 19)
plot(pValues, p.adjust(pValues, method = "BH"), pch = 19)

Conclusions:

This study of tooth growth effected by supplement type and dose reflects that there is a relationship between tooth length and dosage amount, as the dosage increases it creates a positive effect on tooth length, but does NOT show a significant variance in supplement type (as shown in our graphs and our T test).

However, studies of the potential of vitamin C supplementation to provide health benefits have provided conflicting results (source: wikipedia.org). Large doses of vitamin C can result in toxic reactions, but studies have shown that under certain conditions there maybe some benefits of high-dose vitamin therapy.

Major Assumption:

This study does not reflect, minimal nor maximal dosage impacts of vitamin C (supplements or natural source) on tooth growth (show in the above summary chart only 3 variables of dosage types, plus there was no control group)

Note: Mammals that are unable to synthesize vitamin C include humans and other primates including guinea pigs.

Statistical Inference Course Project - Part 2

Al Shain

October 22, 2015