In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
Now in the second portion of the class, we’re going to analyze the ToothGrowth data in the R datasets package.
library(datasets)
data(ToothGrowth)
ToothGrowth$dose<-as.factor(ToothGrowth$dose)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
?ToothGrowth
#### Summary of ToothGrowth
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
#### Verifying Mean of Len variable by Supply
MeanSupp <- split(ToothGrowth$len, ToothGrowth$dose)
sapply(MeanSupp,mean)
## 0.5 1 2
## 10.605 19.735 26.100
library(ggplot2)
library(ggthemes)
ggplot(data=ToothGrowth, aes(x=dose, y=len, fill=supp)) + geom_bar(stat="identity",) + ggtitle("Tooth Growth as a Factor of Dosage and Vitamin Source") +theme_economist() + scale_colour_economist() + facet_grid(. ~ supp) + xlab("Dose in miligrams") + ylab("Tooth length") + guides(fill=guide_legend(title="Supplement type"))
library(lattice)
xyplot(len~dose|supp, ToothGrowth,
main="Scatterplot by Supplement and Dose",
ylab="Length", xlab="Dose")
boxplot(len ~ supp * dose,ToothGrowth, col="green", ylab="Length", xlab="Supplement type & Dose",main="Boxplot by Supplement and Dose")
len<-ToothGrowth$len
supp<-ToothGrowth$supp
dose<-ToothGrowth$dose
sapply(MeanSupp, var)
## 0.5 1 2
## 20.24787 19.49608 14.24421
t.test(len[supp=="OJ"], len[supp=="VC"], paired = FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: len[supp == "OJ"] and len[supp == "VC"]
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
Now we should test variables assumed to be equal, the null hypotheses of equal means between the two groups, versus the alternative hypothesis that the two means are different. This means that a two sided test will be performed, and so the confidence interval is absolutely equivalent to the t-test.
t.test(len[dose==2], len[dose==1], paired = FALSE, var.equal = TRUE)
##
## Two Sample t-test
##
## data: len[dose == 2] and len[dose == 1]
## t = 4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.735613 8.994387
## sample estimates:
## mean of x mean of y
## 26.100 19.735
The p-value of this test is 0; which implies that the NULL Hypothesis is FALSE. The 95% confidence interval is (-0.1710156 7.5710156) proves the value of zero is not included.
Let’s check the multiple comparisons by adjusting for FWER and FDR:
pValues<-c(t.test(len[supp=="OJ"], len[supp=="VC"], paired = FALSE, var.equal = FALSE)$p.value,
t.test(len[dose==2], len[dose==1], paired = FALSE, var.equal = TRUE)$p.value,
t.test(len[dose==1], len[dose==0.5], paired = FALSE, var.equal = TRUE)$p.value,
t.test(len[dose==2], len[dose==0.5], paired = FALSE, var.equal = TRUE)$p.value)
# Controls FWER
sum(p.adjust(pValues, method = "bonferroni") < 0.05)
## [1] 3
# Controls FDR
sum(p.adjust(pValues, method = "BH") < 0.05)
## [1] 3
Only the first p-value is considered large and in the plots below, it is the p-value of the first test we made that shows up in the top right corner of the plots, and the other three falling one on another at the bottom left corner.
par(mfrow = c(1, 2))
plot(pValues, p.adjust(pValues, method = "bonferroni"), pch = 19)
plot(pValues, p.adjust(pValues, method = "BH"), pch = 19)