In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The second portion of the project, we’re going to analyze the ToothGrowth data in the R datasets package.
library(ggplot2)
library(dplyr)
The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. - Has mean = 1/lambda & standard deviation = 1/lambda. - Set lambda = 0.2 - Investigate the distribution of averages of 40 exponentials. - Note that you will need to do a 1000 simulations.
lambda = 0.2
# theoretical Mean & standard deviation
actual_mean = 1/0.2
actual_sd = 1/0.2
# simulate 1000
sample_means = NULL
for (i in 1 : 1000) sample_means = c(sample_means, mean(rexp(40, lambda)))
ggplot() +
aes(sample_means) +
geom_histogram(fill="#69b3a2", color="#e9ecef", alpha=0.9) +
ggtitle("Distribution of 1000 simulations of averages of 40 exponentials") +
geom_vline(xintercept = mean(sample_means), color="blue")
Actual Mean = 5 and the simulated mean = 4.9871958
Actual sd = 5 and the simulated mean = 0.7817004
data(ToothGrowth)
head(ToothGrowth)
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# plot len & supp
ggplot(data=ToothGrowth) +
aes(x=supp, y=len)+
geom_boxplot(aes(fill=supp)) +
facet_grid(cols = vars(dose)) +
ggtitle("Length-Supplement Relation split by dose")
From Figure above, mean seems to be equal for both supp only for dose=2.
# create t test
# perform t test between supp types where dose = 2
t.test(ToothGrowth$len[ToothGrowth$supp=="OJ" & ToothGrowth$dose==2], ToothGrowth$len[ToothGrowth$supp=="VC" & ToothGrowth$dose==2], paired = TRUE)
##
## Paired t-test
##
## data: ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == and ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 2] and 2]
## t = -0.042592, df = 9, p-value = 0.967
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.328976 4.168976
## sample estimates:
## mean of the differences
## -0.08
# perform t test between supp types where dose != 2
t.test(ToothGrowth$len[ToothGrowth$supp=="OJ" & ToothGrowth$dose!=2], ToothGrowth$len[ToothGrowth$supp=="VC" & ToothGrowth$dose!=2], paired = TRUE)
##
## Paired t-test
##
## data: ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose != and ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose != 2] and 2]
## t = 4.6042, df = 19, p-value = 0.0001936
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.048852 8.131148
## sample estimates:
## mean of the differences
## 5.59
If we consider Null Hypothesis (H0) to be; mean is almost equal per supp per dose
1- We Fail to reject H0 where dose = 2, as t-value is very small and is equal to -0.042592
2- We reject H0 where dose does not equal to 2, as t-value is large enough and is equal to 4.6042202