This report investigates the exponential distribution in R and compares it with the Central Limit Theorem. The exponential distribution is simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda; lambda is set to 0.2 for all simulations. Each average is taken over 40 exponentials, and the investigation uses one thousand simulations.
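The theoretical values used for the comparison follow from standard properties of the sample mean. As a short worked derivation (the symbols \bar{X}_n, \mu and \sigma are notation introduced here for illustration and do not appear in the original code):

```latex
\begin{aligned}
\mu &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad
\sigma = \frac{1}{\lambda} = 5, \\
\operatorname{E}[\bar{X}_n] &= \mu = 5, \\
\operatorname{Var}(\bar{X}_n) &= \frac{\sigma^2}{n} = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625, \\
\operatorname{sd}(\bar{X}_n) &= \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.79 .
\end{aligned}
```

Under the Central Limit Theorem, the distribution of \bar{X}_n for n = 40 should therefore be approximately normal with mean 5 and variance 0.625.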
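library(data.table) # data.table()
library(ggplot2) # ggplot() and the geoms used below
lam <- 0.2 # rate parameter lambda
n <- 40 # number of exponentials per average
nsim <- 1000 # number of simulated averages
set.seed(1)
# Each element of x is the mean of n exponential draws
means <- data.table(x = sapply(seq_len(nsim), function(i) mean(rexp(n, lam))))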
ggplot(data = means, aes(x = x)) +
  geom_histogram(bins = 30)
Meansmpl <- mean(means$x)
Mean <- 1/lam
Meansmpl
## [1] 4.990025
Mean
## [1] 5
Varsmpl <- var(means$x)
Var <- (1/lam)^2/n # theoretical variance of the sample mean, sigma^2 / n (not a standard deviation)
Varsmpl
## [1] 0.6111165
Var
## [1] 0.625
ggplot(data = means, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  geom_density(color = "red", linewidth = 1) + # kernel density estimate of the sample means, not a fitted normal curve
  labs(x = "Mean")
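The red curve in the plot above is a kernel density estimate of the simulated means. To compare directly with the Central Limit Theorem, the theoretical normal density can be drawn on the same axes. The following is a minimal sketch, assuming ggplot2 version 3.3.0 or later is installed; the names sample_means, clt_mean and clt_sd are introduced here for illustration and are not part of the original analysis.

```r
library(ggplot2)

lam <- 0.2    # rate parameter, as in the text
n <- 40       # exponentials per average
nsim <- 1000  # number of simulated averages

set.seed(1)
sample_means <- data.frame(
  x = sapply(seq_len(nsim), function(i) mean(rexp(n, lam)))
)

# Parameters of the normal approximation suggested by the Central Limit Theorem
clt_mean <- 1 / lam
clt_sd <- (1 / lam) / sqrt(n)

p <- ggplot(sample_means, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, alpha = 0.5) +
  # red: kernel density estimate of the simulated means
  geom_density(color = "red") +
  # blue: theoretical normal density N(clt_mean, clt_sd^2)
  stat_function(fun = dnorm,
                args = list(mean = clt_mean, sd = clt_sd),
                color = "blue") +
  labs(x = "Mean", y = "Density")
print(p)
```

If the two curves lie close together, the simulated averages are behaving as the theorem predicts for n = 40.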
The first step is to load the ToothGrowth data and perform some basic exploratory data analysis. The second step is to provide a basic summary of the data. The next step is to compare tooth growth by supp and dose.
tg <- ToothGrowth
str(tg)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
unique(tg$dose)
## [1] 0.5 1.0 2.0
unique(tg$supp)
## [1] VC OJ
## Levels: OJ VC
unique(tg$len)
## [1] 4.2 11.5 7.3 5.8 6.4 10.0 11.2 5.2 7.0 16.5 15.2 17.3 22.5 13.6 14.5
## [16] 18.8 15.5 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5 23.3 29.5 17.6 9.7 8.2
## [31] 9.4 19.7 20.0 25.2 25.8 21.2 27.3 22.4 24.5 24.8 30.9 29.4 23.0
s <- summary(tg)
s
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
ggplot(tg, aes(x = factor(dose), y = len, fill = factor(dose))) +
  geom_boxplot(notch = FALSE) +
  facet_grid(. ~ supp) +
  scale_x_discrete("Dose") +
  scale_y_continuous("Tooth growth") +
  scale_fill_discrete(name = "Dose (mg)") +
  ggtitle("Comparison of tooth growth by supp and dose") +
  geom_jitter(width = 0.1, alpha = 0.2)
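The boxplots can be complemented with a numeric summary per group. The following is a minimal sketch in base R, assuming the built-in ToothGrowth data set; the names group_means, group_sds and group_summary are introduced here for illustration.

```r
# Mean and standard deviation of tooth length for each supp/dose combination
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_sds <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Both aggregate() calls return the groups in the same order,
# so the columns can be combined row by row
group_summary <- data.frame(
  supp = group_means$supp,
  dose = group_means$dose,
  mean_len = group_means$len,
  sd_len = group_sds$len
)
group_summary
```

The mean_len column holds the same group means that the per-dose t-tests report as sample estimates.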
library(dplyr) # %>% and filter()
fac_cols <- sapply(tg, is.factor) # identify all factor columns
tg[fac_cols] <- lapply(tg[fac_cols], as.character) # convert them to character
# One subset per dose level
dos_0.5 <- tg %>%
  filter(dose == 0.5)
dos_1 <- tg %>%
  filter(dose == 1)
dos_2 <- tg %>%
  filter(dose == 2)
t05 <- t.test(len ~ supp,
data = dos_0.5,
var.equal = FALSE)
t05
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
##            13.23             7.98
t1 <- t.test(len ~ supp,
data = dos_1,
var.equal = FALSE)
t1
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
##            22.70            16.77
t2 <- t.test(len ~ supp,
data = dos_2,
var.equal = FALSE)
t2
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
##            26.06            26.14
tstat_sum <- data.frame(
"p-value" = c(t05$p.value, t1$p.value, t2$p.value),
"con_interval_low" = c(t05$conf.int[1],t1$conf.int[1], t2$conf.int[1]),
"con_interval_high" = c(t05$conf.int[2],t1$conf.int[2], t2$conf.int[2]),
row.names = c("dose_05","dose_1","dose_2"))
tstat_sum
##             p.value con_interval_low con_interval_high
## dose_05 0.006358607         1.719057          8.780943
## dose_1  0.001038376         2.802148          9.057852
## dose_2  0.963851589        -3.798070          3.638070
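The three per-dose tests above repeat the same call on different subsets. As an alternative, the same table could be built in one pass. The following is a minimal sketch in base R, assuming the built-in ToothGrowth data set; the names by_dose, tests and tstat_loop are introduced here for illustration.

```r
# Split the data by dose and run the same Welch t-test on every subset
by_dose <- split(ToothGrowth, ToothGrowth$dose)
tests <- lapply(by_dose, function(d) {
  t.test(len ~ supp, data = d, var.equal = FALSE)
})

# Collect p-values and confidence limits into one table (one row per dose)
tstat_loop <- data.frame(
  p_value = sapply(tests, function(res) res$p.value),
  con_interval_low = sapply(tests, function(res) res$conf.int[1]),
  con_interval_high = sapply(tests, function(res) res$conf.int[2])
)
tstat_loop
```

This avoids creating one data frame per dose by hand and extends without changes if more dose levels are added.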
The summary of the test statistics shows the following:
1) The null hypothesis is that there is no difference in mean tooth growth between the two methods of administration (OJ and VC).
2) At the doses of 0.5 and 1, the p-values (0.0064 and 0.0010) are below the threshold of 0.05, so we reject the null hypothesis: at these doses the method of administration plays a role, with OJ giving the higher mean tooth growth.
3) The 95% confidence intervals for the difference in means at these two doses (1.72 to 8.78 and 2.80 to 9.06) do not contain zero, which is consistent with rejecting the null hypothesis at the 5% significance level.
4) At a dosage of 2 milligrams/day, the p-value (0.964) is above the 5% threshold and the confidence interval (-3.80 to 3.64) contains zero, so we fail to reject the null hypothesis: the data show no evidence that the method of administration matters at this dose.
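The decision rule behind these conclusions can be written out explicitly. The following is a minimal sketch that takes H0 to be "no difference in mean tooth length between OJ and VC", which is the null hypothesis t.test evaluates here; the p-values are copied from the summary table above, and the names alpha, p_values and decision are introduced for illustration.

```r
alpha <- 0.05  # significance threshold used in the text

# p-values copied from the summary table of the three Welch t-tests
p_values <- c(dose_05 = 0.006358607,
              dose_1 = 0.001038376,
              dose_2 = 0.963851589)

# Reject H0 only when the p-value falls below the threshold
decision <- ifelse(p_values < alpha, "reject H0", "fail to reject H0")
decision
```

Failing to reject H0 at the highest dose means the data are compatible with no difference; it does not show that the two methods are equivalent.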