1. Introduction

This project investigates the exponential distribution in R and compares the distribution of its sample means with what the Central Limit Theorem predicts. The exponential distribution is simulated in R with rexp(n, lambda), where lambda is the rate parameter.

Lambda is set to 0.2 for all of the simulations. The distribution of averages of 40 exponentials is investigated in a thousand simulations.

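The project consists of two parts: a simulation exercise (Section 2) and a basic inferential data analysis of the ToothGrowth data (Section 3).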

2. Simulation

library(ggplot2)

lambda <- 0.2
n <- 40
nrSims <- 1000


set.seed(42)

expDist <- matrix(data=rexp(n * nrSims, lambda), nrow=nrSims)
expDistMean <- data.frame(means=apply(expDist, 1, mean))
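The layout of the simulation matrix matters here: each of the 1000 rows holds one simulation of 40 draws, so the row means are the quantities under study. The following is only a sketch that repeats the setup above and checks that layout; `meansApply` and `meansRow` are illustrative names that are not part of the original analysis, and `rowMeans` is shown as an equivalent, vectorised alternative to `apply(..., 1, mean)`.

```r
# Same parameters and seed as in the report
lambda <- 0.2
n <- 40
nrSims <- 1000
set.seed(42)

# One row per simulation, one column per exponential draw
expDist <- matrix(rexp(n * nrSims, lambda), nrow = nrSims)
stopifnot(identical(dim(expDist), c(as.integer(nrSims), as.integer(n))))

# Row means computed two ways (illustrative names, not from the report)
meansApply <- apply(expDist, 1, mean)
meansRow <- rowMeans(expDist)

# The two approaches should agree up to floating-point tolerance
all.equal(meansApply, meansRow)
```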

2.1. Sample Mean compared to Theoretical Mean Distribution

Expected mean of exponential distribution of rate \(\lambda\): \(\mu= \frac{1}{\lambda}\)

mu <- 1/lambda
mu
## [1] 5

Average of the sample means over 1000 simulations, each the mean of 40 randomly sampled exponentials.

avgMean <- mean(expDistMean$means)
avgMean
## [1] 4.986508

The theoretical mean (5) and the average sample mean (4.987) are almost identical.

2.2. Theoretical Variance vs Sample Variance

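The exponential distribution itself has standard deviation \(1/\lambda\), so the theoretical standard deviation of the mean of \(n\) exponentials (the standard error) is: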

\(\sigma = \frac{1/\lambda}{\sqrt{n}}\)

sd <- 1/lambda/sqrt(n)
sd
## [1] 0.7905694

The corresponding theoretical variance of the sample mean is: \(Var = \sigma^2\)

Var <- sd^2
Var
## [1] 0.625
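The value 0.625 can also be checked by hand. Assuming the 40 draws in each simulation are independent, each has variance \(1/\lambda^2\), and the variance of their mean is that value divided by \(n\):

\[
\operatorname{Var}(\bar{X}) = \frac{\operatorname{Var}(X)}{n} = \frac{1/\lambda^2}{n} = \frac{1}{0.2^2 \cdot 40} = \frac{25}{40} = 0.625
\]

Taking the square root gives \(\sqrt{0.625} \approx 0.7906\), which agrees with the standard deviation computed above.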

\(Var_x\) is the sample variance of the 1000 simulated means, each the mean of 40 exponentials.

\(\sigma_x\) is their sample standard deviation.

sd_x <- sd(expDistMean$means)
sd_x
## [1] 0.8242282
Var_x <- var(expDistMean$means)
Var_x
## [1] 0.6793521

The theoretical and sample values are close: standard deviation 0.791 vs 0.824, variance 0.625 vs 0.679.

2.3 Distribution

Compare the distribution of the simulated means with a normal distribution that has the theoretical mean and standard deviation.
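The plotting code is not shown in this report, so the following is only a sketch of how such a comparison could be drawn with ggplot2. The bin width, the colours, and the names `sigma` and `p` are assumptions chosen for illustration; the simulation setup is repeated so that the block runs on its own.

```r
library(ggplot2)

# Repeat the simulation setup from Section 2
lambda <- 0.2
n <- 40
nrSims <- 1000
set.seed(42)
expDist <- matrix(rexp(n * nrSims, lambda), nrow = nrSims)
expDistMean <- data.frame(means = apply(expDist, 1, mean))

# Theoretical mean and standard error of the mean of n exponentials
mu <- 1 / lambda
sigma <- (1 / lambda) / sqrt(n)

# Density-scaled histogram of the simulated means with the
# theoretical normal curve overlaid (styling is an assumption)
p <- ggplot(expDistMean, aes(x = means)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = 0.2,
                   fill = "grey80", colour = "black") +
    stat_function(fun = dnorm, args = list(mean = mu, sd = sigma),
                  colour = "red") +
    geom_vline(xintercept = mu, linetype = "dashed") +
    xlab("Mean of 40 exponentials") +
    ylab("Density")
print(p)
```

The histogram is scaled to density rather than counts so that it shares a vertical axis with the normal curve.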

The graphic shows that the distribution of means of random exponential samples is very close to a normal distribution for the lambda used.

3. Basic inferential data analysis

3.1 Loading ToothGrowth data + exploratory data analysis

library(ggplot2)
library(datasets)
library(gridExtra)
library(GGally)

data(ToothGrowth)
toothGrowth <- ToothGrowth 
toothGrowth$dose <- as.factor(toothGrowth$dose)

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

3.2 Data summary

str(toothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
summary(toothGrowth)
##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90
head(toothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
ggplot(data=ToothGrowth, aes(x=as.factor(dose), y=len, fill=supp)) +
    geom_bar(stat="identity") +
    facet_grid(. ~ supp) +
    xlab("Dose(mg)") +
    ylab("Tooth length")

table(toothGrowth$supp, toothGrowth$dose)
##     
##      0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10
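Alongside the counts per cell, the mean tooth length in each supplement and dose combination gives a first impression of the differences tested later. This is a minimal sketch using `aggregate` from base R; the name `groupMeans` is illustrative and not part of the original analysis.

```r
library(datasets)

# Mean tooth length for each supplement/dose combination
# (groupMeans is an illustrative name, not from the report)
groupMeans <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
groupMeans
```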

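Compare tooth growth between supplements (OJ vs VC), first across all doses and then separately at doses 0.5, 1 and 2, to test the hypothesis that the supplement affects tooth length. For each comparison, in that order, the output below gives the 95% confidence interval for the difference in mean length (OJ minus VC) followed by the p-value.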

## [1] -0.1710156  7.5710156
## attr(,"conf.level")
## [1] 0.95
## [1] 0.06063451
## [1] 1.719057 8.780943
## attr(,"conf.level")
## [1] 0.95
## [1] 0.006358607
## [1] 2.802148 9.057852
## attr(,"conf.level")
## [1] 0.95
## [1] 0.001038376
## [1] -3.79807  3.63807
## attr(,"conf.level")
## [1] 0.95
## [1] 0.9638516
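The code behind these figures is not shown in the report. The intervals and p-values are consistent with Welch two-sample t-tests (unequal variances, R's default for `t.test`), so the sketch below assumes that test. The helper `welchBySupp` and the names `overall` and `byDose` are hypothetical and introduced only for illustration.

```r
library(datasets)

# Hypothetical helper: Welch two-sample t-test of len by supp (OJ vs VC),
# returning only the confidence interval and the p-value
welchBySupp <- function(d) {
    res <- t.test(len ~ supp, data = d, var.equal = FALSE)
    list(conf.int = res$conf.int, p.value = res$p.value)
}

# All doses pooled, then one test per dose level
overall <- welchBySupp(ToothGrowth)
byDose <- lapply(split(ToothGrowth, ToothGrowth$dose), welchBySupp)

overall$conf.int
overall$p.value
for (d in names(byDose)) {
    print(byDose[[d]]$conf.int)
    print(byDose[[d]]$p.value)
}
```

Splitting by dose keeps each comparison to the 10 OJ and 10 VC observations at that dose, matching the counts in the table above.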

3.3 Conclusions

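OJ produces more tooth growth than VC at doses 0.5 and 1.0: both 95% confidence intervals lie above zero (p ≈ 0.006 and p ≈ 0.001).

At dose 2.0 there is no detectable difference between OJ and VC: the confidence interval is centered near zero (p ≈ 0.96).

Pooled across all doses, the confidence interval includes zero (p ≈ 0.061), so at the 5% level OJ cannot be said to be more effective than VC overall.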