This is a class project for the Statistical Inference course in the Data Science certification program offered through Johns Hopkins University. The project simulates a thousand samples from the exponential distribution in R and assesses how their mean and variance compare with the theoretical values. The second half of the project analyzes the ToothGrowth data and performs some basic exploratory analyses.
This project investigates the exponential distribution in R and compares the simulated results with what the Central Limit Theorem predicts. The exponential distribution was simulated in R using 'rexp(n, lambda)', where lambda (λ) is the rate parameter. The exponential distribution has a mean of 1/λ and a standard deviation of 1/λ. Lambda was set to 0.2 for all the simulations. The project investigates the distribution of averages of 40 exponentials over a thousand simulations and attempts to answer the following questions:
1. Does the sample mean differ from the theoretical mean?
2. How does the sample variance compare with the theoretical variance of the distribution?
3. Is the distribution approximately normal?
The answer to question 3 should focus on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.
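The contrast behind question 3 can be previewed with a small standalone sketch. It is only an illustration under stated assumptions: the seed and the names raw_draws and sample_means are hypothetical and are not part of the project code that follows.

```r
# Illustrative sketch only; the seed and variable names are assumptions.
set.seed(2024)
lambda <- 0.2        # rate parameter, as in the project
n_per_sample <- 40   # exponentials averaged per simulation
n_sims <- 1000       # number of simulations

# A large collection of individual random exponentials
raw_draws <- rexp(n_sims, rate = lambda)

# A large collection of averages of 40 exponentials each
sample_means <- replicate(n_sims, mean(rexp(n_per_sample, rate = lambda)))

# Both collections center near 1/lambda, but the averages are far less spread out
c(mean_raw = mean(raw_draws), mean_of_means = mean(sample_means))
c(sd_raw = sd(raw_draws), sd_of_means = sd(sample_means))

# Theoretical standard deviations for comparison
c(sd_raw_theory = 1 / lambda,
  sd_means_theory = (1 / lambda) / sqrt(n_per_sample))
```

The individual draws keep the skewed exponential shape with a spread near 1/λ, while the averages cluster much more tightly around 1/λ, which is the behavior the Central Limit Theorem describes.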
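# Fix the random seed so the simulation is reproducible
set.seed(-123)
n <- 40        # number of exponentials averaged in each simulation
lambda <- 0.2  # rate parameter of the exponential distribution
S <- 1000      # number of simulations
z <- 1.96      # 97.5th percentile of the standard normal distribution
# Each row of the matrix holds one simulated sample of n exponentials
mydata <- matrix(rexp(n * S, lambda), nrow = S)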
str(mydata)
## num [1:1000, 1:40] 3.28 6.62 1.43 2.97 14.94 ...
Row_Mean<-rowMeans(mydata)
summary(Row_Mean)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   2.872   4.439   4.967   5.019   5.500   8.183
MeanOfMean<-mean(Row_Mean)
MeanOfMean
## [1] 5.018855
theoreticalMean<-1/lambda
theoreticalMean
## [1] 5
sampleVar<-var(Row_Mean)
sampleVar
## [1] 0.7073296
theoreticalVar<-(1/lambda)^2/(n)
theoreticalVar
## [1] 0.625
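The theoretical variance computed above is the variance of the mean of n independent exponentials, not of a single exponential. A short worked derivation, using the project's values λ = 0.2 and n = 40:

```latex
\operatorname{Var}(\bar{X})
  = \frac{\sigma^{2}}{n}
  = \frac{(1/\lambda)^{2}}{n}
  = \frac{(1/0.2)^{2}}{40}
  = \frac{25}{40}
  = 0.625,
\qquad
\operatorname{SE}(\bar{X}) = \sqrt{0.625} \approx 0.79
```

The simulated variance of 0.7073296 is therefore somewhat above the theoretical 0.625, a gap that is plausible for a single run of 1000 simulations.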
par(bg = 'grey')
hist(Row_Mean,
     main = "Histogram of Sample Data Distribution",
     xlab = "Mean",
     xlim = c(2, 8),
     col = "darkmagenta",
     freq = FALSE
)
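One possible way to put a number on question 3, alongside the histogram, is to check how many simulated means fall within 1.96 standard errors of the theoretical mean. If the means are roughly normal, that share should be close to 95%. The sketch below is standalone and hedged: the seed and the names sim_means, mu, se and coverage are assumptions for illustration, not the project's own objects.

```r
# Illustrative sketch only; the seed and variable names are assumptions.
set.seed(2024)
lambda <- 0.2
n <- 40
sims <- 1000

# Means of 1000 simulated samples of 40 exponentials each
sim_means <- rowMeans(matrix(rexp(n * sims, rate = lambda), nrow = sims))

mu <- 1 / lambda               # theoretical mean of the sample mean
se <- (1 / lambda) / sqrt(n)   # theoretical standard error of the sample mean

# Share of simulated means within 1.96 standard errors of the theoretical mean
coverage <- mean(abs(sim_means - mu) <= 1.96 * se)
coverage

# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(sim_means)
qqline(sim_means)
```

A coverage value near 0.95 and a Q-Q plot that stays close to the reference line are both consistent with approximate normality. Some curvature in the upper tail would not be surprising, since averages of 40 exponentials remain slightly right-skewed.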