Overview

This analysis generated 1000 simulations, each of 40 observations drawn from an exponential distribution with rate lambda = 0.2. The aims of the analysis are to:

  1. Show the sample mean and compare it to the theoretical mean of the distribution.
  2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
  3. Show that the distribution of the sample means is approximately normal.

Simulations

Each of the 1000 simulations draws 40 observations from an exponential distribution with lambda = 0.2. The seed is set to the simulation index, so every draw is reproducible.

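# x[[i]] holds the 40 observations of simulation i
x <- list()
for (i in 1:1000) {
     set.seed(i)  # seed = simulation index, for reproducibility
     x[[i]] <- rexp(n = 40, rate = 0.2)
}
head(x, 2)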
## [[1]]
##  [1]  3.7759092  5.9082139  0.7285336  0.6989763  2.1803431 14.4748427
##  [7]  6.1478103  2.6984142  4.7828375  0.7352300  6.9536756  3.8101493
## [13]  6.1880178 22.1196711  5.2727158  5.1762197  9.3801759  3.2737332
## [19]  1.6846674  2.9423986 11.8225763  3.2094629  1.4706019  2.8293276
## [25]  0.5303631  0.2971958  2.8935623 19.7946643  5.8665605  4.9840648
## [31]  7.1764267  0.1863426  1.6200508  6.6023396  1.0175518  5.1136294
## [37]  1.5087047  3.6260715  3.7577135  1.1751373
## 
## [[2]]
##  [1]  9.32676220  2.02374036  0.73326335  8.65354862  0.44763090
##  [6]  3.33448816  5.37183430  7.55814652  6.57137953  0.78265141
## [11]  3.72559479  6.21672675  3.36889177  7.95152944  5.41118088
## [16]  3.98298611  7.22635194 22.45971167  8.51716149  3.10192144
## [21]  1.78236477  3.44429003  4.16314739  1.67225648  7.95026189
## [26]  0.39885101  2.75478809  5.44750412  0.01969881  3.10397263
## [31] 24.31107653  0.95446782  2.85878913  2.12779864  0.90213118
## [36]  1.34520722  3.69425097  3.97975644  4.67335785 15.61102778
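
As an alternative to the loop above, the whole simulation can be sketched as a single matrix with one simulation per row. This is only an illustration: the names `n_sims`, `n_obs`, `sims`, `sim_means` and `sim_vars` are not part of the original analysis, and the single seed (1, chosen arbitrarily) replaces the per-simulation seeds, so the values will not match the output printed above.

```r
# Assumption: one seed for the whole run instead of one seed per simulation.
set.seed(1)
n_sims <- 1000   # number of simulations
n_obs  <- 40     # observations per simulation
lambda <- 0.2    # rate of the exponential distribution

# Draw all values at once; each row is one simulation of 40 observations.
sims <- matrix(rexp(n_sims * n_obs, rate = lambda),
               nrow = n_sims, ncol = n_obs)

sim_means <- rowMeans(sims)        # one sample mean per simulation
sim_vars  <- apply(sims, 1, var)   # one sample variance per simulation

dim(sims)
mean(sim_means)   # should be close to 1/lambda = 5
mean(sim_vars)    # should be close to 1/lambda^2 = 25
```

The matrix layout avoids growing a list inside a loop and lets `rowMeans()` and `apply()` produce the per-simulation summaries directly.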

Sample Mean Versus Theoretical Mean

We calculated the 1000 sample means, one for each simulation of 40 observations.

# sample mean of each simulation
mean<-c()
for (i in c(1:1000)){
     mean <- c(mean,mean(x[[i]]))
}
head(mean)
## [1] 4.860372 5.199013 5.249615 5.021921 4.593493 3.500695

Since lambda = 0.2, the theoretical mean is 1/lambda = 5. To compare the two, we plotted the cumulative mean of the 1000 sample means.

mns <- c()
count <- 0
sum <- 0
for (i in mean){
     count <- count+1
     sum <- sum+i
     mns <- c(mns,sum*1./count)
}
plot(mns,type = "l",xlab = "Number of simulations",ylab = "Cumulative Mean",main = "Cumulative Mean of 1000 Simulations")
abline(a = 5,b = 0)  # horizontal line at the theoretical mean
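
The running mean computed by the loop above can also be sketched with `cumsum()`. The block below is a self-contained illustration that rebuilds the sample means with the same per-simulation seeds; the names `sample_means` and `cum_means` are assumptions introduced here, not names from the original code.

```r
# Rebuild the 1000 sample means (seed i for simulation i, as in the simulation).
sample_means <- sapply(1:1000, function(i) {
  set.seed(i)
  mean(rexp(n = 40, rate = 0.2))
})

# Running mean: cumulative sum divided by the number of terms so far.
cum_means <- cumsum(sample_means) / seq_along(sample_means)

tail(cum_means, 1)   # the last running mean equals mean(sample_means)
```

Passing `cum_means` to `plot(..., type = "l")` followed by `abline(h = 5)` should give the same kind of figure as the loop-based version.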

The plot shows the cumulative mean approaching the theoretical mean (= 5) as the number of simulations grows. We checked this with a one-sample t-test.

t.test(x = mean,alternative = "two.sided",mu = 5)$p.value
## [1] 0.9261998

Since the p-value > 0.05, we cannot reject the null hypothesis (H0: mean = 5) at the 5% significance level; in other words, the simulated sample means show no significant departure from the theoretical mean.
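
A confidence interval conveys the same comparison with more detail than the p-value alone. The sketch below is an illustration rather than part of the original analysis: it rebuilds the sample means with the same per-simulation seeds and reads the interval from the `t.test()` result. The names `sample_means` and `tt` are assumptions introduced here.

```r
# Rebuild the 1000 sample means (seed i for simulation i).
sample_means <- sapply(1:1000, function(i) {
  set.seed(i)
  mean(rexp(n = 40, rate = 0.2))
})

tt <- t.test(sample_means, mu = 5)
tt$estimate   # mean of the 1000 sample means
tt$conf.int   # 95% confidence interval for that mean

# Spread of the sample means versus the theoretical standard error
sd(sample_means)   # empirical standard deviation of the sample means
5 / sqrt(40)       # theoretical value: (1/lambda) / sqrt(n)
```

If the interval contains 5, the conclusion matches the two-sided test at the 5% level.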

Sample Variance Versus Theoretical Variance

We calculated the 1000 sample variances, one for each simulation of 40 observations.

# sample variance of each simulation
var<-c()
for (i in c(1:1000)){
     var <- c(var,var(x[[i]]))
}
head(var)
## [1] 23.56906 27.82594 23.78379 33.03514 14.33921 13.43221

Since lambda = 0.2, the theoretical variance is (1/lambda)^2 = 25. To compare the two, we plotted the cumulative mean of the 1000 sample variances.
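
For reference, the theoretical values used in this report follow from the standard moments of an exponential random variable \(X\) with rate \(\lambda = 0.2\). The symbol \(S^2\) is introduced here for the sample variance (as computed by `var()`, with divisor \(n - 1\)), which is an unbiased estimator of the population variance:

\[
E[X] = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad
\operatorname{Var}(X) = \frac{1}{\lambda^{2}} = \frac{1}{0.04} = 25, \qquad
E[S^{2}] = \operatorname{Var}(X) = 25.
\]

The last identity is why the average of many sample variances is expected to settle near 25.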

mns <- c()
count <- 0
sum <- 0
for (i in var){
     count <- count+1
     sum <- sum+i
     mns <- c(mns,sum*1./count)
}
plot(mns,type = "l",xlab = "Number of simulations",ylab = "Cumulative Variance",main = "Cumulative Variance of 1000 Simulations")
abline(a = 25,b = 0)  # horizontal line at the theoretical variance

The plot shows the cumulative variance approaching the theoretical variance (= 25) as the number of simulations grows. We checked this with a one-sample t-test.

t.test(x = var,alternative = "two.sided",mu = 25)$p.value
## [1] 0.1827943

Since the p-value > 0.05, we cannot reject the null hypothesis (H0: var = 25) at the 5% significance level; in other words, the simulated sample variances show no significant departure from the theoretical variance.

Distribution

The central limit theorem says that, for large \(n\), the distribution of the standardized sample mean \(Z = (\bar{X} - \mu)/(s/\sqrt{n})\), where \(\bar{X}\) is the sample mean, \(\mu\) is the population mean and \(s\) is the sample standard deviation (so that \(s/\sqrt{n}\) is the estimated standard error of the mean), asymptotically approaches the standard normal distribution (zero mean and unit variance).

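z<-mean-5         # numerator: sample mean minus theoretical mean
se<-sqrt(var/40)  # estimated standard error: s/sqrt(n) with n = 40
hist(z/se,prob=TRUE,ylim = c(0,0.5),xlab = "Z",main = "Histogram of Z from 1000 Simulations")
curve(dnorm(x, mean=0, sd=1),add=TRUE)  # standard normal density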

The figure shows the histogram of Z from the 1000 simulations, overlaid with the standard normal density curve. As the figure shows, Z is approximately standard normal.
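
A normal quantile-quantile comparison is one way to complement the histogram. The sketch below is an illustration, not part of the original analysis: it rebuilds Z with the same per-simulation seeds and uses the correlation between theoretical and sample quantiles as a rough numeric summary. The names `obs` and `qq` are assumptions introduced here.

```r
# Rebuild Z = (sample mean - 5) / (s / sqrt(40)) for each simulation.
z <- sapply(1:1000, function(i) {
  set.seed(i)
  obs <- rexp(n = 40, rate = 0.2)
  (mean(obs) - 5) / sqrt(var(obs) / 40)
})

# Theoretical (x) versus sample (y) quantiles, computed without drawing.
qq <- qqnorm(z, plot.it = FALSE)
cor(qq$x, qq$y)                 # near 1 when the points lie close to a line

c(mean = mean(z), sd = sd(z))   # compare with 0 and 1
```

Calling `qqnorm(z)` followed by `qqline(z)` draws the corresponding plot. With only 40 observations per simulation and a skewed parent distribution, some departure from the line in the tails would not be surprising.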