This report investigates the distribution of averages of 40 exponential(0.2) over 1000 simulations.
The first step is to create a dataset with 1000 rows and 40 columns. Each row has 40 exponentials for a single simulation.
set.seed(1)
n<-40
numsim<-1000
lambda<-0.2
dataset<-matrix(rexp(n*numsim,lambda),numsim)
dim(dataset)
## [1] 1000 40
The histogram below show the distibutuon of the means. The blue line shows the sammple mean.
RowM <- rowMeans(dataset)
mean<-as.numeric(mean(RowM))
hist(RowM, col=rainbow(n))
abline(v = mean, col = "green", lwd = 2, label="data mean")
## Warning in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...):
## "label" is not a graphical parameter
meanT<-as.numeric(1/0.2)
mean
## [1] 4.990025
meanT
## [1] 5
The sample mean is 5 and the theory mean is 4.99. The difference between the means is very small.
std<-sd(RowM)
var<-var(RowM)
var
## [1] 0.6177072
stdT<-((1/lambda) * (1/sqrt(n)))
varT<-stdT^2
varT
## [1] 0.625
The sample mean is 0.62, where as the theoretical mean is 0.63. Again, there is very little difference between the two.
The plot below show the distribution of the sample. The green vertical line shows the sample mean. This line almost overlaps the vertical blue line which is the value of the theory mean.
Similarly, the green curve is the curve obtained by ploting the sample mean and standard deviarion. The blue curve is plotted using the theoretical mean and standard deviation.
library(ggplot2)
#convert to dataframe
RowMean<-apply(dataset,1,mean)
dfRowM<-data.frame(RowM)
nd<-ggplot(dfRowM,aes(x=RowM))
nd<-nd+geom_histogram(binwidth = lambda,fill="orange",color="black",aes(y = ..density..))
nd<-nd + labs(title="Density of 40 Numbers from Exponential Distribution", x="Mean of 40 Selections", y="Density")
nd<-nd+geom_vline(xintercept=mean, size=1.5,colour="green", linetype=1, show_guide=TRUE)
nd<-nd + stat_function(fun=dnorm,args=list(mean=mean, sd=std),color = "green", size = 1.5)
nd<-nd+geom_vline(xintercept=meanT, size=1.0,colour="blue", linetype=1, show_guide=TRUE)
nd<-nd + stat_function(fun=dnorm,args=list(mean=meanT, sd=stdT),linetype=1,color = "blue", size = 1.0)
nd