In this project, I will investigate the exponential distribution in R and compare it with the Central Limit Theorem, which including the mean of exponential distribution and the standard deviation. And I will investigate the distribution of averages of 40 exponentials.
The exponential distribution can be simulated in R with rexp(n, lambda)
where lambda
is the rate parameter. Set lambda = 0.2
for all of the simulations. n is the size of the sample.
We first create a function named meanplot to plot the umulative mean with respect to the number of observations. then run the code meanplot(n,lambda)
. The red dash line in the picture is the theoretical mean.
lambda = 0.2
m =10000
n=40
Meanplot<- function(m,n,lambda){
mns = NULL
for (i in 1 : m) mns = c(mns, mean(rexp(n,lambda)))
hist(mns, xlab = "means of n obs", probability = T)
abline(v=5,lwd=2,col="red",lty=2)
abline(v= mean(mns),lwd=2,col="blue")
}
Meanplot(m,n,lambda)
The red dashed line is the theoretical center, the blue line is the distribution center. it is pretty close.
lambda = 0.2;n=1000
meanplot<- function(n,lambda){
x=1:n; y= rexp(n, lambda)
ymean=NULL
for (i in 1 : n) ymean[i] = mean(y[1:i])
plot(x,ymean,type = "l", ylab = "Cumulative mean", xlab = "Number of obs")
abline(h=5,lwd=2,col="red",lty=2)
}
meanplot(n,lambda)
Or we could try this using ggplot2 package.
library(ggplot2)
lambda = 0.2;n=1000
# Put initialization code in this file.
meanPlot <- function(n){
means<- cumsum(rexp(n,lambda))/(1:n)
g <- ggplot(data.frame(x = 1 : n, y = means), aes(x = x, y = y))
g <- g + geom_hline(size=1.5 ,yintercept = 5,alpha=0.6,
linetype="longdash") + geom_line(size = 1)
if(n<100){
g <- g + geom_point(colour="red",size=3,alpha=0.8)
}
g <- g + labs(x = "Number of obs", y = "Cumulative mean")
g <- g + scale_x_continuous(breaks=seq(0,n+1,ceiling(n/10)))
print(g)
invisible()
}
meanPlot(n)
From the picture, we can see that the variance of sample is asymptotic to the theoretical mean 5 as sample size n getting bigger.
We first create a function named varianceplot to plot the sample variance with respect to the sample size n. then run the code varianceplot(n,lambda)
.
lambda = 0.2;n=1000
varianceplot<- function(n,lambda){
x=1:n; y= rexp(n, lambda)
yvar=NULL
for (i in 1 : n) yvar[i] = sqrt(var(y[1:i]))
plot(x,yvar,type = "l", ylab = "standard deviation of the sample", xlab = "sample size")
abline(h=5,lwd=2,col="red",lty=2)
}
varianceplot(n,lambda)
From the picture, we can see that the variance of sample is asymptotic to the theoretical variance as sample size n getting bigger.
lambda = 0.2
m =10000
n=40
Varplot<- function(m,n,lambda){
mns = NULL
for (i in 1 : m) mns = c(mns, sd(rexp(n,lambda)))
hist(mns, xlab = "standard deviation of obs", probability = T)
abline(v=5,lwd=2,col="red",lty=2)
}
Varplot(m,n,lambda)
From the picture, the variance of 40 observations is center at the theoretical variance of the red dashed line.
We first create a function named meanhist to plot the sample mean with of m simulations. then run the code mean(m,n,lambda)
. The red dash line in the picture is the theoretical standard deviation.
lambda = 0.2
m =10000
n=40
meanhist<- function(m,n,lambda){
mns = NULL
for (i in 1 : m) mns = c(mns, mean(rexp(n,lambda)))
mns = (mns-1/lambda)*lambda*sqrt(n)
hist(mns, xlab = "means of n obs", probability = T)
}
meanhist(m,n,lambda)
x<- seq(-4,4,length = n)
y<- dnorm(x)
lines(x,y,type = "l", col ="red")
abline(v=0,lwd=4, col = "blue")
# we could create a interactive object to see the distribution of the sample mean.
# library(manipulate)
# manipulate(meanhist(m,n,lambda), n= slider(1,1000, initial = 40, step = 10))
From the picture, the red line is the standard normal distribution, we can see that the mean of sample is close to a normal distribution.