Statistical Inference Part 1

Project Description

In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to run a thousand simulations.
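As a quick sanity check of the statement that both the mean and the standard deviation equal 1/lambda, the sketch below draws a large exponential sample and compares its summary statistics with 1/lambda = 5. The seed and the sample size of 100000 are arbitrary choices made for this illustration and are not part of the assignment.

```r
# Illustrative check only: the seed and sample size are assumptions, not from the assignment
set.seed(1)
lambda <- 0.2
x <- rexp(100000, lambda)   # large sample so the estimates are stable

# Both the sample mean and the sample sd should be close to 1/lambda = 5
check <- c(mean = mean(x), sd = sd(x), theory = 1 / lambda)
print(check)
```

With a sample this large, both estimates should land close to 5, though the exact digits depend on the seed.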

Setting the parameters

n <- 40
lambda <- 0.2
simulation <- 1000

Calculation of theoretical values

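mean_t <- 1/lambda                       # theoretical mean of the sample mean
sd_t <- ((1/lambda) * (1/sqrt(n)))       # standard error: sigma / sqrt(n)
var_t <- sd_t^2                          # theoretical variance of the sample mean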
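The theoretical values above follow from standard properties of the mean of n independent draws. The derivation below is a worked sketch that assumes the 40 exponentials in each sample are independent and identically distributed with rate lambda = 0.2; the symbol \bar{X} for the sample mean is introduced here for illustration.

```latex
\begin{aligned}
E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{\lambda} = \frac{1}{0.2} = 5 \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{1}{n\lambda^{2}} = \frac{25}{40} = 0.625 \\
\operatorname{sd}(\bar{X}) &= \frac{1}{\lambda\sqrt{n}} = \sqrt{0.625} \approx 0.7906
\end{aligned}
```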

Calculation of actual values

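# 1000 rows (one per simulation) by 40 columns (exponential draws per simulation)
data <- matrix(rexp(n*simulation, lambda), nrow = simulation)
row_means <- apply(data, 1, mean)   # mean of each simulated sample of 40
mean_act <- mean(row_means)
sd_act <- sd(row_means)
var_act <- var(row_means)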
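The simulation above can be made reproducible and slightly more idiomatic. The sketch below is one possible variant, not the report's original method: it fixes a seed (2024 is an arbitrary, hypothetical choice) and uses rowMeans(), which returns the same values as apply(data, 1, mean) for a numeric matrix. Because the report's code was run without a seed, the numbers it reports will not be reproduced exactly.

```r
# Sketch: reproducible version of the simulation (the seed value is an assumption)
set.seed(2024)
n <- 40
lambda <- 0.2
simulation <- 1000

# One row per simulation, one column per exponential draw
data <- matrix(rexp(n * simulation, lambda), nrow = simulation, ncol = n)

means_apply <- apply(data, 1, mean)  # approach used in the report
means_fast  <- rowMeans(data)        # equivalent built-in alternative

print(dim(data))
print(all.equal(means_apply, means_fast))
```

Setting the seed makes every rerun produce the same 1000 sample means, which keeps the numbers quoted in the text in sync with the code.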

Questions

1. Show where the distribution is centered and compare it to the theoretical center of the distribution.

The simulated distribution of sample means is centered at 5.0450596, while the theoretical center is 1/lambda = 5.

2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

The actual variance of the sample means is 0.6541736, while the theoretical variance is 0.625.
The actual standard deviation is 0.80881, while the theoretical standard deviation is 0.7905694.

Variable	Actual Value	Theoretical Value
Mean	5.0450596	5
Standard Deviation	0.80881	0.7905694
Variance	0.6541736	0.625
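A comparison table like this one can be built directly from the computed values, which avoids copying numbers by hand and mislabeling columns. The following is a minimal sketch under stated assumptions: the seed and the data frame name comparison are hypothetical, and because the report's run was unseeded, the actual values printed here will differ slightly from those in the table.

```r
# Sketch: build the actual-vs-theoretical table programmatically
# (the seed and the name `comparison` are assumptions for this illustration)
set.seed(2024)
n <- 40
lambda <- 0.2
simulation <- 1000
row_means <- rowMeans(matrix(rexp(n * simulation, lambda), nrow = simulation))

comparison <- data.frame(
  Variable    = c("Mean", "Standard Deviation", "Variance"),
  Actual      = c(mean(row_means), sd(row_means), var(row_means)),
  Theoretical = c(1 / lambda, (1 / lambda) / sqrt(n), (1 / lambda)^2 / n)
)
comparison$Difference <- comparison$Actual - comparison$Theoretical
print(comparison)
```

The Difference column makes it easy to see how close each simulated quantity is to its theoretical counterpart.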

3. Show that the distribution is approximately normal.

library(ggplot2)
# after_stat() and linewidth need ggplot2 >= 3.4.0; older versions use ..density.. and size
dfrow_means <- data.frame(row_means)
# Map the row_means column (not the whole data frame) to x
pp <- ggplot(dfrow_means, aes(x = row_means))
pp <- pp + geom_histogram(binwidth = lambda, fill = "pink", color = "black", aes(y = after_stat(density)))
pp <- pp + labs(title = "Density of Means of 40 Exponentials", x = "Mean of 40 Selections", y = "Density")
pp <- pp + geom_vline(xintercept = mean_act, linewidth = 1.0, color = "black") # actual mean line
pp <- pp + stat_function(fun = dnorm, args = list(mean = mean_act, sd = sd_act), color = "blue", linewidth = 1.0) # normal curve, actual mean and sd
pp <- pp + geom_vline(xintercept = mean_t, linewidth = 1.0, color = "yellow", linetype = "longdash") # theoretical mean line
pp <- pp + stat_function(fun = dnorm, args = list(mean = mean_t, sd = sd_t), color = "green", linewidth = 1.0) # normal curve, theoretical mean and sd
pp

Black solid line is the actual mean
Yellow dashed line is the theoretical mean
Green curve is the normal density with the theoretical mean and standard deviation
Blue curve is the normal density with the actual mean and standard deviation

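The histogram of the 1000 sample means closely follows both normal curves, and the actual and theoretical curves nearly coincide. This is what the Central Limit Theorem predicts: the averages of 40 exponentials are approximately normally distributed, with mean 1/lambda and standard deviation (1/lambda)/sqrt(n), even though the underlying exponential distribution is strongly skewed.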
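The visual impression of normality can be complemented with a simple numeric check. The sketch below, which is an addition and not part of the original analysis, standardizes the sample means with the theoretical mean and standard deviation and compares the share of values within one and two standard deviations against the normal reference proportions (about 0.683 and 0.954). The seed and the names z, coverage, and normal_ref are assumptions for this illustration.

```r
# Sketch: numeric normality check via coverage proportions (seed is an assumption)
set.seed(2024)
n <- 40
lambda <- 0.2
simulation <- 1000
row_means <- rowMeans(matrix(rexp(n * simulation, lambda), nrow = simulation))

mean_t <- 1 / lambda
sd_t <- (1 / lambda) / sqrt(n)
z <- (row_means - mean_t) / sd_t   # standardized sample means

# Share of standardized means within 1 and 2 standard deviations
coverage <- c(within_1sd = mean(abs(z) < 1), within_2sd = mean(abs(z) < 2))
# Reference proportions for a standard normal distribution
normal_ref <- c(within_1sd = pnorm(1) - pnorm(-1), within_2sd = pnorm(2) - pnorm(-2))

print(round(rbind(simulated = coverage, normal = normal_ref), 3))
# Optional visual check with base R: qqnorm(z); qqline(z)
```

If the simulated proportions sit close to the normal ones, that supports the approximate normality seen in the histogram; a normal Q-Q plot gives the same information graphically.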