Statistical Inference

Introduction

In probability theory and statistics, the exponential distribution (a.k.a. negative exponential distribution) is the probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate.

The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter and n is the number of observations. The exponential distribution with rate lambda has density f(x) = \(\lambda e^{-\lambda x}\), for x ≥ 0. The mean of the exponential distribution is 1/lambda, and the standard deviation is also 1/lambda. lambda is set to 0.2 for all of the simulations. I will investigate the distribution of averages of 40 exponentials, using a thousand simulations.
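As a quick numerical check of the stated mean and standard deviation, the following sketch integrates the density with base R's integrate() and dexp(). The names expMean, expM2 and expSD are illustrative and not part of the original analysis.

```r
lambda <- 0.2

# Mean: integral of x * f(x) over [0, Inf)
expMean <- integrate(function(x) x * dexp(x, rate = lambda), 0, Inf)$value

# Second moment, then SD = sqrt(E[X^2] - E[X]^2)
expM2 <- integrate(function(x) x^2 * dexp(x, rate = lambda), 0, Inf)$value
expSD <- sqrt(expM2 - expMean^2)

# Both should be close to 1 / lambda
c(mean = expMean, sd = expSD, theoretical = 1 / lambda)
```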

Objectives

Show the sample mean and compare it to the theoretical mean of the distribution.
Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
Show that the distribution of the sample means is approximately normal.
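All three objectives rest on the central limit theorem. As a reference, for independent exponentials with rate \(\lambda\), the mean of \(n\) draws has the moments and approximate distribution below (with \(n = 40\) and \(\lambda = 0.2\) as stated in the introduction):

```latex
\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathbb{E}[\bar{X}_n] = \frac{1}{\lambda} = 5, \qquad
\operatorname{sd}(\bar{X}_n) = \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.791,
\]
\[
\bar{X}_n \;\overset{\text{approx.}}{\sim}\; \mathcal{N}\!\left(\frac{1}{\lambda},\; \frac{1}{n\lambda^{2}}\right).
\]
```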

Solution for objectives 1 and 2

lambda <- 0.2       # rate parameter
sampleSize <- 40    # number of exponentials per sample
nSamples <- 1000    # number of simulated samples

Make a matrix with the sampled exponential distribution data (one column per sample)

set.seed(54321) # Set seed for reproducibility
expoDist <- replicate(n = nSamples, expr = rexp(n = sampleSize, rate = lambda))
colnames(expoDist) <- paste0("sample", seq_len(nSamples))
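To make the layout of the simulated data explicit, here is a small self-contained sketch that repeats the simulation and inspects its shape. With these arguments replicate() returns a matrix, so each column holds one sample of 40 draws and colMeans() yields one mean per sample.

```r
lambda <- 0.2
sampleSize <- 40
nSamples <- 1000

set.seed(54321)
expoDist <- replicate(n = nSamples, expr = rexp(n = sampleSize, rate = lambda))

is.matrix(expoDist)          # the result is a matrix
dim(expoDist)                # rows = draws per sample, columns = samples
length(colMeans(expoDist))   # one mean per simulated sample
```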

Calculate the theoretical and sample mean.

theoreticalMean <- 1 / lambda
sampleMean <- round(mean(expoDist), 3)

Calculate the theoretical and sample standard deviation of the sample means.

theoreticalSD <- round((1 / lambda) * (1 / sqrt(sampleSize)), 3)  # standard error of the mean
sampleSD <- round(sd(colMeans(expoDist)), 3)

Calculate the theoretical and sample variances of the sample means.

theoreticalVariance <- round(theoreticalSD^2, 3)  # squares the already-rounded SD
sampleVariance <- round(sampleSD^2, 3)
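Because the variance above is obtained by squaring an SD that was already rounded to three decimals, it can differ slightly from the exact value. The sketch below compares the two routes. The names exactVariance, roundedSD and varianceFromRoundedSD are illustrative only.

```r
lambda <- 0.2
sampleSize <- 40

# Exact variance of the mean of 40 exponentials: (1 / lambda)^2 / n
exactVariance <- (1 / lambda)^2 / sampleSize

# Variance obtained by squaring the SD after rounding it to 3 decimals
roundedSD <- round((1 / lambda) / sqrt(sampleSize), 3)
varianceFromRoundedSD <- round(roundedSD^2, 3)

c(exact = exactVariance, fromRoundedSD = varianceFromRoundedSD)
```

The exact value is 0.625, while squaring the rounded SD gives 0.626. The difference is only a rounding effect and does not change the comparison with the simulated variance.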

Summary statistics

Variable | Theoretical | Sample
---------|-------------|-------
Mean     | 5           | 4.99
SD       | 0.791       | 0.776
Variance | 0.626       | 0.602

Answer to objective 1.

The summary statistics show that the mean of the simulated sampling distribution (4.99) is virtually the same as the theoretical mean (5).

Answer to objective 2.

Similarly, the standard deviation (0.776) and variance (0.602) of the simulated sampling distribution are close to the theoretical standard deviation (0.791) and variance (0.626).
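To put a number on how close these values are, the following sketch computes the relative difference of each simulated statistic from its theoretical counterpart. The figures are copied from the reported statistics. The vector names are illustrative.

```r
# Values copied from the reported theoretical and sample statistics
theoretical <- c(mean = 5, sd = 0.791, variance = 0.626)
simulated   <- c(mean = 4.99, sd = 0.776, variance = 0.602)

# Relative difference of each simulated statistic, in percent
relDiffPct <- round(100 * (simulated - theoretical) / theoretical, 1)
relDiffPct
```

All three simulated statistics fall within about 4% of their theoretical values, with the mean being the closest.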

Answer to objective 3.

Exponential distribution plot

library(ggplot2)

# A single histogram layer with an explicit bin width
ggplot(data.frame(value = as.vector(expoDist)), aes(x = value)) +
  geom_histogram(binwidth = 1/5, color = "red", fill = "green") +
  xlab("Value") + ylab("Count") + ggtitle("Exponential Distribution")

Mean distribution plot

mean_df<- data.frame(means=colMeans(expoDist))

head(mean_df)

##            means
## sample1 4.970664
## sample2 4.798634
## sample3 5.008839
## sample4 4.329191
## sample5 4.996757
## sample6 4.676697

# Note: after_stat() needs ggplot2 >= 3.3.0 and linewidth needs ggplot2 >= 3.4.0
ggplot(data = mean_df, aes(x = means)) +
  geom_histogram(aes(y = after_stat(density)), fill = "steelblue",
                 binwidth = 1/7, color = "black", alpha = 1/3) +
  geom_density(aes(color = "Means distribution"), linewidth = 1, show.legend = FALSE) +
  stat_function(fun = dnorm, args = list(mean = theoreticalMean, sd = theoreticalSD),
                aes(color = "Normal distribution"), linewidth = 1) +
  geom_vline(aes(xintercept = sampleMean, color = "Sample mean"), linewidth = 1.5) +
  geom_vline(aes(xintercept = theoreticalMean, color = "Theoretical mean"),
             linewidth = 1, linetype = "dashed") +
  theme(legend.justification = c(1, 0), legend.position = c(1, 0.5)) +
  scale_color_discrete(name = "Compared parameters") +
  geom_rug(color = "darkred", alpha = 0.1) +
  labs(title = "Distribution of Exponential Sample Means",
       x = "Mean of 40 exponentials (lambda = 0.2)")

The two plots show that the simulated exponential data have the characteristic negative exponential shape, which is strongly right-skewed. The means of the simulated samples, however, follow a nearly normal distribution, as the central limit theorem predicts.
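The visual contrast between the two plots can also be summarised with a number. The sketch below compares the skewness of the raw draws with that of the sample means. The skewness() helper is a hypothetical moment-based estimator written for this illustration, not a function from the original analysis or from base R.

```r
lambda <- 0.2
sampleSize <- 40
nSamples <- 1000

set.seed(54321)
expoDist <- replicate(n = nSamples, expr = rexp(n = sampleSize, rate = lambda))

# Hypothetical helper: moment-based sample skewness
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

skewRaw   <- skewness(as.vector(expoDist))  # all raw exponential draws
skewMeans <- skewness(colMeans(expoDist))   # the 1000 sample means

# An exponential has skewness 2; a mean of n exponentials has 2 / sqrt(n)
c(raw = skewRaw, means = skewMeans, theoryMeans = 2 / sqrt(sampleSize))
```

The raw draws should show a skewness near 2, while the sample means should be far closer to 0. A small positive skew remains in theory, because the mean of 40 exponentials follows a scaled gamma distribution rather than an exact normal.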

This can also be verified with a **Q-Q plot**:

qqnorm(colMeans(expoDist),col = "steelblue")
qqline(colMeans(expoDist),col = "red")
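As a numeric companion to the Q-Q plot, the sketch below computes the correlation between the theoretical normal quantiles and the ordered sample means, using qqnorm() with plot.it = FALSE. This is an informal summary rather than a formal normality test, and the name qqCor is illustrative.

```r
lambda <- 0.2
sampleSize <- 40
nSamples <- 1000

set.seed(54321)
expoDist <- replicate(n = nSamples, expr = rexp(n = sampleSize, rate = lambda))
sampleMeans <- colMeans(expoDist)

# qqnorm() returns the plotted coordinates without drawing when plot.it = FALSE
qq <- qqnorm(sampleMeans, plot.it = FALSE)

# Correlation between theoretical and sample quantiles
qqCor <- cor(qq$x, qq$y)
qqCor
```

A correlation close to 1 means the points lie close to a straight line, which is consistent with the sample means being approximately normal.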