Statistical Inference Project

PART 1 PROJECT: A Simulation Exercise

In this project, we will simulate sets of numbers drawn from the exponential distribution, in contrast to numbers drawn from the normal distribution.

Simulating the numbers

# Simulate exponential distribution in R using the rexp function
# This function takes 2 parameters: n (size of set of numbers)
# and lambda (which is the rate of the exponential distribution)

# Set seed for consistency
set.seed(10)

# Now simulate 40 numbers under the exponential distribution
# with a rate of 0.2
First_Simulation = rexp(40,0.2)
First_Simulation

##  [1]  0.07478203  4.60110602  3.76079469  7.87520925  1.15829308
##  [6]  5.43336502 11.63811436  3.64561911  6.44155052  3.36134143
## [11]  2.13264896  5.57710973  6.58273534  2.06646914  3.38287665
## [16]  8.16491940  0.35597038 12.84425807  8.72359347  1.46475033
## [21]  3.22652580  1.73461201  3.97541315  6.98149591  6.99053713
## [26]  5.55229720  0.85210798  9.59390665  0.83221970  4.85144995
## [31]  0.05285596 13.96504857 11.77305890  3.33619194  2.61121743
## [36]  0.73356449  3.75617153 11.44283106  0.34426784  1.41356577

# Now we don't want just one set of numbers. We want 1000 such sets.
# We can use the replicate function in R to repeat this simulation 1000 times
Bulk_Simulation = replicate(1000,rexp(40,0.2))
#View(Bulk_Simulation)

# If you view this data set, it consists of 1000 columns, with each column
# containing 40 numbers simulated from the exponential distribution with lambda = 0.2
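
The layout described above can be checked directly rather than by viewing the table. The following is a minimal sketch; the name sim_matrix is a stand-in introduced here for illustration, and the simulation is re-created so that the snippet runs on its own.

```r
# Hypothetical re-creation of the bulk simulation so this sketch is self-contained
set.seed(10)
sim_matrix <- replicate(1000, rexp(40, 0.2))

# replicate() simplifies equal-length results into a matrix:
# one row per draw within a simulation, one column per simulation
is.matrix(sim_matrix)
dim(sim_matrix)   # 40 1000
```

Because each simulation occupies a column, column-wise summaries such as colMeans give one value per simulation.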

# Let's get the mean of each simulation.
# This entails using the colMeans function
Mean_of_each_simulation = colMeans(Bulk_Simulation)
# Mean_of_each_simulation

# The result of this is 1000 numbers, each corresponding to the mean
# of one simulation



# Now I can estimate the mean of this distribution from the simulation by
# taking the means of all the simulations and averaging them
# (this is the sample mean, not the theoretical mean)

MeanOfSample = mean(Mean_of_each_simulation)
MeanOfSample

## [1] 5.046705

# Based on our experiment of 1000 simulations of these sets of numbers under this
# distribution, we arrived at 5.05 as the mean

# Now I can visualise the distribution of the mean of each simulation using the hist function
hist(Mean_of_each_simulation)
# I can then add a vertical red line that marks the mean we just calculated
abline(v=MeanOfSample,col='red')

Comparing sample mean and theoretical mean of distribution

# The theoretical mean of an exponential distribution is 1/lambda
lambda = 0.2
TheoreticalMean = 1/lambda
TheoreticalMean

## [1] 5

# The mean of the simulation exercise we did is 5.05 while the
# theoretical mean of this distribution is 5. So they are very similar
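
To put a number on "very similar", one option is to express the gap between the two means in standard-error units. The sketch below is an illustration, not part of the original analysis; the names sim_means, se_of_average and z are introduced here. It re-seeds and skips the initial 40-number draw, so its values will not exactly match the output printed above.

```r
# Hypothetical re-creation of the simulated means so this sketch is self-contained
set.seed(10)
lambda <- 0.2
n <- 40
sim_means <- colMeans(replicate(1000, rexp(n, lambda)))

observed_mean <- mean(sim_means)
theoretical_mean <- 1 / lambda

# Standard error of the average of the 1000 simulated means
se_of_average <- sd(sim_means) / sqrt(length(sim_means))

# Gap between observed and theoretical mean, in standard-error units
z <- (observed_mean - theoretical_mean) / se_of_average
c(observed = observed_mean, theoretical = theoretical_mean, z = z)
```

A value of z within roughly plus or minus 2 would suggest the difference is in line with ordinary simulation noise.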

Comparing sample variance and theoretical variance of distribution

# We have 1000 numbers representing the mean of each simulation
# We can calculate the standard deviation of this distribution
Std_Dev_of_Sample = sd(Mean_of_each_simulation)
Std_Dev_of_Sample

## [1] 0.7995393

# We can also calculate the variance of the sample means
Variance_of_Sample = Std_Dev_of_Sample^2
Variance_of_Sample

## [1] 0.6392631

# The standard deviation of the exponential distribution itself is 1/lambda,
# so the theoretical standard deviation of the mean of n draws is
# given by (1/lambda)/sqrt(n)
Theoretical_std_dev = (1/lambda)/sqrt(40)
Theoretical_std_dev

## [1] 0.7905694

Theoretical_variance = Theoretical_std_dev^2
Theoretical_variance

## [1] 0.625

# The variance of the simulated sample means is 0.64 while the
# theoretical variance of the sample mean is 0.625. So they are very similar
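
The figure 0.625 comes from dividing the variance of a single exponential draw, (1/lambda)^2 = 25, by the number of draws per simulation, n = 40. The sketch below illustrates that relationship on a freshly simulated matrix; the names sims, var_of_draws and var_of_means are introduced here, and the exact values depend on the seed.

```r
# Hypothetical re-creation of the simulation so this sketch is self-contained
set.seed(10)
lambda <- 0.2
n <- 40
sims <- replicate(1000, rexp(n, lambda))

# Variance of the individual draws: expected to be near (1/lambda)^2 = 25
var_of_draws <- var(as.vector(sims))

# Variance of the 1000 sample means: expected to be near (1/lambda)^2 / n = 0.625
var_of_means <- var(colMeans(sims))

# The ratio of the two should be roughly n
c(draws = var_of_draws, means = var_of_means, ratio = var_of_draws / var_of_means)
```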

Show that the distribution is approximately normal

# Let's plot the histogram and overlay it with the estimated density curve
hist(Mean_of_each_simulation, breaks=40,prob=TRUE)
lines(density(Mean_of_each_simulation),lwd=10,col='red')
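
The density curve above is estimated from the simulated means themselves, so it does not by itself show agreement with a normal distribution. One possible check, sketched here as an assumption about how the comparison could be made, is to overlay the normal density with mean 1/lambda and standard deviation (1/lambda)/sqrt(n), and to draw a normal Q-Q plot. The name sim_means and the plot labels are introduced for illustration.

```r
# Hypothetical re-creation of the simulated means so this sketch is self-contained
set.seed(10)
lambda <- 0.2
n <- 40
sim_means <- colMeans(replicate(1000, rexp(n, lambda)))

# Histogram on the density scale so a density curve can be overlaid
hist(sim_means, breaks = 40, freq = FALSE,
     main = "Sample means vs. normal curve", xlab = "Mean of 40 draws")

# Normal density with the theoretical mean and standard deviation of the sample mean
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n)),
      add = TRUE, col = "blue", lwd = 2)

# Normal Q-Q plot: points close to the reference line suggest approximate normality
qqnorm(sim_means)
qqline(sim_means, col = "red")
```

If the histogram follows the blue curve and the Q-Q points stay near the line except in the far tails, the sample means can reasonably be described as approximately normal.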