PART 1 PROJECT: A Simulation Exercise
In this project, we will simulate sets of numbers drawn from the exponential distribution, as opposed to numbers drawn from the normal distribution.
Simulating the numbers
# Simulate the exponential distribution in R using the rexp function
# This function takes 2 parameters: n (the number of values to draw)
# and rate (lambda, the rate parameter of the exponential distribution)
# Set seed for consistency
set.seed(10)
# Now simulate 40 numbers under the exponential distribution
# with a rate of 0.2
First_Simulation = rexp(40,0.2)
First_Simulation
## [1] 0.07478203 4.60110602 3.76079469 7.87520925 1.15829308
## [6] 5.43336502 11.63811436 3.64561911 6.44155052 3.36134143
## [11] 2.13264896 5.57710973 6.58273534 2.06646914 3.38287665
## [16] 8.16491940 0.35597038 12.84425807 8.72359347 1.46475033
## [21] 3.22652580 1.73461201 3.97541315 6.98149591 6.99053713
## [26] 5.55229720 0.85210798 9.59390665 0.83221970 4.85144995
## [31] 0.05285596 13.96504857 11.77305890 3.33619194 2.61121743
## [36] 0.73356449 3.75617153 11.44283106 0.34426784 1.41356577
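As a quick sanity check of the rexp() parameters, the minimal sketch below draws a much larger sample with the same rate and compares its mean and standard deviation with 1/lambda (for an exponential distribution both equal 1/lambda). The variable name big_sample, the seed, and the sample size of 100000 are arbitrary choices for illustration, not part of the original exercise.

```r
# Sketch: check that rexp(n, rate) has mean and sd close to 1/rate
set.seed(10)                                   # arbitrary seed, for reproducibility
lambda <- 0.2                                  # rate parameter, as in the exercise
big_sample <- rexp(n = 100000, rate = lambda)  # large sample (size chosen arbitrarily)
mean(big_sample)                               # should be close to 1/lambda = 5
sd(big_sample)                                 # should also be close to 1/lambda = 5
```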
# Now we don't want just one set of numbers. We want 1000 sets of these numbers
# We can use the replicate function in R to repeat this simulation 1000 times
Bulk_Simulation = replicate(1000,rexp(40,0.2))
#View(Bulk_Simulation)
# If you view this data set, you will see a matrix with 40 rows and 1000 columns;
# each column holds 40 numbers simulated from the exponential distribution with lambda = 0.2
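Instead of opening the viewer, the layout of the matrix built by replicate() can be checked directly. This is a small sketch; the name sims is introduced here for illustration.

```r
# Sketch: confirm the shape of the matrix returned by replicate()
set.seed(10)
sims <- replicate(1000, rexp(40, rate = 0.2))  # each call returns a length-40 vector
dim(sims)         # 40 rows (draws per simulation) by 1000 columns (simulations)
head(sims[, 1])   # the first few values of the first simulated set
```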
# Let's get the mean of each simulated set;
# this uses the colMeans function
Mean_of_each_simulation = colMeans(Bulk_Simulation)
# Mean_of_each_simulation
# The result is 1000 numbers, each corresponding to the mean
# of one simulation
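colMeans() is a vectorised shortcut for applying mean() to every column. The sketch below, using illustrative names means_fast and means_apply, shows that both approaches give the same 1000 values.

```r
# Sketch: colMeans() matches applying mean() to each column
set.seed(10)
sims <- replicate(1000, rexp(40, rate = 0.2))
means_fast  <- colMeans(sims)         # one mean per column, computed in one call
means_apply <- apply(sims, 2, mean)   # same computation, column by column
length(means_fast)                    # one mean per simulation
all.equal(means_fast, means_apply)    # should be TRUE (up to floating-point tolerance)
```

colMeans() is generally preferred here because it avoids calling mean() once per column.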
# Now I can calculate the sample mean of this distribution by
# averaging the 1000 simulation means (this is an estimate
# from the simulation, not the theoretical mean)
MeanOfSample = mean(Mean_of_each_simulation)
MeanOfSample
## [1] 5.046705
# Based on our 1000 simulated sets of 40 numbers each,
# the mean of the sample means is about 5.05
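The steps so far (simulate, take column means, average them) can be bundled into one function so the experiment is easy to rerun with other settings. simulate_means() is a hypothetical helper introduced here for illustration; it is not part of the original project. It builds the matrix with matrix() rather than replicate(), which produces draws with the same distribution and also keeps a matrix shape when n = 1.

```r
# Sketch: wrap the whole simulation in one hypothetical helper function
simulate_means <- function(nsim, n, lambda) {
  # n rows (draws per simulation) by nsim columns (simulations)
  sims <- matrix(rexp(n * nsim, rate = lambda), nrow = n)
  colMeans(sims)   # one sample mean per simulation
}

set.seed(10)
means_40 <- simulate_means(nsim = 1000, n = 40, lambda = 0.2)
mean(means_40)     # should be close to 1/lambda = 5
```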
# Now I can visualise the distribution of the mean of each simulation using the hist function
hist(Mean_of_each_simulation)
# I can then add a vertical red line marking
# the sample mean we just calculated
abline(v=MeanOfSample,col='red')
# The theoretical mean of an exponential distribution is 1/lambda
lambda = 0.2
TheoreticalMean = 1/lambda
TheoreticalMean
## [1] 5
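For reference, the 1/lambda formula follows from integrating against the exponential density f(x) = lambda e^(-lambda x) for x >= 0. By linearity of expectation, the mean of n draws has the same expected value, which is why the simulated means should center on 5.

```latex
E[X] = \int_0^\infty x \,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda},
\qquad
E[\bar{X}_n] = E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{\lambda} = \frac{1}{0.2} = 5
```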
# The mean from our simulation is about 5.05, while the
# theoretical mean of this distribution is 5, so they are very close
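To see both values on one chart, the sketch below redraws the histogram of simulated means with labelled axes, marking the sample mean in red and the theoretical mean 1/lambda as a dashed blue line. The name sim_means, the titles, and the colours are illustrative choices.

```r
# Sketch: histogram of simulated means with sample and theoretical means marked
set.seed(10)
lambda <- 0.2
sim_means <- colMeans(replicate(1000, rexp(40, rate = lambda)))
hist(sim_means,
     main = "Means of 1000 simulations (n = 40, lambda = 0.2)",
     xlab = "Sample mean")
abline(v = mean(sim_means), col = "red", lwd = 2)           # sample mean
abline(v = 1 / lambda, col = "blue", lwd = 2, lty = 2)      # theoretical mean
legend("topright", legend = c("Sample mean", "Theoretical mean"),
       col = c("red", "blue"), lty = c(1, 2), lwd = 2)
```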
# We have 1000 numbers, each the mean of one simulation
# We can calculate the standard deviation of these sample means
Std_Dev_of_Sample = sd(Mean_of_each_simulation)
Std_Dev_of_Sample
## [1] 0.7995393
# We can also calculate the sample variance
Variance_of_Sample = Std_Dev_of_Sample^2
Variance_of_Sample
## [1] 0.6392631
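R's var() returns the sample variance directly, using the same n - 1 denominator as sd(), so squaring the standard deviation and calling var() agree. The names v1 and v2 below are illustrative.

```r
# Sketch: var() gives the sample variance directly
set.seed(10)
sim_means <- colMeans(replicate(1000, rexp(40, rate = 0.2)))
v1 <- var(sim_means)      # sample variance (denominator n - 1)
v2 <- sd(sim_means)^2     # square of the sample standard deviation
all.equal(v1, v2)         # should be TRUE: sd() is the square root of var()
```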
# The standard deviation of the exponential distribution itself is 1/lambda;
# the theoretical standard deviation of the mean of n draws
# (the standard error) is (1/lambda)/sqrt(n)
Theoretical_std_dev = (1/lambda)/sqrt(40)
Theoretical_std_dev
## [1] 0.7905694
Theoretical_variance = Theoretical_std_dev^2
Theoretical_variance
## [1] 0.625
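The arithmetic behind these two numbers, assuming the 40 draws are independent (as rexp() produces them): the variance of a single exponential draw is 1/lambda^2, and averaging n independent draws divides the variance by n.

```latex
\operatorname{Var}(X) = \frac{1}{\lambda^2}, \qquad
\operatorname{Var}(\bar{X}_n) = \frac{\operatorname{Var}(X)}{n} = \frac{1}{\lambda^2 n}
  = \frac{1}{0.2^2 \cdot 40} = \frac{25}{40} = 0.625, \qquad
\operatorname{SD}(\bar{X}_n) = \sqrt{0.625} \approx 0.7906
```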
# The variance from our simulation is about 0.64, while the
# theoretical variance of the sample mean is 0.625, so they are very close
# Let's plot the histogram on a density scale and overlay a kernel density estimate
hist(Mean_of_each_simulation, breaks=40,prob=TRUE)
lines(density(Mean_of_each_simulation),lwd=10,col='red')
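To make the contrast with the normal distribution explicit, the sketch below adds the normal curve that the Central Limit Theorem predicts for the mean of 40 draws (mean 1/lambda, standard deviation (1/lambda)/sqrt(40)) on top of the histogram and density estimate, then draws a normal Q-Q plot. The names clt_mean, clt_sd, and sim_means, the line widths, and the colours are illustrative choices rather than part of the original project.

```r
# Sketch: compare the simulated means with the normal curve predicted by the CLT
set.seed(10)
lambda <- 0.2
n <- 40
sim_means <- colMeans(replicate(1000, rexp(n, rate = lambda)))
clt_mean <- 1 / lambda                 # theoretical mean, 5
clt_sd   <- (1 / lambda) / sqrt(n)     # theoretical standard error, about 0.79
hist(sim_means, breaks = 40, prob = TRUE,
     main = "Simulated means vs. normal approximation", xlab = "Sample mean")
lines(density(sim_means), lwd = 2, col = "red")        # empirical density
curve(dnorm(x, mean = clt_mean, sd = clt_sd),           # CLT normal density
      add = TRUE, lwd = 2, col = "blue", lty = 2)
# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(sim_means)
qqline(sim_means, col = "blue")
```

If the red and dashed blue curves roughly coincide and the Q-Q points follow the line, the distribution of the 1000 means is approximately normal, even though the individual exponential draws are strongly right-skewed.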