One of the great advantages of using R is the ability to simulate samples from various probability distributions and statistical models, which would otherwise be computationally intensive to study. In this report we analyse the exponential distribution, simulated in R with rexp(n, lambda), where \(\lambda\) is the rate parameter; both its mean and its standard deviation are \(1/\lambda\). We study the distribution of averages of 40 exponential(0.2) random variables.
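As a quick sanity check of the claim that the mean and the standard deviation of an exponential(\(\lambda\)) are both \(1/\lambda\), the sketch below draws a large sample with `rexp()` and compares its moments with 5. The sample size of 100,000 is an arbitrary choice for this illustration and is not part of the original analysis.

```r
lambda <- 0.2                  # rate parameter
set.seed(284)                  # reproducible draws
x <- rexp(1e5, rate = lambda)  # 100,000 single exponential draws (size chosen for illustration)

# Both moments should be close to 1 / lambda = 5
print(round(c(mean = mean(x), sd = sd(x), theory = 1 / lambda), 3))
```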
In this project we illustrate the distribution of the mean of 40 exponentials by answering the following questions:
1. Show the sample mean and compare it to the theoretical mean of the distribution.
2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
3. Show that the distribution is approximately normal.
First, we compute the means of 1000 simulations, each containing 40 observations, and compare them against the theoretical mean.
The sample data are generated with lambda = 0.2, n = 40 and 1000 simulations.
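lambda <- 0.2; n <- 40; numsim <- 1000  # rate, observations per simulation, number of simulations
set.seed(284)  # make the simulation reproducible
means <- data.frame(x = sapply(1:numsim, function(i) mean(rexp(n, lambda))))  # one mean per simulation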
Now calculate the mean of the 1000 simulated means and the theoretical mean of the exponential distribution:
sample_mean <- mean(means$x)
theor_mean <- 1/lambda
The mean of the 1000 simulated sample means is 4.969464, which is very close to the theoretical mean of 5.
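A compact, self-contained sketch of the mean comparison, assuming the same parameters as above (lambda = 0.2, n = 40, 1000 simulations, seed 284). It uses `replicate()` in place of `sapply()`. Both evaluate the same expression in the same order, but the names `sim_means`, `comparison` and `rel_error` are illustrative additions, not taken from the original report.

```r
lambda <- 0.2   # rate of the exponential distribution
n      <- 40    # observations averaged in each simulation
numsim <- 1000  # number of simulated sample means

set.seed(284)
# replicate() evaluates the expression numsim times and returns a numeric vector
sim_means <- replicate(numsim, mean(rexp(n, lambda)))

sample_mean <- mean(sim_means)  # centre of the simulated sampling distribution
theor_mean  <- 1 / lambda       # theoretical mean, 1 / 0.2 = 5

comparison <- c(sample = sample_mean, theoretical = theor_mean)
rel_error  <- 100 * abs(sample_mean - theor_mean) / theor_mean  # percent difference
print(round(comparison, 4))
cat(sprintf("Relative difference: %.2f%%\n", rel_error))
```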
The histogram shows that the sample mean and the theoretical mean are very close to each other.
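The histogram itself is not reproduced in this text. A minimal base-graphics sketch of such a plot might look like the following; the bin count, colours and labels are assumptions, not the settings used for the original figure.

```r
lambda <- 0.2; n <- 40; numsim <- 1000  # same parameters as the report
set.seed(284)
sim_means <- replicate(numsim, mean(rexp(n, lambda)))

# Histogram of the simulated means with both means marked as vertical lines
h <- hist(sim_means, breaks = 30, col = "lightgray",
          main = "Means of 40 exponentials (lambda = 0.2)",
          xlab = "Sample mean")
abline(v = mean(sim_means), col = "blue", lwd = 2)     # simulated mean
abline(v = 1 / lambda, col = "red", lwd = 2, lty = 2)  # theoretical mean
legend("topright", legend = c("sample mean", "theoretical mean"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)
```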
Next, we compare the variance of the 1000 simulated sample means with the theoretical variance. Because the 40 observations in each simulation are iid, the variance of a sample mean is \(\mathrm{Var}(\bar{X}) = \sigma^2/n\); equivalently, multiplying the variance of the entries in the means vector by the sample size, 40, estimates the population variance: \(\sigma^2 \approx \mathrm{Var}(\bar{X}) \times n\).
As with the mean, the variance of the sample means, 0.5905787, is close to the theoretical variance of the sampling distribution, \(\sigma^2 / n = 1/(\lambda^2 n) = 1/(0.04 \times 40) = 0.625\).
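The variance comparison, including the rescaling by n described above, can be sketched as follows under the same assumed parameters. `pop_var_est` is a hypothetical name for the estimate \(\mathrm{Var}(\bar{X}) \times n\), which should land near \(1/\lambda^2 = 25\).

```r
lambda <- 0.2; n <- 40; numsim <- 1000  # same parameters as the report
set.seed(284)
sim_means <- replicate(numsim, mean(rexp(n, lambda)))

sample_var <- var(sim_means)      # variance of the 1000 sample means
theor_var  <- 1 / (lambda^2 * n)  # sigma^2 / n = 1 / (0.04 * 40) = 0.625

# Multiplying by n rescales Var(sample means) into an estimate of
# the population variance sigma^2 = 1 / lambda^2 = 25
pop_var_est <- sample_var * n

print(round(c(sample_var = sample_var, theor_var = theor_var,
              pop_var_est = pop_var_est, pop_var = 1 / lambda^2), 4))
```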
The CLT states that as the sample size increases, the distribution of averages of iid variables (properly normalized) approaches that of a standard normal. We therefore compare the distribution of a large collection of random exponentials with the distribution of a large collection of averages of 40 exponentials by plotting both.
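"Properly normalized" means subtracting the population mean and dividing by the standard error \(\sigma/\sqrt{n}\). The sketch below applies that standardization to 1000 simulated means and checks three properties of a standard normal; the 1.96 cutoff (about 95% coverage) is an illustrative choice, not something the original report uses.

```r
lambda <- 0.2; n <- 40; numsim <- 1000  # same parameters as the report
mu    <- 1 / lambda  # population mean of an exponential(lambda)
sigma <- 1 / lambda  # population standard deviation (equal to the mean)

set.seed(284)
sim_means <- replicate(numsim, mean(rexp(n, lambda)))

# Standardize: subtract mu, divide by the standard error sigma / sqrt(n)
z <- (sim_means - mu) / (sigma / sqrt(n))

# A standard normal has mean 0, sd 1 and about 95% of its mass in (-1.96, 1.96)
print(round(c(mean = mean(z), sd = sd(z), within_1.96 = mean(abs(z) < 1.96)), 3))
```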
We first generate 10,000 simulations, each of sample size 40, using lambda = 0.2, as below, and then compare the distributions by plotting them as histograms, as shown in the figure below.
set.seed(284)
bignumsim <- 10000  # number of simulated means for the normality check
bigmeans <- data.frame(x = sapply(1:bignumsim, function(i) mean(rexp(n, lambda))))  # reuses n and lambda from above
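The code above only simulates the averages. A sketch of the full comparison, assuming we also draw 10,000 single exponentials, might look like this. `raw_exps` and the helper `skew()` are hypothetical names introduced here, and because the single draws consume random numbers first, `bigmeans` will not exactly match the report's values.

```r
lambda <- 0.2; n <- 40; bignumsim <- 10000  # same parameters as the report
set.seed(284)
raw_exps <- rexp(bignumsim, lambda)                      # 10,000 single exponentials
bigmeans <- replicate(bignumsim, mean(rexp(n, lambda)))  # 10,000 averages of 40

# Side-by-side histograms of the two collections
op <- par(mfrow = c(1, 2))
hist(raw_exps, breaks = 50, main = "10,000 exponentials", xlab = "x")
hist(bigmeans, breaks = 50, main = "10,000 means of 40", xlab = "Sample mean")
par(op)

# Hypothetical helper: sample skewness (third standardized moment)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

# Theory: about 2 for an exponential, 2 / sqrt(40) (about 0.32) for the mean of 40
print(round(c(raw = skew(raw_exps), means = skew(bigmeans)), 3))
```

The skewness drops sharply once observations are averaged, which is the numerical counterpart of the histograms changing from a long right tail to a near-symmetric bell shape.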
The overlaid curve makes it clear that, for \(\lambda = 0.2\), the distribution of the means of randomly sampled exponentials is approximately normal: the histogram overlaps very closely with the normal density.
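One way to draw such an overlay, plus a Q-Q plot as an additional check, is sketched below. It assumes the normal curve uses the theoretical mean \(1/\lambda\) and standard error \((1/\lambda)/\sqrt{n}\); the original figure may have used the sample estimates instead.

```r
lambda <- 0.2; n <- 40; bignumsim <- 10000  # same parameters as the report
set.seed(284)
bigmeans <- replicate(bignumsim, mean(rexp(n, lambda)))

mu <- 1 / lambda              # theoretical mean of the averages
se <- (1 / lambda) / sqrt(n)  # theoretical sd of the averages, sigma / sqrt(n)

# Density-scaled histogram with the N(mu, se^2) curve on top
hist(bigmeans, breaks = 50, freq = FALSE,
     main = "Means of 40 exponentials vs normal density", xlab = "Sample mean")
curve(dnorm(x, mean = mu, sd = se), add = TRUE, col = "red", lwd = 2)

# Q-Q plot: points close to the line indicate approximate normality
qqnorm(bigmeans); qqline(bigmeans, col = "red")

# Empirical vs theoretical quantiles as a numeric check
probs <- c(0.025, 0.5, 0.975)
print(round(rbind(empirical = quantile(bigmeans, probs),
                  normal    = qnorm(probs, mu, se)), 3))
```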
For \(\lambda = 0.2\), the distribution of sample means is approximately normal, with mean and variance that closely match the theoretical mean \(\mu = \frac{1}{\lambda}\) and variance \(\sigma^2/n = \frac{1}{\lambda^2 n}\). We also notice that as the sample size increases, the distribution of means follows the bell curve of a normal distribution, and conclude that it is approximately normal.