This report examines the relationship between the population mean and variance and the sample mean and variance, using computer simulation. The results are discussed in light of the Central Limit Theorem.
In our analysis, each sample consists of 40 points drawn from an exponential distribution with a lambda (rate) value of 0.2. The exponential distribution is a continuous distribution whose mean and standard deviation are both 1/lambda.
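The claim that both the mean and the standard deviation equal 1/lambda can be checked numerically. The following is a minimal sketch, assuming the rate 0.2 used in this report; the variable names `exp_mean`, `exp_m2` and `exp_sd` are illustrative and not part of the original analysis.

```r
# Rate of the exponential distribution, as stated in the text
lambda <- 0.2

# Mean: integral of x * f(x) over [0, Inf)
exp_mean <- integrate(function(x) x * dexp(x, rate = lambda), 0, Inf)$value

# Second moment, then sd = sqrt(E[X^2] - (E[X])^2)
exp_m2 <- integrate(function(x) x^2 * dexp(x, rate = lambda), 0, Inf)$value
exp_sd <- sqrt(exp_m2 - exp_mean^2)

# Both values should be close to 1 / lambda
print(c(mean = exp_mean, sd = exp_sd, one_over_lambda = 1 / lambda))
```

Up to numerical integration error, both quantities should agree with 1/lambda.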
1-) Let’s set the simulation parameters
# set the initial parameters for simulation for exponential distribution
lambda <- 0.2
# n is the sample size
n <- 40
# n_sim is the number of simulations
n_sim <- 1000
2-) Create a matrix of 1000 simulations, each containing 40 samples from an exponential distribution with lambda 0.2
sim_data_linear <- rexp(n * n_sim, rate = lambda)
sim_data <- matrix(sim_data_linear, ncol = n)
# each row is one sample set of 40
dim(sim_data)
## [1] 1000 40
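A note on the layout: `matrix()` fills column by column by default, so a row of `sim_data` is not a consecutive run of 40 draws from `sim_data_linear`. Because the draws are independent and identically distributed, each row is still a valid sample of 40, so the analysis is unaffected. The toy sketch below (with illustrative names `x`, `m_default` and `m_byrow`) shows the difference between the default fill and `byrow = TRUE`.

```r
# Toy vector standing in for the simulated draws
x <- 1:6

# Default: filled column by column
m_default <- matrix(x, ncol = 3)
print(m_default)

# byrow = TRUE: each row is a consecutive run of the input vector
m_byrow <- matrix(x, ncol = 3, byrow = TRUE)
print(m_byrow)
```

Either layout yields the same dimensions; only the assignment of individual draws to rows differs.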
Questions:
Q1) Show the sample mean and compare it to the theoretical mean of the distribution.
sample_mean <- apply(X = sim_data, MARGIN = 1, FUN = mean)
# Take a quick look at how the sample means are distributed
summary(sample_mean)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.910 4.478 4.984 5.007 5.537 7.580
# Below is the theoretical mean of exponential distribution
theoretical_mean <- 1/lambda
print(theoretical_mean)
## [1] 5
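As seen above, the mean of the sample means (5.007) is very close to 5, the population mean. This is expected: the sample mean is an unbiased estimator of the population mean, and the Central Limit Theorem says the distribution of sample means is centered there.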
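To put a number on "very close", the gap between the simulated and theoretical means can be compared with the standard error of the simulated mean. This is a minimal sketch that re-creates the simulation so it runs on its own; the seed and the names `se_grand` and `diff_from_theory` are assumptions added for illustration, so the values will differ from the output shown above.

```r
set.seed(1)  # hypothetical seed, not used in the original report
lambda <- 0.2
n <- 40
n_sim <- 1000

# One row per simulated sample of size n
sim_data <- matrix(rexp(n * n_sim, rate = lambda), nrow = n_sim, ncol = n)
sample_mean <- rowMeans(sim_data)  # same result as apply(sim_data, 1, mean)

# Standard error of the average of the n_sim sample means
se_grand <- sd(sample_mean) / sqrt(n_sim)

# Difference between simulated and theoretical mean
diff_from_theory <- mean(sample_mean) - 1 / lambda

print(c(grand_mean = mean(sample_mean), se = se_grand, diff = diff_from_theory))
```

A difference within a few standard errors is consistent with the theoretical mean of 5.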
Q2) Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
# Below is the variance of the sample mean
var_of_sample_mean <- var(sample_mean)
print(var_of_sample_mean)
## [1] 0.6023958
# By formula we can calculate the theoretical variance of sample mean
theor_var_of_sample_mean <- ((1/lambda)^2)/n
print(theor_var_of_sample_mean)
## [1] 0.625
The variance of the sample mean equals the population variance divided by the sample size: (1/lambda)^2/n = 0.625. This is also the variance of the limiting normal distribution in the Central Limit Theorem. Our simulation gives 0.602, which is close to the theoretical value.
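The formula follows from the independence of the 40 draws in each sample. A short derivation, writing σ² for the population variance (a symbol introduced here for convenience):

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^{2}}{n}
  = \frac{(1/\lambda)^{2}}{n}
  = \frac{25}{40}
  = 0.625
```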
Q3) Show that the distribution is approximately normal.
# Plot histogram of sample means
hist(sample_mean, breaks = 40)
As the Central Limit Theorem suggests, the sample means are approximately normally distributed, with a mean approximately equal to the population mean and a standard deviation of population standard deviation/sqrt(sample size).
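One way to make the comparison with the normal distribution more visible is to overlay the density suggested by the Central Limit Theorem on the histogram and to draw a normal Q-Q plot. This is a sketch under assumptions: the seed, the plot titles and the names `clt_mean` and `clt_sd` are illustrative additions, and the data are re-simulated so the block runs on its own.

```r
set.seed(1)  # hypothetical seed, not used in the original report
lambda <- 0.2
n <- 40
n_sim <- 1000
sample_mean <- rowMeans(matrix(rexp(n * n_sim, rate = lambda), nrow = n_sim))

# Parameters of the approximating normal distribution
clt_mean <- 1 / lambda
clt_sd <- (1 / lambda) / sqrt(n)

# Histogram on the density scale so the normal curve is comparable
hist(sample_mean, breaks = 40, freq = FALSE,
     main = "Sample means with normal density", xlab = "Sample mean")
curve(dnorm(x, mean = clt_mean, sd = clt_sd), add = TRUE, lwd = 2)

# Points close to the reference line indicate approximate normality
qqnorm(sample_mean)
qqline(sample_mean)
```

With a sample size of 40, some right skew may still be visible in the upper tail of the Q-Q plot, since the normal approximation is not exact.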
Below is the original dataset of 40000 values drawn from the exponential distribution. Its histogram is not Gaussian, but as we saw above, the sample means of this data are approximately Gaussian.
# Notice that the raw data (40000 values) from the exponential distribution has a skewed histogram
hist(sim_data_linear, breaks = 40)
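The contrast between the two histograms can also be summarised with sample skewness. An exponential distribution has skewness 2, while the mean of n independent draws has skewness 2/sqrt(n), about 0.32 for n = 40. The sketch below re-simulates the data with an assumed seed; `skewness` is a small helper defined here for illustration, not a base R function.

```r
set.seed(1)  # hypothetical seed, not used in the original report
lambda <- 0.2
n <- 40
n_sim <- 1000
sim_data_linear <- rexp(n * n_sim, rate = lambda)
sample_mean <- rowMeans(matrix(sim_data_linear, ncol = n))

# Illustrative helper: moment-based sample skewness
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

skew_raw <- skewness(sim_data_linear)   # raw exponential draws
skew_means <- skewness(sample_mean)     # means of samples of size n

print(c(raw = skew_raw, means = skew_means, theory_means = 2 / sqrt(n)))
```

The raw data should show skewness near 2, and the sample means a much smaller value, in line with the shapes of the two histograms.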