Key words: Asymptotics, the Central Limit Theorem, Sample Mean vs Theoretical Mean, Sample Variance vs Theoretical Variance, Distributions, Quantiles, Statistical Inference.

Introduction

The project consists of two parts: (i) a simulation exercise and (ii) a basic inferential data analysis; the former is presented here. Asymptotics, the Swiss Army knife of statistics, describe how statistics behave as the sample size (or some other relevant quantity) tends to infinity or to zero, and they form the basis for the frequency interpretation of probabilities.

An exponential distribution is the probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate. A thousand simulated samples, each of size \(n\) = 40, were drawn in R 4.0 to investigate the asymptotics of the [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution) as a particular case of the Central Limit Theorem (CLT).

The CLT says that, for large \(n\), the normalized variable \(\frac{\bar X_n - \mu}{\sigma / \sqrt{n}}\) is approximately normally distributed with a mean of 0 and a variance of 1:

\[ \frac{\bar X_n - \mu}{\sigma / \sqrt{n}} = \frac{\mbox{Estimate} - \mbox{Mean of estimate}}{\mbox{Std. Err. of estimate}}. \]
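The normalization can be checked numerically. The following is a minimal sketch, assuming \(\lambda\) = 0.2 and \(n\) = 40 as used in this report; the seed and the names `xbar` and `z` are hypothetical and chosen only for this illustration.

```r
# Sketch: standardize simulated sample means as in the CLT formula.
set.seed(1)                         # hypothetical seed, only for this example
lambda <- 0.2; n <- 40; nosim <- 1000
mu    <- 1 / lambda                 # population mean of Exp(lambda)
sigma <- 1 / lambda                 # population standard deviation of Exp(lambda)
# one row per simulated sample, then the mean of each row
xbar  <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nosim))
z     <- (xbar - mu) / (sigma / sqrt(n))   # normalized variable
c(mean = mean(z), variance = var(z))       # expected to be near 0 and 1
```

If the approximation holds, the printed mean should be close to 0 and the printed variance close to 1, up to simulation noise.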

The hypothesis tested was that the sampling distribution of the mean of 40 exponential draws is approximately normal, with a mean that matches the population mean and a variance that matches the theoretical result. It was found that the mean of the simulated sample means is close to the population mean, that their variance is close to the theoretical result, and that the empirical quantiles at 5%, 25%, 50%, 75% and 95% are similar to the corresponding quantiles of a normal distribution.

# R 4.0 environment setup:
library(knitr) # creating a pdf document
library(ggplot2) # making plots
library(dplyr) # exploring data
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(DataExplorer) # creating reports
## Warning: package 'DataExplorer' was built under R version 4.0.3

Exponential Distribution Simulation:

As a particular case of the CLT, 1000 trials of \(n\) = 40 draws each were simulated with the R function rexp(n, rate), where the rate parameter \(\lambda\) was set to 0.2. The mean and the standard deviation of an exponential distribution are both 1/\(\lambda\). The data were then explored with glimpse() and summary().
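For reference, the two moments quoted above follow from the exponential density \(f(x) = \lambda e^{-\lambda x}\) for \(x \ge 0\). The numeric values below assume \(\lambda\) = 0.2, as in this simulation:

\[ E[X] = \int_0^\infty x \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda} = 5, \qquad \mathrm{Var}(X) = E[X^2] - E[X]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2} = 25, \qquad \sigma = \frac{1}{\lambda} = 5. \]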

Create, explore & plot data

The seed of the pseudo-random number generator was fixed so that the exercise is replicable; the data were explored with the dplyr and DataExplorer libraries from CRAN.

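set.seed(42)   # fix the seed so the simulation is reproducible
lambda <- 0.2  # rate parameter of the exponential distribution
nosim <- 1000; n <- 40  # number of simulated samples and size of each sample
# one row per simulated sample: 1000 rows x 40 columns
SimulationMatrix_1 <- matrix(rexp(nosim * n, rate = lambda), nosim)
# the mean of each row is one simulated sample mean
Data_SM1 <- apply(SimulationMatrix_1, 1, mean)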
print("Here is a glimpse of the data:Data_SM1"); glimpse(Data_SM1); 
## [1] "Here is a glimpse of the data:Data_SM1"
##  num [1:1000] 6.94 5.38 5.45 5.5 6 ...
print("and a basic data summary:Data_SM1"); 
## [1] "and a basic data summary:Data_SM1"
summary(Data_SM1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.141   4.406   4.919   4.987   5.504   7.882
#create_report(Data_SM1), the histogram is interesting
plot(Data_SM1)

Sample Mean vs Theoretical Mean

We can get the mean of the simulated sample means with mean(); the theoretical mean is 1/\(\lambda\).

mean_Data <- mean(Data_SM1); print("The mean of the data, the sample mean: ");
## [1] "The mean of the data, the sample mean: "
mean_Data; 
## [1] 4.986508
TheoMean <- 1/lambda ; 
print("is close to the theoretical mean: ");
## [1] "is close to the theoretical mean: "
TheoMean
## [1] 5
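One way to judge whether the two means are "close" is to express the gap in units of the standard error. This is a minimal sketch on a fresh simulation, assuming the same \(\lambda\) and \(n\); the seed and the names `xbar`, `se` and `gap_in_se` are hypothetical and not part of the original analysis.

```r
# Sketch: compare the simulated mean with 1/lambda in standard-error units.
set.seed(1)                         # hypothetical seed, only for this example
lambda <- 0.2; n <- 40; nosim <- 1000
xbar <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nosim))
theo_mean <- 1 / lambda             # theoretical mean of the sample means
# standard error of the average of the nosim simulated sample means
se <- sd(xbar) / sqrt(nosim)
gap_in_se <- abs(mean(xbar) - theo_mean) / se
gap_in_se   # values below roughly 2 are consistent with simulation noise
```

A small value here suggests the difference between the simulated and theoretical means is within what random sampling alone would produce.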

After taking a peek at the report generated with create_report(), we see that the histogram has a bell-shaped curve; it looks like a normal distribution.

#create_report(Data_SM1), the histogram is interesting
hist(Data_SM1)
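The visual impression can be made more concrete by overlaying the normal density suggested by the CLT on a density-scaled histogram. This is a sketch on a fresh simulation with a hypothetical seed; the bin count and plot labels are arbitrary choices, not taken from the original report.

```r
# Sketch: histogram of simulated sample means with the CLT normal density on top.
set.seed(1)                         # hypothetical seed, only for this example
lambda <- 0.2; n <- 40; nosim <- 1000
xbar <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nosim))
# freq = FALSE puts the histogram on the density scale so the curve is comparable
h <- hist(xbar, breaks = 30, freq = FALSE,
          main = "Sample means of 40 exponentials", xlab = "Sample mean")
# normal density with mean 1/lambda and standard deviation (1/lambda)/sqrt(n)
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n)),
      add = TRUE, lwd = 2)
```

With only 40 draws per sample, a slight right skew relative to the curve would not be surprising, since the exponential distribution itself is strongly skewed.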

Sample Variance vs Theoretical Variance

We can get the variance of the simulated sample means by squaring the result of the sd() function; the theoretical value is \(\sigma^2/n = 1/(\lambda^2 n)\).

Variance_Data <- sd(Data_SM1)^2 ; 
print("The variance of the data is: "); 
## [1] "The variance of the data is: "
Variance_Data
## [1] 0.6793521
TheoVar <- 1/((lambda^2)*n); 
print("is close to the theoretical variance: "); 
## [1] "is close to the theoretical variance: "
TheoVar
## [1] 0.625
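The theoretical value stored in TheoVar can be derived in one line, assuming the 40 draws in each sample are independent with common variance \(\sigma^2 = 1/\lambda^2\):

\[ \mathrm{Var}(\bar X_n) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i) = \frac{\sigma^2}{n} = \frac{1}{\lambda^2 n} = \frac{1}{0.2^2 \times 40} = 0.625. \]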

Comparing Distributions

The easiest way to compare the data with the theoretical distribution is to use quantile() to get the quantiles of the data and qnorm() to get the quantiles of a normal distribution with the same mean and standard deviation, then compare the two sets of values.

# First let's create a vector of useful quantiles
Quantiles_DatavsTheo <-c(0.05, 0.25, 0.5, 0.75, 0.95);
print("The distribution of quantiles, for the data:"); 
## [1] "The distribution of quantiles, for the data:"
quantile(Data_SM1,  Quantiles_DatavsTheo); 
##       5%      25%      50%      75%      95% 
## 3.750774 4.406170 4.919282 5.504413 6.485176
print("is close to the theoretical quantiles:"); 
## [1] "is close to the theoretical quantiles:"
qnorm(Quantiles_DatavsTheo, mean = mean(Data_SM1), sd = sd(Data_SM1))
## [1] 3.630774 4.430575 4.986508 5.542442 6.342243
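The qnorm() call above uses the mean and standard deviation estimated from the data. As a complementary check, the same quantiles can be computed from the purely theoretical parameters, mean 1/\(\lambda\) and standard deviation (1/\(\lambda\))/\(\sqrt{n}\). This is a sketch on a fresh simulation; the seed and the names `probs`, `clt_q` and `data_q` are hypothetical.

```r
# Sketch: empirical quantiles vs quantiles of the normal implied by the CLT.
set.seed(1)                         # hypothetical seed, only for this example
lambda <- 0.2; n <- 40; nosim <- 1000
xbar  <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nosim))
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
# normal quantiles using theoretical parameters rather than estimates
clt_q  <- qnorm(probs, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n))
data_q <- quantile(xbar, probs)     # empirical quantiles of the sample means
round(rbind(data = data_q, clt = clt_q), 3)
```

Printing the two rows side by side makes it easy to see where the agreement is tightest (typically near the median) and where the tails differ.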

Summary

For the exponential distribution simulated above, the mean of the sample means (4.99) is close to the population mean (5), their variance (0.68) is close to the theoretical result (0.625), and the quantiles at 5%, 25%, 50%, 75% and 95% are similar to the quantiles of a normal distribution with the same mean and standard deviation.

After-note: