R is well equipped with functionality for simulating, modeling and analyzing data. This is a basic simulation of exponential data, and then comparing that data with a theoretical model.
We start by simulating 1000 trials with 40 observations each. This is accomplished using a for loop and the rexp() function. The means and standard deviations for each trial are saved in respective data frames.
lambda <- 0.2
n <- 40
N <- 1000
## containers for collected sample mean and sample standard deviations
xbars = numeric(N)
sds = numeric(N)
for (i in 1:N) {
set.seed(i);makesample = rexp(n, lambda)
xbars[i] = mean(makesample)
sds[i] = sd(makesample)
}
The raw data as a scatter plot comes out of the simulation looking like figure 1. We can see that the data becomes more and more concentrated around 0 which indicates an skewed distribution.
The Exponential distribution can be visualized by using a histogram to model the probability density.
In this section we evaluate the simulated exponential data against the theoretical model.
Using the data we can calculate the simulated and theoretical means.
sample_mean = mean(xbars)
sample_mean
## [1] 5.002327
theor_mean = 1/lambda
theor_mean
## [1] 5
hist(xbars, 22, main = "Figure 2 - Distribution of Sample Means")
abline(v=5,lw=4,col="red")
The results show that the sample mean is off by only .011. In the figure below we can further see that the distirbution of the means is approximately normal.
Next we calculate the standard deviations and variances.
sample_var <- var(xbars)
sample_var
## [1] 0.6308244
theor_var <- (1/lambda^2)/n
theor_var
## [1] 0.625
The variances match up closely. Below is a chart comparing theoretical standard deviation distribution versus the standard deviation of the sample data.
Finally, we can test the normality using a Normal Probability (“q-q”) plot.