To compare the exponential distribution in R, i.e rexp(n,lambda) where n is the number of observations and lambda is the rate parameter, we will set a seed (for reproducibility) and take 1000 simulations of 40 draws from the distribution. We’ll then explore the the sample statistics (mean and variance) and how they compare to the known theoretical statistics (1/lambda, and [1/lambda]^2, respectively).
Here we set the seed and create our data and also add two columns, the mean and variance of each row (in the original 1000x40 dataset).
set.seed(444) #for reproducibility
lambda <- .2
data = matrix(data = NA, nrow = 1000, ncol = 40) #an empty matrix, 1000x40
for( i in 1:1000) data[i,] <- rexp(40,lambda) #each row is 40 obs from rexp
datameans <- apply(data,1,mean) #for each row, get the mean
datavars <- apply(data,1,var)
data <- cbind(data, datameans,datavars)
data <- as.data.frame(data)
ggplot(data = data, aes(data$datameans)) + geom_histogram(
breaks = seq(0,10, by=.25),col = "red", fill="blue", alpha = .5) +
labs(title = "Histogram of Means") +
labs( x = "Sample Mean", y = "Frequency")
This histogram shows a clear centering around 5, which fits our theoretical mean of 1/lambda, i.e 1/.2 = 5. It is approximately normally distributed as it is symmetric around the mean with no skew.
tbl <- rbind(1/lambda, mean(datameans), (1/lambda)-mean(datameans),
100*((1/lambda)-mean(datameans))/(1/lambda))
colnames(tbl) <- c("Mean")
rownames(tbl) <- c("Theoretical","Sample Estimate","Difference","Difference as %")
tbl
## Mean
## Theoretical 5.00000000
## Sample Estimate 4.96354641
## Difference 0.03645359
## Difference as % 0.72907188
It is clear from the table that the sample mean is an extremely close approximation of the theoretical mean when given 1000 samples of 40 observations. It is only 0.72% away, a magnitude of .0364 away.
ggplot(data, aes(data$datavars)) + geom_histogram(
col = "blue", fill = "green",alpha=.5) +
labs(title = "Histogram of Variances")+
labs(x = "Sample Variances",y="Frequency")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This distribution is skewed, it looks similar to a chi-squared distribution which we did not discuss in this class. Nonetheless, it’s mean is approximately equal to our theoretical mean as shown in the table below.
theoreticalvar <- (1/lambda)^2
vartbl <- rbind(theoreticalvar, mean(datavars),theoreticalvar - mean(datavars),
100*(theoreticalvar - mean(datavars))/theoreticalvar)
colnames(vartbl) <- c("Variance")
tbl2 <- cbind(tbl,vartbl)
tbl2
## Mean Variance
## Theoretical 5.00000000 25.000000
## Sample Estimate 4.96354641 24.536923
## Difference 0.03645359 0.463077
## Difference as % 0.72907188 1.852308
The mean sample variance is approximately equal to the theoretical variance sd^2, i.e. (1/lambda)^2 or 25 for lambda = 0.2 The difference is only .46, a less than 2% difference from the theoretical variance. A high number of samples would converge to the theoretical variance.