In this project, we will explore the exponential distribution in R. We will investigate the distribution of averages of exponential random variables, taking a sample size of 40 and running 1000 simulations using the rexp() function. We will compare the theoretically expected value and variance of this distribution with the results of our simulations, and investigate whether the distribution is approximately normal.
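The general exponential function can be represented as \(f(x) = a^x\), where \(a\) is any value greater than 0. The rexp() function draws random values from the exponential distribution, whose density \(f(x) = \lambda e^{-\lambda x}\) for \(x \ge 0\) is built on this function. For more information on the exponential function, see Appendix - Section A.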
The expected value of the distribution of averages of samples of exponentials (i.e., the theoretical mean of the sampling distribution) is equal to the population mean (\(\mu\)). For an exponential distribution with \(rate = \lambda = 0.2\), the mean is \(\mu = 1/\lambda\).
Thus, the theoretical mean of the distribution is 5.
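As a quick sanity check on \(\mu = 1/\lambda\), the mean can also be obtained numerically by integrating \(x\) times the exponential density. This is only an illustrative sketch; the name `num_mean` is introduced here and is not part of the original analysis.

```r
# Rate used throughout the report
lambda <- 0.2

# E[X] is the integral over [0, Inf) of x * f(x),
# where f is the exponential density given by dexp().
num_mean <- integrate(function(x) x * dexp(x, rate = lambda),
                      lower = 0, upper = Inf)$value

# Compare the numerical result with the closed form 1/lambda
c(numerical = num_mean, closed_form = 1 / lambda)
```

Both values should agree up to the numerical tolerance of integrate().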
Now, let’s take the mean of the averages of 1000 samples of 40 exponentials (i.e., the sample mean of the sampling distribution). We will perform the simulation in R.
The sample mean is 4.97.
As expected, the sample mean is very close to the theoretical mean.
We can illustrate this by plotting the simulated sampling distribution and marking both its sample mean and the theoretical mean.
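To put "very close" on a scale, one option is to compare the gap between 4.97 and 5 with the standard error of the average of 1000 sample means. The sketch below assumes independent simulations; the names `var_xbar`, `se_grand` and `z` are hypothetical and introduced only for this illustration.

```r
lambda <- 0.2
n <- 40        # sample size
nosim <- 1000  # number of simulated samples

theo_mean <- 1 / lambda

# Variance of a single sample mean: sigma^2 / n
var_xbar <- (1 / lambda)^2 / n

# Standard error of the average of nosim independent sample means
se_grand <- sqrt(var_xbar / nosim)

# Distance of the reported sample mean (4.97, as rounded in the report)
# from the theoretical mean, in standard-error units
z <- (4.97 - theo_mean) / se_grand

c(standard_error = se_grand, z = z)
```

A gap of roughly one standard error is well within what random variation alone would produce.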
Population Variance vs. Sample Variance:
The sample variance is an unbiased estimator of the population variance. This means that the distribution of the variances of multiple samples will be centred at the population variance.
To test this, let’s first compute the population variance (\(\sigma^2\)) of an exponential distribution with \(rate = \lambda = 0.2\). Since \(\sigma = 1/\lambda = 5\), the population variance comes to \(\sigma^2 = 25\).
Now, for the sample variance, let’s take the average of the variances of 1000 samples of 40 exponentials. This comes to \(s^2 = 25.14\).
Thus, the sample variances are centred very close to the population variance, as expected. Let’s illustrate the distribution of the sample variances, together with the population variance and the average sample variance, in the form of a plot.
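Unbiasedness depends on dividing by \(n - 1\), which is what R's var() does. As a rough illustration, the sketch below contrasts it with the version that divides by \(n\), which is centred slightly below the population variance. The seed and the names `samples`, `s2_unbiased` and `s2_biased` are assumptions made for this example, so the numbers will not match those reported above.

```r
set.seed(1)  # arbitrary seed (an assumption), used only for reproducibility

lambda <- 0.2
n <- 40
nosim <- 1000

# One simulated sample per row
samples <- matrix(rexp(nosim * n, rate = lambda), nrow = nosim)

# var() divides by n - 1 (the unbiased estimator)
s2_unbiased <- apply(samples, 1, var)

# Rescaling gives the estimator that divides by n (biased downwards)
s2_biased <- s2_unbiased * (n - 1) / n

c(population    = (1 / lambda)^2,
  mean_unbiased = mean(s2_unbiased),
  mean_biased   = mean(s2_biased))
```

With \(n = 40\) the two estimators differ by a factor of 39/40, so the bias is small but systematic.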
Theoretical Variance vs. Sample Variance of the Sampling Distribution:
The theoretical variance of the sampling distribution is equal to the population variance divided by the sample size (\(\sigma^2/n\)). For an exponential distribution with \(rate = \lambda = 0.2\), \(\sigma = 1/\lambda = 5\) and \(n = 40\), the theoretical variance comes to \(25/40 = 0.625\).
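The result \(\sigma^2/n\) follows from the independence of the observations \(x_1, \dots, x_n\) within a sample. A short derivation, written in the notation used above:

\[
\operatorname{Var}(\overline{x})
= \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i)
= \frac{n\sigma^2}{n^2}
= \frac{\sigma^2}{n}
= \frac{25}{40}
= 0.625
\]

The second equality is the step that uses independence: the variance of a sum of independent variables is the sum of their variances.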
Now, let’s take the variance of the averages of 1000 samples of 40 exponentials (i.e., the sample variance of the sampling distribution). We will perform the simulation in R.
The sample variance is 0.6331.
As we expected, the value of the sample variance of the sampling distribution is very close to the value of the theoretical variance.
Is the sampling distribution approximately normal? Yes, it is. We will support this first using the CLT, and then by using a plot.
The Central Limit Theorem:
“The Central Limit Theorem (CLT) is one of the most important theorems in statistics…the CLT states that the distribution of averages of iid (independent and identically distributed) variables becomes that of a standard normal as the sample size increases.” - Dr. Brian Caffo, Statistical Inference For Data Science [2]
We have used the distribution of averages of iid exponentials. So far so good. But is our sample size of 40 enough for the distribution to approximate a normal distribution?
To answer this question, let’s turn to another authority on the CLT.
“General statistical practice is to assume that, for most applications, the sampling distribution of \(\overline{x}\) can be approximated by a normal distribution whenever the sample size is 30 or more.” - Anderson, Sweeney & Williams, Statistics For Business And Economics [3]
Given that 40 > 30, we can reasonably expect the distribution of 1000 averages of 40 exponentials to approximate a normal distribution.
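The rule of thumb can be examined through skewness. The average of \(n\) iid exponentials follows a gamma distribution with skewness \(2/\sqrt{n}\), whereas a normal distribution has skewness 0. The sketch below estimates the skewness of simulated means for a few sample sizes. The seed, the helper `skewness()`, the sample sizes other than 40, and the larger number of simulations are all assumptions made for illustration.

```r
set.seed(2)  # arbitrary seed (an assumption), used only for reproducibility

lambda <- 0.2
nosim <- 10000  # more simulations than in the report, for steadier estimates

# Sample skewness: third central moment divided by the cubed standard deviation
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

sizes <- c(5, 40, 200)

# Skewness of the simulated sample means for each sample size
skew_sim <- sapply(sizes, function(n) {
  means <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nrow = nosim))
  skewness(means)
})

data.frame(n = sizes,
           simulated = round(skew_sim, 3),
           theoretical = round(2 / sqrt(sizes), 3))
```

The skewness shrinks as \(n\) grows but is not zero at \(n = 40\), which is why the sampling distribution is described as approximately normal rather than exactly normal.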
Density Plot:
To check that our distribution is indeed approximately normal, we will compare the density plot of our distribution (1000 sample means, sample size = 40) with that of a normal distribution (1000000 random normals). For both distributions, \(mean = \mu = 5\) and \(variance = \sigma^2/n = 0.625\). The density plot is as follows; it supports the conclusion that our distribution is approximately normal.
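A numerical complement to the density plot is to standardise the simulated means and check how many fall within \(\pm 1.96\) standard errors of \(\mu\); under a normal distribution this proportion is about 0.95. This is a sketch under assumed settings: the seed and the names `means`, `z` and `coverage` are introduced here only for illustration.

```r
set.seed(3)  # arbitrary seed (an assumption), used only for reproducibility

lambda <- 0.2
n <- 40
nosim <- 1000

mu <- 1 / lambda
se <- (1 / lambda) / sqrt(n)  # standard deviation of the sampling distribution

# Simulated sample means, one per row of the matrix
means <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nrow = nosim))

# Standardise using the theoretical mean and standard error
z <- (means - mu) / se

# Share of standardised means inside the central 95% normal range
coverage <- mean(abs(z) <= qnorm(0.975))
coverage

# Optional visual check with base R:
# qqnorm(z); qqline(z)
```

A value near 0.95 is consistent with approximate normality, although it does not by itself rule out mild skewness.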
The general exponential function can be represented as \(f(x) = a^x\), where \(a\) is any value greater than 0.
The properties of the exponential function depend on the value of \(a\):
Other important properties of the exponential function are:
Theoretical Mean of the Sampling Distribution:
lambda = 0.2
theomean = 1/lambda
Sample Mean of the Sampling Distribution:
###Sample Mean of the Sampling Distribution of Exponential functions
#1000 simulations with sample size = 40
nosim <- 1000
n <- 40
lambda <- 0.2
#Simulate mean of 1000 averages of 40 exponentials
sampmean40 <- round(mean(apply(matrix(rexp(nosim * n, rate = lambda), nosim), 1, mean)),2)
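The simulation above has no fixed seed, so each run gives a slightly different value. A possible variant, shown only as a sketch, fixes a seed and uses rowMeans() in place of apply(). The seed value and the names `draws`, `means_apply`, `means_fast` and `sampmean_sketch` are assumptions, and the result will not reproduce the 4.97 reported in the text.

```r
set.seed(42)  # arbitrary seed (an assumption), used only for reproducibility

nosim <- 1000
n <- 40
lambda <- 0.2

# One simulated sample of 40 exponentials per row
draws <- matrix(rexp(nosim * n, rate = lambda), nrow = nosim)

# Row-wise means computed two equivalent ways
means_apply <- apply(draws, 1, mean)
means_fast <- rowMeans(draws)

# Mean of the 1000 sample means, rounded as in the report
sampmean_sketch <- round(mean(means_fast), 2)
sampmean_sketch
```

Storing the matrix of draws once also lets the same simulated data feed both the summary statistic and any plot built from it.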
Distribution of 1000 means of samples of 40 exponentials:
###Plotting distribution of 1000 means of samples of 40 exponentials...
###...with a comparison of the theoretical and sample means
#Install and load the tidyverse set of packages
#install.packages("tidyverse") #Remove the comment sign if not already installed
library(tidyverse)
#Simulate the sample means (a fresh draw, so its mean differs slightly from sampmean40)
x1 <- matrix(rexp(nosim * n, rate = lambda), nosim)
x2 <- apply(x1, 1, mean)
x2 <- tibble(value = x2)
#Plot the histogram on the density scale (after_stat() and linewidth need ggplot2 >= 3.4.0)
ggplot(x2, aes(x = value)) +
  geom_histogram(aes(y = after_stat(density)), fill = "seagreen2",
                 colour = "black", bins = 20, boundary = 0) +
  geom_vline(xintercept = theomean, colour = "black", linewidth = 1.5) +
  geom_vline(xintercept = sampmean40, colour = "red", linewidth = 1.5) +
  labs(title = "Sampling distribution of 1000 averages of 40 exponentials",
       subtitle = "Theoretical mean (black) vs. sample mean (red)",
       x = "Distribution of sample means",
       y = "Density of sample means")
Population Variance:
lambda = 0.2
sigma = 1/lambda
popvar = sigma^2
Average of Distribution of Variances of Exponential Functions - i.e. Sample Variance:
sampvarnorm40 <- round(mean(apply(matrix(rexp(nosim * n, rate = lambda), nosim), 1, var)),2)
Distribution of 1000 variances of samples of 40 exponentials:
###Distribution of the variances of multiple samples from the exponential distribution
#1000 simulations with sample size = 40
nosim <- 1000
n <- 40
#Distribution of 1000 variances of samples of 40 exponentials
x1 <- matrix(rexp(nosim * n, rate = lambda), nosim)
x2 <- apply(x1, 1, var)
x2 <- tibble(value = x2)
#geom_density() plots the density by default (linewidth needs ggplot2 >= 3.4.0)
ggplot(x2, aes(x = value)) +
  geom_density(fill = "seagreen2", colour = "black", linewidth = 2) +
  geom_vline(xintercept = popvar, colour = "black", linewidth = 0.5) +
  geom_vline(xintercept = sampvarnorm40, colour = "red", linewidth = 0.5) +
  labs(title = "Distribution of exponential variances",
       subtitle = "Population var. (black) vs. sample var. (red)",
       x = "Distribution of sample variances",
       y = "Density of sample variances") +
  #annotate() draws each label once; geom_label() with data = x2 would draw it once per row
  annotate("label", x = 36, y = 0.04, label = popvar, colour = "black") +
  annotate("label", x = 36, y = 0.032, label = sampvarnorm40, colour = "red")
Theoretical Variance of the Sampling Distribution:
lambda = 0.2
sigma = 1/lambda
theovar = sigma^2/n
Sample Variance of the Sampling Distribution:
###Sample Variance of Sampling Distribution of the Exponential function
#1000 simulations with sample size = 40
nosim <- 1000
n <- 40
#Simulate variance of 1000 averages of 40 exponentials
sampvar40 <- round(var(apply(matrix(rexp(nosim * n, rate = lambda), nosim), 1, mean)),4)
Density Plot:
###Comparison of sampling distribution of exponential means with normal distribution
#Setting the parameters
lambda <- 0.2
mu <- 1/lambda
sigma <- 1/lambda
nosim <- 1000
n <- 40
#Constructing the data frames for the plot
x1 <- matrix(rexp(nosim * n, rate = lambda), nosim)
x2 <- apply(x1, 1, mean)
x2 <- tibble(value = x2)
y1 <- tibble(value = rnorm(1000000, mean = mu, sd = sigma/sqrt(n)))
#Constructing the density plot (linewidth needs ggplot2 >= 3.4.0)
ggplot(x2, aes(x = value)) +
  geom_density(linewidth = 2, fill = "seagreen2", colour = "black") +
  geom_density(data = y1, aes(x = value), linewidth = 2, colour = "red") +
  labs(title = "Is the sampling distribution approximately normal?",
       subtitle = "Sample size = 1000000 for plotting normal distribution",
       x = "Distribution of sample means",
       y = "Density of sample means")
[1] This report is based on an assignment for the online course “Statistical Inference” on coursera.org
[3] https://www.cengage.com/c/statistics-for-business-economics-14e-anderson/9781337901062PF/