=============================================================
- Synopsis
=============================================================
- The Exponential Function
=============================================================
- The Report
=============================================================
- Appendix
  - A. A Brief Introduction To The Exponential Function
  - B. Code Chunks
=============================================================
- References

=============================================================

Synopsis¹

In this project, we explore the exponential distribution in R. We investigate the distribution of averages of exponential random variables, drawing 1000 simulated samples of size 40 with the rexp() function. We compare the theoretical expected value and variance of this sampling distribution with the results of our simulations, and investigate whether the distribution is approximately normal.

=============================================================

The Exponential Function

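The general exponential function can be represented as \(f(x) = a^x\), where \(a\) is any value greater than 0. The simulations in this report draw from the exponential distribution, whose density \(f(x) = \lambda e^{-\lambda x}\) for \(x \ge 0\) is an exponential function with base \(a = e^{-\lambda}\), scaled by \(\lambda\). For more information on the exponential function, please go to Appendix - Section A.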
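The rexp() function used in this report draws from the exponential distribution with density \(\lambda e^{-\lambda x}\). As a minimal sketch (the variable names are illustrative assumptions, not part of the original analysis), the following checks that R's built-in density dexp() agrees with the exponential function \(a^x\) for base \(a = e^{-\lambda}\), scaled by \(\lambda\):

```r
# Illustrative check: the names below are assumptions for this sketch only
lambda <- 0.2                      # rate used throughout the report
x <- seq(0, 20, by = 0.5)          # grid of non-negative values

# Density written as a scaled exponential function a^x with a = exp(-lambda)
a <- exp(-lambda)
manual_density <- lambda * a^x

# R's built-in exponential density
builtin_density <- dexp(x, rate = lambda)

# TRUE if the two agree up to floating-point tolerance
isTRUE(all.equal(manual_density, builtin_density))
```

Because \(0 < a < 1\) here, the density is strictly decreasing in \(x\).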

=============================================================

The Report

1. Theoretical Mean vs. Sample Mean

The expected value of the distribution of averages of samples of exponentials (i.e., the theoretical mean of the sampling distribution) is equal to the population mean (\(\mu\)). For an exponential distribution with rate \(\lambda = 0.2\), the mean is \(\mu = 1/\lambda\).

Thus, the theoretical mean of the distribution is 5.
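For readers who want the intermediate step, the mean of an exponential distribution with density \(\lambda e^{-\lambda x}\) on \(x \ge 0\) follows from integration by parts:

\[
\mu = \int_0^\infty x \, \lambda e^{-\lambda x} \, dx
    = \left[ -x e^{-\lambda x} \right]_0^\infty + \int_0^\infty e^{-\lambda x} \, dx
    = 0 + \frac{1}{\lambda}
    = \frac{1}{0.2}
    = 5
\]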

Now, let’s take the mean of the averages of 1000 samples of 40 exponentials (i.e., the sample mean of the sampling distribution). We will perform the simulation in R; the code is in Appendix - Section B.

The sample mean is 4.97.

As we expected, the value of the sample mean is very close to the value of the theoretical mean.

We can illustrate this by plotting the simulated sampling distribution and marking both its sample mean and the theoretical mean (see the histogram code in Appendix - Section B).

2. Theoretical Variance vs. Sample Variance

Population Variance vs. Sample Variance:

The sample variance is an unbiased estimator of the population variance. This means that the distribution of the variances of multiple samples will be centred at the population variance.

To test this, consider an exponential distribution with rate \(\lambda = 0.2\), whose standard deviation is \(\sigma = 1/\lambda = 5\). Its population variance is \(\sigma^2 = 25\).

Now, for the sample variance, let’s take the average of the variances of 1000 samples of 40 exponentials. This comes to \(s^2 = 25.14\).

Thus, the average sample variance is very close to the population variance, as we expected. Let’s illustrate the distribution of the sample variances, together with the population variance and the average sample variance, in the form of a plot.
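The unbiasedness claim can also be checked numerically. The sketch below is not the report's own code: the seed and the variable names are assumptions chosen for illustration, so the numbers will differ slightly from those quoted above. It compares the average of the usual sample variance (divisor \(n - 1\), as computed by var()) with a divide-by-\(n\) version, which is smaller by exactly the factor \((n - 1)/n\).

```r
# Sketch only: the seed and variable names are assumptions, not from the report
set.seed(1)
nosim <- 1000
n <- 40
lambda <- 0.2
samples <- matrix(rexp(nosim * n, rate = lambda), nosim)

# var() divides by n - 1 (the unbiased estimator)
unbiased <- apply(samples, 1, var)
# Dividing by n instead shrinks every estimate by the factor (n - 1) / n
biased <- unbiased * (n - 1) / n

popvar <- (1 / lambda)^2
c(population = popvar, unbiased = mean(unbiased), biased = mean(biased))
```

With a different seed the averages move a little, but the divide-by-\(n\) average always sits 2.5% below the var() average for \(n = 40\).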

Theoretical Variance vs. Sample Variance of the Sampling Distribution:

The theoretical variance of the sampling distribution is equal to the population variance divided by the sample size (\(\sigma^2/n\)). For an exponential distribution with rate \(\lambda = 0.2\) and \(\sigma = 1/\lambda\), the theoretical variance comes to 0.625.
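Written out with the report's values (\(\lambda = 0.2\), \(n = 40\)), the arithmetic is:

\[
\frac{\sigma^2}{n} = \frac{(1/\lambda)^2}{n} = \frac{5^2}{40} = \frac{25}{40} = 0.625
\]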

Now, let’s take the variance of the averages of 1000 samples of 40 exponentials (i.e., the sample variance of the sampling distribution). We will perform the simulation in R.

The sample variance is 0.6331.

As we expected, the value of the sample variance of the sampling distribution is very close to the value of the theoretical variance.

3. Does the Sampling Distribution approximate a Normal Distribution?

Yes, approximately. We will support this first with the CLT, and then with a density plot.

The Central Limit Theorem:

“The Central Limit Theorem (CLT) is one of the most important theorems in statistics…the CLT states that the distribution of averages of iid (independent and identically distributed) variables becomes that of a standard normal as the sample size increases.” - Dr. Brian Caffo, Statistical Inference For Data Science²

We have used the distribution of averages of iid exponential variables. So far so good. But is our sample size of 40 large enough for the distribution to approximate a normal distribution?

To answer this question, let’s turn to another authority on the CLT.

“General statistical practice is to assume that, for most applications, the sampling distribution of \(\overline{x}\) can be approximated by a normal distribution whenever the sample size is 30 or more.” - Anderson, Sweeney & Williams, Statistics For Business And Economics³

Since our sample size of 40 exceeds this rule-of-thumb threshold of 30, we can reasonably expect the distribution of 1000 averages of 40 exponentials to be approximately normal.
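A rule of thumb can be complemented by a simple numerical check. The sketch below is an illustration under stated assumptions (an arbitrary seed and variable names of our choosing), not part of the original analysis. It standardizes the simulated means with the theoretical mean and standard error, then measures what share falls inside the central 95% range of a standard normal. A share close to 0.95 is consistent with approximate normality.

```r
# Sketch only: the seed and variable names are assumptions, not from the report
set.seed(1)
nosim <- 1000
n <- 40
lambda <- 0.2
mu <- 1 / lambda
sigma <- 1 / lambda

# 1000 averages of 40 exponentials
means <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nosim))

# Standardize with the theoretical mean and standard error
z <- (means - mu) / (sigma / sqrt(n))

# Share of standardized means inside the central 95% range of a standard normal
coverage <- mean(abs(z) <= qnorm(0.975))
coverage
```

This checks only one aspect of the shape; the density plot gives a fuller picture.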

Density Plot:

To check visually that our distribution is approximately normal, we compare the density plot of our distribution (1000 sample means, sample size = 40) with that of a normal distribution (1000000 random normals). For both distributions, the mean is \(\mu = 5\) and the variance is \(\sigma^2/n = 0.625\). The resulting density plot (code in Appendix - Section B) supports the conclusion that our distribution is approximately normal.

=============================================================

Appendix

A. A Brief Introduction To The Exponential Function⁴

The general exponential function can be represented as \(f(x) = a^x\), where \(a\) is any value greater than 0.

The properties of the exponential function depend on the value of \(a\):

When \(a = 1\), the graph of the function is a horizontal line at \(y = f(x) = 1\)
When \(0 < a < 1\), the graph of the function is a strictly decreasing curve
When \(a > 1\), the graph of the function is a strictly increasing curve

Other important properties of the exponential function are:

It is always greater than 0, and never crosses the X axis
It always intersects the Y axis at \(y = 1\). Thus, it always passes through \((0,1)\)
At \(x = 1\), \(f(x) = a\). In other words, the function always passes through \((1,a)\)
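These properties can be checked numerically. The following is a minimal sketch in which the helper name exp_fn and the chosen bases are assumptions made for illustration. For one base from each case, it prints the values at \(x = 0\) and \(x = 1\), and whether the function is positive, increasing, or decreasing on a small grid.

```r
# Illustrative helper; the name exp_fn is an assumption for this sketch
exp_fn <- function(x, a) a^x

x <- seq(-3, 3, by = 0.5)
bases <- c(0.5, 1, 2)   # one base from each case: 0 < a < 1, a = 1, a > 1

for (a in bases) {
  y <- exp_fn(x, a)
  cat("a =", a,
      "| f(0) =", exp_fn(0, a),
      "| f(1) =", exp_fn(1, a),
      "| all positive:", all(y > 0),
      "| increasing:", all(diff(y) > 0),
      "| decreasing:", all(diff(y) < 0), "\n")
}
```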

B. Code Chunks

Theoretical Mean of the Sampling Distribution:

lambda = 0.2
theomean = 1/lambda

Sample Mean of the Sampling Distribution:

###Sample Mean of the Sampling Distribution of Exponential functions

#1000 simulations with sample size = 40
nosim <- 1000
n <- 40
lambda <- 0.2
#Simulate mean of 1000 averages of 40 exponentials
sampmean40 <- round(mean(apply(matrix(rexp(nosim * n, rate = lambda), nosim), 1, mean)),2)

Distribution of 1000 means of samples of 40 exponentials:

###Plotting distribution of 1000 means of samples of 40 exponentials...
###...with a comparison of the theoretical and sample means

#Install and load the tidyverse set of packages
#install.packages("tidyverse") #Uncomment this line if tidyverse is not already installed
library(tidyverse)
#Plot the histogram
x1 <- matrix(rexp(nosim * n, rate = lambda), nosim)
x2 <- apply(x1, 1, mean)
x2 <- as_tibble(x2)
ggplot(x2, aes(x = value)) +
        geom_histogram(aes(y = ..density..), fill = "seagreen2", 
                     colour = "black", bins = 20, boundary = 0) +
        geom_vline(xintercept = theomean, colour = "black", size = 1.5) +
        geom_vline(xintercept = sampmean40, colour = "red", size = 1.5) +
        labs(title = "Sampling distribution of 1000 averages of 40 exponential functions",
             subtitle = "Theoretical mean (black) vs. sample mean (red)",
             x = "Distribution of sample means",
             y = "Density of sample means")

Population Variance:

lambda = 0.2
sigma = 1/lambda
popvar = sigma^2

Average of Distribution of Variances of Exponential Functions - i.e. Sample Variance:

sampvarnorm40 <- round(mean(apply(matrix(rexp(nosim * n, rate = lambda), nosim), 1, var)),2)

Distribution of 1000 variances of samples of 40 exponentials:

###Distribution of The Variances of Multiple Samples of the Exponential function
#1000 simulations with sample size = 40
nosim <- 1000
n <- 40
#Distribution of 1000 variances of samples of 40 exponentials 
x1 <- matrix(rexp(nosim * n, rate = lambda), nosim)
x2 <- apply(x1, 1, var)
x2 <- as_tibble(x2)
ggplot(x2, aes(x = value)) +
        geom_density(aes(y = ..density..), fill = "seagreen2", 
               colour = "black", size = 2) +
        geom_vline(xintercept = popvar, colour = "black", size = 0.5) +
        geom_vline(xintercept = sampvarnorm40, colour = "red", size = 0.5) +
        labs(title = "Distribution of exponential variances",
             subtitle = "Population var. (black) vs. sample var. (red)", 
             x = "Distribution of sample variances", 
             y = "Density of sample variances") + 
        #annotate() draws each label once, rather than once per row of x2
        annotate("label", x = 36, y = 0.04, label = popvar, 
                 colour = "black") +
        annotate("label", x = 36, y = 0.032, label = sampvarnorm40, 
                 colour = "red")

Theoretical Variance of the Sampling Distribution:

lambda = 0.2
sigma = 1/lambda
theovar = sigma^2/n

Sample Variance of the Sampling Distribution:

###Sample Variance of Sampling Distribution of the Exponential function

#1000 simulations with sample size = 40
nosim <- 1000
n <- 40
#Simulate variance of 1000 averages of 40 exponentials
sampvar40 <- round(var(apply(matrix(rexp(nosim * n, rate = lambda), nosim), 1, mean)),4)

Density Plot:

###Comparison of Sampling Distribution of Exponential Function with Normal Distribution

#Setting the parameters
lambda = 0.2
mu = 1/lambda
sigma = 1/lambda
nosim = 1000
n = 40
#Constructing the data frames for the plot
x1 <- matrix(rexp(nosim * n, rate = lambda), nosim)
x2 <- apply(x1, 1, mean)
x2 <- as_tibble(x2)
y1 <- as_tibble(rnorm(1000000, mean = mu, sd = sigma/sqrt(n)))
#Constructing the density plot
ggplot(x2, aes(x = value, y = ..density..)) +
        geom_density(size = 2, fill = "seagreen2", colour ="black") +
        geom_density(data = y1, aes(x = value), size = 2, colour ="red") +
        labs(title = "Is the sampling distribution approximately normal?",
             subtitle = "Sample size = 1000000 for plotting normal distribution",
             x = "Distribution of sample means", 
             y = "Density of sample means")

=============================================================

References

¹ This report is based on an assignment for the online course “Statistical Inference” on coursera.org
² Brian Caffo, Statistical Inference For Data Science: https://leanpub.com/LittleInferenceBook
³ Anderson, Sweeney & Williams, Statistics For Business And Economics: https://www.cengage.com/c/statistics-for-business-economics-14e-anderson/9781337901062PF/
⁴ https://www.mathsisfun.com/sets/function-exponential.html

Statistical Truths Via The Exponential Function

Siddharth Samant

12/06/2020
