Presentation for Course Project in Developing Data Products

Numerical Proof of Central Limit Theorem

Hsin-Hua Lai
Not a simple guy

Proof of Central Limit Theorem

The Shiny App is at: https://hsinhualai.shinyapps.io/Course_Project_Develop_Dat_Products/

The Presentation is at: http://rpubs.com/hsinhua/170919

The R code can be found on my GitHub: https://github.com/hsinhualai/datasciencecoursera/tree/master/Course_Project_Develop_Dat_Products/Shiny_App_Course_Project

In this Course Project, we develop a Shiny app for numerically verifying the Central Limit Theorem (CLT). In a nutshell, the CLT states that the sampling distribution of the mean of independent, identically distributed random variables with finite variance will be normal or nearly normal if the sample size is large enough. We draw the samples from three distributions: 1. Exponential Distribution, 2. Poisson Distribution, and 3. Uniform Distribution.

In each simulation, we first generate 40 values from the chosen distribution and compute their mean. We then repeat this procedure many times. In the Shiny app, the slider sets the number of iterations, ranging from 1 to 10000. We can see that the sampling distribution of the mean values is approximately normal in all three cases. In the following slides, I introduce in order the Exponential Distribution, the Poisson Distribution, and the Uniform Distribution, and show how to obtain the theoretical mean and variance of the limiting normal distribution for comparison with the simulation.
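
The procedure above can be sketched in a few lines of R. This is a minimal sketch, not the app's actual code: the helper name `simulate_means`, the seed, and the choice of 1000 iterations are assumptions made for illustration, and the exponential rate 0.2 is the value quoted later in this presentation.

```r
# Hypothetical helper (not taken from the app): draw `iterations` samples of
# size `n` from the generator `rdist` and return the mean of each sample.
simulate_means <- function(rdist, n = 40, iterations = 1000) {
  replicate(iterations, mean(rdist(n)))
}

set.seed(1)  # arbitrary seed, used only to make the run reproducible

# 1000 means of 40 exponential values each, with rate 0.2
means <- simulate_means(function(n) rexp(n, rate = 0.2),
                        n = 40, iterations = 1000)

length(means)  # one mean per iteration
mean(means)    # should be close to the theoretical value 1 / 0.2 = 5
```

Passing the generator as a function keeps the same helper usable for `rpois` and `runif`, which is one possible way to cover the three distributions in the app.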

Exponential Distribution and Poisson Distribution

  1. Exponential Distribution: The probability density function of an exponential distribution is \[f(x;\lambda)=\lambda e^{-\lambda x},\quad x \geq 0,\] where \(\lambda\) is the rate and \(x\) is the observed value. The distribution has mean \(\lambda^{-1}\) and variance \(\lambda^{-2}\), so the sampling distribution of the mean of \(n\) values should approach normal with mean \(\lambda^{-1}=5\) and variance \(\frac{\lambda^{-2}}{n}=0.625\), where we choose \(\lambda = 0.2\) and sample size \(n=40\).

  2. Poisson Distribution: This distribution describes an event that occurs an integer number of times, i.e., 0, 1, 2, ..., in an interval. The event rate \(\lambda\) is the average number of events per interval. The probability of observing \(k\) events in an interval is given by the Poisson distribution, \[ f(k;\lambda) = \frac{\lambda^k e^{-\lambda}}{k!}.\] For the Poisson distribution, \(\lambda\) is equal to both the expectation value and the variance of \(k\), \(\lambda = E(k) = var(k)\). The sampling distribution of the mean of \(n\) values should approach normal with mean \(\lambda = 0.2\) and variance \(\frac{\lambda}{n}=0.005\), where \(\lambda = 0.2\) and the sample size is \(n=40\).
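
The theoretical values for these two distributions can be compared with simulated ones. The following is a rough sketch under stated assumptions: the variable names, the seed, and the use of 10000 iterations (the slider maximum) are illustrative choices, and the code is not taken from the app.

```r
set.seed(2)            # arbitrary seed for reproducibility
n          <- 40       # sample size used in the presentation
lambda     <- 0.2      # rate used for both distributions
iterations <- 10000    # assumed here: the maximum slider value

# One sample mean per iteration for each distribution
exp_means  <- replicate(iterations, mean(rexp(n, rate = lambda)))
pois_means <- replicate(iterations, mean(rpois(n, lambda = lambda)))

# Theoretical mean and variance of the sample mean next to the simulated ones
comparison <- data.frame(
  distribution = c("exponential", "poisson"),
  theory_mean  = c(1 / lambda, lambda),
  sim_mean     = c(mean(exp_means), mean(pois_means)),
  theory_var   = c(1 / (lambda^2 * n), lambda / n),
  sim_var      = c(var(exp_means), var(pois_means))
)
print(comparison)
```

The simulated columns vary slightly from run to run, but with this many iterations they should lie close to the theoretical values 5 and 0.625 for the exponential case and 0.2 and 0.005 for the Poisson case.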

Uniform Distribution to CLT

A uniform distribution is a distribution that has constant probability density. The probability density function on the interval \([a,b]\) is defined as \[ P(x)= \frac{1}{b-a}\quad\text{for}\quad a \leq x \leq b.\] As is usual, and as in the app for the course project, we choose \([a,b]=[0,1]\). The mean and the variance of the uniform distribution are straightforward to obtain: \(E[x] =\frac{a+b}{2}=0.5\) and \(var[x]=\frac{(b-a)^2}{12}=\frac{1}{12}\). If we sample the mean of \(n=40\) values generated from the uniform distribution, its distribution will approach normal with mean \(E[\bar{x}]=0.5\) and variance \(var[\bar{x}] = \frac{var[x]}{n}=\frac{1}{480}\simeq 0.00208\).
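
As a quick numerical check of the uniform case, the sketch below computes the theoretical mean and variance of the sample mean for \(n=40\), namely \(0.5\) and \(\frac{1/12}{40}=\frac{1}{480}\), and compares them with simulated values. The variable names, the seed, and the number of iterations are assumptions for illustration only.

```r
set.seed(3)            # arbitrary seed for reproducibility
n          <- 40       # sample size used in the presentation
a          <- 0        # lower end of the interval
b          <- 1        # upper end of the interval
iterations <- 10000    # assumed here: the maximum slider value

# One sample mean per iteration
unif_means <- replicate(iterations, mean(runif(n, min = a, max = b)))

# Theoretical mean and variance of the sample mean
theory_mean <- (a + b) / 2
theory_var  <- (b - a)^2 / (12 * n)   # equals 1/480 for [0, 1] and n = 40

c(theory_mean = theory_mean, sim_mean = mean(unif_means))
c(theory_var  = theory_var,  sim_var  = var(unif_means))
```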

Illustration of the plot in the Shiny App

As an illustration, we plot the distribution of the mean values of samples drawn from the exponential distribution. The green curve is the normal distribution. The vertical blue line marks the mean value.

[Plot: distribution of the sample means for the exponential distribution]
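
A plot of this kind could be produced with base R graphics roughly as follows. This is only a sketch and not the app's plotting code: the function name `plot_mean_distribution`, the histogram settings, and the seed are hypothetical, and the blue line is assumed here to mark the mean of the simulated values.

```r
# Hypothetical helper: histogram of the sample means with the theoretical
# normal density (green) and the mean of the simulated values (blue).
plot_mean_distribution <- function(means, mu, sigma2) {
  h <- hist(means, freq = FALSE, breaks = 30,
            main = "Distribution of sample means", xlab = "Sample mean")
  grid <- seq(min(means), max(means), length.out = 200)
  lines(grid, dnorm(grid, mean = mu, sd = sqrt(sigma2)),
        col = "green", lwd = 2)
  abline(v = mean(means), col = "blue", lwd = 2)
  invisible(h)  # return the histogram object without printing it
}

set.seed(4)  # arbitrary seed for reproducibility
exp_means <- replicate(1000, mean(rexp(40, rate = 0.2)))

# Theoretical mean 5 and variance 0.625 for rate 0.2 and sample size 40
h <- plot_mean_distribution(exp_means, mu = 5, sigma2 = 0.625)
```

Using `freq = FALSE` puts the histogram on the density scale, so the normal curve can be overlaid directly without rescaling.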