Central Limit Theorem - An RStudio Presentation

Umut Kahramankaptan
2016-01-22

Developing Data Products - Course Project

Introduction

The Central Limit Theorem states that “the distribution of the sum (or average) of a large number of independent, identically distributed variables will be approximately normal, regardless of the underlying distribution.”

(Ref: http://www.math.uah.edu/stat/sample/CLT.html)
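
As a quick numerical illustration of the statement above, the following sketch averages draws from an exponential distribution and compares the spread of the averages with the 1/sqrt(n) scaling that the theorem implies. The seed, the sample size `n`, and the number of repetitions `r` are arbitrary choices made for this example and are not taken from the app.

```r
# Minimal sketch: means of n exponential draws (rate = 1, so mean = 1, sd = 1).
set.seed(42)        # arbitrary seed, only for reproducibility
n <- 50             # assumed number of iid variables per mean
r <- 10000          # assumed number of repetitions

# One row per repetition, n draws per row; rowMeans gives r sample means.
means <- rowMeans(matrix(rexp(n * r), nrow = r))

# The means should centre near 1 with a spread close to 1 / sqrt(n).
c(mean = mean(means), sd = sd(means), theory_sd = 1 / sqrt(n))
```

The exponential distribution is used here only because it is clearly skewed; any of the other distributions in the app could be substituted for `rexp`.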

To demonstrate this theorem, a Shiny app was created for the Developing Data Products course of the Coursera Data Science Specialization.

Deliverables

The app is available at https://ukahramankaptan.shinyapps.io/Central_Limit_Theorem, where the app code is available as well.

This presentation was created as an RStudio Presentation and published on RPubs at http://rpubs.com/umut_kahramankaptan/CLT.

The Rpres file is available on GitHub at https://github.com/umutkahramankaptan/CLT-Rpres

Goals

  1. As the number of iid variables increases, the histogram of a simulation approaches the underlying distribution.
  2. As the number of simulations increases, the distribution of the means approaches the normal distribution.
  3. When the underlying distribution changes, the simulation in the upper figure changes accordingly, while the distribution of means in the lower figure still approximates the normal distribution.
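
The goals above can be checked roughly outside the app. The sketch below is not the app's code: `normal_distance` is a hypothetical helper that uses the Kolmogorov-Smirnov statistic only as a distance between the simulated means and a normal distribution fitted to them (its p-value would not be valid here, because the parameters are estimated from the same data). The values `n = 2`, `n = 50`, and `r = 5000` are assumptions chosen for illustration.

```r
# Hypothetical helper: KS distance between simulated means and a fitted normal.
normal_distance <- function(rdist, n, r = 5000) {
  means <- rowMeans(matrix(rdist(n * r), nrow = r))  # r means of n draws each
  fit <- ks.test(means, "pnorm", mean = mean(means), sd = sd(means))
  unname(fit$statistic)
}

set.seed(123)  # arbitrary seed, only for reproducibility

# The same four generators the app offers, with their default parameters.
dists <- list(exponential = rexp, normal = rnorm,
              `log normal` = rlnorm, uniform = runif)

# Rows: small vs. larger number of iid variables; columns: distributions.
result <- sapply(dists, function(rdist) {
  c(n_2 = normal_distance(rdist, n = 2),
    n_50 = normal_distance(rdist, n = 50))
})
round(result, 3)
```

For the skewed distributions (exponential and log normal) the distance should drop noticeably between the two rows; for the normal distribution both values are expected to stay near the noise level, since its means are already normal.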

Assumptions

This app uses the default parameter values of the following R functions for the underlying distributions: runif, dunif, rexp, dexp, rlnorm, dlnorm, rnorm, dnorm.
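
To see what those default values are, the function arguments can be inspected directly in base R. This is a small sketch for reference; the list name `defaults` and its labels are chosen here to mirror the app's distribution names and are not part of the app.

```r
# Read the default arguments of the random generators from their signatures.
defaults <- list(
  uniform      = as.list(formals(runif))[c("min", "max")],
  exponential  = as.list(formals(rexp))["rate"],
  normal       = as.list(formals(rnorm))[c("mean", "sd")],
  `log normal` = as.list(formals(rlnorm))[c("meanlog", "sdlog")]
)

# Print the defaults in a compact form.
str(defaults)
```

The density functions (dunif, dexp, dnorm, dlnorm) share the same defaults, which is why the density curve lines up with the simulated histogram without passing any parameters.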

Code

#preparation: read the inputs, then pick the generator (rdist) and density (ddist)
 n <- input$n; r <- input$r
 rdist <- switch(input$dist, "exponential" = rexp,
   "normal" = rnorm, "log normal" = rlnorm, "uniform" = runif)
 ddist <- switch(input$dist, "exponential" = dexp,
   "normal" = dnorm, "log normal" = dlnorm, "uniform" = dunif)
#simulations: r rows (one per simulation), each holding n iid draws
 samples <- matrix(rdist(n*r), r)
 sim_means <- rowMeans(samples)
 sim_mu <- mean(sim_means); sim_sd <- sd(sim_means)
#drawing simulation: density-scale histogram of the first simulation plus the true density
 par(mfrow=c(2,1))
 h1 <- hist(samples[1,], freq=FALSE, main=paste("1st Simulation of", r), breaks=50)
 xfit <- seq(min(samples), max(samples), length.out=500)
 yfit <- ddist(xfit)
 lines(xfit, yfit, col="blue", lwd=2)
#drawing distribution of means: count-scale histogram plus a fitted normal curve
 h2 <- hist(sim_means,
   main=paste("Distribution of Means of samples, mu =", format(sim_mu, digits=5), ", sd =", format(sim_sd, digits=5)), breaks=50)
 xfit <- seq(min(sim_means), max(sim_means), length.out=500)
 yfit <- dnorm(xfit, mean=sim_mu, sd=sim_sd)
 # rescale the density to counts: density * bin width * number of means
 yfit <- yfit*diff(h2$mids[1:2])*length(sim_means)
 lines(xfit, yfit, col="blue", lwd=2)
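
The code above depends on Shiny's `input` object, so it cannot be run on its own. The sketch below repeats the simulation step as a plain function and checks the rescaling used for the lower figure, where the normal density is multiplied by the bin width and the number of means. `simulate_means` is a hypothetical helper introduced for this example, and the values `n = 40` and `r = 1000` are assumptions rather than the app's settings.

```r
# Hypothetical helper: the simulation step without Shiny's `input`.
simulate_means <- function(rdist, n, r) {
  samples <- matrix(rdist(n * r), nrow = r)  # one simulation per row
  rowMeans(samples)                          # one mean per simulation
}

set.seed(1)  # arbitrary seed, only for reproducibility
sim_means <- simulate_means(rexp, n = 40, r = 1000)

# Compute the histogram without drawing it; bins are equally wide here.
h <- hist(sim_means, breaks = 50, plot = FALSE)
bin_width <- diff(h$breaks[1:2])

# counts = density * bin width * number of observations, so a density curve
# multiplied by the same factor lands on the histogram's count scale.
rescaled <- h$density * bin_width * length(sim_means)
isTRUE(all.equal(rescaled, h$counts))
```

With equally wide bins, the distance between the first two midpoints (`h$mids`) equals the bin width, which is why the original code can use the midpoints for the same rescaling.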

Shiny App Main User Interface

Side Panel

[Figure: a snapshot of the Shiny application's main user interface]

Content is created reactively and returned to the user.