Statistical Distribution Generator Development

Course Project / Developing Data Products

Kuphy98, November 2015

Introduction

  1. This application introduces statistical distributions and makes their properties easy to observe.
  2. Simple R code and functions are used to generate the data samples and to plot the histograms.
  3. Once a probability function is selected, the app shows a brief description of it.
  4. The relevant parameters of the probability function can be adjusted easily.
  5. Based on the selected values, the app plots a histogram and can also overlay its density distribution together with a reference plot.
  6. The equation of the selected probability function is shown below the generated histogram.

Contents in the Application

  1. The application requires two choices: the probability function and the data sample size.
  2. Once the probability function is selected, the app asks for the parameters relevant to that function.
  3. It can also normalize the histogram and overlay the density distribution with a reference plot.
  4. The data sample and its statistics can be viewed by selecting the "Generated Numbers" tab.
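Normalizing a histogram (item 3) rescales the bar heights so that the total bar area equals 1, which makes the histogram directly comparable to a density curve. The following is a minimal sketch in base R, assuming that normalization corresponds to calling `hist()` with `freq = FALSE`; the seed and sample parameters are illustrative and not taken from the app.

```r
set.seed(1)                          # illustrative seed, for reproducibility only
x <- rnorm(1000, mean = 10, sd = 1)

# Compute the histogram without drawing it
h <- hist(x, plot = FALSE)

# h$counts holds raw frequencies; h$density = counts / (n * bin width)
bin_width <- diff(h$breaks)
area <- sum(h$density * bin_width)   # total area under the normalized bars

sum(h$counts)   # number of observations
area            # 1, up to floating-point rounding
```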

Generating Data Samples

  • Data samples for the distributions are generated inside a reactive expression.
  • The expression below re-runs whenever one of its input variables changes.
dataInput <- reactive({
  # Draw input$nrand random values from the distribution selected in the UI
  if (input$randomfn == "Gaussian") x <- rnorm(input$nrand, input$mean, input$sd)
  if (input$randomfn == "Poisson")  x <- rpois(input$nrand, input$lambda)
  if (input$randomfn == "Binomial") x <- rbinom(input$nrand, input$size, input$prob)

  return(x)
})
  • The returned data sample is used to plot the histogram. When the density distributions are added to the plot, the expression does not need to re-run, so the same sample is reused.
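The same dispatch can be tried outside Shiny. The sketch below is not the app's code: `generate_sample` is a hypothetical helper, and its default parameter values are assumptions chosen only for illustration. It uses `switch()` with a fallback that raises an error for an unknown distribution name.

```r
# Hypothetical helper: same dispatch as the reactive, but with plain arguments
generate_sample <- function(fn, n, mean = 0, sd = 1,
                            lambda = 1, size = 10, prob = 0.5) {
  switch(fn,
    "Gaussian" = rnorm(n, mean, sd),
    "Poisson"  = rpois(n, lambda),
    "Binomial" = rbinom(n, size, prob),
    stop("Unknown distribution: ", fn)   # fail loudly on an unsupported name
  )
}

set.seed(42)                             # illustrative seed
x <- generate_sample("Poisson", n = 500, lambda = 3)
length(x)
summary(x)
```

Calling the helper with a different `fn` or different parameters plays the role of changing an input in the app: a new sample is drawn each time.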

Plotting a Histogram / Density Distributions

  • Based on the selected values, the corresponding histogram is plotted.
  • The generated histogram can be compared with its density distribution and with a reference plot. Here, the Gaussian distribution (used as the reference) is defined as: \[f(x | \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x - \mu)^2}{2\sigma^2}}\]
  • An example is shown in the "Example: Plotting distributions" slide.
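As a quick check of the Gaussian formula above, the sketch below transcribes it directly into R and compares it with the built-in `dnorm()`. The name `gauss_pdf` and the values of `mu` and `sigma` are illustrative assumptions.

```r
# Direct transcription of f(x | mu, sigma) from the formula above
gauss_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(6, 14, by = 0.5)
same <- all.equal(gauss_pdf(x, mu = 10, sigma = 1),
                  dnorm(x, mean = 10, sd = 1))
same

# Numerical area under the curve over mu +/- 10 sigma; should be close to 1
area <- integrate(gauss_pdf, lower = 0, upper = 20, mu = 10, sigma = 1)$value
area
```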

Summary / Future Plans

  1. With this app, the relationship between the input values and the probability functions can be understood easily by observing how the distributions change.
  2. For now, the application provides three statistical distributions with brief descriptions.
    => More statistical distributions and more detailed descriptions will be added.
  3. Further functions, such as the (inverse) cumulative distribution function, will be provided.
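For orientation on item 3, base R already exposes the cumulative distribution function and its inverse (the quantile function) through the `p*` and `q*` families. The sketch below only illustrates those existing functions with arbitrary example values; how the app would present them is not specified here.

```r
# Cumulative distribution function: P(X <= 11) for X ~ N(10, 1)
p <- pnorm(11, mean = 10, sd = 1)

# Inverse CDF (quantile function): maps that probability back to 11
q <- qnorm(p, mean = 10, sd = 1)

# Discrete analogues for the other two distributions in the app
p_pois <- ppois(3, lambda = 2)                 # P(X <= 3) for X ~ Poisson(2)
q_binom <- qbinom(0.5, size = 10, prob = 0.5)  # median of Binomial(10, 0.5)

c(p = p, q = q, p_pois = p_pois, q_binom = q_binom)
```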

Example: Plotting distributions

# Generate a data sample and plot a histogram (freq = FALSE normalizes it to a density)
rn <- rnorm(1000, mean = 10, sd = 1)
h <- hist(rn, border = "blue", lwd = 2, las = 1,
          col = "lightblue", freq = FALSE, xlab = "Generated Numbers", main = "")
# Kernel density estimate of the sample (red)
lines(density(rn, adjust = 2), col = "red", lwd = 2)
# Gaussian reference curve using the sample mean and standard deviation (dark blue)
curve(dnorm(x, mean = mean(rn), sd = sd(rn)), add = TRUE, col = "darkblue", lwd = 2)
# Legend colors match the two lines drawn above
legend("topleft", c("Density", "Gaussian [Ref]"), col = c("red", "darkblue"), lwd = 2)

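The same comparison can be sketched for a discrete distribution. The example below is an illustration, not the app's code: it assumes a Poisson sample with `lambda = 4`, uses one histogram bin per integer, and overlays the Poisson probabilities from `dpois()` as the reference.

```r
set.seed(2015)                       # illustrative seed
rp <- rpois(1000, lambda = 4)

# One bin per integer value, centered on the integers
breaks <- seq(-0.5, max(rp) + 0.5, by = 1)
hist(rp, breaks = breaks, freq = FALSE, las = 1,
     border = "blue", col = "lightblue",
     xlab = "Generated Numbers", main = "")

# Reference: Poisson probabilities evaluated at the sample mean
k <- 0:max(rp)
points(k, dpois(k, lambda = mean(rp)), col = "darkblue", pch = 19, type = "b")
legend("topright", "Poisson [Ref]", col = "darkblue", pch = 19, lwd = 1)
```

Because each bin has width 1, the normalized bar heights are the observed proportions and can be read directly against the reference probabilities.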