Developing Data Products - Project

Zachary Eisenstein
March 12, 2017

Background

This presentation is a deliverable for the Developing Data Products final project. The requirements were to build a Shiny application deployed to the RStudio servers and to prepare an accompanying reproducible pitch presentation.

This project uses R's simulation capabilities and the ggplot graphical interface to explore the effect of correlation on random variables. The Shiny app lets the user set the desired correlation and the properties of the random variables. The presentation was created using RStudio Presenter.

The app can be accessed here.

Understanding Correlation

  • In all types of modeling, both simple and complex, capturing dependency is paramount.
  • Regression analysis and correlation are tools to measure the dependency between variables.
  • Regression estimates the line of best fit through the data.
  • (Pearson) Correlation measures the strength of the linear relationship.
  • The coefficient of determination, \( R^2 \), is the ratio of the explained variation to the total variation; in simple linear regression it is also the square of the Pearson correlation \( \rho \).
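
The relationship between \( R^2 \) and the Pearson correlation can be checked numerically. The following is a minimal sketch on simulated data; the seed, sample size, and slope of 0.5 are illustrative assumptions and are not taken from the app.

```r
set.seed(42)                          # illustrative seed for reproducibility
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)             # linear signal plus noise (slope is an assumption)

fit <- lm(y ~ x)                      # simple linear regression with an intercept
r_squared <- summary(fit)$r.squared   # coefficient of determination
r <- cor(x, y)                        # sample Pearson correlation

all.equal(r_squared, r^2)             # the two quantities agree up to rounding
```

The identity holds for a regression with one predictor and an intercept; with several predictors, \( R^2 \) is no longer the square of any single pairwise correlation.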

Uncorrelated Simulation

Below is an example that simulates two uncorrelated variables:

x <- rnorm(100); y <- rnorm(100) # simulate 100 pairs of standard normals
plot(x,y) # scatter plot
abline(lm(y~x), col="red") # regression line (y~x) 

[Figure: scatter plot of x against y with the fitted regression line in red]
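
Because x and y are drawn independently, their sample correlation should be near zero, although it will rarely be exactly zero in a finite sample. A quick check might look like the sketch below; the seed is an arbitrary choice added so the run is repeatable.

```r
set.seed(1)                        # arbitrary seed, not part of the original example
x <- rnorm(100); y <- rnorm(100)   # independent standard normals, as in the simulation above

r  <- cor(x, y)                    # sample Pearson correlation
ct <- cor.test(x, y)               # tests the null hypothesis of zero correlation

r                                  # small in magnitude, up to sampling error
ct$conf.int                        # 95% confidence interval for the true correlation
```

A confidence interval that contains zero is consistent with the variables being uncorrelated, and the nearly flat regression line in the scatter plot tells the same story.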

Correlated Simulation with Shiny App

Inputs to the Shiny App:

  • Number of simulations, mean and standard deviation of the variables, and the correlation coefficient

[Screenshot]
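
The presentation does not show how the app generates correlated pairs from these inputs. One common construction, sketched here as an assumption rather than as the app's actual code, mixes two independent standard normals so that the result has the requested correlation. The helper name `simulate_correlated` and its argument names are hypothetical.

```r
# Hypothetical helper: an illustration, not taken from the app's source.
simulate_correlated <- function(n, mean_x, sd_x, mean_y, sd_y, rho) {
  stopifnot(n > 0, sd_x > 0, sd_y > 0, abs(rho) <= 1)
  z1 <- rnorm(n)                              # independent standard normals
  z2 <- rnorm(n)
  x <- mean_x + sd_x * z1                     # scale and shift the first variable
  # rho * z1 + sqrt(1 - rho^2) * z2 has unit variance and correlation rho with z1
  y <- mean_y + sd_y * (rho * z1 + sqrt(1 - rho^2) * z2)
  data.frame(x = x, y = y)
}

set.seed(123)                                 # arbitrary seed for repeatability
sim <- simulate_correlated(n = 10000, mean_x = 0, sd_x = 1,
                           mean_y = 5, sd_y = 2, rho = 0.7)
cor(sim$x, sim$y)                             # close to 0.7, up to sampling error
```

The returned data frame can be passed directly to a plotting function, and setting `rho = 0` recovers the uncorrelated case.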