Coursera Developing Data Products

Reproducible Pitch Presentation

Darryl Buswell (www.github.com/buswedg)

Some Background


What's exploratory data analysis?

Exploratory data analysis involves summarizing and analyzing data at a high level, often using visual methods. When employed correctly, it gives the analyst a 'sense' of the data's characteristics and relationships, which improves the analysis that follows further down the pipeline.

What's the problem?

Exploratory analysis doesn't generate the same sort of titillation as other methods in the data scientist's toolbox (looking at you, machine learning). It is often viewed as a tedious exercise that adds little value.

What's the solution?

Cut down on the manual effort involved in exploratory analysis with a web application that can import raw datasets and automate the charting process...

The Web Application


What's working in the background?

The web application will provide an interactive environment for basic exploratory data analysis. The user interface will be built with R and Shiny, and hosted on shinyapps.io, the SaaS platform from RStudio.
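To make the moving parts concrete, here is a minimal sketch of how such a Shiny app could be wired together. It is not the project's actual source: the small `dataset` data frame, the input id `x_field` and the output id `chart` are all placeholders assumed for illustration.

```r
library(shiny)
library(ggplot2)

# Hypothetical stand-in data; the real application loads the Kaggle Titanic file.
dataset <- data.frame(
  Age      = c(22, 38, 26, 35, 54, 2, 27, 14),
  Fare     = c(7.25, 71.28, 7.93, 53.10, 51.86, 21.08, 11.13, 30.07),
  Survived = c(0, 1, 1, 1, 0, 0, 1, 1)
)

# UI: one control in the sidebar, one plot in the main panel.
ui <- fluidPage(
  titlePanel("Exploratory charting (sketch)"),
  sidebarLayout(
    sidebarPanel(
      selectInput("x_field", "X axis field",
                  choices = names(dataset), selected = "Age")
    ),
    mainPanel(plotOutput("chart"))
  )
)

# Server: re-renders the histogram whenever the selected field changes.
server <- function(input, output) {
  output$chart <- renderPlot({
    ggplot(dataset, aes(x = .data[[input$x_field]])) +
      geom_histogram(bins = 10)
  })
}

# Bundle UI and server into an app object (nothing is launched yet).
app <- shinyApp(ui = ui, server = server)
```

In an interactive R session, `runApp(app)` would start the app locally. Deploying to shinyapps.io is a separate step that is not shown here.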

What's customizable?

Dynamic chart elements include the type of chart (e.g. histogram, scatter etc.), a variable sample size, which dataset fields are assigned to each axis and, finally, which dataset field is used to categorise the displayed data.

The web application will leverage ggplot2 for charting. Code for a static view of the default plot rendered by the current version of the web application:

# guides(fill = "none") hides the legend; fill = FALSE is deprecated in recent ggplot2
ggplot(dataset, aes(x = Age, fill = as.character(Survived))) + guides(fill = "none") + geom_histogram()

[Plot: histogram of Age, filled by Survived]
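The static call above fixes the chart type, the fields and the sample. One possible way to expose those as the dynamic elements described earlier is sketched below. The function name `build_plot` and its arguments are hypothetical and not taken from the application's code; only the two chart types named in the text are handled.

```r
library(ggplot2)

# Hypothetical helper: turns the user's choices into a ggplot object.
build_plot <- function(data,
                       chart_type = c("histogram", "scatter"),
                       x_field,
                       y_field = NULL,
                       group_field = NULL,
                       sample_size = nrow(data)) {
  chart_type <- match.arg(chart_type)

  # Draw a random subset of rows, capped at the number of rows available.
  n <- min(sample_size, nrow(data))
  data <- data[sample(nrow(data), n), , drop = FALSE]

  if (chart_type == "histogram") {
    p <- ggplot(data, aes(x = .data[[x_field]]))
    # Treat the grouping field as categorical, as in the static example.
    if (!is.null(group_field)) {
      p <- p + aes(fill = as.character(.data[[group_field]]))
    }
    p + geom_histogram(bins = 30) + guides(fill = "none")
  } else {
    # A scatter plot needs a field for the y axis as well.
    stopifnot(!is.null(y_field))
    p <- ggplot(data, aes(x = .data[[x_field]], y = .data[[y_field]]))
    if (!is.null(group_field)) {
      p <- p + aes(colour = as.character(.data[[group_field]]))
    }
    p + geom_point() + guides(colour = "none")
  }
}
```

With a data frame holding the Titanic fields, `build_plot(dataset, "histogram", x_field = "Age", group_field = "Survived")` would build a chart equivalent in spirit to the default one, and `build_plot(dataset, "scatter", x_field = "Age", y_field = "Fare")` would switch to a scatter plot (assuming a `Fare` column is present).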

What Next?


  • The application is hosted on shinyapps.io, the SaaS platform from RStudio, and can be accessed here: Shiny page. Note that the current version of the application uses a static dataset obtained from the Kaggle Titanic Challenge.


  • Raw versions of the associated project files and pitch presentation can be found on my GitHub page here: GitHub page


  • More information on the Kaggle Titanic Challenge can be found here: Kaggle page