Wells Fargo Campus Analytic Challenge

Scott Jacobs, St. Mary’s College

March 2, 2016

Agenda

  • Brief Overview & Analytic Process
  • Demo

    Interactive Open Source Tools (R)

  • Conclusion

Quick question:

R recently had an anniversary. Does anyone know which one?

  • 16 years since R 1.0.0

Developed in R using RStudio

Packages I found useful:

  • STM & LDA for Topic Modeling
  • qdap for Sentiment Analysis
  • knitr/rmarkdown for Project Submission in html
    • (and this preso!)
  • htmlwidgets
    • stmCorrViz, stmBrowser

Overview of Process

  1. Data Cleaning: regex to parse text
  2. Data Exploration: association of words with corpus
  3. Text Modeling:
    • Latent Dirichlet Allocation (LDA) model: topics weren’t coherent, so I threw it out
    • Structural Topic Model (STM) with 5, then 10, then 20 topics (stm has a pedestrian-looking plot for this)
  4. Visualization of fitted model

  5. Labeled topic model
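The first two steps above can be sketched in base R. This is a minimal illustration, not the code used for the challenge: the three sample documents, the helper names clean_text() and co_occurs_with(), and the use of simple co-occurrence counts as the "association" measure are all assumptions made for the example.

```r
# Toy corpus; these three strings are invented for illustration only.
docs <- c(
  "Great service at the branch!!! http://example.com",
  "The mobile app keeps crashing :( #frustrated",
  "Branch staff were helpful; the app was not."
)

# Step 1 (cleaning): lowercase, then use regex to drop URLs,
# anything that is not a letter or a space, and repeated spaces.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("http\\S+", " ", x, perl = TRUE)
  x <- gsub("[^a-z ]", " ", x)
  x <- gsub(" +", " ", x)
  trimws(x)
}
cleaned <- clean_text(docs)

# Step 2 (exploration): build a document-by-term presence matrix,
# then count how often other terms appear alongside a chosen word.
tokens <- strsplit(cleaned, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(tok) as.integer(vocab %in% tok)))
colnames(dtm) <- vocab

co_occurs_with <- function(term) {
  in_docs <- dtm[, term] == 1
  counts <- colSums(dtm[in_docs, , drop = FALSE])
  sort(counts[names(counts) != term], decreasing = TRUE)
}

print(cleaned)
print(head(co_occurs_with("branch")))
```

On a real corpus, a text-mining package would normally replace the hand-built matrix; the base R version is used here only so the sketch runs without extra dependencies.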

Visualizations: d3 wrapped in htmlwidgets

Topic Cluster:

stmCorrViz() - function to create a d3 viz of topic correlations within an STM (clustering)

Topic Plot:

stmBrowser() - function to create d3 scatter plot to compare topics
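A rough sketch of how a fitted model reaches stmCorrViz(), under stated assumptions: the stm and stmCorrViz packages are installed (along with tm and SnowballC, which textProcessor() relies on), the gadarian survey data bundled with stm stands in for the challenge data, and the output file name corrviz.html is arbitrary. It is not the code behind the demo.

```r
# Assumes stm and stmCorrViz are installed; gadarian ships with stm
# and is used here only as a stand-in for the challenge data.
library(stm)
library(stmCorrViz)
data(gadarian)

# Tokenize, stem, and drop stop words, then prune rare terms.
processed <- textProcessor(gadarian$open.ended.response, metadata = gadarian)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)

# Fit a structural topic model. K = 5 is the smallest topic count
# mentioned in the talk; the seed value is arbitrary.
fit <- stm(out$documents, out$vocab, K = 5, data = out$meta,
           init.type = "Spectral", seed = 2016, verbose = FALSE)

# Top words per topic, a starting point for hand-labeling topics.
print(labelTopics(fit, n = 5))

# Write the d3 topic-correlation view to an HTML file without
# opening a browser. The raw text comes from out$meta so it stays
# aligned with the documents that survived preprocessing.
stmCorrViz(fit, file_out = "corrviz.html",
           documents_raw = as.character(out$meta$open.ended.response),
           documents_matrix = out$documents,
           display = FALSE)
```

Refitting with K = 10 and K = 20 follows the same pattern. stm also provides searchK() for comparing several values of K in one call, though the talk does not say whether it was used.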

Test Drive

http://sco-lo-digital.github.io/WellsFargo_Analytics_Challenge/

Conclusion

  • Pros: Low or no cost, opportunity to leverage amazing work of the R community
  • Cons: Might not always be scalable, might have bugs (but I have found package authors to be very responsive)
  • Open source tools are extremely valuable for rapid prototyping from conception to delivery, helping to transform ideas into action.

Questions?

Please let me know if you have any questions.

Thanks!

Helpful Sites:

http://www.structuraltopicmodel.com

http://www.buildingwidgets.com

http://www.htmlwidgets.org

http://rpubs.com/scottjacobs/CampusChallenge