Coursera's Data Science Capstone: a Shiny app to predict the next word

Francisco Navarro
March 2017

Slide 2/5

This is the end of the road of my (our) Data Science Specialization, offered by Johns Hopkins University on Coursera.

In the capstone we are asked to build and deploy online a Shiny app where users can enter a phrase in an input box and get a prediction of a single next word after pressing some kind of “submit” button. A suitable delay is accepted while the app applies the model and computes the answer.

Slide 3/5

This is the Shiny App working:

It doesn't use a “submit” button at all. When the user stops typing, the app applies the model to get a “next” word. The written text and the next word are then displayed together, with the symbol “>>” as a separator. I think it's not a bad display.

Slide 4/5

The model behind the App was constructed from data provided by the instructors (see my milestone report for details).

I made intensive use of these packages:

  • tm: a framework for text mining applications within R
  • data.table: fast aggregation of large data (e.g. 100GB in RAM)
  • dplyr: provides a flexible grammar of data manipulation

My intention was to implement Katz's back-off model, but my apps ended up being slow, so I turned to the so-called Stupid Back-Off model, which is faster and easier, although not as accurate.
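
To make the idea concrete, here is a minimal sketch of Stupid Back-Off scoring. It is written in Python for illustration only and is not the R code behind the app. The function names (build_counts, score, predict_next) are hypothetical, the corpus is assumed to be an already tokenized list of lowercase words, and the back-off factor 0.4 is the value suggested in the original Stupid Back-Off paper (Brants et al., 2007), not necessarily the one used here. The score is the relative frequency of the n-gram when it was seen, otherwise 0.4 times the score obtained with a shorter context, ending at the unigram frequency.

```python
from collections import Counter

ALPHA = 0.4  # assumed back-off factor (value suggested by Brants et al., 2007)


def build_counts(tokens, max_n=3):
    """Count every n-gram of order 1..max_n, stored as tuples of words."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(word, context, counts, total):
    """Stupid Back-Off score of `word` after `context` (a tuple of words)."""
    if not context:
        # Base case: plain unigram relative frequency.
        return counts[(word,)] / total
    seen = counts[context + (word,)]
    if seen > 0:
        # The n-gram was observed: use its relative frequency.
        return seen / counts[context]
    # Unseen n-gram: drop the oldest context word and apply the penalty.
    return ALPHA * score(word, context[1:], counts, total)


def predict_next(phrase, counts, vocab, total, max_n=3):
    """Return the vocabulary word with the highest score after `phrase`."""
    words = phrase.lower().split()
    context = tuple(words[-(max_n - 1):]) if max_n > 1 else ()
    # sorted() makes ties resolve deterministically (alphabetical order).
    return max(sorted(vocab), key=lambda w: score(w, context, counts, total))
```

These scores are not normalized probabilities, which is part of why the method is cheaper than Katz's back-off: no discounting or normalization step is needed. A call would look like predict_next("on the", counts, vocab, total), where counts comes from build_counts, vocab is the set of distinct tokens, and total is the number of tokens.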

Slide 5/5

The Shiny app can be run here:

https://francisconm.shinyapps.io/110_Next_Word/

As this is a presentation for a peer-graded assignment, it's very likely that you are reading this because, like me, you have worked hard to complete the ten-course specialization. So I want my last words to wish you good luck in your career as a data scientist.