Sean Angiolillo
18 Feb 2018
The capstone project for the JHU Data Science Specialization on Coursera calls for a next word prediction Shiny app. As laid out in my milestone report, I broke the assignment into the following four R scripts and the Shiny app itself.
| base | pred |
|---|---|
| a barrier | to |
| a bartab | drink |
| a bartender | at and |
| a base | of salary hit |
| a baseball | cap bat game |
| a bases | loaded |
| a basic | level understanding income |
| a basis | for in |
| a basket | of with and |
| a basketball | game team player |
Starting with a 15% sample of the original data, the final data table had 369,575 unique ngrams.
Predictions saved in one character vector and then split as needed if queried in the app.
The app incorporates a very simple backoff style algorithm. In a table of bigrams and trigrams, the algorithm first processes the user input to accept at most two words.
Once this text has been cleaned in the same manner as the data, the algorithm searches the data table for its accompanying predictions.
If it is not found, it will attempt to locate only the last word given.
If that too is not found, it simply gives the result of the most common unigrams.
Try the app for yourself at the link below: https://seanangio.shinyapps.io/next_word_app/
The code can be found on Github.