Joe Florence
2/4/18
This application is the culmination of a project that uses natural language processing tools to investigate large text files. It is an interactive data product that offers an educated guess at the next word in a sequence.
The product is similar to the ones used to speed up text messaging on your phone, e.g., the SwiftKey technology, albeit without the same accuracy. It takes a series of words and predicts the next word in the series.
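I use Katz's back-off model to predict the next word after an n-gram. The model's predictions are based on the conditional probability that a given word follows some sequence of prior words. When a longer sequence has not been observed in the training text, the model backs off to a shorter one.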
Although other computational approaches can produce similar outputs (e.g., Kneser-Ney smoothing), the back-off model combines relative simplicity with relatively accurate predictions.
You can read more about this model online at Wikipedia.
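To make the back-off idea concrete, here is a minimal sketch in Python. It is not the code behind the app. It assumes a bigram model that backs off to unigrams, and it uses a fixed discount of 0.5 in place of the Good-Turing discounts in Katz's original formulation. The names train, next_word_probs, predict, and DISCOUNT are made up for illustration.

```python
from collections import Counter, defaultdict

# Assumption: a fixed discount stands in for Katz's Good-Turing discounts.
DISCOUNT = 0.5


def train(tokens):
    """Count unigrams and, for each word, the words that follow it."""
    unigrams = Counter(tokens)
    followers = defaultdict(dict)
    for (prev, word), count in Counter(zip(tokens, tokens[1:])).items():
        followers[prev][word] = count
    return unigrams, followers


def next_word_probs(prev, unigrams, followers):
    """Return P(word | prev) for every vocabulary word."""
    total = sum(unigrams.values())
    seen = followers.get(prev, {})
    if not seen:
        # prev was never followed by anything: back off fully to unigrams.
        return {w: c / total for w, c in unigrams.items()}

    prev_count = sum(seen.values())
    unseen_mass = sum(c for w, c in unigrams.items() if w not in seen)
    if unseen_mass == 0:
        # Every vocabulary word was seen after prev: nothing to back off to.
        return {w: c / prev_count for w, c in seen.items()}

    # Seen bigrams keep their discounted relative frequency.
    probs = {w: (c - DISCOUNT) / prev_count for w, c in seen.items()}
    # The mass freed by discounting is shared among unseen words
    # in proportion to their unigram counts.
    leftover = 1.0 - sum(probs.values())
    for w, c in unigrams.items():
        if w not in seen:
            probs[w] = leftover * c / unseen_mass
    return probs


def predict(prev, unigrams, followers):
    """Return the most probable next word after prev."""
    probs = next_word_probs(prev, unigrams, followers)
    return max(probs, key=probs.get)


tokens = "the cat sat on the mat and the cat slept".split()
unigrams, followers = train(tokens)
print(predict("the", unigrams, followers))  # prints "cat"
```

In this toy corpus, "the" is followed by "cat" twice and "mat" once, so "cat" wins. A word that never appears as a predecessor, such as "dog", falls straight back to the most frequent single word. A fuller model would start from longer n-grams and back off one step at a time.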
Here is a screenshot of the Predict Next Word application. It is located online at this address: joeflorence.shinyapps.io/next_word_prediction.
To use the app, simply type in a few words.
For more information about the program, click on the “Read Me” tab.
I created this program and website as part of the Johns Hopkins University/Coursera Data Science Specialization program. The specialization consists of 10 distinct courses, and this project is its capstone. Check out the material on Coursera's website if you are interested.
Thanks for your attention!