Bryan Scheiderer
March 12, 2017
This Shiny web application was developed for the capstone course of the Coursera Data Science Specialization, in conjunction with SwiftKey.
The assignment was to build an app that uses natural language processing to predict the next word from a user's input of a word or short phrase. A large corpus of blog, news, and Twitter data was used to build the model. Using the R programming language and its packages, the data set was searched and tokenized to build n-gram frequency tables. N-grams are sequences of n consecutive words; in this case only 1-, 2-, and 3-gram tables (unigrams, bigrams, and trigrams) were used. Longer n-grams capture more context and can yield more accurate predictions. The frequency with which each n-gram appears in the corpus was calculated and saved. The model then searches these saved frequency tables to predict the word most likely to follow the user's input text.
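The approach described above can be illustrated with a minimal sketch in base R. This is not the app's actual code: the toy corpus, the function names (tokenize, ngram_table, predict_next), and the fallback from trigrams to bigrams to the most frequent single word are all assumptions made for illustration.

```r
# Hypothetical toy corpus standing in for the blog, news, and Twitter text.
corpus <- c(
  "thank you for the follow",
  "thank you for the help",
  "thank you so much",
  "looking forward to the weekend"
)

# Split a line into lowercase word tokens (letters and apostrophes only).
tokenize <- function(line) {
  words <- strsplit(tolower(line), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Build a frequency table of all n-grams of size n found in the lines.
ngram_table <- function(lines, n) {
  grams <- unlist(lapply(lines, function(line) {
    w <- tokenize(line)
    if (length(w) < n) return(character(0))
    starts <- seq_len(length(w) - n + 1)
    vapply(starts, function(i) paste(w[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  counts <- table(grams)
  data.frame(ngram = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# tables[[1]] holds unigrams, tables[[2]] bigrams, tables[[3]] trigrams.
tables <- lapply(1:3, function(n) ngram_table(corpus, n))

# Predict the next word: look up the last two words in the trigram table,
# then the last word in the bigram table, and finally fall back to the
# most frequent single word (an assumed strategy, not the app's own).
predict_next <- function(input, tables) {
  words <- tokenize(input)
  for (n in 3:2) {
    k <- n - 1
    if (length(words) < k) next
    prefix <- paste0(paste(tail(words, k), collapse = " "), " ")
    tab <- tables[[n]]
    hits <- tab[startsWith(tab$ngram, prefix), ]
    if (nrow(hits) > 0) {
      best <- hits$ngram[which.max(hits$freq)]
      return(tail(strsplit(best, " ")[[1]], 1))
    }
  }
  tables[[1]]$ngram[which.max(tables[[1]]$freq)]
}

predict_next("thank you", tables)        # "for"
predict_next("Looking forward", tables)  # "to"
```

In this sketch, ties between equally frequent n-grams are broken by whichever comes first in the table. A real model would precompute and save the tables once, as described above, rather than rebuilding them on every run.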
The Next Word Prediction App is located at:
https://bdscheiderer.shinyapps.io/WordPrediction/
Possible improvements for future versions: