Amanda Salvesen
06/18/2017
The next slides will present a text prediction application created for the Coursera Data Science Specialization Capstone Project in June 2017.
The text prediction application draws on a large corpus of blogs, news articles, and tweets to quickly and accurately predict the next word based on a given phrase.
To build this application, the designer:
The predictive function first cleans the user's input so that it matches the format of the loaded bigram, trigram, and quadgram frequency dictionaries, then uses those dictionaries to predict the next word. It starts with the quadgram dictionary, looking for a match on the last three words of the input. If no quadgram matches, or if the user provides fewer than three words, the model backs off to the trigram dictionary. It continues backing off in the same way to the bigram dictionary and, as a last resort, to the most common unigram.
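The backoff logic described above can be sketched in a few lines. This is a minimal illustration in Python, not the application's actual R code: the function names `clean_input` and `predict_next`, the cleaning rules (lowercasing and dropping everything except letters, apostrophes, and spaces), and the toy dictionaries are all assumptions made for the example. For simplicity, each dictionary here maps a context directly to its single most frequent next word rather than storing full frequency counts.

```python
import re


def clean_input(text):
    """Lowercase the text and keep only letters, apostrophes, and spaces.

    These cleaning rules are assumed for illustration; the real app may differ.
    """
    text = text.lower()
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()


def predict_next(text, quadgrams, trigrams, bigrams, top_unigram):
    """Return a predicted next word, trying the longest context first."""
    words = clean_input(text)
    # Quadgrams need 3 words of context, trigrams 2, bigrams 1.
    for n, table in ((3, quadgrams), (2, trigrams), (1, bigrams)):
        if len(words) >= n:
            context = " ".join(words[-n:])
            if context in table:
                return table[context]
    # Nothing matched (or the input was empty): use the most common unigram.
    return top_unigram


# Made-up toy dictionaries: context -> most frequent next word.
quadgrams = {"thanks for the": "follow"}
trigrams = {"for the": "first"}
bigrams = {"the": "same"}
top_unigram = "the"

print(predict_next("Thanks, for THE", quadgrams, trigrams, bigrams, top_unigram))
print(predict_next("waiting for the", quadgrams, trigrams, bigrams, top_unigram))
```

In this sketch the first call matches the quadgram dictionary directly, while the second has no matching quadgram and so falls through to the trigram entry for "for the". Checking the longest context first means the prediction always uses as much of the user's phrase as the dictionaries can support.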
Click for further information on backoff models and their implementation in R.
To use the application, the user enters any number of words in the data entry box (red) and clicks “Go!”. The app then displays the predicted word in the results box (green). The user can also view the cleaned input that was passed to the model.
Visit https://asalvesen.shinyapps.io/Capstone/ to try it yourself!
Check out the Coursera Data Science Specialization!
Learn more about natural language processing and text mining in R!
Thank you to Coursera, Johns Hopkins, and Swiftkey for sponsoring this capstone project, and to you for your attention all the way to the last slide.