Text Predictor App
David Rubinger
January 24, 2015
Introduction
- The Text Predictor App predicts the next word in a phrase entered by a user.
- Simply type in your text and hit the submit button, and the app will instantaneously predict the next word.

Model
- The algorithm is based on an n-gram back-off model
- Uni-, bi-, tri- and quadrigrams were trained on a 160,000 corpus of blogs, news and tweets from HC Corpora
- The algorithm first tries to match the input against the quadrigram model and returns the most frequently occurring matching quadrigram, provided its frequency is above a set threshold. If no match clears the threshold, it backs off to progressively lower-order models
- The list of profane words filtered out during preprocessing was taken from https://gist.github.com/jamiew/1112488
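
The back-off procedure described above can be sketched roughly as follows. This is a minimal illustration in Python, not the app's actual code: the function names `build_ngram_counts` and `predict_next`, the whitespace tokenization, and the single `min_count` threshold shared by all orders are assumptions made for the example.

```python
from collections import Counter


def build_ngram_counts(tokens, max_order=4):
    """Count every n-gram of order 1..max_order in a list of tokens."""
    counts = {n: Counter() for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def predict_next(counts, phrase, min_count=2, max_order=4):
    """Predict the next word, backing off from the highest order down.

    min_count is a hypothetical threshold: an n-gram is used only if it
    occurred at least this many times in the training tokens.
    """
    words = phrase.lower().split()
    for n in range(max_order, 1, -1):
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # input too short for this order, try a lower one
        candidates = {
            gram: c
            for gram, c in counts[n].items()
            if gram[:-1] == context and c >= min_count
        }
        if candidates:
            # Return the last word of the most frequent matching n-gram
            return max(candidates, key=candidates.get)[-1]
    # Final fallback: the most frequent single word, if any
    if counts[1]:
        return counts[1].most_common(1)[0][0][0]
    return None


tokens = "the cat sat on the mat the cat sat on the sofa the cat ran".split()
counts = build_ngram_counts(tokens)
prediction = predict_next(counts, "the cat sat")
```

A linear scan over all n-grams keeps the sketch short; a real model would more likely index n-grams by their context for constant-time lookup.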
Model
- Thresholds were set to make the predictions more reliable and to reduce the model size
- Thresholds were set such that there was 80% coverage of unique unigrams in the training set, 35% coverage of bigrams, 8% of trigrams, and 1% of quadrigrams
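
One way such thresholds could be derived is sketched below. It assumes "coverage" means the share of distinct n-grams that survive a minimum-count cutoff; the slides do not spell out the definition, so this reading, along with the helper names `coverage` and `threshold_for_coverage`, is hypothetical.

```python
import math
from collections import Counter


def coverage(counts, min_count):
    """Fraction of distinct n-grams whose count is at least min_count."""
    if not counts:
        return 0.0
    kept = sum(1 for c in counts.values() if c >= min_count)
    return kept / len(counts)


def threshold_for_coverage(counts, target):
    """Largest min-count that still keeps at least `target` of distinct n-grams."""
    if not counts:
        return 1
    freqs = sorted(counts.values(), reverse=True)
    # Minimum number of distinct n-grams that must be retained
    keep = max(1, math.ceil(target * len(freqs)))
    return freqs[keep - 1]


# Toy counts standing in for one n-gram order
toy_counts = Counter({"a": 10, "b": 5, "c": 5, "d": 1})
cutoff = threshold_for_coverage(toy_counts, 0.5)
```

Because ties at the cutoff are all kept, the achieved coverage can exceed the target, so the percentages act as lower bounds under this interpretation.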