Text Predictor App

David Rubinger
January 24, 2015

Introduction

  • The Text Predictor App predicts the next word in a phrase entered by a user.
  • Simply type in your text and hit the submit button, and the app will instantly predict the next word.


Model

  • The algorithm is based on an n-gram back-off model.
  • Uni-, bi-, tri-, and quadrigrams were trained on a 160,000 corpus of blogs, news, and tweets from HC Corpora.
  • The algorithm first tries to match the input against the quadrigrams and returns the most frequently occurring match, provided its frequency is above a certain threshold. If no match clears the threshold, it backs off to the trigram model, then the bigram model, and finally the unigram model.
  • The list of profane words filtered out during preprocessing was taken from https://gist.github.com/jamiew/1112488
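
The back-off lookup described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function names (build_tables, predict_next), the whitespace tokenizer, and the use of raw counts as the threshold (min_counts) are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_tables(sentences, max_order=4):
    """Count n-grams. tables[n][context] is a Counter of the words that
    followed `context`, where context is a tuple of n - 1 tokens."""
    tables = {n: defaultdict(Counter) for n in range(1, max_order + 1)}
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumed tokenizer: whitespace split
        for n in range(1, max_order + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables


def predict_next(tables, text, min_counts, max_order=4):
    """Try the highest-order model first and back off to lower orders.

    min_counts[n] is a hypothetical per-order threshold: a prediction from
    the n-gram table is accepted only if its count is at least this value.
    """
    tokens = text.lower().split()
    for n in range(max_order, 0, -1):
        if len(tokens) < n - 1:
            continue  # not enough input words to form this context
        context = tuple(tokens[len(tokens) - (n - 1):])
        candidates = tables[n].get(context)
        if not candidates:
            continue  # context never seen: back off
        word, count = candidates.most_common(1)[0]
        if count >= min_counts[n]:
            return word
    return None  # nothing cleared any threshold
```

With a toy corpus such as ["i love new york", "i love new york", "i love new shoes", "we love cats"] and min_counts = {4: 2, 3: 2, 2: 2, 1: 1}, the input "i love new" is answered by the quadrigram table, while "they love" falls through to the bigram table because the higher-order contexts were never seen.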

Model

  • Thresholds were set to make the predictions more reliable and to trim down the model size.
  • Thresholds were set such that there was 80% coverage of unique unigrams in the training set, 35% coverage of bigrams, 8% of trigrams, and 1% of quadrigrams.
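
One possible reading of these coverage figures is that, for each order, only the most frequent share of the distinct n-grams is kept. The sketch below assumes that reading; the slides do not say exactly how coverage was computed, and it could instead mean the share of total occurrences. The function name trim_to_coverage is hypothetical, and only the COVERAGE values come from the slide.

```python
import math
from collections import Counter

# Coverage targets per n-gram order, taken from the slide.
COVERAGE = {1: 0.80, 2: 0.35, 3: 0.08, 4: 0.01}


def trim_to_coverage(counts, coverage):
    """Keep the most frequent `coverage` fraction of distinct n-grams.

    `counts` maps each n-gram to its frequency. Returns the trimmed counts
    and the lowest count that survived (None if nothing was kept).
    """
    keep = math.ceil(coverage * len(counts))
    kept = dict(Counter(counts).most_common(keep))
    min_count = min(kept.values()) if kept else None
    return kept, min_count
```

Under this assumption, trimming each table with its COVERAGE value both shrinks the model and yields a minimum count per order that could serve as the threshold in the back-off step.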

Enjoy!