Word Predictor App

Olson Jodl Ignacio V. Gayatin
June 13, 2017

Background

  • People are spending more time on mobile devices for many types of activities.
  • Typing on mobile devices can be inconvenient.
  • The experience can be improved by making typing easier through text prediction.
  • This deck presents a prototype app built on a predictive text model.
  • Three text corpora were used, one from each of three sources: blogs, Twitter and news.
  • Each corpus has 30+ million words. To improve speed, only a random 1% sample of the total corpora was used as the reference.
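The 1% random sample can be sketched as follows. This is a minimal illustration, not the app's actual code: it assumes sampling is done per line, and the function name `sample_lines`, the seed, and the stand-in corpus are all hypothetical.

```python
import random

def sample_lines(lines, fraction=0.01, seed=42):
    """Keep each line independently with probability `fraction`.

    Assumption: the corpus is sampled line by line. The slides only say
    that 1% of the total corpora was randomly sampled.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [line for line in lines if rng.random() < fraction]

# Hypothetical stand-in for one corpus file read into memory.
corpus = [f"line {i}" for i in range(100_000)]
sample = sample_lines(corpus)
print(len(sample))  # roughly 1% of 100,000 lines
```

Sampling each line independently avoids loading a full index of the corpus, at the cost of a sample size that is only approximately 1%.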

Prediction Algorithm - 1

  • The most frequent quadgrams, trigrams and bigrams were gathered. Each word set is ordered alphabetically by its leading words, then by descending frequency of occurrence.
  • The most frequent unigrams (excluding stop words) were also gathered.

Sample word set for quadgram

first three words    fourth word    frequency
a big fan            of             8
a bit of             a              12
a bit of             an             5
a chance to          win            8
a couple of          weeks          15
a couple of          days           9
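A table like the one above could be built along these lines. This is a rough sketch under stated assumptions: tokenization is a plain lowercase whitespace split, the helper name `ngram_table` is hypothetical, and the three input sentences are invented for illustration rather than taken from the corpora.

```python
from collections import Counter

def ngram_table(sentences, n):
    """Count n-grams and return rows of (leading words, last word, frequency).

    Rows are ordered alphabetically by the leading words, then by
    descending frequency (ties broken alphabetically by the last word).
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: whitespace tokenizer
        for i in range(len(tokens) - n + 1):
            prefix = " ".join(tokens[i:i + n - 1])
            counts[(prefix, tokens[i + n - 1])] += 1
    rows = [(prefix, word, freq) for (prefix, word), freq in counts.items()]
    return sorted(rows, key=lambda r: (r[0], -r[2], r[1]))

# Invented sentences, for illustration only.
sentences = ["a bit of a mess", "a bit of a laugh", "a bit of an issue"]
table = ngram_table(sentences, 4)
for row in table:
    print(row)
```

The same function with `n=3` or `n=2` would give the trigram and bigram word sets.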

Prediction Algorithm - 2

  • Katz's back-off model is used to predict the next word.
  • The input is truncated to its last three words.
  • Given those three words, the algorithm first looks for a quadgram whose first three words match them and supplies its fourth word.
  • If no quadgram matches, the app takes the last two words and looks for a matching trigram, and so on.
  • The most frequent matching combination supplies the top-ranked word.
  • If more than one combination matches, the 2nd, 3rd and 4th most frequent words are also offered.
  • The model was built to balance speed and accuracy.
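The lookup steps above can be sketched as below. This is a simplified back-off by raw frequency and does not apply the probability discounting of a full Katz model. The function name `predict`, the dictionary layout, and the final fallback to the top unigrams are assumptions. The quadgram counts come from the sample table, while the trigram, bigram and unigram entries are invented for illustration.

```python
def predict(text, tables, unigrams, k=4):
    """Return up to k next-word suggestions, backing off to shorter contexts."""
    words = text.lower().split()
    # Try the longest context first: 3 words (quadgram table),
    # then 2 (trigram table), then 1 (bigram table).
    for n in (3, 2, 1):
        if len(words) < n:
            continue
        context = " ".join(words[-n:])
        candidates = tables[n].get(context)
        if candidates:
            ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
            return [word for word, _ in ranked[:k]]
    # Assumption: with no match at any level, fall back to the top unigrams.
    return unigrams[:k]

tables = {
    3: {"a couple of": {"weeks": 15, "days": 9},   # from the sample table
        "a bit of": {"a": 12, "an": 5}},
    2: {"couple of": {"years": 20, "weeks": 18}},  # invented counts
    1: {"of": {"the": 100}},                       # invented counts
}
unigrams = ["time", "people", "day", "year"]       # invented list

print(predict("I waited a couple of", tables, unigrams))  # quadgram match
print(predict("the couple of", tables, unigrams))         # backs off to trigram
print(predict("zzz", tables, unigrams))                   # unigram fallback
```

Keeping each table as a dictionary keyed by context makes every back-off step a single lookup, which suits the speed goal stated above.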

Product Demo