TextPredictionApp

Maria Mendoza
August 19, 2015

The App

This app is a simple text prediction application that attempts to predict the next word based on your previous words.

The algorithm uses 4-grams and trigrams, backing off to bigrams and then unigrams.

The probability used to select the next word is calculated with Kneser-Ney smoothing.
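
The backoff order described above can be illustrated with a minimal Python sketch. It is not the app's own code: the function names `build_tables` and `predict_next` are hypothetical, and candidates are ranked by raw counts rather than by Kneser-Ney probabilities, purely to keep the example short.

```python
from collections import Counter, defaultdict

def build_tables(tokens, max_order=4):
    """Map each context tuple (0 to max_order - 1 words) to a Counter of next words."""
    tables = defaultdict(Counter)
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            *context, word = tokens[i:i + n]
            tables[tuple(context)][word] += 1
    return tables

def predict_next(tables, previous_words, max_order=4):
    """Try the longest available context first, then back off to shorter ones."""
    for k in range(max_order - 1, -1, -1):
        if len(previous_words) < k:
            continue  # not enough typed words for a context of this length
        context = tuple(previous_words[-k:]) if k else ()
        candidates = tables.get(context)
        if candidates:
            # Assumption: rank by raw count; the app ranks by smoothed probability.
            return candidates.most_common(1)[0][0]
    return None

tokens = "i love new york i love new shoes i love new york".split()
tables = build_tables(tokens)
print(predict_next(tables, ["i", "love", "new"]))   # 4-gram context is found
print(predict_next(tables, ["unknown", "new"]))     # backs off to the bigram context
```

In the second call the trigram context is unseen, so the lookup falls through to the single-word context "new".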

Try the app here: https://mariamendoza.shinyapps.io/TextPredictionApp

How to Use The App

  1. Type words into the text box.
  2. The predicted word will appear on the blue button as the button label.
  3. Click the blue button to use the predicted word.
  4. Click the orange button (clear) to clear the text box.

Notes:

  • The button label (the predicted word) changes as you type.
  • The terminal-like display shows the entire text and updates as you type.

Language Model

  • Training data was derived from 100k Twitter lines, 100k news lines, and 100k blog lines.
  • Extensive cleansing of training data included profanity filtering, emoji filtering, and punctuation standardization.
  • The language model includes 4-grams, trigrams, bigrams, and unigrams to allow gradual backoff.
  • The next word is predicted using the probability calculated with Kneser-Ney smoothing (with continuation probability).
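
The cleansing and n-gram steps listed above might look roughly like the following Python sketch. The exact rules the app applies are not given here, so everything in it is an assumption for illustration: the `PROFANITY` set is a placeholder, and the helper names `clean_line` and `ngrams` are hypothetical.

```python
import re

# Placeholder word list; a real filter would load a much longer list.
PROFANITY = {"darn", "heck"}

def clean_line(line):
    """Lowercase, standardize apostrophes, strip emoji/punctuation, drop profanity."""
    text = line.lower()
    # Punctuation standardization: curly apostrophes become straight ones.
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    # Anything that is not a letter, apostrophe, or space (emoji, digits,
    # other punctuation) is replaced by a space.
    text = re.sub(r"[^a-z' ]+", " ", text)
    tokens = [t.strip("'") for t in text.split()]
    return [t for t in tokens if t and t not in PROFANITY]

def ngrams(tokens, n):
    """Return all consecutive n-token tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = clean_line("I can\u2019t wait \U0001F600!!! Darn, it's GREAT.")
print(tokens)
print(ngrams(tokens, 2))
```

The same `ngrams` helper covers every order from unigrams (n = 1) to 4-grams (n = 4).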

Kneser-Ney Smoothing

The Kneser-Ney smoothing implemented here subtracts a fixed discount d = 0.75 from the count of each observed n-gram and redistributes the freed probability mass across candidate next words in proportion to their continuation probability.

\[ P_{kn}(w_i \mid w_{i-1}) = \frac{\max(c(w_{i-1}, w_i) - d,\ 0)}{c(w_{i-1})} + \lambda(w_{i-1}) \, P_{continuation}(w_i) \]

where:

\[ \lambda(w_{i-1}) = \frac{d}{c(w_{i-1})} \, |\{w : c(w_{i-1}, w) > 0\}| \]

source: https://web.stanford.edu/~jurafsky/NLPCourseraSlides.html
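
As an illustration of the bigram formula above, here is a minimal Python sketch. It is not the app's implementation. It assumes the standard definition of the continuation probability, the number of distinct words that precede w_i divided by the total number of distinct bigram types, and it takes c(w_{i-1}) to be the number of times w_{i-1} occurs as a bigram context so that the probabilities sum to 1. The function name `kneser_ney_bigram` is hypothetical.

```python
from collections import Counter

def kneser_ney_bigram(tokens, d=0.75):
    """Return a function prob(word, prev) implementing bigram Kneser-Ney."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter()  # c(w_{i-1}), counted as a bigram context
    followers = Counter()       # |{w : c(w_{i-1}, w) > 0}|
    preceders = Counter()       # |{w' : c(w', w_i) > 0}|
    for (prev, word), count in bigrams.items():
        context_counts[prev] += count
        followers[prev] += 1
        preceders[word] += 1
    n_bigram_types = len(bigrams)

    def prob(word, prev):
        p_continuation = preceders[word] / n_bigram_types
        if context_counts[prev] == 0:
            # Unseen context: fall back to the continuation probability alone.
            return p_continuation
        discounted = max(bigrams[(prev, word)] - d, 0) / context_counts[prev]
        lam = d * followers[prev] / context_counts[prev]
        return discounted + lam * p_continuation

    return prob

prob = kneser_ney_bigram("a b a c a b".split())
for w in ["a", "b", "c"]:
    print(w, round(prob(w, "a"), 4))
```

In this toy corpus the context "a" is followed by "b" twice and "c" once, so "b" receives the largest share, while the unseen continuation "a" still gets a nonzero probability from the redistributed discount.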