Prediction of the next word in a sequence of words

a web app based on (smoothed) n-gram models for English and German


author: Christian Thiele

date: 27.04.2015

Interface and main features

The interface

  • Prediction of the next word (the top prediction is displayed largest)
  • Probable next words: a Markov chain prediction of how the sentence continues, given the top prediction
  • Supports German and English
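
The "probable next words" feature can be illustrated with a minimal sketch. It assumes a first-order Markov chain over bigram counts and a greedy choice of the most frequent successor at each step; the app itself may use longer contexts or a different selection rule. The function names `build_bigram_counts` and `continue_sentence` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def build_bigram_counts(tokens):
    """Count how often each word directly follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def continue_sentence(counts, start_word, max_words=3):
    """Greedily follow the most frequent successor, starting after start_word.

    start_word plays the role of the top prediction; the returned list is
    the assumed continuation of the sentence after it.
    """
    continuation = []
    current = start_word
    for _ in range(max_words):
        followers = counts.get(current)
        if not followers:
            break  # unseen word: the chain cannot be continued
        current = followers.most_common(1)[0][0]
        continuation.append(current)
    return continuation


# Toy corpus, for illustration only.
tokens = "i want to go to the store and i want to go home".split()
counts = build_bigram_counts(tokens)
print(continue_sentence(counts, "i"))
```

A greedy walk like this always returns the same continuation for the same top prediction; sampling from the successor counts instead would give more varied suggestions.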

Additional features

The second part of the interface

  • The user can choose from two prediction algorithms
  • A gauge for “relative confidence” in the top prediction: the square root of the quantile of the count or probability within its respective group of (skip-)n-grams
  • Please note the other tabs in the navigation bar with additional information
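
The "relative confidence" gauge can be sketched as follows. The sketch assumes that "quantile" means the empirical quantile, i.e. the share of entries in the same (skip-)n-gram group whose count (or probability) is less than or equal to that of the top prediction; the exact definition used in the app is not stated here. The function name `relative_confidence` and the sample counts are hypothetical.

```python
import math
from bisect import bisect_right


def relative_confidence(value, group_values):
    """Square root of the empirical quantile of `value` within `group_values`.

    `value` is the count or probability of the top prediction and
    `group_values` holds the counts or probabilities of all entries in the
    same (skip-)n-gram group.
    """
    ordered = sorted(group_values)
    if not ordered:
        return 0.0
    # Share of group entries that are less than or equal to `value`.
    quantile = bisect_right(ordered, value) / len(ordered)
    return math.sqrt(quantile)


# Illustrative counts for one group of 4-grams.
group = [1, 1, 2, 5, 10, 40, 100, 250]
print(round(relative_confidence(5, group), 3))
```

Taking the square root stretches the lower part of the scale, so a prediction at the median of its group already shows a gauge value of about 0.71.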

Prediction algorithms and data

  • Data
    • The predictions are based on 7.5 million unique (skip-)n-grams ranging from unigrams to 4-grams. To account for longer-range dependencies, skip-5-grams and skip-6-grams are also used.
    • The (skip-)n-grams were generated using tweets, news and blog articles
  • Algorithms
    • Raw counts and Katz backoff: predicts the word with the highest count following the longest matching n-gram
    • Kneser-Ney smoothing and backoff to skip-n-grams: uses recursively calculated n-gram probabilities and backs off to skip-n-grams
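
The count-based algorithm can be sketched as a longest-match lookup. This is a simplified illustration under stated assumptions: it uses raw counts only, omits the discounting of full Katz backoff, and does not cover skip-n-grams or Kneser-Ney smoothing. The names `build_ngram_table` and `predict_next` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def build_ngram_table(tokens, max_order=4):
    """Map each context (0 to max_order-1 preceding words) to next-word counts."""
    table = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for context_len in range(max_order):
            if i - context_len < 0:
                break  # not enough preceding words for this context length
            context = tuple(tokens[i - context_len:i])
            table[context][word] += 1
    return table


def predict_next(table, history, max_order=4):
    """Return (word, n-gram order) for the longest context found in the table.

    Backs off from the longest available context down to the empty context
    (unigram counts) and picks the word with the highest raw count.
    """
    words = list(history)
    longest = min(max_order - 1, len(words))
    for context_len in range(longest, -1, -1):
        context = tuple(words[len(words) - context_len:])
        followers = table.get(context)
        if followers:
            return followers.most_common(1)[0][0], context_len + 1
    return None, 0


# Toy corpus, for illustration only.
tokens = ("the cat sat on the mat the cat sat on the sofa "
          "a dog ate the bone").split()
table = build_ngram_table(tokens)
print(predict_next(table, ["the", "cat", "sat"]))
print(predict_next(table, ["a", "cat"]))
```

The second call shows the backoff: the context "a cat" never occurs in the toy corpus, so the lookup falls back to the shorter context "cat" and answers from bigram counts.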

Further development and additional information

  • In the current version, Kneser-Ney smoothing does not beat Katz backoff in benchmarks; this can probably be improved
  • Grammar could be incorporated into the prediction algorithm using statistical tagging and hidden Markov models (no tagged language data was available for this)
  • If there are any technical problems or if you have any questions, please contact me via contme109@gmail.com
  • Please allow a startup time of around 15 seconds when the app is opened