Prediction of the next word in a sequence of words

A web app based on (smoothed) n-gram models for English and German

author: Christian Thiele

date: 27.04.2015

The interface

Prediction of the next word (the top prediction is shown largest)
Probable next words: a Markov chain prediction of how the sentence continues after the top prediction
Supports German and English
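The "probable next words" continuation can be pictured as a Markov chain that keeps appending the most likely successor. The following is a minimal sketch, assuming a first-order (bigram) chain built from a toy corpus with greedy selection. The app works on its own (skip-)n-gram tables, and `build_bigram_table` and `markov_continuation` are hypothetical names.

```python
from collections import Counter, defaultdict


def build_bigram_table(tokens):
    """Map each word to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def markov_continuation(table, start, length):
    """Greedily extend `start` by repeatedly appending the most frequent
    successor of the last word (a first-order Markov chain)."""
    words = [start]
    for _ in range(length):
        successors = table.get(words[-1])
        if not successors:
            break  # no known continuation for this word
        words.append(successors.most_common(1)[0][0])
    return words


# Hypothetical toy corpus
tokens = "i want to eat . i want to sleep . you want to eat".split()
table = build_bigram_table(tokens)
print(markov_continuation(table, "i", 3))  # ['i', 'want', 'to', 'eat']
```

Greedy selection always follows the single most frequent successor, so a longer chain can fall into loops. Sampling in proportion to the successor counts would be one alternative.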

The second part of the interface

The user can choose between two prediction algorithms
A gauge for “relative confidence” in the top prediction: the square root of the quantile of its count or probability within the respective group of (skip-)n-grams
See the other tabs in the navigation bar for additional information
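The confidence gauge can be sketched as follows. This assumes that "quantile" means the empirical cumulative share of the group, i.e. the fraction of (skip-)n-grams in the same group whose count or probability is at most that of the top prediction. `confidence_gauge` is a hypothetical name and the counts are made up.

```python
import bisect
import math


def confidence_gauge(value, group_values):
    """Square root of the empirical quantile of `value` in its group:
    sqrt(fraction of group values that are <= value)."""
    if not group_values:
        raise ValueError("group must not be empty")
    ordered = sorted(group_values)
    rank = bisect.bisect_right(ordered, value)  # number of values <= value
    return math.sqrt(rank / len(ordered))


# Made-up counts of all n-grams in one group
counts = [1, 2, 5, 10]
print(confidence_gauge(1, counts))   # 0.5
print(confidence_gauge(10, counts))  # 1.0
```

Taking the square root lifts low and middle quantiles (a quantile of 0.25 is shown as 0.5), which spreads out the lower part of the gauge.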

Data
- The predictions are based on 7.5 million unique (skip-)n-grams ranging from unigrams to 4-grams. To account for longer-range dependencies, skip-5-grams and skip-6-grams are also used.
- The (skip-)n-grams were generated from tweets, news articles and blog posts
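Extracting n-grams and skip-n-grams from tokenized text can be sketched as below. The slides do not define the skip-n-grams precisely. This sketch assumes that a skip-n-gram keeps the first and last word of an n-word window and skips the n - 2 words in between, so that a distant word can act as context. The function names and the sentence are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_grams(tokens, n):
    """Assumed skip-n-gram: first and last word of an n-word window,
    with the n - 2 words in between skipped."""
    return [(tokens[i], tokens[i + n - 1]) for i in range(len(tokens) - n + 1)]


tokens = "the quick brown fox jumps over the lazy dog".split()
# Unigrams to 4-grams, counted per order
tables = {n: Counter(ngrams(tokens, n)) for n in range(1, 5)}
# Skip-5-grams and skip-6-grams for longer-range dependencies
skip_tables = {n: Counter(skip_grams(tokens, n)) for n in (5, 6)}
print(skip_grams(tokens, 5)[:2])  # [('the', 'jumps'), ('quick', 'over')]
```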

Algorithms
- Raw counts and Katz backoff: the word with the highest count following the longest matching n-gram
- Kneser-Ney smoothing and backoff to skip-n-grams: recursively calculated n-gram probabilities, backing off to skip-n-grams
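The two algorithms can be contrasted in a minimal sketch on a toy corpus. `backoff_predict` follows the first bullet: it returns the most frequent follower of the longest matching context, using raw counts. Textbook Katz backoff additionally discounts the counts (Good-Turing) and redistributes the freed probability mass, which is omitted here. `kneser_ney_bigram` is interpolated Kneser-Ney restricted to bigrams with an assumed discount d = 0.75. The recursion to higher orders and the backoff to skip-n-grams used in the app are not shown. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_tables(tokens, max_n=4):
    """tables[k][context] counts the words that follow a k-word context."""
    tables = defaultdict(lambda: defaultdict(Counter))
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context, word = tuple(tokens[i:i + n - 1]), tokens[i + n - 1]
            tables[n - 1][context][word] += 1
    return tables


def backoff_predict(tables, history, max_n=4):
    """Raw-count backoff: find the longest known context (at most max_n - 1
    words) at the end of `history` and return its most frequent follower."""
    for k in range(min(max_n - 1, len(history)), -1, -1):
        context = tuple(history[-k:]) if k else ()
        followers = tables[k].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None


def kneser_ney_bigram(tokens, d=0.75):  # d = 0.75 is an assumed discount
    """Interpolated Kneser-Ney for bigrams; returns a function P(w | v)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    history_count = Counter()  # c(v): occurrences of v as a bigram history
    followers = Counter()      # N1+(v .): distinct words seen after v
    preceders = Counter()      # N1+(. w): distinct words seen before w
    for (v, w), c in bigrams.items():
        history_count[v] += c
        followers[v] += 1
        preceders[w] += 1
    bigram_types = len(bigrams)  # N1+(. .): distinct bigram types

    def prob(w, v):
        p_cont = preceders[w] / bigram_types  # continuation probability
        if history_count[v] == 0:
            return p_cont  # unseen history: use the lower order only
        discounted = max(bigrams[(v, w)] - d, 0) / history_count[v]
        weight = d * followers[v] / history_count[v]  # mass freed by discounting
        return discounted + weight * p_cont

    return prob


tokens = "i want to eat . i want to sleep . you want to eat".split()
tables = build_tables(tokens)
print(backoff_predict(tables, ["we", "want", "to"]))  # eat
p = kneser_ney_bigram(tokens)
print(round(p("eat", "to"), 4), round(p("sleep", "to"), 4))  # 0.4722 0.1389
```

The lower-order Kneser-Ney term counts how many distinct words precede w rather than how often w occurs. As a result, words that appear in many different contexts receive more of the backed-off probability mass.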

In the current version, Kneser-Ney smoothing does not outperform Katz backoff in benchmarks; this can probably be improved.
Grammar could be incorporated into the prediction algorithm using statistical tagging and hidden Markov models (no tagged language data was available as a resource).
If you encounter technical problems or have any questions, please contact me at contme109@gmail.com
Please allow a startup time of around 15 seconds when the app is opened.