Prediction of the next word in a sequence of words
a web app based on (smoothed) n-gram models for English and German
author: Christian Thiele
date: 27.04.2015
Interface and main features
Prediction of the next word (the top prediction is displayed largest)
Probable next words: a Markov chain prediction of how the sentence continues, given the top prediction
Supports German and English
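The "probable next words" feature can be pictured as a greedy walk through a Markov chain. The sketch below is only an illustration, assuming a first-order (bigram) chain; the function names `train_bigrams` and `continue_sentence` and the toy sentence are hypothetical and not taken from the app.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count, for every word, which words follow it and how often."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        followers[prev][nxt] += 1
    return followers


def continue_sentence(followers, start_word, max_words=5):
    """Greedily extend the sentence with the most frequent follower each step."""
    words = [start_word]
    for _ in range(max_words):
        options = followers.get(words[-1])
        if not options:  # unseen word: nothing to continue with
            break
        words.append(options.most_common(1)[0][0])
    return words


# Toy corpus (assumption, for illustration only)
tokens = "i want to go home and i want to go home now".split()
model = train_bigrams(tokens)
print(continue_sentence(model, "want", max_words=3))
```

Starting from the top prediction, each step feeds the newly chosen word back in as the context for the next step.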
Additional features
The user can choose from two prediction algorithms
A gauge for “relative confidence” in the top prediction: the square root of the quantile of the count or probability within its respective group of (skip-)n-grams
Please note the other tabs in the navigation bar with additional information
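The "relative confidence" gauge described above can be sketched as follows. This is a minimal illustration, assuming the quantile is the share of values in the group that are less than or equal to the top prediction's value; the function name `relative_confidence` and the example counts are hypothetical.

```python
import math
from bisect import bisect_right


def relative_confidence(value, group_values):
    """Square root of the empirical quantile of `value` within its group.

    Assumption: quantile = fraction of group values <= value.
    """
    ordered = sorted(group_values)
    if not ordered:
        return 0.0
    quantile = bisect_right(ordered, value) / len(ordered)
    return math.sqrt(quantile)


# Hypothetical counts of all n-grams of the same order as the top prediction
group_counts = [1, 1, 2, 3, 5, 8, 13, 40]
print(relative_confidence(5, group_counts))
```

Taking the square root spreads out the lower end of the scale, so a prediction in the middle of its group still moves the gauge noticeably.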
Prediction algorithms and data
Data
The predictions are based on 7.5 million unique (skip-)n-grams, ranging from unigrams to
4-grams. To account for longer dependencies, skip-5-grams and skip-6-grams are used.
The (skip-)n-grams were generated using tweets, news and blog articles
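As a rough illustration of how such data can be produced, the sketch below extracts n-grams and one possible kind of skip-n-gram from a token list. The slides do not define the skip-n-grams precisely, so the definition used here (first and last word of an n-word window, with the words in between dropped) is an assumption, as are the function names.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-word tuples in the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_ngrams(tokens, n):
    """ASSUMED definition: first and last word of every n-word window."""
    return [(window[0], window[-1]) for window in ngrams(tokens, n)]


tokens = "the cat sat on the mat".split()

# Count unigrams up to 4-grams, plus skip-5-grams and skip-6-grams
counts = Counter()
for n in range(1, 5):
    counts.update(ngrams(tokens, n))
for n in (5, 6):
    counts.update(skip_ngrams(tokens, n))

print(ngrams(tokens, 2))
print(skip_ngrams(tokens, 5))
```

Dropping the middle words keeps the table small while still linking a word to context several positions back.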
Algorithms
Raw counts and Katz backoff: the word with the highest count following the longest matching n-gram
Kneser-Ney smoothing with backoff to skip-n-grams: recursively
calculated n-gram probabilities, backing off to skip-n-grams
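The count-based algorithm can be sketched as a lookup that starts with the longest context and falls back to shorter ones. This is a simplified illustration only: it shows the fallback on raw counts and leaves out the discounting that a full Katz backoff model applies. The names `build_counts` and `predict` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def build_counts(tokens, max_order=4):
    """Map each context (0 to max_order-1 preceding words) to next-word counts."""
    table = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for k in range(max_order):
            if i - k < 0:  # not enough preceding words for this context length
                break
            table[tuple(tokens[i - k:i])][word] += 1
    return table


def predict(table, history, max_order=4):
    """Return the most frequent next word for the longest context that was seen."""
    history = list(history)
    for k in range(min(max_order - 1, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        if context in table:
            return table[context].most_common(1)[0][0]
    return None  # empty table


# Toy corpus (assumption, for illustration only)
tokens = "i want to go home and i want to go home now".split()
table = build_counts(tokens)
print(predict(table, ["i", "want", "to"]))
print(predict(table, ["we", "really", "want"]))
```

In the second call the 4-gram and trigram contexts are unseen, so the lookup falls back to the bigram context `("want",)`.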
Further development and additional information
In the current version, Kneser-Ney smoothing does not yet beat Katz backoff in
benchmarks; this can probably be improved
Grammar could be incorporated into the prediction algorithm using statistical
tagging and hidden Markov models (no resource for tagged language data was
available)
If there are any technical problems or if you have any questions,
please contact me via contme109@gmail.com
Please allow a startup time of around 15 seconds when the app is opened