SwiftKey-Based Ngram Text Predictor

William Holst
12/15/2016


The app predicts the next word of a phrase, or 'babbles' a follow-on phrase based on it.

Main Input
- The user enters a brief phrase and selects 'Predict..' or 'Babble..'
- The system cleans the input and looks up the matching ngram
- A back-off algorithm determines the 'best' available next word
- If 'Babble..' is selected, the same algorithm generates a fun babble phrase
Frequent Ngrams - shows histograms of the most frequent phrases
About - explains how the algorithm works
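The exact cleaning steps are not listed here. As an assumption, a minimal Python sketch might lowercase the phrase, drop characters other than letters and apostrophes, and keep only the last three words, since the longest lookup (the quadgram table) needs at most a three-word prefix. `clean_phrase` is a hypothetical name.

```python
import re

def clean_phrase(phrase: str, max_words: int = 3) -> list[str]:
    """Normalize a user phrase into the words used for the ngram lookup (sketch)."""
    text = phrase.lower()
    # Replace anything that is not a letter, apostrophe, or space with a space
    text = re.sub(r"[^a-z' ]+", " ", text)
    # Strip stray quote marks around words and drop empty tokens
    words = [w.strip("'") for w in text.split()]
    words = [w for w in words if w]
    # Only the last max_words words matter for the longest (quadgram) lookup
    return words[-max_words:]
```

For example, `clean_phrase("Thanks for the, HELP!!")` yields `["for", "the", "help"]`.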

The ngram tables were built from the SwiftKey-provided text sets drawn from Twitter, blogs, and news sources.
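How the tables were built is not spelled out here. A minimal sketch, assuming already-cleaned sentences and plain maximum-likelihood estimates with no smoothing or pruning, maps each (n-1)-word prefix to counts of the words that follow it and then normalizes those counts into P(word | prefix). `build_ngram_table` and `to_probabilities` are hypothetical names.

```python
from collections import Counter, defaultdict

def build_ngram_table(sentences, n):
    """Map each (n-1)-word prefix to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - n + 1):
            prefix = tuple(words[i:i + n - 1])
            table[prefix][words[i + n - 1]] += 1
    return table

def to_probabilities(table):
    """Convert raw counts into maximum-likelihood probabilities P(word | prefix)."""
    probs = {}
    for prefix, counts in table.items():
        total = sum(counts.values())
        probs[prefix] = {word: c / total for word, c in counts.items()}
    return probs
```

With the toy corpus `["the cat sat", "the cat ran", "the dog sat"]`, the bigram prefix `("the",)` gets `cat` with probability 2/3 and `dog` with 1/3. Calling `build_ngram_table` with n = 2, 3, 4 gives the bigram, trigram, and quadgram tables.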


The application uses a simple back-off algorithm

- For a 3-word phrase, pick the highest-probability next word from the quadgram table
- If the phrase is not in the quadgram table, back off to its last 2 words and pick the highest-probability next word from the trigram table
- If those are not in the trigram table, back off to the last word and use the bigram table
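The back-off steps above can be sketched in Python as follows. This assumes each table is a dict mapping a prefix tuple to a dict of next-word probabilities; the app's actual table format is not described. `predict_next` and `babble` are hypothetical names, and this babble is greedy (it always feeds the top word back in), whereas the real app may add randomness.

```python
def predict_next(words, tables):
    """Simple back-off: quadgram on the last 3 words, then trigram on the
    last 2, then bigram on the last word.

    tables maps n (4, 3, 2) to {prefix tuple: {next word: probability}}.
    Returns None if no table contains a matching prefix.
    """
    for n in (4, 3, 2):
        k = n - 1  # prefix length for an n-gram table
        if len(words) < k:
            continue
        candidates = tables.get(n, {}).get(tuple(words[-k:]))
        if candidates:
            # Highest-probability continuation for this prefix
            return max(candidates, key=candidates.get)
    return None

def babble(words, tables, length=5):
    """Greedily extend a phrase by repeatedly appending the predicted word."""
    out = list(words)
    for _ in range(length):
        nxt = predict_next(out, tables)
        if nxt is None:  # no table matches; stop babbling
            break
        out.append(nxt)
    return out[len(words):]
```

For example, with a quadgram entry `("thanks", "for", "the") -> {"help": 0.6, "info": 0.4}`, `predict_next(["thanks", "for", "the"], tables)` returns `"help"`; if that prefix were missing, the lookup would fall back to `("for", "the")` in the trigram table.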

Performance of the algorithm

Accuracy - approximately 40% correct hit rate on random test cases of 2-, 3-, and 4-word phrases
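The test procedure is not detailed here. One plausible reading, stated as an assumption, is that each test case is a random 2-, 3-, or 4-word window from held-out text: all but the last word are given to the predictor, and a hit means it returns exactly the last word. `sample_test_phrases` and `hit_rate` are hypothetical names, and `predict` stands for any next-word function.

```python
import random

def sample_test_phrases(sentences, sizes=(2, 3, 4), count=100, seed=0):
    """Draw random word windows of the given sizes from held-out sentences."""
    rng = random.Random(seed)
    windows = []
    for s in sentences:
        words = s.split()
        for n in sizes:
            for i in range(len(words) - n + 1):
                windows.append(" ".join(words[i:i + n]))
    return rng.sample(windows, min(count, len(windows)))

def hit_rate(test_phrases, predict):
    """Fraction of phrases whose last word is predicted from the preceding words."""
    if not test_phrases:
        return 0.0
    hits = 0
    for phrase in test_phrases:
        *context, target = phrase.split()  # phrases are assumed non-empty
        if predict(context) == target:
            hits += 1
    return hits / len(test_phrases)
```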

Lookup speed - approximately 0.2 seconds per lookup, easily observed during a long babble
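A per-lookup figure like this can be estimated by timing a babble-style run and dividing by the number of lookups. A minimal sketch, where `seconds_per_lookup` is a hypothetical helper and `predict` is any function mapping a word list to the next word or `None`:

```python
import time

def seconds_per_lookup(predict, context, lookups=50):
    """Average wall-clock seconds per lookup over a babble-style run."""
    if lookups < 1:
        raise ValueError("lookups must be at least 1")
    words = list(context)
    done = 0
    start = time.perf_counter()
    for _ in range(lookups):
        nxt = predict(words)
        done += 1
        if nxt is None:  # nothing more to babble
            break
        words.append(nxt)
    return (time.perf_counter() - start) / done
```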

App startup - around 15 seconds total to load 4 tables of between 7 and 16 MB each