Complexity reduction
- All text provided by SwiftKey was processed initially, then reduced by keeping only N-Grams that occur more than 5 or 10 times in the text
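The pruning step can be illustrated with a minimal sketch. The function names `count_ngrams` and `prune`, the toy corpus, and the choice of 5 as the threshold here are assumptions made for illustration, not the app's actual code:

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def prune(counts, min_count):
    """Keep only N-Grams that occur more than min_count times."""
    return {gram: c for gram, c in counts.items() if c > min_count}

# Toy corpus (hypothetical): one frequent phrase and one rare variant.
tokens = ("once upon a time " * 6 + "once upon a hill").split()

trigrams = count_ngrams(tokens, 3)
kept = prune(trigrams, min_count=5)

# ('upon', 'a', 'hill') occurs only once, so it is dropped from the table.
print(len(trigrams), "trigrams before pruning,", len(kept), "after")
```

Rare N-Grams make up most of the distinct entries in a large corpus, so even a low threshold tends to shrink the tables considerably at a small cost in coverage.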
Prediction process
The app is based on pre-computed N-Gram tables, from bigrams up to 5-Grams, containing prediction words and scores
- An input string is converted to tokens, with punctuation, stopwords etc. removed
- Each N-Gram table contains a string of words in one column, a predictor word and a frequency / score, e.g. String: “once_upon_a”, Predictor: “time”, Score: n
- A backoff model is used to find the most likely next word:
- Match the last four words of the input string against the 5-Gram table and select the most frequent predictor word
- If no match exists, back off to the next lower-order N-Gram table, apply a Stupid Backoff score and match again
- If no match is found in any table, the most popular unigrams are selected
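The prediction steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the app's implementation: the table layout (a dict keyed by N-Gram order, then by an underscore-joined context string), the function names, the toy counts, and the discount of 0.4 per backoff step (the value commonly used with Stupid Backoff) are all assumptions. Stopword removal is left as an optional argument.

```python
import re

ALPHA = 0.4  # assumed discount applied once per backoff step

def tokenize(text, stopwords=frozenset()):
    """Lower-case the input, drop punctuation and (optionally) stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in stopwords]

def predict(text, tables, unigrams, max_order=5, top_k=3):
    """Return up to top_k (word, score) pairs from the highest-order match."""
    tokens = tokenize(text)
    for order in range(max_order, 1, -1):
        context = tokens[-(order - 1):]
        if len(context) < order - 1:
            continue  # input is too short for this table
        candidates = tables.get(order, {}).get("_".join(context))
        if candidates:
            total = sum(candidates.values())
            discount = ALPHA ** (max_order - order)
            ranked = sorted(candidates.items(), key=lambda kv: -kv[1])[:top_k]
            return [(w, discount * c / total) for w, c in ranked]
    # No table matched: fall back to the most popular unigrams.
    total = sum(unigrams.values())
    discount = ALPHA ** (max_order - 1)
    ranked = sorted(unigrams.items(), key=lambda kv: -kv[1])[:top_k]
    return [(w, discount * c / total) for w, c in ranked]

# Hypothetical toy tables: order -> context string -> {predictor: count}
tables = {
    4: {"once_upon_a": {"time": 9, "hill": 1}},
    2: {"a": {"lot": 5, "few": 3}},
}
unigrams = {"the": 50, "to": 30, "and": 20}

print(predict("Once upon a", tables, unigrams))  # matched in the 4-Gram table
print(predict("Look, a", tables, unigrams))      # backs off to the bigram table
print(predict("zzz", tables, unigrams))          # falls back to unigrams
```

Because the sketch stops at the first table that matches, the discount only rescales the scores within that table and does not change the ranking; it would matter if candidates from several orders were merged into one list.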