TurboText - text prediction technology to support large data sets

vanilla-ic

The TurboText app takes text input and then provides a prediction, as well as suggestions for the next word.

Instructions

Enter text in the text field in the top left.
The app takes the last words entered (up to 3).
The app then matches them against quadgram, trigram, bigram and unigram tables.
It then automatically provides a prediction of the next word, as well as further suggestions.
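
The input handling above can be sketched roughly as follows. This is only an illustration in Python, not the app's actual code: the function name last_words and the lowercase, whitespace-based tokenization are assumptions made for the example.

```python
def last_words(text, max_words=3):
    """Return up to max_words trailing words of the input (hypothetical helper)."""
    tokens = text.lower().split()  # assumed tokenization: lowercase, split on whitespace
    return tokens[max(0, len(tokens) - max_words):]


context = last_words("I would like to go to the")

# Lookup keys from longest to shortest context: three words are matched against
# the quadgram table, two against the trigram table, one against the bigram
# table, and the empty context falls through to the unigram table.
keys = [tuple(context[i:]) for i in range(len(context) + 1)]
```

Shorter inputs simply produce fewer keys, so a one-word entry goes straight to the bigram table.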

The model eventually deployed was the stupid backoff model, which is described in section 4.

The stupid backoff model searches for a match, initially starting with quadgrams.
If there are no matches in the quadgrams then it backs off (N-1) to the trigrams; if there are no matches in the trigrams then it backs off to the bigrams, and then to the unigrams.
Lambda, the penalty applied at each back-off step, has been set to 0.4; however, this may be adjusted in future versions.
20% of the data set was used to build this model.
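
The back-off search described above can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the deployed implementation: the names build_counts and predict, the toy corpus, and the single Counter holding all n-gram orders are hypothetical. The only value taken from the text is lambda = 0.4. Following the description, candidates come from the longest context that has any match, and each back-off step multiplies the score by lambda.

```python
from collections import Counter

LAMBDA = 0.4       # back-off factor stated in the text
MAX_CONTEXT = 3    # quadgram model: at most 3 words of context


def build_counts(tokens, max_n=4):
    """Count every 1- to max_n-gram in a token list (hypothetical helper)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def predict(context, counts, total, top_k=3):
    """Return up to top_k (word, score) pairs, best first."""
    context = tuple(context[-MAX_CONTEXT:])
    penalty = 1.0
    while True:
        n = len(context)
        if n == 0:
            # Unigram level: relative frequency of each word.
            scored = {g[0]: penalty * c / total
                      for g, c in counts.items() if len(g) == 1}
        else:
            # Words seen after this context, scored by relative frequency.
            scored = {g[-1]: penalty * c / counts[context]
                      for g, c in counts.items()
                      if len(g) == n + 1 and g[:n] == context}
        if scored or n == 0:
            break
        context = context[1:]   # drop the oldest word (back off)
        penalty *= LAMBDA       # penalize the shorter match
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


tokens = "the cat sat on the mat the cat ate the fish".split()
counts = build_counts(tokens)
suggestions = predict(["on", "the"], counts, total=len(tokens))
```

The first entry of the returned list plays the role of the prediction and the remaining entries the further suggestions. Scanning every n-gram on each call is only acceptable for a toy corpus; a real table would be keyed by context for direct lookup.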

A vocab dictionary is maintained in which each word from the training data is stored with an index.
N-grams do not store the actual strings; instead, they point to indices in the dictionary.
For 2, 3, and 4-grams, each word within the gram points to an index in the vocab dictionary.
Lookups and comparisons are therefore done on integers, which compare in O(1) time, whereas a string comparison takes O(n) time in the length of the string. This gives run time efficiency.
Data has been hashed and accessed by reference only.
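
The storage scheme above can be illustrated with a small Python sketch. The Vocab class, its method names, and the UNKNOWN marker are hypothetical and exist only for this example; they show the general idea of keeping each word once and storing n-grams as tuples of integer indices in a hashed table.

```python
from collections import Counter

UNKNOWN = -1  # assumed marker for words missing from the dictionary


class Vocab:
    """Maps each distinct word to a small integer index (hypothetical class)."""

    def __init__(self):
        self.index = {}   # word -> integer index
        self.words = []   # integer index -> word

    def add(self, word):
        if word not in self.index:
            self.index[word] = len(self.words)
            self.words.append(word)
        return self.index[word]

    def encode(self, words):
        return tuple(self.index.get(w, UNKNOWN) for w in words)

    def decode(self, ids):
        return [self.words[i] for i in ids]


vocab = Vocab()
tokens = "the cat sat on the mat".split()
ids = [vocab.add(t) for t in tokens]

# Trigrams are stored as tuples of integers, so each word's text is kept
# only once, in the vocab dictionary. Counter is a hash table keyed by tuple.
trigram_counts = Counter(zip(ids, ids[1:], ids[2:]))

key = vocab.encode(["on", "the", "mat"])
```

Looking up trigram_counts[key] hashes and compares a few integers rather than whole strings, and decode turns a matched index back into a word for display.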

TurboText can support a model built from a large data set.
TurboText has impressive accuracy, and this can be further improved with funded research.
TurboText is well suited to the mobile market. We are focusing on iOS devices, as SwiftKey is strong on the Android platform.
Our development center is in Asia and TurboText can support Asian languages.
Click on TurboText and try it out.