TurboText - text prediction technology to support large data sets

vanilla-ic

Description and Instructions

The TurboText app takes text input and then provides a prediction, as well as suggestions for the next word.

Instructions

  • Enter text in the text field in the top left.
  • The app takes the last words entered (up to three).
  • The app then matches them against its quadgram, trigram, bigram and unigram tables.
  • It then automatically provides a prediction of the next word, as well as further suggestions.
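
As a rough illustration of the second step, the context could be extracted as sketched below. The function name `last_context` and the tokenization rule (lowercase, letters and apostrophes only) are assumptions made for this example; the app's actual text cleaning is not described here.

```python
import re

def last_context(text, max_words=3):
    """Return up to the last `max_words` tokens of `text`, lowercased.

    The regular expression is an assumed, simplified tokenizer.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return tokens[-max_words:]

context = last_context("I would like to go")  # last three words entered
```

Shorter inputs simply yield a shorter context, which then starts the lookup at a lower-order table.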

The Predictive Algorithm

The model eventually deployed was the stupid backoff model, which is described in section 4.

  • The stupid backoff model searches for a match, initially starting with quadgrams.
  • If there are no matches in the quadgrams, it backs off (N-1) to the trigrams; if there are no matches in the trigrams, it backs off to the bigrams, and then to the unigrams.
  • Lambda has been set to 0.4; however, this may be adjusted in future versions.
  • 20% of the data set was used to build this model.
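
The backoff steps above can be sketched as follows. This is a minimal illustration, not the deployed code: the names `build_counts` and `predict` are hypothetical, the tables are plain in-memory counters, and each candidate word is scored at the highest n-gram order where it was seen, with a factor of lambda applied for every backoff step.

```python
from collections import Counter

LAMBDA = 0.4  # backoff factor, as stated in the slide

def build_counts(tokens, max_n=4):
    """Count all 1- to max_n-grams in a token list (hypothetical helper)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def predict(context, counts, total_tokens, top_k=3):
    """Rank next-word candidates with stupid backoff scores."""
    context = tuple(context[-3:])  # a quadgram needs at most 3 context words
    scores = {}
    penalty = 1.0
    while True:
        n = len(context)
        # Linear scan for clarity only; a real table would be indexed by context.
        for gram, count in counts.items():
            word = gram[-1]
            if len(gram) == n + 1 and gram[:n] == context and word not in scores:
                denom = counts[context] if n else total_tokens
                scores[word] = penalty * count / denom
        if not context:
            break  # unigram level reached
        context = context[1:]  # back off to the next lower order
        penalty *= LAMBDA
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

tokens = "a b c a b d a b c".split()
counts = build_counts(tokens)
suggestions = predict(["a", "b"], counts, total_tokens=len(tokens))
```

The first item of `suggestions` plays the role of the prediction, and the remaining items the further suggestions.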

Some Technological Aspects

  • A vocab dictionary is maintained where each possible word from training data has been stored with an index.
  • n-grams are not storing the actual string, rather they are pointing to an index in the dictionary.
  • For 2, 3, and 4-grams, each word within the gram is pointing to an index in the vocab dictionary.
  • Lookups and comparisons use integers rather than strings. Comparing two integers takes O(1) time, whereas comparing two strings takes O(n) time in the length of the string, which improves run-time efficiency.
  • Data has been hashed and accessed by reference only.
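
A minimal sketch of this storage scheme, assuming Python dictionaries stand in for the hashed tables; `build_vocab` and `encode_ngrams` are hypothetical names used only for this illustration.

```python
def build_vocab(tokens):
    """Assign each distinct word an integer index in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode_ngrams(tokens, vocab, n):
    """Build an n-gram count table keyed by tuples of word indices, not strings."""
    ids = [vocab[t] for t in tokens]
    table = {}
    for i in range(len(ids) - n + 1):
        key = tuple(ids[i:i + n])
        table[key] = table.get(key, 0) + 1
    return table

tokens = "to be or not to be".split()
vocab = build_vocab(tokens)                 # word -> integer index
bigrams = encode_ngrams(tokens, vocab, 2)   # keys are pairs of integers
```

Each word string is stored once in the vocabulary, and every n-gram holds only small integers, which keeps the tables compact as the training data grows.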

Key Points for the Investor to Consider

  • TurboText can support a model built from large data sets.
  • TurboText has impressive accuracy and this can be further improved with funded research.
  • TurboText is well suited to the mobile market. We are focusing on iOS devices, as SwiftKey is strong on Android platforms.
  • Our development center is in Asia and TurboText can support Asian languages.
  • Click on TurboText and try it out.