nextWord: faster typing. A Data Science approach.

Alejandro Morales
December, 2014.

What if we could... type less and faster?


  • nextWord does exactly that!
  • core design principles: simple, fast and easy-to-use.
  • uses millions of words from different sources.
  • leverages powerful data mining and NLP algorithms.
  • nextWord allows the user to cut up to 15% of typing time.



It is estimated that the average cellphone user sends or receives up to 40 text messages per day, while younger users can send or receive up to a hundred (Pew Research Center: How Americans Text, 2011).

Usage


  • Side panel:
    • phrase captured through the text box.
    • 3 next-word suggestions appear after a short delay.
    • leftmost being the most probable.
  • Details: detailed predictions.
  • How to: instructions.
  • About: explains the algorithms and data used.

Algorithm


nextWord uses a trigram language model with interpolation and Kneser-Ney smoothing.

The last two words are used to predict the next word, e.g.:

            may no [nextWord] --> 1. longer

If that combination of words does not occur in the data, we back off to a bigram model, e.g.:

            no [nextWord] --> 1. one, 2. longer, 3. matter
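
The back-off step above can be sketched in a few lines of Python. This is a minimal illustration using raw counts only, not nextWord's actual implementation: the function names build_counts and suggest are hypothetical, and the interpolation and Kneser-Ney smoothing mentioned earlier are left out to keep the lookup logic visible.

```python
from collections import Counter, defaultdict

def build_counts(tokens):
    """Count which word follows each one-word and two-word context."""
    tri = defaultdict(Counter)  # (w1, w2) -> Counter of next words
    bi = defaultdict(Counter)   # w1 -> Counter of next words
    for i in range(len(tokens) - 1):
        bi[tokens[i]][tokens[i + 1]] += 1
        if i + 2 < len(tokens):
            tri[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1
    return tri, bi

def suggest(phrase, tri, bi, k=3):
    """Return up to k next-word suggestions, most frequent first."""
    words = phrase.lower().split()
    if len(words) >= 2 and tuple(words[-2:]) in tri:
        # The last two words were seen together: use the trigram counts.
        ranked = tri[tuple(words[-2:])].most_common(k)
    elif words and words[-1] in bi:
        # Unseen two-word context: back off to the last word only.
        ranked = bi[words[-1]].most_common(k)
    else:
        ranked = []
    return [word for word, _ in ranked]
```

With counts built from a tokenized corpus, suggest("may no", tri, bi) uses the trigram table when the pair "may no" was observed, and otherwise falls back to whatever followed "no".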

Data & Predictive power


Data

  • Corpora: mix of Twitter, news and blog sources [1].
  • Cleaned for profanities.
  • Improved accuracy via Kneser-Ney smoothing of lower-order n-grams [2].


[1] Coursera data
[2] N-gram language model
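
To give a feel for what Kneser-Ney smoothing does at the bigram level, here is a small sketch of the standard interpolated formula. It is an illustration under assumptions, not the code behind nextWord: the function name kneser_ney_bigram and the fixed discount of 0.75 are choices made for this example.

```python
from collections import Counter

def kneser_ney_bigram(tokens, discount=0.75):
    """Build an interpolated Kneser-Ney estimate of P(w | v) from tokens."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_total = Counter()  # how often v occurs as a context
    followers = Counter()      # number of distinct words seen after v
    preceders = Counter()      # number of distinct words seen before w
    for (v, w), count in bigrams.items():
        context_total[v] += count
        followers[v] += 1
        preceders[w] += 1
    n_types = len(bigrams)     # number of distinct bigram types

    def prob(w, v):
        # Continuation probability: in how many different contexts w appears.
        p_cont = preceders[w] / n_types
        if context_total[v] == 0:
            return p_cont      # unseen context: rely on continuation only
        discounted = max(bigrams[(v, w)] - discount, 0) / context_total[v]
        backoff_weight = discount * followers[v] / context_total[v]
        return discounted + backoff_weight * p_cont

    return prob
```

The discount removed from every observed bigram is redistributed according to how many distinct contexts a word appears in, so a word that follows many different words gets more of the leftover mass than a word that is frequent but only ever appears after one specific word.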


Predictive power

  • 15% accuracy with the single most probable suggestion.
  • up to 20% when considering all three suggestions*.





*When tested against a large text.
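
One plausible way to obtain figures like these is to walk through a held-out text and check, at each position, whether the actual next word is among the top k suggestions. The sketch below assumes that reading of "predictive power"; the function top_k_hit_rate and the predict callback are hypothetical names, and the exact evaluation used for nextWord is not described here.

```python
def top_k_hit_rate(tokens, predict, k):
    """Fraction of positions where the true next word is in the top-k guesses.

    predict(w1, w2) is assumed to return a list of suggestions,
    most probable first, given the two preceding words.
    """
    hits = 0
    total = 0
    for i in range(2, len(tokens)):
        guesses = predict(tokens[i - 2], tokens[i - 1])[:k]
        if tokens[i] in guesses:
            hits += 1
        total += 1
    return hits / total if total else 0.0
```

Calling it with k=1 and k=3 on the same test text gives the single-suggestion and three-suggestion rates; the second can never be lower than the first.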