My Next Word Suggestion

Birgit Kiesewetter
August 2016

Saves time and makes life easier …



This fast and simple application helps if you struggle with typing on a small keyboard, or if you are just too lazy to type.
It suggests three words, and you pick one with a single click.



- Click here and try it out -


The next slides give some background information on the data and model used.

What's Behind the Application?

  • The application uses a simple Katz back-off model.
  • The model has been trained and built on an 18 MB text corpus of sampled text lines from Twitter, blogs and news.
  • The data has been cleaned up: numbers, graphs, punctuation, whitespace and swear words have been removed.
  • Each phrase is weighted by how frequently its next word follows it in the corpus.
  • The suggestions are derived from two dictionaries, one of trigrams (3-word phrases) and one of bigrams (2-word phrases), and finally from the 3 most common words “the, to, and”. For details please see the next slide.
  • The final dictionaries include only the top 3 suggestions per phrase. In case of a tie, the single-word frequency decides.
  • Some cosmetic transformations are applied to the suggested words, such as changing “im” back to “I'm” or “youre” to “you're”, because the apostrophes were stripped out during cleaning.
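The preparation steps listed above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the application's actual code: the names `clean` and `build_dictionaries` and the one-word `SWEARWORDS` list are hypothetical, the cleaning is reduced to a single regular expression, and equal phrase counts are broken by single-word frequency as one reading of the tie rule in the list.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for a real swear-word list.
SWEARWORDS = {"darn"}

def clean(line):
    """Lowercase, drop everything except a-z and whitespace, remove swear words."""
    line = re.sub(r"[^a-z\s]", "", line.lower())  # strips digits, punctuation, apostrophes
    return [w for w in line.split() if w not in SWEARWORDS]

def build_dictionaries(lines, top_n=3):
    """Return (trigram_dict, bigram_dict, unigram_counts).

    Keys are the preceding words as a tuple; values are the top_n next words.
    """
    unigrams = Counter()
    followers = {1: defaultdict(Counter), 2: defaultdict(Counter)}
    for line in lines:
        words = clean(line)
        unigrams.update(words)
        for i, word in enumerate(words):
            for n in (1, 2):  # n preceding words: bigram and trigram contexts
                if i >= n:
                    followers[n][tuple(words[i - n:i])][word] += 1

    def top(counter):
        # Rank by phrase frequency; break ties by single-word frequency.
        ranked = sorted(counter, key=lambda w: (-counter[w], -unigrams[w], w))
        return ranked[:top_n]

    bigram_dict = {ctx: top(c) for ctx, c in followers[1].items()}
    trigram_dict = {ctx: top(c) for ctx, c in followers[2].items()}
    return trigram_dict, bigram_dict, unigrams

trigrams, bigrams, _ = build_dictionaries(
    ["I'm going to the store", "going to the park", "going to 2 the store!"]
)
print(trigrams[("to", "the")])  # ['store', 'park']
```

Keeping only the top entries per phrase is what keeps the dictionaries small; the lookup then needs no counting at run time.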

Example on How It Works

[Figure: worked example of how the suggestions are derived]
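As a rough illustration of the lookup order, here is a minimal Python sketch. It assumes, without confirmation from the slides, that the trigram dictionary is consulted first, then the bigram dictionary, and that remaining slots are filled from the most common words, taken here to be "the", "to", "and". The dictionary contents and the name `suggest` are hypothetical toy data, not the application's code.

```python
import re

# Toy dictionaries (hypothetical data); keys are the preceding words.
TRIGRAMS = {("going", "to"): ["be", "the", "get"]}
BIGRAMS = {("to",): ["the", "be"], ("because",): ["im", "of", "the"]}
DEFAULTS = ["the", "to", "and"]  # assumed fallback words
COSMETIC = {"im": "I'm", "youre": "you're"}

def suggest(text, n=3):
    """Return up to n next-word suggestions, backing off from trigram to bigram to defaults."""
    words = re.sub(r"[^a-z\s]", "", text.lower()).split()
    candidates = []
    if len(words) >= 2:
        candidates += TRIGRAMS.get(tuple(words[-2:]), [])
    if words:
        candidates += BIGRAMS.get(tuple(words[-1:]), [])
    candidates += DEFAULTS
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    return [COSMETIC.get(w, w) for w in unique[:n]]

print(suggest("I am going to"))  # ['be', 'the', 'get']  (trigram hit)
print(suggest("welcome to"))     # ['the', 'be', 'to']   (bigram hit, filled from defaults)
print(suggest("hello"))          # ['the', 'to', 'and']  (defaults only)
print(suggest("because"))        # ["I'm", 'of', 'the']  (cosmetic fix applied)
```

Filling up from lower-order dictionaries guarantees three suggestions for any input; a variant that stops at the first dictionary hit would be just as compatible with the description.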

Advantages and Enhancement Ideas

Advantages:

  • Easy, user-friendly interface
  • Fast due to streamlined and sorted dictionaries (150,000 total phrases)
  • Simple, understandable algorithm
  • Around 50% accuracy when tested against 80,000 new unique phrases
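The slides do not say how the accuracy figure was computed. One plausible reading, sketched below in Python as an assumption, counts a test phrase as a hit when its actual last word is among the three suggestions for the preceding words. The names `top3_accuracy` and `toy_suggest` and the test phrases are hypothetical.

```python
def top3_accuracy(test_phrases, suggest):
    """Share of phrases whose final word is among the suggestions for the words before it."""
    hits = 0
    for phrase in test_phrases:
        *context, target = phrase.split()
        if target in suggest(" ".join(context)):
            hits += 1
    return hits / len(test_phrases)

def toy_suggest(context):
    # Hypothetical predictor that always returns the same three words.
    return ["the", "to", "and"]

phrases = ["going to the", "nice to meet", "you and", "see you"]
print(top3_accuracy(phrases, toy_suggest))  # 0.5
```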

Enhancement Ideas:

  • Integrate additional languages
  • Add a spellchecker like Google's “Did you mean …”
  • Increase accuracy through a context-sensitive prediction model and dictionaries
  • Train the dictionaries on user-specific phrases

The complete code can be found in this GitHub repo.