Word prediction using n-grams

P. D. Bolier
14 March 2017

Goal

Develop an 'algorithm' to predict the next word of a sentence, giving users faster and easier text entry on a device such as a mobile phone. Although devices are becoming ever more powerful, typing text on a small screen or (virtual) keyboard is neither easy nor efficient. If we can predict the next word and make it easy to pick it from a short list, we can save users a lot of time and irritation.

The model is built from n-grams with n = 1..4. First, a set of documents or text must be analysed, cleaned and processed to produce a usable set of n-grams.
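As an illustration of what producing n-grams from cleaned text can look like, here is a minimal Python sketch. It is not the code behind the application: the function names (tokenize, ngrams, count_ngrams) and the cleaning rule (lowercase, keep only letters and apostrophes) are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes.

    This cleaning rule is an assumption for the sketch; real cleaning
    would probably also handle numbers, URLs, profanity and so on.
    """
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(sentences, max_n=4):
    """Count the n-grams for n = 1..max_n over a list of sentences."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(1, max_n + 1):
            counts[n].update(ngrams(tokens, n))
    return counts


if __name__ == "__main__":
    counts = count_ngrams(["The cat sat.", "The cat ran!"])
    print(counts[2].most_common(3))
```

The n-grams are counted per sentence, so no n-gram spans a sentence boundary. A sentence shorter than n words simply contributes no n-grams of that order.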

Steps

  • analyse and clean the data donated by SwiftKey, using 80% of it for training
  • remove sparse terms
  • build the n-gram model for n = 1..4
  • smooth the probabilities
  • use the n-grams to predict words, starting with the 4-grams
  • if that does not yield enough candidates, fall back to the lower-order n-grams
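The steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not the application's actual implementation. The slides do not name the smoothing method, so the example swaps in a simple backoff scoring in the style of "stupid backoff": each step down to a shorter context multiplies the score by a fixed factor (0.4 here, an assumed value). The names build_model and predict and the min_count threshold for sparse terms are likewise hypothetical.

```python
from collections import Counter, defaultdict


def build_model(token_lists, max_n=4, min_count=1):
    """Map each context (a tuple of 0..max_n-1 words) to next-word counts."""
    model = defaultdict(Counter)
    for tokens in token_lists:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                model[context][tokens[i + n - 1]] += 1
    # Remove sparse terms: drop n-grams seen fewer than min_count times.
    pruned = {}
    for context, nexts in model.items():
        kept = Counter({w: c for w, c in nexts.items() if c >= min_count})
        if kept:
            pruned[context] = kept
    return pruned


def predict(model, tokens, max_n=4, top_k=10, backoff=0.4):
    """Rank next-word candidates, longest context first.

    Shorter contexts are consulted only while fewer than top_k
    candidates have been found; backoff=0.4 is an assumed discount.
    """
    scores = {}
    weight = 1.0
    longest = min(max_n - 1, len(tokens))
    for k in range(longest, -1, -1):
        context = tuple(tokens[len(tokens) - k:])
        nexts = model.get(context)
        if nexts:
            total = sum(nexts.values())
            for word, count in nexts.items():
                # Keep the score from the longest context that saw the word.
                if word not in scores:
                    scores[word] = weight * count / total
            if len(scores) >= top_k:
                break
        weight *= backoff
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    corpus = [
        ["i", "like", "green", "tea"],
        ["i", "like", "green", "apples"],
        ["i", "like", "green", "tea"],
        ["you", "like", "black", "tea"],
    ]
    model = build_model(corpus)
    print(predict(model, ["i", "like", "green"], top_k=2))
```

With the toy corpus above, the three-word context "i like green" already supplies two candidates, so the shorter contexts are never consulted. For an unseen context the function falls all the way back to the unigram counts, so it still returns a suggestion.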

Application

The application is hosted at Shinyapps and is available to try out. It takes a sentence as input and tries to come up with a word that fits the sentence.
[Figure: enter sentence]
After a sentence has been entered, the button starts the prediction algorithm. The application then shows the top ten candidate words in order of decreasing probability.

Where to go

We have defined and built a proof of concept for predicting words. It already shows promising results, although it still needs some polishing.

References: