Word Prediction App

Enrique Balp
Dec 2014

Word Prediction

In today's world, people type many different phrases on many devices. Predicting the next word makes functions such as autocomplete possible, so features based on word prediction have the potential to add value.

In this presentation we show a word prediction app we have just developed.

The Data

For the model, we used data from tweets, news and blogs, taken from the HC Corpora English corpus. The data consists of more than 500 MB of text, which we believe is representative of modern English usage.

The algorithm

After removing all characters except alphanumeric and apostrophes, we created 2-grams, 3-grams and 4-grams. Then we counted the occurrences of each n-gram and sorted them in descending order.
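The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the app: the function names (`tokenize`, `count_ngrams`, `build_tables`) are hypothetical, and lowercasing the text and splitting on whitespace are assumptions that the slide does not state.

```python
import re
from collections import Counter


def tokenize(text):
    """Keep only alphanumerics and apostrophes, then split into words.

    Lowercasing is an assumption made for this sketch.
    """
    cleaned = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return cleaned.split()


def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    # zip over n shifted copies of the token list yields each n-gram as a tuple
    return Counter(zip(*(tokens[i:] for i in range(n))))


def build_tables(text, sizes=(2, 3, 4)):
    """Return, for each n, a list of (n-gram, count) sorted by descending count."""
    tokens = tokenize(text)
    return {n: count_ngrams(tokens, n).most_common() for n in sizes}
```

Calling `build_tables` on a text sample returns one sorted table per n-gram size, with the most frequent n-grams first.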

After getting an input string, we match it with the most appropriate 4-gram, 3-gram or 2-gram. From the matching n-grams we can then give the probability of each prediction (based on n-gram counts).
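One way to read the matching step is as a simple back-off: try the last three words of the input against the 4-grams, then the last two against the 3-grams, then the last word against the 2-grams. The Python sketch below follows that reading. The back-off order, the normalization of counts into probabilities within the matched context, and all names (`build_model`, `predict`, `k`, `max_context`) are assumptions made for illustration; the slide does not spell out the exact rule.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Keep alphanumerics and apostrophes; lowercasing is an assumption."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def build_model(text, sizes=(2, 3, 4)):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    tokens = tokenize(text)
    model = defaultdict(Counter)
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            model[context][tokens[i + n - 1]] += 1
    return model


def predict(model, phrase, k=3, max_context=3):
    """Return up to k (word, probability) pairs, longest matching context first."""
    words = tokenize(phrase)
    for size in range(min(max_context, len(words)), 0, -1):
        followers = model.get(tuple(words[-size:]))
        if followers:
            total = sum(followers.values())
            # Probability = count of this n-gram / count of all n-grams
            # sharing the same context (an assumed normalization).
            return [(w, c / total) for w, c in followers.most_common(k)]
    return []  # no n-gram matches the input
```

For example, with `model = build_model(corpus_text)`, the call `predict(model, "the cat", k=2)` returns at most two candidate words with their estimated probabilities, and fewer if the matched context has fewer distinct followers.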

There are many possible improvements to the algorithm and the app. For example, we could include 5-grams to give better predictions.

The app

The app accepts an input string and a number of desired predictions (it will give fewer if there are not enough).

It outputs the most likely word based on the model and a table with the probabilities for different predictions.

We are on the path to a great word prediction tool, which could possibly add great value to autocomplete functions everywhere.