Next Word Predictor

Herbert Barrientos
2016-10-09

Improving the mobile user experience!

The Problem

With the advent of smart mobile devices, text messaging has become an important means of communication.

Smart phones and tablets provide virtual keyboards for typing, but they present the following inconveniences:

Keyboards on devices with smaller screens make it difficult to type due to small key buttons

In general, typing is limited to using the index finger, or at most both thumbs, which many users find slow and frustrating

Typing in this rather “unnatural” way tends to produce many errors, which are time-consuming to fix

Users also tend to over-abbreviate words and use an excessive number of acronyms, thus creating bad writing habits

The Solution

The Next Word Predictor is a computational mechanism, based on Natural Language Processing techniques, that:

Helps users compose their messages quickly and correctly by presenting them with predicted “next words” as they type
Accepts a single word or a phrase as input, and proposes up to a preconfigured number (currently set at 5) of “next words” aimed at following a coherent line of thought with respect to the input text
Uses a sampling method that produces different, but meaningful, “next word” options with every run
Was designed and developed to efficiently use available computing resources (i.e., memory, processing time) on mobile, local computer, and server environments
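
The sampling idea can be illustrated with a short Python sketch. The slides do not describe the actual sampling method, so this is only one plausible reading: candidates are drawn without replacement, with probability proportional to a weight, so that repeated runs return different but still likely words. The function name `sample_next_words` and the example weights are hypothetical.

```python
import random
from typing import Dict, List, Optional


def sample_next_words(
    weights: Dict[str, float],
    k: int = 5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Draw up to k distinct words, each pick weighted by its score.

    Assumption: this weighted draw without replacement is an illustration,
    not the project's documented sampling method.
    """
    if rng is None:
        rng = random.Random()
    # Words with zero or negative weight can never be picked.
    pool = {word: w for word, w in weights.items() if w > 0}
    chosen: List[str] = []
    while pool and len(chosen) < k:
        words = list(pool)
        pick = rng.choices(words, weights=[pool[w] for w in words], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # without replacement: a word is proposed only once
    return chosen


# Hypothetical candidate weights for some input text.
candidates = {"store": 0.5, "park": 0.3, "beach": 0.2, "moon": 0.0}
suggestions = sample_next_words(candidates, k=2)
```

Passing a seeded `random.Random` makes a run repeatable, which is convenient for testing.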

Solution Features

The solution uses bigrams, trigrams, quadgrams, pentagrams, and sextagrams (n-grams of two to six words) as “next word” search spaces. Because the last word of each n-gram is the predicted word, this approach handles input texts ranging from a single word to a five-word phrase. Longer phrases are first truncated to their last five words.
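
As a rough illustration of the truncation step, the Python sketch below keeps the last five words of the input and derives which n-gram order would be searched first. The function names are hypothetical, and the lowercasing and whitespace tokenization are assumptions made for the example, not details taken from the project.

```python
from typing import List

MAX_CONTEXT_WORDS = 5  # a six-word n-gram holds five context words plus the prediction


def prepare_context(text: str, max_words: int = MAX_CONTEXT_WORDS) -> List[str]:
    """Tokenize the input and keep only its last `max_words` words."""
    words = text.lower().split()  # assumption: simple whitespace tokenization
    return words[-max_words:]


def ngram_order(context: List[str]) -> int:
    """An n-gram that completes a k-word context has k + 1 words."""
    return len(context) + 1


context = prepare_context("I would like to go to the")
print(context)               # ['like', 'to', 'go', 'to', 'the']
print(ngram_order(context))  # 6
```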

“Next word” probabilities for retrieved search results are calculated using the chain rule of conditional probability, in an attempt to follow the entire input text's “line of thought” as coherently as possible.
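
The chain rule factors the probability of a word sequence as P(w1, ..., wn) = P(w1) × P(w2 | w1) × ... × P(wn | w1, ..., wn-1). The Python sketch below shows one way a candidate could be scored with it, using maximum-likelihood estimates from raw n-gram counts on a toy corpus. The helper names, the toy corpus, and the choice to leave out the P(w1) factor (it is the same for every candidate) are assumptions; the project's actual formula may differ.

```python
from collections import Counter
from typing import List, Sequence


def count_ngrams(tokens: Sequence[str], max_n: int) -> Counter:
    """Count every n-gram of length 1..max_n in the token sequence."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def conditional(counts: Counter, history: Sequence[str], word: str) -> float:
    """Estimate P(word | history) as count(history + word) / count(history)."""
    denominator = counts[tuple(history)]
    if denominator == 0:
        return 0.0
    return counts[tuple(history) + (word,)] / denominator


def chain_rule_score(counts: Counter, context: Sequence[str], candidate: str) -> float:
    """Multiply P(w_i | w_1..w_{i-1}) over the context followed by the candidate."""
    words: List[str] = list(context) + [candidate]
    score = 1.0
    for i in range(1, len(words)):
        score *= conditional(counts, words[:i], words[i])
    return score


# Toy corpus, for illustration only.
tokens = "the cat sat on the mat the cat ate".split()
counts = count_ngrams(tokens, max_n=3)
score = chain_rule_score(counts, ["the", "cat"], "sat")  # (2/3) * (1/2)
```

Here "cat" follows "the" in two of three occurrences, and "sat" follows "the cat" in one of two, so the candidate "sat" scores 1/3.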

The prediction algorithm was designed as a finite state machine, which processes the input text based on its current state (e.g., a seven-word text is converted to a five-word text, which in turn may be converted to a four-word text, and so on).

Changing the text's state allows for further searching using a “back off” technique (e.g., if a five-word search did not yield any results, the input text's state is changed to a four-word text, and a new search may begin).
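
The state changes and the “back off” search described above can be sketched in Python as a simple loop. This is a minimal illustration under stated assumptions: the lookup table is a plain dictionary from context words to candidate words, and backing off drops the oldest (leftmost) word. The names `NGramTable` and `predict_with_backoff` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical lookup table: context words -> candidate next words.
NGramTable = Dict[Tuple[str, ...], List[str]]


def predict_with_backoff(
    context: Sequence[str],
    table: NGramTable,
    max_context: int = 5,
) -> Tuple[int, List[str]]:
    """Search with the longest context first and shorten it until a match is found.

    Returns the context length that matched and the candidates found,
    or (0, []) when even the one-word context yields nothing.
    """
    state = list(context[-max_context:])  # initial state: at most five words
    while state:
        candidates = table.get(tuple(state))
        if candidates:
            return len(state), candidates
        state = state[1:]  # back off: drop the oldest word (assumption)
    return 0, []


table: NGramTable = {
    ("to", "the"): ["store", "park"],
    ("the",): ["end"],
}
matched = predict_with_backoff("i would like to go to the".split(), table)
print(matched)  # (2, ['store', 'park'])
```

The seven-word input is first cut to five words, then shortened one word at a time until the two-word context "to the" produces results.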

Technical Details

Configurability - depending on available computing resources, the solution is configurable to: a) keep all datasets loaded in memory, or load data on demand and discard it when done; b) use different sampling search space sizes; c) constrain the number of retrieved results to process; and d) change the number of output “next words,” among other options.
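
To make these options concrete, the Python sketch below groups them into a single settings object. All field names and all default values except the five output words are hypothetical; the project's own configuration mechanism is described in its repository documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictorConfig:
    """Illustrative settings mirroring options a) to d); names are hypothetical."""

    keep_data_in_memory: bool = True  # a) False means load on demand and discard
    sample_size: int = 1000           # b) sampling search space size (made-up default)
    max_results: int = 100            # c) cap on retrieved results (made-up default)
    max_predictions: int = 5          # d) number of output "next words"

    def __post_init__(self) -> None:
        for name in ("sample_size", "max_results", "max_predictions"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be a positive integer")


# A low-resource profile, in the spirit of "load data on demand and discard when done".
LOW_RESOURCE = PredictorConfig(keep_data_in_memory=False)
```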

Source code - the entire project, which includes detailed documentation, is available at the following URL:
https://github.com/hbarrien/nextwordpredictor

Demo - An online demo is available at the following URL:
https://hpbarr.shinyapps.io/wordpredictor/

NOTE: given the Shiny Apps free server's low resource availability, the demo is configured to operate in “load data on demand and discard when done” mode, which slows response time because of the costly disk I/O operations.