Predictive Words Input System: An interactive application that assists the entry of words in a message by predicting the next one at each point.

A.Casares M
June 14th 2017

App description

A simulation of the process of predicting the next word entered on a mobile device, based on the previously typed words, using Machine Learning techniques applied to the field of Natural Language Processing.

The screen after finishing a message: a self-explanatory and documented working screen, with examples included and two possible environments.

Three main objectives:

  • Easy operation,
  • Fast and useful answers,
  • Low hardware requirements (such as those available on a mobile phone).

Ad-hoc light and fast data structures:

  • For Unigrams: Hash dictionary
  • For bigrams, trigrams and quadgrams: key-named vectors with the frequency embedded at the right end of each element.
  • Access method to the key vectors: bisection with pre-selection of the search domain (see the sketch right after this list).
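
A minimal sketch of what such structures might look like in Python (the actual application may store them differently); here the key vector is assumed, hypothetically, to hold one string per n-gram with the frequency appended at the right end, so that all entries sharing a root can be located by bisection over the sorted vector:

```python
import bisect

# Unigrams: a plain hash dictionary, word -> frequency (hypothetical counts).
unigrams = {"the": 5021, "are": 1287, "americans": 99}

# Quadgrams (bi- and trigrams are analogous): a sorted key vector whose
# elements carry the frequency embedded at the right end of each entry.
quadgrams = sorted([
    "my fellow americans i 1",      # hypothetical entries
    "my fellow citizens of 3",
])

def lookup(vector, root):
    """Bisection with pre-selection of the search domain: locate the slice of
    keys starting with `root`, then split off each predicted word and its
    embedded frequency."""
    key = root + " "
    lo = bisect.bisect_left(vector, key)
    hi = bisect.bisect_right(vector, key + "\uffff")
    found = [entry.rsplit(" ", 2) for entry in vector[lo:hi]]
    return sorted(((word, int(freq)) for _, word, freq in found),
                  key=lambda wf: -wf[1])

print(lookup(quadgrams, "my fellow americans"))   # [('i', 1)]
```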

User Instructions:

  • PWIS is not just a word predictor. It helps you type a sentence or message, offering you, at each step of the process, a set of words to choose from, sparing you the need to type them.
  • While you are typing, the application will try to anticipate the next word you need. If it is among the suggestions, just select it by clicking the mouse over the word you want. If it is not there, just continue typing.
  • The special data structures of the application, the access methods to the data and the small memory requirements that guided the design allow a fast answer at each step, so the application keeps pace with the typist. If you make a mistake, you can edit or erase some words of the input and the application will adjust itself to the new text.
  • To increase its utility, it does not suggest just one word at each step, but up to 5 options you can select from, thus increasing the chances of getting what you need. At the end, you can obtain the application's performance statistics for the message you entered.
  • If you want to enter another message, just click the bottom-right action button, which resets the buffers.

Algorithmic Insight:

The system sets up a window frame four words wide and moves it from the leftmost position, one word at a time, towards the right, keeping it aligned with the last input word (typed or chosen from the suggestions).
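
As a minimal illustration (using a hypothetical helper name), the root supplied to the highest-order model is simply the last three words inside that four-word window:

```python
def prefix_window(words, width=4):
    """Return the last (width - 1) entered words; they form the root used to
    predict the word that will fill the window's rightmost slot."""
    return words[-(width - 1):]

typed = ["ask", "not", "what", "your", "country"]
print(prefix_window(typed))   # ['what', 'your', 'country']
```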

  • At each position, an open competition is established among candidates obtained from quadgrams, trigrams and bigrams (in that order). The first model with a valid prediction becomes the base model, and the probabilities of the rest are projected onto this base model using Katz's back-off formulas with simple Good-Turing discounts and estimators (a simplified sketch follows this list).
  • Essentially, as Wikipedia explains it, if an N-gram has been seen in the data, the conditional probability of a word is proportional to the maximum likelihood estimate (MLE) of that N-gram; otherwise, the conditional probability is equal to the back-off conditional probability of the (N-1)-gram.
  • Up to 15 of the most probable candidates are obtained, and from them the first five are presented as the user's options, with the best option specially highlighted.
  • If there are no candidates, the most frequent word, taken from a keys vector ordered by decreasing frequency and filtered by some contextual rules, is proposed as the next term.
  • As another alternative for the default case, when the last word of the prefix has already appeared at position n within the message, the application proposes the word that followed it at position n+1, as the SwiftKey Keyboard Android application does.
  • The application registers its prediction hits along the process and, when the statistics button is pressed, presents their distribution per model (also in a bar graph), the assigned raw probability, the mean time per frame, and other statistics.
  • Technical details can be found in the documentation and examples embedded in the working screen.
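
A simplified sketch of that competition, with hypothetical count tables and a fixed illustrative back-off factor standing in for the Katz alpha/beta projection with Good-Turing discounts that the application actually uses; it only shows the control flow quadgrams, then trigrams, then bigrams, then the default word:

```python
def predict(prefix_words, models, default_word, backoff=0.4, top=5):
    """models: (n, table) pairs ordered quadgrams, trigrams, bigrams; each
    table maps a space-joined root to a {word: count} dictionary."""
    scored = {}
    weight = 1.0                       # the first model that answers is the base model
    for n, table in models:
        root = " ".join(prefix_words[-(n - 1):])
        counts = table.get(root)
        if counts:
            total = sum(counts.values())
            for word, count in counts.items():
                # keep the score from the highest-order model that proposed the word
                scored.setdefault(word, weight * count / total)
            weight *= backoff          # later models are projected with less weight
    if not scored:
        return [default_word]          # default case: most frequent / contextual word
    return sorted(scored, key=scored.get, reverse=True)[:top]
```

With the real Katz alpha/beta projection and simple Good-Turing discounts, a candidate from a lower-order model can outrank the base model's candidate (as happens in the illustrative example below), something this fixed-factor simplification does not try to reproduce.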

An illustrative example:

Taken from a system trace of a famous speech, this example shows one instance of prediction:
The full prefix "my fellow americans" is first used in the quadgram model, which finds one quadgram having it as a root: "my fellow americans i", with a frequency of 1.
That makes model 4 (quadgrams) the base model, and the probabilities of the other models will be projected onto it using alpha and beta, the probability projection factors, which shrink progressively with each new model.

Using "fellow americans" as a root, the search does not yield any valid prediction in the trigram model.
Using "americans" as a root, 99 bigrams are found. The 4 with the highest frequencies correspond to "americans are", "americans who", "americans have" and "americans and". Their probabilities are computed using the alpha and beta values, which make them comparable with the probability found in the base model.

The valid candidates are then sorted by decreasing probability, and a final list is obtained. In this case, the preferred option is "are" (coming from a bigram), then "i" (coming from the quadgram), and the rest come from bigrams.

Katz's back-off smoothing and the Good-Turing estimators not only assign non-zero probabilities to out-of-vocabulary words (OOVs), but also make the comparison between different models possible.
Although the highest probabilities frequently come from the base model, that is not always the case, as we have just seen in this example.
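
For reference, the standard Katz back-off formulation that the description above paraphrases can be written (in LaTeX notation) as:

$$
P_{bo}(w_i \mid w_{i-n+1} \dots w_{i-1}) =
\begin{cases}
  d_{w_{i-n+1} \dots w_i} \, \dfrac{C(w_{i-n+1} \dots w_i)}{C(w_{i-n+1} \dots w_{i-1})} & \text{if } C(w_{i-n+1} \dots w_i) > k \\[6pt]
  \alpha_{w_{i-n+1} \dots w_{i-1}} \, P_{bo}(w_i \mid w_{i-n+2} \dots w_{i-1}) & \text{otherwise}
\end{cases}
$$

where C(·) is a training-corpus count, k a count threshold (typically 0), d the simple Good-Turing discount, and alpha the factor that redistributes the discounted probability mass over the words that back off to the (N-1)-gram model.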