Predictive Text

Gregg Velatini
11/30/2017


Why Predictive Text?

People increasingly communicate with one another through text-based interfaces on devices such as smartphones (texting).
Communicating via text, however, is riddled with challenges.
- Typing every letter of every word.
- Using appropriate punctuation.
- Substituting nonverbal cues with text-appropriate words or symbols.
- “Keyboards” used for text entry are extremely small, limiting the number of characters that are readily available, as well as exacerbating keyboard entry errors (typos).

Messages can be exchanged with greater speed and accuracy if an individual's interaction with the keyboard is reduced. One way to reduce the number of keyboard inputs is to use so-called “Predictive Text” (PT) algorithms.

How Does the Algorithm Work?

Large text files from blogs, printed news, and Twitter exchanges were used as a source of real-world communication via texting.

Using the quanteda package, these text files were mined for all instances of single words (one-grams), pairs of words (two-grams), and word triples (three-grams).

The text files were “cleaned”:
- Profanity was removed.
- All punctuation except for a single apostrophe was removed.
- Application-specific words, such as “rt” from Twitter, were removed.
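
The cleaning steps above can be sketched in base R. This is only an illustration, not the code used to build the app: the function name clean_text, the placeholder profanity list, and the choice to lowercase the text are all assumptions, and only the plain ASCII apostrophe is preserved.

```r
# Hypothetical placeholder list; the profanity list actually used is not given.
profanity <- c("darn", "heck")

clean_text <- function(x, banned = profanity, app_words = c("rt")) {
  # Lowercasing is an assumption; the slides do not say how case was handled.
  x <- tolower(x)
  # Replace everything except letters, digits, whitespace, and apostrophes.
  x <- gsub("[^[:alnum:][:space:]']", " ", x)
  # Split on runs of whitespace, then drop banned and app-specific words.
  words <- strsplit(x, "[[:space:]]+")
  vapply(words, function(w) {
    w <- w[nzchar(w) & !(w %in% c(banned, app_words))]
    paste(w, collapse = " ")
  }, character(1))
}

cleaned <- clean_text("RT @user: Don't stop, darn it!")
print(cleaned)
```

In this toy example the apostrophe in "don't" survives, while "rt", the placeholder profanity, and the remaining punctuation are dropped.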

All remaining instances, called features, are assigned a maximum likelihood estimate (MLE) and are then ordered by the MLE.
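
As a rough sketch of how features might be counted and scored, the base R code below builds one-, two-, and three-grams from a toy sentence. It assumes the MLE of a feature is its count divided by the total number of features of the same size; the slides do not spell out the formula, so treat that as an assumption. The helper names make_ngrams and feature_table are hypothetical, and the app itself used quanteda rather than this code.

```r
# Build all n-grams of size n from a vector of words, joined by "_".
make_ngrams <- function(words, n) {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = "_"),
         character(1))
}

# Count features of size n and attach an assumed MLE = count / total.
feature_table <- function(words, n) {
  counts <- table(make_ngrams(words, n))
  df <- data.frame(feature  = names(counts),
                   count    = as.integer(counts),
                   ngram    = n,
                   tot_feat = sum(counts),
                   stringsAsFactors = FALSE)
  df$mle <- df$count / df$tot_feat
  df[order(-df$mle), ]  # most likely features first
}

words <- strsplit("the cat sat on the mat and the cat ran", " ")[[1]]
myFeatures <- do.call(rbind, lapply(1:3, function(n) feature_table(words, n)))
print(head(myFeatures, 3))
```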

How It Works (continued)

Any features that occur only once are removed and the results are written to a file.

Below is an example of the head and the tail of this file.

head(myFeatures,3)

  feature count ngram tot_feat
1     the 92807     1  1849934
2     and 55040     1  1849934
3      to 53681     1  1849934

tail(myFeatures,3)

                feature count ngram tot_feat
316924      rain_in_the     2     3  1760442
316925 the_little_girls     2     3  1760442
316926  the_shark_towel     2     3  1760442
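
The pruning and saving step might look like the sketch below. It reuses three rows from the output above and adds one made-up singleton row so the filter has something to remove. The CSV format and the file location are assumptions, as is the reading of the MLE as count divided by tot_feat.

```r
# Three rows copied from the output above, plus one made-up singleton row.
myFeatures <- data.frame(
  feature  = c("the", "rain_in_the", "the_shark_towel", "a_made_up_phrase"),
  count    = c(92807L, 2L, 2L, 1L),
  ngram    = c(1L, 3L, 3L, 3L),
  tot_feat = c(1849934L, 1760442L, 1760442L, 1760442L),
  stringsAsFactors = FALSE
)

# Drop features that were seen only once.
kept <- myFeatures[myFeatures$count > 1, ]

# Write the table out, then read it back as an app might at launch.
out_file <- tempfile(fileext = ".csv")  # hypothetical file location
write.csv(kept, out_file, row.names = FALSE)
reloaded <- read.csv(out_file, stringsAsFactors = FALSE)

# Relative frequency of "the" among one-grams (assumed MLE = count / tot_feat).
is_the <- reloaded$feature == "the"
the_mle <- reloaded$count[is_the] / reloaded$tot_feat[is_the]
print(round(the_mle, 4))
```

Under that assumption, "the" accounts for roughly 5% of all one-gram instances in the corpus.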

The resulting data file is read in by the application when the application is launched.
The application searches the data file for word suggestions based on the MLE.
If two or more suggested words have the same likelihood, a tie-breaking algorithm is used.
- All suggested words that have the same likelihood are compared to the list of one-grams and ranked by the MLE of the one-grams.
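
A minimal sketch of the lookup and tie-break follows, using a toy feature table with made-up counts. For brevity it matches only on the last typed word (two-grams); how the real app combines two-grams and three-grams is not described here, so this is a simplification. The function name suggest_next is hypothetical. Within one prefix, ranking by count gives the same order as ranking by count divided by a shared total.

```r
# Toy feature table (made-up counts, for illustration only).
features <- data.frame(
  feature = c("the", "cat", "dog", "mat",
              "on_the", "the_dog", "the_cat", "the_mat"),
  count   = c(10, 5, 3, 2, 4, 2, 2, 1),
  ngram   = c(1, 1, 1, 1, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)

suggest_next <- function(text, features, top = 3) {
  words  <- strsplit(tolower(trimws(text)), " +")[[1]]
  prefix <- paste0(words[length(words)], "_")
  # Two-grams whose first word is the last word typed.
  hits <- features[features$ngram == 2 &
                     startsWith(features$feature, prefix), ]
  if (nrow(hits) == 0) return(character(0))
  candidates <- sub(prefix, "", hits$feature, fixed = TRUE)
  # Tie-breaker: look up each candidate's own one-gram count.
  unigrams  <- features[features$ngram == 1, ]
  uni_count <- unigrams$count[match(candidates, unigrams$feature)]
  uni_count[is.na(uni_count)] <- 0
  # Sort by two-gram count first, then by one-gram count.
  head(candidates[order(-hits$count, -uni_count)], top)
}

suggestions <- suggest_next("the dog sat on the", features)
print(suggestions)
```

Here "the_dog" and "the_cat" are tied on count, so "cat" is ranked first because it is the more frequent one-gram in the toy table.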

The application generates a “Word Cloud” of the top 30 suggested words.

How to Use the App

The application can be found here: https://gvelatini.shinyapps.io/predictivetext/

Please be patient; it's a little slow to load.

Enter any text you like in the [Input Text] entry box.
Click the [GO] button to see the top three suggestions for the next word and the Word Cloud!

Notes:

When you enter your text, ensure that there is only one space between each word.
The app is a little slower on the free Shiny server. Please be patient.
The [GO] button keeps the app from searching before the user finishes typing.