Text Prediction App

Given a lead phrase (“to be or not to …”), we want to create an application that

  • Predicts the next word with some fidelity
  • Doesn't unduly tax memory resources
  • Doesn't take a long time
  • Can account for novel words

We have to trade some accuracy for speed.

The solution? A simple lookup table of known phrases, which is much faster than, and nearly as accurate as, more complex algorithms.
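As a rough illustration of what such a lookup table could look like, the Python sketch below counts which word follows each one- to four-word phrase and keeps only the most frequent follower. The function name build_table, the max_context parameter, and the three toy sentences are assumptions made for this example; the app's actual table format is not described here.

```python
from collections import Counter, defaultdict

def build_table(sentences, max_context=4):
    """Map each phrase of 1..max_context words to its most frequent next word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(1, len(words)):
            # Record words[i] as a follower of every phrase that ends just before it.
            for n in range(1, max_context + 1):
                if i - n < 0:
                    break
                context = " ".join(words[i - n:i])
                counts[context][words[i]] += 1
    # Storing only the top follower per phrase keeps the table small.
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

# Toy corpus (hypothetical data, for illustration only).
table = build_table([
    "the green bay packers won",
    "green bay packers fans cheer",
    "green bay is cold",
])
print(table["green bay"])
```

Keeping a single follower per phrase is one way to limit memory use; a table that stores full counts would allow ranked suggestions at the cost of size.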

Algorithm Details

To predict what comes after “I root for the inimitable Green Bay”:

First, just take the last four words, “the inimitable Green Bay”. Do we have that four-word phrase in our list? We don't.

Then look for “inimitable Green Bay”. Is that three-word phrase in our list? Nope.

Look for “Green Bay”. We do find that. For the phrase “green bay”, the most frequently occurring next word is “packers”. Make the prediction.
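The back-off steps above can be sketched in a few lines of Python. This is a minimal sketch, not the app's code: predict_next and toy_table are hypothetical names, and the table is assumed to map a lower-cased phrase to its most frequent next word.

```python
def predict_next(phrase, table, max_context=4):
    """Try the longest trailing phrase first, then back off one word at a time."""
    words = phrase.lower().split()
    for n in range(min(max_context, len(words)), 0, -1):
        context = " ".join(words[-n:])
        if context in table:
            return table[context]
    return None  # no phrase of any length matched

# Hypothetical two-entry table, for illustration only.
toy_table = {"green bay": "packers", "bay": "area"}

prediction = predict_next("I root for the inimitable Green Bay", toy_table)
print(prediction)
```

Because the longest phrase is tried first, “green bay” wins over the shorter “bay” entry, which mirrors the order of the steps described above.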

To predict what comes after “I don't care for Hilllary” [sic]:

Look for “don't care for Hilllary”, then “care for Hilllary”, then “for Hilllary”, then “Hilllary”.

We don't find a match at all in our lookup lists.

Check to see if any of our one-word phrases are close (string distance <=3) to “Hilllary”. Turns out that the closest word is “hillary”, and the most frequent word after that is “clinton”.
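The fuzzy fallback might look like the Python sketch below. The text above only says “string distance”, so the use of Levenshtein edit distance here is an assumption, as are the names edit_distance, closest_word, and unigram_table.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def closest_word(word, vocabulary, max_distance=3):
    """Return the known word nearest to `word`, or None if all are too far."""
    best, best_d = None, max_distance + 1
    for candidate in vocabulary:
        d = edit_distance(word, candidate)
        if d < best_d:
            best, best_d = candidate, d
    return best

# Hypothetical one-word lookup table, for illustration only.
unigram_table = {"hillary": "clinton", "bay": "area"}

match = closest_word("hilllary", unigram_table)
print(match, unigram_table[match])
```

Scanning every known word is linear in the vocabulary size, so this fallback only makes sense as a last resort after the exact lookups have failed.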

Accuracy

  • Initial accuracy was measured by building a model from a 90% training split of the provided corpora and testing it on the remaining 10% holdout.
  • Accuracy using this method was benchmarked at 11.5% for Twitter, 13.4% for blog posts, and 19.2% for news articles.
  • Next, the full corpora were used to generate a model, which was tested on novel texts (10k sentences from 2016 Wikipedia posts and 10k sentences from 2015 news articles). Prediction accuracy was 16.4% for the Wikipedia entries and 18.0% for the news articles.
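One way such a holdout measurement could be set up is sketched below in Python. The exact scoring procedure behind the figures above is not stated, so this sketch assumes that every word position after the first counts as one prediction attempt; split_holdout, top1_accuracy, and the toy predictor are hypothetical names introduced for illustration.

```python
import random

def split_holdout(sentences, holdout=0.1, seed=0):
    """Shuffle and split sentences into (training, holdout) lists."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * holdout)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

def top1_accuracy(sentences, predict):
    """Fraction of word positions where predict(prefix) equals the actual next word."""
    hits = total = 0
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(1, len(words)):
            total += 1
            if predict(" ".join(words[:i])) == words[i]:
                hits += 1
    return hits / total if total else 0.0

# Toy predictor backed by a hypothetical two-entry table.
toy_table = {"green bay": "packers", "the": "green"}

def toy_predict(phrase):
    words = phrase.split()
    for n in (2, 1):
        key = " ".join(words[-n:])
        if key in toy_table:
            return toy_table[key]
    return None

accuracy = top1_accuracy(["the green bay packers"], toy_predict)
print(round(accuracy, 3))
```

In the toy run, two of the three positions are predicted correctly (“green” after “the”, “packers” after “green bay”), while “bay” after “the green” is missed.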

Simple to Use

The app is no-frills. Simply enter the text for which you'd like to have the next word predicted.

There's no need to clean the text – the app handles transforming everything to lower case, removing excess space, etc.
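The two cleaning steps named above can be illustrated with a short Python sketch. The function name clean_text is hypothetical, and whatever the “etc.” covers in the app (punctuation handling, for example) is not specified, so it is left out here.

```python
import re

def clean_text(text):
    """Lower-case the input and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip().lower()

cleaned = clean_text("  To BE   or not\tto  ")
print(cleaned)
```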

Click the “predict” button, and the suggested next word will appear!

Screenshot of Text Prediction Shiny App