Given a lead phrase (“to be or not to …”), we want to create an application that predicts the next word.
We have to trade off accuracy against speed.
The solution? A simple lookup table of known phrases, which is much faster than, and nearly as accurate as, more complex algorithms.
To predict what comes after “I root for the inimitable Green Bay”:
First, just take the last four words, “the inimitable Green Bay”. Do we have that four-word phrase in our list? We don't.
Then look for “inimitable Green Bay”. Is that three-word phrase in our list? Nope.
Look for “Green Bay”. We do find that. For the phrase “green bay”, the most frequently occurring next word is “packers”. Make the prediction.
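The back-off steps in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table contents, the names `LOOKUP`, `MAX_ORDER`, and `predict_next`, and the choice of a dictionary keyed by word tuples are all hypothetical, not the app's actual code or data.

```python
from typing import Optional

# Hypothetical lookup table: phrase (tuple of lowercase words) -> most
# frequent next word. The entries are illustrative, not the app's real data.
LOOKUP = {
    ("green", "bay"): "packers",
    ("hillary",): "clinton",
}

MAX_ORDER = 4  # the longest phrase is tried first, as in the walkthrough


def predict_next(text: str, table: dict = LOOKUP) -> Optional[str]:
    """Back off from the last four words down to the last one."""
    words = text.lower().split()
    for n in range(min(MAX_ORDER, len(words)), 0, -1):
        key = tuple(words[-n:])  # the last n words of the input
        if key in table:
            return table[key]    # first (longest) match wins
    return None                  # no phrase of any length matched


print(predict_next("I root for the inimitable Green Bay"))
```

Because every step is a single dictionary lookup, a prediction costs at most four lookups no matter how large the table grows, which is where the speed advantage comes from.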
To predict what comes after “I don't care for Hilllary” [sic]:
Look for “don't care for Hilllary”, then “care for Hilllary”, then “for Hilllary”, then “Hilllary”.
We don't find a match at all in our lookup lists.
Check to see if any of our one-word phrases are close (string distance <=3) to “Hilllary”. Turns out that the closest word is “hillary”, and the most frequent word after that is “clinton”.
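The fuzzy fallback might look something like the sketch below. The text above does not say which string distance is used, so Levenshtein (edit) distance is an assumption here, and `UNIGRAMS`, `edit_distance`, and `fuzzy_predict` are hypothetical names with illustrative data.

```python
from typing import Optional

# Hypothetical one-word lookup: known word -> most frequent next word.
UNIGRAMS = {"hillary": "clinton", "green": "bay"}
MAX_DISTANCE = 3  # the "string distance <= 3" threshold from the text


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_predict(word: str, table: dict = UNIGRAMS,
                  max_distance: int = MAX_DISTANCE) -> Optional[str]:
    """Predict from the closest known word, if it is close enough."""
    word = word.lower()
    best = min(table, key=lambda known: edit_distance(word, known),
               default=None)
    if best is None or edit_distance(word, best) > max_distance:
        return None
    return table[best]


print(fuzzy_predict("Hilllary"))
```

This version scans every known word, so it is far slower than an exact lookup; it only makes sense as a last resort, after all exact matches have failed.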
The app is no-frills. Simply enter the text for which you'd like to have the next word predicted.
There's no need to clean the text – the app handles transforming everything to lower case, removing excess whitespace, etc.
Click the “predict” button, and the suggested next word will appear!
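As a rough idea of the cleaning step, the sketch below covers only the two transformations named above, lowercasing and removing excess whitespace. The function name `clean_text` is hypothetical, and whatever the "etc." covers in the app is not reproduced here.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase the input and collapse runs of whitespace to one space."""
    return re.sub(r"\s+", " ", text.lower()).strip()


print(clean_text("  I root for   the inimitable Green\tBay "))
```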