Next Text - Web App

Jas Sohi - www.jassohi.com
December 12, 2014

Introduction

The “Next Text” web app lets the user enter any three-word phrase and predicts the word that most users would type next.

If the predicted word is not what the user had in mind, the app also shows other likely predictions in the form of a word cloud.

EXAMPLE: If the user types “What are you”, the app predicts “doing”.


Algorithm

  • I used a bag of words approach to count the frequency of Ngrams in the corpora (the documents used to train the model). Only how often each Ngram occurs matters; its position relative to other Ngrams does not.

  • I combined it with a back-off approach. The user's 3 words are first compared against a 4gram model (every run of four consecutive words), matching on the first 3 words. If nothing matches, the user's last 2 words are compared against the first 2 words of a trigram model (every run of three consecutive words). If there is still no match, the last word is compared against the first word of a bigram model (every pair of consecutive words).

  • I removed Ngrams that appeared only once, since they did not improve the accuracy of the prediction.
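
The counting, pruning and back-off steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the app's actual code: the function names (count_ngrams, build_models, predict), the min_count parameter and the toy corpus are all hypothetical, and the real models were built from much larger corpora.

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens (position is ignored)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def build_models(tokens, min_count=2):
    """Build 4gram, trigram and bigram tables, dropping Ngrams seen only once."""
    models = {}
    for n in (4, 3, 2):
        counts = count_ngrams(tokens, n)
        models[n] = {gram: c for gram, c in counts.items() if c >= min_count}
    return models

def predict(models, phrase):
    """Back off from the 4gram table to the trigram and then the bigram table.

    Returns candidate next words, most frequent first, or an empty list
    when no table contains a match.
    """
    words = phrase.lower().split()
    for n in (4, 3, 2):
        context = tuple(words[-(n - 1):])  # last n-1 words typed by the user
        candidates = Counter()
        for gram, c in models[n].items():
            if gram[:-1] == context:
                candidates[gram[-1]] += c
        if candidates:
            return [word for word, _ in candidates.most_common()]
    return []

# Toy corpus (hypothetical) standing in for the cleaned training text.
tokens = ("what are you doing " * 3
          + "what are you saying " * 2
          + "how are you feeling " * 2).split()
models = build_models(tokens)

print(predict(models, "What are you"))   # ['doing', 'saying']
print(predict(models, "where are you"))  # no 4gram match, falls back to trigrams
```

The first word in the returned list plays the role of the prediction, and the remaining words are the alternatives that would feed the word cloud.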

Implementation

  • I first cleaned the corpora (removed punctuation, invalid characters, etc.) in Python using the NLTK package and exported the Ngrams to 3 text files.
  • Once the user types in three words, I use the sqldf package in R to run a SQL query that compares these 3 words against one of three R data frames, one for each Ngram model.
  • If all 3 words match, return the most frequent 4th word.
  • Otherwise, compare the last 2 words against the trigram (3 words) data frame and, if there is still no match, the last word against the bigram (2 words) data frame.
  • Finally, to keep things simple, if nothing matches I predict “the”, the most common second word. This is relatively rare, since most of the time the user types at least one matching word.
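
The lookup steps above can be illustrated with Python's built-in sqlite3 module standing in for sqldf. This is a rough sketch, not the app's R code: the table names (fourgram, trigram, bigram), the column names (w1 to w4, freq) and the sample rows are assumptions made for the example.

```python
import sqlite3

DEFAULT_WORD = "the"  # fallback prediction when no table matches

# (hypothetical table name, number of words used as the match context)
TABLES = [("fourgram", 3), ("trigram", 2), ("bigram", 1)]

def make_db(rows_by_table):
    """Load Ngram rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    for table, k in TABLES:
        cols = ", ".join(f"w{i} TEXT" for i in range(1, k + 2))
        conn.execute(f"CREATE TABLE {table} ({cols}, freq INTEGER)")
        marks = ", ".join("?" * (k + 2))
        conn.executemany(f"INSERT INTO {table} VALUES ({marks})",
                         rows_by_table.get(table, []))
    return conn

def next_words(conn, phrase):
    """Query the 4gram table first, then the trigram, then the bigram table."""
    words = phrase.lower().split()
    for table, k in TABLES:
        context = words[-k:]
        if len(context) < k:
            continue  # not enough words typed for this table
        where = " AND ".join(f"w{i} = ?" for i in range(1, k + 1))
        sql = f"SELECT w{k + 1} FROM {table} WHERE {where} ORDER BY freq DESC"
        rows = conn.execute(sql, context).fetchall()
        if rows:
            return [row[0] for row in rows]
    return [DEFAULT_WORD]

# Sample rows (hypothetical) in place of the three exported Ngram files.
conn = make_db({
    "fourgram": [("what", "are", "you", "doing", 3),
                 ("what", "are", "you", "saying", 2)],
    "trigram": [("are", "you", "sure", 4)],
    "bigram": [("you", "know", 5)],
})

print(next_words(conn, "What are you"))   # ['doing', 'saying']
print(next_words(conn, "one two three"))  # ['the']
```

Ordering by frequency in the query means the first row is the prediction and any remaining rows are the less likely alternatives.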

Instructions

  1. Wait for the app to load the instructions and required data.

  2. Once “Done Loading!” appears, enter any 3 word phrase into the text box.

  3. Click on the Predict! button.

  4. Wait a few seconds. The app shows the most likely next word and, if there are any alternative (less likely) predictions, a word cloud of them.

Functionality & Benefits

  • Re-calculates how many words the user has typed on the fly.
  • Provides loading messages to tell the user what is going on.
  • Status message - tells the user once loading is complete or to try another phrase.
  • Responsive - It works well on both mobile and desktop web browsers. No need to develop separate apps.
  • Wow factor! - Shows both the predicted word & a word cloud with other predictions.