Help! What should I say next?

Katherine Vance
August 14, 2020

The Need

The world is full of socially awkward people. These people may struggle to find the right words to say in many situations:

  • Professional networking
  • Small talk at parties
  • Chatting with the grocery store cashier, the person behind you in line, or anyone else you encounter in daily life

This app will improve people's lives by helping them think of things to say in these situations. The audience of potential users is huge: it's everyone!

The User Interface

The app is straightforward and easy to use.

  • Enter some English text in the text box
  • Click the “suggest the next word for me!” button
  • The suggested word appears to the right of or below the button, depending on the size of your browser window

The Model

The suggestions come from a simplified Katz back-off model. It was trained on a random 5% sample of the HC Corpora, which consists of blog posts, news articles, and tweets collected by a web crawler.

  1. The training corpus was tokenized into words using tidytext tools, and profane or offensive words were filtered out using a list from Luis von Ahn's website.
  2. The result was tokenized into 2-grams and 3-grams, which were then counted.
  3. At prediction time, the model filters profane or offensive words out of the input phrase, then selects the last two remaining words.
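
The three steps above can be sketched in a few lines. This is an illustrative Python sketch, not the app's actual code (which uses tidytext). The names `BLOCKED_WORDS`, `tokenize`, `count_ngrams`, and `last_two_words` are hypothetical, the two-word blocked list is a mild stand-in for the real profanity list, and the toy corpus is treated as one continuous stream of words, which is a simplifying assumption.

```python
import re
from collections import Counter

# Hypothetical stand-in for the profanity list; the real list is much longer.
BLOCKED_WORDS = {"darn", "heck"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop blocked words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in BLOCKED_WORDS]


def count_ngrams(words, n):
    """Count every run of n consecutive words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def last_two_words(phrase):
    """Filter the input phrase, then keep its last two remaining words."""
    return tokenize(phrase)[-2:]


# Toy training corpus (assumption: treated as a single stream of words).
corpus = "I like green tea. I like green apples. Darn, I like tea."
words = tokenize(corpus)
bigram_counts = count_ngrams(words, 2)
trigram_counts = count_ngrams(words, 3)

print(bigram_counts[("i", "like")])            # 3
print(trigram_counts[("i", "like", "green")])  # 2
print(last_two_words("Well, darn, I really like"))  # ['really', 'like']
```

Because filtering happens before the n-grams are built, words on either side of a removed word become neighbors. That matches the order of the steps described above.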

The Model (continued)

  1. If the last two words appear as the first two words in any 3-grams in the training corpus, the model returns the third word in the most common such 3-gram.
  2. If no such 3-gram exists, the model looks for 2-grams in the training corpus whose first word is the last word of the input phrase, and returns the second word in the most common such 2-gram.
  3. If no such 2-gram exists, the model returns the most common word in the training corpus (“the”).

Ties in the counts of 2-grams and 3-grams are broken using alphabetical order.
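
The back-off rules and the tie-break can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the app's implementation: the count tables are made-up toy data, and `best_continuation` and `predict_next` are hypothetical names. As in the description above, it uses raw counts only, with no discounting.

```python
from collections import Counter

# Toy counts standing in for the trained tables (made-up data).
trigram_counts = Counter({
    ("i", "like", "green"): 2,
    ("i", "like", "tea"): 2,
    ("like", "green", "tea"): 1,
})
bigram_counts = Counter({
    ("green", "tea"): 3,
    ("green", "apples"): 3,
    ("i", "like"): 4,
})
MOST_COMMON_WORD = "the"  # final fallback named in the text


def best_continuation(counts, prefix):
    """Last word of the most frequent n-gram that starts with prefix.

    Returns None if no n-gram matches. Ties in the counts go to the
    alphabetically first word.
    """
    candidates = [
        (gram[-1], count)
        for gram, count in counts.items()
        if gram[:-1] == prefix
    ]
    if not candidates:
        return None
    # Highest count first; among equal counts, alphabetical order.
    return min(candidates, key=lambda wc: (-wc[1], wc[0]))[0]


def predict_next(words):
    """Try 3-grams, then 2-grams, then fall back to the most common word."""
    if len(words) >= 2:
        word = best_continuation(trigram_counts, tuple(words[-2:]))
        if word is not None:
            return word
    if words:
        word = best_continuation(bigram_counts, tuple(words[-1:]))
        if word is not None:
            return word
    return MOST_COMMON_WORD


print(predict_next(["i", "like"]))       # green  (3-gram tie, alphabetical)
print(predict_next(["very", "green"]))   # apples (backs off to 2-grams)
print(predict_next(["hello", "world"]))  # the    (no match at all)
```

Scanning every n-gram on each lookup keeps the sketch short. A real app would more likely index the tables by prefix ahead of time so each suggestion is a single lookup.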