6/30/2021

Introduction

This interactive app takes any string as input and returns the predicted next word. It does this by finding the most common 2-, 3-, and 4-grams that start with the last 1, 2, or 3 words the user has typed.

This app uses millions of tweets and blogs to help predict the user’s next word. It stores its data with data.table: tables of n-grams and their frequencies, combined with dictionaries that pair each n-gram with an integer lookup.
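To illustrate the idea of pairing n-grams with integer lookups, here is a minimal sketch in Python (the app itself uses R with data.table). The function name `build_ngram_dictionary` and the toy bigrams are hypothetical and are not taken from the app's code.

```python
from collections import Counter

def build_ngram_dictionary(ngrams):
    """Assign each distinct n-gram a small integer ID, in first-seen order."""
    ids = {}
    for gram in ngrams:
        # setdefault only inserts when the n-gram has not been seen yet
        ids.setdefault(gram, len(ids))
    return ids

# Toy data (assumption): a handful of bigrams standing in for the real corpus
bigrams = ["it is", "is the", "the most", "it is", "is the", "it is"]

dictionary = build_ngram_dictionary(bigrams)

# Frequencies are keyed by integer ID rather than by the n-gram string
freq_by_id = Counter(dictionary[gram] for gram in bigrams)
```

Keying the frequency table by integers rather than strings is one plausible way to keep lookups compact and fast; whether the app encodes whole n-grams or individual words this way is not stated here.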

Under the Hood: Steps to Set Up

  • Download data and packages into R
  • Create data tables
    • Four with n-grams, their frequencies, their last word, and every word before their last word
      • Pare down these data tables to keep only the most common word that follows each phrase (instead of every word with a frequency above a chosen threshold)
    • Four dictionaries: 2-gram, 3-gram, 4-gram, skipGram
  • Convert ngram data tables to integer lookup tables
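The table-building and paring steps above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the app's R code: the helper names `ngram_rows` and `keep_top_next_word`, the row layout, and the toy sentence are all hypothetical, and ties are broken by keeping the first n-gram seen.

```python
from collections import Counter

def ngram_rows(tokens, n):
    """Count n-grams and split each into its prefix (all words but the last) and last word."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [
        {"prefix": " ".join(gram[:-1]), "last": gram[-1], "freq": freq}
        for gram, freq in counts.items()
    ]

def keep_top_next_word(rows):
    """For each prefix, keep only the most frequent last word (first seen wins ties)."""
    best = {}
    for row in rows:
        current = best.get(row["prefix"])
        if current is None or row["freq"] > current["freq"]:
            best[row["prefix"]] = row
    return {prefix: row["last"] for prefix, row in best.items()}

# Toy corpus (assumption): "the most" is followed by "fun" twice and "common" once
tokens = "the most fun the most fun the most common".split()
rows = ngram_rows(tokens, 3)
table = keep_top_next_word(rows)
```

After paring, each prefix maps to exactly one next word, so a prediction needs only a single lookup per n-gram size.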

Under the Hood: Steps to Run the App

  • Take in the text input from the user
  • Find the most common n-grams that start with the last words of the user’s input
  • Print the most likely next word quickly
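As a rough sketch of the run-time steps above, the lookup could try the longest available phrase first and fall back to shorter ones. This is a Python illustration, not the app's code: the longest-first order is an assumption, `predict_next_word` is a hypothetical name, and the table entries are invented toy values rather than output from the app.

```python
def predict_next_word(text, tables, default=None):
    """Return the stored next word for the longest matching trailing phrase.

    `tables` maps a phrase length k (1, 2, or 3) to a dict of
    {k-word phrase: most common next word}.
    """
    words = text.lower().split()
    for k in (3, 2, 1):  # assumption: prefer the longest phrase, then back off
        if len(words) >= k:
            phrase = " ".join(words[-k:])
            word = tables.get(k, {}).get(phrase)
            if word is not None:
                return word
    return default

# Invented toy tables (assumption), keyed by phrase length
tables = {
    3: {"thanks for the": "follow"},
    2: {"for the": "first"},
    1: {"the": "same"},
}

three_word_match = predict_next_word("Thanks for the", tables)
two_word_match = predict_next_word("waiting for the", tables)
one_word_match = predict_next_word("over the", tables)
no_match = predict_next_word("hello", tables)
```

Because each table already holds a single next word per phrase, the prediction costs at most three dictionary lookups.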

Example 1

What happens when you enter the string, “It’s the most”?

Example 2

Evaluation

  • Competitively accurate predictor
  • Average response time of 0.083 seconds
  • This version is optimized for speed and does not include any visuals beyond the predicted next word