Word Predictor

5/30/2020

Overview

This presentation introduces a Shiny app that predicts the next word from the user's input.

Why this app?

It increases text entry speed and reduces typos.

The app uses an n-gram backoff model. The basic steps the model takes are:

The user enters a word or phrase of length n. If n > 3, the app keeps only the last 3 words.
The app searches its dictionary of (n+1)-grams for entries whose first n words match the phrase.
- If there is a match, it returns the most frequent (n+1)th word.
- If not, it shortens the phrase by removing the first word and repeats the search, continuing until a match is found.
If no match is found at any length, the model returns the most frequent unigram (single word).
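
The steps above can be sketched in a few lines of Python. This is only an illustration of the backoff idea, not the app's actual R code: the function names (build_tables, predict_next), the toy corpus, and the tie-breaking rule (the first n-gram seen wins) are all assumptions made for the example.

```python
from collections import Counter

def build_tables(tokens, max_order=4):
    """For each n-gram order 1..max_order, map a context (the first n-1 words)
    to its most frequent next word. Order 1 uses the empty context ()."""
    tables = {}
    for order in range(1, max_order + 1):
        counts = Counter(
            tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)
        )
        best = {}
        for gram, freq in counts.items():
            context, nxt = gram[:-1], gram[-1]
            # Strict '>' means ties go to the n-gram seen first (an assumption).
            if context not in best or freq > best[context][1]:
                best[context] = (nxt, freq)
        tables[order] = {ctx: word for ctx, (word, _) in best.items()}
    return tables

def predict_next(phrase, tables, max_context=3):
    """Back off from the longest usable context to shorter ones."""
    words = phrase.lower().split()[-max_context:]  # keep the last 3 words
    while words:
        match = tables.get(len(words) + 1, {}).get(tuple(words))
        if match is not None:
            return match
        words = words[1:]  # drop the first word and try again
    return tables[1][()]  # fallback: most frequent unigram

# Toy corpus, invented for this example.
corpus = "the cat sat on the mat and the cat ran".split()
tables = build_tables(corpus)
print(predict_next("the", tables))
print(predict_next("dog sat on", tables))
print(predict_next("zebra", tables))
```

With this toy corpus, "dog sat on" has no 4-gram match, so the sketch backs off to "sat on" and finds a trigram match; "zebra" matches nothing and falls through to the most frequent unigram.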

Response time was sped up by:

pre-cleaning the data (including removing retweets, i.e. duplicate tweets, and filtering out n-grams with frequency < 10)
using 50% of the news and Twitter data
writing new, clean, tokenized data files for the app to use
converting data frames to data tables to enable faster lookup
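
As a rough illustration of why these steps help, the Python sketch below removes duplicate lines, counts n-grams, drops the rare ones, and stores the result in a keyed table so that each prediction is a single lookup. It is a sketch under assumptions, not the app's pipeline: build_lookup and the sample lines are hypothetical, exact-duplicate removal stands in for retweet removal, and a Python dict stands in for a keyed data table in R.

```python
from collections import Counter

def build_lookup(lines, order=2, min_freq=10):
    """Return {context: (most frequent next word, frequency)} for n-grams of
    the given order, after dropping duplicate lines and rare n-grams."""
    # Drop exact duplicate lines (a stand-in for retweets), keeping order.
    unique_lines = list(dict.fromkeys(lines))

    counts = Counter()
    for line in unique_lines:
        tokens = line.lower().split()
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1

    lookup = {}
    for gram, freq in counts.items():
        if freq < min_freq:
            continue  # filter out low-frequency n-grams
        context, nxt = gram[:-1], gram[-1]
        if context not in lookup or freq > lookup[context][1]:
            lookup[context] = (nxt, freq)
    return lookup

# Hypothetical sample lines; min_freq is lowered to 2 to suit the tiny sample.
lines = ["good morning"] * 3 + [
    "good night everyone",
    "good morning all",
    "good morning sir",
]
lookup = build_lookup(lines, order=2, min_freq=2)
print(lookup)
```

Pruning rare n-grams shrinks the table the app has to load, and the keyed structure replaces a scan over all rows with a direct lookup on the context.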

If the app can’t find a match, it returns the highest frequency unigram so the user always has a new word to add to their sentence.