Capstone Project - Word Prediction Shiny App

March 11, 2017

Word Prediction Shiny App Overview

The “Your next word is…” app predicts the next word in a phrase input by the user.
The phrase can be any length but must be in English.

The instructions for the user are quite simple:

  • Enter a phrase
  • Click “Submit”

Word Prediction Algorithm

A simple back-off model was implemented as follows:

  • Term frequency tables were created for trigrams, bigrams, and unigrams using a corpus created from blog posts, news articles, and tweets (provided on the Capstone Coursera webpage).
  • If the phrase entered by the user is two or more words long, the trigram table is searched first: the last word of the most frequent trigram whose first two words match the last two words of the input is returned as the prediction.
  • If no trigram's first two words match the last two words of the input, or if the input phrase is only one word long, the bigram table is searched. In this case, the last word of the most frequent bigram whose first word matches the last word of the input is returned.
  • If the last word of the input phrase does not match the first word of any bigram, the most frequent unigram is returned.
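
The back-off steps above can be sketched in a few lines. This is only an illustration in Python, not the app's actual code: the function names (build_tables, predict_next), the whitespace tokenization, and the five-sentence toy corpus are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_tables(sentences):
    """Count unigrams and index bigram/trigram continuations by their prefix."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)   # last word -> counts of the next word
    trigrams = defaultdict(Counter)  # (word1, word2) -> counts of the next word
    for sentence in sentences:
        words = sentence.lower().split()  # assumed tokenization: lowercase, split on spaces
        unigrams.update(words)
        for w1, w2 in zip(words, words[1:]):
            bigrams[w1][w2] += 1
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            trigrams[(w1, w2)][w3] += 1
    return unigrams, bigrams, trigrams

def predict_next(phrase, unigrams, bigrams, trigrams):
    """Back off from trigrams to bigrams to the most frequent unigram."""
    words = phrase.lower().split()
    if len(words) >= 2:
        candidates = trigrams.get((words[-2], words[-1]))
        if candidates:
            return candidates.most_common(1)[0][0]
    if words:
        candidates = bigrams.get(words[-1])
        if candidates:
            return candidates.most_common(1)[0][0]
    # No match at any level (assumes a non-empty corpus).
    return unigrams.most_common(1)[0][0]

# Toy corpus, invented for this example only.
corpus = [
    "i love new york",
    "i love new shoes",
    "i love new york city",
    "we love the city",
    "the city is big",
]
unigrams, bigrams, trigrams = build_tables(corpus)

trigram_hit = predict_next("I love new", unigrams, bigrams, trigrams)
bigram_hit = predict_next("big city", unigrams, bigrams, trigrams)
one_word = predict_next("the", unigrams, bigrams, trigrams)
fallback = predict_next("hello zebra", unigrams, bigrams, trigrams)
```

With this toy corpus, "I love new" is resolved by the trigram table, "big city" and "the" fall through to the bigram table, and "hello zebra" matches nothing and falls back to the most frequent unigram.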

Rationale for Algorithm Selection

There are two important factors to consider when selecting a word prediction algorithm:

  • Accuracy: Are the word predictions correct?
  • Speed: How long does the user have to wait for a prediction?

Improving the accuracy of an algorithm typically increases the time required to return a prediction. The simple back-off model used in this app may be less accurate than more complex models, but it requires very few calculations, so the user waits only a short time for a prediction.
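
One way to weigh these two factors is to measure both on held-out phrases. The sketch below, again in Python and purely illustrative, assumes a hypothetical evaluate helper that accepts any predictor function; the stand-in predictor and the four test pairs are invented for the example and are not results from the app.

```python
import time

def evaluate(predict, test_pairs):
    """Return (accuracy, mean seconds per prediction) for a predictor function."""
    correct = 0
    start = time.perf_counter()
    for phrase, expected in test_pairs:
        if predict(phrase) == expected:
            correct += 1
    elapsed = time.perf_counter() - start
    n = len(test_pairs)
    return correct / n, elapsed / n

def always_the(phrase):
    """Hypothetical stand-in predictor: ignores the input and returns "the"."""
    return "the"

# Invented (phrase, expected next word) pairs, for illustration only.
held_out = [
    ("in", "the"),
    ("thank you", "for"),
    ("one of", "the"),
    ("happy", "birthday"),
]
accuracy, seconds_per_prediction = evaluate(always_the, held_out)
```

Running the same helper on two candidate models gives a like-for-like comparison of how much accuracy is gained for each additional unit of waiting time.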

Shiny App Example Screenshot