Objective

This presentation introduces the Next Word Predict Shiny application, including:

The application can be accessed here:

Shiny Application

Next Word Predict is a Shiny application that predicts the next word in a sentence based on text entered by a user.

The application uses an n-gram language model to generate predictions. An n-gram is a contiguous sequence of n words from a body of text.
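As a minimal illustration of the definition above, the following Python sketch lists the n-grams in a short sentence. The function name `ngrams` and the whitespace-based word splitting are assumptions made for the example and are not taken from the application's own code.

```python
def ngrams(text, n):
    """Return every contiguous sequence of n words in text, in order."""
    # Assumption: words are separated by whitespace and compared in lowercase.
    words = text.lower().split()
    # A text with fewer than n words yields an empty list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams = ngrams("the quick brown fox", 2)
trigrams = ngrams("the quick brown fox", 3)
print(bigrams)
print(trigrams)
```

For the four-word sentence used here, this gives three bigrams and two trigrams.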

The predictive model was trained using a large corpus of:

N-grams were extracted from this corpus and used to construct the language model.

Various natural language processing and text mining techniques were explored to improve both prediction accuracy and application performance.

The predictive text model was built using a sample of approximately 800,000 lines from the original text corpus.

After cleaning, the text was tokenized into n-grams.
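The cleaning and tokenization step might look roughly like the Python sketch below. The specific cleaning rules (lowercasing, keeping only letters and apostrophes) and the names `clean` and `count_ngrams` are assumptions for illustration; the presentation does not state which cleaning steps were actually applied.

```python
import re
from collections import Counter


def clean(line):
    """Lowercase a line and return its words (assumed cleaning rules)."""
    line = line.lower()
    # Assumption: anything other than letters, apostrophes and spaces is noise.
    line = re.sub(r"[^a-z' ]+", " ", line)
    return line.split()


def count_ngrams(lines, n):
    """Count how often each n-gram occurs across the given lines."""
    counts = Counter()
    for line in lines:
        words = clean(line)
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


sample = ["I love New York!", "I love pizza.", "New York is big"]
bigram_counts = count_ngrams(sample, 2)
print(bigram_counts.most_common(2))
```

Storing counts rather than raw n-grams keeps the table compact and makes it easy to rank candidate next words by frequency.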

As the user enters text, the algorithm:

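The algorithm uses a simple back-off strategy: it looks for a match on the longest available word sequence first and falls back to shorter sequences only when no match is found, so longer and more informative word sequences take priority.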
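The back-off idea can be sketched in Python as follows. This is an illustrative approximation under stated assumptions, not the application's actual implementation: the names `build_model` and `predict`, the maximum n-gram order of 3, and the toy training sentences are all hypothetical.

```python
from collections import Counter, defaultdict


def build_model(sentences, max_n=3):
    """Map each k-word context (k < max_n) to counts of the words that follow it."""
    model = {k: defaultdict(Counter) for k in range(1, max_n)}
    for sentence in sentences:
        words = sentence.lower().split()
        for k in range(1, max_n):
            for i in range(len(words) - k):
                context = tuple(words[i:i + k])
                model[k][context][words[i + k]] += 1
    return model


def predict(model, text):
    """Return the most frequent follower of the longest matching context, or None."""
    words = text.lower().split()
    # Try the longest context first, then back off to shorter ones.
    for k in sorted(model, reverse=True):
        if len(words) < k:
            continue
        followers = model[k].get(tuple(words[-k:]))
        if followers:
            return followers.most_common(1)[0][0]
    return None


# Toy training data, invented for this example.
model = build_model([
    "thanks for the help",
    "thanks for the help",
    "look at the sky",
    "the sky is blue",
    "the sky",
])
print(predict(model, "thanks for the"))  # two-word context "for the" matches
print(predict(model, "paint the"))       # backs off to one-word context "the"
```

In this toy data, the two-word context "for the" is followed only by "help", while the one-word context "the" is most often followed by "sky", so the two calls return different words even though both inputs end in "the".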

The predicted next word appears once the application detects that the user has completed typing one or more words.
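One simple way to decide that a word is complete is to treat trailing whitespace as the signal. The Python sketch below shows that rule; the rule itself and the function name `completed_words` are assumptions for illustration, since the presentation does not say how the application detects completed words.

```python
def completed_words(raw_input):
    """Return the words the user has finished typing, or [] if there are none."""
    words = raw_input.split()
    if not words:
        return []
    # Assumption: trailing whitespace means the last word is complete.
    if raw_input[-1].isspace():
        return words
    # Otherwise the last word is still being typed, so leave it out.
    return words[:-1]


print(completed_words("how are"))
print(completed_words("how are "))
```

A prediction would then be requested only when the returned list is non-empty.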
