Next Word Predict

Pranav C

1 January, 2026

Coursera Data Science Specialization

Capstone Project

Johns Hopkins University

Objective

This presentation introduces the Next Word Predict Shiny application, including the predictive text model behind it and the application's user interface.

The application can be accessed here:

Shiny Application

Next Word Predict is a Shiny application that predicts the next word in a sentence based on text entered by a user.

The application uses an n-gram language model to generate predictions. An n-gram is a contiguous sequence of n words from a body of text.
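To make the definition concrete, here is a minimal Python sketch that lists the n-grams in a short phrase. The function name `extract_ngrams` and the example phrase are illustrative assumptions; this is not the application's own code.

```python
def extract_ngrams(text, n):
    """Return every contiguous run of n words in `text` as a tuple."""
    words = text.lower().split()
    # Slide a window of width n across the word list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams = extract_ngrams("the quick brown fox", 2)
trigrams = extract_ngrams("the quick brown fox", 3)
```

A four-word phrase yields three 2-grams and two 3-grams; a phrase shorter than n words yields none.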

The predictive model was trained using a large text corpus.

N-grams were extracted from this corpus and used to construct the language model.

Various natural language processing and text mining techniques were explored to improve both prediction accuracy and application performance.

The Predictive Text Model

The predictive text model was built using a sample of approximately 800,000 lines from the original text corpus.
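The presentation does not say how the sample was drawn. One common approach, shown here purely as an assumption, is reservoir sampling, which picks a fixed number of lines uniformly at random in a single pass without loading the whole corpus into memory. The function name `sample_lines` and the fixed seed are hypothetical.

```python
import random


def sample_lines(lines, k, seed=0):
    """Reservoir-sample up to k items from an iterable in one pass."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Keep the new line with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return reservoir


# Toy stand-in for a corpus; a real run would iterate over file lines
# and use k=800_000.
corpus = [f"line {i}" for i in range(1000)]
sample = sample_lines(corpus, 10)
```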

Data Preparation Steps

After cleaning, the text was tokenized into n-grams.
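As a rough illustration of this step, the Python sketch below counts how often each n-gram occurs in a list of already-cleaned lines. The function name `count_ngrams` and the sample lines are hypothetical, and the cleaning itself is assumed to have happened beforehand.

```python
from collections import Counter


def count_ngrams(lines, n):
    """Count every n-gram (as a tuple of words) across the given lines."""
    counts = Counter()
    for line in lines:
        words = line.split()
        # N-grams never span two lines, so each line is windowed separately.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


cleaned = ["i love data science", "i love data"]  # made-up example lines
trigram_counts = count_ngrams(cleaned, 3)
```

Frequency tables like `trigram_counts` are what a frequency-based predictor consults when ranking candidate next words.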

Prediction Algorithm

As the user enters text, the algorithm:

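  1. Attempts to match the most recently entered words against the longest available n-gram (4-gram)
  2. Backs off to shorter n-grams (3-gram, then 2-gram) if no match is found
  3. Selects the most frequent matching n-gram
  4. Returns the last word of that n-gram as the predicted next word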

This is a simple back-off strategy: longer matches are tried first because they carry more context about what the user is writing.
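The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's own code: the names `build_model` and `predict_next`, the context-to-counts layout, and the training lines are all hypothetical.

```python
from collections import Counter, defaultdict

MAX_N = 4  # longest n-gram order, matching the 4-gram in the steps above


def build_model(lines, max_n=MAX_N):
    """Map each context of 1 to max_n - 1 words to counts of the next word."""
    model = defaultdict(Counter)
    for line in lines:
        words = line.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                model[context][words[i + n - 1]] += 1
    return model


def predict_next(model, text, max_n=MAX_N):
    """Try the longest context first, then back off to shorter ones."""
    words = text.lower().split()
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # not enough typed words for this n-gram order
        context = tuple(words[-(n - 1):])
        if context in model:
            # Most frequent continuation of the matched context.
            return model[context].most_common(1)[0][0]
    return None  # no n-gram of any order matched


# Made-up training lines for illustration only.
training = [
    "thanks for the follow",
    "thanks for the follow",
    "thanks for the help",
    "ready for the weekend",
]
model = build_model(training)
full_match = predict_next(model, "many thanks for the")  # 4-gram match
backed_off = predict_next(model, "look at the")          # falls back to 2-gram
no_match = predict_next(model, "xyzzy")
```

Storing counts keyed by context makes each back-off step a single dictionary lookup, which helps keep prediction fast while the user is typing.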

Application User Interface

The predicted next word appears once the application detects that the user has completed typing one or more words.
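The presentation does not describe how word completion is detected. One plausible approach, shown here only as an assumption, is to treat a word as complete once it is followed by whitespace. The function name `completed_words` is hypothetical.

```python
def completed_words(text):
    """Return the words the user has finished typing.

    Assumption: a word is finished once whitespace follows it.
    """
    words = text.split()
    if words and not text[-1].isspace():
        # The last word has no trailing whitespace, so it is still in progress.
        words = words[:-1]
    return words


finished = completed_words("thanks for ")
in_progress = completed_words("thanks fo")
```

Under this assumption a prediction would be requested only when `completed_words` returns a non-empty list.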

Interface Features

[Figure: Next Word Predict UI]