Next Word Generator

LaKeya King

2026-06-19

Overview

This project builds a predictive text model using natural language processing techniques.

Goal: Predict the next word given a user-entered phrase using real-world text data.

Data

The model is trained on three datasets:

  - Blogs
  - News
  - Twitter

A sample of the data was used to improve performance and reduce computation time.
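As a rough illustration of the sampling step, the sketch below keeps a random fraction of lines from a character vector. The helper name `sample_lines`, the 5% fraction, and the seed are assumptions for illustration; the source does not state the actual sample size or method.

```r
# Hypothetical helper: keep a random fraction of lines from a character vector.
sample_lines <- function(lines, fraction = 0.05, seed = 42) {
  stopifnot(length(lines) > 0, fraction > 0, fraction <= 1)
  set.seed(seed)  # fixed seed so the sample is reproducible
  n_keep <- max(1, floor(length(lines) * fraction))
  # sort() keeps the sampled lines in their original order
  lines[sort(sample.int(length(lines), n_keep))]
}

# Toy stand-in for one corpus (in practice this would be the lines of a text file)
toy_corpus <- paste("line", 1:200)
sampled <- sample_lines(toy_corpus, fraction = 0.05)
length(sampled)
```

A smaller fraction trades coverage of rare phrases for faster n-gram counting.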

Methodology

The model uses n-grams:

  - Bigrams (2-word sequences)
  - Trigrams (3-word sequences)

Each n-gram is counted and ranked by frequency.
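One way to count and rank n-grams in base R is sketched below. The helper `count_ngrams` and the column names `text` and `freq` are assumptions (only `text` appears in the project's own snippet), and the input is assumed to be already lowercased with punctuation removed.

```r
# Hypothetical helper: count n-grams of size n in a character vector of cleaned lines.
count_ngrams <- function(lines, n) {
  grams <- unlist(lapply(strsplit(lines, " +"), function(words) {
    words <- words[nzchar(words)]
    if (length(words) < n) return(character(0))  # line too short for this n
    starts <- seq_len(length(words) - n + 1)
    vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  counts <- sort(table(grams), decreasing = TRUE)  # most frequent first
  data.frame(text = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

toy_lines  <- c("i love new york", "i love new music", "new york is big")
bigram_df  <- count_ngrams(toy_lines, 2)
trigram_df <- count_ngrams(toy_lines, 3)
head(trigram_df, 3)
```

Sorting once at build time means a later lookup only needs to take the first matching row.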

Prediction Algorithm

  1. Take the last two words of the input
  2. Search the trigram dataset for entries starting with those words
  3. If found → return the most frequent next word
  4. If not → fall back to the bigram dataset using the last word
  5. If still no match → return the default (“the”)

Predictions are based on the most frequent matching phrase.
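The fallback steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual function: the toy tables, the `freq` column, and the helpers `top_next_word` and `predict_next_word` are hypothetical, and `startsWith()` is used here in place of regular-expression matching.

```r
# Toy n-gram tables with assumed columns: text (the n-gram) and freq (its count)
trigram_df <- data.frame(
  text = c("i love new", "love new york", "love new music"),
  freq = c(2L, 3L, 1L), stringsAsFactors = FALSE)
bigram_df <- data.frame(
  text = c("new york", "new music", "york is"),
  freq = c(4L, 1L, 2L), stringsAsFactors = FALSE)

# Hypothetical helper: last word of the most frequent n-gram beginning with `prefix`
top_next_word <- function(df, prefix) {
  hits <- df[startsWith(df$text, paste0(prefix, " ")), , drop = FALSE]
  if (nrow(hits) == 0) return(NA_character_)
  best  <- hits$text[which.max(hits$freq)]
  words <- strsplit(best, " ", fixed = TRUE)[[1]]
  words[length(words)]
}

# `words` is the cleaned input, already split into a character vector
predict_next_word <- function(words, default = "the") {
  n <- length(words)
  if (n >= 2) {  # steps 1-3: try the trigram table with the last two words
    guess <- top_next_word(trigram_df, paste(words[(n - 1):n], collapse = " "))
    if (!is.na(guess)) return(guess)
  }
  if (n >= 1) {  # step 4: fall back to the bigram table with the last word
    guess <- top_next_word(bigram_df, words[n])
    if (!is.na(guess)) return(guess)
  }
  default        # step 5: nothing matched
}

predict_next_word(c("i", "love", "new"))  # trigram match on "love new"
predict_next_word(c("brand", "new"))      # no trigram match, uses the bigram table
predict_next_word("hello")                # no match at all, returns the default
```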

Prediction Function + App

The prediction function:

  - Cleans input text (lowercase, remove punctuation)
  - Splits input into words
  - Uses pattern matching to find matching n-grams

# The trailing space in the pattern stops "of the" from also matching "of them ..."
matches <- trigram_df[grepl(paste0("^", last_two, " "), trigram_df$text), ]
predicted <- most_freq_word(matches)
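A value like `last_two` could be produced by a cleaning step along these lines. The helper `clean_input` is hypothetical, and the exact cleaning rules in the project may differ; for example, stripping all punctuation turns "don't" into "dont".

```r
# Hypothetical helper: lowercase, strip punctuation, and split into words.
clean_input <- function(phrase) {
  phrase <- tolower(phrase)
  phrase <- gsub("[[:punct:]]", "", phrase)     # drop punctuation
  phrase <- trimws(gsub("\\s+", " ", phrase))   # collapse repeated whitespace
  if (!nzchar(phrase)) return(character(0))     # nothing left after cleaning
  strsplit(phrase, " ", fixed = TRUE)[[1]]
}

words <- clean_input("  Hello, World!  How are YOU? ")
words

# Last two words, joined the same way the trigram text is stored
last_two <- paste(tail(words, 2), collapse = " ")
last_two
```

Because punctuation is removed before matching, the pattern built from `last_two` contains no regular-expression metacharacters.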

The Shiny app allows users to:

  1. Enter a phrase
  2. Click “Predict”
  3. View the next word instantly
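A minimal Shiny layout for this flow might look like the sketch below. The input and output IDs and the placeholder `predict_next_word` are assumptions for illustration; the real app would call the project's own prediction function.

```r
library(shiny)

# Placeholder predictor (assumption): stands in for the project's prediction function
predict_next_word <- function(phrase) "the"

ui <- fluidPage(
  titlePanel("Next Word Generator"),
  textInput("phrase", "Enter a phrase:"),
  actionButton("predict", "Predict"),
  textOutput("next_word")
)

server <- function(input, output, session) {
  # Recompute the prediction only when the Predict button is clicked
  prediction <- eventReactive(input$predict, {
    predict_next_word(input$phrase)
  })
  output$next_word <- renderText(prediction())
}

# Build the app object; runApp(app) would launch it in an interactive session
app <- shinyApp(ui, server)
```

Wrapping the call in `eventReactive()` ties the prediction to the button click rather than to every keystroke in the text box.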