NLP Next Word Generator App

2026-06-20

Overview

This project builds a next-word prediction model using natural language processing techniques.

Goals:

Use n-gram modeling (trigrams)
Predict the next word from the previous two words
Build and deploy the model as a Shiny web application

Data & Model

The model is trained on three datasets (blogs, news, and Twitter), which are processed into trigrams. A sample of the data was used to improve performance and reduce computation time.

Steps:

Convert text to lowercase
Remove punctuation
Tokenize into words
Build trigram frequency table

Example:

Input: “I love data”
Model learns (after cleaning): “i love” → “data”
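
The preprocessing steps above can be sketched in base R as follows. This is only an illustration under assumptions: the helper names clean_text() and build_trigrams() and the column names text and freq are hypothetical, and the three input lines are toy data rather than the actual corpus.

```r
# Minimal base-R sketch; helper and column names are assumptions.
clean_text <- function(x) {
  x <- tolower(x)                    # convert text to lowercase
  x <- gsub("[[:punct:]]", "", x)    # remove punctuation
  trimws(gsub("\\s+", " ", x))       # collapse repeated whitespace
}

build_trigrams <- function(lines) {
  grams <- unlist(lapply(clean_text(lines), function(line) {
    words <- strsplit(line, " ", fixed = TRUE)[[1]]   # tokenize into words
    words <- words[nzchar(words)]
    n <- length(words)
    if (n < 3) return(character(0))                   # too short for a trigram
    paste(words[1:(n - 2)], words[2:(n - 1)], words[3:n])
  }))
  if (length(grams) == 0) {
    return(data.frame(text = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  counts <- table(grams)                              # count each trigram
  df <- data.frame(text = names(counts), freq = as.integer(counts),
                   stringsAsFactors = FALSE)
  df[order(-df$freq, df$text), ]                      # most frequent first
}

trigram_df <- build_trigrams(c("I love data.", "I love data science!", "I love R"))
print(trigram_df)
```

In this toy input, “i love data” occurs twice, so it ends up at the top of the frequency table.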

Prediction Function + App

The prediction function:

Cleans input text (lowercase, remove punctuation)
Splits input into words
Uses pattern matching to find matching trigrams

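# last_two holds the last two cleaned input words, e.g. "i love"
# The trailing space stops "i love" from also matching "i lovely ..."
matches <- trigram_df[grepl(paste0("^", last_two, " "), trigram_df$text), ]
predicted <- most_freq_word(matches)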

The function then returns the most frequent next word among the matches.
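
A self-contained sketch of such a prediction function is shown below. It is not necessarily the app's actual code: the name predict_next(), the toy table, and its columns text and freq are assumptions made for illustration. The trailing space in the pattern keeps “i love” from also matching trigrams that begin with “i lovely”.

```r
# Toy trigram table standing in for the real one (columns are assumed).
trigram_df <- data.frame(
  text = c("i love data", "i love r", "i lovely day", "love data science"),
  freq = c(5L, 2L, 9L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the most frequent third word, or NA if none.
predict_next <- function(input, trigram_df) {
  # Clean the input: lowercase, strip punctuation, normalize whitespace
  cleaned <- gsub("[[:punct:]]", "", tolower(input))
  cleaned <- trimws(gsub("\\s+", " ", cleaned))
  words <- strsplit(cleaned, " ", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(NA_character_)        # need two words of context

  last_two <- paste(tail(words, 2), collapse = " ")
  # Punctuation was removed above, so last_two has no regex metacharacters
  matches <- trigram_df[grepl(paste0("^", last_two, " "), trigram_df$text), ]
  if (nrow(matches) == 0) return(NA_character_)        # unseen word pair

  best <- matches$text[which.max(matches$freq)]        # most frequent trigram
  sub(".* ", "", best)                                 # keep only its last word
}

predict_next("Well, I LOVE", trigram_df)
```

Returning NA for short input or an unseen word pair is one possible design choice; a backoff to bigrams would be another, but the original does not describe one.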

The Shiny app allows users to:

Enter a phrase
Click “Predict”
View the next word instantly
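
The interaction above might be wired up roughly as in the sketch below. The input and output IDs (phrase, predict, next_word), the panel title, and the validation message are all assumptions, and predict_next() is a placeholder that stands in for the trigram-based prediction function so the sketch runs on its own.

```r
library(shiny)

# Placeholder predictor (hypothetical); the real app would look up trigrams.
predict_next <- function(phrase) "data"

ui <- fluidPage(
  titlePanel("Next Word Generator"),
  textInput("phrase", "Enter a phrase"),
  actionButton("predict", "Predict"),
  textOutput("next_word")
)

server <- function(input, output, session) {
  # Recompute the prediction only when the Predict button is clicked
  prediction <- eventReactive(input$predict, {
    words <- strsplit(trimws(input$phrase), "\\s+")[[1]]
    # Show a message instead of a prediction for fewer than two words
    validate(need(length(words) >= 2, "Please enter at least two words."))
    predict_next(input$phrase)
  })
  output$next_word <- renderText(prediction())
}

app <- shinyApp(ui, server)
# To launch locally: runApp(app)
```

Using eventReactive() ties the prediction to the button click, so typing alone does not trigger a lookup.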

App Features & Conclusion

The application includes:

Simple and clean user interface
Real-time prediction
Input validation (requires at least two words)
Fast response using precomputed n-grams
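
One common way to get fast responses from precomputed n-grams is to build the frequency table once offline, save it, and load it a single time when the app starts. The sketch below assumes the table is stored as an RDS file, which the original does not state; the file name trigrams.rds and the toy table are hypothetical.

```r
# Offline step: build the table once and save it (toy data, temporary path).
trigram_df <- data.frame(
  text = c("i love data", "i love r"),
  freq = c(5L, 2L),
  stringsAsFactors = FALSE
)
rds_path <- file.path(tempdir(), "trigrams.rds")
saveRDS(trigram_df, rds_path)

# App startup: load the precomputed table once, outside the server function,
# so that each prediction is only a lookup and no text is reprocessed.
loaded <- readRDS(rds_path)
print(loaded)
```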

Key achievements:

Built a functional NLP prediction model
Demonstrated how n-grams predict language patterns
Delivered an interactive web application