Next Word Predictor

2026-05-27

Product Overview and Data

Built an interactive Shiny app that predicts the next word from a user-entered English phrase
Final model: Stupid Backoff
Sampled 30% of the blogs, news, and Twitter text to reduce memory use and processing time
Split data into train, validation, and test sets using an 80% / 10% / 10% split
Cleaned text and created unigram, bigram, trigram, and fourgram frequency tables
Converted raw text into context-target training data using a 3-word sliding window
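
The data preparation steps above can be sketched roughly as follows. This is a minimal Python illustration, not the app's own code: the function names, the fixed seed, and the regex tokenizer are all assumptions, and the "3-word sliding window" is interpreted here as three context words followed by one target word.

```python
import random
import re
from collections import Counter


def split_lines(lines, seed=42):
    """Shuffle lines and split them 80% / 10% / 10% (train, validation, test)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    n_train = int(0.8 * len(shuffled))
    n_val = int(0.1 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def tokenize(text):
    """Assumed cleaning rule: lowercase, keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(lines, max_n=4):
    """Return {n: Counter of n-gram tuples} for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def context_target_pairs(lines, context_size=3):
    """Slide a window over each line: context_size words in, next word out."""
    pairs = []
    for line in lines:
        tokens = tokenize(line)
        for i in range(context_size, len(tokens)):
            pairs.append((tuple(tokens[i - context_size:i]), tokens[i]))
    return pairs
```

For the line "the cat sat on the mat", the window yields three pairs, the first being the context ("the", "cat", "sat") with target "on".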

Compared Simple Backoff, Stupid Backoff, Laplace-smoothed Backoff, and Naive Bayes
Models were evaluated on the validation and test sets
Metrics included top-k accuracy, runtime, and model object size
Stupid Backoff achieved the highest top-k accuracy on the test set: 30.60%
Naive Bayes was fastest and smallest, but Stupid Backoff was selected for better prediction accuracy
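
For readers unfamiliar with the selected model, here is a minimal Python sketch of Stupid Backoff ranking. It is an illustration under assumptions rather than the app's implementation: `build_model` and `predict` are hypothetical names, and the backoff factor of 0.4 is the value commonly used in the literature, not one stated in this deck. Each candidate word is scored by its relative frequency after the longest matching context, and every step down to a shorter context multiplies the score by the backoff factor.

```python
from collections import Counter, defaultdict

ALPHA = 0.4  # assumed backoff factor; the deck does not state the value used


def build_model(sentences, max_n=4):
    """Map each context tuple (0 to max_n - 1 words) to a Counter of next words."""
    model = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for k in range(max_n):
                if i - k < 0:
                    break
                model[tuple(tokens[i - k:i])][word] += 1
    return model


def predict(model, phrase_tokens, k=5, max_n=4, alpha=ALPHA):
    """Return up to k next-word suggestions ranked by Stupid Backoff score."""
    scores = {}
    # Start from the longest usable context (max_n - 1 words; assumes max_n >= 2).
    context = tuple(phrase_tokens[-(max_n - 1):])
    penalty = 1.0
    while True:
        followers = model.get(context)
        if followers:
            # The follower total stands in for the context count in this sketch.
            total = sum(followers.values())
            for word, count in followers.items():
                # setdefault keeps the score from the highest-order match.
                scores.setdefault(word, penalty * count / total)
        if not context:
            break
        context = context[1:]  # back off to a shorter context
        penalty *= alpha
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]
```

The scores are relative frequencies scaled by the backoff factor, not normalized probabilities, which keeps the model cheap to build and query.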

Users type an English phrase into the text box
The app displays the top 5 predicted next words; the first suggestion is the highest-ranked prediction
Users can click a suggested word to add it to the input phrase and continue writing
Five Twitter/news-style test phrases were entered with the final word removed; the app returned predictions for all five
The app provides a fast and interactive next-word prediction experience using a compact language model
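
The held-out-word check and the click-to-append behavior described above can be sketched as follows. This is a hedged Python illustration only: `toy_predict` and its lookup table are made-up stand-ins for the real model, and the example phrases are invented, not the five phrases used in the actual test.

```python
def split_final_word(phrase):
    """Return (prefix, held-out final word) for a whitespace-separated phrase."""
    words = phrase.split()
    return " ".join(words[:-1]), words[-1]


def top_k_accuracy(phrases, predict, k=5):
    """Fraction of phrases whose held-out final word is among the top-k predictions."""
    hits = 0
    for phrase in phrases:
        prefix, target = split_final_word(phrase)
        if target.lower() in predict(prefix)[:k]:
            hits += 1
    return hits / len(phrases)


def append_suggestion(phrase, suggestion):
    """Mimic clicking a suggestion: add it to the phrase with a single space."""
    return f"{phrase.rstrip()} {suggestion}".strip()


# Hypothetical stand-in for the trained model, for demonstration only.
TOY_TABLE = {
    "thanks for the": ["follow", "support", "help"],
    "see you": ["soon", "later"],
}


def toy_predict(prefix):
    """Look up ranked suggestions for a prefix; unknown prefixes return nothing."""
    return TOY_TABLE.get(prefix.lower(), [])
```

With these two invented phrases, `top_k_accuracy(["Thanks for the follow", "see you tomorrow"], toy_predict)` counts one hit out of two, and `append_suggestion("see you ", "soon")` produces the extended phrase a user would see after clicking.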