Capstone Product Pitch: Next-Word Prediction

2026-02-13

1) What We Built

A production-style Shiny text prediction app (shiny/app.R).
Input: a short phrase (2+ words).
Output: predicted next word + top suggestions + confidence bar.
Two interchangeable model backends:
- N-gram backend: optimized trigram Stupid Backoff model.
- Class backend: lightweight class transition model.
Interactive controls:
- suggestion count
- backend selection
- common-word penalty tuning for n-gram predictions.

To launch the app from the project root: shiny::runApp("shiny")

Text is cleaned and tokenized consistently with training.
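As an illustration of what "consistent with training" means in practice, here is a minimal sketch of one shared cleaning function that both the training script and the app would call. The name `clean_tokens` and the specific rules (lowercasing, keeping only letters and apostrophes) are assumptions for illustration; the app's actual rules may differ.

```r
# Hypothetical helper: one cleaning function used for both training and prediction,
# so the context tokens always match the keys stored in the n-gram tables.
clean_tokens <- function(text) {
  text <- tolower(text)
  # Assumed rule: keep letters, apostrophes, and spaces; everything else becomes a space.
  text <- gsub("[^a-z' ]+", " ", text)
  tokens <- strsplit(trimws(text), "\\s+")[[1]]
  tokens[nzchar(tokens)]
}

clean_tokens("Thanks for the... GREAT day!")
```

Routing both code paths through a single function is the simplest way to keep user input and training data from being tokenized differently.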
Next-word candidates come from:
1. trigram context (strongest signal),
2. bigram backoff,
3. unigram fallback.
Each backoff step multiplies a candidate's score by a discount factor, so lower-order matches rank below higher-order ones.
To reduce generic outputs, we apply a common-word reranking penalty:
- stronger on backoff candidates,
- capped to avoid over-boosting rare words.
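The lookup order and reranking above can be sketched as follows. This is a toy version under stated assumptions, not the app's code: the count tables are made-up data, `alpha = 0.4` is the discount commonly used with Stupid Backoff, and the penalty formula (unigram probability scaled by `lambda` and backoff level, capped at `cap`) is a hypothetical stand-in for the app's common-word penalty.

```r
# Toy counts keyed by space-joined n-grams (illustrative data only).
tri <- c("thanks for the" = 6, "thanks for your" = 3)
bi  <- c("thanks for" = 9, "for the" = 20, "for your" = 5, "for a" = 10)
uni <- c(thanks = 9, "for" = 40, the = 100, your = 12, a = 60, help = 4)

# Relative frequency of each word that follows `context` in `counts`.
continuations <- function(counts, context, denom) {
  prefix <- paste0(context, " ")
  hits <- counts[startsWith(names(counts), prefix)]
  if (length(hits) == 0 || is.na(denom) || denom == 0) return(numeric(0))
  names(hits) <- substring(names(hits), nchar(prefix) + 1)
  hits / denom
}

predict_next <- function(tokens, n = 3, alpha = 0.4, lambda = 0.5, cap = 0.3) {
  stopifnot(length(tokens) >= 2)
  w2   <- tail(tokens, 1)
  ctx2 <- paste(tail(tokens, 2), collapse = " ")
  p_uni <- uni / sum(uni)

  # Stupid Backoff: discount by alpha once per backoff step.
  s3 <- continuations(tri, ctx2, unname(bi[ctx2]))
  s2 <- alpha * continuations(bi, w2, unname(uni[w2]))
  s1 <- alpha^2 * p_uni

  # A word keeps the score from the highest-order table that contains it.
  s2 <- s2[!names(s2) %in% names(s3)]
  s1 <- s1[!names(s1) %in% c(names(s3), names(s2))]
  scores <- c(s3, s2, s1)
  level  <- rep(c(0, 1, 2), c(length(s3), length(s2), length(s1)))

  # Assumed penalty: grows with word frequency and backoff level, capped at `cap`.
  p <- p_uni[names(scores)]
  p[is.na(p)] <- 0
  penalty <- pmin(cap, lambda * (1 + level) * p)
  head(sort(scores * (1 - penalty), decreasing = TRUE), n)
}

top <- predict_next(c("thanks", "for"))
print(top)
```

In this sketch the cap bounds how far a very common word can be demoted, which is one way to keep rare words from being promoted too aggressively by comparison.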

Step	Purpose
Input phrase	User context
Clean + tokenize	Standardized context tokens
Trigram lookup	High-context predictions
Bigram backoff	Fallback if trigram sparse
Unigram fallback	Final coverage safety net
Rerank + return top N	Reduce common-word bias

Example benchmark tradeoff chart (from evaluate_model.R output).

Fast and practical: limiting the model to trigram order keeps lookups fast and the memory footprint small.
Robust UX: confidence display, validation, and guided tuning controls.
Flexible architecture: plug-and-play n-gram and class backends.
Data-driven optimization: evaluation pipeline quantifies speed/accuracy tradeoffs.
Deployment-ready: self-contained Shiny app with reproducible training scripts.

App entry point: shiny/app.R
Training entry point: R/train_model.R