Next-Word Prediction Explorer
Dominique Mühlbauer
2025-07-04
1. Next-Word Prediction: The Opportunity
- Problem: In modern text-based interfaces (chatbots, email clients, code editors), users expect smart, real-time suggestions.
- Solution: Our interpolated Kneser-Ney 4-gram model delivers fast, context-aware next-word predictions.
- Why now: Advances in NLP and easy in-browser data-product integration make this the right time to ship.
2. How It Works
- Preprocessing & N-grams
  - Corpus tokenized, lemmatized, stop-words removed
  - Build 1- to 4-gram count tables, cached on disk
- Interpolated Kneser-Ney Smoothing
  - Applies absolute discounting (D = 0.75) to observed n-gram counts
  - Interpolates across 4→1-gram levels, with λ weights set by the probability mass the discount frees at each level
- On-Demand, Indexed Storage
  - Model persisted as Parquet with dictionary encoding
  - Arrow predicate pushdown reads only the needed contexts
  - Memoisation caches repeated lookups
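The smoothing and caching steps above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the app's own language, reduces the model to bigrams instead of the full 4→1 chain, and keeps counts in memory instead of in Parquet. The names `train_bigram_kn`, `kn_prob`, and `make_predictor` are hypothetical and not taken from the app; `lru_cache` stands in for the memoisation layer.

```python
from collections import Counter, defaultdict
from functools import lru_cache

D = 0.75  # absolute discount quoted in the slide


def train_bigram_kn(tokens):
    """Collect the counts an interpolated Kneser-Ney bigram model needs."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_total = Counter()     # c(v): how often v is followed by any word
    followers = defaultdict(set)  # distinct words observed after v
    preceders = defaultdict(set)  # distinct words observed before w
    for (v, w), n in bigrams.items():
        context_total[v] += n
        followers[v].add(w)
        preceders[w].add(v)
    return bigrams, context_total, dict(followers), dict(preceders)


def kn_prob(word, context, model, discount=D):
    """P(word | context) under interpolated Kneser-Ney, bigram case."""
    bigrams, context_total, followers, preceders = model
    # Lower-order term: share of distinct bigram types that end in `word`.
    p_cont = len(preceders.get(word, ())) / len(bigrams)
    total = context_total.get(context, 0)
    if total == 0:
        return p_cont  # unseen context: rely on the lower order alone
    # Discounted higher-order term.
    discounted = max(bigrams.get((context, word), 0) - discount, 0.0) / total
    # Interpolation weight: exactly the mass removed by discounting.
    lam = discount * len(followers[context]) / total
    return discounted + lam * p_cont


def make_predictor(model, k=3):
    """Return a memoised top-k lookup (hypothetical helper)."""
    vocab = sorted(model[3])  # every word that can follow something

    @lru_cache(maxsize=1024)
    def predict(context):
        scored = [(w, kn_prob(w, context, model)) for w in vocab]
        scored.sort(key=lambda pair: -pair[1])
        return tuple(scored[:k])

    return predict


tokens = "the cat sat on the mat the cat ate the fish".split()
model = train_bigram_kn(tokens)
predict = make_predictor(model)
top = predict("the")
predict("the")  # second call is served from the cache
```

On this toy corpus the top suggestion after "the" is "cat", and the probabilities over all candidate words sum to 1, which is the property the λ weight is chosen to preserve.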
4. Live Demo of the Shiny App
- Enter text in the sidebar; the last token is highlighted in real time.
- Adjust “Max n-gram order” to see trade-offs between context depth and speed.
- View top-k suggestions in the table and bar chart.
- Toggle to “Word Cloud” for a visual glimpse of candidate probabilities.