Next-Word Predictor

2026-04-01

How It Works

Input: The user types any text phrase.

The model searches 3 tables in order:

Trigram table — looks up the last 2 words as a prefix
Bigram table — backs off to the last 1 word (if no trigram match)
Unigram fallback — returns the most common words overall

This is called Stupid Backoff (Brants et al., 2007):

\[S(w \mid \text{context}) = \begin{cases} \frac{f(\text{context},\,w)}{f(\text{context})} & \text{if } f(\text{context},\,w) > 0 \\ 0.4 \times S(w \mid \text{shorter context}) & \text{otherwise} \end{cases}\]
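
To make the scoring rule concrete, here is a minimal Python sketch of the recursion. It is not the app's code (the app is built in R), and the function name, the toy counts, and the unigram base case S(w) = f(w)/N (taken from Brants et al.) are assumptions added for illustration.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor from the formula


def stupid_backoff_score(word, context, counts, total_tokens):
    """Score `word` given `context`, a tuple of preceding words.

    `counts` maps n-gram tuples of any length to their frequency.
    `total_tokens` is the corpus size, used as the unigram denominator.
    """
    if not context:
        # Base case (assumed): relative unigram frequency.
        return counts[(word,)] / total_tokens
    ngram = context + (word,)
    if counts[ngram] > 0:
        # Seen n-gram: plain relative frequency, no discounting.
        return counts[ngram] / counts[context]
    # Unseen n-gram: drop the oldest context word and discount by ALPHA.
    return ALPHA * stupid_backoff_score(word, context[1:], counts, total_tokens)


# Toy counts, invented purely for illustration.
counts = Counter({
    ("go",): 4, ("to",): 6, ("back",): 2,
    ("to", "go"): 3, ("go", "back"): 2,
    ("to", "go", "to"): 1,
})
total_tokens = 12

seen = stupid_backoff_score("to", ("to", "go"), counts, total_tokens)      # 1/3
backed = stupid_backoff_score("back", ("to", "go"), counts, total_tokens)  # 0.4 * 2/4
```

The scores are relative frequencies rather than normalized probabilities, which is why the method only needs raw counts and no smoothing pass.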

Output: Top 3 predicted next words, shown as clickable buttons.
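
The three-table search can be sketched in Python as follows. This is an illustration under stated assumptions, not the deployed R code: the name `predict_next`, the underscore-joined keys, the whitespace tokenization, and the toy tables are all hypothetical.

```python
def predict_next(text, trigram_table, bigram_table, top_unigrams, k=3):
    """Return up to k next-word suggestions, backing off table by table."""
    words = text.lower().split()  # assumed tokenization: lowercase + whitespace

    # 1. Trigram table: keyed by the last two words.
    if len(words) >= 2:
        hit = trigram_table.get("_".join(words[-2:]))
        if hit:
            return hit[:k]

    # 2. Bigram table: keyed by the last word.
    if words:
        hit = bigram_table.get(words[-1])
        if hit:
            return hit[:k]

    # 3. Unigram fallback: most common words overall.
    return top_unigrams[:k]


# Toy tables, invented for illustration only.
trigram_table = {"i_want": ["to", "a", "you"]}
bigram_table = {"go": ["to", "the", "back"]}
top_unigrams = ["the", "to", "and"]

suggestions = predict_next("I want to go", trigram_table, bigram_table, top_unigrams)
```

Because the last step never depends on the input, the function returns suggestions even for empty or unseen text. A fuller version might top up a short trigram result with bigram candidates; this sketch returns the first table that matches.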

Model Performance

Accuracy on held-out test set:

Metric	Score
Top-1 Accuracy	~15%
Top-2 Accuracy	~20%
Top-3 Accuracy	~25%
Avg prediction time	< 5 ms
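
The evaluation procedure is not described here, so the following Python sketch only shows how top-k accuracy figures of this kind are commonly computed. The function name, the example pairs, and the constant predictor are hypothetical.

```python
def top_k_accuracy(examples, predict, k):
    """Fraction of examples whose true next word is among the top k suggestions.

    `examples` is a list of (context_text, true_next_word) pairs.
    `predict` maps a context string to a ranked list of words.
    """
    if not examples:
        return 0.0
    hits = sum(1 for context, target in examples if target in predict(context)[:k])
    return hits / len(examples)


def constant_predictor(text):
    # Stand-in predictor that ignores its input (illustration only).
    return ["the", "to", "and"]


# Invented held-out pairs, not taken from the real test set.
examples = [
    ("i want", "to"),
    ("go to", "the"),
    ("see you", "soon"),
    ("thanks for", "the"),
]

top1 = top_k_accuracy(examples, constant_predictor, 1)
top3 = top_k_accuracy(examples, constant_predictor, 3)
```

Top-k accuracy can only stay level or rise as k grows, which matches the pattern in the table.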

The App — How It Works

Try it: https://rdelgrande.shinyapps.io/shiny_app/

User types:   "I want to go"
                      ↓
  Trigram lookup:  "to_go"  → no match
  Bigram  lookup:  "go"     → [ "to", "the", "back" ] ✓
                      ↓
  App shows 3 buttons:  [ to ]  [ the ]  [ back ]
                      ↓
  User clicks "the"  →  input becomes "I want to go the"
                      ↓
  Predictions update instantly for new context

Updates on every keystroke
Clicking a word appends it to your text
Never fails — always returns a prediction via backoff
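
The click-to-append behavior can be pictured with a small Python helper. The name `click_word` and the whitespace handling are assumptions; the real app does this through Shiny's reactive inputs.

```python
def click_word(text, word):
    """Append a clicked suggestion, keeping exactly one space before it."""
    return (text.rstrip() + " " + word).lstrip()


updated = click_word("I want to go", "the")
```

The updated string then becomes the new context for the next round of predictions.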

Why This App?

The problem: Typing on mobile is slow and error-prone.

The solution: Real-time next-word prediction — just like your phone keyboard.

Key advantages:

⚡ Fast — predictions in < 5 ms, imperceptible to users
🪶 Lightweight — entire model fits in ~5 MB of RAM
🔄 Robust — Stupid Backoff ensures a prediction is always returned
📱 Deployable — runs on shinyapps.io, no special hardware needed

Built with:

quanteda — tokenization and n-gram construction
data.table — O(log n) prefix lookups via keyed tables (binary search)
shiny — reactive web interface
SwiftKey corpus — 800,000+ lines from blogs, news, and Twitter
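
For readers who want to see how lookup tables like these can be derived from raw text, here is a plain-Python stand-in for the quanteda and data.table steps. The tokenizer regex, the function names, and the three sample lines are assumptions for illustration, not the project's pipeline.

```python
import re
from collections import Counter, defaultdict


def tokenize(line):
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", line.lower())


def build_tables(lines, k=3):
    """Build prefix -> top-k next-word tables from an iterable of text lines."""
    unigrams = Counter()
    # followers[n][prefix] counts the words seen after an n-word prefix.
    followers = {1: defaultdict(Counter), 2: defaultdict(Counter)}

    for line in lines:
        tokens = tokenize(line)
        unigrams.update(tokens)
        for n in (1, 2):
            # N-grams are formed within a line, never across lines.
            for i in range(len(tokens) - n):
                prefix = "_".join(tokens[i:i + n])
                followers[n][prefix][tokens[i + n]] += 1

    def top_k(counter):
        return [word for word, _ in counter.most_common(k)]

    bigram_table = {p: top_k(c) for p, c in followers[1].items()}
    trigram_table = {p: top_k(c) for p, c in followers[2].items()}
    return trigram_table, bigram_table, top_k(unigrams)


# Three invented lines standing in for the corpus.
sample_lines = ["I want to go back", "I want to go home", "We want to go back"]
trigram_table, bigram_table, top_unigrams = build_tables(sample_lines)
```

Storing only the top k continuations per prefix, rather than every count, is one way a model of this kind can stay small in memory.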