2026-04-01

How It Works

Input: The user types any text phrase.

The model searches 3 tables in order:

  1. Trigram table — looks up the last 2 words as a prefix
  2. Bigram table — backs off to the last 1 word (if no trigram match)
  3. Unigram fallback — returns the most common words overall

This is called Stupid Backoff (Brants et al., 2007):

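\[S(w \mid \text{context}) = \begin{cases} \frac{f(\text{context},\,w)}{f(\text{context})} & \text{if } f(\text{context},\,w) > 0 \\ 0.4 \times S(w \mid \text{shorter context}) & \text{otherwise} \end{cases}\]

Here \(f(\cdot)\) is a corpus count, and the shorter context drops the earliest word. The recursion ends at the unigram level, where \(S(w) = f(w)/N\) and \(N\) is the total number of tokens. \(S\) is a relative score for ranking candidates, not a normalized probability.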
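The recursion can be written directly as a scoring function. This is a minimal sketch, not the app's R code: it assumes a single hypothetical dictionary `counts` keyed by word tuples of length 1 to 3, a total token count `n_tokens` for the unigram base case, and the fixed back-off factor 0.4 used by Brants et al.

```python
def stupid_backoff(word, context, counts, n_tokens, alpha=0.4):
    """Stupid Backoff score S(word | context).

    counts: hypothetical dict mapping word tuples (length 1..3) to corpus counts.
    n_tokens: total number of tokens N, used for the unigram base case f(w) / N.
    """
    context = tuple(context)
    if not context:
        # Base case: relative unigram frequency f(w) / N.
        return counts.get((word,), 0) / n_tokens
    joint = counts.get(context + (word,), 0)
    if joint > 0:
        # Seen: relative frequency f(context, w) / f(context).
        return joint / counts[context]
    # Unseen: drop the earliest context word and discount by alpha.
    return alpha * stupid_backoff(word, context[1:], counts, n_tokens, alpha)
```

Because the scores are not normalized, they are only compared with each other to rank candidate words; no smoothing or normalization step is needed.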

Output: Top 3 predicted next words, shown as clickable buttons.
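The three-table lookup order can be sketched in Python as follows. This is an illustration under assumptions, not the app's implementation: the tables are taken to be plain dictionaries mapping an underscore-joined prefix to a `Counter` of next words, the input is lowercased and split on whitespace, and the names `predict_next`, `trigrams`, `bigrams`, and `unigrams` are hypothetical.

```python
from collections import Counter

def predict_next(text, trigrams, bigrams, unigrams, k=3):
    """Return up to k next-word predictions from the first table that matches.

    Hypothetical layout: trigrams maps "w1_w2" -> Counter of next words,
    bigrams maps "w2" -> Counter of next words, unigrams is a Counter of all words.
    """
    words = text.lower().split()
    # 1. Trigram table: the last 2 words form the prefix.
    if len(words) >= 2:
        key = "_".join(words[-2:])
        if key in trigrams:
            return [w for w, _ in trigrams[key].most_common(k)]
    # 2. Bigram table: back off to the last word only.
    if words and words[-1] in bigrams:
        return [w for w, _ in bigrams[words[-1]].most_common(k)]
    # 3. Unigram fallback: the most common words overall, so a result always exists.
    return [w for w, _ in unigrams.most_common(k)]
```

For example, with `trigrams = {"want_to": Counter({"go": 5, "be": 3, "see": 2})}`, the input `"I want to"` yields `["go", "be", "see"]`. Because the unigram table is the last resort, empty or unseen input still returns predictions.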

Model Performance

Results on a held-out test set:

Metric                  Score
Top-1 accuracy          ~15%
Top-2 accuracy          ~20%
Top-3 accuracy          ~25%
Avg. prediction time    < 5 ms
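Top-k accuracy counts a test case as correct when the true next word appears anywhere among the first k predictions, so the score can only stay the same or rise as k grows. The sketch below shows how both metrics could be computed; it assumes a hypothetical list of (context, actual next word) pairs and any `predict` function returning a ranked list, and it says nothing about how the reported numbers were actually produced.

```python
import time

def top_k_accuracy(test_cases, predict, k):
    """Share of (context, actual_next_word) pairs whose actual word is among the first k predictions."""
    if not test_cases:
        raise ValueError("test_cases must not be empty")
    hits = sum(1 for context, actual in test_cases if actual in predict(context)[:k])
    return hits / len(test_cases)

def avg_prediction_ms(contexts, predict):
    """Mean wall-clock time of one predict() call, in milliseconds."""
    if not contexts:
        raise ValueError("contexts must not be empty")
    start = time.perf_counter()
    for context in contexts:
        predict(context)
    return (time.perf_counter() - start) * 1000 / len(contexts)
```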


The App — How It Works

Try it: https://rdelgrande.shinyapps.io/shiny_app/

User types:   "I want to go"
                      ↓
  Trigram lookup:  "to_go"  → no match
  Bigram  lookup:  "go"     → [ "to", "the", "back" ] ✓
                      ↓
  App shows 3 buttons:  [ to ]  [ the ]  [ back ]
                      ↓
  User clicks "to"  →  input becomes "I want to go to"
                      ↓
  Predictions update for the new context
  • Updates on every keystroke
  • Clicking a word appends it to your text
  • Never fails — always returns a prediction via backoff
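Following the table descriptions (last 2 words for the trigram table, last 1 word for the bigram table), with words joined by underscores as in the walkthrough keys, the key construction and the click-to-append step might look like this. It is a sketch assuming simple lowercase tokenization that keeps only letters and apostrophes; the app's own cleaning with quanteda may differ, and `lookup_keys` and `append_word` are hypothetical names.

```python
import re

def lookup_keys(text):
    """Return (trigram_key, bigram_key) built from the last 2 and last 1 words of text.

    Assumed normalization: lowercase, keep only letters and apostrophes.
    A key is None when the input is too short to form it.
    """
    words = re.findall(r"[a-z']+", text.lower())
    trigram_key = "_".join(words[-2:]) if len(words) >= 2 else None
    bigram_key = words[-1] if words else None
    return trigram_key, bigram_key

def append_word(text, word):
    """Append a clicked prediction to the input, separated by a single space."""
    return f"{text.rstrip()} {word}" if text.strip() else word
```

Recomputing the keys from the full input after every change keeps the logic stateless, which suits a reactive interface that simply re-runs the prediction whenever the text box changes.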

Why This App?

The problem: Typing on mobile is slow and error-prone.

The solution: Real-time next-word prediction — just like your phone keyboard.

Key advantages:

  • Fast — predictions in < 5 ms, imperceptible to users
  • 🪶 Lightweight — entire model fits in ~5 MB of RAM
  • 🔄 Robust — Stupid Backoff ensures a prediction is always returned
  • 📱 Deployable — runs on shinyapps.io, no special hardware needed

Built with:

  • quanteda — tokenization and n-gram construction
  • data.table: fast prefix lookups via binary search on keyed (sorted) tables, O(log n) per lookup
  • shiny — reactive web interface
  • SwiftKey corpus — 800,000+ lines from blogs, news, and Twitter
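In the app, quanteda builds the n-grams and data.table sorts them by key so each prefix lookup is a binary search. A rough Python analogue of that idea, assuming whitespace-tokenized input and a hypothetical sorted list of `(prefix, -count, next_word)` rows; `build_table` and `lookup` are illustrative names, not functions from either package.

```python
from bisect import bisect_left
from collections import Counter

def build_table(tokens, n):
    """Count n-grams and return rows (prefix, -count, next_word) in sorted order.

    The prefix is the first n-1 words joined by "_" ("" for unigrams). Storing
    -count makes plain tuple sorting order rows by prefix, then by descending
    frequency, then alphabetically.
    """
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return sorted(("_".join(g[:-1]), -c, g[-1]) for g, c in grams.items())

def lookup(rows, prefix, k=3):
    """Return up to k most frequent next words for prefix.

    (prefix,) sorts before every row whose first field equals prefix, so
    bisect_left finds the first matching row in O(log n) comparisons.
    """
    i = bisect_left(rows, (prefix,))
    out = []
    while i < len(rows) and rows[i][0] == prefix and len(out) < k:
        out.append(rows[i][2])
        i += 1
    return out
```

For example, after `tri = build_table("i want to go to the park i want to go back home i want to see".split(), 3)`, the call `lookup(tri, "want_to")` returns `["go", "see"]`.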