NextWord Prediction: Real-Time Next-Word Prediction

Om Srivastava
June 2026

Slide 1, The Problem Worth Solving

Every mobile keyboard predicts your next word as you type. How?

The common assumption is that this requires deep learning and heavy infrastructure. It doesn't. A well-engineered statistical n-gram model, trained once and served from memory, delivers a comparably fluid typing experience at a fraction of the cost.

This project proves it.

  • Trained on the SwiftKey English corpus (blogs, news, tweets)
  • Built in Python, deployed live on Shiny Apps (PyShiny)
  • Predicts the next word as you type, with no button and no waiting

Model at a glance:

  • 5.0M corpus words processed | 59,783-word vocabulary | up to 4-gram context
  • Entire model: under 16 MB, sub-millisecond predictions

Slide 2, The Algorithm: Stupid Backoff on N-grams

Model: Pre-computed frequency tables for 1-grams through 4-grams, stored as Python dictionaries that map each context to a candidate list pre-sorted by count, and queried at runtime using the Stupid Backoff algorithm (Brants et al., 2007).
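A minimal sketch of how tables like these could be built, assuming whitespace-tokenized text. The function name `build_table` and the toy sentence are illustrative, not taken from the project's code.

```python
from collections import Counter, defaultdict

def build_table(tokens, n):
    """Map each (n-1)-word context to its followers, sorted by descending count."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    table = defaultdict(list)
    for gram, c in counts.items():
        table[gram[:-1]].append((gram[-1], c))
    # Sort once at build time so a lookup returns ranked candidates directly.
    # Ties are broken alphabetically to keep the order deterministic.
    return {ctx: sorted(followers, key=lambda wc: (-wc[1], wc[0]))
            for ctx, followers in table.items()}

# Toy corpus (hypothetical); the real model is trained on the SwiftKey sample.
tokens = ("i want to go to the store and i want to go to the gym "
          "and i want to go to the store").split()

# One table per order; the 1-gram table has the empty tuple as its only context.
tables = {n: build_table(tokens, n) for n in range(1, 5)}

print(tables[4][("go", "to", "the")])  # [('store', 2), ('gym', 1)]
```

Sorting at build time is what lets the runtime path stay a single dictionary lookup followed by a slice.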

How it works:

Input phrase: "I want to go to the ___"

Extract last 3 words → "go to the"
Look up matching 4-grams → scored candidates: [store ✓, gym, park]
If no match → back off to last 2 words: "to the"
If no match → back off to last word: "the"
Ultimate fallback → highest-frequency unigrams
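The cascade above can be sketched as a loop from the longest context to the shortest. This is an illustration under stated assumptions: the `TABLES` contents and the `predict` function are hypothetical stand-ins for the project's pre-sorted dictionaries.

```python
# Hypothetical pre-sorted tables: order -> {context tuple: [(word, count), ...]}
TABLES = {
    4: {("go", "to", "the"): [("store", 5), ("gym", 3), ("park", 2)]},
    3: {("to", "the"): [("store", 9), ("point", 4)]},
    2: {("the",): [("same", 20), ("first", 15)]},
    1: {(): [("the", 100), ("to", 80), ("and", 70)]},
}

def predict(phrase, tables=TABLES, max_order=4, k=3):
    """Return (top-k words, n-gram order used), backing off until a context matches."""
    words = phrase.lower().split()
    for n in range(max_order, 1, -1):
        if len(words) < n - 1:
            continue  # not enough typed words for this order
        context = tuple(words[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return [w for w, _ in followers[:k]], f"{n}-gram"
    # Ultimate fallback: highest-frequency unigrams.
    return [w for w, _ in tables[1][()][:k]], "1-gram"

print(predict("I want to go to the"))  # (['store', 'gym', 'park'], '4-gram')
print(predict("look at the"))          # (['same', 'first'], '2-gram')
print(predict("xyzzy"))                # (['the', 'to', 'and'], '1-gram')
```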

Backoff scoring:

  • If the n-gram exists: S(w) = count(n-gram) / count(prefix)
  • Otherwise back off: S(w) = 0.4 × S(w | shorter context)

Each backoff step multiplies the score by λ = 0.4, so a candidate matched in a longer, more specific context is penalized less than one found only after backing off.
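The scoring rule can be written as a short recursion. The sketch below assumes a single dictionary of raw n-gram counts and a known total word count; `COUNTS`, `TOTAL_WORDS`, and `score` are illustrative names with made-up numbers. The unigram base case (count divided by corpus size) follows Brants et al. (2007).

```python
LAMBDA = 0.4  # backoff factor from the slide

# Hypothetical raw counts keyed by n-gram tuple.
COUNTS = {
    ("to", "the", "store"): 3,
    ("to", "the"): 10,
    ("the", "gym"): 4,
    ("the",): 50,
    ("store",): 12,
    ("gym",): 8,
    ("park",): 20,
}
TOTAL_WORDS = 1000  # hypothetical corpus size

def score(word, context, counts=COUNTS, total=TOTAL_WORDS, lam=LAMBDA):
    """Stupid Backoff score S(word | context); scores are not probabilities."""
    context = tuple(context)
    gram = context + (word,)
    if not context:
        return counts.get(gram, 0) / total  # unigram relative frequency
    count = counts.get(gram, 0)
    if count > 0:
        return count / counts[context]      # count(n-gram) / count(prefix)
    # Unseen n-gram: drop the oldest context word and discount by lambda.
    return lam * score(word, context[1:], counts, total, lam)

print(score("store", ("to", "the")))  # seen trigram: 3 / 10
print(score("gym", ("to", "the")))    # one backoff step: 0.4 * (4 / 50)
print(score("park", ("to", "the")))   # two backoff steps: 0.4 * 0.4 * (20 / 1000)
```

Because the scores are never renormalized, they rank candidates but do not sum to 1, which is the price paid for skipping smoothing at query time.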

Why not Kneser-Ney? It usually scores somewhat better, but Brants et al. (2007) report that Stupid Backoff approaches Kneser-Ney quality as the training corpus grows, and it needs no discounting or normalization at query time. That matters when predicting on every single keystroke.

Slide 3, The App: NextWord Prediction

Live at: https://om05.shinyapps.io/nextword-predictive-text/

It works like a smart keyboard, not a form:

  1. Start typing any English phrase into the text box
  2. As you type, the top 3 next-word suggestions appear instantly, each with a confidence percentage and the n-gram order used (e.g. “4-gram”)
  3. Press 1, 2, or 3 (or click the chip) to append that word and keep typing

No submit button and no delay: predictions update live on every keystroke over a persistent WebSocket connection, much like the autocomplete bar on a phone.

Example: typing “I want to” returns:

Rank  Word  Confidence  Source
1     be    45%         4-gram
2     go    28%         4-gram
3     see   27%         4-gram
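The percentages in this example sum to 100, which is consistent with renormalizing the top 3 scores among themselves. That is an assumption, not something the slides state. A sketch under that assumption, with a hypothetical `confidences` helper and made-up counts:

```python
def confidences(scored, k=3):
    """Turn (word, score) pairs into top-k (word, percent) pairs.

    Assumption: percentages are the top-k scores renormalized to sum to ~100.
    """
    top = sorted(scored, key=lambda ws: ws[1], reverse=True)[:k]
    total = sum(s for _, s in top)
    if total == 0:
        return [(w, 0) for w, _ in top]
    return [(w, round(100 * s / total)) for w, s in top]

# Hypothetical counts for words following "i want to".
candidates = [("have", 20), ("be", 90), ("see", 54), ("go", 56)]
print(confidences(candidates))  # [('be', 45), ('go', 28), ('see', 27)]
```

With independent rounding the displayed values can occasionally sum to 99 or 101; a real app would need to decide whether that matters.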

Built with: Python | Shiny for Python (PyShiny) | ASGI / Uvicorn | WebSockets

Slide 4, Performance & Design Decisions

The engineering tradeoffs that made this fast and shippable:

Decision          Choice                       Reason
Corpus sample     5% random sample             Preserves common n-grams, fits memory budget
N-gram max order  4-gram                       Strong context with diminishing returns beyond
Pruning           Drop singletons (count < 2)  Cut model from ~70 MB → 15.7 MB; reduces noise
Storage           Pre-sorted Python dicts      O(1) lookups, instant ranked retrieval
Serving           PyShiny over WebSockets      Real-time keystroke sync, no page reloads
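The sampling and pruning rows can be illustrated with two small helpers. The slides do not say whether the 5% sample is taken per line or per word, so line-level sampling is an assumption here, and `sample_lines` and `prune_singletons` are hypothetical names.

```python
import random
from collections import Counter

def sample_lines(lines, fraction=0.05, seed=42):
    """Keep each line independently with probability `fraction` (seeded for repeatability)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]

def prune_singletons(counts, min_count=2):
    """Drop n-grams seen fewer than `min_count` times."""
    return Counter({gram: c for gram, c in counts.items() if c >= min_count})

# Toy counts (hypothetical): the singleton is removed, the repeated n-grams stay.
raw = Counter({("to", "the", "store"): 3, ("to", "the", "zoo"): 1, ("to", "the", "gym"): 2})
pruned = prune_singletons(raw)
print(sorted(pruned.items()))
# [(('to', 'the', 'gym'), 2), (('to', 'the', 'store'), 3)]
```

Most distinct n-grams in natural text occur only once, which is why dropping them shrinks the tables so sharply while removing few useful predictions.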

Model footprint (pickled n-gram tables):

File           Size
unigrams.pkl   852 KB
bigrams.pkl    4.9 MB
trigrams.pkl   6.2 MB
quadgrams.pkl  3.8 MB
Total          ~15.7 MB

Measured performance:

  • Local prediction latency: < 1 ms (O(1) dictionary lookups)
  • Deployed round-trip (incl. WebSocket): 30–50 ms
  • Cold-start model load into memory: < 0.5 s
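Numbers like these can be reproduced with the standard library alone. The sketch below times a pickle load and a batch of dictionary lookups on a tiny hypothetical table. It shows the measurement method only, so its timings say nothing about the real model.

```python
import pickle
import time

# Tiny stand-in for one pickled n-gram table (hypothetical contents).
table = {("to", "the"): [("store", 9), ("point", 4)]}
blob = pickle.dumps(table, protocol=pickle.HIGHEST_PROTOCOL)

# Cold-start cost: deserialize the table into memory.
start = time.perf_counter()
loaded = pickle.loads(blob)
load_seconds = time.perf_counter() - start

# Prediction cost: average over many lookups to smooth out timer noise.
n = 100_000
start = time.perf_counter()
for _ in range(n):
    loaded.get(("to", "the"))
per_lookup_us = (time.perf_counter() - start) / n * 1e6

print(f"load: {load_seconds * 1e3:.3f} ms, lookup: {per_lookup_us:.3f} µs")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for short interval measurements.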

Slide 5, Why This Matters

Production-grade UX, built without a GPU.

Fluid, keyboard-style experience: The app behaves like a real virtual keyboard. Type, and predictions stream in live, with keyboard shortcuts (1, 2, 3) to append words. With no “Submit” button, it feels like a genuine utility, not a class assignment.

Extreme efficiency: A resource-heavy LSTM or Transformer would hit cold-start and memory limits on a free 1 GB tier. Instead, an optimized Stupid Backoff model whose pickled tables total under 16 MB answers in under a millisecond, and it never hallucinates: every suggestion is a word that appeared in the training corpus.

Pragmatic engineering: Singleton pruning, pre-sorted context lists, and reactive caching demonstrate real command of resource constraints and system design. That is the difference between a model that works in a notebook and a product that ships.

Try it: https://om05.shinyapps.io/nextword-predictive-text/