Om Srivastava
June 2026
Every mobile keyboard predicts your next word as you type. How?
The common assumption is that this requires deep learning and heavy infrastructure. It doesn't. A well-engineered statistical n-gram model, trained once and served instantly, delivers the same fluid experience at a fraction of the cost.
This project proves it.
Model at a glance:
Model: Pre-computed frequency tables for 1-grams through 4-grams, stored as pre-sorted Python dictionaries and queried at runtime using the Stupid Backoff algorithm (Brants et al., 2007).
How it works:
Input phrase: "I want to go to the ___"
Extract last 3 words → "go to the"
Look up matching 4-grams → scored candidates: [store ✓, gym, park]
If no match → back off to last 2 words: "to the"
If no match → back off to last word: "the"
Ultimate fallback → highest-frequency unigrams
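The lookup chain above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code. It assumes a hypothetical layout where `tables[n]` maps an (n-1)-word context tuple to its continuations, already sorted by descending count, and `top_unigrams` is a pre-sorted word list. The names `predict`, `tables` and `top_unigrams` are illustrative. Like the steps listed, it returns candidates from the first order that has any match. It does not merge in lower-order candidates when fewer than `k` are found, and it does not apply the λ discount.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: tables[n] maps an (n-1)-word context tuple to its
# continuations, pre-sorted by descending count, e.g.
# tables[4][("go", "to", "the")] == [("store", 12), ("gym", 7), ("park", 5)]
Context = Tuple[str, ...]
Table = Dict[Context, List[Tuple[str, int]]]


def predict(phrase: str, tables: Dict[int, Table],
            top_unigrams: List[str], k: int = 3) -> Tuple[List[str], str]:
    """Return up to k next-word candidates and the n-gram order that produced them."""
    words = phrase.lower().split()
    # Longest context first: last 3 words -> 4-grams, last 2 -> 3-grams, last 1 -> 2-grams.
    for n in (4, 3, 2):
        if len(words) < n - 1:
            continue  # not enough words typed for this order yet
        context = tuple(words[-(n - 1):])
        continuations = tables.get(n, {}).get(context)
        if continuations:
            # Lists are pre-sorted, so the best k candidates are simply the first k.
            return [word for word, _ in continuations[:k]], f"{n}-gram"
    # Ultimate fallback: the most frequent unigrams.
    return top_unigrams[:k], "1-gram"


# Usage with toy (hypothetical) counts:
demo_tables = {4: {("go", "to", "the"): [("store", 12), ("gym", 7), ("park", 5)]},
               2: {("the",): [("first", 30)]}}
print(predict("I want to go to the", demo_tables, ["the", "to", "and"]))
# (['store', 'gym', 'park'], '4-gram')
```

Because each context maps straight to a ranked list, a prediction costs at most three dictionary lookups plus a slice.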
Backoff scoring:
Each backoff step multiplies the score by λ = 0.4, so matches on longer, more specific contexts are strongly favored over matches found only after backing off to shorter ones.
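Written out, the Stupid Backoff score from Brants et al. (2007) looks like this. Here f(·) is the count of an n-gram in the training corpus, w_a^b is the word sequence w_a … w_b, k is the n-gram order (k = 4 for quadgrams), N is the total number of tokens, and λ = 0.4 as above. The scores are discounted relative frequencies, not normalized probabilities.

```latex
S(w_i \mid w_{i-k+1}^{i-1}) =
\begin{cases}
  \dfrac{f(w_{i-k+1}^{i})}{f(w_{i-k+1}^{i-1})} & \text{if } f(w_{i-k+1}^{i}) > 0,\\[6pt]
  \lambda \, S(w_i \mid w_{i-k+2}^{i-1}) & \text{otherwise,}
\end{cases}
\qquad
S(w_i) = \frac{f(w_i)}{N}
```

For example, suppose "gym" never follows "go to the" but makes up half of all continuations of "to the". It then scores 0.4 × 0.5 = 0.2. A word found only after backing off twice is discounted by 0.4² = 0.16.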
Why not Kneser-Ney? It offers marginally better perplexity, but on a large corpus Stupid Backoff reaches near-identical accuracy with almost no runtime overhead, since its scores need no normalization. That matters when predicting on every single keystroke.
Live at: https://om05.shinyapps.io/nextword-predictive-text/
It works like a smart keyboard, not a form:
No submit button and no delay. Predictions update live on every keystroke over a persistent WebSocket connection, just like the autocomplete bar on a phone.
Example: typing “I want to” returns:
| Rank | Word | Confidence | Source |
|---|---|---|---|
| 1 | be | 45% | 4-gram |
| 2 | go | 28% | 4-gram |
| 3 | see | 27% | 4-gram |
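The confidence column in this example sums to exactly 100%. That suggests the raw scores of the displayed candidates are rescaled to share 100%, but this is an assumption, not something the project documents. The sketch below shows one way to produce such whole-number percentages. It uses largest-remainder rounding so the figures always add up to 100. The function name `to_confidence` and the input counts are hypothetical.

```python
import math
from typing import List, Tuple


def to_confidence(candidates: List[Tuple[str, float]]) -> List[Tuple[str, int]]:
    """Rescale the displayed candidates' scores into whole percentages summing to 100."""
    total = sum(score for _, score in candidates)
    if total <= 0:
        return [(word, 0) for word, _ in candidates]
    shares = [100 * score / total for _, score in candidates]
    percents = [math.floor(share) for share in shares]
    # Largest-remainder rounding: hand the leftover points to the largest fractional parts.
    leftover = 100 - sum(percents)
    by_remainder = sorted(range(len(shares)), key=lambda i: shares[i] - percents[i], reverse=True)
    for i in by_remainder[:leftover]:
        percents[i] += 1
    return [(word, pct) for (word, _), pct in zip(candidates, percents)]


print(to_confidence([("be", 3), ("go", 2), ("see", 1)]))
# [('be', 50), ('go', 33), ('see', 17)]
```

Plain rounding would give 50 + 33 + 17 here as well, but with inputs such as three equal scores it would show 33 + 33 + 33 = 99%. Largest-remainder rounding avoids that.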
Built with: Python | Shiny for Python (PyShiny) | ASGI / Uvicorn | WebSockets
The engineering tradeoffs that made this fast and shippable:
| Decision | Choice | Reason |
|---|---|---|
| Corpus sample | 5% random sample | Preserves common n-grams, fits memory budget |
| N-gram max order | 4-gram | Captures strong context; higher orders give diminishing returns |
| Pruning | Drop singletons (count < 2) | Cuts model from ~70 MB → 15.7 MB; reduces noise |
| Storage | Pre-sorted Python dicts | O(1) average-case lookups; candidates already stored in rank order |
| Serving | PyShiny over WebSockets | Real-time keystroke sync, no page reloads |
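The pruning and storage decisions can be sketched as a small build step. This is a hedged illustration, not the project's pipeline. It assumes naive lowercase, whitespace-split tokenization (the real corpus cleaning is not described here). Its threshold `min_count=2` matches "drop singletons (count < 2)". Ties are broken alphabetically, which is an arbitrary choice for deterministic output. The names `count_ngrams` and `build_table` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Table = Dict[Tuple[str, ...], List[Tuple[str, int]]]


def count_ngrams(sentences: Iterable[str], n: int) -> Counter:
    """Count every n-gram (as a tuple of tokens) in the corpus."""
    counts: Counter = Counter()
    for sentence in sentences:
        # Assumption: naive lowercase + whitespace tokenization.
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def build_table(counts: Counter, min_count: int = 2) -> Table:
    """Prune rare n-grams, group by context, and pre-sort each continuation list."""
    table: Dict[Tuple[str, ...], List[Tuple[str, int]]] = defaultdict(list)
    for gram, count in counts.items():
        if count < min_count:
            continue  # singleton pruning: drop n-grams seen fewer than min_count times
        context, word = gram[:-1], gram[-1]
        table[context].append((word, count))
    for continuations in table.values():
        # Highest count first; ties broken alphabetically so the order is deterministic.
        continuations.sort(key=lambda wc: (-wc[1], wc[0]))
    return dict(table)


corpus = ["go to the store", "go to the store", "go to the gym",
          "go to the gym", "go to the park"]
trigrams = build_table(count_ngrams(corpus, 3))
print(trigrams[("to", "the")])  # [('gym', 2), ('store', 2)]  ("park" was a singleton)
```

With `n=1` the context is the empty tuple, so `build_table(count_ngrams(corpus, 1))[()]` is the pruned, ranked unigram list used as the final fallback.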
Model footprint (pickled n-gram tables):
| File | Size |
|---|---|
| unigrams.pkl | 852 KB |
| bigrams.pkl | 4.9 MB |
| trigrams.pkl | 6.2 MB |
| quadgrams.pkl | 3.8 MB |
| Total | ~15.7 MB |
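A minimal sketch of writing and reading these files with the standard `pickle` module is shown below. The file names follow the table above. The helper names `save_tables`, `load_tables` and `footprint_bytes` are hypothetical. Loading once at startup means each keystroke reuses the same in-memory dicts. Unpickling can execute arbitrary code, so only load files you produced yourself. Also note that loaded Python dicts usually occupy more RAM than their pickles take on disk.

```python
import pickle
from pathlib import Path
from typing import Any, Dict, Iterable

TABLE_NAMES = ("unigrams", "bigrams", "trigrams", "quadgrams")


def save_tables(tables: Dict[str, Any], directory: Path) -> None:
    """Write each table to <directory>/<name>.pkl."""
    directory.mkdir(parents=True, exist_ok=True)
    for name, table in tables.items():
        with open(directory / f"{name}.pkl", "wb") as fh:
            pickle.dump(table, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_tables(directory: Path, names: Iterable[str] = TABLE_NAMES) -> Dict[str, Any]:
    """Load every table once at startup; only unpickle files you trust."""
    tables = {}
    for name in names:
        with open(directory / f"{name}.pkl", "rb") as fh:
            tables[name] = pickle.load(fh)
    return tables


def footprint_bytes(directory: Path, names: Iterable[str] = TABLE_NAMES) -> int:
    """Total on-disk size of the pickled tables, comparable to the sizes listed above."""
    return sum((directory / f"{name}.pkl").stat().st_size for name in names)
```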
Measured performance:
Production-grade UX, built without a GPU.
Fluid, keyboard-style experience: The app behaves like a real virtual keyboard. As you type, predictions stream in live, and keyboard shortcuts (1, 2, 3) append the suggested words. With no “Submit” button, it feels like a genuine utility rather than a class assignment.
Extreme efficiency: A resource-heavy LSTM or Transformer would likely hit cold-start and memory limits on a free 1 GB tier. Instead, an optimized Stupid Backoff model, backed by about 15.7 MB of pickled tables, returns predictions with sub-millisecond execution. It also never invents words: every suggestion is one it actually observed in its training corpus.
Pragmatic engineering: Singleton pruning, pre-sorted context lists, and reactive caching demonstrate real command of resource constraints and system design. That is the difference between a model that works in a notebook and a product that ships.
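Shiny's own reactive caching works through reactive calculations, which recompute only when their inputs change. The sketch below swaps in the standard library's `functools.lru_cache` as a plain-Python stand-in. It memoizes the backoff result per 3-word context and uses `timeit` to time a call. `TABLES`, `cached_predict` and `mean_seconds_per_call` are hypothetical. The timing covers only the in-process lookup, not the WebSocket round trip or page rendering. It does not reproduce the project's own measurements. Whether caching beats an uncached lookup depends on how much work each keystroke triggers, so treat this as a pattern rather than a measured win.

```python
import timeit
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical pre-sorted tables keyed by n-gram order (stand-ins for the real pickles).
TABLES: Dict[int, Dict[Tuple[str, ...], List[Tuple[str, int]]]] = {
    4: {("go", "to", "the"): [("store", 12), ("gym", 7), ("park", 5)]},
    3: {("to", "the"): [("store", 20), ("beach", 9)]},
    2: {("the",): [("first", 30), ("same", 18)]},
}


@lru_cache(maxsize=4096)
def cached_predict(context: Tuple[str, ...]) -> Tuple[str, ...]:
    """Back off from the longest usable context; results are memoized per context tuple."""
    for n in (4, 3, 2):
        if len(context) >= n - 1:
            continuations = TABLES[n].get(context[-(n - 1):])
            if continuations:
                # Return an immutable tuple so callers cannot mutate the cached value.
                return tuple(word for word, _ in continuations[:3])
    return ()


def mean_seconds_per_call(context: Tuple[str, ...], runs: int = 10_000) -> float:
    """Average wall-clock time of one cached lookup, measured with timeit."""
    return timeit.timeit(lambda: cached_predict(context), number=runs) / runs


# Key the cache on the last three words only, so different phrases share entries.
last_three = tuple("I want to go to the".lower().split()[-3:])
print(cached_predict(last_three))  # ('store', 'gym', 'park')
print(f"{mean_seconds_per_call(last_three) * 1e6:.2f} µs per call")
```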