Your Name
May 2026
Typing is slow. Prediction is fast.
Major mobile keyboards (SwiftKey, Gboard, the iOS keyboard) use language models to suggest your next word, saving keystrokes and reducing errors.
Goal: Build a lightweight, real-time next-word predictor trained on real English text (news · blogs · Twitter).
"Predict the next word the way a human would: by remembering what usually comes next."
HC Corpora: SwiftKey English Dataset
| Source | Lines (total) | Lines sampled | Tokens |
|---|---|---|---|
| Blogs | 899,288 | ~45,000 | ~5 M |
| News | 1,010,242 | ~50,000 | ~4.5 M |
| Twitter | 2,360,148 | ~118,000 | ~3 M |
| Total | 4.27 M | ~213,000 | ~12.5 M |
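The "Lines sampled" column works out to roughly 5 % of each source (e.g. 45,000 / 899,288 ≈ 5 %). Below is a minimal sketch of reproducible Bernoulli sampling at that rate. The slides do not say how lines were drawn, so the keep-each-line-with-probability-p scheme, the seed, and the helper name `sample_lines` are assumptions.

```python
import random
from typing import Iterable, List


def sample_lines(lines: Iterable[str], rate: float = 0.05, seed: int = 2026) -> List[str]:
    """Hypothetical helper: keep each line independently with probability `rate`."""
    rng = random.Random(seed)  # private, seeded RNG so the sample is reproducible
    return [line for line in lines if rng.random() < rate]
```

For example, passing an open handle on the blogs file (assumed here to be named `en_US.blogs.txt`) to `sample_lines` would keep on the order of 45,000 of its 899,288 lines. Sampling each line independently keeps memory flat, because the file is streamed rather than loaded whole.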
Pre-processing pipeline:
N-grams (1- to 3-grams) built with tidytext, saved to ngrams.rds (~8 MB) for fast in-app loading
Result: ~150K unigrams · ~800K bigrams · ~600K trigrams
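tidytext is an R package. As a language-neutral illustration, the n-gram counting step can be sketched in Python with `collections.Counter`. The tokenizer regex (lowercase letters and apostrophes only) is an assumption and will not reproduce tidytext's tokenization rules exactly. No pruning of rare n-grams is shown, so counts from this sketch will not necessarily match the figures above.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical tokenizer: runs of lowercase letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(line: str) -> List[str]:
    return TOKEN_RE.findall(line.lower())


def count_ngrams(lines: Iterable[str], max_n: int = 3) -> Dict[int, Counter]:
    """Return {n: Counter mapping n-gram tuples to counts} for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            # Zipping n shifted copies of the token list yields consecutive n-grams.
            counts[n].update(zip(*(tokens[i:] for i in range(n))))
    return counts
```

Keying every order by tuples (even 1-tuples for unigrams) keeps lookups uniform across orders.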
Why Stupid Backoff?
How it works (3 steps):
Input phrase: "I want to go"
Step 1: Trigram lookup (last 2 words: "to go")
→ Find all w3 where (w1="to", w2="go") → score = freq
Step 2: Bigram backoff (last word: "go")
→ Find all w2 where (w1="go") → score = freq × 0.4
Step 3: Unigram backoff (most common words)
→ All unigrams → score = freq × 0.16 (= 0.4²)
Return top-5 candidates ranked by score.
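Following Brants et al. (2007), the three steps can be written as a recursive score. Here f(·) is an n-gram count in the training sample and N is the total number of tokens. This reads "freq" on the slide as relative frequency (the n-gram's count divided by the count of its context), as in the paper; the scores are not normalized probabilities. Backing off twice multiplies by λ², which with λ = 0.4 gives Step 3's factor of 0.16.

```latex
S(w_i \mid w_{i-2}, w_{i-1}) =
\begin{cases}
  \dfrac{f(w_{i-2}\, w_{i-1}\, w_i)}{f(w_{i-2}\, w_{i-1})} & \text{if } f(w_{i-2}\, w_{i-1}\, w_i) > 0 \\[2ex]
  \lambda \, S(w_i \mid w_{i-1}) & \text{otherwise}
\end{cases}
\qquad
S(w_i \mid w_{i-1}) =
\begin{cases}
  \dfrac{f(w_{i-1}\, w_i)}{f(w_{i-1})} & \text{if } f(w_{i-1}\, w_i) > 0 \\[2ex]
  \lambda \, S(w_i) & \text{otherwise}
\end{cases}
\qquad
S(w_i) = \frac{f(w_i)}{N}
```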
Backoff factor λ = 0.4 (standard Stupid Backoff value, Brants et al. 2007)
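Below is a compact sketch of the three steps with λ = 0.4. It makes several assumptions. The counts are held in a hypothetical layout `{n: Counter of n-gram tuples}`, "freq" is read as relative frequency, and each candidate keeps the score from the highest order at which it was found. The function name `stupid_backoff` and the toy counts are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

LAMBDA = 0.4  # backoff factor from the slide (Brants et al. 2007)


def stupid_backoff(words: List[str], counts: Dict[int, Counter],
                   k: int = 5) -> List[Tuple[str, float]]:
    """Rank next-word candidates with Stupid Backoff over 1- to 3-gram counts.

    `counts[n]` maps n-gram tuples to raw counts (hypothetical layout).
    """
    scores: Dict[str, float] = {}
    context = tuple(w.lower() for w in words[-2:])  # at most the last 2 words
    # Walk from the longest usable order (trigram) down to bigram.
    for order in range(len(context) + 1, 1, -1):
        prefix = context[-(order - 1):]
        prefix_count = counts[order - 1].get(prefix, 0)
        if prefix_count == 0:
            continue  # context unseen at this order: back off
        weight = LAMBDA ** (len(context) + 1 - order)  # 1, then 0.4
        for ngram, c in counts[order].items():
            # Keep the score from the highest order where the word appeared.
            if ngram[:-1] == prefix and ngram[-1] not in scores:
                scores[ngram[-1]] = weight * c / prefix_count
    # Unigram fallback always yields candidates (weight 0.16 with 2 context words).
    total = sum(counts[1].values())
    weight = LAMBDA ** len(context)
    for (w,), c in counts[1].items():
        if w not in scores:
            scores[w] = weight * c / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    toy = {
        1: Counter({("i",): 2, ("want",): 2, ("to",): 2, ("go",): 1, ("eat",): 1}),
        2: Counter({("i", "want"): 2, ("want", "to"): 2, ("to", "go"): 1, ("to", "eat"): 1}),
        3: Counter({("i", "want", "to"): 2, ("want", "to", "go"): 1, ("want", "to", "eat"): 1}),
    }
    # 'go' and 'eat' tie at 0.5 from the trigram level.
    print(stupid_backoff(["I", "want", "to"], toy, k=3))
```

For clarity the loop scans every n-gram of a given order. Meeting the < 100 ms latency target on ~800K bigrams would likely need the tables indexed by prefix instead, for example a dict from prefix tuple to its candidate counts.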
Try it: https://yourname.shinyapps.io/next-word-predictor

Features:
Test phrases from Twitter & news:
| Phrase (last word removed) | Prediction |
|---|---|
| "Happy birthday to ___" | you |
| "I can't believe how ___" | much |
| "The president said that ___" | he |
| "She looked at him and ___" | said |
| "The team scored a ___" | goal |
| Metric | Value |
|---|---|
| App load time | < 2 s |
| Prediction latency | < 100 ms |
| Memory footprint | ~80 MB RAM |
| N-gram model size | ~8 MB on disk |
| Test-set top-1 accuracy | ~18 % |
| Test-set top-3 accuracy | ~32 % |
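The accuracy rows can be computed by holding out the last word of each test phrase, as in the test-phrase table, and counting a hit when it appears among the top k predictions. The slides do not say how the test set was drawn. `make_pairs` and `top_k_accuracy` are therefore hypothetical helpers, and whitespace splitting is a simplification. `predict` stands for any function that returns a ranked list of words.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Pair = Tuple[List[str], str]


def make_pairs(phrases: Iterable[str]) -> List[Pair]:
    """Split each phrase into (context words, held-out last word)."""
    pairs = []
    for phrase in phrases:
        words = phrase.lower().split()
        if len(words) >= 2:  # need at least one context word
            pairs.append((words[:-1], words[-1]))
    return pairs


def top_k_accuracy(pairs: Sequence[Pair],
                   predict: Callable[[List[str]], List[str]], k: int) -> float:
    """Fraction of pairs whose target is among the first k predictions."""
    if not pairs:
        return 0.0
    hits = sum(target in predict(context)[:k] for context, target in pairs)
    return hits / len(pairs)
```

Top-3 accuracy can never be lower than top-1 accuracy, because every top-1 hit is also a top-3 hit. That is consistent with the ~18 % and ~32 % figures above.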
Advantages over deep-learning alternatives:
✅ No GPU required: runs on the free shinyapps.io tier
✅ Fully interpretable: you can inspect every n-gram
✅ Fast to retrain on new domain data
✅ Graceful degradation: always returns a prediction
Future improvements:
Source code: github.com/yourname/next-word-predictor