Sandy
Given a phrase of n words, predict the single most likely word to follow, instantly and accurately.
Why it matters
Our Solution
Build a fast, accurate next-word predictor trained on real English text, and wrap it in a polished Shiny web app anyone can use.
User types: "I want to go to the ___"
App returns:
store park gym beach next
Built with R · Trained on 102 M words · Deployed on shinyapps.io
| Source | Lines | Words | Size |
|---|---|---|---|
| Blogs | 899,288 | 37.3 M | 210 MB |
| News | 1,010,242 | 34.3 M | 206 MB |
| Twitter | 2,360,148 | 30.3 M | 167 MB |
| Total | 4.27 M | 102 M | 583 MB |
EDA Highlights
Stupid Back-off, the scoring approach Google used for web-scale language models: scores are not normalised into probabilities, and lookups take under a millisecond.
# Back-off chain (highest n-gram wins):
4-gram match     → score = 1.000 × freq / total
3-gram match     → score = 0.400 × freq / total
2-gram match     → score = 0.160 × freq / total
unigram fallback → score = 0.064 × P(word)
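The back-off chain above can be sketched in a few lines of base R. This is a minimal illustration, not the app's actual code: the tables and counts are made up, `last_words` and `predict_next` are hypothetical names, and `total` is assumed to be the summed frequency of all continuations stored for the matched context. Each back-off step multiplies the score by 0.4, which gives the 1.000, 0.400, 0.160 and 0.064 weights listed above.

```r
# Toy n-gram tables with made-up counts (illustration only, not the app's data).
quad <- data.frame(context = c("go to the", "go to the"),
                   word = c("store", "park"), freq = c(6, 2),
                   stringsAsFactors = FALSE)
tri <- data.frame(context = rep("to the", 3),
                  word = c("store", "gym", "beach"), freq = c(5, 3, 2),
                  stringsAsFactors = FALSE)
bi <- data.frame(context = rep("the", 2),
                 word = c("store", "next"), freq = c(6, 4),
                 stringsAsFactors = FALSE)
uni <- data.frame(word = c("the", "to", "and"), p = c(0.5, 0.3, 0.2),
                  stringsAsFactors = FALSE)

# Hypothetical helper: the last k words of a lower-cased phrase.
last_words <- function(phrase, k) {
  w <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]
  paste(tail(w, k), collapse = " ")
}

# Score candidates from every level; a word keeps the score of the
# highest-order n-gram in which it was found.
predict_next <- function(phrase, n_best = 5, lambda = 0.4) {
  tables <- list(quad, tri, bi)              # highest order first
  out <- data.frame(word = character(0), score = numeric(0),
                    stringsAsFactors = FALSE)
  for (i in seq_along(tables)) {
    tab  <- tables[[i]]
    ctx  <- last_words(phrase, 4 - i)        # context length 3, 2, 1
    hits <- tab[tab$context == ctx, ]
    if (nrow(hits) > 0) {
      score <- lambda^(i - 1) * hits$freq / sum(hits$freq)
      out <- rbind(out, data.frame(word = hits$word, score = score,
                                   stringsAsFactors = FALSE))
    }
  }
  # Unigram fallback: three back-off steps, so the weight is lambda^3.
  out <- rbind(out, data.frame(word = uni$word, score = lambda^3 * uni$p,
                               stringsAsFactors = FALSE))
  out <- out[!duplicated(out$word), ]        # first hit = highest order
  out <- out[order(-out$score), ]
  rownames(out) <- NULL
  head(out, n_best)
}

predict_next("I want to go to the")
```

With these toy counts the call returns store, park, gym, beach and next, in that order. Because the scores are never normalised, they rank candidates but do not sum to one.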
| N-gram | Entries | Min Freq |
|---|---|---|
| Unigram | 39,987 | 2 |
| Bigram | 199,420 | 2 |
| Trigram | 163,116 | 2 |
| Quadgram | 61,530 | 2 |
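The Min Freq column means that n-grams seen fewer than twice are dropped before the tables are saved. A rough base-R sketch of that counting and pruning step follows. The three-line corpus, the tokenisation rule and the helper names `ngrams_of` and `build_table` are assumptions made for illustration; the real pipeline works on the full corpus and may tokenise differently.

```r
# Toy corpus standing in for the cleaned Blogs/News/Twitter lines.
corpus <- c("i want to go to the store",
            "i want to go to the park",
            "we want to go to the store")

# Hypothetical helper: all n-grams of one tokenised line.
ngrams_of <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count n-grams across all lines and keep those seen at least min_freq times.
build_table <- function(lines, n, min_freq = 2) {
  tokens <- strsplit(tolower(lines), "[^a-z']+")   # assumed tokenisation
  grams  <- as.character(unlist(
    lapply(tokens, function(tok) ngrams_of(tok[tok != ""], n))))
  counts <- table(grams)
  counts <- counts[counts >= min_freq]             # prune rare n-grams
  counts <- sort(counts, decreasing = TRUE)
  data.frame(ngram = as.character(names(counts)),
             freq  = as.integer(counts),
             stringsAsFactors = FALSE)
}

build_table(corpus, 2)
```

On the toy corpus the bigram table keeps six entries; "the park" and "we want" each occur once and are pruned. Pruning at a minimum frequency of 2 removes the long tail of one-off n-grams, which is what keeps the saved tables small.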
Why Stupid Back-off over Kneser-Ney?
data.table pre-indexing
https://YOUR-NAME.shinyapps.io/NextWordPredictor
How to use it
| Feature | Detail |
|---|---|
| Response time | < 100 ms |
| Prediction levels | 4-gram → 3-gram → 2-gram → unigram |
| Suggestions shown | 5 clickable word pills |
| Click-to-complete | ✓ Appends & re-predicts instantly |
| Confidence bars | ✓ Scored bar chart per candidate |
| Sentence preview | ✓ Highlighted top prediction |
| Corpus | Blogs + News + Twitter (en_US) |
| Model size on disk | ~3 MB (4 .rds files) |
| Phrase (last word removed) | Prediction | |
|---|---|---|
| "I want to go to the ___" | store | ✓ |
| "Happy birthday to ___" | you | ✓ |
| "The president of the United ___" | States | ✓ |
| "Thanks for sharing this ___" | week | ✓ |
| "Looking forward to seeing ___" | you | ✓ |
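A check like the one in this table can be automated: remove the last word of each sentence, predict it, and count the hits. The sketch below is only an illustration under stated assumptions. `toy_predict` is a hypothetical stand-in for the real model (a fixed lookup on the last word), and `evaluate` is a hypothetical helper name; any function that maps a phrase to a ranked character vector could be passed in instead.

```r
# Hypothetical stand-in for the real model: a fixed lookup on the last word.
toy_predict <- function(phrase) {
  lookup <- list(the    = c("store", "park", "gym"),
                 to     = c("you", "the", "be"),
                 united = c("states", "kingdom"))
  words <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]
  if (length(words) == 0) return(character(0))
  hit <- lookup[[tail(words, 1)]]
  if (is.null(hit)) character(0) else hit
}

# Hold out the last word of each sentence and score the predictor on it.
evaluate <- function(sentences, predict_fn, k = 3) {
  top1 <- logical(length(sentences))
  topk <- logical(length(sentences))
  for (i in seq_along(sentences)) {
    words  <- strsplit(tolower(trimws(sentences[i])), "\\s+")[[1]]
    target <- tail(words, 1)                       # held-out word
    phrase <- paste(head(words, -1), collapse = " ")
    preds  <- tolower(predict_fn(phrase))
    top1[i] <- length(preds) > 0 && preds[1] == target
    topk[i] <- target %in% head(preds, k)
  }
  c(top1 = mean(top1), topk = mean(topk))
}

sentences <- c("I want to go to the store",
               "Happy birthday to you",
               "The president of the United States",
               "I am going to the gym")
evaluate(sentences, toy_predict)
```

With the toy predictor this returns a top-1 rate of 0.75 and a top-3 rate of 1. These figures describe the stand-in lookup only and say nothing about the accuracy of the deployed model.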
We built a production-ready text prediction engine (clean pipeline, proven algorithm, polished UI) in days, not months. The same n-gram back-off approach has been widely used in predictive keyboards.