- Typing full sentences is slow
- Mobile keyboards already suggest the next word
- Goal: Build the same thing from scratch using real data
Given a phrase, predict the most likely next word.
2026-06-25
Step 1 — Training data
Step 2 — N-gram tables
Step 3 — Backoff prediction
Input: "thanks for the"
- Look in the quadgram table first
- No match? Try the trigram table
- No match? Try the bigram table
- Return the top 3 most frequent matches
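The lookup order above can be sketched in a few lines. This is a minimal illustration in Python, not the app's actual R code, which is not shown in this deck. The function names `build_tables` and `predict_next`, the toy corpus, and the choice to return an empty list when nothing matches are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_tables(tokens, max_n=4):
    """Hypothetical helper: for each n (2..max_n), map every (n-1)-word
    context to a Counter of the words observed right after it."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            next_word = tokens[i + n - 1]
            tables[n][context][next_word] += 1
    return tables


def predict_next(phrase, tables, top_k=3, max_n=4):
    """Hypothetical helper: try the longest context first (quadgram),
    then back off to trigram and bigram. Returns up to top_k words."""
    words = phrase.lower().split()
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # phrase is too short to form this context
        context = tuple(words[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return [word for word, _ in followers.most_common(top_k)]
    # Assumption: no table matched. A deployed app would need some
    # default here (for example, the most frequent words overall).
    return []


# Toy corpus standing in for the real training data.
corpus = ("thanks for the follow thanks for the help "
          "thanks for the follow see the game").split()
tables = build_tables(corpus)

print(predict_next("thanks for the", tables))  # ['follow', 'help']
print(predict_next("watch the", tables))       # backs off to the bigram table
```

The first call finds "thanks for the" in the quadgram table and never consults the shorter tables. The second has no trigram match for "watch the", so it falls through to the bigram context "the".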
| Metric | Value |
|---|---|
| Training words | ~14 million |
| Bigram pairs | ~500,000 |
| Trigram sequences | ~300,000 |
| Quadgram sequences | ~200,000 |
| Predictions returned | Up to 3 |
| Response time | < 1 second |
Backoff ensures a prediction is always returned, even for rare phrases.
How to use it:
Features:
Live app: shinyapps.io/next-word-prediction-shiny-app-swadhwa
| Aspect | Summary |
|---|---|
| Data | Blogs + News + Twitter |
| Model | N-gram backoff (2 / 3 / 4-gram) |
| Output | Top 3 next-word predictions |
| Speed | Under 1 second |
| Built with | R + Shiny |
Simple, fast, and always returns a prediction.
Created by Sumit Wadhwa