Tanmay Padave
2026-06-14
Data Science Capstone — Johns Hopkins University
A smart, fast, and accurate next-word prediction app
built using N-gram language models on about 4.27 million lines of text.
The Problem
Our Solution
| Dataset | Lines | Source |
|---|---|---|
| Twitter | 2.36M | Social media |
| Blogs | 899K | Long-form writing |
| News | 1.01M | Formal news |
N-gram Backoff Algorithm
User types: "I love the"
↓
Trigram lookup: "love the" → [way, most, best, ...]
↓
Returns: "way", "most", "best"
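The lookup flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the app's actual implementation: the function names `build_tables` and `predict`, the toy corpus, and the backoff factor `ALPHA = 0.4` (a value commonly used with Stupid Backoff) are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict

ALPHA = 0.4  # assumed backoff factor; not taken from the app itself


def build_tables(sentences, max_n=3):
    """Count next-word frequencies for every context of 0..max_n-1 words."""
    tables = defaultdict(Counter)  # context tuple -> Counter of next words
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for n in range(max_n):  # n is the context length
                if i - n < 0:
                    break
                tables[tuple(tokens[i - n:i])][word] += 1
    return tables


def predict(tables, text, k=3, max_n=3):
    """Return up to k next-word guesses, backing off to shorter contexts."""
    tokens = text.lower().split()
    scores = {}
    discount = 1.0
    for n in range(max_n - 1, -1, -1):  # longest context first
        if n > len(tokens):
            continue  # not enough typed words for this context length
        context = tuple(tokens[len(tokens) - n:])
        counts = tables.get(context)
        if counts:
            total = sum(counts.values())
            for word, count in counts.items():
                # Keep the score from the longest context that saw the word.
                scores.setdefault(word, discount * count / total)
        discount *= ALPHA  # each backoff step is penalised by ALPHA
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Toy corpus (hypothetical), chosen so the example mirrors the slide.
corpus = [
    "i love the way you smile",
    "i love the way it works",
    "we love the way home",
    "i love the most recent one",
    "they love the most",
    "i love the best parts",
]
tables = build_tables(corpus)
suggestions = predict(tables, "I love the")
backed_off = predict(tables, "zzz the")  # unseen trigram context
```

With this toy corpus, `suggestions` is `['way', 'most', 'best']`, and the unseen context `"zzz the"` falls back to the bigram table and yields the same three words. A lookup table keyed by context keeps prediction to a handful of dictionary reads, which is one plausible reason this family of models responds quickly.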
Why Stupid Backoff?
Accuracy on held-out test set (10% of corpus)
| Metric | Performance |
|---|---|
| Top-1 Accuracy | 32% |
| Top-3 Accuracy | 60% |
| Top-5 Accuracy | 74% |
| Avg Response Time | < 1 second |
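Top-k accuracy figures like those in the table can be computed by hiding each word of a held-out sentence in turn and checking whether it appears among the first k suggestions. The sketch below assumes a predictor with the hypothetical signature `predict_fn(text, k)`; the function name `top_k_accuracy` and the toy predictor are illustrative and do not reproduce the reported numbers.

```python
def top_k_accuracy(predict_fn, test_sentences, ks=(1, 3, 5)):
    """Fraction of hidden words found in the top-k guesses, for each k."""
    hits = {k: 0 for k in ks}
    total = 0
    max_k = max(ks)
    for sentence in test_sentences:
        tokens = sentence.lower().split()
        for i in range(1, len(tokens)):
            # Predict word i from the words typed before it.
            guesses = predict_fn(" ".join(tokens[:i]), max_k)
            total += 1
            for k in ks:
                if tokens[i] in guesses[:k]:
                    hits[k] += 1
    if total == 0:
        return {k: 0.0 for k in ks}
    return {k: hits[k] / total for k in ks}


def toy_predict(text, k):
    """Stand-in predictor (hypothetical): always returns the same ranking."""
    return ["the", "way", "most", "best", "of"][:k]


result = top_k_accuracy(toy_predict, ["love the way"])
```

Here the first hidden word ("the") is the top guess and the second ("way") is the second guess, so `result` is `{1: 0.5, 3: 1.0, 5: 1.0}`. Accuracy can only rise as k grows, which matches the pattern in the table.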
Memory & Speed
Benchmark vs alternatives
| Model | Top-3 Accuracy | Speed |
|---|---|---|
| Our N-gram backoff | 60% | < 1s |
| Unigram only | 18% | < 1s |
| No prediction | 0% | — |
How to use the app
App features
Next Steps
Thank you! Questions welcome.