Next Word Predictor

Tanmay Padave
2026-06-14

Data Science Capstone — Johns Hopkins University

A smart, fast, and accurate next-word prediction app
built with N-gram language models trained on over 4 million lines of text.

Slide 2: The Problem & Solution

The Problem

Typing on mobile is slow and error-prone
Users need intelligent, real-time word suggestions

Our Solution

A next-word prediction app powered by N-gram language models
Trained on over 4 million lines of real English text (Twitter, Blogs, News)
Returns top predictions in under 1 second

Dataset	Lines	Source
Twitter	2.36M	Social media
Blogs	899K	Long-form writing
News	1.01M	Formal news

Slide 3: How the Model Works

N-gram Backoff Algorithm

Clean and tokenize user input
Look up last 2 words in Trigram table → return top matches
If no match → back off to Bigram table
If still no match → return most common Unigrams

User types:  "I love the"
             ↓
Trigram lookup: "love the" → [way, most, best, ...]
             ↓
Returns:     "way", "most", "best"
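
The lookup steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the app's actual code: the function names (tokenize, build_tables, predict), the table layout, and the toy corpus are all hypothetical.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (assumed cleaning rule)."""
    return re.findall(r"[a-z']+", text.lower())

def build_tables(lines):
    """Count which words follow each 2-word and 1-word context (hypothetical table layout)."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)   # previous word -> counts of next words
    trigrams = defaultdict(Counter)  # (word1, word2) -> counts of next words
    for line in lines:
        words = tokenize(line)
        unigrams.update(words)
        for i in range(1, len(words)):
            bigrams[words[i - 1]][words[i]] += 1
        for i in range(2, len(words)):
            trigrams[(words[i - 2], words[i - 1])][words[i]] += 1
    return unigrams, bigrams, trigrams

def predict(text, tables, k=3):
    """Return up to k suggestions, backing off trigram -> bigram -> unigram."""
    unigrams, bigrams, trigrams = tables
    words = tokenize(text)
    if len(words) >= 2:
        hits = trigrams.get((words[-2], words[-1]))
        if hits:
            return [w for w, _ in hits.most_common(k)]
    if words:
        hits = bigrams.get(words[-1])
        if hits:
            return [w for w, _ in hits.most_common(k)]
    # Nothing matched: fall back to the most common single words.
    return [w for w, _ in unigrams.most_common(k)]

# Toy corpus for illustration only.
tables = build_tables([
    "I love the way you smile",
    "I love the way it works",
    "we love the best parts",
    "the most common word",
])
print(predict("I love the", tables))  # ['way', 'best']
```

An unseen context such as "hate the" finds no trigram match, so the sketch answers from the bigram table for "the" instead.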

Why Stupid Backoff?

Faster than Kneser-Ney smoothing
Accuracy within 5% of more complex models
Ideal for real-time applications
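
For reference, Stupid Backoff scores a candidate by its relative frequency after the longest matching context, and multiplies the score by a fixed factor each time it has to shorten the context. The sketch below assumes a factor of 0.4 (the value suggested in the original Stupid Backoff paper); the slides do not say which value the app uses, and the function names are hypothetical.

```python
from collections import Counter

ALPHA = 0.4  # assumed backoff factor; not confirmed by the slides

def count_ngrams(tokens, max_n=3):
    """Count every n-gram of length 1..max_n, keyed by a tuple of words."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(word, context, counts, total):
    """Score `word` after `context` (a tuple of preceding words).

    `total` is the number of tokens in the training text.
    The scores are relative frequencies, not normalized probabilities.
    """
    penalty = 1.0
    context = tuple(context)
    while context:
        seen = counts[context + (word,)]
        if seen > 0:
            return penalty * seen / counts[context]
        context = context[1:]  # drop the oldest word and try a shorter context
        penalty *= ALPHA
    return penalty * counts[(word,)] / total

# Toy text for illustration only.
tokens = "i love the way i love the best i love the way".split()
counts = count_ngrams(tokens)
print(stupid_backoff("way", ("love", "the"), counts, len(tokens)))
print(stupid_backoff("way", ("hate", "the"), counts, len(tokens)))
```

Because no discounting or normalization is needed, scoring costs only a few table lookups, which is where the speed advantage over Kneser-Ney smoothing comes from.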

Slide 4: Performance

Accuracy on held-out test set (10% of corpus)

Metric	Performance
Top-1 Accuracy	32%
Top-3 Accuracy	60%
Top-5 Accuracy	74%
Avg Response Time	< 1 second
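
Top-k accuracy counts a prediction as correct when the actual next word appears among the first k suggestions. A minimal sketch of that calculation follows; the function name, the fixed toy predictor, and the example pairs are hypothetical and are not the evaluation code or data behind the figures above.

```python
def top_k_accuracy(predict, examples, k):
    """Fraction of (context, actual_word) pairs whose actual word is in the first k predictions."""
    hits = sum(1 for context, actual in examples if actual in predict(context)[:k])
    return hits / len(examples)

# Toy predictor that always returns the same ranked list (illustration only).
def toy_predict(context):
    return ["way", "most", "best", "end", "day"]

examples = [
    ("i love the", "way"),
    ("i love the", "best"),
    ("i love the", "end"),
    ("i love the", "zebra"),
]
for k in (1, 3, 5):
    print(k, top_k_accuracy(toy_predict, examples, k))
```

Accuracy can only stay the same or rise as k grows, which matches the increase from Top-1 to Top-5 in the table.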

Memory & Speed

N-gram tables compressed to < 50 MB
Handles out-of-vocabulary words gracefully via backoff
Tested on over 500,000 word sequences

Benchmark vs alternatives

Model	Top-3 Accuracy	Speed
Our N-gram backoff	60%	< 1s
Unigram only	18%	< 1s
No prediction	0%	—

Slide 5: The App & Next Steps

How to use the app

Go to shinyapps.io/nextwordpredictor
Type any phrase in the text box
Click Predict — top suggestions appear instantly
Click any suggested word to append it to your text

App features

Clean, mobile-friendly interface
Adjustable number of suggestions (1–5)
Works on any device — no install needed

Next Steps

Train on a larger corpus for better accuracy
Add support for German, Russian, and Finnish
Implement personalized predictions based on user history
Deploy as a mobile keyboard extension

Thank you! Questions welcome.