Next Word Predictor

Tanmay Padave
2026-06-14

Data Science Capstone — Johns Hopkins University

A smart, fast, and accurate next-word prediction app
built using N-gram language models trained on over 4 million lines of text.

Slide 2: The Problem & Solution

The Problem

  • Typing on mobile is slow and error-prone
  • Users need intelligent, real-time word suggestions

Our Solution

  • A next-word prediction app powered by N-gram language models
  • Trained on over 4 million lines of real English text (Twitter, Blogs, News)
  • Returns top predictions in under 1 second

Dataset   Lines   Source
Twitter   2.36M   Social media
Blogs     899K    Long-form writing
News      1.01M   Formal news

Slide 3: How the Model Works

N-gram Backoff Algorithm

  1. Clean and tokenize user input
  2. Look up the last 2 words in the Trigram table → return top matches
  3. If no match → back off to the Bigram table, using only the last word
  4. If still no match → return the most common Unigrams

User types:  "I love the"
             ↓
Trigram lookup: "love the" → [way, most, best, ...]
             ↓
Returns:     "way", "most", "best"
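
The backoff steps above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the function names (tokenize, build_tables, predict), the regex-based cleaning, and the toy corpus are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Simplified stand-in for the cleaning step: lowercase, keep letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def build_tables(lines):
    # Count unigrams, plus next-word counts keyed by the previous 1 or 2 words.
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    trigrams = defaultdict(Counter)
    for line in lines:
        words = tokenize(line)
        unigrams.update(words)
        for a, b in zip(words, words[1:]):
            bigrams[a][b] += 1
        for a, b, c in zip(words, words[1:], words[2:]):
            trigrams[(a, b)][c] += 1
    return unigrams, bigrams, trigrams


def predict(text, tables, k=3):
    unigrams, bigrams, trigrams = tables
    words = tokenize(text)
    # Step 2: try the last two words in the trigram table.
    if len(words) >= 2 and tuple(words[-2:]) in trigrams:
        return [w for w, _ in trigrams[tuple(words[-2:])].most_common(k)]
    # Step 3: back off to the last word in the bigram table.
    if words and words[-1] in bigrams:
        return [w for w, _ in bigrams[words[-1]].most_common(k)]
    # Step 4: fall back to the most common unigrams.
    return [w for w, _ in unigrams.most_common(k)]


# Toy corpus, made up for illustration only.
toy_lines = [
    "I love the way you smile",
    "I love the way it works",
    "I love the best parts",
    "the most common words",
]
tables = build_tables(toy_lines)
print(predict("I love the", tables))
```

Keying each table by its context words keeps every lookup to a single dictionary access, which is one plausible way to stay within a sub-second response budget.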

Why Stupid Backoff?

  • Faster than Kneser-Ney smoothing
  • Accuracy within 5% of more complex models
  • Ideal for real-time applications
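
To show why Stupid Backoff is cheap, here is a hedged sketch of its scoring rule: use the raw relative frequency of the longest matching n-gram, and multiply by a fixed factor each time the context is shortened. The factor 0.4 is the value commonly used with this method and is an assumption here, as are the helper names ngram_counts and score; the slides do not state the app's own settings.

```python
from collections import Counter

ALPHA = 0.4  # assumed backoff factor; not confirmed by the slides


def ngram_counts(tokens, max_n=3):
    # Count every n-gram of length 1..max_n as a tuple of words.
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(word, context, counts, total):
    # Stupid Backoff score of `word` following `context` (a tuple of words).
    penalty = 1.0
    context = tuple(context)
    while context:
        full = counts[context + (word,)]
        if full > 0:
            # Relative frequency of the n-gram, with no smoothing or normalization.
            return penalty * full / counts[context]
        # Drop the oldest context word and apply the backoff factor.
        context = context[1:]
        penalty *= ALPHA
    # No context left: unigram relative frequency.
    return penalty * counts[(word,)] / total


# Toy token stream, made up for illustration only.
tokens = "i love the way i love the best i like the way".split()
counts = ngram_counts(tokens)
print(score("way", ("love", "the"), counts, len(tokens)))
```

The scores are not true probabilities, since they do not sum to 1, but ranking candidates only requires comparing them, so the discount computations that Kneser-Ney needs are skipped.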

Slide 4: Performance

Accuracy on held-out test set (10% of corpus)

Metric              Performance
Top-1 Accuracy      32%
Top-3 Accuracy      60%
Top-5 Accuracy      74%
Avg Response Time   < 1 second
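
For readers unfamiliar with the metric, Top-k accuracy is the share of test cases where the true next word appears among the first k suggestions. The sketch below shows one way to compute it. The function name top_k_accuracy, the stub predictor, and the four test pairs are hypothetical and are not the evaluation data behind the figures above.

```python
def top_k_accuracy(examples, predict_fn, k):
    # examples: list of (context, true_next_word) pairs.
    # predict_fn: maps a context string to a ranked list of suggestions.
    if not examples:
        return 0.0
    hits = sum(1 for context, target in examples if target in predict_fn(context)[:k])
    return hits / len(examples)


# Stub predictor and held-out pairs, made up for illustration only.
stub_predictions = {
    "i love the": ["way", "most", "best"],
    "thanks for the": ["follow", "help", "info"],
    "see you": ["later", "there", "tomorrow"],
    "happy": ["new", "birthday", "to"],
}
examples = [
    ("i love the", "way"),
    ("thanks for the", "help"),
    ("see you", "soon"),
    ("happy", "birthday"),
]


def stub_predict(context):
    return stub_predictions.get(context, [])


for k in (1, 3):
    print(k, top_k_accuracy(examples, stub_predict, k))
```

Top-k accuracy can only rise as k grows, which matches the pattern in the table: more suggestions give the true word more chances to appear.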

Memory & Speed

  • N-gram tables compressed to < 50 MB
  • Handles out-of-vocabulary words gracefully via backoff
  • Tested on over 500,000 word sequences

Benchmark vs alternatives

Model                Top-3 Accuracy   Speed
Our N-gram backoff   60%              < 1s
Unigram only         18%              < 1s
No prediction        0%

Slide 5: The App & Next Steps

How to use the app

  1. Go to shinyapps.io/nextwordpredictor
  2. Type any phrase in the text box
  3. Click Predict — top suggestions appear instantly
  4. Click any suggested word to append it to your text

App features

  • Clean, mobile-friendly interface
  • Adjustable number of suggestions (1–5)
  • Works on any device — no install needed

Next Steps

  • Train on larger corpus for better accuracy
  • Add support for German, Russian, Finnish
  • Implement personalized predictions based on user history
  • Deploy as a mobile keyboard extension

Thank you! Questions welcome.