Next Word Predictor

Anam Shaikh
25th June, 2026

Slide 1: The Problem

Typing is slow. Prediction makes it faster.

  • Mobile users type millions of words daily
  • Auto-complete is expected in every modern keyboard
  • A smart next-word predictor saves time and reduces errors

    Our Solution:

    A lightweight, fast N-gram language model trained on
    real-world English text from blogs, news, and Twitter.

Slide 2: The Data & Model

Training Data (Coursera SwiftKey Corpus)

Source     Total lines   Lines sampled (5%)
Blogs          899,288              ~45,000
News         1,010,242              ~50,500
Twitter      2,360,148             ~118,000
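The "~" in the sampled counts fits per-line random sampling, where each line is kept independently with probability 0.05 and the sample size only approximates 5%. The deck does not say how its sample was drawn, so the Python sketch below is an assumption, not the app's R code. The helper names, the seed 1234, and the file name en_US.blogs.txt used in the usage note are illustrative.

```python
import random


def sample_lines(lines, rate=0.05, seed=1234):
    """Keep each line independently with probability `rate` (Bernoulli sampling).

    A fixed seed makes the sample reproducible. Trailing newlines are stripped.
    """
    rng = random.Random(seed)
    return [line.rstrip("\n") for line in lines if rng.random() < rate]


def sample_file(path, rate=0.05, seed=1234):
    """Sample lines from a UTF-8 text file, skipping undecodable bytes."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return sample_lines(f, rate, seed)
```

For example, `sample_file("en_US.blogs.txt")` would return roughly 45,000 of the 899,288 blog lines.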

Algorithm: Stupid Backoff N-gram Model

  1. Clean & tokenize text → build 1- to 4-gram frequency tables
  2. Match the last three input words against the quadgram table; predict the most frequent next word
  3. If no match → back off to the trigram table (last two words)
  4. If no match → back off to the bigram table (last word)
  5. If no match → return the most frequent unigram
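The five steps above can be sketched in Python as follows. This is an illustrative sketch, not the deployed R/data.table code. It assumes a simple tokenizer that keeps lowercase letters and apostrophes, stores each table as a dict mapping a context tuple to a `Counter` of next words, and breaks ties by first occurrence. It follows the "first level with a match wins" rule described in the steps. Full Stupid Backoff would instead score candidates from every level and discount lower-order ones.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a simplified cleaning step)."""
    return re.findall(r"[a-z']+", text.lower())


def build_tables(sentences, max_n=4):
    """Build n-gram tables for n = 2..max_n plus unigram counts.

    tables[n] maps a context tuple of n-1 words to a Counter of the words
    that followed it in the training text.
    """
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    unigrams = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables, unigrams


def predict(text, tables, unigrams, max_n=4):
    """Return (word, n): the prediction and the n-gram level that produced it."""
    tokens = tokenize(text)
    for n in range(max_n, 1, -1):          # quadgram → trigram → bigram
        if len(tokens) < n - 1:
            continue                         # input too short for this level
        context = tuple(tokens[-(n - 1):])  # last n-1 words of the input
        followers = tables[n].get(context)  # .get avoids inserting empty keys
        if followers:
            return followers.most_common(1)[0][0], n
    # Final fallback: most frequent word overall
    return (unigrams.most_common(1)[0][0] if unigrams else None), 1
```

Because `predict` returns the level it used, the same value can drive the app's transparency panel. Replacing `most_common(1)` with `most_common(3)` would give the top-3 suggestions listed under future work.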

Slide 3: Algorithm Performance

Why Stupid Backoff?

  • No need to normalize probabilities → very fast
  • Simple to implement and scale
  • Handles unseen phrases gracefully via backoff
  • Memory efficient — tables stored as data.table objects
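The first bullet rests on how Stupid Backoff scores a candidate word. As defined in the original Stupid Backoff paper (Brants et al., 2007), the score uses a relative frequency when the full n-gram was seen. Otherwise it uses a fixed multiple $\alpha$ of the lower-order score, with $\alpha = 0.4$ suggested there. The deck does not state which $\alpha$, if any, the app uses. Scores are not required to sum to 1, so no normalization pass is needed:

$$
S(w_i \mid w_{i-k+1}^{i-1}) =
\begin{cases}
\dfrac{c(w_{i-k+1}^{i})}{c(w_{i-k+1}^{i-1})} & \text{if } c(w_{i-k+1}^{i}) > 0 \\[1ex]
\alpha \, S(w_i \mid w_{i-k+2}^{i-1}) & \text{otherwise}
\end{cases}
\qquad
S(w_i) = \frac{c(w_i)}{N}
$$

Here $c(\cdot)$ is a count from the frequency tables and $N$ is the total number of unigram tokens.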

    Speed & Coverage

    Metric                Value
    Prediction time       < 100 ms
    Quadgram coverage     ~42%
    Trigram coverage      ~31%
    Bigram coverage       ~22%
    Unigram fallback      ~5%

    Coverage = share of predictions answered at that n-gram level (the four levels sum to 100%).
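A table like the one above can be produced by recording which n-gram level answers each test phrase and timing each call. The deck does not describe its evaluation set or timing method, so the Python sketch below is one plausible approach. It assumes a hypothetical `predict_fn` that maps a phrase to a (word, level) pair.

```python
import time
from collections import Counter


def coverage_and_latency(predict_fn, phrases):
    """Return ({level: share of phrases}, worst-case latency in milliseconds).

    `predict_fn` is any callable mapping a phrase to (word, level), where
    level is the n-gram order (4, 3, 2 or 1) that produced the prediction.
    """
    if not phrases:
        return {}, 0.0
    levels = Counter()
    worst_ms = 0.0
    for phrase in phrases:
        start = time.perf_counter()
        _, level = predict_fn(phrase)
        elapsed_ms = (time.perf_counter() - start) * 1000
        worst_ms = max(worst_ms, elapsed_ms)
        levels[level] += 1
    shares = {level: count / len(phrases) for level, count in levels.items()}
    return shares, worst_ms
```

Reporting the worst case rather than the mean matches a "< 100 ms" claim, which is a bound on every prediction.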

Slide 4: The Shiny App

Live at: https://YOUR_ACCOUNT.shinyapps.io/NextWordApp/

Features:

  • Text input box — type any English phrase
  • One-click prediction with the “Predict Next Word” button
  • Predicted word highlighted in green inline with your phrase
  • Algorithm transparency panel showing which N-gram level produced the prediction

    Instructions:

  1. Type a partial sentence in the text box
  2. Click “Predict Next Word”
  3. See the predicted word highlighted in context
  4. Adjust your phrase and predict again

Slide 5: Conclusion & Future Work

Key Takeaways

  • Fast (< 100 ms per prediction) N-gram predictor with Stupid Backoff
  • Trained on diverse real-world English corpora
  • Clean, intuitive Shiny interface
  • Deployed and accessible to anyone online

Future Improvements

  • Add top-3 word suggestions (not just top-1)
  • Incorporate Kneser-Ney smoothing for better accuracy
  • Add support for other languages
  • Use a neural language model (LSTM/Transformer) for harder cases

Thank You!