Next Word Predictor

Anam Shaikh
25th June, 2026

Slide 1: The Problem

Typing is slow. Prediction makes it faster.

Mobile users type millions of words daily
Auto-complete is expected in every modern keyboard
A smart next-word predictor saves time and reduces errors

Our Solution:

A lightweight, fast, N-gram language model trained on
real-world English text from blogs, news, and Twitter.

Slide 2: The Data & Model

Training Data (Coursera SwiftKey Corpus)

Source	Lines	Sampled (5%)
Blogs	899,288	~45,000
News	1,010,242	~50,500
Twitter	2,360,148	~118,000
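
The 5% sample could be drawn along the following lines. This is a minimal Python sketch for illustration only, not the app's own R code; the function name sample_lines, the seed, and the stand-in corpus are assumptions, and the slides do not say whether sampling was per-line random or fixed-size.

```python
import random

def sample_lines(lines, fraction=0.05, seed=42):
    """Keep each line independently with probability `fraction` (assumed scheme)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [line for line in lines if rng.random() < fraction]

# Hypothetical stand-in for one corpus file; the real files are far larger.
corpus = [f"line {i}" for i in range(10_000)]
sample = sample_lines(corpus)
print(len(sample))  # close to 500, the exact count depends on the seed
```

Fixing the seed keeps the sampled subset identical across runs, which makes the resulting frequency tables reproducible.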

Algorithm: Stupid Backoff N-gram Model

Clean and tokenize text → build frequency tables for 1- to 4-grams
Given the input, match its last three words against the quadgram table first
If not found → back off to trigram (last two words)
If not found → back off to bigram (last word)
If not found → return the most frequent unigram
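
The lookup cascade above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the app's R implementation: the names tokenize, build_tables, and predict, the simple regex cleaner, and the toy sentences are all hypothetical.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def build_tables(sentences, max_n=4):
    """For n = 1..max_n, map each (n-1)-word context to a Counter of next words."""
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])  # empty tuple for unigrams
                tables[n][context][tokens[i + n - 1]] += 1
    return tables

def predict(tables, phrase, max_n=4):
    """Return (word, n), where n is the N-gram level that produced the answer."""
    tokens = tokenize(phrase)
    for n in range(max_n, 1, -1):
        if len(tokens) < n - 1:
            continue  # not enough input words for this level
        followers = tables[n].get(tuple(tokens[-(n - 1):]))
        if followers:
            return followers.most_common(1)[0][0], n
    unigrams = tables[1].get(())
    if unigrams:
        return unigrams.most_common(1)[0][0], 1  # most frequent word overall
    return None, 0  # empty model

# Toy training text (hypothetical).
sentences = [
    "thanks for the follow",
    "thanks for the follow back",
    "thanks for the help",
    "happy to see the follow",
    "the cat sat on the mat",
]
tables = build_tables(sentences)
print(predict(tables, "Thanks for the"))  # ('follow', 4)
print(predict(tables, "we see the"))      # ('follow', 3)
print(predict(tables, "under the"))       # ('follow', 2)
print(predict(tables, "hello world"))     # ('the', 1)
```

Returning the level alongside the word is a design choice that makes it easy to report which N-gram table answered each query.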

Slide 3: Algorithm Performance

Why Stupid Backoff?

Scores are raw relative frequencies, so no probability normalization is needed → very fast
Simple to implement and scale
Handles unseen phrases gracefully via backoff
Memory efficient — tables stored as data.table objects
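
To make the "no normalization" point concrete, here is a hedged Python sketch of Stupid Backoff scoring as described in the original paper by Brants et al., where each backoff step multiplies the score by a fixed factor of 0.4. Whether this app applies that factor is not stated in the slides; the names ngram_counts and score and the toy text are assumptions.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor from the original Stupid Backoff paper

def ngram_counts(tokens, max_n=3):
    """Count every n-gram of length 1..max_n, keyed by tuple."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def score(word, context, counts, total):
    """Relative frequency at the longest seen context, times ALPHA per backoff.

    `context` should hold at most max_n - 1 words.
    """
    penalty = 1.0
    context = tuple(context)
    while context:
        seen = counts[context + (word,)]
        if seen > 0:
            return penalty * seen / counts[context]
        context = context[1:]  # drop the oldest word and back off
        penalty *= ALPHA
    return penalty * counts[(word,)] / total  # unigram relative frequency

# Toy text (hypothetical).
tokens = "the cat sat on the mat the cat ran".split()
counts = ngram_counts(tokens)
total = len(tokens)
for candidate in ("mat", "cat", "ran"):
    print(candidate, round(score(candidate, ("on", "the"), counts, total), 4))
```

The scores for different candidates do not sum to 1, which is exactly why no normalization pass over the vocabulary is needed; only the ranking matters for prediction.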

Speed & Coverage

Metric	Value
Prediction time	< 100 ms
Quadgram coverage	~42%
Trigram coverage	~31%
Bigram coverage	~22%
Unigram fallback	~5%
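
Coverage figures of this kind can be computed by logging which N-gram level answered each test phrase and tallying the results. The Python sketch below is one way to do that under assumptions: the function name coverage and the 20-entry log are hypothetical and are not the data behind the numbers above.

```python
from collections import Counter

def coverage(levels_used):
    """Percentage of predictions answered at each N-gram level (4 down to 1)."""
    tally = Counter(levels_used)
    total = len(levels_used)
    if total == 0:
        return {n: 0.0 for n in (4, 3, 2, 1)}
    return {n: 100 * tally[n] / total for n in (4, 3, 2, 1)}

# Hypothetical log: the level that answered each of 20 test phrases.
log = [4] * 8 + [3] * 6 + [2] * 5 + [1] * 1
print(coverage(log))  # {4: 40.0, 3: 30.0, 2: 25.0, 1: 5.0}
```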

Slide 4: The Shiny App

Live at: https://YOUR_ACCOUNT.shinyapps.io/NextWordApp/

Features:

Text input box — type any English phrase
One-click prediction with “Predict Next Word” button
Predicted word highlighted in green inline with your phrase
Algorithm transparency panel showing N-gram level used

Instructions:

Type a partial sentence in the text box
Click “Predict Next Word”
See the predicted word highlighted in context
Adjust your phrase and predict again

Slide 5: Conclusion & Future Work

Key Takeaways

Fast, accurate N-gram predictor with Stupid Backoff
Trained on diverse real-world English corpora
Clean, intuitive Shiny interface
Deployed and accessible to anyone online

Future Improvements

Add top-3 word suggestions (not just top-1)
Incorporate Kneser-Ney smoothing for better accuracy
Add support for other languages
Use a neural language model (LSTM/Transformer) for harder cases
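
As an illustration of the first item, top-3 suggestions only require keeping more than one candidate from the matched frequency table. A minimal Python sketch, assuming follower counts are held in a Counter; the function name top_k and the counts shown are hypothetical:

```python
from collections import Counter

def top_k(followers, k=3):
    """Return up to k candidate words, most frequent first."""
    return [word for word, _ in followers.most_common(k)]

# Hypothetical follower counts for the context "thanks for the".
followers = Counter({"follow": 12, "help": 7, "support": 5, "update": 2})
print(top_k(followers))  # ['follow', 'help', 'support']
```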

Thank You!