June 02, 2026

The Problem & The App

Mobile typing is slow. Predictive text — like SwiftKey — speeds it up by guessing the next word.

This project builds a lightweight next-word predictor, served as a Shiny web app and trained on a large English corpus (blogs, news, Twitter).

Try it now: NextWordPredictor on shinyapps.io (https://desanipurvisha.shinyapps.io/predictapp/)

  • Type a phrase in the box → see the top predicted next word
  • Returns top 3 candidates with their frequencies
  • Sub-second response time

How It Works: N-gram Backoff Model

The app uses a simple backoff n-gram model (Katz-style fallback from longer to shorter contexts, but without Katz's probability discounting):

  1. Last 3 words? Look up in quadgram table (4-word sequences)
  2. No match → last 2 words? Look up in trigram table
  3. No match → last word? Look up in bigram table
  4. Still no match? Return the most frequent words in the training corpus

Example (illustrative trace):
predict_next("have a nice")
# Step 1: look up "have a nice" in the quadgram table
# Step 2, only if step 1 finds nothing: look up "a nice" in the trigram table → "day"
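The backoff steps above can be sketched in R as follows. This is a minimal illustration, not the app's actual code: the toy tables `quad_tbl`, `tri_tbl`, `bi_tbl`, the `fallback` words, and this version of `predict_next()` are all hypothetical. Each prefix maps to an already-ranked vector of up to 3 candidates (filled from the example phrases on this page where possible), and frequencies are left out for brevity.

```r
# Hypothetical toy lookup tables: prefix (last n-1 words) -> ranked candidates.
quad_tbl <- list("i want to"      = c("know", "go", "be"),
                 "thanks for the" = c("follow", "help", "great"))
tri_tbl  <- list("a nice" = c("day"))
bi_tbl   <- list("nice" = c("to", "day"))     # made-up values for illustration
fallback <- c("the", "to", "and")             # assumed most frequent words

tables <- list(quad_tbl, tri_tbl, bi_tbl)     # context lengths 3, 2, 1

predict_next <- function(phrase) {
  # Normalize input roughly like the training text: lowercase, letters/apostrophes only
  words <- strsplit(tolower(gsub("[^A-Za-z' ]", " ", phrase)), "\\s+")[[1]]
  words <- words[words != ""]
  for (i in seq_along(tables)) {
    n <- 4 - i                                # 3, then 2, then 1 context words
    if (length(words) < n) next               # phrase too short for this table
    prefix <- paste(tail(words, n), collapse = " ")
    hit <- tables[[i]][[prefix]]              # NULL when the prefix is absent
    if (!is.null(hit)) return(hit)            # first match wins (no discounting)
  }
  fallback                                    # nothing matched at any order
}

predict_next("have a nice")   # no quadgram for "have a nice", trigram "a nice" -> "day"
predict_next("I want to")     # quadgram hit -> "know" "go" "be"
```

Backing off to the first table that matches keeps each prediction to at most three list lookups, which is what makes the approach cheap enough for interactive use.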

Training Data & Model Building

  • Data: ~4 million lines from English blogs, news, and Twitter (HC Corpora)
  • Sample: 5% random sample (~200K lines) for tractability
  • Preprocessing: Lowercased, punctuation/numbers/profanity removed
  • N-grams built: Bigrams, trigrams, quadgrams using quanteda
  • Pruning: Only n-grams appearing ≥ 2 times; top 3 predictions per prefix kept
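The sampling, cleaning, counting, and pruning steps above can be sketched in base R. This is an assumption-laden stand-in rather than the project's pipeline: the real build used quanteda, while here plain `strsplit()` and `table()` do the work, profanity filtering is omitted, and the helpers `sample_lines()`, `clean_text()`, `ngrams_from_line()`, and `build_table()` (plus the seed value) are hypothetical.

```r
# Hypothetical helpers; base R stands in for quanteda in this sketch.
sample_lines <- function(lines, frac = 0.05, seed = 2026) {
  set.seed(seed)                                   # reproducible sample (assumed seed)
  lines[sample.int(length(lines), size = round(frac * length(lines)))]
}

clean_text <- function(x) {
  x <- gsub("[^a-z' ]", " ", tolower(x))           # drop punctuation and numbers
  gsub("\\s+", " ", trimws(x))                     # collapse repeated spaces
}

ngrams_from_line <- function(line, n) {
  w <- strsplit(line, " ", fixed = TRUE)[[1]]
  w <- w[w != ""]
  if (length(w) < n) return(character(0))
  vapply(seq_len(length(w) - n + 1),
         function(i) paste(w[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Returns a list: prefix -> named integer vector of counts for up to top_k next words
build_table <- function(lines, n, min_count = 2, top_k = 3) {
  grams  <- unlist(lapply(clean_text(lines), ngrams_from_line, n = n))
  counts <- table(grams)
  counts <- counts[counts >= min_count]            # prune n-grams seen fewer than 2 times
  if (length(counts) == 0) return(list())
  g      <- names(counts)
  prefix <- sub(" [^ ]+$", "", g)                  # every word except the last
  nxt    <- sub("^.* ", "", g)                     # the last word
  freq   <- as.integer(counts)
  ord    <- order(prefix, -freq)                   # highest count first within a prefix
  lapply(split(ord, prefix[ord]), function(ix) {
    top <- head(ix, top_k)                         # keep the top 3 per prefix
    setNames(freq[top], nxt[top])
  })
}
```

Pruning before splitting into prefix/next-word pairs is what keeps the tables small: most n-grams in a corpus occur only once, so dropping them removes the bulk of the rows.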

Result: The lookup tables fit in under 50 MB and load quickly when the app starts.

App Performance & Usage

Speed: Predictions return in under 200 ms thanks to indexed data.table lookups.
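Fast lookups come from indexing on the prefix rather than scanning every row. The app reportedly uses keyed data.table lookups (in data.table, `setkey()` on the prefix column enables binary-search subsetting); the sketch below swaps in a base R hashed environment to show the same idea without extra packages. `make_index()` and `lookup()` are hypothetical helpers, and the toy entries come from the example phrases.

```r
# Hashed prefix index: base R environments are hash tables when hash = TRUE.
make_index <- function(tbl) list2env(tbl, hash = TRUE)

# Constant-time average lookup; returns NULL for an unseen prefix.
lookup <- function(index, prefix) get0(prefix, envir = index, inherits = FALSE)

toy <- list("a nice" = "day", "i want" = "to", "thanks for" = "the")
idx <- make_index(toy)
lookup(idx, "a nice")     # "day"
lookup(idx, "no such")    # NULL, so the caller backs off to a shorter prefix
```

Either structure (keyed data.table or hashed environment) avoids a full table scan per keystroke, so lookup cost stays roughly flat as the tables grow.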

Accuracy: The correct next word appears among the top 3 predictions for ~25-30% of held-out test phrases, which is typical for n-gram models without a neural component.
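A top-3 accuracy figure like this can be computed by holding out sentences, hiding a word, and checking whether it appears among the first three predictions. The sketch below is one hypothetical way to do it, not necessarily how the project measured accuracy: `make_test_pairs()` only uses the final word of each sentence as the target, and `top_k_accuracy()` accepts any predictor that returns a ranked character vector.

```r
# Hypothetical evaluation helpers.
make_test_pairs <- function(sentences, context_len = 3) {
  pairs <- lapply(sentences, function(s) {
    w <- strsplit(tolower(s), "\\s+")[[1]]
    w <- w[w != ""]
    if (length(w) <= context_len) return(NULL)    # too short to give a full context
    i <- length(w)                                # predict the final word
    data.frame(context = paste(w[(i - context_len):(i - 1)], collapse = " "),
               target  = w[i], stringsAsFactors = FALSE)
  })
  do.call(rbind, pairs)                           # rbind skips NULL entries
}

top_k_accuracy <- function(contexts, targets, predictor, k = 3) {
  hits <- mapply(function(ctx, tgt) tgt %in% head(predictor(ctx), k),
                 contexts, targets)
  mean(hits)                                      # share of targets found in the top k
}

# Usage with a dummy predictor that always suggests the same three words:
dummy <- function(ctx) c("day", "time", "to")
top_k_accuracy(c("have a nice", "see you", "on the"), c("day", "soon", "to"), dummy)
```

Held-out sentences should go through the same cleaning as the training text, otherwise case or punctuation differences count as misses.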

How to use:

  1. Open the app
  2. Type any English phrase in the input box
  3. Click Predict Next Word
  4. See the top predicted word + top 3 candidates with frequencies

Try It & Resources

Live app: https://desanipurvisha.shinyapps.io/predictapp/

Example phrases that work well:

  • “have a nice” → day
  • “I want to” → know / go / be
  • “thanks for the” → follow / help / great

Future improvements:

  • Smoothing (Kneser-Ney) for better rare-word handling
  • Neural model (RNN/Transformer) for context beyond 3 words
  • Personalization based on user typing history

Thank you for reviewing! — Created June 02, 2026