Next Word Predictor
========================================================
author: Tulsidai Singh
date: `r format(Sys.Date(), "%B %d, %Y")`
autosize: true
Every time you type, your phone guesses the next word.
Behind that guess is a language model trained on millions of sentences that learns which words tend to follow others.
This app does exactly that. It was built entirely in R from a 10% sample of the HC Corpora English dataset (~900,000 lines of blogs, news, and tweets).
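As a sketch of the preprocessing step, the frequency tables can be built with tidytext; the file name, column names, and pruning threshold below are illustrative, not the app's exact code.

```r
library(dplyr)
library(tidytext)

# Read a sampled corpus and count trigrams (file/column names are illustrative).
lines_df <- tibble(text = readLines("en_US.sample.txt", warn = FALSE))

trigrams <- lines_df |>
  unnest_tokens(ngram, text, token = "ngrams", n = 3) |>
  filter(!is.na(ngram)) |>           # lines shorter than n yield NA
  count(ngram, sort = TRUE) |>
  filter(n > 1)                      # prune singleton n-grams to shrink the tables
```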
Why does this matter?
The model is a Stupid Backoff n-gram model: fast, interpretable, and effective at this scale.
Why Stupid Backoff?
- No complex smoothing is required.
- Predictions run in milliseconds on pre-built frequency tables.
- Singleton n-grams are pruned to reduce the memory footprint.
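A minimal sketch of the Stupid Backoff scoring logic (Brants et al., 2007). The table layout (w1/w2/w3/count columns) and the helper name predict_next are assumptions for illustration; relative frequencies over the matching rows stand in for exact prefix counts.

```r
# Sketch of Stupid Backoff (table layout and function name are illustrative).
# Score = count(prefix + word) / count(prefix); each backoff step multiplies
# the score by a fixed discount lambda (0.4 in Brants et al., 2007).
predict_next <- function(prefix, trigrams, bigrams, unigrams, lambda = 0.4) {
  words <- tail(strsplit(tolower(prefix), "\\s+")[[1]], 2)

  # Try the trigram table first, matching on the last two words typed.
  if (length(words) == 2) {
    hits <- trigrams[trigrams$w1 == words[1] & trigrams$w2 == words[2], ]
    if (nrow(hits) > 0) {
      hits$score <- hits$count / sum(hits$count)  # relative frequency
      return(head(hits[order(-hits$score), c("w3", "score")], 3))
    }
  }

  # Back off to bigrams on the last word, discounting by lambda.
  hits <- bigrams[bigrams$w1 == tail(words, 1), ]
  if (nrow(hits) > 0) {
    hits$score <- lambda * hits$count / sum(hits$count)
    return(head(hits[order(-hits$score), c("w2", "score")], 3))
  }

  # Final fallback: the most frequent unigrams, discounted twice.
  unigrams$score <- lambda^2 * unigrams$count / sum(unigrams$count)
  head(unigrams[order(-unigrams$score), c("w1", "score")], 3)
}
```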
Corpus: 10% sample of HC Corpora English (~900K lines)
| Metric | Value |
|---|---|
| Vocabulary size | ~150,000 unique words |
| 50% word coverage | 131 unique words |
| 90% word coverage | 6,861 unique words |
| Top bigram | "of the" (26,000+ occurrences) |
| Prediction latency | < 1 second |
The sharp coverage cliff (131 → 6,861 words for 50% → 90%) confirms Zipf’s law and justifies aggressive singleton pruning without meaningful accuracy loss.
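For illustration, these coverage figures come from a cumulative-frequency pass over the sorted unigram counts; a sketch, assuming a word_counts data frame with word and count columns:

```r
# Sketch: how many distinct words are needed to cover a share of all tokens?
# Assumes word_counts is a data frame with `word` and `count` columns.
coverage_words <- function(word_counts, target = 0.5) {
  sorted <- word_counts[order(-word_counts$count), ]
  cum_share <- cumsum(sorted$count) / sum(sorted$count)
  which(cum_share >= target)[1]  # rank of the first word reaching the target
}

# coverage_words(word_counts, 0.5)  # ~131 words on this corpus
# coverage_words(word_counts, 0.9)  # ~6,861 words
```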
Live at tulsidai.shinyapps.io/en_US
Example output for “arctic monkeys this”: **weekend**, **time**, **year**
Key takeaways

Next Word Predictor demonstrates that a lightweight n-gram model built entirely in R can deliver fast, reasonable text predictions with no external dependencies.

Built with R · shiny · tidytext · shinyapps.io
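For completeness, a minimal sketch of how such a predictor could be wired into a Shiny app; the UI layout is illustrative, and predict_next refers to the hypothetical scorer sketched earlier:

```r
library(shiny)

ui <- fluidPage(
  textInput("phrase", "Type a phrase:"),
  textOutput("prediction")
)

server <- function(input, output) {
  output$prediction <- renderText({
    req(input$phrase)
    # predict_next() and the frequency tables are the sketches shown earlier.
    top <- predict_next(input$phrase, trigrams, bigrams, unigrams)
    paste(top[[1]], collapse = ", ")
  })
}

shinyApp(ui, server)
```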