Next-Word Predictor

author: Rafael Del Grande
date: `r Sys.Date()`
autosize: true
css: custom.css

A keyboard-style text prediction app built with N-gram language models and the SwiftKey corpus.

How It Works

Input: The user types any text phrase.

The model searches 3 tables in order:

1. Trigram table: looks up the last 2 words as a prefix
2. Bigram table: backs off to the last word if the trigram lookup finds no match
3. Unigram fallback: returns the most common words overall

This is called Stupid Backoff (Brants et al., 2007):

\[S(w \mid \text{context}) = \begin{cases} \frac{f(\text{context},\,w)}{f(\text{context})} & \text{if seen} \\ 0.4 \times S(w \mid \text{shorter context}) & \text{otherwise} \end{cases}\]

Output: Top 3 predicted next words, displayed as clickable buttons.
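
The lookup order and the 0.4 discount can be sketched in base R as follows. This is a minimal illustration, not the app's actual code: the count tables, the `score_table()` and `predict_next()` helpers, and all counts are hypothetical, and the sketch stops at the first table that contains the prefix instead of scoring every candidate word at every level.

```r
# Toy n-gram count tables (hypothetical counts, for illustration only).
# Prefixes are words joined by "_".
trigrams <- data.frame(
  prefix = c("want_to", "want_to", "want_to"),
  word   = c("go", "be", "see"),
  n      = c(6, 3, 1),
  stringsAsFactors = FALSE
)
bigrams <- data.frame(
  prefix = c("go", "go", "go", "go"),
  word   = c("to", "the", "back", "home"),
  n      = c(5, 3, 1, 1),
  stringsAsFactors = FALSE
)
unigrams <- data.frame(
  word = c("the", "to", "and", "a"),
  n    = c(50, 30, 15, 5),
  stringsAsFactors = FALSE
)

# Relative frequency of each continuation of `key`, scaled by `discount`.
# The sum of continuation counts stands in for f(context).
# Returns NULL when the prefix was never seen.
score_table <- function(tbl, key, discount) {
  hits <- tbl[tbl$prefix == key, , drop = FALSE]
  if (nrow(hits) == 0) return(NULL)
  hits$score <- discount * hits$n / sum(hits$n)
  hits[order(-hits$score), c("word", "score")]
}

predict_next <- function(text, k = 3, lambda = 0.4) {
  words <- strsplit(tolower(trimws(text)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  n <- length(words)
  out <- NULL
  # 1. Trigram table: last 2 words as prefix, no discount.
  if (n >= 2) {
    out <- score_table(trigrams, paste(words[(n - 1):n], collapse = "_"), 1)
  }
  # 2. Bigram table: last word as prefix, discounted once.
  if (is.null(out) && n >= 1) {
    out <- score_table(bigrams, words[n], lambda)
  }
  # 3. Unigram fallback: most common words, discounted twice.
  if (is.null(out)) {
    out <- data.frame(word  = unigrams$word,
                      score = lambda^2 * unigrams$n / sum(unigrams$n),
                      stringsAsFactors = FALSE)
    out <- out[order(-out$score), ]
  }
  head(out, k)
}

predict_next("I want to go")
```

With these toy counts the prefix "to_go" is absent from the trigram table, so the call backs off to the bigram prefix "go" and returns "to", "the", and "back" with scores 0.2, 0.12, and 0.04.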

Model Performance

left: 50%

Accuracy on held-out test set:

| Metric | Score |
|---|---|
| Top-1 Accuracy | ~15% |
| Top-2 Accuracy | ~20% |
| Top-3 Accuracy | ~25% |
| Avg prediction time | < 5 ms |
| Bigram perplexity | see report |
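
Top-k accuracy counts a test case as correct when the true next word appears among the first k suggestions. One way to compute it is sketched below. The `top_k_accuracy()` helper and the four test cases are hypothetical and are not taken from the report.

```r
# preds: list of character vectors, each ranked best-first.
# truth: character vector holding the actual next word for each test case.
top_k_accuracy <- function(preds, truth, k) {
  stopifnot(length(preds) == length(truth))
  hits <- mapply(function(p, w) w %in% head(p, k), preds, truth)
  mean(hits)
}

# Hypothetical predictions for four test contexts (not real model output).
preds <- list(
  c("to", "the", "back"),
  c("a", "the", "is"),
  c("you", "me", "it"),
  c("of", "in", "and")
)
truth <- c("to", "the", "them", "and")

sapply(1:3, function(k) top_k_accuracy(preds, truth, k))
# 0.25 0.50 0.75
```

Because the top-k list always contains the top-(k-1) list, accuracy can only stay flat or rise as k grows, which matches the ordering of the Top-1, Top-2, and Top-3 rows.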

Size vs. accuracy tradeoff:

```{r tradeoff_plot, echo=FALSE, fig.width=5, fig.height=4}
# Model size and Top-1 accuracy at each pruning threshold
df <- data.frame(
  min_freq = c(2, 3, 5),
  size_mb  = c(8.0, 4.5, 2.1),
  top1_acc = c(15.0, 14.2, 12.8)
)

# Stack the two panels vertically
par(mfrow = c(2, 1), mar = c(3, 3, 2, 1))

plot(df$min_freq, df$size_mb, type = "b", pch = 19, col = "steelblue",
     xlab = "Min Frequency", ylab = "Size (MB)", main = "Model Size")

plot(df$min_freq, df$top1_acc, type = "b", pch = 19, col = "darkgreen",
     xlab = "Min Frequency", ylab = "Top-1 Acc (%)", main = "Accuracy")

# Restore the default single-panel layout
par(mfrow = c(1, 1))
```

Pruning to `min_freq = 3` cuts model size by ~44% (8.0 MB to 4.5 MB) while Top-1 accuracy drops by less than 1 percentage point (15.0% to 14.2%).


The App: How It Works
========================================================

**Try it:** <https://rdelgrande.shinyapps.io/shiny_app/>

```
User types: "I want to go"
        ↓
Trigram lookup: "to_go" → no match
Bigram lookup:  "go"    → [ "to", "the", "back" ] ✓
        ↓
App shows 3 buttons:  [ to ]  [ the ]  [ back ]
        ↓
User clicks "the" → input becomes "I want to go the"
        ↓
Next prediction updates instantly
```

- Predictions update on every keystroke
- Clicking a suggestion appends it to your text
- Works for any English input: when the last words are unknown, the model backs off to the bigram or unigram table
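
The click-to-append step and the prefix extraction behind each lookup can be sketched as two small base R helpers. Both `append_word()` and `context_key()` are hypothetical names, and lowercasing the input before lookup is an assumption about the preprocessing. The underscore-joined key format follows the flow shown above.

```r
# Append a clicked suggestion to the current input with single spacing.
append_word <- function(text, word) {
  trimws(paste(trimws(text), word))
}

# Build the lookup key from the last `n` words of the input
# (n = 2 for the trigram table, n = 1 for the bigram table).
# Returns NA when the input has fewer than `n` words.
context_key <- function(text, n) {
  words <- strsplit(tolower(trimws(text)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(NA_character_)
  paste(tail(words, n), collapse = "_")
}

input <- append_word("I want to go", "the")
input                  # "I want to go the"
context_key(input, 2)  # "go_the"
context_key(input, 1)  # "the"
```

In a Shiny app, helpers like these would typically be called from an observer on each suggestion button, so that the text input and the next set of predictions update together.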

Why This App?

The problem: Typing on mobile is slow and error-prone.

The solution: Real-time next-word prediction — just like your phone keyboard.

Key advantages of this model:

- ⚡ Fast: predictions return in < 5 ms on average, too quick for users to notice
- 🪶 Lightweight: the entire model fits in ~5 MB of RAM
- 🔄 Robust: the unigram fallback guarantees a prediction even when no trigram or bigram matches
- 📱 Deployable: runs on shinyapps.io with no special hardware

Built with:

- quanteda for tokenization and n-gram construction
- data.table for fast keyed prefix lookups (binary search, O(log n))
- shiny for the interactive web interface
- Training corpus: 800,000+ lines from blogs, news, and Twitter
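
A keyed prefix lookup in data.table might look like the sketch below. The table contents, column names, and the `lookup()` helper are hypothetical. The sketch only illustrates how `setkey()` lets a subset use binary search on the sorted key column instead of scanning every row.

```r
library(data.table)

# Hypothetical bigram table: prefix word, predicted word, and count.
bigrams <- data.table(
  prefix = c("go", "go", "go", "the", "the"),
  word   = c("to", "the", "back", "same", "first"),
  n      = c(5L, 3L, 1L, 4L, 2L)
)

# Sort by prefix and mark it as the key, enabling binary-search subsetting.
setkey(bigrams, prefix)

# Return up to `k` continuations of `ctx`, most frequent first.
lookup <- function(tbl, ctx, k = 3) {
  hits <- tbl[.(ctx), nomatch = NULL]  # keyed subset; zero rows if unseen
  head(hits[order(-n), word], k)
}

lookup(bigrams, "go")     # "to" "the" "back"
lookup(bigrams, "zebra")  # character(0)
```

An empty result for an unseen prefix is the signal to back off to the next, shorter context.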