Next-Word Predictor
========================================================
author: Rafael Del Grande
date: `r Sys.Date()`
autosize: true
css: custom.css

A keyboard-style text prediction app built with N-gram language models and the SwiftKey corpus.

How It Works
========================================================

Input: The user types any text phrase.

The model searches 3 tables in order:

  1. Trigram table — looks up the last 2 words as a prefix
  2. Bigram table — backs off to the last 1 word (if no trigram match)
  3. Unigram fallback — returns the most common words overall

This is called Stupid Backoff (Brants et al., 2007):

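\[S(w \mid \text{context}) = \begin{cases} \frac{f(\text{context},\,w)}{f(\text{context})} & \text{if } f(\text{context},\,w) > 0 \\ 0.4 \times S(w \mid \text{shorter context}) & \text{otherwise} \end{cases}\]

Here \(f(\cdot)\) is a corpus count, and the recursion ends at the unigram level with \(S(w) = f(w)/N\), where \(N\) is the total number of tokens. \(S\) is a ranking score, not a normalized probability.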
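A minimal R sketch of the score above, assuming all n-gram counts are stored in one named vector keyed by `_`-joined words. The names `counts`, `n_tokens`, `count_of`, and `sb_score` and the toy counts are hypothetical, not taken from the app. Dropping the first (oldest) word of `context` plays the role of the "shorter context".

```r
# Hypothetical toy n-gram counts, all orders in one vector keyed by "_"-joined words
counts <- c(
  "the" = 50, "cat" = 10, "sat" = 8, "on" = 20,
  "the_cat" = 6, "cat_sat" = 4, "sat_on" = 5,
  "the_cat_sat" = 3
)
n_tokens <- 200  # N: total tokens in the toy corpus (assumed)

# f(words): count of an n-gram, 0 if it was never seen
count_of <- function(words) {
  key <- paste(words, collapse = "_")
  if (key %in% names(counts)) counts[[key]] else 0
}

# Stupid Backoff score S(w | context), alpha = 0.4
sb_score <- function(w, context, alpha = 0.4) {
  if (length(context) == 0) {
    return(count_of(w) / n_tokens)          # unigram base case: f(w) / N
  }
  num <- count_of(c(context, w))
  if (num > 0) {
    return(num / count_of(context))         # seen: relative frequency
  }
  alpha * sb_score(w, context[-1], alpha)   # unseen: drop the oldest word, back off
}

sb_score("sat", c("the", "cat"))  # 3 / 6 = 0.5 (trigram seen)
sb_score("on",  c("the", "cat"))  # 0.4 * 0.4 * (20 / 200) = 0.016
```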

Output: Top 3 predicted next words, displayed as clickable buttons.
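The three-table search and the top-3 output can be sketched in R as follows. The tables are hypothetical toy lists mapping a `_`-joined prefix to candidates already ranked by frequency, and `predict_next` is an illustrative name. This version returns candidates from the first table that matches and does not compute backoff scores, so treat it as an illustration of the search order rather than the app's code.

```r
# Hypothetical toy tables: prefix -> candidate next words, most frequent first
trigram_tbl <- list(
  "want_to" = c("be", "go", "see"),
  "at_the"  = c("end", "same", "moment")
)
bigram_tbl <- list(
  "the" = c("best", "first", "same"),
  "to"  = c("be", "the", "get")
)
unigram_top <- c("the", "to", "and")  # most common words overall

# Return up to k candidates from the first table whose prefix matches.
# `words` is the already tokenized, lower-cased input.
predict_next <- function(words, k = 3) {
  n <- length(words)
  if (n >= 2) {                                    # 1. trigram: last two words
    hit <- trigram_tbl[[paste(words[n - 1], words[n], sep = "_")]]
    if (!is.null(hit)) return(head(hit, k))
  }
  if (n >= 1) {                                    # 2. bigram: last word only
    hit <- bigram_tbl[[words[n]]]
    if (!is.null(hit)) return(head(hit, k))
  }
  head(unigram_top, k)                             # 3. unigram fallback
}

predict_next(c("i", "want", "to"))  # trigram hit: "be" "go" "see"
predict_next(c("go", "to", "the"))  # backs off to bigram "the": "best" "first" "same"
predict_next(c("hello", "zzz"))     # unigram fallback: "the" "to" "and"
```

A plain named list is looked up by scanning its names; for corpus-sized tables, an environment created with `new.env(hash = TRUE)` gives hashed lookups instead.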

Model Performance
========================================================
left: 50%

Accuracy on held-out test set:

| Metric              | Score      |
|---------------------|------------|
| Top-1 Accuracy      | ~15%       |
| Top-2 Accuracy      | ~20%       |
| Top-3 Accuracy      | ~25%       |
| Avg prediction time | < 5 ms     |
| Bigram perplexity   | see report |
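Top-k accuracy counts a test case as correct when the true next word appears anywhere in the first k predictions. A short sketch of the computation, using made-up predictions rather than the held-out data behind the table (`top_k_accuracy` is a hypothetical helper):

```r
# Share of test cases whose true next word is among the first k predictions.
# `preds`: list of ranked candidate vectors; `truth`: the actual next words.
top_k_accuracy <- function(preds, truth, k) {
  hits <- mapply(function(p, t) t %in% head(p, k), preds, truth)
  mean(hits)
}

# Made-up example with four test cases (not the report's test set)
preds <- list(
  c("the", "a", "to"),
  c("to", "back", "home"),
  c("be", "go", "see"),
  c("and", "of", "in")
)
truth <- c("the", "back", "see", "for")

sapply(1:3, function(k) top_k_accuracy(preds, truth, k))
# [1] 0.25 0.50 0.75
```

By construction, top-k accuracy can only stay the same or rise as k grows, since every top-(k-1) hit is also a top-k hit.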

Size vs. accuracy tradeoff:

```{r tradeoff_plot, echo=FALSE, fig.width=5, fig.height=4}
# Model size and top-1 accuracy at each pruning threshold
df <- data.frame(
  min_freq = c(2, 3, 5),
  size_mb  = c(8.0, 4.5, 2.1),
  top1_acc = c(15.0, 14.2, 12.8)
)

# Two stacked panels; mgp pulls axis titles inside the narrow margins.
# Save the old settings so they can be restored afterwards.
op <- par(mfrow = c(2, 1), mar = c(3, 3, 2, 1), mgp = c(1.8, 0.6, 0))

plot(df$min_freq, df$size_mb, type = "b", pch = 19, col = "steelblue",
     xlab = "Min Frequency", ylab = "Size (MB)", main = "Model Size")

plot(df$min_freq, df$top1_acc, type = "b", pch = 19, col = "darkgreen",
     xlab = "Min Frequency", ylab = "Top-1 Acc (%)", main = "Accuracy")

par(op)
```

Pruning to `min_freq = 3` cuts model size by ~44% (8.0 MB → 4.5 MB) at a cost of less than 1 percentage point of top-1 accuracy (15.0% → 14.2%).
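Pruning by minimum frequency is a row filter on each count table. A sketch, assuming a hypothetical trigram table stored as a data frame with `prefix`, `word`, and `freq` columns (`prune_ngrams` is an illustrative name):

```r
# Hypothetical trigram counts; a real table would hold far more rows
trigrams <- data.frame(
  prefix = c("want_to", "want_to", "want_to", "to_go", "to_go"),
  word   = c("be", "go", "eat", "to", "wherever"),
  freq   = c(120, 45, 2, 300, 1),
  stringsAsFactors = FALSE
)

# Keep only n-grams seen at least min_freq times
prune_ngrams <- function(tbl, min_freq) {
  tbl[tbl$freq >= min_freq, , drop = FALSE]
}

pruned <- prune_ngrams(trigrams, min_freq = 3)
nrow(trigrams)  # 5
nrow(pruned)    # 3
```

Comparing `object.size()` of a table before and after pruning is one way to measure the size reduction; on a toy table this small the numbers are not meaningful.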


The App — How It Works
========================================================

**Try it:** [https://rdelgrande.shinyapps.io/shiny_app/](https://rdelgrande.shinyapps.io/shiny_app/)

~~~
User types: "I want to go"
      ↓
Trigram lookup: prefix "to_go" → no match
Bigram lookup:  prefix "go"    → [ "to", "the", "back" ] ✓
      ↓
App shows 3 buttons:  [ to ]  [ the ]  [ back ]
      ↓
User clicks "the" → input becomes "I want to go the"
      ↓
Next prediction updates instantly
~~~
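The lookups in this flow start from the raw input text. A minimal sketch of that step, assuming the cleaning rules below (lower-casing, keeping only letters and apostrophes) and the prefix rule stated earlier (last two words for the trigram table, last word for the bigram table); `make_keys` is a hypothetical name, not a function from the app:

```r
# Normalize raw input and build the prefix keys for the table lookups.
# Cleaning rules (lower-case, keep letters and apostrophes) are assumptions.
make_keys <- function(text) {
  clean <- gsub("[^a-z' ]", " ", tolower(text))
  words <- strsplit(trimws(clean), "\\s+")[[1]]
  n <- length(words)
  list(
    trigram = if (n >= 2) paste(words[(n - 1):n], collapse = "_") else NA_character_,
    bigram  = if (n >= 1) words[n] else NA_character_
  )
}

make_keys("I want to go")    # trigram "to_go", bigram "go"
make_keys("Let's GO, now!")  # trigram "go_now", bigram "now"
```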

Why This App?
========================================================

The problem: Typing on mobile is slow and error-prone.

The solution: Real-time next-word prediction — just like your phone keyboard.

Key advantages of this model:

Built with: