author: Rafael Del Grande
date: `r Sys.Date()`
autosize: true
css: custom.css
A keyboard-style text prediction app built with N-gram language models and the SwiftKey corpus.
Input: The user types any text phrase.
The model searches 3 tables in order:
This is called Stupid Backoff (Brants et al., 2007):
\[S(w \mid \text{context}) = \begin{cases} \frac{f(\text{context},\,w)}{f(\text{context})} & \text{if seen} \\ 0.4 \times S(w \mid \text{shorter context}) & \text{otherwise} \end{cases}\]
Output: Top 3 predicted next words, displayed as clickable buttons.
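The backoff rule above can be sketched in a few lines of base R. This is a minimal illustration, not the app's code: the counts are hypothetical toy values (not taken from the SwiftKey corpus), the helper names `count_of`, `stupid_backoff`, and `predict_top3` are made up for this example, and the recursion is assumed to end at the relative unigram frequency.

```r
alpha <- 0.4  # backoff factor from the formula above

# Hypothetical toy counts, keyed by space-separated n-grams.
ngram_counts <- c(
  "i" = 4, "want" = 3, "to" = 6, "go" = 5, "the" = 8, "back" = 2,
  "want to" = 3, "to go" = 4, "go to" = 2, "go back" = 1,
  "want to go" = 2
)
is_unigram <- !grepl(" ", names(ngram_counts))
vocab <- names(ngram_counts)[is_unigram]
total_unigrams <- sum(ngram_counts[is_unigram])

# Count of an n-gram, or 0 if it was never seen.
count_of <- function(key) {
  value <- ngram_counts[key]
  if (is.na(value)) 0 else unname(value)
}

# Score `word` given `context` (a character vector of preceding words).
stupid_backoff <- function(context, word) {
  if (length(context) == 0) {
    return(count_of(word) / total_unigrams)  # assumed base case
  }
  ctx_key <- paste(context, collapse = " ")
  seen <- count_of(paste(ctx_key, word))
  if (seen > 0) {
    return(seen / count_of(ctx_key))
  }
  # Unseen: drop the oldest context word and discount by alpha.
  alpha * stupid_backoff(context[-1], word)
}

# Rank every vocabulary word and keep the three best.
predict_top3 <- function(context) {
  scores <- vapply(vocab, function(w) stupid_backoff(context, w), numeric(1))
  head(names(sort(scores, decreasing = TRUE)), 3)
}

predict_top3(c("to", "go"))
```

Scoring every vocabulary word is only practical for a toy example. A real app would more plausibly precompute the top candidates per context and store them in lookup tables.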
left: 50%
Accuracy on held-out test set:
| Metric | Score |
|---|---|
| Top-1 Accuracy | ~15% |
| Top-2 Accuracy | ~20% |
| Top-3 Accuracy | ~25% |
| Avg prediction time | < 5 ms |
| Bigram perplexity | see report |
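
For reference, Top-k accuracy is the share of held-out cases in which the true next word appears among the first k suggestions. The sketch below shows one way to compute it in base R. The predictions and true words are hypothetical, and `top_k_accuracy` is an illustrative helper, not a function from the project.

```r
# Share of cases whose true next word is among the first k predictions.
top_k_accuracy <- function(predictions, truth, k) {
  hits <- mapply(
    function(pred, actual) actual %in% head(pred, k),
    predictions, truth
  )
  mean(hits)
}

# Hypothetical held-out cases: one ranked suggestion vector per case.
predictions <- list(
  c("to", "the", "back"),
  c("you", "it", "me"),
  c("of", "in", "and"),
  c("a", "the", "my")
)
truth <- c("the", "you", "for", "my")

# Accuracy for k = 1, 2, 3.
accuracy_by_k <- sapply(1:3, function(k) top_k_accuracy(predictions, truth, k))
accuracy_by_k
```

Because each larger k includes the smaller ones, Top-k accuracy can never decrease as k grows, which matches the ordering of the three scores in the table.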
Size vs. accuracy tradeoff:
```{r tradeoff_plot, echo=FALSE, fig.width=5, fig.height=4}
# Model size and Top-1 accuracy at each pruning threshold
df <- data.frame(
  min_freq = c(2, 3, 5),
  size_mb  = c(8.0, 4.5, 2.1),
  top1_acc = c(15.0, 14.2, 12.8)
)

# Two stacked panels: size on top, accuracy below
par(mfrow = c(2, 1), mar = c(3, 3, 2, 1))
plot(df$min_freq, df$size_mb, type = "b", pch = 19, col = "steelblue",
     xlab = "Min Frequency", ylab = "Size (MB)", main = "Model Size")
plot(df$min_freq, df$top1_acc, type = "b", pch = 19, col = "darkgreen",
     xlab = "Min Frequency", ylab = "Top-1 Acc (%)", main = "Accuracy")
par(mfrow = c(1, 1))  # restore the default layout
```

Relative to `min_freq = 2`, pruning to `min_freq = 3` cuts model size by about 44% (8.0 MB to 4.5 MB) while Top-1 accuracy drops by less than 1 percentage point (15.0% to 14.2%).

The App: How It Works
========================================================

**Try it:** <https://rdelgrande.shinyapps.io/shiny_app/>

```
User types: "I want to go"
        ↓
Trigram lookup: "want_to_go" → no match
Bigram lookup:  "to_go"      → [ "to", "the", "back" ] ✓
        ↓
App shows 3 buttons: [ to ] [ the ] [ back ]
        ↓
User clicks "the" → input becomes "I want to go the"
        ↓
Next prediction updates instantly
```
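
The lookup chain in the flow above can be approximated as follows. This is a sketch under stated assumptions: the `tables` list, its contents, and the helpers `tokenize` and `predict_next` are hypothetical, and plain named lists stand in for whatever table structure the app actually uses. Only the underscore-joined keys and the longest-context-first order come from the flow itself.

```r
# Hypothetical lookup tables, indexed by context length (in words).
# Each maps an underscore-joined context to ranked suggestions.
tables <- list(
  "3" = list("i_want_to" = c("go", "be", "see")),
  "2" = list("to_go" = c("to", "the", "back")),
  "1" = list("the" = c("same", "first", "best"))
)

# Lowercase the input and split it into words.
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Try the longest context first and fall back to shorter ones.
predict_next <- function(text, tables) {
  words <- tokenize(text)
  for (n in c(3, 2, 1)) {
    if (length(words) < n) next
    key <- paste(tail(words, n), collapse = "_")
    hit <- tables[[as.character(n)]][[key]]
    if (!is.null(hit)) return(hit)
  }
  character(0)  # nothing matched at any context length
}

first_suggestions <- predict_next("I want to go", tables)
# Simulate clicking the second button: append the word and predict again.
next_input <- paste("I want to go", first_suggestions[2])
predict_next(next_input, tables)
```

In a Shiny app, the second call would typically be triggered by a button observer that updates the text input, so the prediction refreshes as soon as the input changes.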
The problem: Typing on mobile is slow and error-prone.
The solution: Real-time next-word prediction — just like your phone keyboard.
Key advantages of this model:
Built with:
- quanteda for tokenization and n-gram construction
- data.table for O(1) prefix lookups
- shiny for the interactive web interface