2026-02-13

1) What We Built

  • A production-style Shiny text prediction app (shiny/app.R).
  • Input: a short phrase (2+ words).
  • Output: predicted next word, top suggestions, and a confidence bar.
  • Two interchangeable model backends:
    • N-gram backend: optimized trigram Stupid Backoff model.
    • Class backend: lightweight class transition model.
  • Interactive controls:
    • suggestion count
    • backend selection
    • common-word penalty tuning for n-gram predictions.

To launch the app from the project root:

    shiny::runApp("shiny")
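
The two backends can stay interchangeable if the app calls them through a single lookup table. The sketch below is only an illustration in base R: `ngram_backend`, `class_backend`, `backends`, and `predict_next` are hypothetical names, and the candidate words and scores are placeholders, not output from the real models in shiny/app.R.

```r
# Placeholder backends (hypothetical): each takes context tokens and a
# suggestion count and returns candidates sorted by descending score.
ngram_backend <- function(tokens, n_suggestions) {
  cands <- data.frame(word = c("day", "time", "way"),
                      score = c(0.6, 0.3, 0.1),
                      stringsAsFactors = FALSE)
  head(cands, n_suggestions)
}

class_backend <- function(tokens, n_suggestions) {
  cands <- data.frame(word = c("time", "day"),
                      score = c(0.7, 0.3),
                      stringsAsFactors = FALSE)
  head(cands, n_suggestions)
}

# Registry keyed by the backend name selected in the UI.
backends <- list(ngram = ngram_backend, class = class_backend)

# Dispatch to the chosen backend, failing loudly on an unknown name.
predict_next <- function(tokens, backend = "ngram", n_suggestions = 3) {
  if (!backend %in% names(backends)) {
    stop("Unknown backend: ", backend)
  }
  backends[[backend]](tokens, n_suggestions)
}

suggestions <- predict_next(c("have", "a", "nice"), backend = "ngram",
                            n_suggestions = 2)
print(suggestions)
```

Adding a third backend under this design would only require a new entry in `backends`, as long as it accepts the same two arguments.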

2) How the Prediction Model Works

  • Text is cleaned and tokenized consistently with training.
  • Next-word candidates come from:
    1. trigram context (strongest signal),
    2. bigram backoff,
    3. unigram fallback.
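  • Scores are combined with backoff penalties: each step down to a lower-order model multiplies the candidate's score by a fixed discount factor, as in standard Stupid Backoff.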
  • To reduce generic outputs, we apply a common-word reranking penalty:
    • stronger on backoff candidates,
    • capped to avoid over-boosting rare words.
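
The cleaning and backoff steps above can be sketched in base R as follows. This is a minimal illustration, not the app's implementation: the helper names (`clean_tokens`, `get_count`, `stupid_backoff`), the toy counts, and the discount `alpha = 0.4` (the value used in the original Stupid Backoff formulation) are all assumptions.

```r
# Normalize text: lowercase, keep only letters, apostrophes and spaces.
clean_tokens <- function(text) {
  text <- gsub("[^a-z' ]", " ", tolower(text))
  tokens <- strsplit(trimws(text), " +")[[1]]
  tokens[nzchar(tokens)]
}

# Look up a count in a named numeric vector; unseen n-grams count as 0.
get_count <- function(tbl, key) {
  value <- unname(tbl[key])
  if (is.na(value)) 0 else value
}

# Toy counts (assumed data, for illustration only).
counts <- list(
  uni = c(have = 10, a = 12, nice = 6, day = 5, time = 4, good = 3),
  bi  = c("have a" = 8, "a nice" = 5, "nice day" = 3,
          "nice time" = 1, "a good" = 3),
  tri = c("have a nice" = 4, "a nice day" = 3, "a nice time" = 1)
)

# Score every vocabulary word given the last two context tokens.
# Trigram evidence is used as-is; each backoff step multiplies by alpha.
stupid_backoff <- function(context, counts, alpha = 0.4) {
  stopifnot(length(context) >= 2)
  w1 <- context[length(context) - 1]
  w2 <- context[length(context)]
  total <- sum(counts$uni)
  vocab <- names(counts$uni)
  score <- numeric(length(vocab))
  level <- character(length(vocab))
  for (i in seq_along(vocab)) {
    w <- vocab[i]
    tri <- get_count(counts$tri, paste(w1, w2, w))
    bi <- get_count(counts$bi, paste(w2, w))
    if (tri > 0) {
      score[i] <- tri / get_count(counts$bi, paste(w1, w2))
      level[i] <- "trigram"
    } else if (bi > 0) {
      score[i] <- alpha * bi / get_count(counts$uni, w2)
      level[i] <- "bigram"
    } else {
      score[i] <- alpha^2 * get_count(counts$uni, w) / total
      level[i] <- "unigram"
    }
  }
  out <- data.frame(word = vocab, score = score, level = level,
                    stringsAsFactors = FALSE)
  out[order(-out$score), ]
}

result <- stupid_backoff(clean_tokens("Have a NICE!"), counts)
print(head(result, 3))
```

With these toy counts, "day" ranks first with a score of 3/5 = 0.6, because the trigram "a nice day" occurs 3 times out of the 5 occurrences of "a nice". Words with no trigram or bigram evidence, such as "have", fall through to the discounted unigram score.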

Step                    Purpose
Input phrase            User context
Clean + tokenize        Standardized context tokens
Trigram lookup          High-context predictions
Bigram backoff          Fallback if trigram sparse
Unigram fallback        Final coverage safety net
Rerank + return top N   Reduce common-word bias
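
The final rerank step could look roughly like the base R sketch below. It is an assumption-laden illustration rather than the app's actual formula: `rerank_common`, its parameters (`penalty`, `backoff_multiplier`, `cap`), and the example candidates and frequencies are all hypothetical. It scales each score down in proportion to how common the word is, applies a stronger penalty to candidates that came from a backoff level, and caps the penalty so common words are never demoted past a fixed limit.

```r
# Rerank candidates by penalizing common words.
# cands: data frame with columns word, score, level ("trigram", "bigram",
#        or "unigram"); unigram_freq: named vector of word frequencies.
rerank_common <- function(cands, unigram_freq, penalty = 0.3,
                          backoff_multiplier = 2, cap = 0.5, top_n = 3) {
  # Commonness in [0, 1], relative to the most frequent word.
  commonness <- unname(unigram_freq[cands$word]) / max(unigram_freq)
  commonness[is.na(commonness)] <- 0
  # Backoff candidates receive a stronger penalty than trigram candidates.
  strength <- ifelse(cands$level == "trigram",
                     penalty, penalty * backoff_multiplier)
  # The cap bounds how far any word can be demoted.
  applied <- pmin(cap, strength * commonness)
  cands$adjusted <- cands$score * (1 - applied)
  head(cands[order(-cands$adjusted), ], top_n)
}

# Example candidates and frequencies (made-up values).
cands <- data.frame(word = c("the", "beach", "a"),
                    score = c(0.30, 0.28, 0.10),
                    level = c("bigram", "trigram", "unigram"),
                    stringsAsFactors = FALSE)
unigram_freq <- c(the = 1000, a = 800, beach = 20)

reranked <- rerank_common(cands, unigram_freq)
print(reranked)
```

In this example "the" starts slightly ahead of "beach", but its penalty hits the 0.5 cap and its adjusted score drops to 0.15, so "beach" is returned first.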

3) Quantitative Performance Summary

Metric                          Value
Predictions evaluated           2,737
Top-1 accuracy                  17.06%
Top-3 accuracy                  27.91%
Avg. time per prediction (ms)   41.176
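
For scale, 17.06% of 2,737 predictions is roughly 467 correct top-1 guesses, and 27.91% is roughly 764 top-3 hits. The base R sketch below shows one way such metrics can be computed; it is not the code in evaluate_model.R, and `evaluate_predictor`, the fixed toy predictor, and the three test cases are hypothetical.

```r
# Compute top-1 accuracy, top-k accuracy, and average latency.
# predict_fn: function(context) returning a ranked character vector.
evaluate_predictor <- function(predict_fn, contexts, targets, k = 3) {
  stopifnot(length(contexts) == length(targets), length(contexts) > 0)
  hits_top1 <- 0
  hits_topk <- 0
  start <- proc.time()[["elapsed"]]
  for (i in seq_along(contexts)) {
    preds <- head(predict_fn(contexts[[i]]), k)
    if (length(preds) > 0 && preds[1] == targets[i]) {
      hits_top1 <- hits_top1 + 1
    }
    if (targets[i] %in% preds) {
      hits_topk <- hits_topk + 1
    }
  }
  elapsed_ms <- (proc.time()[["elapsed"]] - start) * 1000
  n <- length(contexts)
  c(n = n, top1 = hits_top1 / n, topk = hits_topk / n,
    avg_ms = elapsed_ms / n)
}

# Toy predictor that ignores its input (illustration only).
toy_predict <- function(context) c("day", "time", "way")

contexts <- c("have a nice", "what a", "on my")
targets <- c("day", "time", "phone")

metrics <- evaluate_predictor(toy_predict, contexts, targets, k = 3)
print(metrics)
```

Here the toy predictor gets one of three targets in first place and two of three within the top three. Timing the whole loop and dividing by the number of predictions gives an average that includes loop overhead, which is negligible next to a real model lookup.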

4) How the Product Works (User View)

  • Step 1: Type a phrase (recommended <= 12 words).
  • Step 2: Choose backend (ngram or class).
  • Step 3: Click Predict Next Word.
  • Step 4: Inspect:
    • top prediction
    • confidence bar
    • ranked suggestions.
  • Optional: tune common-word penalty knobs for n-gram backend.
Example benchmark tradeoff chart (from evaluate_model.R output).
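
The input rules in Step 1 can be checked before the model is called. The base R sketch below is a hypothetical helper, not code from shiny/app.R: `validate_phrase` and its messages are assumptions, and because the 12-word limit is described only as a recommendation, the sketch warns about long input instead of rejecting it.

```r
# Validate a phrase against the documented limits:
# at least 2 words required, at most 12 words recommended.
validate_phrase <- function(phrase, min_words = 2, recommended_max = 12) {
  words <- strsplit(trimws(phrase), "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  n <- length(words)
  if (n < min_words) {
    return(list(ok = FALSE, n_words = n,
                message = sprintf("Please enter at least %d words.",
                                  min_words)))
  }
  if (n > recommended_max) {
    return(list(ok = TRUE, n_words = n,
                message = sprintf("Phrases of %d words or fewer are recommended.",
                                  recommended_max)))
  }
  list(ok = TRUE, n_words = n, message = "")
}

check <- validate_phrase("have a nice")
print(check$ok)
```

In a Shiny server function, a failed check like this would typically be surfaced to the user before the Predict Next Word action runs the selected backend.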


5) Why This Product Is Awesome

  • Fast and practical: the trigram-only model improves speed and reduces the memory footprint, averaging about 41 ms per prediction in the evaluation summary.
  • Robust UX: confidence display, validation, and guided tuning controls.
  • Flexible architecture: plug-and-play n-gram and class backends.
  • Data-driven optimization: evaluation pipeline quantifies speed/accuracy tradeoffs.
  • Deployment-ready: self-contained Shiny app with reproducible training scripts.

App entry point: shiny/app.R
Training entry point: R/train_model.R