2025-09-04

1. Problem & Goal

  • Typing on mobile devices benefits from predictive text (autocomplete).
  • Goal: Build a model that predicts the next word in a phrase.
  • Data: SwiftKey English corpus (blogs, news, Twitter).
  • Deliverable: A Shiny app for real-time prediction.

2. Data & Algorithm

  • Preprocessing: sampled the corpus, then cleaned, tokenized, and profanity-filtered the text.
  • Model: n-gram language model (bigrams, trigrams, and 4-grams).
  • Backoff: Katz backoff (4-gram → 3-gram → 2-gram → unigram fallback).
  • Efficiency: pruned rare n-grams; keyed lookups with data.table (see the sketch after this list).
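
  A minimal sketch of this pipeline under stated assumptions: toy data stands in for the SwiftKey corpus, and the backoff step simply takes the most frequent continuation at the longest matching order ("stupid backoff") rather than full Katz discounting. Function and column names are illustrative, not the app's actual code.

      library(data.table)

      # Toy corpus standing in for the sampled SwiftKey data
      corpus <- c("I love new york", "I love new cars", "we love new york")

      tokenize <- function(lines) {
        lines <- tolower(lines)
        lines <- gsub("[^a-z' ]", " ", lines)        # keep letters and apostrophes
        strsplit(trimws(lines), "\\s+")
      }

      # Build a keyed frequency table for one n-gram order
      build_ngrams <- function(tokens, n) {
        grams <- unlist(lapply(tokens, function(w) {
          if (length(w) < n) return(character(0))
          sapply(seq_len(length(w) - n + 1),
                 function(i) paste(w[i:(i + n - 1)], collapse = " "))
        }))
        dt <- data.table(gram = grams)[, .N, by = gram]
        dt[, prefix := sub(" [^ ]+$", "", gram)]     # all words but the last
        dt[, word   := sub("^.* ", "", gram)]        # the last word
        # the deployed model would also prune rare n-grams here, e.g. dt <- dt[N > 1]
        setkey(dt, prefix)                           # keyed lookups for fast backoff
        dt
      }

      tokens   <- tokenize(corpus)
      tables   <- lapply(2:4, function(n) build_ngrams(tokens, n))   # 2-, 3-, 4-grams
      unigrams <- data.table(word = unlist(tokens))[, .N, by = word][order(-N)]

      # Back off from the longest matching prefix down to the unigram fallback
      predict_next <- function(phrase, k = 3) {
        w <- tokenize(phrase)[[1]]
        for (n in 4:2) {
          if (length(w) >= n - 1) {
            key  <- paste(tail(w, n - 1), collapse = " ")
            hits <- tables[[n - 1]][.(key), nomatch = 0L]
            if (nrow(hits) > 0) return(head(hits[order(-N)]$word, k))
          }
        }
        head(unigrams$word, k)                       # unigram fallback: never blank
      }

      predict_next("we love new")    # most frequent continuation, here "york"

  Keying each table on its prefix is what makes lookups fast enough for real-time use; Katz backoff would additionally discount the counts and redistribute the leftover probability mass to the lower-order tables, which the sketch skips.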

3. Evaluation

  • Tested with Twitter- and news-style phrases (see the spot-check below).
  • The app always returns a prediction; the unigram fallback means no blank results.
  • Predictions update in real time as the user types (smooth UX).
  • The pruned model is small enough for the shinyapps.io free tier.
  • The interface is simple and intuitive: a top-1 prediction plus buttons for alternative suggestions.
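
  A hypothetical spot-check in the spirit of this evaluation, reusing the illustrative predict_next() from section 2: it confirms that a non-empty suggestion comes back quickly even for out-of-vocabulary input. The test phrases are made up for the example.

      phrases <- c("thanks for the", "looking forward to", "zzz qqq xxx")
      for (p in phrases) {
        elapsed <- system.time(out <- predict_next(p, k = 3))["elapsed"]
        stopifnot(length(out) > 0)          # unigram fallback: never blank
        cat(sprintf("%-20s -> %-10s (%.3f s)\n", p, out[1], elapsed))
      }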

4. The App (How to Use)

  • Enter a phrase in the text box; the top predicted next word updates as you type.
  • Buttons below the top prediction show alternative suggestions.
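
  The pitch does not spell out the UI code, so here is a minimal Shiny sketch of the interface implied above: a text box, a top-1 prediction that updates as the user types, and buttons for alternative suggestions. It assumes the illustrative predict_next() helper from section 2; all identifiers are assumptions, not the app's actual code.

      library(shiny)

      # Assumes predict_next() from the section 2 sketch is already defined
      ui <- fluidPage(
        titlePanel("Next-Word Prediction"),
        textInput("phrase", "Type a phrase:"),
        h4("Top prediction"),
        textOutput("top1"),
        uiOutput("suggestions")
      )

      server <- function(input, output, session) {
        preds <- reactive({
          if (nchar(trimws(input$phrase)) == 0) return(character(0))
          predict_next(input$phrase, k = 3)
        })
        output$top1 <- renderText({
          p <- preds()
          if (length(p) > 0) p[1] else ""
        })
        output$suggestions <- renderUI({
          p <- preds()
          if (length(p) < 2) return(NULL)
          lapply(p[-1], function(w) actionButton(paste0("btn_", w), w))  # one button per alternative
        })
      }

      shinyApp(ui, server)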

5. Why it Matters

  • Value: practical, lightweight NLP demo; real-time and deployable.
  • Uses: typing assistants, chat/email composition, mobile UX.