2025-09-04
1. Problem & Goal
- Typing on mobile devices is faster with predictive text (autocomplete).
- Goal: Build a model that predicts the next word in a phrase.
- Data: SwiftKey English corpus (blogs, news, Twitter).
- Deliverable: A Shiny app for real-time prediction.
2. Data & Algorithm
- Preprocessing: sampled the corpus, cleaned and tokenized the text, filtered profanity.
- Model: n-gram language model (unigrams through 4-grams).
- Backoff: Katz backoff (4-gram → 3-gram → 2-gram → unigram fallback).
- Efficiency: pruned rare n-grams; keyed lookups with data.table.
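The backoff lookup described above can be sketched in a few lines. This is an illustration only, written in Python rather than the app's R/data.table code, and it is a simplified backoff: it returns the most frequent continuations from the highest-order n-gram table that matches, without the probability discounting that full Katz backoff applies. The names `tokenize`, `build_tables`, `predict`, and the `min_count` pruning threshold are hypothetical, as is the toy corpus.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_tables(sentences, max_n=4, min_count=1):
    """Build one lookup table per n-gram order: context tuple -> Counter of next words.

    Order 1 uses the empty context (), so it holds plain unigram counts.
    Continuations seen fewer than min_count times are pruned from orders 2 and up;
    unigrams are kept so the final fallback always has an answer.
    """
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    for n in range(2, max_n + 1):
        for context in list(tables[n]):
            kept = Counter({w: c for w, c in tables[n][context].items() if c >= min_count})
            if kept:
                tables[n][context] = kept
            else:
                del tables[n][context]
    return tables


def predict(tables, phrase, k=3, max_n=4):
    """Return up to k next-word suggestions, backing off from 4-grams down to unigrams."""
    tokens = tokenize(phrase)
    for n in range(max_n, 0, -1):
        if n - 1 > len(tokens):
            continue  # phrase is too short to supply a context of this length
        context = tuple(tokens[len(tokens) - (n - 1):])
        candidates = tables[n].get(context)
        if candidates:
            return [word for word, _ in candidates.most_common(k)]
    return []  # only reached when the corpus itself was empty


# Toy corpus, invented for this example.
corpus = [
    "thanks for the follow",
    "thanks for the follow",
    "thanks for the help",
    "see you at the game",
]
tables = build_tables(corpus)
print(predict(tables, "thanks for the"))  # ['follow', 'help']
print(predict(tables, "at the"))          # ['game']
```

An unseen phrase falls through every higher-order table and ends at the unigram counts, which is why a lookup of this shape never returns a blank as long as the corpus is non-empty.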
3. Evaluation
- Tested with Twitter/news-style phrases.
- The app always returned a prediction (no blanks).
- Predictions appear instantly during typing (smooth UX).
- Model is small enough for shinyapps.io free tier.
- Interface is simple and intuitive (top-1 + suggestion buttons).
4. The App (How to Use)
5. Why it Matters
- Value: practical, lightweight NLP demo; real-time and deployable.
- Uses: typing assistants, chat/email composition, mobile UX.