2025-09-01
Slide 1: Problem & Data
- Goal: predict the next word given a user-typed phrase.
- Data: SwiftKey English corpora (blogs, news, Twitter).
- For the prototype we sampled 10,000 lines per source (see the sampling sketch below).
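A minimal sampling sketch, assuming the standard layout of the SwiftKey download (the `final/en_US/` paths and the seed are illustrative, not part of the original notes):

```r
# Sample a fixed number of lines from each SwiftKey source file.
set.seed(2025)  # only for reproducibility of the sample

sample_lines <- function(path, n = 10000) {
  lines <- readLines(path, encoding = "UTF-8", skipNul = TRUE)
  sample(lines, min(n, length(lines)))
}

# Paths assume the standard SwiftKey download layout.
blogs   <- sample_lines("final/en_US/en_US.blogs.txt")
news    <- sample_lines("final/en_US/en_US.news.txt")
twitter <- sample_lines("final/en_US/en_US.twitter.txt")

corpus <- c(blogs, news, twitter)  # combined prototype corpus
```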
Slide 2: Model
- Approach: Back-off n-gram model (trigram -> bigram -> unigram).
- Implemented in R with tidytext and dplyr (model sketch after this slide's bullets).
- Fast, explainable, and memory-light enough to serve from a Shiny app.
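A sketch of the n-gram tables and backoff lookup under the stated tidytext/dplyr approach; the tiny inline corpus and the function name predict_next() are stand-ins so the sketch runs on its own:

```r
library(dplyr)
library(tidyr)
library(tidytext)
library(stringr)
library(tibble)

# `corpus` would be the sampled character vector from Slide 1; tiny stand-in here.
corpus <- c(
  "thanks for the follow",
  "see you at the game tonight",
  "at the end of the day"
)

text_df <- tibble(text = corpus)

# Count n-grams of a given order, most frequent first.
count_ngrams <- function(df, n) {
  df %>%
    unnest_tokens(ngram, text, token = "ngrams", n = n) %>%
    filter(!is.na(ngram)) %>%
    count(ngram, sort = TRUE)
}

unigrams <- count_ngrams(text_df, 1)
bigrams  <- count_ngrams(text_df, 2) %>%
  separate(ngram, c("w1", "nxt"), sep = " ")
trigrams <- count_ngrams(text_df, 3) %>%
  separate(ngram, c("w1", "w2", "nxt"), sep = " ")

# Backoff: try the trigram table, then bigrams, then the most frequent unigram.
predict_next <- function(phrase) {
  words <- str_split(str_to_lower(str_squish(phrase)), " ")[[1]]
  k <- length(words)
  hit <- if (k >= 2) {
    filter(trigrams, w1 == words[k - 1], w2 == words[k])
  } else {
    trigrams[0, ]
  }
  if (nrow(hit) == 0) hit <- filter(bigrams, w1 == words[k])
  if (nrow(hit) == 0) return(unigrams$ngram[1])
  hit$nxt[1]  # tables are sorted by count, so row 1 is the best candidate
}

predict_next("see you at the")  # returns a single predicted word
```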
Slide 3: Performance
- Simple evaluation: top-1 accuracy on a small held-out set (example result shown in the report; check sketched below).
- Prediction time: <1 second per query on a typical laptop.
- Trade-off: we favour simplicity over advanced ML (e.g., an RNN), which suits a prototype and fast deployment.
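A sketch of the top-1 accuracy check; the held-out pairs here are hypothetical placeholders, and predict_next() is the backoff function sketched under Slide 2:

```r
# Hypothetical held-out pairs (last word withheld). In the report these would come
# from lines excluded from the corpus before the n-gram tables were built.
heldout <- data.frame(
  prefix = c("thanks for the", "see you at the", "at the end of the"),
  truth  = c("follow", "game", "day"),
  stringsAsFactors = FALSE
)

preds <- vapply(heldout$prefix, predict_next, character(1))
top1_accuracy <- mean(preds == heldout$truth)
top1_accuracy

# Rough check of the "<1 second per query" claim.
system.time(predict_next("see you at the"))
```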
Slide 4: The App
- Shiny app: text input, single-word prediction output (minimal sketch below).
- Includes example test phrases.
- Easy to deploy on shinyapps.io and share.
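A minimal Shiny sketch of the app's input/output wiring, assuming predict_next() from the model sketch is available in the app's environment:

```r
library(shiny)

ui <- fluidPage(
  titlePanel("Next-Word Prediction"),
  textInput("phrase", "Type a phrase:", value = "see you at the"),
  strong("Predicted next word:"),
  textOutput("prediction")
)

server <- function(input, output, session) {
  output$prediction <- renderText({
    req(nzchar(trimws(input$phrase)))  # wait until the user has typed something
    predict_next(input$phrase)
  })
}

shinyApp(ui, server)
```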
Slide 5: Demo & Next Steps
- Demo five test phrases and show predictions.
- Next steps: train on a larger sample, return the top 3 predictions instead of one, add Kneser-Ney smoothing to the backoff scheme, and deploy the updated app.