2025-09-01

Slide 1: Problem & Data

  • Goal: predict the next word given a user-typed phrase.
  • Data: SwiftKey English corpora (blogs, news, twitter).
  • For the prototype, we sampled 10,000 lines from each source.
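
A minimal sketch of the per-source sampling step, in base R. The helper name `sample_lines`, the seed, and the toy vectors are assumptions for illustration; the real corpora would be read from disk with `readLines()`.

```r
# Hypothetical helper: draw up to `n` lines at random from a character vector.
sample_lines <- function(lines, n = 10000, seed = 42) {
  set.seed(seed)  # fixed seed so the sample is reproducible (resets the global RNG)
  lines[sample.int(length(lines), min(n, length(lines)))]
}

# Toy stand-ins for the three sources.
corpus <- list(
  blogs   = paste("blog line", 1:50),
  news    = paste("news line", 1:30),
  twitter = paste("tweet", 1:5)
)

# Sample each source separately so no single source dominates the prototype data.
sampled <- lapply(corpus, sample_lines, n = 10)
lengths(sampled)
```

A source with fewer than `n` lines is returned whole (in shuffled order) rather than raising an error.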

Slide 2: Model

  • Approach: Back-off n-gram model (trigram -> bigram -> unigram).
  • Implemented in R with tidytext and dplyr.
  • Fast, explainable, and small enough in memory to run inside a Shiny app.
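
The back-off lookup can be sketched as follows. This is an illustrative base-R version written for this deck, not the app's tidytext/dplyr code; the function names (`tokenize`, `count_ngrams`, `predict_next`), the simple tokenizer, and the toy training lines are all assumptions.

```r
# Split text into lowercase word tokens (a deliberately simple tokenizer).
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words[nzchar(words)]
}

# Count n-grams of order n, line by line; returns a named integer vector
# sorted by decreasing frequency.
count_ngrams <- function(lines, n) {
  grams <- unlist(lapply(lines, function(line) {
    w <- tokenize(line)
    if (length(w) < n) return(character(0))
    idx <- seq_len(length(w) - n + 1)
    vapply(idx, function(i) paste(w[i:(i + n - 1)], collapse = " "), character(1))
  }))
  tab <- table(grams)
  sort(setNames(as.integer(tab), names(tab)), decreasing = TRUE)
}

# Back-off prediction: try the trigram table (2-word context), then the
# bigram table (1-word context), then fall back to the most frequent unigram.
predict_next <- function(phrase, model) {
  w <- tokenize(phrase)
  for (k in c(2, 1)) {
    if (length(w) >= k) {
      prefix <- paste0(paste(tail(w, k), collapse = " "), " ")
      counts <- model[[k + 1]]
      hits <- counts[startsWith(names(counts), prefix)]
      if (length(hits) > 0) {
        best <- names(hits)[which.max(hits)]
        return(tail(strsplit(best, " ")[[1]], 1))  # last word of the best n-gram
      }
    }
  }
  names(model[[1]])[which.max(model[[1]])]
}

# Toy training data (made up for this sketch).
train <- c("thanks for the follow", "thanks for the help",
           "thanks for the follow back", "see you at the game")
model <- lapply(1:3, function(n) count_ngrams(train, n))

predict_next("Thanks for the", model)  # trigram context "for the"
predict_next("hello the", model)       # no trigram match, backs off to bigram "the"
predict_next("zzz", model)             # no match at all, unigram fallback
```

The prefix scan is linear in the table size, which is fine for a sample-sized prototype; a larger model would likely want a keyed lookup instead.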

Slide 3: Performance

  • Evaluation: top-1 accuracy on a small held-out set (an example result is shown in the report).
  • Prediction time: <1 second per query on a typical laptop.
  • Trade-off: simplicity over advanced ML (e.g., an RNN); a good fit for a prototype and fast deployment.
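
One way to compute top-1 accuracy and time a single query, sketched in base R. The function `top1_accuracy`, the stand-in predictor `always_the`, and the held-out phrases are hypothetical; the numbers here say nothing about the real model's results.

```r
# Top-1 accuracy: the fraction of held-out phrases whose final word is
# predicted exactly from the words that precede it.
top1_accuracy <- function(phrases, predict_fn) {
  hits <- vapply(phrases, function(p) {
    w <- strsplit(p, " +")[[1]]
    if (length(w) < 2) return(NA)  # need at least one context word
    context <- paste(head(w, -1), collapse = " ")
    identical(predict_fn(context), tail(w, 1))
  }, logical(1))
  mean(hits, na.rm = TRUE)
}

# Stand-in predictor (hypothetical): always guesses "the".
always_the <- function(context) "the"

held_out <- c("see you at the", "thanks for the", "happy new year")
top1_accuracy(held_out, always_the)  # 2 of the 3 phrases end in "the"

# Wall-clock time for a single query, in seconds.
system.time(always_the("see you at"))[["elapsed"]]
```

Passing the predictor as a function keeps the metric independent of the model, so the same code could score a later, larger model.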

Slide 4: The App

  • Shiny app: text input, single-word prediction output.
  • Includes example test phrases.
  • Easy to deploy on shinyapps.io and share.

Slide 5: Demo & Next Steps

  • Demo five test phrases and show predictions.
  • Next steps: train a larger model, return top-3 predictions, add Kneser-Ney smoothing to the existing back-off, and deploy the updated app.