November 10, 2024

How the Model Works

  • Evaluated on a subset of the SwiftKey dataset: 1,500,000 lines (~45 million words).
  • The model uses n-grams (unigrams through five-grams) to predict the next word:
    • It considers the last 1 to 4 words typed by the user.
    • It uses frequency tables to identify the most likely next words.
    • It implements a backoff strategy: if no match is found with a longer n-gram, it falls back to shorter n-grams.
  • Optimized for real-time performance:
    • Response times are less than 0.5 seconds, suitable for interactive use.
    • Displays the top 3 predictions along with their probabilities.
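The frequency tables mentioned above could be built roughly as follows. This is a minimal Python sketch, not the app's actual code: the function name `build_tables`, the whitespace tokenizer, and the three-line toy corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_tables(lines, max_order=5):
    """Map each context (a tuple of 0 to max_order-1 preceding words)
    to a Counter of the words that followed it.
    Hypothetical helper, not taken from the app."""
    tables = defaultdict(Counter)
    for line in lines:
        tokens = line.lower().split()  # assumed tokenizer: lowercase + whitespace split
        for i, word in enumerate(tokens):
            for k in range(max_order):  # k = number of context words (0..4)
                if i - k < 0:
                    break  # not enough preceding words for this order
                context = tuple(tokens[i - k:i])
                tables[context][word] += 1
    return tables

# Toy corpus standing in for the real training text.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
tables = build_tables(corpus)
print(tables[("sat", "on")])  # counts of words seen after "sat on"
```

The empty context `()` holds plain unigram counts, so the same structure covers every order from unigram to five-gram.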

How the App Works

  1. Tokenization: The user’s input is split into individual tokens (words).
  2. N-gram Matching: The app searches the n-gram tables for the best match based on the most recent words.
  3. Backoff Strategy: If a higher-order n-gram match isn’t found, the model falls back to lower-order n-grams.
  4. Prediction Output: The app displays the top 3 predictions, sorted by frequency, along with their confidence levels.
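The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the app's implementation: the `TABLES` data and the `predict` function are hypothetical, and the confidence is assumed to be the relative frequency within the matched context, which the slides do not specify.

```python
from collections import Counter

# Hypothetical toy tables: context tuple -> counts of the next word.
TABLES = {
    ("sat", "on"): Counter({"the": 2}),
    ("on",): Counter({"the": 2}),
    ("the",): Counter({"cat": 2, "mat": 1, "fish": 1, "dog": 1, "rug": 1}),
    (): Counter({"the": 6, "cat": 2, "sat": 2, "on": 2}),  # unigram fallback
}

def predict(text, tables, max_context=4, top_n=3):
    """Return up to top_n (word, probability) pairs using simple backoff."""
    tokens = text.lower().split()  # step 1: tokenization (assumed whitespace split)
    # Steps 2 and 3: try the longest available context first, then back off.
    for k in range(min(max_context, len(tokens)), -1, -1):
        context = tuple(tokens[len(tokens) - k:])
        counts = tables.get(context)
        if counts:
            total = sum(counts.values())
            # Step 4: most frequent words first, with relative frequencies.
            return [(w, c / total) for w, c in counts.most_common(top_n)]
    return []

print(predict("I sat on", TABLES))  # matches the two-word context ("sat", "on")
print(predict("hello", TABLES))     # no match, backs off to unigram counts
```

Returning as soon as a context matches is the simplest form of backoff. Smoothed variants that discount lower-order scores also exist, and the slides do not say which one the app uses.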

Shiny App Demonstration

  • Visit the app: Shiny App
  • Key features:
    • Real-time, responsive predictions as you type.
    • Visual progress bars indicate the confidence of each prediction.
    • Example sentences allow users to explore the app’s capabilities.

Screenshot of the App