November 10, 2024
How the Model Works
- Evaluated on a subset of the SwiftKey dataset: 1,500,000 lines (~45 million words).
- The model uses n-grams (unigrams to 5-grams) to predict the next word:
  - It considers the last 1 to 4 words typed by the user.
  - It uses frequency tables to identify the most likely next words.
  - It implements a backoff strategy: if no match is found with longer n-grams, it falls back to shorter n-grams.
- Optimized for real-time performance:
  - Response times are under 0.5 seconds, suitable for interactive use.
- Displays the top 3 predictions along with their probabilities.
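The model described above can be sketched in a few lines of Python. This is an illustration only, not the app's code (the app itself is built with Shiny): it assumes each n-gram table maps the preceding n-1 words to a counter of next words, and the names `build_tables` and `predict` are hypothetical. Probabilities are plain relative frequencies within the matched context, i.e. a simple longest-match backoff rather than a smoothed scheme such as Katz backoff.

```python
from collections import Counter, defaultdict

def build_tables(sentences, max_n=5):
    """Hypothetical layout: tables[n][context] -> Counter of next words.

    context is a tuple of the previous n-1 words (empty tuple for unigrams).
    """
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables

def predict(tables, history, k=3, max_n=5):
    """Return up to k (word, probability) pairs using longest-match backoff.

    Tries the longest usable context first (at most max_n - 1 = 4 words)
    and falls back to shorter contexts until a match is found.
    """
    history = [w.lower() for w in history]
    for n in range(min(max_n, len(history) + 1), 0, -1):
        context = tuple(history[-(n - 1):]) if n > 1 else ()
        counts = tables[n].get(context)  # .get avoids inserting empty entries
        if counts:
            total = sum(counts.values())
            return [(word, c / total) for word, c in counts.most_common(k)]
    return []

corpus = [
    "thanks for the follow",
    "thanks for the help",
    "thanks for the follow back",
    "i want to go to the store",
]
tables = build_tables(corpus)
print(predict(tables, ["thanks", "for", "the"]))  # 4-gram match: follow (2/3), help (1/3)
print(predict(tables, ["see", "the"]))            # no trigram match, backs off to bigrams after "the"
```

Keeping every order's counts in memory means a prediction is a handful of dictionary lookups, which is the kind of cost that makes sub-second responses realistic; a production version would typically prune rare n-grams to keep the tables small.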
How the App Works
- Tokenization: The user’s input is split into individual tokens (words).
- N-gram Matching: The app searches the n-gram tables for the best match based on the most recent words.
- Backoff Strategy: If a higher-order n-gram match isn’t found, the model falls back to lower-order n-grams.
- Prediction Output: Displays the top predictions, sorted by frequency, with confidence levels.
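The tokenization step can be sketched as below. The normalization rules are assumptions for illustration, not taken from the app: input is lowercased, curly apostrophes (common on mobile keyboards) are mapped to straight ones, and only letters, digits and apostrophes are kept, so "don't" stays a single token. The hypothetical helper `context_words` then keeps at most the last 4 tokens, the longest history a 5-gram lookup can use.

```python
import re

# Assumed token definition: runs of lowercase letters, digits or apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")

def tokenize(text):
    """Split raw user input into lowercase word tokens (hypothetical rules)."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return TOKEN_RE.findall(text)

def context_words(text, max_context=4):
    """Return the last max_context tokens to use as the n-gram context."""
    if max_context <= 0:
        return []
    return tokenize(text)[-max_context:]

print(tokenize("Don\u2019t stop!"))                           # ["don't", 'stop']
print(context_words("Thanks so much, I'll see you at the"))  # ['see', 'you', 'at', 'the']
```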
Shiny App Demonstration
- Visit the app: Shiny App
- Key features:
  - Real-time, responsive predictions as you type.
  - Visual progress bars indicate the confidence of each prediction.
  - Example sentences let users explore the app's capabilities.
Screenshot of the App
