SwiftKeyPrediction

Yang Li

Overview

This web application offers real-time next-word prediction using two methods: N-Gram + Probabilities and N-Gram + Backoff. It operates in one of the following modes:

  • Pre-trained (Default): By default, the app leverages high-performance models pre-trained on 15,000 random samples from the en_US.twitter.txt dataset.

  • Custom Training: By selecting “Do Training”, you can generate a custom model using a specific number of samples from the Twitter dataset. Please note that increasing the sample size improves accuracy but results in longer processing times.
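As a rough illustration of what "a specific number of samples" means, the sketch below draws a fixed number of distinct lines at random from a list of tweets. The function name `sample_lines`, the seed, and the toy corpus are assumptions made for this example; the application's actual sampling code is not shown in this document.

```python
import random


def sample_lines(lines, n_samples, seed=42):
    """Return up to n_samples distinct lines chosen uniformly at random.

    Hypothetical helper: the name, signature, and seed are illustrative only.
    """
    rng = random.Random(seed)          # fixed seed so repeated runs give the same sample
    lines = list(lines)
    k = min(n_samples, len(lines))     # never ask for more lines than exist
    return rng.sample(lines, k)        # sampling without replacement


# Toy stand-in for the lines of en_US.twitter.txt
corpus = [f"tweet number {i}" for i in range(100)]
subset = sample_lines(corpus, 15)
print(len(subset))
```

A larger `n_samples` gives the model more text to count n-grams from, which is why accuracy and processing time both grow with the sample size.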

How to use this application

  1. Model Setup: Begin by training a new model or loading an existing one. Once complete, the “Predict” button will be enabled.

  2. Model Selection: Choose between the two available models: N-Gram + Probabilities or N-Gram + Backoff.

  3. Training Modes:

    • If “Do Training” is disabled, the application will use a high-performance, pre-defined model.

    • If you choose to train a custom model, please note that it will not be automatically saved after the session.

  4. Generate Results: Click the “Predict” button to see your results.

Prediction with Pre-trained Model

Users select a preferred model before use. The application defaults to high-performance, pre-trained models built from a corpus of 15,000 random samples from a Twitter dataset. I trained these models locally on my laptop and published them together with the web application.

Once a model is active, simply enter your text to receive the top three predicted words.

Prediction with Training

Alternatively, users can select “Do Training” to custom-train either model type using a defined number of unique samples. Once the chosen model is active, simply type words into the interface; the application will process the input and display the top three most probable next-word predictions.

Note on Performance: Larger sample sizes require significantly more computational power and memory. Increasing this value results in longer training and prediction times and may cause the server to stop responding.

Comparison of the Two Models

The N-Gram + Probabilities model is like a specialist who only knows what they have seen: if you ask about something new, it gives no result. The N-Gram + Backoff model is like a specialist who, when stumped, falls back on general knowledge to give you a “best guess” instead of a zero.
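The difference can be sketched in a few lines of Python. This is a minimal illustration, not the application's code: it assumes trigram counts, whitespace tokenization, and a plain backoff that drops the oldest context word without any discounting or weighting. All function names and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict


def build_ngram_counts(sentences, max_n=3):
    """Map each context tuple (0 to max_n - 1 words) to a Counter of next words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for n in range(max_n):
                if i - n < 0:
                    break                      # not enough preceding words
                context = tuple(tokens[i - n:i])
                counts[context][word] += 1
    return counts


def _top_words(followers, top_k):
    """Convert raw counts into (word, relative frequency) pairs."""
    total = sum(followers.values())
    return [(w, c / total) for w, c in followers.most_common(top_k)]


def predict_probabilities(counts, text, max_n=3, top_k=3):
    """Use only the longest context; return [] if that context was never seen."""
    tokens = text.lower().split()
    context = tuple(tokens[-(max_n - 1):]) if max_n > 1 else ()
    followers = counts.get(context)
    return _top_words(followers, top_k) if followers else []


def predict_backoff(counts, text, max_n=3, top_k=3):
    """Try the longest context first, then shorten it until something matches."""
    tokens = text.lower().split()
    context = tuple(tokens[-(max_n - 1):]) if max_n > 1 else ()
    while True:
        followers = counts.get(context)
        if followers:
            return _top_words(followers, top_k)
        if not context:
            return []                          # nothing known at all
        context = context[1:]                  # back off: drop the oldest word


sentences = [
    "thanks for the follow",
    "thanks for the love",
    "happy new year",
    "see you in the morning",
]
counts = build_ngram_counts(sentences)

print(predict_probabilities(counts, "thanks for"))  # context seen in training
print(predict_probabilities(counts, "hello the"))   # unseen two-word context
print(predict_backoff(counts, "hello the"))         # falls back to the context ("the",)
```

For the unseen context "hello the", the probabilities-only predictor returns an empty list, while the backoff predictor shortens the context to "the" and still proposes words that followed "the" in the training text.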

Screenshots (Load Model)

Load Model

Screenshots (Train Model)

Model Train

Screenshots (Words Prediction)

Word Prediction