SwiftKeyPrediction

Yang Li

Overview

This web application offers real-time next-word prediction using two methods: N-Gram + Probabilities and N-Gram + Backoff. It operates in one of two modes:

Pre-trained (Default): The app uses models pre-trained on 15,000 random samples from the en_US.twitter.txt dataset.
Custom Training: Selecting “Do Training” generates a custom model from a chosen number of samples from the Twitter dataset. A larger sample size generally improves accuracy but lengthens processing time.
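The sampling step behind both modes can be illustrated with a minimal Python sketch. It is not the application's actual code: the function name `sample_lines`, the fixed seed, and the toy corpus are assumptions made for illustration only.

```python
import random

def sample_lines(lines, n, seed=42):
    """Draw n lines without replacement (hypothetical helper).

    If n exceeds the corpus size, the whole corpus is returned in shuffled order.
    """
    rng = random.Random(seed)   # fixed seed so the same sample can be reproduced
    n = min(n, len(lines))
    return rng.sample(lines, n)

# Toy stand-in for the lines of en_US.twitter.txt
corpus = [f"tweet number {i}" for i in range(100)]
subset = sample_lines(corpus, 10)
```

A larger `n` gives the model more n-grams to learn from, which is why training takes longer as the sample size grows.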

How to use this application

Model Setup: Begin by training a new model or loading an existing one. Once complete, the “Predict” button will be enabled.
Model Selection: Choose between the two available model types: N-Gram + Probabilities or N-Gram + Backoff.
Training Modes: If “Do Training” is disabled, the application uses the pre-trained model. If you train a custom model, note that it is not saved when the session ends.
Generate Results: Click the “Predict” button to see your results.

Prediction with Pre-trained Model

Users select a preferred model type before use. The application defaults to pre-trained models built from a corpus of 15,000 random samples from the Twitter dataset. I trained these models locally on my laptop and published them together with the web application.

Once a model is active, simply enter your text to receive the top three predicted words.

Prediction with Custom Training

Alternatively, users can select “Do Training” to custom-train either model type using a defined number of unique samples. Once the chosen model is active, simply type words into the interface; the application will process the input and display the top three most probable next-word predictions.

Note on Performance: Larger sample sizes require significantly more computational power and memory. Increasing this value lengthens training and prediction times, and a very large value may cause the server to stop responding.

Comparison of the Two Models

The N-Gram + Probabilities model is like a specialist who only knows what they have seen: if you ask about a word sequence that never appeared in the training data, it returns no result. The N-Gram + Backoff model is like a specialist who, when stumped, falls back on more general knowledge (a shorter word context) to give you a “best guess” instead of nothing.
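The difference can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the application's implementation: the function names, the trigram limit (`max_n=3`), the whitespace tokenizer, and the four-sentence corpus are all hypothetical, and the backoff shown simply retries with a shorter context, without the discounting used by schemes such as Katz backoff.

```python
from collections import Counter, defaultdict

def train(sentences, max_n=3):
    """Count next words for every context of 0 to max_n - 1 preceding words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for k in range(max_n):
                if i - k < 0:
                    break
                counts[tuple(tokens[i - k:i])][word] += 1
    return counts

def last_context(text, max_n):
    """Return the last max_n - 1 words of the input as a tuple."""
    tokens = text.lower().split()
    return tuple(tokens[max(0, len(tokens) - (max_n - 1)):])

def top_words(counter, top_k):
    """Convert raw counts into relative frequencies for the top_k words."""
    total = sum(counter.values())
    return [(w, c / total) for w, c in counter.most_common(top_k)]

def predict_probabilities(counts, text, max_n=3, top_k=3):
    """Use only the longest context; an unseen context yields no result."""
    seen = counts.get(last_context(text, max_n))
    return top_words(seen, top_k) if seen else []

def predict_backoff(counts, text, max_n=3, top_k=3):
    """Drop the oldest word and retry until some context has been seen."""
    context = last_context(text, max_n)
    while True:
        seen = counts.get(context)
        if seen:
            return top_words(seen, top_k)
        if not context:
            return []
        context = context[1:]

sentences = ["i love new york", "i love new shoes", "i love you", "we love new york"]
counts = train(sentences)
seen_case = predict_probabilities(counts, "i love new")
unseen_case = predict_probabilities(counts, "they adore new")
backoff_case = predict_backoff(counts, "they adore new")
```

Here "i love new" matches a context seen in training, so both approaches rank "york" first. For "they adore new", the pair ("adore", "new") was never seen, so `predict_probabilities` returns an empty list, while `predict_backoff` retries with the single word "new" and still proposes "york" and "shoes".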

Screenshots (Load Model)

Load Model

Screenshots (Train Model)

Model Train

Screenshots (Word Prediction)

Word Prediction