December 23, 2025

Motivation

Why this product?

Typing on mobile devices is slow and error-prone.
Predictive text systems improve typing speed by suggesting the next likely word.

Goal:
Build a lightweight, fast, and accurate next-word prediction model using real-world text data.

Data & Modeling Approach

Training Data

  • Blogs
  • News articles
  • Twitter posts

Source: SwiftKey corpus

Model

  • N-gram language model (1–4 grams)
  • Text cleaning and normalization
  • Frequency-based probability estimation
  • Pruning to reduce model size
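A minimal Python sketch of the steps these bullets describe. It assumes a simple cleaning rule (lowercase, strip URLs, keep only letters and apostrophes), maximum-likelihood estimates for the frequency-based probabilities, and count-threshold pruning with a hypothetical cutoff of `min_count=2`. The names `clean`, `build_ngram_counts`, and `mle` are illustrative and are not the project's own code.

```python
import re
from collections import Counter

def clean(text):
    """Lowercase, drop URLs, keep letters/apostrophes, and split on whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # assumed rule: remove URLs
    text = re.sub(r"[^a-z' ]+", " ", text)      # assumed rule: punctuation/digits -> space
    return text.split()

def build_ngram_counts(docs, max_n=4, min_count=2):
    """Count 1- to max_n-grams; prune n-grams (n >= 2) seen fewer than min_count times."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for doc in docs:
        tokens = clean(doc)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    # Pruning: unigrams are kept so the fallback level always has a vocabulary.
    for n in range(2, max_n + 1):
        counts[n] = Counter({g: c for g, c in counts[n].items() if c >= min_count})
    return counts

def mle(counts, context, word):
    """Frequency-based estimate P(word | context) = count(context + word) / count(context)."""
    context = tuple(context)
    n = len(context) + 1
    numer = counts[n][context + (word,)]
    denom = counts[n - 1][context] if context else sum(counts[1].values())
    return numer / denom if denom else 0.0
```

For example, `mle(build_ngram_counts(["I love you. I love you so much!", "I love pizza"]), ["i", "love"], "you")` is 2/3: "i love" occurs three times and is followed by "you" twice. Pruning here drops every 4-gram from this tiny corpus, which is the memory/coverage trade-off the slide refers to.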

Prediction Strategy

Backoff Algorithm

The model predicts the next word using:

  1. 4-gram match (highest priority)
  2. 3-gram backoff
  3. 2-gram backoff
  4. Unigram fallback

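Each level is weighted: matches from lower-order levels count for less, so longer and more specific contexts take precedence while shorter ones keep coverage high.

Because of this backoff, the model can still make a prediction when the exact input sequence never appeared in the training data.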
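The slide says each level is weighted but not how. One well-known scheme that fits the description is stupid backoff (Brants et al., 2007): score a word by its relative frequency at the highest order where it was observed, multiplied by a fixed factor for every level backed off. The sketch below assumes that scheme with a factor of 0.4 and count tables shaped as `{n: Counter of n-gram tuples}`; `predict_next` and `ALPHA` are hypothetical names, not the project's implementation.

```python
from collections import Counter

ALPHA = 0.4  # assumed stupid-backoff discount applied per level backed off

def predict_next(counts, history, k=3, max_n=4):
    """Return the top-k next words using stupid backoff from max_n-grams down to unigrams."""
    history = tuple(history)
    scores = {}
    weight = 1.0
    for n in range(max_n, 1, -1):
        context = history[-(n - 1):]
        if len(context) < n - 1:
            continue  # not enough history for this order; no discount applied
        denom = counts[n - 1].get(context, 0)
        if denom:
            for gram, c in counts[n].items():
                # Keep only the highest-order score seen for each candidate word.
                if gram[:-1] == context and gram[-1] not in scores:
                    scores[gram[-1]] = weight * c / denom
        weight *= ALPHA  # backing off one level
    # Unigram fallback: guarantees candidates even for completely unseen input.
    total = sum(counts[1].values())
    for (w,), c in counts[1].items():
        if w not in scores:
            scores[w] = weight * c / total
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]
```

Scanning every stored n-gram on each call keeps the sketch short; to stay within milliseconds, an app would instead index continuations by their context tuple. The user's input should also pass through the same cleaning as the training text before lookup.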

Model Performance

Accuracy (held-out test set)

  • Top-1 accuracy: ~15–20%
  • Top-3 accuracy: ~30–40%
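Top-k accuracy counts a hit whenever the true next word appears among the model's k suggestions, which is why the top-3 figure is higher than the top-1 figure. A minimal sketch of the measurement, assuming the held-out text is already tokenized into sentences and each position uses up to three preceding words as history; `make_examples` and `top_k_accuracy` are hypothetical helpers, and `predict` stands for any function that returns a ranked word list.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

def make_examples(sentences: Iterable[List[str]],
                  max_history: int = 3) -> Iterator[Tuple[List[str], str]]:
    """Yield (history, next_word) pairs from every position in each held-out sentence."""
    for tokens in sentences:
        for i in range(1, len(tokens)):
            yield tokens[max(0, i - max_history):i], tokens[i]

def top_k_accuracy(predict: Callable[[Sequence[str], int], List[str]],
                   examples: Iterable[Tuple[Sequence[str], str]],
                   k: int) -> float:
    """Fraction of examples whose true next word is in the first k predictions."""
    hits = total = 0
    for history, target in examples:
        total += 1
        if target in predict(history, k)[:k]:
            hits += 1
    return hits / total if total else 0.0
```

Running the same examples with `k=1` and `k=3` yields the two accuracy figures reported on this slide.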

Efficiency

  • Pruned n-grams reduce memory usage
  • Predictions run in milliseconds
  • Suitable for real-time Shiny deployment

Shiny App Demonstration

How the app works

  • User enters text
  • Model predicts the top 3 next words
  • Results update instantly

🔗 Live App:

Summary

This project demonstrates how statistical language models can be used to build fast, interpretable predictive text systems suitable for real-world applications.