2026-01-24

Motivation

  • Mobile typing benefits from next-word prediction: fewer keystrokes and faster input
  • This project builds a lightweight next-word prediction engine
  • Trained on real-world text from blogs, news, and Twitter

Data & Exploration

  • Source: SwiftKey dataset
    • Blogs
    • News
    • Twitter
  • Data challenges:
    • Large file sizes
    • Noisy text
    • Inconsistent grammar
  • Solution (see the sketch after this list):
    • Sampling
    • Cleaning
    • Tokenization
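
A minimal sketch of this sampling/cleaning/tokenization step in base R. The file path, sampling rate, and cleaning rules here are illustrative assumptions, not the project's exact settings:

    # Sample, clean, and tokenize one raw corpus file (illustrative settings).
    set.seed(42)
    lines <- readLines("data/en_US.twitter.txt", encoding = "UTF-8", skipNul = TRUE)
    sampled <- sample(lines, ceiling(0.01 * length(lines)))  # keep ~1% to stay lightweight

    cleaned <- tolower(sampled)                       # normalize case
    cleaned <- gsub("[^a-z' ]", " ", cleaned)         # drop punctuation, digits, symbols
    cleaned <- gsub("\\s+", " ", trimws(cleaned))     # collapse whitespace

    tokens <- strsplit(cleaned, " ", fixed = TRUE)    # one word vector per line
    head(tokens[[1]])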

Modeling Approach

  • Use N-gram language models
    • Unigram
    • Bigram
    • Trigram
  • Strategy (backoff sketched at the end of this section):
    • Predict the next word from the highest-order n-gram that matches the typed context
    • Back off to lower-order n-grams when no higher-order match is found
  • Balance between:
    • Accuracy
    • Speed
    • Memory usage
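
A minimal sketch of the backoff lookup, assuming the n-gram counts live in three data frames (tri, bi, uni) with columns prefix, word, and count; the table layout and the predict_next name are assumptions, not the app's actual code:

    # Predict the next word: try the trigram context first, then back off.
    predict_next <- function(phrase, tri, bi, uni, k = 3) {
      words <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]

      lookup <- function(tbl, prefix) {
        hits <- tbl[tbl$prefix == prefix, ]
        if (nrow(hits) == 0) return(character(0))
        head(hits$word[order(-hits$count)], k)        # top-k candidates by count
      }

      if (length(words) >= 2) {                       # trigram: last two words as context
        res <- lookup(tri, paste(tail(words, 2), collapse = " "))
        if (length(res) > 0) return(res)
      }
      if (length(words) >= 1) {                       # bigram: last word as context
        res <- lookup(bi, tail(words, 1))
        if (length(res) > 0) return(res)
      }
      head(uni$word[order(-uni$count)], k)            # fallback: most frequent words overall
    }

For example, predict_next("thank you for", tri, bi, uni) tries the trigram prefix "you for" before backing off to the bigram prefix "for".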

Application Design

  • Built using Shiny
  • User types a phrase
  • App predicts the next most likely word
  • Features (a minimal app sketch follows this list):
    • Fast response
    • Clean interface
    • Lightweight model
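
A minimal Shiny sketch of this interaction, assuming a predict_next() function and the n-gram tables (tri, bi, uni) from the modeling sketch are already loaded; the layout and labels are illustrative:

    library(shiny)

    ui <- fluidPage(
      titlePanel("Next-Word Prediction"),
      textInput("phrase", "Type a phrase:", value = ""),
      strong("Predicted next word:"),
      textOutput("prediction")
    )

    server <- function(input, output) {
      output$prediction <- renderText({
        if (trimws(input$phrase) == "") return("")       # nothing typed yet
        predict_next(input$phrase, tri, bi, uni, k = 1)  # show the single best candidate
      })
    }

    shinyApp(ui, server)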

Conclusion & Next Steps

  • Demonstrated that fast, lightweight next-word prediction is feasible
  • N-gram models are effective and interpretable
  • Future improvements (one direction sketched after this list):
    • Smoothing techniques
    • Larger training samples
    • Smarter backoff logic
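
One candidate for the "smarter backoff" item is stupid-backoff-style scoring: instead of consulting lower-order n-grams only when higher orders fail, every order contributes, penalized by a fixed factor (commonly 0.4) per backoff step. A rough sketch, reusing the assumed tri/bi/uni tables from the modeling section (unigrams stored under an empty prefix):

    # Score candidate next words with a stupid-backoff-style penalty.
    score_candidates <- function(words, tri, bi, uni, alpha = 0.4) {
      scores <- list()
      add <- function(tbl, prefix, weight) {
        hits <- tbl[tbl$prefix == prefix, ]
        if (nrow(hits) > 0) {
          s <- weight * hits$count / sum(hits$count)  # penalized relative frequency
          scores[[length(scores) + 1]] <<- setNames(s, hits$word)
        }
      }
      if (length(words) >= 2) add(tri, paste(tail(words, 2), collapse = " "), 1)
      if (length(words) >= 1) add(bi, tail(words, 1), alpha)
      add(uni, "", alpha^2)

      merged <- unlist(scores)
      merged <- tapply(merged, names(merged), max)    # best score per candidate word
      head(sort(merged, decreasing = TRUE), 3)
    }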