Problem & Goal

  • Predict the next word given a short phrase
  • Useful for smart keyboards and text completion
  • Goal: build a fast and simple prediction product

Data

  • English text from:
    • News articles
    • Blogs
    • Twitter posts
  • Preprocessing:
    • Lowercasing
    • Removing punctuation and URLs
    • Tokenization
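
The preprocessing steps above can be sketched in a few lines. This is a minimal Python sketch, not the app's actual code (the app itself is built with Shiny). The helper name `preprocess` and both regular expressions are assumptions. Keeping apostrophes so that contractions like "it's" stay whole is also an assumed design choice that the slides do not specify.

```python
import re

# Hypothetical patterns: the original pipeline's exact rules are not stated.
URL_RE = re.compile(r"(https?://\S+|www\.\S+)")
# After lowercasing, keep letters, digits, apostrophes and whitespace; replace the rest.
PUNCT_RE = re.compile(r"[^a-z0-9'\s]")

def preprocess(text: str) -> list[str]:
    """Lowercase, strip URLs and punctuation, then split on whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)    # drop URLs before punctuation is removed
    text = PUNCT_RE.sub(" ", text)  # punctuation becomes a token boundary
    return text.split()

if __name__ == "__main__":
    print(preprocess("Check THIS out: https://example.com, it's great!"))
    # ['check', 'this', 'out', "it's", 'great']
```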

Prediction Algorithm

  • N-gram language model (unigrams through 4-grams)
  • Backoff strategy:
    • 4-gram → 3-gram → 2-gram → unigram
  • Frequency-based lookup: predicts the most frequent next word observed after the matched context
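
The backoff strategy can be sketched as a minimal example. It assumes a plain in-memory corpus and uses the hypothetical helpers `tokenize`, `build_tables` and `predict`. The sketch counts 1- to 4-grams and keeps only the most frequent next word for each context, so that a prediction is a single dictionary lookup. It then tries the last three, two and one words of the input before falling back to the most frequent unigram. This illustrates the approach on the slide and is not the original implementation.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Minimal stand-in for the preprocessing step: lowercase and keep word characters.
    return re.findall(r"[a-z']+", text.lower())

def build_tables(corpus, max_n=4):
    """Count n-grams for n = 1..max_n, keeping only the top next word per context."""
    counts = defaultdict(Counter)  # context tuple -> Counter of following words
    for line in corpus:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])  # empty tuple for unigrams
                counts[context][tokens[i + n - 1]] += 1
    # Precompute context -> most frequent next word so prediction is one lookup.
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def predict(table, phrase, max_n=4):
    """Back off from the longest usable context (3 words) down to the unigram."""
    tokens = tokenize(phrase)
    for k in range(min(max_n - 1, len(tokens)), 0, -1):
        context = tuple(tokens[-k:])
        if context in table:
            return table[context]
    # Most frequent unigram; None only if the corpus was empty.
    return table.get(())

if __name__ == "__main__":
    corpus = ["i want to go home", "i want to eat", "i want to go out"]
    table = build_tables(corpus)
    print(predict(table, "I want to"))  # go
```

Storing only the top word per context keeps the tables small, which suits a single-word output; returning several predictions would require keeping more candidates per context.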

Shiny Application

  • Text input for entering a phrase
  • Submit button to trigger prediction
  • Outputs a single predicted word
  • Designed for fast response time

User Experience & Value

  • Simple and intuitive interface
  • Always returns a prediction (backoff ends at the most frequent unigram)
  • Lightweight and fast
  • Future improvements:
    • Multiple predictions
    • Profanity filtering
    • Smoothing techniques
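
The slides list these improvements as future work. Below is one possible direction, sketched under assumptions and not describing what the app currently does. The hypothetical `top_k` returns several candidates and skips words in a caller-supplied blocklist as a simple profanity filter. It ranks candidates with Stupid Backoff (Brants et al., 2007), swapped in here for illustration. That scheme multiplies a word's relative frequency by a fixed factor `alpha` (0.4 here) for each step down to a shorter context. It yields scores, not normalized probabilities, so a properly smoothed model such as Kneser-Ney would be a different choice.

```python
from collections import Counter, defaultdict

def count_ngrams(tokens, max_n=4):
    """Map each context (tuple of up to max_n - 1 words) to a Counter of next words."""
    counts = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n - 1])][tokens[i + n - 1]] += 1
    return counts

def top_k(counts, context, k=3, blocklist=frozenset(), alpha=0.4, max_n=4):
    """Return up to k next-word candidates ranked by Stupid Backoff scores."""
    # Only the last max_n - 1 words can match a stored context.
    context = tuple(context[-(max_n - 1):]) if max_n > 1 else ()
    scores = {}
    weight = 1.0
    for start in range(len(context) + 1):  # longest context first, empty context last
        nexts = counts.get(context[start:])
        if nexts:
            # Approximates count(context) by how often it was followed by a word.
            total = sum(nexts.values())
            for word, c in nexts.items():
                # A word keeps the score from the longest context it appeared in.
                if word in blocklist or word in scores:
                    continue
                scores[word] = weight * c / total
        weight *= alpha  # back off: penalize each shorter context
    return sorted(scores, key=scores.get, reverse=True)[:k]

if __name__ == "__main__":
    counts = count_ngrams("i want to go home i want to eat".split())
    print(top_k(counts, ["i", "want", "to"], k=2))  # ['go', 'eat']
```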