2026-05-04

problem

  • predict the next word as a user types
  • used in mobile keyboards
  • goal: improve typing speed

data used

  • blogs dataset
  • news dataset
  • Twitter dataset

  Preprocessing steps:

  • Lowercasing text
  • Removing numbers
  • Removing punctuation
  • Removing extra whitespace
  • Sampling 1% of data for efficiency
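
The preprocessing steps above can be sketched in base R as follows. This is a minimal illustration, not the project's actual code: the helper names `clean_text` and `sample_lines`, the fixed seed, and the choice of base R regular expressions (rather than a text-mining package) are all assumptions.

```r
# Hypothetical helper: applies the four cleaning steps in order.
clean_text <- function(lines) {
  lines <- tolower(lines)                   # lowercase text
  lines <- gsub("[0-9]+", "", lines)        # remove numbers
  lines <- gsub("[[:punct:]]", "", lines)   # remove punctuation
  lines <- gsub("\\s+", " ", lines)         # collapse runs of whitespace
  trimws(lines)                             # drop leading/trailing spaces
}

# Hypothetical helper: keeps a random fraction of lines (1% by default).
# The seed is an assumption, used here only to make the sample reproducible.
sample_lines <- function(lines, fraction = 0.01, seed = 42) {
  if (length(lines) == 0) return(lines)
  set.seed(seed)
  n <- max(1, round(length(lines) * fraction))
  lines[sample.int(length(lines), n)]
}
```

In use, each dataset would be sampled first and then cleaned, for example `clean_text(sample_lines(raw_lines))`, where `raw_lines` stands for a character vector read from one of the files.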

model approach

  Built using an n-gram language model:

  • Unigrams (single words)
  • Bigrams (2-word sequences)
  • Trigrams (3-word sequences)

  Prediction logic:

  • Try the trigram first
  • If not found → fall back to the bigram
  • If still not found → default prediction
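
The backoff logic can be illustrated with a small base R sketch. All function names here (`count_ngrams`, `build_model`, `best_completion`, `predict_next`) are hypothetical, and the default prediction is assumed to be the most frequent single word, which the slides do not specify. It also assumes the text has already been cleaned.

```r
# Count n-grams of size n; returns a named integer vector sorted by frequency.
count_ngrams <- function(tokens, n) {
  grams <- as.character(unlist(lapply(tokens, function(w) {
    if (length(w) < n) return(character(0))
    starts <- seq_len(length(w) - n + 1)
    vapply(starts, function(i) paste(w[i:(i + n - 1)], collapse = " "),
           character(1))
  })))
  tab <- table(grams)
  sort(setNames(as.integer(tab), names(tab)), decreasing = TRUE)
}

# Build unigram, bigram, and trigram counts from cleaned lines.
build_model <- function(lines) {
  tokens <- strsplit(lines, " +")
  list(uni = count_ngrams(tokens, 1),
       bi  = count_ngrams(tokens, 2),
       tri = count_ngrams(tokens, 3))
}

# Last word of the most frequent n-gram starting with `prefix`, or NA.
best_completion <- function(counts, prefix) {
  if (length(counts) == 0) return(NA_character_)
  hits <- counts[startsWith(names(counts), paste0(prefix, " "))]
  if (length(hits) == 0) return(NA_character_)
  sub(".* ", "", names(hits)[which.max(hits)])
}

predict_next <- function(model, phrase) {
  words <- strsplit(trimws(tolower(phrase)), " +")[[1]]
  n <- length(words)
  if (n >= 2) {  # 1. trigram: match on the last two words
    guess <- best_completion(model$tri, paste(words[n - 1], words[n]))
    if (!is.na(guess)) return(guess)
  }
  if (n >= 1) {  # 2. bigram: match on the last word
    guess <- best_completion(model$bi, words[n])
    if (!is.na(guess)) return(guess)
  }
  names(model$uni)[1]  # 3. default (assumption): most frequent word
}

model <- build_model(c("i love new york", "i love new shoes",
                       "i love new york city", "we love pizza"))
predict_next(model, "I love new")  # trigram match
predict_next(model, "they love")   # falls back to bigram
predict_next(model, "zzz")         # falls back to default
```

Scanning every n-gram name with `startsWith` keeps the sketch short but is linear in the table size; a real app would more likely index the tables by prefix.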

shiny application

  • User inputs a phrase in a text box
  • App predicts next word in real time
  • Built using Shiny (R web framework)

  Features:

  • Simple UI
  • Fast prediction
  • Handles unknown inputs gracefully
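
A minimal Shiny app with this shape might look like the sketch below. It is not the actual application: the input and output IDs, the title, and the stand-in `predict_word` function (which simply returns a fixed word) are assumptions made for illustration.

```r
library(shiny)

# Hypothetical stand-in for the n-gram lookup; always returns a fixed default.
predict_word <- function(phrase) {
  if (!nzchar(trimws(phrase))) return("")  # nothing typed yet
  "the"                                    # placeholder default prediction
}

ui <- fluidPage(
  titlePanel("Next Word Predictor"),
  textInput("phrase", "Type a phrase:"),
  strong("Predicted next word:"),
  textOutput("prediction")
)

server <- function(input, output, session) {
  # renderText re-runs whenever input$phrase changes,
  # so the prediction updates as the user types.
  output$prediction <- renderText({
    predict_word(input$phrase)
  })
}

# Launch only in an interactive session, so sourcing the file has no side effects.
if (interactive()) shinyApp(ui, server)
```

Because the stub returns a word for any non-empty input, unknown phrases still produce a prediction rather than an error, which is one way to read "handles unknown inputs gracefully".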

conclusion

  • Successfully built predictive text model
  • Implements backoff n-gram strategy
  • Forms basis for smart keyboard systems

  Can be improved with:

  • larger dataset
  • smoothing techniques
  • deep learning models