2026-06-28

Project Overview

  • Predicts the next word from a user-entered phrase.
  • Uses an N-gram language model.
  • Built using the SwiftKey dataset.
  • Deployed as a Shiny application.

Data Processing

  • Sampled the blogs, news, and Twitter datasets.
  • Converted text to lowercase.
  • Removed punctuation, numbers, URLs, and extra spaces.
  • Built unigram, bigram, and trigram models.
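
The cleaning and counting steps above can be sketched as follows. This is a minimal illustration in Python, not the project's actual code (the app itself is built with Shiny, so the real pipeline likely differs). The names clean_text and ngram_counts are hypothetical, and deleting every character outside a-z is an assumed reading of "removed punctuation".

```python
import re
from collections import Counter

def clean_text(text):
    """Lowercase, then strip URLs, numbers, punctuation, and extra spaces."""
    text = text.lower()
    # Remove URLs first, while their punctuation is still intact.
    text = re.sub(r"(https?://|www\.)\S+", " ", text)
    # Replace digit runs with a space.
    text = re.sub(r"\d+", " ", text)
    # Assumption: anything that is not a-z or whitespace counts as punctuation
    # and is deleted, so "it's" becomes "its".
    text = re.sub(r"[^a-z\s]", "", text)
    # Collapse repeated whitespace.
    return re.sub(r"\s+", " ", text).strip()

def ngram_counts(lines, n):
    """Count n-grams (as tuples of words) across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        tokens = clean_text(line).split()
        # zip over n shifted copies of the token list yields each n-gram.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

sample = ["I like tea.", "I like coffee!"]
unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)
```

Storing n-grams as tuples keyed in a Counter keeps the three tables in the same shape, which makes it easy to add higher orders later.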

Prediction Algorithm

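  • Looks up the last two words of the phrase in the trigram model first.
  • Falls back to a bigram lookup on the last word if no trigram matches.
  • Falls back to the most common unigram if no bigram matches.
  • Returns one predicted next word.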
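
The fallback order above can be sketched as follows. This is a hedged illustration in Python rather than the app's own code. The names build_tables and predict_next, the context-keyed table layout, and the toy corpus are all assumptions made for the example. It picks the most frequent continuation at each level and applies no smoothing or score discounting.

```python
from collections import Counter, defaultdict

def build_tables(sentences):
    """Count unigrams and index bigram/trigram continuations by context."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)   # previous word -> counts of next word
    trigrams = defaultdict(Counter)  # (word1, word2) -> counts of next word
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        for a, b in zip(tokens, tokens[1:]):
            bigrams[a][b] += 1
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            trigrams[(a, b)][c] += 1
    return unigrams, bigrams, trigrams

def predict_next(phrase, unigrams, bigrams, trigrams):
    """Return one predicted word: trigram, then bigram, then top unigram."""
    tokens = phrase.lower().split()
    # 1. Trigram lookup on the last two words.
    if len(tokens) >= 2:
        followers = trigrams.get(tuple(tokens[-2:]))
        if followers:
            return followers.most_common(1)[0][0]
    # 2. Bigram lookup on the last word.
    if tokens:
        followers = bigrams.get(tokens[-1])
        if followers:
            return followers.most_common(1)[0][0]
    # 3. Most common unigram (None if the model is empty).
    return unigrams.most_common(1)[0][0] if unigrams else None

# Toy corpus, invented for illustration only.
corpus = [
    "i like green tea",
    "i like green tea",
    "i like black coffee",
    "you like coffee",
    "coffee is good",
]
tables = build_tables(corpus)
print(predict_next("I like", *tables))        # trigram hit
print(predict_next("really black", *tables))  # bigram fallback
print(predict_next("hello world", *tables))   # unigram fallback
```

With this toy corpus the three calls exercise the trigram, bigram, and unigram paths in turn, printing "green", "coffee", and "like".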

Shiny Application

Future Improvements

  • Support four-gram and five-gram models.
  • Improve prediction accuracy.
  • Add spelling correction.
  • Optimize memory usage.