October 3, 2025

Project Overview

  • Data: HC Corpora (blogs, news, Twitter).
  • Approach: N-gram frequency analysis + backoff algorithm.

Algorithm & Implementation

  • Data cleaning: lowercasing, punctuation removal, stopword removal, profanity filtering.
  • Tokenization and counting: unigrams, bigrams, trigrams.
  • Backoff strategy: trigram → bigram → unigram as fallback.
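
The three steps above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the app's actual code: the function names (clean, build_model, predict), the tiny stopword and profanity lists, and the choice to rank candidates by raw count are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical placeholder lists: the source does not say which lists the app uses.
STOPWORDS = {"the", "a", "an", "is", "are"}
PROFANITY = {"darn"}


def clean(text, remove_stopwords=True):
    """Lowercase, strip punctuation, then drop profanity and (optionally) stopwords."""
    text = text.lower().replace("'", "")   # "it's" -> "its"
    text = re.sub(r"[^a-z\s]", " ", text)  # digits, punctuation, non a-z letters become spaces
    tokens = [t for t in text.split() if t not in PROFANITY]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def build_model(lines):
    """Count unigrams, bigrams and trigrams, keyed by their preceding context."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)   # previous word -> counts of next word
    trigrams = defaultdict(Counter)  # (two previous words) -> counts of next word
    for line in lines:
        tokens = clean(line)
        unigrams.update(tokens)
        for w1, w2 in zip(tokens, tokens[1:]):
            bigrams[w1][w2] += 1
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
            trigrams[(w1, w2)][w3] += 1
    return unigrams, bigrams, trigrams


def predict(model, text, k=3):
    """Back off from trigram to bigram to unigram until a context has been seen."""
    unigrams, bigrams, trigrams = model
    tokens = clean(text)
    if len(tokens) >= 2 and tuple(tokens[-2:]) in trigrams:
        candidates = trigrams[tuple(tokens[-2:])]
    elif tokens and tokens[-1] in bigrams:
        candidates = bigrams[tokens[-1]]
    else:
        candidates = unigrams  # last resort: most frequent words overall
    return [word for word, _ in candidates.most_common(k)]
```

Calling predict(build_model(lines), "some typed text") returns up to k suggestions. The input is cleaned with the same function as the training text so that the lookup keys match the stored contexts.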

Shiny App Features

  • User-friendly interface for text input.
  • Real-time predictions as you type.
  • Top word suggestions based on the n-gram model.

App Screenshot