October 3, 2025
Project Overview
- Data: HC Corpora (blogs, news, Twitter).
- Approach: n-gram frequency analysis combined with a backoff algorithm.
Algorithm & Implementation
- Data cleaning: lowercasing, punctuation removal, stopword removal, profanity filtering.
- Tokenization and counting: unigrams, bigrams, trigrams.
- Backoff strategy: trigram → bigram → unigram, falling back to the next lower order whenever the longer context has no match.
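The cleaning, counting, and backoff steps above can be sketched in Python as follows. This is a minimal illustration, not the app's actual code. The word lists `PROFANITY` and `STOPWORDS`, the function names, and the toy corpus are all hypothetical. The backoff shown is the simple kind: it returns the most frequent followers from the highest-order n-gram that matches, without the probability discounting used in Katz backoff. Stopword removal is an optional flag here, because dropping stopwords during cleaning also removes them from the set of words the model can ever predict.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, tiny word lists; a real app would load full lists from files.
PROFANITY = {"darn"}
STOPWORDS = {"a", "an", "the"}

def clean(text, remove_stopwords=False):
    """Lowercase, strip punctuation, drop profanity (and optionally stopwords)."""
    text = text.lower()
    # Replace anything that is not a letter, apostrophe, or whitespace with a space.
    text = re.sub(r"[^a-z'\s]", " ", text)
    banned = PROFANITY | (STOPWORDS if remove_stopwords else set())
    tokens = [t.strip("'") for t in text.split()]
    return [t for t in tokens if t and t not in banned]

def count_ngrams(tokens, n):
    """Count n-grams as tuples of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def build_model(corpus):
    """Map each context (tuple of preceding words) to a Counter of next words.

    model[k][context] holds next-word counts for contexts of length k:
    k = 2 comes from trigrams, k = 1 from bigrams, k = 0 from unigrams.
    """
    model = {k: defaultdict(Counter) for k in range(3)}
    for line in corpus:
        tokens = clean(line)
        for n in (1, 2, 3):
            for gram, c in count_ngrams(tokens, n).items():
                model[n - 1][gram[:-1]][gram[-1]] += c
    return model

def predict(model, text, k=3):
    """Back off trigram -> bigram -> unigram; return up to k suggestions."""
    tokens = clean(text)
    for order in (2, 1, 0):  # context length in words
        if len(tokens) < order:
            continue  # not enough input words for this order
        context = tuple(tokens[-order:]) if order else ()
        followers = model[order].get(context)  # .get avoids creating entries
        if followers:
            return [w for w, _ in followers.most_common(k)]
    return []

if __name__ == "__main__":
    corpus = [
        "I want to go home.",
        "I want to eat pizza!",
        "I want to go out",
        "You want a darn break",
    ]
    model = build_model(corpus)
    print(predict(model, "I want to"))  # → ['go', 'eat']  (trigram match)
    print(predict(model, "they want"))  # → ['to', 'a']    (backs off to bigram)
```

For "they want", no trigram context ("they", "want") exists, so the sketch falls back to the bigram context ("want",); an input with no matching context at all falls through to the overall most frequent unigrams.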
Shiny App Features
- User-friendly interface for text input.
- Real-time predictions as you type.
- Top word suggestions based on the n-gram model.
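One simple way to produce the suggestion list is to rank candidate next words by count and show each with its relative frequency within the matched context, which is the maximum-likelihood estimate of P(next word | context). The Python sketch below assumes the next-word counts for the current context are already available as a `collections.Counter`; the function name `top_suggestions`, the parameter `k`, and the sample counts are hypothetical.

```python
from collections import Counter

def top_suggestions(followers, k=3):
    """Return the k most frequent next words with their relative frequencies."""
    total = sum(followers.values())
    if total == 0:
        return []  # nothing observed for this context
    return [(word, count / total) for word, count in followers.most_common(k)]

if __name__ == "__main__":
    followers = Counter({"go": 3, "eat": 2, "sleep": 1})
    print(top_suggestions(followers, k=2))  # → [('go', 0.5), ('eat', 0.3333333333333333)]
```

Because the ranking only needs a lookup and a `most_common` call on precomputed counts, it is cheap enough to rerun on every keystroke.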
App Screenshot