2026-03-07

Overview

Text Algorithm

A Shiny web application where users enter a word or sentence, and the algorithm predicts the 3 most likely next words.

  • Trained on blogs, news, and Twitter data
  • Data was cleaned to create a corpus (e.g., converted to lower case; punctuation, profanity, and extra white space removed)
  • Built an n-gram language model
  • Uses pre-computed frequency tables for speed
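
The cleaning and pre-computation steps above can be sketched as follows. This is a minimal illustration in Python, not the app's actual code: the function names, the one-word profanity list, and the two sample sentences are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical placeholder; the app's real profanity list is not shown here.
PROFANITY = {"darn"}

def clean(text):
    """Lower-case, strip punctuation, drop profanity, collapse white space."""
    text = text.lower()
    text = re.sub(r"[^a-z' ]+", " ", text)  # keep letters, apostrophes, spaces
    return [w for w in text.split() if w not in PROFANITY]

def build_tables(lines, max_n=4):
    """Map each (n-1)-word context to a Counter of next words, for n = 2..max_n."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for line in lines:
        words = clean(line)
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                tables[n][context][words[i + n - 1]] += 1
    return tables

# Toy corpus standing in for the blogs, news, and Twitter data.
tables = build_tables(["Thanks for the follow!", "Thanks for the darn invite."])
print(tables[3][("thanks", "for")].most_common(1))  # [('the', 2)]
```

Because the tables are built once ahead of time, a prediction at run time only needs a dictionary lookup rather than a pass over the raw text.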

Goal: provide fast and accurate text prediction similar to smartphone keyboards.

Prediction Algorithm

The model uses a Backoff N-Gram Algorithm. Here is how it works:

  1. N-grams are generated: bigrams, trigrams, and quadgrams (2-, 3-, and 4-word sequences)

  2. Prediction logic:

    • Match the last three words against the quadgrams
    • If none → back off to trigrams (last two words)
    • If none → back off to bigrams (last word)
    • If none → return the most common starting words
  3. The top 3 predicted words are returned
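
The backoff steps above can be sketched as a lookup that tries the longest context first. This is an illustrative Python sketch under stated assumptions, not the app's implementation: the table contents, the counts, the fallback word list, and the name `predict` are all made up for the example, and the input is assumed to be already cleaned.

```python
from collections import Counter

# Toy frequency tables keyed by n-gram order; all counts are invented.
TABLES = {
    4: {("thanks", "for", "the"): Counter({"follow": 3, "invite": 1})},
    3: {("for", "the"): Counter({"first": 5, "follow": 3, "record": 2, "invite": 1})},
    2: {("the",): Counter({"first": 9, "same": 7, "best": 4})},
}
# Hypothetical fallback standing in for the most common starting words.
DEFAULT_WORDS = ["the", "i", "it"]

def predict(text, k=3):
    """Return up to k next-word guesses, trying the longest context first."""
    words = text.lower().split()
    for n in (4, 3, 2):  # quadgram, then trigram, then bigram
        if len(words) < n - 1:
            continue  # not enough input words for this order
        context = tuple(words[-(n - 1):])
        counts = TABLES[n].get(context)
        if counts:
            return [w for w, _ in counts.most_common(k)]
    return DEFAULT_WORDS[:k]

print(predict("wait for the"))  # trigram match: ['first', 'follow', 'record']
print(predict("zebra"))         # no match: ['the', 'i', 'it']
```

Note that this simple form stops at the first order with any match, so it can return fewer than 3 words when that match has few continuations; filling the remaining slots from lower orders would be one possible refinement.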

User Experience

Key strengths:

  • Clean UI: Simple and User-Friendly
  • Fast Predictions
  • Practical NLP Application

Future Improvements:

  • Use a larger training dataset to make more contextual predictions
  • Explore more advanced language models

Give it a Try!