2026-06-20

Overview

This project builds a next-word prediction model using natural language processing techniques.

Goals:

  • Use n-gram modeling (trigrams)
  • Predict the next word from the previous two words
  • Build and deploy the model as a Shiny web application

Data & Model

The model is trained on three datasets (Blogs, News, and Twitter), which are processed into trigrams. A sample of the data, rather than the full corpus, was used to reduce computation time.

Steps:

  • Convert text to lowercase
  • Remove punctuation
  • Tokenize into words
  • Build trigram frequency table
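
The steps above can be sketched in base R as follows. This is a minimal illustration, not the project's code: the names `clean_text()` and `build_trigrams()` and the `freq` column are assumptions, while the `text` column mirrors the `trigram_df$text` used by the prediction function.

```r
# Minimal sketch of the preprocessing steps (base R only).
# Hypothetical names: clean_text(), build_trigrams(), and the freq column.

clean_text <- function(x) {
  x <- tolower(x)                       # convert to lowercase
  x <- gsub("[[:punct:]]", "", x)       # remove punctuation
  gsub("\\s+", " ", trimws(x))          # collapse runs of whitespace
}

build_trigrams <- function(lines) {
  # Tokenize: split each cleaned line into words
  tokens <- strsplit(clean_text(lines), " ", fixed = TRUE)
  trigrams <- unlist(lapply(tokens, function(w) {
    w <- w[nzchar(w)]                   # drop empty tokens
    n <- length(w)
    if (n < 3) return(character(0))     # too short to form a trigram
    paste(w[1:(n - 2)], w[2:(n - 1)], w[3:n])  # consecutive word triples
  }))
  # Count each trigram and sort from most to least frequent
  counts <- sort(table(trigrams), decreasing = TRUE)
  data.frame(text = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}
```

For example, `build_trigrams("I love data")` yields a single row with text "i love data" and freq 1.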

Example:

  • Training text: “I love data”
  • Learned trigram: “i love” → “data” (lowercased during cleaning)

Prediction Function + App

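The prediction function:

  • Cleans the input text (lowercase, remove punctuation)
  • Splits the input into words
  • Uses pattern matching to find trigrams that begin with the last two words
  • Returns the most frequent next word

# Keep trigrams whose text starts with the last two input words plus a space
matches <- trigram_df[grepl(paste0("^", last_two, " "), trigram_df$text), ]
# most_freq_word() (the project's helper) picks the most frequent next word
predicted <- most_freq_word(matches)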
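
The prediction steps above can be sketched as one self-contained function. It assumes the trigram table has a `text` column holding "w1 w2 w3" and a `freq` count column; `predict_next()` is a hypothetical name. It uses a fixed-string prefix test with `startsWith()` instead of building a regular expression from user input, and the trailing space ensures that a partial word such as "i lov" does not match "i love ...".

```r
# Sketch of the prediction step (base R only).
# Assumption: trigram_df has columns text ("w1 w2 w3") and freq (counts);
# predict_next() is a hypothetical name, not the project's function.
predict_next <- function(phrase, trigram_df) {
  # Clean the input like the training text: lowercase, strip punctuation
  cleaned <- trimws(gsub("[[:punct:]]", "", tolower(phrase)))
  words <- strsplit(cleaned, "\\s+")[[1]]
  words <- words[nzchar(words)]
  n <- length(words)
  if (n < 2) return(NA_character_)                 # need at least two words
  last_two <- paste(words[n - 1], words[n])
  # Prefix match on "w1 w2 " so partial words cannot match
  matches <- trigram_df[startsWith(trigram_df$text, paste0(last_two, " ")), ]
  if (nrow(matches) == 0) return(NA_character_)    # unseen word pair
  best <- matches$text[which.max(matches$freq)]    # most frequent matching trigram
  sub("^.* ", "", best)                            # its third word is the prediction
}
```

Here an unseen word pair simply returns `NA`; how the app handles that case is not described above.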

The Shiny app allows users to:

  1. Enter a phrase
  2. Click “Predict”
  3. View the next word instantly

App Features & Conclusion

The application includes:

  • Simple and clean user interface
  • Real-time prediction
  • Input validation (requires at least two words)
  • Fast response using precomputed n-grams
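
The input-validation feature can be sketched as a small helper that runs before any prediction. `validate_phrase()` and its messages are hypothetical; it cleans the text the same way as the model input and then requires at least two remaining words.

```r
# Minimal sketch of the two-word input check (base R only).
# validate_phrase() is hypothetical: it returns NULL when the input is usable,
# otherwise a message the app could show instead of a prediction.
validate_phrase <- function(phrase) {
  # Reject missing, empty, or whitespace-only input
  if (length(phrase) != 1 || is.na(phrase) || !nzchar(trimws(phrase))) {
    return("Please enter a phrase.")
  }
  # Clean like the model input, then count the remaining words
  cleaned <- trimws(gsub("[[:punct:]]", "", tolower(phrase)))
  words <- strsplit(cleaned, "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) {
    return("Please enter at least two words.")
  }
  NULL
}
```

The fast response relies on doing the expensive work offline. One common approach, assumed here rather than taken from the project, is to save the trigram table once with `saveRDS()` and load it with `readRDS()` when the app starts, so each request is only a table lookup.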

Key achievements:

  • Built a functional NLP prediction model
  • Demonstrated how n-grams predict language patterns
  • Delivered an interactive web application