2025-07-18

Introduction

  • The project aims to build a predictive text model that suggests the next word in a sentence.
  • Built using N-gram language modeling and deployed on shinyapps.io.
  • Data pre-processing and model logic are implemented in R.
  • Goal: Create an interactive product demonstrating NLP capabilities.

Algorithm & Dataset

  • Approach: Trigram-based model.
  • Training Data:
    • Small sample corpus of common phrases (for demonstration).
    • Cleaned using tolower(), tokenized using tokenizers::tokenize_ngrams().
  • Model Logic:
    • For a given bigram (last 2 words), predict the most probable next word.
    • Frequency-based prediction using dplyr::count(); a minimal sketch follows below.
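
  For illustration, here is a minimal sketch of this pipeline in R. The toy corpus and the predict_next() helper are assumptions made for the example, not the app's actual code.

    library(dplyr)
    library(tidyr)
    library(tokenizers)

    # Toy corpus standing in for the demonstration phrases.
    corpus <- c("how are you doing today",
                "i am glad to see you",
                "let's go to the park")

    # Clean with tolower(), tokenize into trigrams, and count frequencies.
    trigrams <- tibble(trigram = unlist(tokenize_ngrams(tolower(corpus), n = 3))) |>
      separate(trigram, into = c("w1", "w2", "next_word"), sep = " ") |>
      count(w1, w2, next_word, name = "freq", sort = TRUE)

    # Predict the most frequent word following the last two words of a phrase.
    predict_next <- function(phrase) {
      words <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]
      last2 <- tail(words, 2)
      trigrams |>
        filter(w1 == last2[1], w2 == last2[2]) |>
        slice_max(freq, n = 1, with_ties = FALSE) |>
        pull(next_word)
    }

    predict_next("How are")  # expected: "you"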

Shiny App Overview

  • User Input: A phrase with 2+ words.
  • Output: Prediction of the next word.
  • Features:
    • Text input box
    • Instant next-word prediction on submit
    • Lightweight interface
  • Built with shiny and deployed on shinyapps.io (a minimal app sketch follows below).
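
  Below is a minimal sketch of how such an app can be wired together, assuming the predict_next() helper from the model sketch above; the deployed UI may differ in layout and wording.

    library(shiny)

    # UI: a text input box plus a Submit button, matching the features listed above.
    ui <- fluidPage(
      titlePanel("Next Word Predictor"),
      textInput("phrase", "Enter a phrase (2+ words):"),
      actionButton("submit", "Submit"),
      h4("Predicted next word:"),
      textOutput("prediction")
    )

    # Server: run the prediction only when Submit is pressed.
    server <- function(input, output) {
      result <- eventReactive(input$submit, {
        out <- predict_next(input$phrase)  # helper defined in the model sketch
        if (length(out) == 0) "(no prediction)" else out
      })
      output$prediction <- renderText(result())
    }

    shinyApp(ui, server)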

Demo & Instructions

  • App URL: https://mpbehera93.shinyapps.io/nextwordpredictor
  • Instructions: Type a phrase such as "How are", "I am", or "Let's go", then press Submit. The app returns the most likely next word.

Conclusion & Future Scope

  • ✅ Demonstrates the power of N-gram NLP in R.
  • ✅ Fully deployed app on the web.
  • 📈 Future Work:
    • Integrate larger datasets (e.g., Twitter, blogs).
    • Improve accuracy with Stupid Backoff or Kneser-Ney smoothing (see the sketch at the end).
  • Thank you!
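
  As a rough illustration of the backoff idea mentioned under Future Work, here is a hypothetical sketch of Stupid Backoff scoring in R. It assumes frequency tables trigrams(w1, w2, next_word, freq), bigrams(w1, next_word, freq), and unigrams(next_word, freq) built the same way as the trigram table above; none of this is part of the deployed app.

    library(dplyr)

    # Score a candidate continuation of the bigram (prev1, prev2), backing off
    # from trigram to bigram to unigram counts and discounting by alpha per step.
    stupid_backoff <- function(prev1, prev2, candidate, alpha = 0.4) {
      tri <- filter(trigrams, w1 == prev1, w2 == prev2, next_word == candidate)
      if (nrow(tri) > 0) {
        ctx <- sum(filter(trigrams, w1 == prev1, w2 == prev2)$freq)
        return(tri$freq[1] / ctx)                 # observed trigram: relative frequency
      }
      bi <- filter(bigrams, w1 == prev2, next_word == candidate)
      if (nrow(bi) > 0) {
        ctx <- sum(filter(bigrams, w1 == prev2)$freq)
        return(alpha * bi$freq[1] / ctx)          # back off to bigram, discounted
      }
      uni <- filter(unigrams, next_word == candidate)
      if (nrow(uni) == 0) return(0)
      alpha^2 * uni$freq[1] / sum(unigrams$freq)  # back off to unigram, discounted twice
    }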