2025-07-18
Introduction
- The project aims to build a predictive text model that suggests the next word in a sentence.
- Built using N-gram language modeling and deployed via shinyapps.io.
- Data pre-processing and model logic are implemented in R.
- Goal: Create an interactive product demonstrating NLP capabilities.
Algorithm & Dataset
- Approach: Trigram-based model.
- Training Data:
  - Small sample corpus of common phrases (for demonstration).
  - Cleaned with tolower() and tokenized into trigrams with tokenizers::tokenize_ngrams().
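- Model Logic:
  - Given the last two words of the input (a bigram), predict the most probable next word.
  - Prediction is frequency-based: trigram occurrences are tallied with dplyr::count(), and the most frequent continuation of the bigram is returned.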
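The dataset and model logic above can be sketched end to end in R. This is a minimal sketch under stated assumptions, not the app's actual code: the corpus is a hypothetical stand-in (the deck does not list its phrases), and the helper predict_next_word() and the history/next_word column split are illustrative choices.

```r
library(tokenizers)
library(dplyr)

# Hypothetical demo corpus; the deck does not list its actual phrases.
corpus <- c(
  "How are you doing today",
  "How are you",
  "How are things going",
  "I am going home",
  "I am going out",
  "I am happy",
  "Let's go home",
  "Let's go home now",
  "Let's go to the park"
)

# 1. Clean: lowercase every phrase.
clean <- tolower(corpus)

# 2. Tokenize into overlapping word trigrams ("how are you", "are you doing", ...).
trigram_vec <- unlist(tokenize_ngrams(clean, n = 3))

# 3. Split each trigram into a 2-word history and the word that follows it,
#    then count how often each (history, next_word) pair occurs.
trigram_counts <- data.frame(trigram = trigram_vec) %>%
  filter(!is.na(trigram)) %>%
  mutate(
    history   = sub(" [^ ]+$", "", trigram),
    next_word = sub("^.* ", "", trigram)
  ) %>%
  count(history, next_word, sort = TRUE)

# 4. Predict: take the last two words of the input and return the most
#    frequent next word, or NA if that bigram never appeared in the corpus.
predict_next_word <- function(phrase, counts = trigram_counts) {
  words <- tokenize_words(tolower(phrase))[[1]]
  if (length(words) < 2) return(NA_character_)
  key <- paste(tail(words, 2), collapse = " ")
  hits <- counts %>%
    filter(history == key) %>%
    arrange(desc(n))
  if (nrow(hits) == 0) return(NA_character_)
  hits$next_word[1]
}

predict_next_word("How are")  # "you" (2 of the 3 "how are ..." trigrams)
predict_next_word("I am")     # "going"
```

Ties between equally frequent continuations are resolved by row order here; a larger corpus makes such ties less common.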
Shiny App Overview
- User Input: A phrase with 2+ words.
- Output: Prediction of the next word.
- Features:
  - Text input box
  - Instant next-word prediction on submit
  - Lightweight interface
- Built with shiny and deployed on shinyapps.io.
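The app structure above can be sketched as a single-file Shiny app. This is a minimal sketch, not the deployed app's code: the lookup table and predict_next_word() are hypothetical placeholders standing in for the trigram model, and the input/output IDs ("phrase", "submit", "prediction") are assumptions.

```r
library(shiny)

# Hypothetical stand-in for the trigram model: a fixed lookup from the
# last two words to a next word. Swap in the real frequency-based predictor.
lookup <- c("how are" = "you", "i am" = "going", "let's go" = "home")

predict_next_word <- function(phrase) {
  words <- strsplit(trimws(tolower(phrase)), "\\s+")[[1]]
  if (length(words) < 2) return(NA_character_)
  key <- paste(tail(words, 2), collapse = " ")
  unname(lookup[key])  # NA when the bigram is not in the table
}

ui <- fluidPage(
  titlePanel("Next Word Predictor"),
  textInput("phrase", "Enter a phrase (2+ words):"),
  actionButton("submit", "Submit"),
  h4("Predicted next word:"),
  textOutput("prediction")
)

server <- function(input, output, session) {
  # Recompute only when Submit is pressed, not on every keystroke.
  prediction <- eventReactive(input$submit, {
    word <- predict_next_word(input$phrase)
    if (is.na(word)) "No prediction found" else word
  })
  output$prediction <- renderText(prediction())
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # launch locally
```

Tying the prediction to eventReactive() on the button matches the "prediction on submit" behavior. Publishing to shinyapps.io is typically done with rsconnect::deployApp() once an account is configured.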
Demo & Instructions
- App URL: https://mpbehera93.shinyapps.io/nextwordpredictor
- Instructions: Type a phrase such as "How are", "I am", or "Let's go", then press Submit. The app returns the most likely next word.
Conclusion & Future Scope
- ✅ Demonstrates N-gram next-word prediction implemented in R.
- ✅ Fully deployed app on the web.
- 📈 Future Work:
  - Integrate larger datasets (e.g., Twitter, blogs).
  - Improve accuracy with Stupid Backoff or Kneser-Ney smoothing.
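For reference, Stupid Backoff scores a candidate next word by its trigram relative frequency when that trigram has been seen, and otherwise falls back to a discounted lower-order score. Written for a trigram model:

$$
S(w_i \mid w_{i-2}\,w_{i-1}) =
\begin{cases}
\dfrac{c(w_{i-2}\,w_{i-1}\,w_i)}{c(w_{i-2}\,w_{i-1})}, & \text{if } c(w_{i-2}\,w_{i-1}\,w_i) > 0,\\[4pt]
\alpha\, S(w_i \mid w_{i-1}), & \text{otherwise,}
\end{cases}
\qquad
S(w_i) = \frac{c(w_i)}{N}
$$

Here c(·) is a corpus count, N is the total number of tokens, S(w_i | w_{i-1}) is defined the same way one order lower (backing off to S(w_i)), and α = 0.4 is the commonly used back-off factor. The resulting scores are not normalized probabilities, which is acceptable when only the top-ranked word is needed.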
Thank you!