The prediction engine uses N-gram models:
- Bigrams (2-word sequences)
- Trigrams (3-word sequences)
- Quadgrams (4-word sequences)
Built using cleaned corpora from Blogs, News, and Twitter.
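As a rough illustration, the counting step could look like the Python sketch below (the `build_ngram_tables` function and the toy token list are hypothetical, not the project's actual code):

```python
from collections import Counter

def build_ngram_tables(tokens):
    """Count every bigram, trigram, and quadgram in a token list."""
    return {
        n: Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        for n in (2, 3, 4)
    }

# Toy input; the real tokens would come from the cleaned corpora above.
tokens = "i want to go to the store i want to eat".split()
tables = build_ngram_tables(tokens)
print(tables[3].most_common(2))  # the two most frequent trigrams
```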
Prediction logic uses the Stupid Backoff algorithm (see the sketch after this list):
- Matches the user's last 3 words against the quadgram table.
- Backs off to trigrams (last 2 words), then bigrams (last word), if no match is found.
- Returns the most frequent continuation at the first level that matches.
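A minimal sketch of that backoff loop, reusing the `tables` structure from the sketch above. (Full Stupid Backoff also applies a discount factor, typically 0.4, at each backoff level; because this version returns at the first matching level, the discount would not change the top pick.)

```python
def predict_next(tokens, tables):
    """Backoff lookup as described above: try quadgrams on the last 3
    words, then back off to trigrams and bigrams."""
    for n in (4, 3, 2):
        context = tuple(tokens[-(n - 1):])
        if len(context) < n - 1:
            continue  # not enough context words for this level
        # Candidate continuations: n-grams whose prefix equals the context.
        candidates = {
            gram[-1]: count
            for gram, count in tables[n].items()
            if gram[:-1] == context
        }
        if candidates:
            # Most frequent continuation at the first level that matches.
            return max(candidates, key=candidates.get)
    return None  # no match at any level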
Preprocessing included:
- Lowercasing
- Removing punctuation, numbers, stopwords
- Tokenization
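A hedged sketch of such a cleaning pipeline (the stopword list here is a small illustrative subset, not the one the project actually used):

```python
import re

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}  # illustrative subset

def clean(text):
    """Lowercase, strip punctuation and numbers, tokenize, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # removes punctuation and digits
    return [t for t in text.split() if t not in STOPWORDS]

print(clean("I'd like 2 cups of coffee, please!"))
# ['i', 'd', 'like', 'cups', 'coffee', 'please']
```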
How the N-Gram Model Works
Data Collection:
- Brief Description: Gather a large corpus of text data from relevant sources. Preprocess the text by tokenizing and cleaning to ensure the data is in a suitable format for model training.
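For illustration, reading and downsampling the raw files might look like the sketch below (the file names are assumptions based on the Blogs, News, and Twitter sources named above, and `sample_corpus` is a hypothetical helper):

```python
import random

def sample_corpus(paths, fraction=0.1, seed=42):
    """Read raw text files and keep a random fraction of lines,
    since the full corpora are too large to use whole."""
    random.seed(seed)
    lines = []
    for path in paths:
        with open(path, encoding="utf-8", errors="ignore") as f:
            lines.extend(line for line in f if random.random() < fraction)
    return lines

# Assumed file names matching the sources listed above.
corpus = sample_corpus(["en_US.blogs.txt", "en_US.news.txt", "en_US.twitter.txt"])
```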
Model Training:
- Brief Description: Build the n-gram model by analyzing the sequences of words in the training data. The model learns the probability of a word occurring given the preceding one, two, or three words.
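Continuing the table-building sketch above, the maximum-likelihood estimate of that conditional probability can be computed as the count of the context plus the word, divided by the total count of n-grams sharing that context:

```python
def conditional_prob(word, context, tables):
    """MLE estimate: P(word | context) = count(context + word)
    divided by the total count of n-grams starting with context."""
    context = tuple(context)
    n = len(context) + 1
    numer = tables[n][context + (word,)]
    denom = sum(c for gram, c in tables[n].items() if gram[:-1] == context)
    return numer / denom if denom else 0.0

print(conditional_prob("go", ("want", "to"), tables))  # 0.5 on the toy data
```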
Prediction:
- Brief Description: Use the trained model to predict the next word in a sequence based on the previous one, two, or three words provided by the user.
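Putting the earlier sketches together, a prediction call on user input might look like this usage example:

```python
# Reusing clean(), predict_next(), and tables from the sketches above.
user_input = "I want"                  # raw text typed by the user
tokens = clean(user_input)             # same preprocessing as training
print(predict_next(tokens, tables))    # top suggestion, or None if no match
```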
Validation:
- Brief Description: Evaluate the trained model on held-out text that was not used for training, for example by checking how often the predicted word matches the actual next word.
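One simple way to run that check (a sketch reusing `predict_next` from above; the metric and function name are illustrative, not the project's actual evaluation code) is top-1 next-word accuracy over held-out tokens:

```python
def top1_accuracy(test_tokens, tables, context_len=3):
    """Share of held-out positions where the top prediction equals
    the actual next word."""
    hits = total = 0
    for i in range(context_len, len(test_tokens)):
        context = test_tokens[i - context_len:i]
        if predict_next(context, tables) == test_tokens[i]:
            hits += 1
        total += 1
    return hits / total if total else 0.0
```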