Word Prediction App – Capstone Project

Overview

Goal: Predict the next word from a user-typed phrase
Built as an interactive Shiny web app
Real-time prediction using N-gram models
Useful for mobile typing suggestions, chatbots, etc.

Data Source & Cleaning

Dataset: HC Corpora (Blogs, News, Twitter)
Steps:
- Removed punctuation, numbers, stopwords
- Converted to lowercase
- Tokenized into N-grams (uni-, bi-, tri-)
Sampled ~5% of the data for efficiency
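
A minimal sketch of the cleaning steps above, written in Python for illustration only (the app itself is in R). The stopword list, the function names (sample_lines, clean, ngrams) and the fixed seed are assumptions made for this example, not details taken from the project.

```python
import random
import re

# Tiny illustrative stopword list (assumption); the app's actual list is not given.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is"}

def sample_lines(lines, fraction=0.05, seed=42):
    """Keep each line with probability `fraction` (roughly 5% by default)."""
    rng = random.Random(seed)
    return [ln for ln in lines if rng.random() < fraction]

def clean(text):
    """Lowercase, strip punctuation and numbers, drop stopwords; return tokens."""
    text = text.lower()
    text = text.replace("'", "")            # "don't" -> "dont" rather than "don t"
    text = re.sub(r"[^a-z\s]", " ", text)   # keep only letters and whitespace
    return [w for w in text.split() if w not in STOPWORDS]

def ngrams(tokens, n):
    """Return all contiguous n-token tuples; empty list if there are too few tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = clean("The cat sat on 2 mats, in 2024!")
bigrams = ngrams(tokens, 2)
```

Note that this sketch only handles unaccented English letters; anything else is treated as a separator.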

Prediction Algorithm

Built N-gram language models
Used Stupid Backoff strategy:
- Try trigram → if not found, back off to bigram → then unigram
Fast, simple, works well with sparse data
Implemented in R with dplyr, stringr, tidytext
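
The backoff order above can be sketched as follows, in Python for illustration rather than the app's R code. The function names (build_counts, score, predict) are hypothetical, and the 0.4 penalty is the value usually quoted for Stupid Backoff; the slides do not say which value the app uses.

```python
from collections import Counter

ALPHA = 0.4  # assumed backoff penalty; the app's actual value is not stated

def build_counts(tokens, max_n=3):
    """Count every unigram, bigram and trigram in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def score(counts, context, word, total):
    """Stupid Backoff score: relative frequency at the longest matching
    context, multiplied by ALPHA for each level backed off."""
    penalty = 1.0
    while context:
        seen = counts[context + (word,)]
        if seen > 0:
            return penalty * seen / counts[context]
        context = context[1:]   # drop the oldest word and back off
        penalty *= ALPHA
    return penalty * counts[(word,)] / total

def predict(counts, phrase_tokens):
    """Return the highest-scoring next word, or None for an empty model."""
    vocab = [g[0] for g in counts if len(g) == 1]
    total = sum(counts[(w,)] for w in vocab)
    context = tuple(phrase_tokens[-2:])  # a trigram model uses the last two words
    best_word, best_score = None, 0.0
    for w in vocab:
        s = score(counts, context, w, total)
        if s > best_score:
            best_word, best_score = w, s
    return best_word

corpus = "i love new york i love new shoes i love new york".split()
counts = build_counts(corpus)
print(predict(counts, ["love", "new"]))   # trigram match
print(predict(counts, ["brand", "new"]))  # unseen trigram context, backs off to bigram
```

The scores are not normalized probabilities, which is what keeps the method cheap: no discounting or smoothing pass is needed, only count lookups.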

The Shiny App

Input: user types a phrase (≥1 word)
Output: app predicts and shows next word
Hosted at: https://shreyashk.shinyapps.io/project/
Lightweight UI, real-time results

Reflection

Works well for many common phrases
Could improve with:
- Smarter backoff (e.g. Kneser-Ney)
- Contextual models (e.g. RNN, transformers)
Great experience building a full NLP pipeline!