Capstone Final Project Pitch

1. Project Overview

Final project for the Johns Hopkins Data Science Capstone
Goal: Predict the next word from a phrase using NLP
Data: English blogs, news, and Twitter (HC Corpora)
App deployed at:
https://wishyouagoodfuture.shinyapps.io/capstone_shinyapp/

2. Modeling Approach

Preprocessing:
- Sampled ~1% from each source
- Converted to lowercase; removed punctuation and extra whitespace
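
The preprocessing steps above can be sketched roughly as follows. This is a minimal Python illustration, not the project's own code (which is not shown in this pitch). The function names `sample_lines` and `clean_text`, the fixed seed, and the choice to strip only ASCII punctuation are all assumptions made for the example.

```python
import random
import re
import string


def sample_lines(lines, fraction=0.01, seed=42):
    """Keep each line with probability `fraction` (about 1% by default).

    The seed is an assumption, used only to make the sample reproducible.
    """
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def clean_text(text):
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower()
    # Assumption: only ASCII punctuation is removed; curly quotes are left alone.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical usage on a single raw line.
print(clean_text("  Let me KNOW,   if you...  "))
```

Note that stripping punctuation this way also removes apostrophes, so a contraction such as "I'd" becomes "id". Whether the project handled contractions differently is not stated.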
Modeling:
- Built unigram, bigram, and trigram frequency tables
- Selected the top 5,000 words by TF-IDF as the vocabulary
- Used a backoff model: trigram → bigram → top TF-IDF word
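
One way the n-gram tables and the TF-IDF vocabulary could be built is sketched below in Python. The pitch does not say how documents were defined for TF-IDF or which weighting was used, so the example assumes one document per source and a smoothed IDF. The names `ngram_counts` and `top_tfidf_words` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-grams (tuples of n words) over already-cleaned sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def top_tfidf_words(documents, k=5000):
    """Return the k words with the highest TF-IDF score in any document.

    Assumptions: each document is one cleaned text (for example one source),
    TF is the relative frequency within the document, and IDF is smoothed
    as log((1 + N) / (1 + df)) + 1 so words found everywhere still score > 0.
    """
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for words in tokenized:
        doc_freq.update(set(words))

    best = {}
    for words in tokenized:
        if not words:
            continue
        for word, count in Counter(words).items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
            score = (count / len(words)) * idf
            best[word] = max(best.get(word, 0.0), score)

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(best, key=lambda w: (-best[w], w))[:k]
```

Calling `ngram_counts(sentences, 1)`, `ngram_counts(sentences, 2)`, and `ngram_counts(sentences, 3)` gives the three tables, and `top_tfidf_words(documents)[0]` is a candidate last-resort word for the backoff step.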

3. Prediction Function

Example outputs from our model:

Input Phrase	Predicted Word
I’d give anything to see arctic	monkeys
When you breathe I want to be	air
Talking to your mom has the same	effect
I like how the same people are in Adam Sandler’s	movies

Predictions return the most frequent word that followed the matched context
Table lookups keep prediction fast enough for real-time use in Shiny
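
The trigram → bigram → fallback lookup can be sketched in Python as below. This is an illustration under assumptions, not the deployed implementation: `build_next_word_table` and `predict_next` are hypothetical names, the toy corpus is invented for the example, and the fallback word stands in for the top TF-IDF word.

```python
from collections import Counter, defaultdict


def build_next_word_table(sentences, context_size):
    """Map each context (tuple of `context_size` words) to next-word counts."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - context_size):
            context = tuple(words[i:i + context_size])
            table[context][words[i + context_size]] += 1
    return table


def predict_next(phrase, trigram_table, bigram_table, fallback_word):
    """Back off from a two-word context to a one-word context to a default."""
    words = phrase.lower().split()

    # 1. Trigram level: look up the last two words.
    if len(words) >= 2:
        candidates = trigram_table.get(tuple(words[-2:]))
        if candidates:
            return candidates.most_common(1)[0][0]

    # 2. Bigram level: look up the last word only.
    if words:
        candidates = bigram_table.get(tuple(words[-1:]))
        if candidates:
            return candidates.most_common(1)[0][0]

    # 3. Nothing matched: return the default (e.g. the top TF-IDF word).
    return fallback_word


# Toy corpus, invented for illustration only.
corpus = [
    "let me know if you can",
    "let me know if you need help",
    "if you can make it",
]
trigram_table = build_next_word_table(corpus, context_size=2)
bigram_table = build_next_word_table(corpus, context_size=1)

print(predict_next("Let me know if you", trigram_table, bigram_table, "the"))
```

Because each step is a dictionary lookup, the cost per prediction does not grow with the size of the corpus, only with the number of candidates stored for the matched context.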

4. Shiny App Summary

Simple UI: phrase input + predict button
Fast prediction from preloaded .rds models
Link:
https://wishyouagoodfuture.shinyapps.io/capstone_shinyapp/

Try phrases like:
- Let me know if you
- She said she was going to

5. Highlights & Conclusion

✅ Fast + lightweight model
✅ TF-IDF restricts vocabulary to high-signal words
✅ Handles unseen inputs via backoff fallback
✅ Shiny app loads fast and is interactive
✅ Deployed successfully and peer-review ready

Thanks for reviewing!