2025-03-05

Introduction

  • Project Overview:
    A Next Word Prediction app built with R and Shiny for the Coursera Data Science Specialization capstone project.
  • Data Sources:
    Combined text from blogs, news, and Twitter.
  • Core Approach:
    Uses n‑gram models (unigram, bigram, trigram, quadgram) with Laplace smoothing.
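
For reference, add-one (Laplace) smoothing takes the standard textbook form below, where V is the vocabulary size; this is the general formula, not a quote of the app's code.

```latex
P_{\mathrm{Laplace}}(w_i \mid w_{i-n+1}, \ldots, w_{i-1})
  = \frac{\mathrm{count}(w_{i-n+1}, \ldots, w_{i-1}, w_i) + 1}
         {\mathrm{count}(w_{i-n+1}, \ldots, w_{i-1}) + V}
```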

Algorithm Overview

  • Data Processing:
    • Text is cleaned, tokenized, and used to build n‑gram frequency tables (a sketch follows this list).
  • Prediction Method:
    • The app uses Laplace smoothing to estimate next-word probabilities.
    • It computes the Shannon entropy of the predicted distribution to gauge uncertainty (see the second sketch after this list).
  • Key Innovation:
    • Integrating entropy-based measures provides insight into prediction confidence.
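
A minimal sketch of the data-processing step described above, in base R. The function names (clean_text, tokenize, build_ngrams) are illustrative placeholders, not the app's actual code.

```r
# Sketch: cleaning, tokenizing, and counting n-grams in base R.
# clean_text(), tokenize(), build_ngrams() are hypothetical helper names.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z' ]", " ", x)   # keep letters, apostrophes, spaces
  gsub("\\s+", " ", trimws(x))    # collapse repeated whitespace
}

tokenize <- function(x) strsplit(clean_text(x), " ", fixed = TRUE)[[1]]

# Count all n-grams of length n across a character vector of documents.
build_ngrams <- function(docs, n) {
  grams <- unlist(lapply(docs, function(d) {
    toks <- tokenize(d)
    if (length(toks) < n) return(character(0))
    vapply(seq_len(length(toks) - n + 1),
           function(i) paste(toks[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  sort(table(grams), decreasing = TRUE)
}

docs <- c("The quick brown fox.", "The quick red fox jumps.")
build_ngrams(docs, 2)   # bigram frequency table; "the quick" appears twice
```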
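And a sketch of the prediction step: Laplace (add-one) smoothed probabilities for candidate next words, plus the Shannon entropy of that distribution as the uncertainty measure the app reports. The named count vector below is an assumed toy representation, not the app's internal data structure.

```r
# Sketch: Laplace smoothing and Shannon entropy over candidate next words.
# The named count vector is a toy stand-in for the app's n-gram tables.

# Add-one smoothed probability for each observed candidate, given the total
# count of the context and the vocabulary size V.
laplace_probs <- function(cont_counts, context_total, V) {
  (cont_counts + 1) / (context_total + V)
}

# Shannon entropy in bits of a normalized probability vector:
# higher entropy = flatter distribution = less confident prediction.
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Toy counts of words observed after the context "the quick".
cont_counts   <- c(brown = 5, red = 2, lazy = 1)
context_total <- sum(cont_counts)
V             <- 1000                     # assumed vocabulary size

probs      <- laplace_probs(cont_counts, context_total, V)
prediction <- names(which.max(probs))     # "brown"
entropy    <- shannon_entropy(probs / sum(probs))  # normalized over candidates

prediction
round(entropy, 3)
```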

App Functionality & Instructions

  • User Interface:
    • Simple text input for entering a phrase.
    • A “Predict Next Word” button.
  • Output:
    • The app returns the predicted word and its entropy (as a confidence metric); a minimal Shiny skeleton appears after this list.
  • Usage:
    1. Enter a phrase (e.g., “The Violin Face”).
    2. Click the prediction button.
    3. View the next word prediction and entropy level.
  • Deployment:
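
Below is a minimal Shiny skeleton matching the interface described above (a text input, a “Predict Next Word” button, and an output showing the word and its entropy). It assumes a predict_next_word() helper that returns the word and an entropy value; it is a sketch, not the deployed app's source.

```r
# Minimal Shiny skeleton for the described UI. predict_next_word() is a
# hypothetical helper assumed to return list(word = ..., entropy = ...).
library(shiny)

ui <- fluidPage(
  titlePanel("Next Word Prediction"),
  textInput("phrase", "Enter a phrase:"),
  actionButton("go", "Predict Next Word"),
  verbatimTextOutput("result")
)

server <- function(input, output, session) {
  prediction <- eventReactive(input$go, {
    predict_next_word(input$phrase)        # hypothetical prediction helper
  })
  output$result <- renderPrint({
    res <- prediction()
    cat("Predicted word:", res$word,
        "| Entropy:", round(res$entropy, 3), "\n")
  })
}

shinyApp(ui, server)
```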

Conclusion

  • Summary:
    • A practical NLP tool that combines proven n‑gram techniques with an entropy-based uncertainty metric.
  • Impact:
    • Offers end users a next-word suggestion plus a confidence signal, with potential enterprise applications in text entry.
  • Pitch:
    • The project marks the culmination of the Data Science Specialization and demonstrates advanced proficiency in R.
  • Next Steps:
    • Further refinement and scaling of the model.