2025-03-05
Introduction
- Project Overview:
A Next Word Prediction App built with R and Shiny for the Coursera Data Science Specialization capstone project.
- Data Sources:
Combined text from blogs, news, and Twitter.
- Core Approach:
Uses n‑gram models (unigram, bigram, trigram, quadgram) with Laplace smoothing.
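As a quick sketch of the smoothing idea (the notation is illustrative, not taken from the app's code): with add-one (Laplace) smoothing, the probability of a candidate word $w$ following a context $c$ is

$$P(w \mid c) = \frac{\mathrm{count}(c, w) + 1}{\mathrm{count}(c) + V}$$

where $V$ is the vocabulary size, so continuations never seen in the training corpus still receive a small non-zero probability.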
Algorithm Overview
- Data Processing:
- Text is cleaned, tokenized, and used to build n‑gram frequency tables.
- Prediction Method:
- The app employs Laplace smoothing to calculate probabilities.
- It computes the Shannon entropy of the predictive distribution to gauge uncertainty (sketched below).
- Key Innovation:
- Integrating entropy-based measures provides insight into prediction confidence.
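Below is a minimal R sketch of this pipeline; the function names (`tokenize`, `build_ngrams`, `laplace_probs`, `shannon_entropy`) are illustrative stand-ins rather than the app's actual code, and the example assumes the add-one smoothing and entropy definitions given above.

```r
library(stringr)

# Clean and tokenize a character vector of documents
tokenize <- function(text) {
  text <- tolower(text)
  text <- str_replace_all(text, "[^a-z' ]", " ")  # strip punctuation and digits
  unlist(str_split(str_squish(text), " "))
}

# Build an n-gram frequency table keyed by the space-joined n-gram
build_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(table(character(0)))
  grams <- vapply(seq_len(length(tokens) - n + 1),
                  function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
                  character(1))
  table(grams)
}

# Laplace-smoothed probability of every vocabulary word following `context`
laplace_probs <- function(context, ngram_counts, context_counts, vocab) {
  counts <- as.numeric(ngram_counts[paste(context, vocab)])
  counts[is.na(counts)] <- 0                           # unseen n-grams count as 0
  ctx_total <- as.numeric(context_counts[context])
  if (is.na(ctx_total)) ctx_total <- 0
  probs <- (counts + 1) / (ctx_total + length(vocab))  # add-one smoothing
  setNames(probs, vocab)
}

# Shannon entropy of the predictive distribution: H = -sum(p * log2(p))
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Toy usage: predict the word after "violin" and report the entropy
tokens   <- tokenize(c("the violin face", "the violin plays"))
bigrams  <- build_ngrams(tokens, 2)
unigrams <- build_ngrams(tokens, 1)
vocab    <- names(unigrams)
p <- laplace_probs("violin", bigrams, unigrams, vocab)
names(which.max(p))   # predicted next word
shannon_entropy(p)    # lower entropy = more confident prediction
```

Lower entropy means the probability mass is concentrated on a few candidates, which is how the app reads it as a confidence signal.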
App Functionality & Instructions
- User Interface:
- Simple text input for entering a phrase.
- A “Predict Next Word” button.
- Output:
- The app returns the predicted word and the entropy (as a confidence metric).
- Usage:
- Enter a phrase (e.g., “The Violin Face”).
- Click the prediction button.
- View the next word prediction and entropy level.
- Deployment:
- Runs as a Shiny web application (a minimal UI skeleton is sketched below).
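A minimal Shiny skeleton of this flow might look like the following; `predict_next_word()` is a hypothetical helper standing in for the app's actual prediction function and is assumed to return the predicted word and the entropy.

```r
library(shiny)

ui <- fluidPage(
  titlePanel("Next Word Prediction"),
  textInput("phrase", "Enter a phrase:", value = ""),
  actionButton("go", "Predict Next Word"),
  verbatimTextOutput("result")
)

server <- function(input, output) {
  # Recompute only when the button is pressed
  prediction <- eventReactive(input$go, {
    predict_next_word(input$phrase)  # assumed to return list(word = ..., entropy = ...)
  })
  output$result <- renderPrint({
    res <- prediction()
    cat("Predicted word:", res$word, "\n")
    cat("Entropy (confidence metric):", round(res$entropy, 3), "\n")
  })
}

shinyApp(ui = ui, server = server)
```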
Conclusion
- Summary:
- An NLP tool that pairs established n-gram modeling with an entropy-based measure of prediction uncertainty.
- Impact:
- Offers end users a prediction together with a confidence signal, with potential enterprise applications.
- Pitch:
- The project caps the Data Science Specialization and demonstrates advanced knowledge of R.
- Next Steps:
- Further refinement and scaling of the model.