2025-03-05

Introduction

  • Project Overview:
    A Next Word Prediction app built with R and Shiny for the Coursera Data Science Specialization capstone project.
  • Data Sources:
    Combined text from blogs, news, and Twitter.
  • Core Approach:
    Uses n‑gram models (unigram, bigram, trigram, quadgram) with Laplace smoothing.
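
For reference, add-one (Laplace) smoothing takes the standard textbook form below, where V is the vocabulary size; this is the general formula, not a quote of the app's code.

```latex
P_{\mathrm{Laplace}}(w_i \mid w_{i-n+1}, \ldots, w_{i-1})
  = \frac{\mathrm{count}(w_{i-n+1}, \ldots, w_{i-1}, w_i) + 1}
         {\mathrm{count}(w_{i-n+1}, \ldots, w_{i-1}) + V}
```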

Algorithm Overview

  • Data Processing:
    • Text is cleaned, tokenized, and used to build n‑gram frequency tables (a sketch follows this list).
  • Prediction Method:
    • The app uses Laplace smoothing to estimate next-word probabilities.
    • It computes the Shannon entropy of the predicted distribution to gauge uncertainty (see the second sketch after this list).
  • Key Innovation:
    • Integrating entropy-based measures provides insight into prediction confidence.
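
A minimal sketch of the data-processing step described above, in base R. The function names (clean_text, tokenize, build_ngrams) are illustrative placeholders, not the app's actual code.

```r
# Sketch: cleaning, tokenizing, and counting n-grams in base R.
# clean_text(), tokenize(), build_ngrams() are hypothetical helper names.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z' ]", " ", x)   # keep letters, apostrophes, spaces
  gsub("\\s+", " ", trimws(x))    # collapse repeated whitespace
}

tokenize <- function(x) strsplit(clean_text(x), " ", fixed = TRUE)[[1]]

# Count all n-grams of length n across a character vector of documents.
build_ngrams <- function(docs, n) {
  grams <- unlist(lapply(docs, function(d) {
    toks <- tokenize(d)
    if (length(toks) < n) return(character(0))
    vapply(seq_len(length(toks) - n + 1),
           function(i) paste(toks[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  sort(table(grams), decreasing = TRUE)
}

docs <- c("The quick brown fox.", "The quick red fox jumps.")
build_ngrams(docs, 2)   # bigram frequency table; "the quick" appears twice
```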
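And a sketch of the prediction step: Laplace (add-one) smoothed probabilities for candidate next words, plus the Shannon entropy of that distribution as the uncertainty measure the app reports. The named count vector below is an assumed toy representation, not the app's internal data structure.

```r
# Sketch: Laplace smoothing and Shannon entropy over candidate next words.
# The named count vector is a toy stand-in for the app's n-gram tables.

# Add-one smoothed probability for each observed candidate, given the total
# count of the context and the vocabulary size V.
laplace_probs <- function(cont_counts, context_total, V) {
  (cont_counts + 1) / (context_total + V)
}

# Shannon entropy in bits of a normalized probability vector:
# higher entropy = flatter distribution = less confident prediction.
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Toy counts of words observed after the context "the quick".
cont_counts   <- c(brown = 5, red = 2, lazy = 1)
context_total <- sum(cont_counts)
V             <- 1000                     # assumed vocabulary size

probs      <- laplace_probs(cont_counts, context_total, V)
prediction <- names(which.max(probs))     # "brown"
entropy    <- shannon_entropy(probs / sum(probs))  # normalized over candidates

prediction
round(entropy, 3)
```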

App Functionality & Instructions

  • User Interface:
    • Simple text input for entering a phrase.
    • A “Predict Next Word” button.
  • Output:
    • The app returns the predicted word and its entropy (as a confidence metric); a minimal Shiny skeleton appears after this list.
  • Usage:
    1. Enter a phrase (e.g., “The Violin Face”).
    2. Click the prediction button.
    3. View the next word prediction and entropy level.
  • Deployment:
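
Below is a minimal Shiny skeleton matching the interface described above (a text input, a “Predict Next Word” button, and an output showing the word and its entropy). It assumes a predict_next_word() helper that returns the word and an entropy value; it is a sketch, not the deployed app's source.

```r
# Minimal Shiny skeleton for the described UI. predict_next_word() is a
# hypothetical helper assumed to return list(word = ..., entropy = ...).
library(shiny)

ui <- fluidPage(
  titlePanel("Next Word Prediction"),
  textInput("phrase", "Enter a phrase:"),
  actionButton("go", "Predict Next Word"),
  verbatimTextOutput("result")
)

server <- function(input, output, session) {
  prediction <- eventReactive(input$go, {
    predict_next_word(input$phrase)        # hypothetical prediction helper
  })
  output$result <- renderPrint({
    res <- prediction()
    cat("Predicted word:", res$word,
        "| Entropy:", round(res$entropy, 3), "\n")
  })
}

shinyApp(ui, server)
```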

Conclusion

  • Summary:
    • A practical NLP tool that combines proven n‑gram techniques with an entropy-based uncertainty metric.
  • Impact:
    • Offers end users a next-word suggestion plus a confidence signal, with potential enterprise applications in text entry.
  • Pitch:
    • The project marks the culmination of the Data Science Specialization and demonstrates advanced proficiency in R.
  • Next Steps:
    • Further refinement and scaling of the model.