2025-05-05

Slide 1: Project Overview

  • Project Title: Next Word Prediction App using N-gram Models
  • Author: Mayank Gaur
  • Course: Coursera Capstone Project (Data Science Specialization)
  • This 5-slide deck summarizes the algorithm used, application details, reflections, and the final deployment.

Slide 2: Algorithm Used

  • The prediction engine uses N-gram models:

    • Bigrams (2-word phrases)
    • Trigrams (3-word phrases)
    • Quadgrams (4-word phrases)
  • Built using cleaned corpora from Blogs, News, and Twitter.

  • Prediction logic uses the Stupid Backoff algorithm (sketched in R at the end of this slide):

    • Tries to match the last three words of the input against the quadgram table.
    • Falls back to the trigram table, then the bigram table, if no match is found.
    • Returns the final word of the most frequent matching n-gram.
  • Preprocessing included the following steps (see the sketch at the end of this slide):

    • Lowercasing
    • Removing punctuation, numbers, and stopwords
    • Tokenization
  • How the N-Gram Model Works:

  • Data Collection:

    • Brief Description: Gather a large corpus of text data from relevant sources. Preprocess the text by tokenizing and cleaning to ensure the data is in a suitable format for model training.
  • Model Training:

    • Brief Description: Build the n-gram model by analyzing the sequences of words in the training data. The model learns the probability of a word occurring given the preceding one, two, or three words.
  • Prediction:

    • Brief Description: Use the trained model to predict the next word in a sequence based on the previous one, two, or three words provided by the user.
  • Validation:

    • Brief Description: Evaluate the model’s performance on a separate validation dataset, checking metrics such as prediction accuracy (how often the suggested word matches the actual next word) to confirm the model predicts effectively. A rough accuracy check is sketched at the end of this slide.
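
  • Illustrative sketch (R) of the preprocessing and model-training steps above. It assumes the corpus has already been downloaded; the file path, function names, and objects are examples, not the app's actual code, and stopword removal is omitted for brevity.

    # --- Preprocessing: lowercase, strip punctuation and numbers, tokenize ---
    clean_text <- function(x) {
      x <- tolower(x)                      # lowercasing
      x <- gsub("[[:punct:]]+", " ", x)    # remove punctuation
      x <- gsub("[[:digit:]]+", " ", x)    # remove numbers
      gsub("\\s+", " ", trimws(x))         # collapse extra whitespace
    }
    tokenize <- function(x) {
      toks <- unlist(strsplit(clean_text(x), " ", fixed = TRUE))
      toks[toks != ""]                     # drop empty tokens
    }

    # --- Model training: frequency table for each n-gram order ---
    count_ngrams <- function(tokens, n) {
      idx   <- seq_len(length(tokens) - n + 1)
      grams <- vapply(idx, function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
                      character(1))
      sort(table(grams), decreasing = TRUE)
    }

    corpus    <- readLines("en_US.blogs.txt", warn = FALSE)   # example file name
    tokens    <- tokenize(corpus)
    bigrams   <- count_ngrams(tokens, 2)
    trigrams  <- count_ngrams(tokens, 3)
    quadgrams <- count_ngrams(tokens, 4)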
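
  • Illustrative sketch of the backoff lookup, assuming the tokenize() helper and the frequency tables from the previous sketch. Because the search stops at the first n-gram order that yields a match, returning the most frequent continuation at that order matches the behavior described above.

    predict_next <- function(input, quadgrams, trigrams, bigrams) {
      words  <- tokenize(input)
      tables <- list(bigrams, trigrams, quadgrams)   # position n holds the (n+1)-grams

      for (n in 3:1) {                               # prefix length: 3, then 2, then 1
        if (length(words) < n) next
        prefix <- paste(tail(words, n), collapse = " ")
        hits   <- tables[[n]][startsWith(names(tables[[n]]), paste0(prefix, " "))]
        if (length(hits) > 0) {
          best <- names(hits)[which.max(hits)]       # most frequent matching n-gram
          return(tail(strsplit(best, " ", fixed = TRUE)[[1]], 1))  # its final word
        }
      }
      "the"                                          # default when nothing matches
    }

    predict_next("thanks for the", quadgrams, trigrams, bigrams)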
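
  • A rough accuracy check corresponding to the validation step above, again illustrative only; it assumes predict_next() and the tables from the earlier sketches plus a held-out character vector test_lines.

    validate <- function(test_lines) {
      hits <- 0L; total <- 0L
      for (line in test_lines) {
        words <- tokenize(line)
        if (length(words) < 4) next
        for (i in 4:length(words)) {
          context <- paste(words[(i - 3):(i - 1)], collapse = " ")
          guess   <- predict_next(context, quadgrams, trigrams, bigrams)
          hits    <- hits + (guess == words[i])
          total   <- total + 1L
        }
      }
      hits / total        # share of held-out next words predicted exactly
    }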

Slide 3: App Description & Instructions

  • Built with the shiny package in R and deployed to shinyapps.io (a skeleton of the app is sketched at the end of this slide).
  • How to Use:
    1. Enter a partial sentence in the input box.
    2. Wait for the app to predict the next word.
    3. The suggested word appears below the input box.
  • Example:
    • Input: "can wait" → Output: "see"
  • Features:
    • Real-time prediction
    • Clean and user-friendly UI
    • Efficient model loading using .rds files
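
  • Minimal sketch of the app's structure, assuming the pre-computed .rds tables plus the predict_next() helper from Slide 2; object and file names are illustrative, not the deployed app's actual code.

    library(shiny)

    quadgrams <- readRDS("quadgrams.rds")   # pre-computed frequency tables
    trigrams  <- readRDS("trigrams.rds")
    bigrams   <- readRDS("bigrams.rds")

    ui <- fluidPage(
      titlePanel("Next Word Prediction"),
      textInput("phrase", "Enter a partial sentence:"),
      h4("Suggested next word:"),
      textOutput("prediction")
    )

    server <- function(input, output) {
      output$prediction <- renderText({
        req(input$phrase)                   # wait until something has been typed
        predict_next(input$phrase, quadgrams, trigrams, bigrams)
      })
    }

    shinyApp(ui = ui, server = server)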

Slide 4: User Experience & Novelty

  • User Experience:
    • Fast response time
    • Accurate and context-aware predictions
    • Minimalist, clean interface
  • Innovative Aspects:
    • Built from scratch using public text data
    • Efficient use of memory by converting the n-gram tables to .rds files (see the note at the end of this slide)
    • Backoff approach for robustness
    • Combines NLP and Shiny into a deployable app
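
  • Illustrative note on the .rds point: the frequency tables are serialized once, offline, so the deployed app only ships and loads the compact files (file names are examples, not the app's actual paths).

    saveRDS(quadgrams, "quadgrams.rds", compress = "xz")   # run once, offline
    q <- readRDS("quadgrams.rds")                          # fast load at app start-up
    print(object.size(q), units = "MB")                    # check the memory footprint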

Slide 5: Reflections & Real-World Relevance

  • Key Takeaways:
    • End-to-end understanding of NLP: preprocessing → modeling → deployment
    • Learned to build scalable apps with Shiny
    • Gained real-world insight into how predictive typing apps function
  • Real-World Application:
    • Can be adapted for chatbots, messaging apps, and predictive keyboards
    • Potential to extend with deep learning or grammar correction