2025-07-27

Introduction

  • This project is the final Capstone for the Coursera Data Science Specialization.
  • Objective: Build a Shiny app that predicts the next word based on user input.
  • Based on the SwiftKey NLP problem, using real-world text datasets (blogs, news, Twitter).
  • The app is available at: https://9rks8u-shashank-r.shinyapps.io/finalproject/

Model Building

  • Built n-gram models: unigrams, bigrams, and trigrams.
  • Used the tokenizers and tidytext packages for tokenization, and data.table for speed.
  • Stored n-gram frequency tables for lookup at prediction time.
  • Example:
    • Input: “I love”
    • Trigram match → “you”
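The table-building step above can be sketched as follows. This is a minimal illustration in Python, not the project's R code: it assumes a simple lowercase regex tokenizer and a tiny hypothetical three-line corpus, and `tokenize` and `ngram_counts` are hypothetical helpers standing in for what the tokenizers, tidytext, and data.table packages did in the app.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters and
    # apostrophes (a simplification of the R tokenizers/tidytext step).
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(lines, n):
    """Count every n-gram (stored as a tuple of words) across all lines."""
    counts = Counter()
    for line in lines:
        words = tokenize(line)
        # Slide a window of width n over the tokens of this line.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

# Hypothetical toy corpus standing in for the blogs/news/Twitter data.
corpus = [
    "I love you",
    "I love you so much",
    "I love pizza",
]

unigrams = ngram_counts(corpus, 1)
bigrams = ngram_counts(corpus, 2)
trigrams = ngram_counts(corpus, 3)

print(trigrams[("i", "love", "you")])  # 2
```

Counting per line means no n-gram spans two separate documents, which keeps the tables from learning spurious word pairs across line boundaries.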

Prediction Algorithm & Shiny App

  • Backoff Strategy:
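    1. Look for a trigram match on the last two words.
    2. If none is found, fall back to a bigram match on the last word.
    3. If still none is found, return the most frequent unigram.
  • Deployed as a Shiny app on shinyapps.io.
  • Entering text in the input box returns the predicted next word instantly.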
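The backoff strategy above can be sketched as a small lookup function. This is a minimal Python illustration, not the app's R implementation: it assumes hypothetical toy frequency tables keyed by word tuples, and `best_continuation` and `predict_next` are illustrative names.

```python
from collections import Counter

# Hypothetical toy frequency tables (not the app's real data), keyed by
# word tuples as an n-gram counting step would produce them.
trigrams = Counter({("i", "love", "you"): 2, ("i", "love", "pizza"): 1})
bigrams = Counter({("love", "you"): 2, ("love", "pizza"): 1, ("you", "so"): 1})
unigrams = Counter({("the",): 5, ("i",): 3, ("love",): 3, ("you",): 2})

def best_continuation(table, prefix):
    """Return the most frequent final word among n-grams that start with prefix."""
    candidates = {ng[-1]: c for ng, c in table.items() if ng[:-1] == prefix}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

def predict_next(text):
    words = text.lower().split()
    # 1. Trigram match on the last two words.
    if len(words) >= 2:
        word = best_continuation(trigrams, tuple(words[-2:]))
        if word is not None:
            return word
    # 2. Fall back to a bigram match on the last word.
    if words:
        word = best_continuation(bigrams, tuple(words[-1:]))
        if word is not None:
            return word
    # 3. Fall back to the single most frequent unigram.
    return unigrams.most_common(1)[0][0][0]

print(predict_next("I love"))
print(predict_next("hello"))
```

With these toy tables, "I love" resolves at the trigram level to "you", "we love" falls back to the bigram ("love", "you"), and an unseen word such as "hello" falls back to the top unigram "the". The linear scan in `best_continuation` is kept for clarity; with full-size tables, a precomputed prefix-to-word index would avoid scanning every n-gram on each keystroke, in the same spirit as the original's use of data.table for speed.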

Conclusion