2026-05-28

Project Overview

Data Science Capstone Final Project

  • Course: Data Science Specialization Track - Data Science Capstone Project
  • Provider: Coursera @ Johns Hopkins University
  • Goal: Build an interactive web app for next-word prediction
  • Technology: Shiny (R) + Distributed N-gram Language Model
  • Purpose: Predict the next word in a user-typed phrase

Algorithm Description

Distributed N-gram Language Model with Back-off Logic

Component         Description
Quadgram          Matches sequences of 4 words for highest accuracy
Trigram           Fallback when quadgram finds no match
Bigram            Secondary fallback for shorter sequences
Back-off          Recursively searches n-grams until match found
Profanity Filter  Automatically excludes non-safe words

Training Data: Twitter, Blogs, and News corpora

How it works: The model first looks for a quadgram match on the most recent words of the phrase. If none exists, it backs off to trigrams, then to bigrams, and stops at the first level that returns a match.
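
The back-off lookup described above can be illustrated with a small sketch. It is written in Python for brevity and is not the app's R code. The function names (build_tables, predict_next), the toy corpus, and the use of raw counts for ranking are all assumptions made for illustration; the real model may score and store n-grams differently.

```python
from collections import Counter, defaultdict

def build_tables(sentences, max_n=4):
    """Map each context (1 to max_n - 1 words) to a Counter of the words that follow it."""
    tables = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(2, max_n + 1):            # bigram, trigram, quadgram
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[context][tokens[i + n - 1]] += 1
    return tables

def predict_next(tables, phrase, k=3, max_n=4):
    """Return up to k next-word candidates, backing off to shorter contexts."""
    tokens = phrase.lower().split()
    # Longest context first: 3 words (quadgram), then 2 (trigram), then 1 (bigram).
    for size in range(min(max_n - 1, len(tokens)), 0, -1):
        context = tuple(tokens[-size:])
        if context in tables:
            return [word for word, _ in tables[context].most_common(k)]
    return []                                     # no match at any level

# Toy corpus standing in for the Twitter, Blogs, and News data (hypothetical).
corpus = [
    "thanks for the follow",
    "thanks for the help",
    "thanks for the follow",
    "happy new year",
    "see you next year",
]
tables = build_tables(corpus)

print(predict_next(tables, "many thanks for the"))  # quadgram match: ['follow', 'help']
print(predict_next(tables, "waiting for the"))      # backs off to trigram: ['follow', 'help']
print(predict_next(tables, "a brand new"))          # backs off to bigram: ['year']
```

In this sketch the search stops at the first context length that has any match, which mirrors the quadgram, trigram, bigram order in the description.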

App Description & Functionality

What the App Does

  • Type a phrase → N-gram model suggests what comes next
  • Displays user input with live analysis
  • Shows 1-3 recommended word suggestions (configurable)
  • Uses quadgram back-off for higher accuracy

How It Functions

  1. User enters text in the Text Input area
  2. Selects Suggestion Count (1-3 words) using slider
  3. Clicks Predict Next Word button
  4. App processes input through N-gram model
  5. Top predictions are displayed as blue “chip” buttons; a loading spinner appears while the model runs
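
The steps above can be sketched as a single request handler. This is a Python illustration, not the Shiny server code. It assumes that the input is normalised to lowercase word tokens, that the slider value is clamped to the 1-3 range, and that the profanity filter is applied to the ranked candidates before display. The names handle_predict_click, clean_input, clamp_count, toy_model, and the placeholder BLOCKED_WORDS set are hypothetical.

```python
import re

# Placeholder block list; the app's actual profanity list is not given in the source.
BLOCKED_WORDS = {"darn", "heck"}

def clean_input(text):
    """Lowercase the text and return its word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def clamp_count(count, low=1, high=3):
    """Keep the suggestion count inside the slider's 1-3 range."""
    return max(low, min(high, int(count)))

def handle_predict_click(text, count, model):
    """Steps 1-5: read input, read count, run the model, filter, return the top words."""
    tokens = clean_input(text)
    if not tokens:
        return []                                  # nothing typed, nothing to suggest
    candidates = model(tokens)                     # ranked list, best candidate first
    safe = [w for w in candidates if w not in BLOCKED_WORDS]
    return safe[:clamp_count(count)]

def toy_model(tokens):
    """Stand-in for the n-gram lookup; returns a fixed ranking keyed on the last word."""
    lookup = {"the": ["heck", "follow", "help", "day", "way"]}
    return lookup.get(tokens[-1], [])

print(handle_predict_click("Thanks for THE!!", 2, toy_model))   # ['follow', 'help']
print(handle_predict_click("Thanks for the", 10, toy_model))    # ['follow', 'help', 'day']
```

Filtering before truncating means a blocked word never uses up one of the displayed suggestion slots.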

User Experience & Instructions