2025-11-24

Slide 1 — Overview & Purpose

A Simple, Fast, and Effective Next-Word Prediction App

This project delivers:

  • A Shiny application that predicts the next word from any English phrase.
  • A lightweight NLP model built from the HC Corpora (SwiftKey dataset).
  • A fast, deployable solution suitable for mobile keyboards and chat apps.

Goal: Build a working data product demonstrating NLP, modeling, and deployment skills.

Slide 2 — Algorithm Behind the App

Prediction Approach

Primary Algorithm: Stupid Backoff (SB)

  • Uses a 4-gram → 3-gram → 2-gram → unigram backoff hierarchy.
  • Each backoff step multiplies the score by a weight α = 0.4.
  • Extremely fast and memory-efficient.
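To make the backoff weighting concrete, here is a minimal sketch in Python. The app itself is built in R/Shiny, so this is an illustration and not the app's code. The names `build_counts` and `sb_score` and the toy corpus are hypothetical. The sketch assumes a single table of counts keyed by token tuples, and scores a word by relative frequency at the highest order that has been seen, multiplying by α once per backoff step.

```python
from collections import Counter

ALPHA = 0.4  # backoff weight stated on the slide


def build_counts(tokens, max_n=4):
    """Count every n-gram of order 1..max_n, keyed by token tuple."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sb_score(word, context, counts, total_tokens, alpha=ALPHA):
    """Stupid Backoff score of `word` given up to 3 preceding tokens."""
    weight = 1.0
    context = tuple(context)[-3:]  # a 4-gram model needs at most 3 context tokens
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen at this order: relative frequency, discounted by the backoffs so far.
            return weight * counts[ngram] / counts[context]
        context = context[1:]  # drop the oldest token and back off one order
        weight *= alpha
    # Unigram level: relative frequency over all tokens.
    return weight * counts[(word,)] / total_tokens


# Toy corpus (hypothetical), only to exercise the function.
TOKENS = "i love new york i love new shoes i love you".split()
COUNTS = build_counts(TOKENS)

print(sb_score("york", ("i", "love", "new"), COUNTS, len(TOKENS)))  # 0.5
print(sb_score("you", ("i", "love", "new"), COUNTS, len(TOKENS)))   # backs off three times
```

The scores are not normalized probabilities, which is the usual trade-off of Stupid Backoff: no discounting to compute, so lookups stay cheap.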

Enhancement (optional): Modified Kneser–Ney (MKN)

  • Pre-computed discounting for more accurate probability estimates.
  • Used when pre-computed MKN tables are available; otherwise the app falls back to SB to maintain speed.

Model Preparation

  • Cleaned corpus (lowercase, no URLs, ASCII only).
  • Tokenized.
  • Built unigram–4-gram tables.
  • Saved as .fst for rapid loading.
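The preparation steps can be sketched as follows, again in Python for illustration only (the actual pipeline is in R and writes .fst files, which this sketch does not reproduce). The function names, the URL pattern, and the tokenizer regex are assumptions; the real cleaning rules may differ.

```python
import re
from collections import Counter

# Assumed URL pattern: anything starting with http(s):// or www.
URL_RE = re.compile(r"(https?://|www\.)\S+")


def clean(text):
    """Lowercase, strip URLs, and drop non-ASCII characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    return text.encode("ascii", "ignore").decode("ascii")


def tokenize(text):
    """Split into alphabetic tokens, keeping internal apostrophes (e.g. don't)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text)


def build_tables(lines, max_n=4):
    """Return {n: Counter of n-gram tuples} for n = 1..max_n."""
    tables = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = tokenize(clean(line))
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                tables[n][tuple(tokens[i:i + n])] += 1
    return tables


# Two toy lines (hypothetical) standing in for the corpus.
tables = build_tables(["I love New York", "I love you, see https://example.com"])
print(tables[2][("i", "love")])
```

N-grams are counted per line here, so they never span a line boundary; whether the original tables do the same is not stated in the deck.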

Slide 3 — The Shiny App & How to Use It

How the App Works

  1. User enters a phrase in the text box.
  2. The app extracts the last 1–3 tokens.
  3. It looks up the highest-probability continuation, backing off in order (quad → tri → bi → uni).
  4. Displays:
    • Top next word
    • Top-K table of alternatives
    • Model used (SB or MKN)
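The lookup flow above can be sketched as below. This is a simplified Python illustration, not the Shiny server code. It assumes a hypothetical layout in which each context tuple maps to a list of (word, score) pairs already sorted by descending score, with the empty tuple holding the unigram fallback. It returns candidates from the first order that matches and does not mix scores across orders.

```python
def last_tokens(phrase, max_context=3):
    """Lowercase the phrase and keep its last `max_context` tokens."""
    return tuple(phrase.lower().split()[-max_context:])


def predict(phrase, tables, k=3):
    """Try the longest context first, then shorter ones, then unigrams."""
    context = last_tokens(phrase)
    while True:
        candidates = tables.get(context)
        if candidates:
            return {
                "top": candidates[0][0],
                "top_k": candidates[:k],
                "order": len(context) + 1,  # 4 = quad, 3 = tri, 2 = bi, 1 = uni
            }
        if not context:
            return {"top": None, "top_k": [], "order": 0}
        context = context[1:]  # drop the oldest token and retry


# Toy tables (hypothetical scores), keyed by context tuple.
TABLES = {
    ("i", "love", "new"): [("york", 0.5), ("shoes", 0.5)],
    ("new",): [("york", 0.6), ("year", 0.3), ("shoes", 0.1)],
    (): [("the", 0.05), ("to", 0.03), ("and", 0.02)],
}

print(predict("Well I love new", TABLES))
print(predict("brand new", TABLES))
```

The `order` field is one way to show the user which table produced the prediction, alongside the model label.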

Instructions for Users

  • Type any English phrase.
  • Click Predict.
  • The prediction appears instantly.
  • Example buttons help demonstrate typical results.

The link provided leads to a functional 5-slide deck on RPubs.

Slide 4 — User Experience & Performance

User Experience

  • The app responds instantly on typical inputs.
  • Simple and intuitive interface.
  • Clear predictions and top-K alternatives.
  • Reliable on news, blogs, and conversational English.

Model Performance (Typical on Sample Set)

  • Top-1 accuracy: ~X%
  • Top-3 accuracy: ~Y%
  • Median latency: < 100 ms
  • Model size: small enough for cloud hosting
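Metrics of this kind could be measured with a harness along these lines. This is a hedged Python sketch: `evaluate`, `toy_predict`, and the three sample pairs are hypothetical and exist only to show the calculation; they are not the evaluation behind the figures above.

```python
import time
from statistics import median


def evaluate(predict_fn, samples, k=3):
    """Compute top-1 accuracy, top-k accuracy, and median latency in ms.

    `predict_fn(context)` must return a ranked list of candidate words;
    `samples` is a list of (context, true_next_word) pairs.
    """
    top1_hits = topk_hits = 0
    latencies_ms = []
    for context, target in samples:
        start = time.perf_counter()
        guesses = predict_fn(context)[:k]
        latencies_ms.append((time.perf_counter() - start) * 1000)
        if guesses and guesses[0] == target:
            top1_hits += 1
        if target in guesses:
            topk_hits += 1
    n = len(samples)
    return {
        "top1": top1_hits / n,
        "topk": topk_hits / n,
        "median_ms": median(latencies_ms),
    }


def toy_predict(context):
    """Stand-in predictor with hard-coded answers (hypothetical)."""
    lookup = {
        "i love": ["you", "it", "the"],
        "new": ["york", "year", "one"],
    }
    return lookup.get(context, ["the", "and", "to"])


SAMPLES = [("i love", "you"), ("new", "year"), ("thank", "goodness")]
result = evaluate(toy_predict, SAMPLES)
print(result)
```

Held-out samples should come from text the model never saw during table building, otherwise the accuracy figures will be optimistic.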

Experience Summary:
Smooth, fast, and interactive — exactly what a text prediction tool should feel like.

Slide 5 — Novelty, Quality, and Hiring Perspective

What Makes This App Stand Out?

  • Efficient design (fst tables, lazy loading).
  • Predictive logic modeled after mobile keyboard engines.
  • Clean code, strong engineering choices, deployable architecture.
  • Demonstrates end-to-end data product creation:
    • Data cleaning
    • Modeling
    • Evaluation
    • App development
    • Cloud deployment

Would You Hire This Person?

Yes, this project demonstrates:

  • Strong understanding of NLP modeling.
  • Ability to build functional, deployed applications.
  • Capability to combine data science + software engineering.
  • Clear communication through this slide deck.