Next Word Prediction App

Rene Persau
2025-11-30

Slide 1: Introduction

Next Word Prediction App

Capstone Project - Data Science Specialization

Problem Statement: Build a predictive text application that suggests the next word as users type, similar to smartphone keyboard suggestions.

Solution: An n-gram language model trained on millions of words from blogs, news articles, and Twitter data.

Key Features:

  • Real-time word prediction
  • Interactive Shiny web application
  • Trained on diverse text sources
  • Fast and responsive user experience

Slide 2: Algorithm Overview

Prediction Algorithm

N-gram Model with Backoff Strategy

The algorithm uses a hierarchical backoff approach:

  1. 4-grams: Match last 3 words → highest priority
  2. 3-grams: Match last 2 words → medium priority (40% weight)
  3. 2-grams: Match last 1 word → lower priority (20% weight)
  4. Unigrams: Most common words → fallback

How It Works:

Input: "I want to"
→ Check 4-grams starting with "i want to"
→ If found, return most frequent next word
→ Otherwise, check 3-grams starting with "want to"
→ Continue down the hierarchy to 2-grams, then unigrams
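Below is a minimal Python sketch of the lookup described above. The app itself is written in R. The `MODEL` layout is hypothetical and is not the app's actual data structure: each n-gram order maps an (n-1)-word prefix to a `Counter` of next words. The toy counts are also hypothetical.

```python
import re
from collections import Counter

# Hypothetical layout: MODEL[n] maps an (n-1)-word prefix tuple to a
# Counter of the words that followed it; MODEL[1] uses the empty prefix ().
MODEL = {
    4: {("i", "want", "to"): Counter({"go": 5, "be": 3})},
    3: {("want", "to"): Counter({"be": 4, "see": 2})},
    2: {("to",): Counter({"the": 10, "be": 7})},
    1: {(): Counter({"the": 100, "to": 80, "and": 70})},
}


def tokenize(text):
    """Lowercase the input and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def predict_next(text, model=MODEL):
    """Return the most frequent next word, backing off from 4-grams to unigrams."""
    words = tokenize(text)
    for n in (4, 3, 2):
        if len(words) < n - 1:
            continue  # not enough context words for this n-gram order
        prefix = tuple(words[-(n - 1):])
        followers = model[n].get(prefix)
        if followers:
            return followers.most_common(1)[0][0]
    # No match at any order: fall back to the most common unigram.
    return model[1][()].most_common(1)[0][0]
```

With these toy counts, "I want to" matches the 4-gram table and returns "go". "They want to" has no 4-gram match, so the lookup backs off to the 3-gram prefix "want to" and returns "be".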

Scoring: Candidates are ranked by their frequency in the training data, with matches from longer n-grams receiving higher weights.
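The weighted ranking can be sketched with the same hypothetical table layout. The 0.4 and 0.2 weights come from the slide. The following choices are assumptions made for illustration:

- a 1.0 weight for 4-grams
- scoring by relative frequency within each prefix
- letting a word keep its best score across orders

```python
import re
from collections import Counter

# Toy counts in a hypothetical layout: model[n] maps an (n-1)-word prefix
# tuple to a Counter of next words; model[1] uses the empty prefix ().
MODEL = {
    4: {("i", "want", "to"): Counter({"go": 5, "be": 3})},
    3: {("want", "to"): Counter({"be": 4, "see": 2})},
    2: {("to",): Counter({"the": 10, "be": 7})},
    1: {(): Counter({"the": 100, "to": 80, "and": 70})},
}

# 0.4 and 0.2 are the slide's weights; 1.0 for 4-grams is an assumption.
WEIGHTS = {4: 1.0, 3: 0.4, 2: 0.2}


def rank_candidates(text, model=MODEL, k=5):
    """Return up to k (word, score) pairs, best first."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = {}
    for n, weight in WEIGHTS.items():
        if len(words) < n - 1:
            continue  # not enough context for this order
        followers = model[n].get(tuple(words[-(n - 1):]))
        if not followers:
            continue
        total = sum(followers.values())
        for word, count in followers.items():
            # Weighted relative frequency; a word keeps its best score
            # across orders (taking the max is an assumed design choice).
            score = weight * count / total
            scores[word] = max(scores.get(word, 0.0), score)
    if not scores:
        # No n-gram match: fall back to unweighted unigram frequencies.
        unigrams = model[1][()]
        total = sum(unigrams.values())
        scores = {word: count / total for word, count in unigrams.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:k]
```

With the toy counts, "I want to" ranks go, be, see, the. Here "be" keeps its 4-gram score of 0.375 rather than its lower 3-gram score.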

Slide 3: Data and Model Training

Training Data

Corpus Statistics:

  • Blogs: Millions of words from blog posts
  • News: Formal text from news articles
  • Twitter: Informal messages and tweets

Model Building Process:

  1. Text Cleaning: Remove URLs and punctuation, and convert text to lowercase
  2. Tokenization: Split text into words
  3. N-gram Extraction: Generate 1- to 4-word sequences
  4. Frequency Counting: Count occurrences of each n-gram
  5. Model Storage: Save the frequency tables as an efficient RDS file for fast loading
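The five build steps above map onto a short Python sketch. The regular expressions and function names are assumptions. So is using `pickle` as a stand-in for R's `saveRDS`. The app's actual cleaning rules may differ, for example in how they treat numbers or non-ASCII text.

```python
import pickle
import re
from collections import Counter, defaultdict

URL_RE = re.compile(r"(https?://|www\.)\S+")


def clean(text):
    """Step 1: lowercase, strip URLs, replace punctuation with spaces."""
    text = URL_RE.sub(" ", text.lower())
    # Keep ASCII letters, digits, apostrophes and whitespace; replace
    # everything else (punctuation included) with a space.
    return re.sub(r"[^a-z0-9'\s]", " ", text)


def tokenize(text):
    """Step 2: split on whitespace and trim stray quote marks."""
    return [w.strip("'") for w in text.split() if w.strip("'")]


def count_ngrams(lines, max_n=4):
    """Steps 3-4: count 1- to max_n-grams as prefix -> Counter(next word)."""
    model = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for line in lines:
        words = tokenize(clean(line))  # n-grams never cross line boundaries
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                model[n][tuple(gram[:-1])][gram[-1]] += 1
    return {n: dict(table) for n, table in model.items()}


def save_model(model, path):
    """Step 5: pickle stands in here for R's saveRDS."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Reload a model written by save_model."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Counting line by line keeps n-grams from spanning unrelated tweets, posts, or articles.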

Performance:

  • Model size: Optimized for fast loading
  • Prediction speed: < 1 second for most inputs
  • Coverage: Handles common English phrases effectively
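The slides do not say how the model was optimized for size. One common approach is to prune rare n-grams and keep only a few followers per prefix, since the app never displays more than five suggestions. This is a hypothetical sketch of that approach, not the app's method. The `min_count` and `max_followers` parameters are illustrative.

```python
from collections import Counter


def prune(model, min_count=2, max_followers=5):
    """Keep at most max_followers next words per prefix, each seen >= min_count times.

    Prefixes left with no followers are dropped. This also prunes the
    unigram table (prefix ()), so min_count must stay low enough that
    the fallback list is not emptied.
    """
    pruned = {}
    for n, table in model.items():
        kept = {}
        for prefix, followers in table.items():
            top = [(w, c) for w, c in followers.most_common(max_followers)
                   if c >= min_count]
            if top:
                kept[prefix] = Counter(dict(top))
        pruned[n] = kept
    return pruned
```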

Slide 4: Shiny App Features

Application Interface

User Experience:

Input:

  • Text area for typing phrases
  • Submit button or automatic prediction
  • Real-time updates as you type

Output:

  • Top Prediction: Most likely next word highlighted
  • Top 5 Suggestions: Ranked list with confidence scores
  • Algorithm Explanation: How predictions are made
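The slides do not define how the confidence scores are computed. One simple option, sketched here as an assumption, is to rescale the top-ranked scores so they read as percentages of the displayed set. The function name `to_confidence` is hypothetical.

```python
def to_confidence(ranked):
    """Turn (word, score) pairs into percentages of the displayed set.

    Each value is rounded to one decimal place, so the total may differ
    slightly from 100.
    """
    total = sum(score for _, score in ranked)
    if total == 0:
        return [(word, 0.0) for word, _ in ranked]
    return [(word, round(100 * score / total, 1)) for word, score in ranked]
```

For example, scores of 0.625 and 0.375 would be shown as 62.5% and 37.5%.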

How to Use:

  1. Type a phrase in the text box (e.g., “I want to”)
  2. Click “Predict Next Word” or wait for automatic prediction
  3. View the top prediction and alternative suggestions
  4. Continue typing to see updated predictions

Try it: The app is deployed on shinyapps.io and ready to use!

Slide 5: Conclusion and Future Work

Summary

What We Built:

  • Working Prediction Algorithm: N-gram model with hierarchical backoff
  • Interactive Shiny App: User-friendly web interface
  • Fast Performance: Optimized for real-time predictions
  • Robust Training: Diverse corpus from multiple sources

Potential Improvements:

  • Smoothing Techniques: Better handling of rare and unseen word sequences
  • Context Awareness: Consider sentence structure
  • Personalization: Learn from user's typing patterns
  • Mobile Optimization: Enhanced mobile experience

Business Value:

This technology can be applied to:

  • Keyboard Apps: Smart text suggestions
  • Search Engines: Query completion
  • Writing Assistants: Content generation tools
  • Accessibility: Faster typing for users with disabilities

Thank you for your attention!