June 2025

Slide 1: Introduction

Slide 2: The Algorithm

  • N-gram Model: 3-grams with 2-gram back-off (see the code sketch after this list)
  • Data Processing:
    • Used the full text (~0.6 MB) of Sherlock Holmes from Project Gutenberg
    • Cleaned the text: lowercased it; removed punctuation, numbers, and the Gutenberg header/footer
    • Handled empty lines and other corpus irregularities robustly
  • Why It Works: Fast, lightweight, suited for narrative text
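
  A minimal base-R sketch of the pipeline above (the file name "sherlock.txt", the exact regular expressions, and the helper name predict_next are assumptions for illustration, not the app's actual code):

      # --- Cleaning: lowercase, strip punctuation/numbers, drop empty lines ---
      raw   <- readLines("sherlock.txt", encoding = "UTF-8")  # hypothetical file; header/footer already trimmed
      raw   <- raw[nzchar(trimws(raw))]                       # drop empty lines
      text  <- tolower(paste(raw, collapse = " "))
      text  <- gsub("[^a-z ]", " ", text)                     # keep letters and spaces only
      words <- strsplit(trimws(gsub("\\s+", " ", text)), " ")[[1]]

      # --- Frequency tables for 3-grams and 2-grams ---
      n   <- length(words)
      tri <- table(paste(words[1:(n - 2)], words[2:(n - 1)], words[3:n]))
      bi  <- table(paste(words[1:(n - 1)], words[2:n]))

      # --- Predict: match the last two words in the 3-gram table, back off to 2-grams ---
      predict_next <- function(phrase) {
        toks <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]
        hits <- NULL
        if (length(toks) >= 2) {
          key  <- paste(tail(toks, 2), collapse = " ")
          hits <- tri[startsWith(names(tri), paste0(key, " "))]
        }
        if (length(hits) == 0 && length(toks) >= 1) {          # back off to 2-grams
          hits <- bi[startsWith(names(bi), paste0(tail(toks, 1), " "))]
        }
        if (length(hits) == 0) return(NA_character_)
        best <- names(hits)[which.max(hits)]                   # most frequent matching n-gram
        tail(strsplit(best, " ")[[1]], 1)                      # its final word
      }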

Slide 3: App Functionality

Slide 4: User Experience

  • Ease of Use: Type a phrase, click Predict, see the prediction (see the Shiny sketch after this list)
  • Testing: Predicted next word for 5 phrases:
    • “it was a” → “very”
    • “he said” → “that”
    • “the door” → “was”
    • “i have” → “a”
    • “in the” → “room”
  • Feedback: Intuitive, reliable for narrative-style phrases
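
  A minimal Shiny sketch of this interaction (it assumes the predict_next() helper from the algorithm sketch above; the widget names are illustrative, not the app's actual code):

      library(shiny)

      ui <- fluidPage(
        textInput("phrase", "Type a phrase"),
        actionButton("go", "Predict"),
        textOutput("next_word")
      )

      server <- function(input, output) {
        # Recompute the prediction only when the button is clicked
        prediction <- eventReactive(input$go, predict_next(input$phrase))
        output$next_word <- renderText(prediction())
      }

      shinyApp(ui, server)

  Typing one of the test phrases above (e.g. "it was a") and clicking Predict would display the model's top continuation ("very" in the tests reported here).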

Slide 5: Why Hire Me?

  • Novelty: Compact N-gram model optimized for small datasets
  • Skills: R, Shiny, NLP, data preprocessing
  • Impact: Ready for integration into real-world applications
  • Hire Me: I bring technical expertise and innovation to your data science startup!