Slide 1: Problem Statement

  • Predict the next word from the text a user has typed so far
  • Useful for autocomplete systems
  • Built using R and NLP techniques

Slide 2: Data Used

  • Blogs dataset
  • News dataset
  • Twitter dataset
  • Combined and sampled for training
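
The "combined and sampled" step could look roughly like the sketch below. It is written in Python purely for illustration (the project itself uses R). The toy lines, the 50% sampling fraction, and the seed are all assumptions, not values taken from the project.

```python
import random

def sample_lines(sources, fraction, seed=42):
    """Pool lines from several corpora and keep a random fraction of them."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    combined = [line for lines in sources.values() for line in lines]
    k = max(1, round(len(combined) * fraction))
    return rng.sample(combined, k)

# Hypothetical stand-ins for the blogs, news, and Twitter files.
corpora = {
    "blogs": ["the cat sat on the mat", "i love long walks"],
    "news": ["the market fell today", "rain is expected tomorrow"],
    "twitter": ["cant wait for the weekend", "so happy right now"],
}
training = sample_lines(corpora, fraction=0.5)
```

Sampling keeps the n-gram tables small enough to build and serve quickly; the right fraction depends on available memory.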

Slide 3: Approach

  • Text preprocessing (lowercase, clean text)
  • Tokenization into unigrams, bigrams, trigrams
  • Frequency-based n-gram model
  • Backoff strategy used for prediction
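
As a rough illustration of the preprocessing and tokenization steps above, the following Python sketch builds unigram, bigram, and trigram frequency tables (the project itself uses R). The cleaning rule (keep only letters, apostrophes, and spaces) and the example sentences are assumptions; the slides do not specify what "clean text" covers.

```python
import re
from collections import Counter

def preprocess(text):
    """Lowercase the text and split it into word tokens."""
    text = text.lower()
    # Assumed cleaning rule: replace anything that is not a letter,
    # apostrophe, or space with a space.
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()

def ngram_counts(lines, n):
    """Count every run of n consecutive tokens within each line."""
    counts = Counter()
    for line in lines:
        tokens = preprocess(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Hypothetical two-line corpus.
lines = ["The cat sat on the mat.", "The cat ate the fish!"]
unigrams = ngram_counts(lines, 1)
bigrams = ngram_counts(lines, 2)
trigrams = ngram_counts(lines, 3)

print(unigrams[("the",)])              # 4
print(bigrams[("the", "cat")])         # 2
print(trigrams[("the", "cat", "sat")]) # 1
```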

Slide 4: Model Logic

  • Trigram match on the last two words → highest priority
  • Else bigram match on the last word
  • Else unigram fallback (most frequent word overall)
  • Final output = most frequent next word
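
The backoff order above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's R code: `build_model` and `predict_next` are hypothetical names, tokenization is a plain whitespace split, and the four-sentence corpus is made up.

```python
from collections import Counter, defaultdict

def build_model(sentences, max_n=3):
    """Map each context (0 to max_n-1 preceding words) to next-word counts."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                *context, nxt = tokens[i:i + n]
                model[tuple(context)][nxt] += 1
    return model

def predict_next(model, text, max_n=3):
    """Return the most frequent next word, backing off to shorter contexts."""
    tokens = text.lower().split()
    # Longest context first: two words (trigram), one word (bigram),
    # then the empty context (unigram fallback).
    for k in range(max_n - 1, -1, -1):
        if k > len(tokens):
            continue  # not enough input words for this context length
        context = tuple(tokens[len(tokens) - k:]) if k else ()
        candidates = model.get(context)
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # only reached when the model is empty

# Hypothetical toy corpus.
model = build_model([
    "i love new york",
    "i love new shoes",
    "i love new york city",
    "we love you",
])
print(predict_next(model, "i love new"))   # york (trigram match)
print(predict_next(model, "they love"))    # new  (bigram backoff)
print(predict_next(model, "hello zebra"))  # love (unigram fallback)
```

This version returns the first level that has any match, with no discounting of lower-order counts.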

Slide 5: Deployment

  • Built an interactive Shiny app
  • Hosted on shinyapps.io
  • Real-time prediction interface