Slide 1: Introduction

Goal of the Project

  • Objective: Build a Shiny app that predicts the next word based on a given phrase.
  • Key Features:
    • Takes a phrase as input (multiple words).
    • Predicts the next word using a bigram model.
    • Displays the predicted next word as the output.

Slide 2: Overview of the Bigram Model

What is a Bigram Model?

  • A bigram is a sequence of two adjacent words in a text.
  • The model predicts the next word based on the previous word.
  • The prediction uses the conditional probability of the next word given the current word, estimated from bigram counts (see the formula below).
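
With bigram counts taken from the training text, this probability is estimated as a ratio of counts:

$$
P(w_i \mid w_{i-1}) = \frac{\operatorname{count}(w_{i-1},\, w_i)}{\operatorname{count}(w_{i-1})}
$$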

Model Training

  • Tokenized the text into words.
  • Created bigrams (pairs of consecutive words).
  • Calculated the frequency and conditional probability of each next word given the previous word (a code sketch follows this list).
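
A minimal sketch of these training steps, assuming the tidytext, tidyr, and dplyr packages; the toy corpus `text_df` and the resulting `bigram_probs` table are hypothetical stand-ins for the app's actual training data:

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical toy corpus standing in for the real training text.
text_df <- tibble(text = c("the cat sat on the mat",
                           "the cat ate the fish"))

bigram_probs <- text_df %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%    # tokenize into bigrams
  separate(bigram, into = c("word1", "word2"), sep = " ") %>% # split each pair into two columns
  count(word1, word2, name = "n") %>%                         # bigram frequencies
  group_by(word1) %>%
  mutate(prob = n / sum(n)) %>%                               # P(word2 | word1)
  ungroup()

head(bigram_probs)
```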

Slide 3: Shiny App Workflow

Workflow

  1. User Input: The user enters a phrase (multiple words).
  2. Preprocessing: The app extracts the last word from the input.
  3. Prediction: Using the bigram model, the app predicts the most probable next word.
  4. Output: The predicted next word is displayed to the user (a minimal code sketch of this workflow follows).
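
A minimal Shiny sketch of this workflow, assuming the hypothetical `bigram_probs` table from the previous slide is already loaded; the function and object names here are illustrative rather than the app's actual code:

```r
library(shiny)
library(dplyr)

# Looks up the most probable follower of the last word of a phrase.
predict_next_word <- function(phrase, bigram_probs) {
  words <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]  # lowercase and split on whitespace
  if (length(words) == 0) return(NA_character_)
  last_word <- tail(words, 1)                              # keep only the last word
  candidates <- bigram_probs %>%
    filter(word1 == last_word) %>%
    arrange(desc(prob))
  if (nrow(candidates) == 0) NA_character_ else candidates$word2[1]
}

ui <- fluidPage(
  textInput("phrase", "Enter a phrase:"),
  textOutput("prediction")
)

server <- function(input, output) {
  output$prediction <- renderText({
    predict_next_word(input$phrase, bigram_probs)  # bigram_probs assumed to be pre-loaded
  })
}

shinyApp(ui, server)
```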

Slide 4: Why This Algorithm?

Why is this Algorithm Effective?

  • Simple and Efficient: A bigram model needs only the most recent word to make a prediction, so it is fast to build, small to store, and quick to look up.
  • Real-Time Prediction: The Shiny app provides real-time prediction, offering an interactive user experience.
  • Extensible: The approach can be extended to more complex models such as trigrams or neural language models.

Slide 5: Future Improvements & Conclusion

Future Improvements

  • Multiple Predictions: Expand the app to show the top N predicted words (sketched after this list).
  • Better Model: Consider using a trigram or neural network model for better predictions.
  • User Customization: Allow users to provide custom training data for personalized predictions.
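
A hedged sketch of the top-N extension, reusing the hypothetical `bigram_probs` table and the same dplyr-based approach:

```r
library(dplyr)

# Returns up to n candidate next words, ordered by probability.
# Assumes a non-empty phrase and the bigram_probs table sketched earlier.
predict_top_n <- function(phrase, bigram_probs, n = 3) {
  last_word <- tail(strsplit(tolower(trimws(phrase)), "\\s+")[[1]], 1)
  bigram_probs %>%
    filter(word1 == last_word) %>%
    arrange(desc(prob)) %>%
    head(n) %>%
    pull(word2)
}

predict_top_n("I went to the", bigram_probs)
```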

Conclusion

  • The Next Word Prediction App offers an interactive and user-friendly interface for predicting the next word in a given phrase.
  • It leverages a simple bigram model to provide useful, real-time predictions for users.