November 29, 2025

Next Word Predictor

Introduction

  • The goal of the capstone project is to build a predictive text model trained on a large corpus of text documents.
  • Natural language processing techniques will be used to perform the analysis.
  • This document briefly summarizes the plans for the prediction algorithm and the Shiny app.

Algorithm

The predictive algorithm will be based on an N-gram model.

  • Stupid Backoff, which falls back to progressively shorter N-grams, or a similar smoothing technique will be implemented to handle N-gram sequences that never appeared in the training data (the zero-frequency problem).
  • The final model will use a larger sample of the full corpus for better accuracy.
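
The backoff idea above can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names (`build_counts`, `score`, `predict`), the trigram limit, the toy corpus, and the backoff factor of 0.4 are all assumptions chosen for illustration.

```python
from collections import Counter

ALPHA = 0.4  # assumed backoff factor; a commonly used value, not taken from this project


def build_counts(tokens, max_n=3):
    """Count every n-gram of order 1..max_n in a list of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(word, context, counts, total_unigrams):
    """Stupid Backoff score of `word` following `context` (a tuple of tokens).

    Use the relative frequency of the longest seen n-gram; each time the
    context has to be shortened, multiply the score by ALPHA.
    """
    penalty = 1.0
    while context:
        seen = counts[context + (word,)]
        if seen > 0:
            return penalty * seen / counts[context]
        context = context[1:]  # drop the oldest word and back off
        penalty *= ALPHA
    return penalty * counts[(word,)] / total_unigrams


def predict(phrase, counts, max_n=3, k=3):
    """Return the k highest-scoring next words for the typed phrase."""
    vocab = [gram[0] for gram in counts if len(gram) == 1]
    total = sum(counts[(w,)] for w in vocab)
    words = tuple(phrase.lower().split())
    context = words[-(max_n - 1):] if max_n > 1 else ()
    ranked = sorted(vocab, key=lambda w: (-score(w, context, counts, total), w))
    return ranked[:k]


# Toy corpus, invented for this example only.
corpus = "the cat sat on the mat the cat ate the fish the cat sat down".split()
counts = build_counts(corpus)
print(predict("the cat", counts))  # ['sat', 'ate', 'the']
```

Note that Stupid Backoff produces relative scores rather than normalized probabilities, which is sufficient for ranking candidate words.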

Shiny App

The final output will be a Shiny application designed for a seamless user experience:

  • Input: The user types a word or phrase into a text box.
  • Prediction: The application uses the N-gram model to predict the 3 most likely next words.
  • Output: These 3 words will appear as clickable buttons, allowing the user to select and append the next word quickly.
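
The input, prediction, and output steps above can be sketched without any UI framework. This is only an illustration of the interaction logic, not the Shiny code itself: `append_word`, `click_suggestion`, and the `predictor` callable are hypothetical names, and the predictor is assumed to return words ranked from most to least likely.

```python
def append_word(text, word):
    """Return the text box content after a suggested word is appended."""
    return f"{text.rstrip()} {word}".strip()


def click_suggestion(text, predictor, choice=0):
    """Simulate the user clicking one of the (at most) 3 suggestion buttons.

    `predictor` is a hypothetical callable that maps the typed text to a
    ranked list of candidate next words.
    """
    suggestions = predictor(text)[:3]  # only the top 3 become buttons
    if not suggestions:
        return text  # nothing to suggest, leave the input unchanged
    return append_word(text, suggestions[choice])
```

In the app, the returned string would replace the content of the text box, so the next prediction is computed from the extended phrase.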

Conclusion

  • Create a basic N-gram model based on probabilities.
  • Create a Shiny application that predicts the next word the user will type.
  • Generate a final report to summarize all information about this project.



Thank you for your attention.