This project aims to develop a next-word prediction model based on large-scale English text datasets. Using n-gram language modeling, the application predicts the most likely next word given a user’s input phrase. The model leverages data from blogs, news articles, and Twitter feeds to capture diverse language patterns.
Due to resource constraints, the current implementation is limited to 2- to 4-gram models, trading some prediction accuracy for lower computational cost and a footprint that fits the deployment limits. The final product is a Shiny web application in which users enter a phrase and receive next-word predictions in real time.
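To make the 2- to 4-gram idea concrete, here is a minimal sketch in Python. It is not the project's actual Shiny implementation: the function names (`tokenize`, `build_ngram_tables`, `predict_next`) are hypothetical, the whitespace tokenizer stands in for real data cleaning, and the "try the longest context first, then fall back to shorter ones" lookup is an assumed strategy, since the text above does not say how the three model orders are combined.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real cleaning)."""
    return text.lower().split()


def build_ngram_tables(sentences, min_n=2, max_n=4):
    """For each order n, map an (n-1)-word context to counts of the next word."""
    tables = {n: defaultdict(Counter) for n in range(min_n, max_n + 1)}
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in tables:
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                next_word = tokens[i + n - 1]
                tables[n][context][next_word] += 1
    return tables


def predict_next(tables, phrase):
    """Return the most frequent follower, trying the longest context first.

    Assumed fallback rule: if the 4-gram context is unseen, try the 3-gram
    context, then the 2-gram context. Returns None if nothing matches.
    """
    tokens = tokenize(phrase)
    for n in sorted(tables, reverse=True):
        if len(tokens) < n - 1:
            continue  # phrase is too short to form this context
        context = tuple(tokens[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None


# Toy corpus, invented for illustration only.
corpus = [
    "thanks for the follow",
    "thanks for the follow",
    "thanks for the update",
    "see you at the game",
]
tables = build_ngram_tables(corpus)
print(predict_next(tables, "thanks for the"))
print(predict_next(tables, "look at the"))
```

With this toy corpus, the first call is answered from the 4-gram table, while the second has no matching 4-gram context and falls back to the 3-gram context `("at", "the")`. The sketch uses raw counts only; it applies no smoothing or pruning, both of which a model built on large corpora would likely need.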
The project combines data cleaning, natural language processing, and predictive modeling in an interactive app designed for usability and scalability. Future work includes expanding the n-gram range and improving the prediction algorithm.