LaKeya King
2026-06-19
This project builds a predictive text model using natural language processing techniques.
Goal: Predict the next word given a user-entered phrase using real-world text data.
The model is trained on three datasets:

- Blogs
- News
- Twitter
A sample of the data was used to improve performance and reduce computation time.
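
The sampling step might look like the following minimal sketch. The helper name `sample_lines`, the 50% fraction, the seed, and the toy `corpus` vector are all assumptions made for illustration; the source does not state which fraction or seed was actually used.

```r
# Hypothetical helper: keep a random fraction of the lines in a character vector.
sample_lines <- function(lines, fraction = 0.05, seed = 123) {
  stopifnot(length(lines) > 0, fraction > 0, fraction <= 1)
  set.seed(seed)  # fixed seed so the same sample is drawn on every run
  n_keep <- max(1, floor(length(lines) * fraction))
  lines[sample.int(length(lines), n_keep)]
}

# Toy stand-in for lines read from the blogs, news, and Twitter files.
corpus <- c("the cat sat", "a dog ran", "i love r",
            "text is fun", "words follow words", "see you soon")
corpus_sample <- sample_lines(corpus, fraction = 0.5)
```

Note that `set.seed()` changes the global random number state, which is usually acceptable in a one-off preprocessing script.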
The model uses n-grams:

- Bigrams (2-word sequences)
- Trigrams (3-word sequences)
Each n-gram is counted and ranked by frequency.
The predicted word is the final word of the most frequent n-gram that begins with the last words of the user's phrase.
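
Counting, ranking, and lookup could be sketched in base R as shown here. The helpers `make_ngrams`, `count_ngrams`, and `top_next_word` are hypothetical names, and the `text`/`freq` column layout is an assumption; the project itself may well use a text-mining package instead.

```r
# Hypothetical helper: split each line into all of its n-word sequences.
make_ngrams <- function(lines, n) {
  unlist(lapply(lines, function(line) {
    words <- strsplit(trimws(line), "\\s+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
}

# Count each n-gram and rank by descending frequency.
count_ngrams <- function(lines, n) {
  grams <- make_ngrams(lines, n)
  if (length(grams) == 0) {
    return(data.frame(text = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(text = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Return the last word of the most frequent n-gram that starts with `prefix`,
# or NA when nothing matches.
top_next_word <- function(ngram_df, prefix) {
  hits <- ngram_df[startsWith(ngram_df$text, paste0(prefix, " ")), ]
  if (nrow(hits) == 0) return(NA_character_)
  best <- hits$text[which.max(hits$freq)]
  utils::tail(strsplit(best, " ", fixed = TRUE)[[1]], 1)
}

toy_lines <- c("i love r", "i love data", "i love r so much")
trigram_counts <- count_ngrams(toy_lines, 3)
next_word <- top_next_word(trigram_counts, "i love")
```

In this toy corpus "i love r" occurs twice and "i love data" once, so the lookup for "i love" picks "r".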
The prediction function:

- Cleans input text (lowercase, remove punctuation)
- Splits input into words
- Uses pattern matching to find matching n-grams
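
The cleaning and splitting steps listed above could be written roughly as follows. `clean_input` and `last_n_words` are hypothetical helper names, and stripping every punctuation character (so "don't" becomes "dont") is an assumption about how the cleaning was done.

```r
# Hypothetical helper: lowercase, strip punctuation, collapse whitespace.
clean_input <- function(phrase) {
  phrase <- tolower(phrase)
  phrase <- gsub("[[:punct:]]", "", phrase)  # drop punctuation characters
  phrase <- gsub("\\s+", " ", phrase)        # collapse runs of whitespace
  trimws(phrase)
}

# Hypothetical helper: return the last n words of the cleaned phrase
# as a single space-separated string.
last_n_words <- function(phrase, n = 2) {
  words <- strsplit(clean_input(phrase), " ", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]
  paste(utils::tail(words, n), collapse = " ")
}

last_two <- last_n_words("Thanks for the")
```

Because punctuation is removed before matching, the resulting string is safe to paste into a regular expression.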
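    # Keep trigrams whose first two words equal the last two input words;
    # the trailing space stops "i love" from also matching "i loved ...".
    matches <- trigram_df[grepl(paste0("^", last_two, " "), trigram_df$text), ]
    predicted <- most_freq_word(matches)

The Shiny app allows users to:

1. Enter a phrase
2. Click “Predict”
3. View the next word instantly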
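
A minimal sketch of such an app, assuming the `shiny` package is installed, is given below. The function `build_app`, the input and output IDs, and the `predict_fn` argument (any function that takes a phrase and returns one word) are hypothetical; the actual app layout is not shown in the source.

```r
# Hypothetical wrapper: builds a Shiny app around a prediction function.
# shiny is only needed when build_app() is called, not when it is defined.
build_app <- function(predict_fn) {
  ui <- shiny::fluidPage(
    shiny::titlePanel("Next Word Predictor"),
    shiny::textInput("phrase", "Enter a phrase:"),
    shiny::actionButton("predict", "Predict"),
    shiny::textOutput("next_word")
  )

  server <- function(input, output, session) {
    # Recompute the prediction only when the Predict button is clicked.
    prediction <- shiny::eventReactive(input$predict, {
      predict_fn(input$phrase)
    })
    output$next_word <- shiny::renderText(prediction())
  }

  shiny::shinyApp(ui = ui, server = server)
}
```

To try it, pass in a prediction function and launch the result, for example `shiny::runApp(build_app(my_predict_fn))`, where `my_predict_fn` stands for whatever function wraps the n-gram lookup.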