Predict the Next Word from a Given String
Collaboration between Johns Hopkins University and SwiftKey
Objective: Build a functioning predictive text model
Data: HC Corpora (English only)

Sampled 1,000,000 lines from the Twitter, Blogs, and News datasets
Cleaned the data by:
- removing non-ASCII characters (e.g., emojis)
- converting to lowercase
- removing contractions, punctuation, numbers, profanity, and extra whitespace
Tokenized the cleaned text into n-grams (unigrams up to 6-grams)
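The cleaning and tokenization steps above can be sketched in Python as follows. This is an illustrative reconstruction, not the project's original code: the profanity list is a hypothetical placeholder, and handling contractions by dropping apostrophes (so "don't" becomes "dont") is an assumption about what "removing contractions" meant.

```python
import re
from collections import Counter

# Hypothetical placeholder; the project's actual profanity list is not given.
PROFANITY = {"darn", "heck"}


def clean(line: str) -> str:
    """Apply the listed cleaning steps to one line (a sketch, not the original pipeline)."""
    line = line.encode("ascii", errors="ignore").decode("ascii")  # drop non-ASCII (emojis)
    line = line.lower()
    line = line.replace("'", "")           # assumption: contractions collapsed by dropping apostrophes
    line = re.sub(r"[^a-z\s]", " ", line)  # drop punctuation and digits
    # split() also collapses runs of whitespace; profane words are filtered out.
    words = [w for w in line.split() if w not in PROFANITY]
    return " ".join(words)


def ngram_counts(lines, max_n=6):
    """Count every n-gram, from unigrams up to max_n-grams, across the cleaned lines."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = clean(line).split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts
```

For example, `clean("I don't like Mondays! 😀")` yields `"i dont like mondays"`. Keying each count by a tuple of words keeps every n-gram order in its own `Counter`, which is convenient for the probability estimates that follow.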
Built Maximum Likelihood Estimation (MLE) matrices from the n-gram counts
Used a back-off model for prediction
Output: the top 3 predicted words for the user's input
Showing three predictions rather than one raises the chance that the intended word is among the suggestions
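A minimal Python sketch of MLE estimates combined with back-off prediction. It assumes the n-grams are stored as a map from a context (the previous 0 to 5 words) to a counter of next words; the names `build_tables`, `mle`, and `predict` are hypothetical. The back-off shown is a simple one that takes candidates from the longest matching context first and fills any remaining slots from shorter contexts. It applies no discounting (unlike Katz back-off), and the original app's exact scheme is not specified.

```python
from collections import Counter, defaultdict


def build_tables(sentences, max_n=6):
    """Map each context tuple (0 to max_n - 1 words) to a Counter of the words that follow it."""
    tables = defaultdict(Counter)
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[context][tokens[i + n - 1]] += 1
    return tables


def mle(tables, context, word):
    """MLE estimate: P(word | context) = count(context, word) / count(context, any next word)."""
    followers = tables.get(tuple(context))
    if not followers:
        return 0.0
    return followers[word] / sum(followers.values())


def predict(tables, text, k=3, max_n=6):
    """Back off from the longest context (max_n - 1 words) to the empty context until k words are found."""
    tokens = text.lower().split()
    picks = []
    for length in range(min(max_n - 1, len(tokens)), -1, -1):
        context = tuple(tokens[len(tokens) - length:])  # last `length` words; () when length == 0
        followers = tables.get(context)
        if not followers:
            continue  # unseen context: back off to a shorter one
        # Within one context, ranking by count is the same as ranking by MLE probability.
        for word, _ in followers.most_common():
            if word not in picks:
                picks.append(word)
            if len(picks) == k:
                return picks
    return picks
```

With the toy sentences "i like green tea", "i like black tea", "i like green apples", and "you like coffee", `predict(tables, "I like")` returns `["green", "black", "coffee"]`: the two-word context "i like" supplies two candidates, and backing off to "like" supplies the third.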

Hosted at: Capstone Prediction App
Features:
- clickable predicted words that are appended to the input
- instant predictions
UI preview: