Capstone Project: Next Word Prediction App
Problem and Goal
Typing on mobile devices can be slow and error-prone.
Our goal is to build an app that predicts the next word of a typed phrase, improving typing speed and accuracy.
Data and Preprocessing
- Data source: SwiftKey dataset (Blogs, News, Twitter)
- Cleaning steps:
  - Lowercasing, removing punctuation and numbers
  - Removing stopwords and profanity
  - Tokenization and stemming
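The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the app's actual pipeline: the `STOPWORDS` and `PROFANITY` sets are tiny hypothetical placeholders, and `naive_stem` is a crude stand-in for a real stemmer such as Porter's.

```python
import re

# Illustrative word lists only (assumptions); a real app would load fuller lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}
PROFANITY = {"darn"}  # hypothetical placeholder entry


def naive_stem(word):
    """Crude suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip if at least 3 characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Lowercase, strip punctuation/numbers, filter words, tokenize, stem."""
    text = text.lower()
    # Keep only ASCII letters and whitespace (drops punctuation and digits).
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()  # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS and t not in PROFANITY]
    return [naive_stem(t) for t in tokens]


print(clean("The 3 dogs were barking loudly!"))
# ['dog', 'were', 'bark', 'loudly']
```

Each step maps to one bullet, so individual steps can be swapped out (for example, replacing `naive_stem` with a library stemmer) without touching the rest.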
Prediction Algorithm
We use a simple backoff n-gram model:
- Try to match the last 3 words → trigram
- If not found, use the last 2 → bigram
- If still not found, use the last word → unigram
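The backoff lookup described above can be sketched as follows. This is a simplified illustration under stated assumptions: the names `build_model` and `predict` are hypothetical, counts are raw frequencies with no smoothing, and the final fallback to the most frequent word overall is an addition for inputs where no context matches at all.

```python
from collections import Counter, defaultdict


def build_model(sentences, max_context=3):
    """Map each context tuple (0 to max_context words) to next-word counts."""
    model = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for n in range(max_context + 1):
                if i - n < 0:
                    break  # not enough preceding words for this context size
                model[tuple(tokens[i - n:i])][word] += 1
    return model


def predict(model, phrase, max_context=3):
    """Try the longest context first, then back off to shorter ones."""
    tokens = phrase.lower().split()
    for n in range(min(max_context, len(tokens)), 0, -1):
        context = tuple(tokens[-n:])
        if context in model:
            return model[context].most_common(1)[0][0]
    # Assumption: if nothing matches, return the most frequent word overall.
    overall = model.get(())
    return overall.most_common(1)[0][0] if overall else None


# Toy corpus for illustration only.
corpus = [
    "i want to eat pizza",
    "i want to eat pizza",
    "i want to sleep",
    "we like to sleep",
    "they like to sleep",
]
model = build_model([s.split() for s in corpus])

print(predict(model, "i want to"))           # eat   (3-word context matched)
print(predict(model, "you really want to"))  # eat   (backed off to 2 words)
print(predict(model, "we have to"))          # sleep (backed off to 1 word)
```

Storing every context length in one dictionary keeps the backoff loop to a few lines; a production app would more likely keep separate pruned tables per n-gram order to limit memory.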
Summary and Future Work
- The app shows the potential of n-gram modeling for next-word prediction
- Future: personalization and neural language models (LSTM, BERT)