2026-01-24
Motivation
- Mobile typing benefits greatly from word prediction
- Predicting the next word improves speed and usability
- This project builds a lightweight text prediction engine
- Trained on real-world text from blogs, news articles, and Twitter
Data & Exploration
- Source: SwiftKey dataset
- Data challenges:
  - Large file sizes
  - Noisy text
  - Inconsistent grammar
- Solution:
  - Sampling
  - Cleaning
  - Tokenization
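The sampling, cleaning, and tokenization steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the project's own code: the function names (`sample_lines`, `clean`, `tokenize`, `preprocess`) and the cleaning rules (lowercasing, dropping URLs, keeping only ASCII letters and apostrophes, a fixed random seed) are hypothetical choices for mostly English text.

```python
import random
import re

# Hypothetical cleaning rules; adjust them to the actual corpus.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_WORD_RE = re.compile(r"[^a-z' ]+")  # anything except a-z, apostrophe, space


def sample_lines(lines, fraction, seed=42):
    """Keep roughly `fraction` of the lines, reproducibly, to tame large files."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def clean(line):
    """Lowercase, drop URLs, and turn digits, punctuation, and symbols into spaces."""
    line = URL_RE.sub(" ", line.lower())
    line = NON_WORD_RE.sub(" ", line)
    return re.sub(r"\s+", " ", line).strip()


def tokenize(line):
    """Split on whitespace and trim stray apostrophes from each token."""
    tokens = (tok.strip("'") for tok in line.split())
    return [tok for tok in tokens if tok]


def preprocess(lines, fraction=0.1, seed=42):
    """Sample, clean, and tokenize; skip lines that end up empty."""
    docs = []
    for line in sample_lines(lines, fraction, seed):
        tokens = tokenize(clean(line))
        if tokens:
            docs.append(tokens)
    return docs


if __name__ == "__main__":
    raw = ["Hello, World! Visit https://example.com now.", "Don't STOP :)"]
    print(preprocess(raw, fraction=1.0))
```

Sampling first keeps memory bounded; a fixed seed makes the sample, and therefore the model, reproducible between runs.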
Modeling Approach
- Use N-gram language models
- Strategy:
  - Predict the next word from the highest-order n-gram that matches the input
  - Back off to lower-order n-grams when no match is found
- Balance between:
  - Accuracy
  - Speed
  - Memory usage
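A minimal Python sketch of the "highest-order match, then back off" strategy, assuming token lists like those produced during preprocessing and trigrams as the highest order (`max_n=3` is an assumption). It uses raw counts with no discounting, so it is a simplified backoff rather than Katz backoff; `build_ngram_counts` and `predict_next` are hypothetical names.

```python
from collections import Counter, defaultdict


def build_ngram_counts(docs, max_n=3):
    """Count how often each word follows each context of 0..max_n-1 preceding words.

    The empty context () holds plain unigram counts.
    """
    table = defaultdict(Counter)
    for tokens in docs:
        for i, word in enumerate(tokens):
            for k in range(min(max_n - 1, i) + 1):
                context = tuple(tokens[i - k:i])  # the k words before position i
                table[context][word] += 1
    return dict(table)  # plain dict so lookups never insert empty entries


def predict_next(table, words, max_n=3):
    """Use the longest context seen in training; shorten it until one matches."""
    for k in range(min(max_n - 1, len(words)), -1, -1):
        context = tuple(words[len(words) - k:])  # last k words; () when k == 0
        followers = table.get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # only reached when the table is empty


docs = [["i", "love", "new", "york"], ["i", "love", "pizza"]]
table = build_ngram_counts(docs)
print(predict_next(table, ["we", "love"]))  # backs off from ("we", "love") to ("love",)
```

Each prediction costs at most `max_n` dictionary lookups. Raising `max_n` can improve accuracy but grows the table, which is the accuracy/speed/memory trade-off listed above.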
Application Design
- Built using Shiny
- User types a phrase
- The app predicts the most likely next word
- Features:
  - Fast response
  - Clean interface
  - Lightweight model
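One possible way to keep the model lightweight and the response fast (an assumption, not necessarily what the app does) is to prune rare n-grams offline and store only a short list of suggestions per context, so answering a typed phrase takes a few dictionary lookups. `compact_model`, `suggest`, `min_count`, and `top_k` are hypothetical; the input table maps a context tuple to a `Counter` of following words.

```python
from collections import Counter


def compact_model(table: dict[tuple[str, ...], Counter],
                  min_count: int = 2,
                  top_k: int = 3) -> dict[tuple[str, ...], list[str]]:
    """Keep at most top_k followers per context, dropping any seen fewer than min_count times."""
    compact = {}
    for context, followers in table.items():
        best = [w for w, c in followers.most_common(top_k) if c >= min_count]
        if best:  # contexts with only rare followers are removed entirely
            compact[context] = best
    return compact


def suggest(compact: dict[tuple[str, ...], list[str]],
            phrase: str,
            max_n: int = 3) -> list[str]:
    """Lowercase the typed phrase and return suggestions for its longest known context."""
    words = phrase.lower().split()
    for k in range(min(max_n - 1, len(words)), -1, -1):
        context = tuple(words[len(words) - k:])  # last k words; () when k == 0
        if context in compact:
            return compact[context]
    return []
```

In practice the typed phrase should go through the same cleaning as the training text; lowercasing alone is a simplification here.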
Conclusion & Next Steps
- Demonstrated that next-word prediction is feasible with a lightweight model
- N-gram models are effective and interpretable
- Future improvements:
  - Smoothing techniques
  - Larger training samples
  - Smarter backoff logic
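For the "smarter backoff logic" item, one well-known candidate is Stupid Backoff (Brants et al., 2007), which scores every candidate word instead of stopping at the first matching context. This is a sketch of a possible future direction, not part of the current app. It assumes a context-to-`Counter` table of follower counts and the commonly used penalty `alpha = 0.4`; the context count is approximated by the total number of followers. The scores are not normalized probabilities, and the function names are hypothetical.

```python
from collections import Counter


def stupid_backoff_score(table: dict[tuple[str, ...], Counter],
                         context: tuple[str, ...],
                         word: str,
                         alpha: float = 0.4) -> float:
    """Relative frequency of word after the longest matching context,
    multiplied by alpha once for every word dropped from the context."""
    penalty = 1.0
    context = tuple(context)
    while True:
        followers = table.get(context)
        if followers and followers[word] > 0:
            return penalty * followers[word] / sum(followers.values())
        if not context:
            return 0.0  # word never seen, even as a unigram
        context = context[1:]  # drop the oldest word and back off
        penalty *= alpha


def rank_candidates(table: dict[tuple[str, ...], Counter],
                    context: tuple[str, ...],
                    alpha: float = 0.4,
                    top_k: int = 3) -> list[str]:
    """Score every word that follows any suffix of context; return the best top_k."""
    context = tuple(context)
    candidates = set()
    for start in range(len(context) + 1):
        candidates.update(table.get(context[start:], {}))
    return sorted(candidates,
                  key=lambda w: (-stupid_backoff_score(table, context, w, alpha), w))[:top_k]
```

Ties are broken alphabetically so the ranking is deterministic.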