Problem & Goal
- Predict the next word given a short phrase
- Useful for smart keyboards and text completion
- Goal: build a fast and simple prediction product
Data
- English text from:
  - News articles
  - Blogs
  - Twitter posts
- Preprocessing:
  - Lowercasing
  - Removing punctuation and URLs
  - Tokenization
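
The preprocessing steps listed above can be sketched as follows. This is a minimal illustration in Python, not the app's actual code: the function name `preprocess`, the regular expressions, the choice to keep apostrophes, and whitespace-based tokenization are all assumptions made for the example.

```python
import re

# Assumed patterns (not taken from the original project).
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Anything that is not a lowercase letter, digit, apostrophe, or whitespace.
NON_WORD_RE = re.compile(r"[^a-z0-9'\s]")


def preprocess(text):
    """Lowercase, strip URLs and punctuation, then split into tokens."""
    text = text.lower()
    # Remove URLs before punctuation so no URL fragments are left behind.
    text = URL_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    # Whitespace tokenization: a simple stand-in for a real tokenizer.
    return text.split()


tokens = preprocess("Check THIS out: https://example.com/a?b=1 It's great!!")
print(tokens)
```

Because the pattern keeps only ASCII letters and digits, accented characters are dropped as well; that is a simplification that assumes English-only input.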
Prediction Algorithm
- N-gram language model (2-gram, 3-gram, 4-gram)
- Backoff strategy:
  - 4-gram → 3-gram → 2-gram → unigram
- Frequency-based lookup for fast prediction
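
The backoff lookup described above might look like the sketch below. It is written in Python for illustration only; `build_model`, `predict`, and the dictionary-of-counters layout are hypothetical names and design choices, not the app's implementation. Each context of zero to three preceding words maps to a frequency table of the words that followed it, and prediction tries the longest context first.

```python
from collections import Counter, defaultdict


def build_model(sentences, max_n=4):
    """Map each context (a tuple of 0..max_n-1 words) to next-word counts."""
    model = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for k in range(max_n):
                if i - k < 0:
                    break  # not enough preceding words for this context size
                context = tuple(tokens[i - k:i])  # k == 0 gives (), the unigram table
                model[context][word] += 1
    return model


def predict(model, phrase_tokens, max_n=4):
    """Back off from the longest context (4-gram) down to unigram counts."""
    n = len(phrase_tokens)
    for k in range(max_n - 1, -1, -1):
        if k > n:
            continue  # the phrase is shorter than this context size
        context = tuple(phrase_tokens[n - k:])
        candidates = model.get(context)
        if candidates:
            # Frequency-based lookup: the most frequent follower wins.
            return candidates.most_common(1)[0][0]
    return None  # only reached when the model is empty


# Toy corpus, invented for the example.
corpus = [
    "i love new york".split(),
    "i love new york".split(),
    "i love new shoes".split(),
    "the new car".split(),
]
model = build_model(corpus)
print(predict(model, ["i", "love", "new"]))   # matched at the 4-gram level
print(predict(model, ["a", "shiny", "new"]))  # backs off to the 2-gram level
print(predict(model, ["zzz"]))                # falls through to unigram counts
```

Storing the unigram table under the empty context keeps the lookup loop uniform: the last backoff step is just another dictionary access.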
Shiny Application
- Text input for entering a phrase
- Submit button to trigger prediction
- Outputs a single predicted word
- Designed for fast response time
User Experience & Value
- Simple and intuitive interface
- Always returns a prediction (the unigram fallback covers unseen phrases)
- Lightweight and fast
- Future improvements:
  - Multiple predictions
  - Profanity filtering
  - Smoothing techniques
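
As a rough indication of what the future improvements could involve, the Python sketch below returns several candidates, skips words on a blocklist, and computes an add-one (Laplace) smoothed probability. The names `top_k` and `laplace_prob`, the blocklist contents, and the choice of add-one smoothing are assumptions for illustration; the source does not say which smoothing technique is planned.

```python
from collections import Counter


def top_k(candidates, k=3, blocklist=frozenset()):
    """Return up to k most frequent words, skipping any blocklisted ones."""
    allowed = [w for w, _ in candidates.most_common() if w not in blocklist]
    return allowed[:k]


def laplace_prob(candidates, word, vocab_size):
    """Add-one smoothed estimate: (count + 1) / (total + vocabulary size)."""
    total = sum(candidates.values())
    return (candidates[word] + 1) / (total + vocab_size)


# Hypothetical next-word counts for a single context.
counts = Counter({"york": 3, "shoes": 2, "car": 1})
print(top_k(counts, k=2))
print(top_k(counts, k=2, blocklist={"shoes"}))
print(laplace_prob(counts, "york", vocab_size=4))
print(laplace_prob(counts, "bike", vocab_size=4))  # unseen word still gets mass
```

Smoothing gives unseen words a small nonzero probability, which matters once the app ranks several candidates rather than returning only the single most frequent one.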