Rohit Padebettu
26-December-2016
A Coursera-SwiftKey Final Capstone Project
GOAL
DATASET
MODELING CHALLENGES
stringi, ngram, tm, quanteda to build prediction model
DESIGN CHALLENGES
Algorithm
In-sample Accuracy
| Ngram | Accuracy |
|---|---|
| Bi-gram | 0.157 |
| Tri-gram | 0.350 |
| Quad-gram | 0.610 |
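
The slides do not spell out how the accuracy figures were computed. As a minimal sketch, assuming that "in-sample accuracy" means the fraction of n-grams in the training text whose final word is the model's top prediction, the measurement could look like the following. The function names and the toy sentence are hypothetical, and this Python illustration is not the author's R implementation.

```python
from collections import Counter, defaultdict


def build_ngram_model(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model


def predict_next(model, context):
    """Return the most frequent continuation of the context, or None if unseen."""
    followers = model.get(tuple(context))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def in_sample_accuracy(tokens, n):
    """Share of training n-grams whose last word matches the top prediction."""
    model = build_ngram_model(tokens, n)
    total = len(tokens) - n + 1
    if total <= 0:
        return 0.0
    hits = sum(
        predict_next(model, tokens[i:i + n - 1]) == tokens[i + n - 1]
        for i in range(total)
    )
    return hits / total


# Hypothetical toy corpus, used only to exercise the functions.
toy_tokens = "the cat sat on the mat and the cat ran".split()
bigram_model = build_ngram_model(toy_tokens, 2)
top_after_the = predict_next(bigram_model, ["the"])
quad_accuracy = in_sample_accuracy(toy_tokens, 4)
```

Longer contexts are more specific, so accuracy measured on the training text itself tends to rise with n, as it does in the table above. That does not by itself show better accuracy on unseen text.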
Data Description
| File | Size (MB) | Length (lines) | Words |
|---|---|---|---|
| en_US.blogs.txt | 200.4242 | 899288 | 37334131 |
| en_US.news.txt | 196.2775 | 1010242 | 34372530 |
| en_US.twitter.txt | 159.3641 | 2360148 | 30373583 |
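
Summary statistics like those in the table can be gathered in a single pass over each file. The sketch below assumes that size is reported in units of 1024² bytes and that a word is any whitespace-delimited token. The slides state neither, so counts from a different tokenizer may not match the table exactly. The function name `describe_text_file` is hypothetical.

```python
import os


def describe_text_file(path):
    """Return size in MB, line count, and whitespace-delimited word count."""
    # Assumption: 1 MB = 1024 * 1024 bytes.
    size_mb = os.path.getsize(path) / (1024 ** 2)
    lines = 0
    words = 0
    # Stream the file line by line so large corpora need not fit in memory.
    with open(path, encoding="utf-8", errors="ignore") as handle:
        for line in handle:
            lines += 1
            words += len(line.split())
    return {
        "file": os.path.basename(path),
        "size_mb": round(size_mb, 4),
        "lines": lines,
        "words": words,
    }
```

Calling `describe_text_file("en_US.blogs.txt")` on a local copy of the corpus would return one row's worth of values in this form.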
Smarter & Smaller dictionaries
Faster Search & Match logic
User Adaptive Predictions
More Useful Predictions
Contextual Predictions
Environment driven Speed vs Accuracy tradeoff