Scott Jacobs
12/31/16
I've built a proof-of-concept text prediction engine based on a large data set from SwiftKey. I've deployed it in a web app to show you how easy and useful it can be to prototype data products.
Key Takeaways:
Firstly, any natural language processing project needs some amount of data preparation. For this project, a specific routine was created to meet the project's objectives. It is reusable and can be adapted for future projects.
Secondly, using tidytext and tidyverse we can build robust pipelines for processing clean text into n-gram frequency tables.
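To make the idea of an n-gram frequency table concrete, here is a minimal sketch. It is written in plain Python rather than tidytext, so it is a stand-in for the idea and not the project's actual pipeline. The function name `ngram_counts` and the toy sentences are assumptions made up for illustration.

```python
from collections import Counter

def ngram_counts(lines, n):
    """Count every run of n consecutive words across already-cleaned lines."""
    counts = Counter()
    for line in lines:
        words = line.split()
        # Slide a window of width n over the words of each line.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

# Toy cleaned text, invented for illustration (not the SwiftKey data).
clean_lines = ["i love new york", "i love new shoes", "new york is big"]
bigrams = ngram_counts(clean_lines, 2)
trigrams = ngram_counts(clean_lines, 3)

# The most frequent entries are the ones a predictor would rely on.
top_bigrams = bigrams.most_common(3)
```

Sorting the table by count, as `most_common` does here, is what turns raw counts into something a predictor can look up quickly.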
To test the model, random samples were drawn from a holdout set and a prediction was generated for each one.
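The sampling and prediction step might look roughly like the sketch below. It assumes the simplest possible model, one that predicts the most frequent follower of the previous word, which may well differ from the model in the app. All names (`build_lookup`, `predict_next`, `sample_cases`) and the toy sentences are hypothetical.

```python
import random
from collections import Counter, defaultdict

def build_lookup(lines):
    """Map each word to a Counter of the words that follow it in training text."""
    following = defaultdict(Counter)
    for line in lines:
        words = line.split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def predict_next(following, prev_word):
    """Return the most frequent follower of prev_word, or None if it was never seen."""
    if prev_word not in following:
        return None
    return following[prev_word].most_common(1)[0][0]

def sample_cases(holdout_lines, k, seed=0):
    """Draw up to k random (previous word, actual next word) pairs from holdout text."""
    pairs = []
    for line in holdout_lines:
        words = line.split()
        pairs.extend(zip(words, words[1:]))
    return random.Random(seed).sample(pairs, min(k, len(pairs)))

# Toy training and holdout text, invented for illustration.
train = ["i love new york", "i love new shoes", "new york is big"]
holdout = ["we love new york"]

lookup = build_lookup(train)
cases = sample_cases(holdout, 2)
predictions = [(prev, actual, predict_next(lookup, prev)) for prev, actual in cases]
```

Fixing the seed keeps the sample reproducible, which makes it easier to compare runs after changing the model.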
Performance can be evaluated in terms of accuracy as well as speed and storage. While the predictive model does not possess great accuracy (~15%), it is fast and lightweight.
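As a rough illustration of how accuracy and speed could be measured together, the sketch below scores any prediction function against a list of test cases. The `evaluate` helper, the lookup table, and the test cases are all invented for this example; the numbers it produces say nothing about the real model.

```python
import time

def evaluate(predict, cases):
    """Return (accuracy, mean seconds per prediction) over (context, actual) cases."""
    if not cases:
        raise ValueError("need at least one test case")
    correct = 0
    start = time.perf_counter()
    for context, actual in cases:
        # A prediction counts as correct only on an exact match.
        if predict(context) == actual:
            correct += 1
    elapsed = time.perf_counter() - start
    return correct / len(cases), elapsed / len(cases)

# Hypothetical stand-in model: a fixed lookup table, invented for illustration.
toy_table = {"new": "york", "love": "new", "i": "love"}
cases = [("new", "york"), ("love", "you"), ("i", "am"), ("is", "big")]

accuracy, seconds_each = evaluate(toy_table.get, cases)
```

Here one of the four toy cases matches, so `accuracy` is 0.25. Storage, the third dimension mentioned above, would be measured separately, for example by checking the size of the saved frequency tables.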
Most importantly, it met our objectives for this proof of concept.