Angelo Klin
August/2015
Specialisation: Data Science
Course: SwiftKey Capstone Project
Education Institution: Johns Hopkins
Publisher: Coursera
The goal of Coursera's Data Science Specialisation SwiftKey Capstone Project is to expose students to a real-life problem where the overall scope is known but little more than a source dataset is given.
One of its purposes is to push the student not only to understand the problem but also, in order to find a solution, to search for the best way to approach it, look for alternatives, and even create something customised to solve it.
The original dataset was provided and comes from blogs, news and a Twitter feed.
After an initial cleanup, the number of ngrams produced from each source is shown in the table.
Size | Blogs | News | Twitter |
---|---|---|---|
1 ngram | 145764 | 141822 | 138536 |
2 ngrams | 1766213 | 1828608 | 1335629 |
3 ngrams | 2862731 | 2833636 | 1893462 |
4 ngrams | 1898999 | 1931072 | 1258080 |
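As a rough illustration of how counts like those in the table could be produced, here is a minimal Python sketch. It is not the method used in the project: the cleanup rule (lowercasing and keeping only letters and apostrophes), the choice to count distinct ngrams, and the names `tokenize`, `ngram_counts` and `sample` are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Assumed cleanup: lowercase, keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(lines, n):
    """Count ngrams of size n; ngrams never span two input lines."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical two-line corpus standing in for one of the source files.
sample = ["The cat sat on the mat.", "The cat ran."]

for n in range(1, 5):
    # Number of distinct ngrams of each size in the sample.
    print(n, len(ngram_counts(sample, n)))
```

Running the same function over each source file for sizes 1 to 4 would give one column of a table like the one above, although the exact figures depend heavily on the cleanup rules chosen.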