The goal of this report is to provide a brief exploratory analysis of the SwiftKey training data. We examine the basic features of the three English data files (Blogs, News, and Twitter) in preparation for building a word prediction algorithm.
The following table summarizes the line and word counts for each of the English datasets.
| File_Source | Line_Count | Word_Count |
|---|---|---|
| Blogs | 899,288 | 37,334,131 |
| News | 1,010,242 | 34,372,533 |
| Twitter | 2,360,148 | 30,373,543 |
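
Counts like those in the table can be gathered with a short script. The following is a minimal sketch in Python, not the method used for this report: the file paths in `FILES` are assumed names for the three English files, the helper names are hypothetical, and a "word" is taken to be any whitespace-separated token. A different tokenization rule would give somewhat different word counts, so the output may not match the table exactly.

```python
import os


def count_lines_and_words(path):
    """Return (line_count, word_count) for a UTF-8 text file."""
    line_count = 0
    word_count = 0
    # Stream the file line by line so a large corpus is never fully in memory.
    # newline="\n" makes "\n" the only line terminator; undecodable bytes are
    # replaced rather than raising an error.
    with open(path, encoding="utf-8", errors="replace", newline="\n") as handle:
        for line in handle:
            line_count += 1
            # Assumption: a word is any run of non-whitespace characters.
            word_count += len(line.split())
    return line_count, word_count


def summarize(files):
    """Map each source label to (line_count, word_count), skipping missing files."""
    summary = {}
    for label, path in files.items():
        if os.path.isfile(path):
            summary[label] = count_lines_and_words(path)
    return summary


# Assumed file locations; adjust to wherever the data was unpacked.
FILES = {
    "Blogs": "en_US.blogs.txt",
    "News": "en_US.news.txt",
    "Twitter": "en_US.twitter.txt",
}

if __name__ == "__main__":
    print("| File_Source | Line_Count | Word_Count |")
    print("|---|---|---|")
    for label, (lines, words) in summarize(FILES).items():
        print(f"| {label} | {lines:,} | {words:,} |")
```

Reading the files as a stream keeps memory use flat, which matters here because each file holds tens of millions of words.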