This report presents an exploratory analysis of the SwiftKey training dataset. The data consists of text collected from blogs, news articles, and Twitter posts. The objective is to understand the structure of the datasets and prepare for building a predictive text model.
The datasets analyzed are:
| Dataset | Lines | Words |
|---|---|---|
| Blogs | 1000 | 41890 |
| News | 1000 | 33489 |
| Twitter | 1000 | 12782 |
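
Line and word counts of this kind can be computed by splitting each line on whitespace. The following is a minimal Python sketch, not the code used for this report: it assumes a "word" is any whitespace-delimited token (the report does not state its tokenization rule), and the function name `summarize_lines` and the sample text are hypothetical.

```python
def summarize_lines(lines):
    """Return (line_count, word_count) for an iterable of text lines.

    Assumption: a "word" is any whitespace-delimited token; the report
    does not state which tokenization rule produced its counts.
    """
    line_count = 0
    word_count = 0
    for line in lines:
        line_count += 1
        word_count += len(line.split())
    return line_count, word_count


# Hypothetical sample standing in for one of the datasets.
sample = [
    "The quick brown fox",
    "jumps over the lazy dog",
    "",
]
line_count, word_count = summarize_lines(sample)
print(line_count, word_count)
```

A different tokenizer (for example, one that strips punctuation or splits contractions) would likely give somewhat different word totals, so counts are only comparable when the same rule is applied to all three sources.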
The histogram below illustrates the distribution of blog line lengths.
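
A distribution like this can be tabulated by grouping line lengths into fixed-width bins. The sketch below is illustrative only: it assumes length is measured in characters and uses a bin width of 50, neither of which is stated in the report, and the name `line_length_histogram` and the sample lines are hypothetical.

```python
from collections import Counter


def line_length_histogram(lines, bin_width=50):
    """Count lines per character-length bin.

    Assumptions: length is measured in characters and bins are
    bin_width wide; the report does not state its unit or bin size.
    Returns a dict mapping each bin's lower edge to its line count.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    bins = Counter((len(line) // bin_width) * bin_width for line in lines)
    return dict(sorted(bins.items()))


# Hypothetical sample standing in for the blog lines.
sample = ["short line", "x" * 60, "y" * 75, "z" * 120]
width = 50
hist = line_length_histogram(sample, bin_width=width)

# Print a simple text histogram, one row per non-empty bin.
for lower_edge, count in hist.items():
    upper_edge = lower_edge + width - 1
    print(f"{lower_edge:>4}-{upper_edge:<4} {'#' * count}")
```

The returned dictionary can be passed to any plotting library to draw the bars; the text rendering here only keeps the sketch dependency-free.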
The next phase of the project will focus on:
The exploratory analysis confirms that the SwiftKey datasets provide a strong foundation for building a predictive text application.