1. Introduction

This report explores the SwiftKey dataset to prepare for building a next-word prediction model for my MSc Business Analytics project.

2. Basic Statistics

The table below gives the number of lines in each of the three original dataset files (blogs, news, and Twitter).

##   File_Source Line_Counts
## 1       Blogs      899288
## 2        News     1010242
## 3     Twitter     2360148
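
The report does not show the code behind this table, so the following is only a minimal sketch of how such counts could be produced in R. The file names in `files` and the helper names `count_lines` and `summarise_line_counts` are assumptions made for illustration, not taken from the original analysis.

```r
# Count the lines in one text file.
# The connection is opened in binary mode ("rb") so that embedded control
# characters are less likely to cut the read short on some platforms.
count_lines <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  length(readLines(con, warn = FALSE, skipNul = TRUE))
}

# Build a summary table with one row per source file.
# `paths` is a named character vector: names are the labels, values the file paths.
summarise_line_counts <- function(paths) {
  data.frame(
    File_Source = names(paths),
    Line_Counts = vapply(paths, count_lines, integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hypothetical file locations (assumption); adjust to where the data was unpacked.
files <- c(
  Blogs   = "en_US.blogs.txt",
  News    = "en_US.news.txt",
  Twitter = "en_US.twitter.txt"
)

# Only run the summary when the files are actually present.
if (all(file.exists(files))) {
  print(summarise_line_counts(files))
}
```

Reading each file in full is the simplest approach, but it holds the whole file in memory. For larger corpora, counting in chunks would be a reasonable alternative.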