Introduction

This is an initial exploratory analysis of the text datasets provided for the SwiftKey Capstone Project. The datasets contain English-language text from blogs, news articles, and Twitter posts. The goal of the project is to build a text prediction algorithm and deploy it as a Shiny app.

Summary Statistics

##   Dataset Line_Count Word_Count
## 1   Blogs     899288   37546250
## 2    News    1010242   34762395
## 3 Twitter    2360148   30093413
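
Counts like the ones above could be produced along the following lines. This is a minimal sketch, not the code used for this report: the helper name `count_lines_words` is hypothetical, and it assumes a word is any whitespace-separated token. A different tokenization rule would give somewhat different word counts. The demo runs on a small temporary file rather than the real datasets.

```r
# Hypothetical helper (not from the original analysis): count the lines and
# whitespace-separated words in a text file.
count_lines_words <- function(path) {
  # skipNul guards against embedded NUL bytes, which can occur in scraped text
  lines <- readLines(path, encoding = "UTF-8", skipNul = TRUE, warn = FALSE)
  # Assumption: a "word" is any non-empty token between runs of whitespace
  tokens <- strsplit(trimws(lines), "\\s+")
  words <- sum(vapply(tokens, function(x) sum(nzchar(x)), integer(1)))
  c(Line_Count = length(lines), Word_Count = words)
}

# Demo on a small temporary file standing in for one of the datasets
demo_path <- tempfile(fileext = ".txt")
writeLines(c("the quick brown fox", "jumps over", "the lazy dog"), demo_path)
demo_counts <- count_lines_words(demo_path)
print(demo_counts)
```

Applying the same helper to each of the three dataset files and binding the results into a data frame, with a Dataset column added, would give a table in the shape shown above.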