Executive Summary

This report summarizes an exploratory analysis of the SwiftKey English text datasets (blogs, news, and Twitter). It focuses on line counts, word counts, and line lengths, and highlights how the three sources differ. These findings will guide the development of a text prediction algorithm and an accompanying Shiny app.


Basic Summaries

Number of lines per dataset

Dataset   Lines
Blogs      1000
News       1000
Twitter    1000

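Line counts like these can be obtained by streaming each file instead of loading it into memory, which matters for the full-size corpora. The following is a minimal Python sketch, assuming UTF-8 text files with one entry per line; `count_lines` and `line_counts` are hypothetical helpers, not part of any package.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count lines in a UTF-8 text file without reading it all into memory."""
    # errors="replace" keeps stray invalid bytes from aborting the count
    with path.open(encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def line_counts(paths: dict[str, Path]) -> dict[str, int]:
    """Map each dataset name to the number of lines in its file."""
    return {name: count_lines(path) for name, path in paths.items()}
```

For example, `line_counts({"Blogs": Path("en_US.blogs.txt"), "News": Path("en_US.news.txt"), "Twitter": Path("en_US.twitter.txt")})` returns one count per dataset; those file paths are assumptions about the local download layout.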
Maximum characters per line in each dataset

Dataset   Longest line (chars)
Blogs                     1912
News                       982
Twitter                    140
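The longest-line figures, together with the word counts mentioned in the summary, can be collected in a single pass over each file. The following minimal Python sketch assumes that a "character" is a Unicode code point (Python's `len` on `str`), which gives smaller numbers than byte lengths for non-ASCII text. It also assumes that "words" are whitespace-delimited tokens, a naive stand-in for a real tokenizer. `LineStats` and `summarize_lines` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LineStats:
    lines: int      # number of lines seen
    words: int      # whitespace-delimited tokens across all lines
    max_chars: int  # length of the longest line, in Unicode code points


def summarize_lines(lines: Iterable[str]) -> LineStats:
    """Single pass over the lines, so it also works on an open file handle."""
    n_lines = n_words = max_chars = 0
    for line in lines:
        text = line.rstrip("\r\n")  # drop the line terminator so it is not counted
        n_lines += 1
        n_words += len(text.split())  # naive whitespace split, not a real tokenizer
        max_chars = max(max_chars, len(text))
    return LineStats(lines=n_lines, words=n_words, max_chars=max_chars)
```

Because the function accepts any iterable of strings, passing an open file handle yields all three figures for a dataset without holding the file in memory.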

Histograms
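A line-length histogram groups lines into fixed-width character bins and counts how many lines fall in each bin. The following minimal sketch uses plain Python and assumes 50-character bins, which is an arbitrary choice. It draws a text rendering in place of a plotting library. `length_histogram` and `render_histogram` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable


def length_histogram(lines: Iterable[str], bin_width: int = 50) -> dict[int, int]:
    """Count lines per fixed-width length bin; keys are each bin's lower edge."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter()
    for line in lines:
        length = len(line.rstrip("\r\n"))  # characters, excluding the terminator
        counts[(length // bin_width) * bin_width] += 1
    # Only non-empty bins are returned, sorted by lower edge
    return dict(sorted(counts.items()))


def render_histogram(hist: dict[int, int], bin_width: int = 50, width: int = 40) -> str:
    """Draw one text bar per non-empty bin, scaled so the tallest bar is `width` wide."""
    if not hist:
        return ""
    peak = max(hist.values())
    rows = []
    for low, count in hist.items():
        bar = "#" * max(1, round(count / peak * width))  # keep small bins visible
        rows.append(f"{low:>5}-{low + bin_width - 1:<5} {bar} {count}")
    return "\n".join(rows)
```

With fixed-width bins, a long-tailed source such as the blogs produces many sparsely populated bins at the high end. The bin width is therefore worth tuning per dataset.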

Key Observations

Plans for Prediction Algorithm and Shiny App