Introduction

This milestone report presents a basic exploratory data analysis of the training datasets for the capstone project.

Data Summary

The table below summarizes the number of lines and words in each of the three training datasets (Blogs, News, and Twitter).

# Summary table for a non-data-scientist manager:
# line and word counts per training dataset (entered as literal values)
Dataset <- c("Blogs", "News", "Twitter")
Line_Count <- c(899288, 1010242, 2360148)
Word_Count <- c(37334131, 34372589, 30373583)

summary_table <- data.frame(Dataset, Line_Count, Word_Count)
print(summary_table)
##   Dataset Line_Count Word_Count
## 1   Blogs     899288   37334131
## 2    News    1010242   34372589
## 3 Twitter    2360148   30373583
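The counts above are entered as literal values. The following is a minimal sketch of how counts like these could be computed directly from the raw text files. It makes several assumptions: each dataset is a plain-text file with one entry per line, a "word" is any run of non-whitespace characters, and the helper names `count_lines_words` and `build_summary_table` are hypothetical. Note that the sketch reads each whole file into memory, which is workable but slow for the larger files.

```r
# Hypothetical helper: count lines and whitespace-separated words in one text file.
count_lines_words <- function(path) {
  # Read every line; skipNul and warn = FALSE tolerate stray NULs and a missing final newline
  lines <- readLines(path, encoding = "UTF-8", skipNul = TRUE, warn = FALSE)
  # Assumption: a "word" is a run of non-whitespace characters.
  # useBytes = TRUE avoids errors on lines with invalid multibyte characters.
  tokens <- unlist(strsplit(lines, "\\s+", useBytes = TRUE))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings produced by leading whitespace
  c(Line_Count = length(lines), Word_Count = length(tokens))
}

# Hypothetical helper: build a summary table like the one above from a named vector of paths.
build_summary_table <- function(paths) {
  counts <- vapply(paths, count_lines_words,
                   FUN.VALUE = c(Line_Count = 0, Word_Count = 0))
  counts <- t(counts)  # one row per dataset
  data.frame(Dataset    = names(paths),
             Line_Count = counts[, "Line_Count"],
             Word_Count = counts[, "Word_Count"],
             row.names  = NULL)
}

# Example usage (file names are hypothetical; adjust to where the data lives):
# paths <- c(Blogs = "en_US.blogs.txt", News = "en_US.news.txt", Twitter = "en_US.twitter.txt")
# print(build_summary_table(paths))
```

Word counts produced this way may differ somewhat from the table above. The result depends on how a "word" is defined: this sketch splits on whitespace, which is similar to, but not guaranteed to match, a tool such as Unix `wc -w`.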