This report presents a basic exploratory data analysis of the milestone datasets for the capstone project.
Below is a summary of the line and word counts for the three training datasets (Blogs, News, and Twitter).
```r
# Sample Summary Table for Non-Data Scientist Manager
Dataset <- c("Blogs", "News", "Twitter")
Line_Count <- c(899288, 1010242, 2360148)
Word_Count <- c(37334131, 34372589, 30373583)
summary_table <- data.frame(Dataset, Line_Count, Word_Count)
print(summary_table)
##   Dataset Line_Count Word_Count
## 1   Blogs     899288   37334131
## 2    News    1010242   34372589
## 3 Twitter    2360148   30373583
```
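The table above hard-codes the counts without showing how they were computed. A minimal sketch of one way to get such counts follows. It assumes that a "word" is simply a whitespace-separated token. The helper `count_lines_words()` and the corpus file names in the commented lines are hypothetical. Because the tokenization behind the original figures is not stated, counts from this sketch may differ somewhat from the table.

```r
# Hypothetical helper: count lines and whitespace-separated words in a text file.
count_lines_words <- function(path) {
  # Read the whole file; skipNul guards against stray NUL bytes in scraped text
  lines <- readLines(path, encoding = "UTF-8", skipNul = TRUE, warn = FALSE)
  # Trim leading/trailing whitespace so it does not produce empty tokens;
  # an empty line splits to character(0) and contributes 0 words
  tokens <- strsplit(trimws(lines), "\\s+")
  c(lines = length(lines), words = sum(lengths(tokens)))
}

# Demo on a small temporary file (not the real corpus)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Hello world", "  three little words  ", ""), tmp)
res <- count_lines_words(tmp)
print(res)  # expected: lines = 3, words = 5

# Applying it to the corpus (hypothetical file names; adjust to your local paths):
# files <- c(Blogs = "en_US.blogs.txt", News = "en_US.news.txt", Twitter = "en_US.twitter.txt")
# counts <- t(sapply(files, count_lines_words))
```

Reading each file fully into memory keeps the sketch simple. For very large files, a chunked read with `readLines(con, n = ...)` on an open connection would use less memory.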