Week 2: Exploratory Data Analysis

Milestone Summary

The goal of this task was to familiarize myself with the contents of the three datasets provided (Twitter, blogs and news). I loaded each file into R and first ran summary statistics to get a general sense of its size and shape.
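As an illustration of what such a size-and-shape summary could look like, here is a minimal base R sketch. The helper name `summarise_corpus` and the tiny in-memory sample are hypothetical and are not the code or data used for this report. In practice each corpus would be read from its file, for example with `readLines()`.

```r
# Hypothetical helper: summarise the size and shape of one text corpus,
# given as a character vector with one element per line.
summarise_corpus <- function(lines) {
  trimmed <- trimws(lines)
  # Split each line on runs of whitespace and count the tokens.
  tokens_per_line <- lengths(strsplit(trimmed, "\\s+"))
  # Blank lines contribute no words.
  tokens_per_line[!nzchar(trimmed)] <- 0L
  data.frame(
    lines        = length(lines),
    words        = sum(tokens_per_line),
    characters   = sum(nchar(lines)),
    longest_line = max(nchar(lines))
  )
}

# Assumption: a tiny stand-in sample, not the real datasets. A real run
# might load each file with something like
#   readLines(path, encoding = "UTF-8", skipNul = TRUE)
# where `path` is wherever the corpus file is stored.
sample_corpora <- list(
  twitter = c("so excited for the weekend", "cant wait to see you"),
  blogs   = c("Today I want to write about the garden.", "",
              "It was a long winter."),
  news    = c("The council voted on Tuesday to approve the budget.")
)

# One row per corpus, with the list names as row names.
size_summary <- do.call(rbind, lapply(sample_corpora, summarise_corpus))
print(size_summary)
```

Putting line, word and character counts side by side makes it easy to compare the three sources before any cleaning is applied.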

I then cleaned the data and analyzed aggregate word counts, 2-gram counts and 3-gram counts in each dataset.
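The cleaning and counting step can be sketched in base R as follows. This is only one possible approach under stated assumptions: text is lower-cased, anything other than the letters a to z, apostrophes and spaces is dropped, and n-grams never span two lines. The function names `clean_tokens`, `line_ngrams` and `count_ngrams` are hypothetical, and the actual cleaning rules used for this report may differ.

```r
# Hypothetical cleaning step: lower-case the line, replace every run of
# characters other than a-z, apostrophe or space with a single space,
# then split the result into word tokens.
clean_tokens <- function(line) {
  line <- tolower(line)
  line <- gsub("[^a-z' ]+", " ", line)
  tokens <- strsplit(trimws(line), " +")[[1]]
  tokens[nzchar(tokens)]
}

# Build the n-grams of a single token vector.
line_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count n-grams across a whole corpus, most frequent first.
count_ngrams <- function(lines, n) {
  grams <- unlist(lapply(lines, function(l) line_ngrams(clean_tokens(l), n)))
  sort(table(grams), decreasing = TRUE)
}

# Assumption: a three-line stand-in sample, not the real datasets.
sample_lines <- c("I love the sea.", "I love the hills!", "We love the sea")

unigrams <- count_ngrams(sample_lines, 1)
bigrams  <- count_ngrams(sample_lines, 2)
trigrams <- count_ngrams(sample_lines, 3)

print(head(bigrams, 3))
```

The same `count_ngrams()` call covers word counts (n = 1), 2-grams and 3-grams, so the three analyses stay consistent across the Twitter, blogs and news data.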

Twitter Word Analysis

Blogs Word Analysis

News Word Analysis

Twitter N-gram Analysis

Blogs N-gram Analysis

News N-gram Analysis

All N-grams

Assessment of the Source Data

Reviewing the outcomes of the EDA process revealed some consistent themes:

Items for Consideration