Download Dataset

  • Download the zip file from the link given in the assignment brief.
  • Unzip the file. The folder en_US contains three files: en_US.blogs.txt, en_US.news.txt, and en_US.twitter.txt.

download.file("https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip", destfile = "Coursera-SwiftKey.zip", method = "auto", quiet = FALSE)

Before starting the primary data exploration, load the required libraries.

Build Corpus

A 10% sample of each file is used to build the corpus.
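The sampling step could look roughly like the sketch below. The report's own code is not shown here, so the helper name `sample_lines`, the seed, and the choice of a random (rather than leading) 10% of lines are all assumptions for illustration. Only base R is used.

```r
# Hypothetical helper (not from the original report): keep a random
# fraction of the lines of a text file.
sample_lines <- function(path, fraction = 0.10, seed = 1234) {
  # Binary mode is an assumption: text-mode reads can stop early on some
  # platforms when a file contains stray control characters.
  con <- file(path, open = "rb")
  on.exit(close(con))
  lines <- readLines(con, encoding = "UTF-8", skipNul = TRUE)

  set.seed(seed)                               # reproducible sample
  n_keep <- round(length(lines) * fraction)    # number of lines to keep
  keep <- sample.int(length(lines), size = n_keep)
  lines[sort(keep)]                            # preserve original order
}

# Self-contained demonstration on a temporary 200-line file.
demo_path <- tempfile(fileext = ".txt")
writeLines(paste("line", 1:200), demo_path)
demo_sample <- sample_lines(demo_path)
length(demo_sample)
```

On the real data the same call would be made once per file, for example `sample_lines("en_US/en_US.blogs.txt")`, and the three samples combined into the corpus.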

No. of lines    No. of words
89928           3800858
7725            269492
236014          3019330
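Line and word counts like those above can be computed with base R alone. The following is a minimal sketch, not the report's actual code: the function name `count_lines_words` is hypothetical, and a "word" is assumed to be any whitespace-separated token, which may differ from the definition used for the table.

```r
# Hypothetical helper: count lines and whitespace-separated words
# in a character vector of text lines.
count_lines_words <- function(lines) {
  tokens <- strsplit(trimws(lines), "\\s+")
  # Count non-empty tokens per line (an empty line contributes 0).
  words_per_line <- vapply(tokens, function(tok) sum(nzchar(tok)), integer(1))
  c(lines = length(lines), words = sum(words_per_line))
}

# Toy input standing in for a sampled file.
toy_lines <- c("the quick brown fox", "jumps over", "the lazy dog")
toy_counts <- count_lines_words(toy_lines)
toy_counts
```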

Data Analysis

1-gram

2-gram

3-gram
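The 1-, 2-, and 3-gram frequencies in this section can be sketched with a single base R function. This is an illustrative assumption, not the method used in the report (which may rely on a text-mining package): the name `ngram_counts` is hypothetical, text is lower-cased, and anything other than letters and apostrophes is treated as a separator.

```r
# Hypothetical helper: frequency table of n-grams over a vector of lines.
ngram_counts <- function(lines, n = 1) {
  grams <- unlist(lapply(lines, function(line) {
    # Assumed tokenisation: lower-case, split on non-letter characters.
    words <- strsplit(tolower(line), "[^a-z']+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    # Join each window of n consecutive words into one n-gram string.
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  sort(table(grams), decreasing = TRUE)   # most frequent first
}

# Toy corpus standing in for the sampled text.
toy_text <- c("the cat sat on the mat", "the cat ate the fish")
unigrams <- ngram_counts(toy_text, 1)
bigrams  <- ngram_counts(toy_text, 2)
trigrams <- ngram_counts(toy_text, 3)
head(unigrams, 3)
head(bigrams, 3)
```

N-grams are built within each line only, so no n-gram spans two lines. The resulting tables can be passed to `barplot()` to draw the most frequent terms.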