Synopsis

This report gives a short overview of the exploratory analysis of the text data for the Capstone project.

The goals of this report are to:

1. Demonstrate that the data has been downloaded and successfully loaded.
2. Create a basic report of summary statistics about the data sets.
3. Report any interesting findings so far.
4. Get feedback on the plans for creating a prediction algorithm and Shiny app.

Data Loading and Analysis

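# Download the SwiftKey archive only if it is not already present
path = 'https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip'
if (!file.exists('swiftkey.zip')) {
  download.file(path, method = 'curl', destfile = 'swiftkey.zip')
}
# Unzip into ./final/ (skipped if it has already been extracted)
if (!dir.exists('final')) unzip('swiftkey.zip')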

Basic Summary

The English (en_US) data consists of three sources: blogs, Twitter, and news.

library(stringr)

# suppressWarnings() hides readLines() warnings about embedded nul
# characters and incomplete final lines
twitter = suppressWarnings(readLines('final/en_US/en_US.twitter.txt'))
news = suppressWarnings(readLines('final/en_US/en_US.news.txt'))
blog = suppressWarnings(readLines('final/en_US/en_US.blogs.txt'))

# Number of lines in each file
ltwitter = length(twitter)
lnews = length(news)
lblog = length(blog)

# str_count() with its default empty pattern counts characters, not words
ctwitter = sum(str_count(twitter))
cnews = sum(str_count(news))
cblog = sum(str_count(blog))

file_summary = data.frame(file.name = c("en_US.blogs.txt", "en_US.twitter.txt", "en_US.news.txt"),
                          lines.count = c(lblog, ltwitter, lnews),
                          char.count = c(cblog, ctwitter, cnews))

file_summary
##           file.name lines.count char.count
## 1   en_US.blogs.txt      899288  206824257
## 2 en_US.twitter.txt     2360148  162095755
## 3    en_US.news.txt     1010242  203223153
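`str_count()` called without a pattern counts characters rather than words, so a word count needs an explicit pattern. A minimal sketch, assuming a "word" is simply a run of non-whitespace characters (`\\S+`); the helper name `file_stats` is hypothetical:

```r
library(stringr)

# Hypothetical helper: basic statistics for a character vector of lines.
# A "word" is assumed to be any run of non-whitespace characters.
file_stats = function(lines) {
  words_per_line = str_count(lines, "\\S+")
  data.frame(lines.count = length(lines),
             word.count = sum(words_per_line),
             char.count = sum(str_length(lines)),   # characters (code points)
             mean.words.per.line = mean(words_per_line))
}

# Usage on the loaded files (slow on the full data):
# rbind(blogs = file_stats(blog), twitter = file_stats(twitter), news = file_stats(news))

file_stats(c("Hello world", "  two   spaces here ", ""))
```

Splitting on whitespace is a rough definition: it treats "don't" as one word and "U.S." as one word, which is usually good enough for summary statistics.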

Fraction of Data

Because the files are large, I use a random 1% sample of the lines from each file for the rest of the analysis.

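set.seed(100)
# Draw 1% of the lines from each source without replacement
# (floor() makes the sample size an explicit integer)
blogs_sample = sample(blog, floor(0.01 * length(blog)), replace = FALSE)
news_sample = sample(news, floor(0.01 * length(news)), replace = FALSE)
twitter_sample = sample(twitter, floor(0.01 * length(twitter)), replace = FALSE)
sampled_data = c(blogs_sample, news_sample, twitter_sample)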

Build Corpus

The data must be cleaned first: the text is converted to lower case, and punctuation, numbers, English stopwords, and extra whitespace are removed.
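To see what each cleaning step does, the sketch below applies tm's cleaning functions to one made-up sentence (these functions also accept plain character vectors). Stopwords are removed before whitespace is stripped, because `removeWords()` deletes a word but leaves the spaces around it.

```r
library(tm)

# A made-up example sentence
x = "The 3 cats, and 2 dogs, ran home!"

x = tolower(x)                             # "the 3 cats, and 2 dogs, ran home!"
x = removePunctuation(x)                   # "the 3 cats and 2 dogs ran home"
x = removeNumbers(x)                       # "the  cats and  dogs ran home"
x = removeWords(x, stopwords("english"))   # "  cats   dogs ran home"
x = stripWhitespace(x)                     # " cats dogs ran home"
x = trimws(x)                              # "cats dogs ran home"
x
```

The leftover leading space is harmless for term counting; `trimws()` only tidies the printed result.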

library(magrittr)      # provides the %>% pipe
library(wordcloud)
## Loading required package: RColorBrewer
library(RColorBrewer)
library(NLP)
library(SnowballC)
library(tm)

# Convert to ASCII; characters that cannot be converted are dropped
# (without sub = '', the whole line would become NA)
sample_ascii = iconv(sampled_data, 'UTF-8', 'ASCII', sub = '')
# One document per sampled line
corpus = Corpus(VectorSource(sample_ascii))
corpus = corpus %>%
        tm_map(content_transformer(tolower)) %>%
        tm_map(removePunctuation) %>%
        tm_map(removeNumbers) %>%
        tm_map(removeWords, stopwords("english")) %>%
        tm_map(stripWhitespace)
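The plotting code is not shown in this report, but the `wordcloud(dm$word, dm$freq, ...)` call in the Plots section implies a data frame `dm` with columns `word` and `freq`. A minimal sketch of how such a table can be built from the cleaned corpus with a `TermDocumentMatrix`, assuming `dm` holds the total count of each term; the helper name `term_freq` is hypothetical:

```r
library(tm)

# Hypothetical helper: total frequency of each term across all documents,
# sorted from most to least frequent.
term_freq = function(corpus) {
  tdm = TermDocumentMatrix(corpus)
  # slam::row_sums() sums the sparse matrix directly, avoiding a huge
  # dense matrix from as.matrix(tdm)
  freq = sort(slam::row_sums(tdm), decreasing = TRUE)
  data.frame(word = names(freq), freq = as.vector(freq), stringsAsFactors = FALSE)
}

# Usage on the cleaned corpus:
# dm = term_freq(corpus)
# head(dm, 10)

# Small self-contained check on a toy corpus
toy = VCorpus(VectorSource(c("cats dogs cats", "dogs cats birds")))
term_freq(toy)
```

Note that `TermDocumentMatrix()` drops terms shorter than three characters by default (`wordLengths = c(3, Inf)`), so very short words will not appear in `dm`.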

Plots

The word cloud shows the terms that occur at least 100 times in the cleaned sample.

## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : just could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : also could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : dont could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : media could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : back could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : check could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : welcome could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : going could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : understand could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : city could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : amazing could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : move could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : world could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : town could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : matter could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : afternoon could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : child could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : super could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : building could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : saturday could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : tickets could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : time could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : things could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : wouldnt could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : john could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : coach could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : spring could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : children could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : street could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : began could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : didnt could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : people could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : according could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : youre could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : program could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : excited could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : think could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : will could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : week could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : come could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : recently could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : wish could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : plans could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : moving could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : inside could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : history could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : two could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : call could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : working could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : used could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : shows could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : wanted could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : teams could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : please could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : side could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : education could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : yearold could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : perfect could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : wasnt could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : conference could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : even could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : long could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : head could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : pictures could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : able could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : awesome could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : friend could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : like could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : haha could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : team could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : strong could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : often could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : community could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : serious could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : life could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : miss could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : write could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : makes could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : event could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : door could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : away could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : sure could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : although could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : wont could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : really could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : ones could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : money could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : nice could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : try could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : water could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : added could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : days could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : total could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : group could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : line could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : pass could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : nearly could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : find could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : given could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : years could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : sports could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : agree could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : california could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : issues could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : men could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : keep could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : county could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : free could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : director could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : make could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : cleveland could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : loved could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : tweet could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : american could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : told could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : pay could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : students could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : drive could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : much could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : fire could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : show could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : looks could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : take could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : yeah could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : best could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : face could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : give could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : stay could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : instead could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : ever could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : special could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : series could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : love could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : half could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : states could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : always could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : reading could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : must could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : may could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : came could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : open could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : sometimes could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : eyes could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : good could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : okay could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : times could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : tonight could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : sense could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : talk could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : win could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : never could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : within could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : cute could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : starting could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : point could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : year could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : first could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : shes could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : follow could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : man could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : nothing could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : hit could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : jersey could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : hate could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : order could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : known could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : help could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : buy could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : become could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : former could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : lot could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : watch could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : major could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : wednesday could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : case could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : political could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : percent could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : thought could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : tried could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : say could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : actually could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : plan could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : well could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : knew could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : night could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : lots could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : wife could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : watching could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : available could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : plus could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : guy could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : death could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : theyre could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : sunday could not be fit on page. It will not be plotted.
## Warning in wordcloud(dm$word, dm$freq, min.freq = 100, random.order =
## TRUE, : couldnt could not be fit on page. It will not be plotted.

N-gram

library(ggplot2)
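library(RWeka)

# NGramTokenizer works on a character vector, so first extract the text of each
# document (assumes `corpus` is the tm VCorpus built above). Passing the corpus
# object directly would also tokenize its metadata.
corpus_text <- unlist(lapply(corpus, as.character))
trigram <- NGramTokenizer(corpus_text, Weka_control(min = 3, max = 3))

# Count each distinct trigram and sort by descending frequency
trigram.df <- data.frame(table(trigram))
trigram.df <- trigram.df[order(trigram.df$Freq, decreasing = TRUE), ]

# Plot the 25 most frequent trigrams, ordered by frequency instead of alphabetically
ggplot(trigram.df[1:25, ], aes(x = reorder(trigram, -Freq), y = Freq)) +
  geom_col(fill = "blue") +
  xlab("Trigrams") + ylab("Frequency") +
  ggtitle("25 Most Common Trigrams") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))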
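RWeka needs a working Java installation (through rJava), which can be a hurdle. The counting step can also be sketched in base R. The function below, `count_ngrams()`, is a hypothetical helper and not part of any package; it assumes the input is a character vector of already cleaned, lower-cased lines, and it builds n-grams within each line so that no n-gram spans two documents.

```r
# Hypothetical helper: count n-grams in a character vector using only base R.
# Assumes the text is already lower-cased and cleaned (numbers, punctuation,
# extra whitespace removed), as in the corpus step.
count_ngrams <- function(text, n = 3) {
  # Split each line on runs of whitespace and drop empty tokens
  tokens <- lapply(strsplit(text, "\\s+"), function(w) w[nzchar(w)])
  # Build n-grams line by line so they never cross document boundaries
  grams <- unlist(lapply(tokens, function(w) {
    if (length(w) < n) return(character(0))
    vapply(seq_len(length(w) - n + 1),
           function(i) paste(w[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  # Tabulate and sort by descending frequency
  freq <- sort(table(grams), decreasing = TRUE)
  data.frame(ngram = names(freq), Freq = as.integer(freq),
             stringsAsFactors = FALSE)
}

example <- c("thanks for the follow", "thanks for the support", "one two")
count_ngrams(example, n = 3)
```

Assuming `corpus` is the tm `VCorpus` built earlier, `count_ngrams(unlist(lapply(corpus, as.character)), n = 3)` should give a table comparable to `trigram.df`. The `vapply`/`paste` loop is easy to read but slow, so on larger samples a more vectorized approach would probably be needed.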

Summary

Based on this exploratory analysis, the planned strategy is a frequency lookup built on n-gram models. However, the data needs further cleaning and reduction first, because building the n-grams takes a long time even on a 1% sample. Caching intermediate results should help speed up the process.
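The report does not yet specify how the frequency lookup will work. One possible shape, sketched here under assumptions, is a simple fallback: look up the last two typed words in a trigram table, fall back to the last word in a bigram table, and finally to the most frequent single words. `predict_next()` and the table layout (columns `ngram` and `Freq`, each table sorted by decreasing frequency) are hypothetical. This is a plain fallback, not a smoothed model such as Katz back-off or stupid backoff, which also weight the scores from each level.

```r
# Hypothetical next-word predictor using n-gram frequency lookup with a plain
# fallback: trigrams first, then bigrams, then the most frequent unigrams.
# Each table is assumed to have columns `ngram` (space-separated words) and
# `Freq`, sorted by decreasing frequency.
predict_next <- function(input, trigrams, bigrams, unigrams, k = 3) {
  words <- strsplit(tolower(trimws(input)), "\\s+")[[1]]
  words <- words[nzchar(words)]

  # Return the last word of every n-gram that starts with `context`,
  # in the table's (frequency) order
  lookup <- function(tbl, context) {
    ngrams <- as.character(tbl$ngram)
    hits <- ngrams[startsWith(ngrams, paste0(context, " "))]
    sub(".* ", "", hits)
  }

  candidates <- character(0)
  if (length(words) >= 2) {
    candidates <- lookup(trigrams, paste(tail(words, 2), collapse = " "))
  }
  if (length(candidates) < k && length(words) >= 1) {
    candidates <- c(candidates, lookup(bigrams, tail(words, 1)))
  }
  # Always pad with the most frequent single words
  candidates <- c(candidates, as.character(unigrams$ngram))
  head(unique(candidates), k)
}

tri <- data.frame(ngram = c("thanks for the", "for the follow"),
                  Freq = c(2L, 1L), stringsAsFactors = FALSE)
bi  <- data.frame(ngram = c("for the", "the follow", "the best"),
                  Freq = c(3L, 2L, 1L), stringsAsFactors = FALSE)
uni <- data.frame(ngram = c("the", "to", "and"),
                  Freq = c(10L, 8L, 5L), stringsAsFactors = FALSE)
predict_next("I love the", tri, bi, uni)  # "follow" "best" "the"
```

Keeping the lookup tables sorted by frequency means the first matches are already the best candidates, so no extra scoring step is needed in this sketch.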

Recap: 1. Import the data. 2. Clean the data. 3. Build the n-gram models. 4. Plot the n-gram frequencies.

To improve: 1. Choose a better sample size. 2. Clean the data further. 3. Split the data into training and test sets for later models.
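For the third improvement, a training/test split could be taken from the sampled lines. The helper below is a hypothetical sketch, not part of any package: it holds out a random fraction of lines and uses a logical mask instead of negative indexing, because `x[-integer(0)]` returns an empty vector and would silently empty the test set when the training index is empty.

```r
# Hypothetical helper: randomly split a character vector of lines into
# training and test sets. Call set.seed() beforehand for a reproducible split.
split_train_test <- function(lines, train_frac = 0.8) {
  n <- length(lines)
  train_idx <- sample.int(n, size = floor(train_frac * n))
  # Logical mask avoids the x[-integer(0)] pitfall of negative indexing
  in_train <- seq_len(n) %in% train_idx
  list(train = lines[in_train], test = lines[!in_train])
}
```

For example, `set.seed(100)` followed by `parts <- split_train_test(sampled_data, train_frac = 0.8)` would give `parts$train` for building the n-gram tables and `parts$test` for checking the predictions.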