Creating a wordcloud with R is easy.
But first, you will need a few packages:
- tm
- NLP
- wordcloud
- readr (to read the data)
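If some of these packages are not installed yet, they can be fetched from CRAN. The snippet below is only a sketch: the variable names `needed` and `to_install` are made up for illustration, and the mirror URL is one common choice rather than a requirement.

```r
# Packages used in this post
needed <- c("tm", "NLP", "wordcloud", "readr")

# Which of them are not yet in the local library?
to_install <- setdiff(needed, rownames(installed.packages()))

# Install only the missing ones (this step needs internet access)
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

Checking first keeps the snippet safe to re-run, since packages that are already present are left alone.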
April 26, 2019
Once you have those packages installed, just load them:
library(tm)
library(NLP)
library(wordcloud)
library(readr)
# Read the speech into a character vector, one element per line
nms <- read_lines("data/webdb_niagara_movement_speech.txt")
An important step is the text cleaning and transformation process:
# Load the text as a corpus
nmsCorpus <- Corpus(VectorSource(nms))
# Clean the text with tm_map
# (content_transformer() wraps base functions such as tolower for use in tm_map)
nmsCorpus <- tm_map(nmsCorpus, content_transformer(tolower))
nmsCorpus <- tm_map(nmsCorpus, removePunctuation)
nmsCorpus <- tm_map(nmsCorpus, removeWords, stopwords("english"))
nmsCorpus <- tm_map(nmsCorpus, removeNumbers)
# Build a term document matrix
nms_dtm <- TermDocumentMatrix(nmsCorpus)
nms_dtm_matrix <- as.matrix(nms_dtm)
# Find the word frequencies by summing each term's counts
# across the rows of the term-document matrix
v <- sort(rowSums(nms_dtm_matrix), decreasing = TRUE)
# Turn the sorted frequencies into a data frame
d <- data.frame(word = names(v), freq = v)
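To see what the row sums are doing, it can help to run the same steps on a tiny input. The sketch below uses two made-up "documents" (not taken from the speech) and hypothetical names prefixed with `toy`. Each row of the matrix is a term, each column a document, and each cell holds the number of times the term occurs in that document.

```r
library(tm)

# Two invented documents, already lowercase and free of stopwords
toy <- c("work work education", "work right")

toyCorpus <- Corpus(VectorSource(toy))
toy_tdm <- TermDocumentMatrix(toyCorpus)
toy_matrix <- as.matrix(toy_tdm)

# One row per term, one column per document
print(toy_matrix)

# Summing across a row gives the term's total count in the whole corpus
toy_freq <- sort(rowSums(toy_matrix), decreasing = TRUE)
print(toy_freq)
```

Here "work" ends up first because it appears twice in the first document and once in the second.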
Now, we’re ready to create our wordcloud:
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 2,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
Wordcloud of W.E.B. Du Bois’s Niagara Movement Speech
Let’s find the most frequent words:
findFreqTerms(nms_dtm, lowfreq = 4)
##  [1] "black"          "discrimination" "men"            "simply"
##  [5] "work"           "manhood"        "right"          "white"
##  [9] "will"           "want"           "race"           "south"
## [13] "education"      "john"           "violence"
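With `lowfreq = 4`, `findFreqTerms()` keeps the terms that occur at least four times in total. The same filter can be sketched in base R on a named frequency vector shaped like `v`. The counts in `v_demo` below are invented for illustration and are not the real counts from the speech.

```r
# Invented counts standing in for the named vector `v` built earlier
v_demo <- c(work = 9, men = 7, step = 2, toil = 1)

# Keep the names of the terms whose total count is at least 4
frequent <- names(v_demo)[v_demo >= 4]
print(frequent)
```

On the real data, replacing `v_demo` with `v` should give the same set of terms as `findFreqTerms()`, though possibly in a different order.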
And the most common associations:
findAssocs(nms_dtm, terms = "work", corlimit = 0.65)
## $work
##        actually          afraid             ask           bread        brethren
##            0.77            0.77            0.77            0.77            0.77
##         capital        citizens          coming           daily       decencies
##            0.77            0.77            0.77            0.77            0.77
##       defenders         earning           fifty      flourished            hard
##            0.77            0.77            0.77            0.77            0.77
##           hater         hearing          moment            name        nation’s
##            0.77            0.77            0.77            0.77            0.77
##        ordinary         pausing      progressed representatives       retreated
##            0.77            0.77            0.77            0.77            0.77
##          spread        stealing          stolen         thunder            toil
##            0.77            0.77            0.77            0.77            0.77
##          travel            turn          weaker      whispering            year
##            0.77            0.77            0.77            0.77            0.77
##          year’s            step
##            0.77            0.67
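The association scores are correlations between a term's counts across documents and the counts of every other term. A likely reason so many words share the value 0.77 is that they each occur once, in the same document, so their count vectors are identical. The base R sketch below illustrates the idea on an invented 4-term, 4-document matrix; the matrix `m` and its numbers are assumptions for illustration, not data from the speech.

```r
# Rows are terms, columns are documents; all counts are made up
m <- rbind(
  work  = c(2, 0, 1, 0),
  toil  = c(1, 0, 0, 0),
  bread = c(1, 0, 0, 0),
  step  = c(0, 0, 1, 0)
)

# Correlate the counts of "work" with the counts of every term
assoc <- apply(m, 1, function(term) cor(m["work", ], term))
print(round(assoc, 2))
```

In this toy matrix "toil" and "bread" have identical rows, so they get identical scores, while "step" co-occurs with "work" in a different document and scores lower.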
And create a barplot:
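One possible way to draw it is with base R's `barplot()`. This is only a sketch: `d_demo` is a stand-in for the data frame `d` built earlier, and its counts are invented so the snippet runs on its own.

```r
# Stand-in for `d`; the words come from the post, the counts are made up
d_demo <- data.frame(
  word = c("work", "men", "right", "race", "education"),
  freq = c(9, 7, 6, 5, 4)
)

# Keep the most frequent words, highest first
top_words <- head(d_demo[order(-d_demo$freq), ], 5)

# las = 2 rotates the axis labels so longer words stay readable
bar_mids <- barplot(top_words$freq,
                    names.arg = top_words$word,
                    las = 2,
                    col = "lightblue",
                    main = "Most frequent words",
                    ylab = "Word frequencies")
```

With the real data, `head(d, 10)` in place of `top_words` would plot the ten most frequent words, since `d` is already sorted by frequency.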