Christine Monnier
April 26, 2019
Creating a wordcloud with R is easy.
But first, you will need a few packages: the code below relies on tm, for text mining, and on wordcloud, for drawing the cloud.
Once you have those packages installed, just load them.
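The installation and loading commands are missing from this copy. Below is a minimal sketch. It assumes the packages meant are tm, which provides `Corpus()`, `tm_map()` and `TermDocumentMatrix()`, and wordcloud.

```r
# Install once per machine, skipping packages that are already present
# (assumption: tm and wordcloud are the packages this post relies on)
if (!requireNamespace("tm", quietly = TRUE)) install.packages("tm")
if (!requireNamespace("wordcloud", quietly = TRUE)) install.packages("wordcloud")

# Load them at the start of every R session
library(tm)         # Corpus(), tm_map(), TermDocumentMatrix()
library(wordcloud)  # wordcloud()
```

Checking with `requireNamespace()` first avoids downloading a package again every time the script runs.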
An important step is the text cleaning and transformation process:
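```r
# Load the text as a corpus
# (nms is assumed to hold the speech as a character vector, one element per line or paragraph)
nmsCorpus <- Corpus(VectorSource(nms))

# Clean the text with tm_map
# (wrapping a base function such as tolower in content_transformer() keeps the
#  corpus structure intact, which tm_map() requires for VCorpus objects)
nmsCorpus <- tm_map(nmsCorpus, content_transformer(tolower))
nmsCorpus <- tm_map(nmsCorpus, removePunctuation)
nmsCorpus <- tm_map(nmsCorpus, removeWords, stopwords("english"))
nmsCorpus <- tm_map(nmsCorpus, removeNumbers)

# Build a term-document matrix (rows = words, columns = documents)
nms_dtm <- TermDocumentMatrix(nmsCorpus)
nms_dtm_matrix <- as.matrix(nms_dtm)

# Word frequencies: sum each word's counts across all documents (the row sums of the matrix)
v <- sort(rowSums(nms_dtm_matrix), decreasing = TRUE)

# Turn the named frequency vector into a data frame
d <- data.frame(word = names(v), freq = v)
```

Now, we’re ready to create our wordcloud: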
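The plotting call is missing here. Below is a minimal sketch. It uses `wordcloud()` from the wordcloud package on the data frame `d` from the previous step.

The seed, `min.freq`, `max.words`, `rot.per` and the "Dark2" palette from `brewer.pal()` are illustrative choices. They are not necessarily the settings behind the figure. The block also creates a small hypothetical stand-in `d`, but only when the real one is not in the session, so that it can run on its own.

```r
library(wordcloud)     # wordcloud()
library(RColorBrewer)  # brewer.pal() colour palettes

# Hypothetical toy stand-in for `d`, used only when the real data frame is absent
if (!exists("d")) {
  d <- data.frame(word = c("work", "fair", "pay", "men"), freq = c(3, 2, 2, 1))
}

set.seed(1234)  # word placement is random; a fixed seed gives a reproducible layout
wordcloud(words = d$word, freq = d$freq,
          min.freq = 1,          # skip words seen fewer times than this
          max.words = 100,       # draw at most this many words
          random.order = FALSE,  # plot in decreasing frequency, so top words land near the centre
          rot.per = 0.35,        # proportion of words rotated 90 degrees
          colors = brewer.pal(8, "Dark2"))
```

Raising `min.freq` is the simplest way to thin out a crowded cloud.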
Now let’s take a look:
Wordcloud of W. E. B. Du Bois’s Niagara Movement Speech
Let’s find the most frequent words:
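The code for this step is missing. Below is a minimal sketch, assuming the term-document matrix `nms_dtm` from the cleaning step.

`findFreqTerms()` from tm returns every term whose total count reaches `lowfreq`, but it does not rank them by count. For that reason the block also sorts the row sums directly, which gives a frequency-ranked list.

The threshold of 2 and the cut-off of 15 words are illustrative. The three-sentence toy corpus is a hypothetical stand-in, built only when `nms_dtm` is absent.

```r
library(tm)

# Hypothetical toy term-document matrix, built only when `nms_dtm` is not in the session
if (!exists("nms_dtm")) {
  toy <- VCorpus(VectorSource(c("work and more work", "fair work for fair pay", "pay the men")))
  nms_dtm <- TermDocumentMatrix(toy)
}

# Every term whose total count across the documents is at least lowfreq (unranked)
frequent <- findFreqTerms(nms_dtm, lowfreq = 2)
frequent

# Ranked alternative: sort the row sums and keep the 15 most frequent words
top <- head(sort(rowSums(as.matrix(nms_dtm)), decreasing = TRUE), 15)
names(top)
```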
## [1] "black" "discrimination" "men" "simply"
## [5] "work" "manhood" "right" "white"
## [9] "will" "want" "race" "south"
## [13] "education" "john" "violence"
And the words most strongly associated with "work":
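This code is also missing. Below is a minimal sketch using `findAssocs()` from tm. It lists the terms whose counts correlate with "work" across the documents of the corpus, at or above `corlimit`.

The correlations are computed between documents. They therefore only make sense when `nms` holds several lines or paragraphs rather than one long string. The limit of 0.6 is illustrative, and the toy corpus is a hypothetical stand-in used only when `nms_dtm` is absent.

```r
library(tm)

# Hypothetical toy term-document matrix, built only when `nms_dtm` is not in the session
if (!exists("nms_dtm")) {
  toy <- VCorpus(VectorSource(c("work and more work", "fair work for fair pay", "pay the men")))
  nms_dtm <- TermDocumentMatrix(toy)
}

# Named list with one entry per requested term; each entry holds the correlated
# terms and their correlation values, highest first
assoc <- findAssocs(nms_dtm, terms = "work", corlimit = 0.6)
assoc
```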
## $work
## actually afraid ask bread brethren
## 0.77 0.77 0.77 0.77 0.77
## capital citizens coming daily decencies
## 0.77 0.77 0.77 0.77 0.77
## defenders earning fifty flourished hard
## 0.77 0.77 0.77 0.77 0.77
## hater hearing moment name nation’s
## 0.77 0.77 0.77 0.77 0.77
## ordinary pausing progressed representatives retreated
## 0.77 0.77 0.77 0.77 0.77
## spread stealing stolen thunder toil
## 0.77 0.77 0.77 0.77 0.77
## travel turn weaker whispering year
## 0.77 0.77 0.77 0.77 0.77
## year’s step
## 0.77 0.67
And create a barplot:
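The barplot code did not survive either. Below is a minimal sketch, assuming the chart shows the top word counts from the data frame `d`, which is already sorted by decreasing frequency.

The cut-off of ten words, the colour and the labels are illustrative. The toy `d` is a hypothetical stand-in used only when the real one is absent.

```r
# Hypothetical toy stand-in for `d`, used only when the real data frame is absent
if (!exists("d")) {
  d <- data.frame(word = c("work", "fair", "pay", "men"), freq = c(3, 2, 2, 1))
}

# Keep the ten most frequent words (d is sorted by decreasing freq)
top <- head(d, 10)

# las = 2 turns the word labels vertical so long words stay readable;
# barplot() returns the x positions of the bar midpoints
mids <- barplot(top$freq, names.arg = top$word, las = 2,
                col = "lightblue", main = "Most frequent words",
                ylab = "Word frequency")
```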