It’s easier to make a word cloud if you first save the document of interest as a plain text file. One way to create a plain text version of a file is to paste the text into a new TextEdit (Mac) or Notepad (PC) document and save it as a plain text (.txt) file. In TextEdit, choose Format > Make Plain Text before saving, since TextEdit uses rich text by default. Make sure to save the file in your working directory.
Check the working directory in R with getwd().
getwd()
## [1] "/Users/mwk/r/tutorials"
Set the working directory with setwd().
setwd("/Users/mwk/r/tutorials")
And then check again to make sure everything worked.
getwd()
## [1] "/Users/mwk/r/tutorials"
I wanted to make a word cloud from my thesis, which is a Word document (.docx). Microsoft Word’s ‘Save As’ dialog offers a ‘Plain Text’ option, so I used that and named my file ‘thetext.txt’.
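Before reading the file, it can help to confirm that R actually sees it in the working directory. A small sketch, assuming the file name ‘thetext.txt’ used here:

```r
# Look for the plain text file in the current working directory
txt_file <- "thetext.txt"

if (file.exists(txt_file)) {
  message("Found ", txt_file, " in ", getwd())
} else {
  # List any .txt files that are present, to help spot a typo or the wrong folder
  message(txt_file, " not found. .txt files in ", getwd(), ":")
  print(list.files(pattern = "\\.txt$"))
}
```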
Once the text is saved as a plain text file, read the file in R using the readLines() function.
words <- readLines("thetext.txt")
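readLines() returns a character vector with one element per line of the file, so it is worth a quick look at what came back. The sketch below is self-contained: it writes a small stand-in file (made-up demo text) to a temporary location and reads it back. The warn = FALSE and encoding = "UTF-8" arguments are optional choices on my part, not something the original code needed.

```r
# Self-contained demo: write a tiny stand-in file to a temporary location
demo_file <- tempfile(fileext = ".txt")
writeLines(c("Chapter 1", "", "Results were significant (p < .05) in 2010."), demo_file)

# warn = FALSE silences the warning about a missing final newline;
# encoding = "UTF-8" marks the strings as UTF-8 (adjust if your file differs)
demo_words <- readLines(demo_file, warn = FALSE, encoding = "UTF-8")

length(demo_words)  # number of lines read
head(demo_words)    # first few lines

# Optional: drop blank lines, which carry no words
demo_words <- demo_words[nzchar(trimws(demo_words))]
```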
Load the necessary packages. Here I’m using ‘tm’ to filter unwanted words and numbers out of the text and ‘wordcloud’ to draw the word cloud itself.
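# install.packages(c("tm", "wordcloud"))  # run once if the packages aren't installed
library(tm)        # for filtering unwanted words/numbers
library(wordcloud) # for drawing the word cloud; also loads RColorBrewer (brewer.pal)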
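If you share the script, a small helper can install any missing packages before loading them. ensure_packages() below is a hypothetical name of my own, and the CRAN mirror URL is an assumption you can change:

```r
# Hypothetical helper: install any missing packages, then attach them all
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (quietly) when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (any(!installed)) {
    install.packages(pkgs[!installed], repos = repos)
  }
  # Attach each package; character.only = TRUE lets library() take a string
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(pkgs)
}
```

With that defined, ensure_packages(c("tm", "wordcloud")) does the work of the install and library lines above.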
Since my thesis included many repeated values (e.g., .05) and years (2009, 2010, 2011, etc.), I decided to remove all numbers from the text.
words <- removeNumbers(words)
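Roughly speaking, removeNumbers() deletes every digit in the text. The base-R function below is only an approximation for illustration (strip_digits() is a hypothetical name, and tm’s own implementation may differ in details such as Unicode digits). It also shows that punctuation around numbers, like the period in .05, is left behind; wordcloud() strips punctuation itself when given raw text, so that leftover shouldn’t show up in the cloud.

```r
# Base-R approximation of removeNumbers() on a character vector:
# delete every run of digits, leaving the surrounding text in place
strip_digits <- function(x) gsub("[[:digit:]]+", "", x)

example <- c("p < .05 in 2009 and 2010", "Study 2 replicated Study 1")
strip_digits(example)  # digits gone; the "." and extra spaces remain
```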
To remove specific words, tm also provides a removeWords() function (replace the placeholder words below with your own).
words <- removeWords(words, c("word1","word2","word3"))
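One thing to watch: removeWords() matches whole words and is case-sensitive, so “Thesis” at the start of a sentence survives a request to remove “thesis”. Lowercasing the text first with tolower() avoids that. The sketch below uses a hypothetical base-R stand-in, drop_words(), to show the effect; it assumes the words to drop contain no regex special characters.

```r
# Hypothetical stand-in for removeWords(): drop whole-word matches.
# The \\b anchors keep "word" from matching inside "password".
drop_words <- function(x, stop) {
  pattern <- paste0("\\b(", paste(stop, collapse = "|"), ")\\b")
  gsub(pattern, "", x, perl = TRUE)
}

txt <- c("Thesis results", "the thesis and the results")
drop_words(txt, "thesis")           # "Thesis" survives: matching is case-sensitive
drop_words(tolower(txt), "thesis")  # lowercasing first removes both
```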
The wordcloud() function includes a variety of options. In the example below, I specified the following:
- scale=c(5.5,0.5) sets the range of word sizes, from 5.5 for the most frequent words down to 0.5 for the least frequent
- max.words=100 plots at most 100 words, dropping the least frequent ones
- min.freq=5 excludes words that appear fewer than 5 times
- random.order=FALSE plots words in decreasing order of frequency, so the most frequent words sit near the center
- rot.per=0.40 rotates 40% of the words by 90 degrees, so they appear vertically
wordcloud(words,
scale=c(5.5,0.5),
max.words=100,
min.freq=5,
random.order=FALSE,
rot.per=0.40)
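When given raw text, wordcloud() counts word frequencies itself, which makes it hard to see which words min.freq and max.words will keep. The sketch below approximates those counts in base R. count_words() and preview_cloud_words() are hypothetical helpers, and the simple tokenizer (splitting on anything that isn’t a letter or apostrophe) is an assumption that won’t match tm’s processing exactly; for example, it doesn’t remove stop words.

```r
# Hypothetical helper: approximate word counts, most frequent first
count_words <- function(lines) {
  tokens <- unlist(strsplit(tolower(lines), "[^[:alpha:]']+"))
  tokens <- tokens[nzchar(tokens)]            # drop empty strings from the split
  counts <- sort(table(tokens), decreasing = TRUE)
  setNames(as.integer(counts), names(counts)) # plain named integer vector
}

# Hypothetical helper: mimic the min.freq and max.words filters
preview_cloud_words <- function(lines, min.freq = 5, max.words = 100) {
  freq <- count_words(lines)
  freq <- freq[freq >= min.freq]                 # like min.freq
  freq[seq_len(min(length(freq), max.words))]    # like max.words
}
```

wordcloud() also accepts precomputed counts through its freq argument, e.g. freq <- count_words(words) followed by wordcloud(names(freq), freq, min.freq = 5, max.words = 100). In that case it skips its own text processing, so stop words would need to be removed beforehand.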
Colors can be customized as well. You can pick individual colors or use premade palettes. For my word cloud, I used a small function that mimics ggplot2’s default colors (evenly spaced hues around the color wheel). I also combined a few named colors with the 8-color “Dark2” palette from brewer.pal().
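# Mimic ggplot2's default discrete palette: n evenly spaced hues
ggColors <- function(n) {
  # Hues 15 and 375 are the same color, so compute n + 1 points and drop the last
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}
gg.cols <- ggColors(8)
# Three named colors plus the 8 colors of the "Dark2" palette
bp.cols <- c("lightblue", "cornflowerblue", "coral2", brewer.pal(8, "Dark2"))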
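Before drawing, it can be worth checking that every color is one R recognizes, since a typo only shows up as an error at plot time. check_palette() below is a hypothetical helper: it tests each entry with col2rgb() and, optionally, draws a strip of swatches with barplot() so palettes can be compared side by side.

```r
# Hypothetical helper: flag unrecognized colors and preview the valid ones
check_palette <- function(cols, show = TRUE) {
  # col2rgb() throws an error for names/hex codes R does not recognize
  ok <- vapply(cols, function(col) {
    !inherits(tryCatch(col2rgb(col), error = function(e) e), "error")
  }, logical(1))
  if (!all(ok)) {
    warning("Unrecognized colors: ", paste(cols[!ok], collapse = ", "))
  }
  if (show && any(ok)) {
    # One equal-height bar per valid color
    barplot(rep(1, sum(ok)), col = cols[ok], border = NA, axes = FALSE,
            names.arg = seq_len(sum(ok)))
  }
  invisible(ok)
}
```

For example, check_palette(gg.cols) and then check_palette(bp.cols).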
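Because words are placed at random, each run produces a different layout, and with random.color=TRUE the colors are also picked at random from the palette passed to colors (gg.cols or bp.cols here). You can run it over and over to get different versions. I saved a couple of my favorites below.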
wordcloud(words,
scale=c(5.5,0.5),
max.words=100,
min.freq=5,
random.order=FALSE,
rot.per=0.40,
use.r.layout=FALSE,
random.color=TRUE,
colors=gg.cols)
wordcloud(words,
scale=c(5.5,0.5),
max.words=100,
min.freq=5,
random.order=FALSE,
rot.per=0.40,
use.r.layout=FALSE,
random.color=TRUE,
colors=bp.cols)
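Because the layout is random, a favorite version is easiest to recreate by fixing the random seed before drawing it. The sketch below wraps that in a hypothetical save_cloud() helper that sets a seed, opens a PNG device with png(), and passes any extra arguments through to wordcloud(). The seed value, file name, and image size are placeholders.

```r
# Hypothetical helper: draw a word cloud with a fixed seed and save it as PNG,
# so a favorite layout can be recreated later. Extra arguments go to wordcloud().
save_cloud <- function(words, seed, file, width = 1200, height = 1200, ...) {
  set.seed(seed)                      # placement and random colors use R's RNG
  png(file, width = width, height = height, res = 150)
  on.exit(dev.off())                  # close the device even if plotting fails
  wordcloud::wordcloud(words, ...)
  invisible(file)
}
```

For example: save_cloud(words, seed = 42, file = "cloud-gg.png", scale = c(5.5, 0.5), max.words = 100, min.freq = 5, random.order = FALSE, rot.per = 0.40, random.color = TRUE, colors = gg.cols).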
And that’s it!