This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more of my R tutorials visit http://mikewk.com/statistics.

1. Prepping Original Text

It’s easier to make a word cloud if you first save the document of interest as a plain text file. One way to do this is to paste the text into a new TextEdit (Mac) or Notepad (PC) document and save it as plain text (.txt). Make sure to save the file in your working directory.

Check the working directory in R with getwd().

getwd()
## [1] "/Users/mwk/r/tutorials"

Set the working directory with setwd().

setwd("/Users/mwk/r/tutorials")

And then check again to make sure everything worked.

getwd()
## [1] "/Users/mwk/r/tutorials"

I wanted to make a word cloud out of my thesis, which is a Word document (.docx), so I chose ‘Save As’ in Microsoft Word and picked the ‘plain text’ option. I named my file ‘thetext.txt’.

Once the text is saved as a plain text file, read the file in R using the readLines() function.

words <- readLines("thetext.txt")
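
As a minimal sketch of the same step with a couple of safety checks, the snippet below fails early if the file is missing and drops empty lines. The temporary file and its contents are hypothetical stand-ins for ‘thetext.txt’ so the example runs anywhere; in practice you would point path at your own file.

```r
# Hypothetical stand-in for "thetext.txt", written to a temporary folder
path <- file.path(tempdir(), "thetext.txt")
writeLines(c("First line of the thesis.", "", "Second line, with p < .05."), path)

stopifnot(file.exists(path))            # stop early if the file is not there
words <- readLines(path, warn = FALSE)  # one character string per line
words <- words[nzchar(trimws(words))]   # drop empty or whitespace-only lines
length(words)
```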

2. Loading Packages

Load necessary packages. In this case, I’m using ‘tm’ to filter unwanted words/numbers from my wordcloud and ‘wordcloud’ to create the actual word cloud.

# install.packages(c('tm', 'wordcloud'))
library(tm) # for filtering unwanted words/numbers
library(wordcloud)

3. Removing Certain Numbers and Words

Since my thesis included repetitive values (e.g., .05) and years (2009, 2010, 2011, etc.), I decided to filter all numbers from my thesis.

words <- removeNumbers(words)

If I wanted to remove specific words, there’s also a removeWords() function.

words <- removeWords(words, c("word1","word2","word3"))
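
Here is a small sketch of both filters on a made-up pair of sentences (sample_text is hypothetical, not from my thesis). It assumes you also want to drop common English words via tm's stopwords() list. Since removeWords() matches case-sensitively, the text is lowercased first.

```r
library(tm)

# Hypothetical sample text for illustration
sample_text <- c("In 2009 the effect was significant at .05",
                 "The results in 2010 were similar")

cleaned <- removeNumbers(sample_text)                 # strip digits
cleaned <- tolower(cleaned)                           # so "The" matches "the"
cleaned <- removeWords(cleaned, stopwords("english")) # drop common words
cleaned <- trimws(gsub("[[:space:]]+", " ", cleaned)) # tidy leftover spaces
cleaned
```

Leftover punctuation (such as the decimal point from .05) stays in the text after this step.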

4. Creating the Word Cloud

The wordcloud() function includes a variety of options. In the example below, I specified the following:
- scale=c(5.5,0.5) sets the font-size range from 5.5 to 0.5
- max.words=100 limits the maximum number of words to 100
- min.freq=5 excludes words that appear fewer than 5 times
- random.order=FALSE plots words in order of decreasing frequency
- rot.per=0.40 means about 40% of words will appear vertically

wordcloud(words, 
          scale=c(5.5,0.5), 
          max.words=100, 
          min.freq=5, 
          random.order=FALSE, 
          rot.per=0.40)
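
To get a feel for what min.freq and max.words do, it can help to count words yourself before plotting. The sketch below is a rough base R approximation on made-up text, not how wordcloud() computes frequencies internally; sample_text, min_freq, and max_words are hypothetical names.

```r
# Hypothetical sample text for illustration
sample_text <- c("the model fit the data",
                 "the data were noisy",
                 "model comparison favored the model")

# Split into lowercase words and count them
tokens <- unlist(strsplit(tolower(sample_text), "[^a-z]+"))
tokens <- tokens[nzchar(tokens)]
freq   <- sort(table(tokens), decreasing = TRUE)

min_freq  <- 2  # keep words that appear at least twice
max_words <- 3  # keep at most three words
kept <- head(freq[freq >= min_freq], max_words)
kept
```

A table like kept could also be passed to wordcloud() directly as words and frequencies, e.g. wordcloud(names(kept), as.integer(kept)).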

Colors can be customized as well. You can pick your favorite colors or use premade palettes. For my word cloud, I decided to use a function that mimics the default colors used in ggplot2. I also built a second palette by adding a few colors to the “Dark2” palette (which has up to 8 colors) from brewer.pal(), a function in the RColorBrewer package that is loaded along with wordcloud.

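# Mimic ggplot2's default hues: n evenly spaced colors around the HCL wheel
ggColors <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}
gg.cols <- ggColors(8)
# Three named colors plus the 8-color "Dark2" palette from brewer.pal()
bp.cols <- c("light blue", "cornflowerblue", "coral2", brewer.pal(8, "Dark2"))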

You can rerun the call over and over with colors set to one of these palettes (e.g., colors=gg.cols) and random.color=TRUE to get a different version each time. I saved a couple of my favorites below.

wordcloud(words, 
          scale=c(5.5,0.5), 
          max.words=100, 
          min.freq=5, 
          random.order=FALSE, 
          rot.per=0.40, 
          use.r.layout=FALSE, 
          random.color=TRUE, 
          colors=gg.cols)

wordcloud(words, 
          scale=c(5.5,0.5), 
          max.words=100, 
          min.freq=5, 
          random.order=FALSE, 
          rot.per=0.40, 
          use.r.layout=FALSE, 
          random.color=TRUE, 
          colors=bp.cols)
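
Because each run draws colors and positions at random, a favorite version is easy to lose. One way to keep it, sketched below under a few assumptions, is to set a seed before plotting and write the plot to a file. The toy words, frequencies, seed value, colors, and output path are all hypothetical; swap in your own text and palette.

```r
library(wordcloud)

# Hypothetical toy frequencies standing in for the thesis text
demo_words <- c("model", "data", "thesis", "results", "sample", "effect")
demo_freq  <- c(12, 9, 7, 5, 4, 3)

out_file <- file.path(tempdir(), "wordcloud.pdf")  # assumed output location

set.seed(1234)  # rerunning with the same seed should reproduce this version
pdf(out_file, width = 6, height = 6)  # png() is an alternative device
wordcloud(demo_words, demo_freq,
          scale = c(3, 0.5),
          min.freq = 1,
          random.order = FALSE,
          rot.per = 0.40,
          random.color = TRUE,
          colors = c("cornflowerblue", "coral2", "darkgreen"))
dev.off()  # close the device so the file is written
```

Changing the seed gives a new version; writing down the seed of a version you like lets you recreate it later.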

And that’s it!