This tutorial shows how to create a word cloud in R. For this example I used the lyrics of the Katy Perry song Dark Horse, but any text file will do.

To follow this tutorial, you will need to install and load several packages:

install.packages("tm") # for text mining
install.packages("SnowballC") # for text stemming
install.packages("wordcloud") # word-cloud generator
install.packages("RColorBrewer") # color palettes

library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
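
If you rerun the script often, it can be convenient to check which packages are missing before installing anything. The following is a minimal sketch in base R; the helper name missing_pkgs is my own and not part of any package.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(c("tm", "SnowballC", "wordcloud", "RColorBrewer"))
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```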

Next we set the path to our text file and read it in line by line:

filePath <- "file:///C:/Users/Dakota/Documents/DarkHorse.txt"
text <- readLines(filePath)
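
readLines() returns a character vector with one element per line of the file. As a small sketch of that behavior, the example below writes two lines to a temporary file and reads them back; the temporary file and the name demo_path are stand-ins for your own text file.

```r
# Write two sample lines to a temporary file (a stand-in for DarkHorse.txt).
demo_path <- tempfile(fileext = ".txt")
writeLines(c("Mark my words", "This love will make you levitate"), demo_path)

# Check that the file exists before reading it.
stopifnot(file.exists(demo_path))
demo_text <- readLines(demo_path)

length(demo_text)  # number of lines read
demo_text[1]       # first line of the file
```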

Next we load the data as a corpus, a collection of text documents that the tm package knows how to process. This makes the text easier to clean and count before we build the word cloud.

docs <- Corpus(VectorSource(text))
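
Each element of the character vector becomes one document in the corpus. Here is a small sketch of that mapping, assuming the tm package is installed; demo_lines and demo_corpus are names made up for the illustration.

```r
library(tm)

demo_lines <- c("Like a bird", "Like a bird without a cage")
demo_corpus <- Corpus(VectorSource(demo_lines))

length(demo_corpus)         # one document per input line
content(demo_corpus[[2]])   # text of the second document
```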

Now we can inspect the contents of the corpus.

inspect(docs)
## <<SimpleCorpus>>
## Metadata:  corpus specific: 1, document level (indexed): 0
## Content:  documents: 68
## 
##  [1] I knew you were                                       
##  [2] You were gonna come to me                             
##  [3] And here you are                                      
##  [4] But you better choose carefully                       
##  [5] 'Cause I'm capable of anything                        
##  [6] Of anything and everything                            
##  [7] Make me your Aphrodite                                
##  [8] Make me your one and only                             
##  [9] "But don't make me your enemy, your enemy, your enemy"
## [10] So you wanna play with magic                          
## [11] "Boy, you should know whatcha falling for"            
## [12] Baby do you dare to do this                           
## [13] 'Cause I'm coming atcha like a dark horse             
## [14] "Are you ready for, ready for"                        
## [15] "A perfect storm, a perfect storm"                    
## [16] "'Cause once you're mine, once you're mine"           
## [17] There's no going back                                 
## [18] Mark my words                                         
## [19] This love will make you levitate                      
## [20] Like a bird                                           
## [21] Like a bird without a cage                            
## [22] But down to earth                                     
## [23] "If you choose to walk away, don't walk away"         
## [24] It's in the palm of your hand now baby                
## [25] "It's a yes or no, no maybe"                          
## [26] So just be sure before you give it up to me           
## [27] "Up to me, give it up to me"                          
## [28] So you wanna play with magic                          
## [29] "Boy, you should know whatcha falling for"            
## [30] Baby do you dare to do this                           
## [31] 'Cause I'm coming atcha like a dark horse             
## [32] "Are you ready for, ready for"                        
## [33] "A perfect storm, a perfect storm"                    
## [34] "'Cause once you're mine, once you're mine"           
## [35] There's no going back                                 
## [36] She's a beast                                         
## [37] I call her Karma                                      
## [38] She eat your heart out                                
## [39] Like Jeffrey Dahmer                                   
## [40] Be careful                                            
## [41] Try not to lead her on                                
## [42] Shawty's heart was on steroids                        
## [43] 'Cause her love was so strong                         
## [44] You may fall in love                                  
## [45] When you meet her                                     
## [46] If you get the chance you better keep her             
## [47] She swears by it but if you break her heart           
## [48] She turn cold as a freezer                            
## [49] That fairy tale ending with a knight in shining armor 
## [50] She can be my Sleeping Beauty                         
## [51] I'm gon' put her in a coma                            
## [52] Woo! Damn I think I love her                          
## [53] Shawty so bad                                         
## [54] I'm sprung and I don't care                           
## [55] She got me like a roller coaster                      
## [56] Turn the bedroom into a fair                          
## [57] Her love is like a drug                               
## [58] I was tryna hit it and quit it                        
## [59] But lil' mama so dope                                 
## [60] I messed around and got addicted                      
## [61] So you wanna play with magic                          
## [62] "Boy, you should know whatcha falling for"            
## [63] Baby do you dare to do this                           
## [64] 'Cause I'm coming atcha like a dark horse             
## [65] "Are you ready for, ready for"                        
## [66] "A perfect storm, a perfect storm"                    
## [67] "'Cause once you're mine, once you're mine"           
## [68] There's no going back

We will now replace certain special characters with spaces. If the text contains other unwanted characters, similar commands can be added that do the same thing:

toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
## Warning in tm_map.SimpleCorpus(docs, toSpace, "/"): transformation drops
## documents
docs <- tm_map(docs, toSpace, "@")
## Warning in tm_map.SimpleCorpus(docs, toSpace, "@"): transformation drops
## documents
docs <- tm_map(docs, toSpace, "\\|")
## Warning in tm_map.SimpleCorpus(docs, toSpace, "\\|"): transformation drops
## documents
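
The pattern passed to gsub() is treated as a regular expression, which is why the pipe character is written as "\\|" above. The sketch below shows the same replacement on plain strings in base R; to_space is just an unwrapped copy of the function inside toSpace, and the sample strings are made up.

```r
# The same function that toSpace wraps, without content_transformer().
to_space <- function(x, pattern) gsub(pattern, " ", x)

to_space("walk/away", "/")               # "walk away"
to_space("yes|no", "\\|")                # "yes no": the pipe is escaped
gsub("|", " ", "yes|no", fixed = TRUE)   # "yes no": literal match instead
```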

Next we clean the text. The tm_map() function applies a transformation to every document in the corpus. Here we use it to convert the text to lowercase, remove numbers, common stopwords, and punctuation, strip extra white space, and stem the words.

# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(docs, content_transformer(tolower)):
## transformation drops documents
# Remove numbers
docs <- tm_map(docs, removeNumbers)
## Warning in tm_map.SimpleCorpus(docs, removeNumbers): transformation drops
## documents
# Remove common English stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(docs, removeWords, stopwords("english")):
## transformation drops documents
# Remove your own stopwords, specified as a character vector
# ("blabla1" and "blabla2" are placeholders)
docs <- tm_map(docs, removeWords, c("blabla1", "blabla2")) 
## Warning in tm_map.SimpleCorpus(docs, removeWords, c("blabla1", "blabla2")):
## transformation drops documents
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
## Warning in tm_map.SimpleCorpus(docs, removePunctuation): transformation drops
## documents
# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
## Warning in tm_map.SimpleCorpus(docs, stripWhitespace): transformation drops
## documents
# Text stemming
docs <- tm_map(docs, stemDocument)
## Warning in tm_map.SimpleCorpus(docs, stemDocument): transformation drops
## documents
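
To see what a few of these steps do to a single line, here is a rough base R sketch of lowercasing, punctuation removal, and white space cleanup. It only approximates the tm functions (for instance, it also trims leading and trailing spaces), and it skips stopword removal and stemming.

```r
demo_line <- "  'Cause I'm coming atcha like a DARK horse!  "

step1 <- tolower(demo_line)               # lowercase everything
step2 <- gsub("[[:punct:]]", "", step1)   # drop punctuation characters
step3 <- gsub("\\s+", " ", step2)         # collapse runs of white space
cleaned <- trimws(step3)                  # trim the ends

cleaned
# "cause im coming atcha like a dark horse"
```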

Next we build a term-document matrix, a table that holds the frequency of each word. We will use these frequencies to make the most common words the most prominent in our word cloud.

dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
head(d, 10)
##            word freq
## caus       caus    8
## like       like    8
## readi     readi    6
## perfect perfect    6
## storm     storm    6
## mine       mine    6
## love       love    5
## come       come    4
## make       make    4
## fall       fall    4
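
The counting behind this table can be illustrated with base R alone. The sketch below splits two lines from the lyrics into words and tabulates them; the demo_ names are made up for the illustration. Unlike TermDocumentMatrix(), which by default ignores words shorter than three characters, this version keeps short words such as "a".

```r
demo_lines <- c("like a bird", "like a bird without a cage")

# Split each line on spaces and count how often each word occurs.
demo_words <- unlist(strsplit(demo_lines, " ", fixed = TRUE))
demo_freq <- sort(table(demo_words), decreasing = TRUE)

# Same layout as the data frame d: one row per word with its frequency.
demo_d <- data.frame(word = names(demo_freq), freq = as.vector(demo_freq))
demo_d
```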

Lastly, we can make our word cloud. Set all of the plot margins to 1 and make the plotting device wide and tall enough to hold the word cloud. Here are the wordcloud() arguments that will be used:

words : the words to be plotted

freq : their frequencies

min.freq : words with frequency below min.freq will not be plotted

max.words : maximum number of words to be plotted

random.order : plot words in random order. If FALSE, they will be plotted in decreasing frequency

rot.per : proportion of words with 90 degree rotation (vertical text)

colors : color words from least to most frequent. Use, for example, colors = "black" for a single color.

# Open the plotting device first; par() settings apply only to the current device
dev.new(width = 500, height = 500)
par(mar = c(1, 1, 1, 1))
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 100, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
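
To save the word cloud to a file rather than viewing it on screen, one option is to draw it on a png() device, where width and height are given in pixels by default. This is a sketch that assumes the wordcloud and RColorBrewer packages are installed; cloud_d holds a few rows copied from the frequency table above, and out_file is a placeholder path.

```r
library(wordcloud)
library(RColorBrewer)

# A few rows copied from the frequency table shown earlier.
cloud_d <- data.frame(
  word = c("caus", "like", "readi", "perfect", "storm", "mine", "love"),
  freq = c(8, 8, 6, 6, 6, 6, 5)
)

out_file <- tempfile(fileext = ".png")     # placeholder output path
png(out_file, width = 500, height = 500)   # size in pixels
par(mar = c(1, 1, 1, 1))                   # margins for this device
set.seed(1234)
wordcloud(words = cloud_d$word, freq = cloud_d$freq, min.freq = 1,
          random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
dev.off()                                  # close the device to write the file
```

Replace out_file with a path of your own, such as a file in your working directory, to keep the image.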