Text mining lets us highlight the most frequently used keywords in a body of text. Once the text has been mined, the results can be presented as a word cloud (also called a text cloud or tag cloud), a word frequency plot, or a list of term associations.

Word clouds have two advantages: they communicate qualitative findings simply, in terms of the words themselves, and the cloud is more visually engaging than a table.

library("tm")              # for text mining
library("SnowballC")       # for text stemming
library("wordcloud")       # word-cloud generator
library("RColorBrewer")    # color palettes

#Loading the text----

To make a word cloud, the source text should be in plain-text format (.txt). In this practice, we will use the “I Have a Dream” speech by Martin Luther King Jr.

The text will be loaded using Corpus() from the tm package. A corpus is a collection of documents from which we will mine the text.

To import a text file saved locally on your computer, use text <- readLines(file.choose()). You will be asked to choose the text file interactively.
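For scripts that must run without a dialog, the same import can be sketched with an explicit path. The file name and its contents below are placeholders written to a temporary directory, not part of this practice, and the encoding argument is an assumption about how your own file was saved.

```r
# Hypothetical local file, created here only so the sketch runs end to end
local_path <- file.path(tempdir(), "speech-sample.txt")
writeLines(c("I have a dream today!", "", "Let freedom ring."), local_path)

# Read the file line by line; each line becomes one element of the vector
local_text <- readLines(local_path, encoding = "UTF-8", warn = FALSE)

length(local_text)   # number of lines read, blank lines included
```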

However, we will load a .txt file directly from the STHDA website in this practice.

#Read the text file from internet
filePath <- "http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt"

#Assign the file as a text object
text <- readLines(filePath)

Next, we load the data as a corpus. VectorSource() treats each element of a character vector as one document, and Corpus() builds the corpus from that source.

docs <- Corpus(VectorSource(text))
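As a small illustration of how the corpus is organized, the sketch below builds one from a made-up two-element vector (not the speech). Each element becomes its own document; the names toy_text and toy_docs are only for this example.

```r
library("tm")

# Toy character vector, used for illustration only
toy_text <- c("I have a dream", "Let freedom ring")
toy_docs <- Corpus(VectorSource(toy_text))

length(toy_docs)          # one document per vector element
content(toy_docs[[1]])    # text of the first document
```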

We can also inspect the corpus document by document. Each line of the text file is one document.

inspect(docs)  
## <<SimpleCorpus>>
## Metadata:  corpus specific: 1, document level (indexed): 0
## Content:  documents: 46
## 
##  [1]                                                                                                                                                                                                                                                                                                                                                                                                               
##  [2] And so even though we face the difficulties of today and tomorrow, I still have a dream. It is a dream deeply rooted in the American dream.                                                                                                                                                                                                                                                                   
##  [3]                                                                                                                                                                                                                                                                                                                                                                                                               
##  [4] I have a dream that one day this nation will rise up and live out the true meaning of its creed:                                                                                                                                                                                                                                                                                                              
##  [5]                                                                                                                                                                                                                                                                                                                                                                                                               
##  [6] We hold these truths to be self-evident, that all men are created equal.                                                                                                                                                                                                                                                                                                                                      
##  [7]                                                                                                                                                                                                                                                                                                                                                                                                               
##  [8] I have a dream that one day on the red hills of Georgia, the sons of former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood.                                                                                                                                                                                                                         
##  [9]                                                                                                                                                                                                                                                                                                                                                                                                               
## [10] I have a dream that one day even the state of Mississippi, a state sweltering with the heat of injustice, sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice.                                                                                                                                                                                                   
## [11]                                                                                                                                                                                                                                                                                                                                                                                                               
## [12] I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character.                                                                                                                                                                                                                                     
## [13]                                                                                                                                                                                                                                                                                                                                                                                                               
## [14] I have a dream today!                                                                                                                                                                                                                                                                                                                                                                                         
## [15]                                                                                                                                                                                                                                                                                                                                                                                                               
## [16] I have a dream that one day, down in Alabama, with its vicious racists, with its governor having his lips dripping with the words of interposition and nullification, one day right there in Alabama little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers.                                                                             
## [17]                                                                                                                                                                                                                                                                                                                                                                                                               
## [18] I have a dream today!                                                                                                                                                                                                                                                                                                                                                                                         
## [19]                                                                                                                                                                                                                                                                                                                                                                                                               
## [20] I have a dream that one day every valley shall be exalted, and every hill and mountain shall be made low, the rough places will be made plain, and the crooked places will be made straight; and the glory of the Lord shall be revealed and all flesh shall see it together.                                                                                                                                 
## [21]                                                                                                                                                                                                                                                                                                                                                                                                               
## [22] This is our hope, and this is the faith that I go back to the South with.                                                                                                                                                                                                                                                                                                                                     
## [23]                                                                                                                                                                                                                                                                                                                                                                                                               
## [24] With this faith, we will be able to hew out of the mountain of despair a stone of hope. With this faith, we will be able to transform the jangling discords of our nation into a beautiful symphony of brotherhood. With this faith, we will be able to work together, to pray together, to struggle together, to go to jail together, to stand up for freedom together, knowing that we will be free one day.
## [25]                                                                                                                                                                                                                                                                                                                                                                                                               
## [26] And this will be the day, this will be the day when all of God s children will be able to sing with new meaning:                                                                                                                                                                                                                                                                                              
## [27]                                                                                                                                                                                                                                                                                                                                                                                                               
## [28] My country  tis of thee, sweet land of liberty, of thee I sing.                                                                                                                                                                                                                                                                                                                                               
## [29] Land where my fathers died, land of the Pilgrim s pride,                                                                                                                                                                                                                                                                                                                                                      
## [30] From every mountainside, let freedom ring!                                                                                                                                                                                                                                                                                                                                                                    
## [31] And if America is to be a great nation, this must become true.                                                                                                                                                                                                                                                                                                                                                
## [32] And so let freedom ring from the prodigious hilltops of New Hampshire.                                                                                                                                                                                                                                                                                                                                        
## [33] Let freedom ring from the mighty mountains of New York.                                                                                                                                                                                                                                                                                                                                                       
## [34] Let freedom ring from the heightening Alleghenies of Pennsylvania.                                                                                                                                                                                                                                                                                                                                            
## [35] Let freedom ring from the snow-capped Rockies of Colorado.                                                                                                                                                                                                                                                                                                                                                    
## [36] Let freedom ring from the curvaceous slopes of California.                                                                                                                                                                                                                                                                                                                                                    
## [37]                                                                                                                                                                                                                                                                                                                                                                                                               
## [38] But not only that:                                                                                                                                                                                                                                                                                                                                                                                            
## [39] Let freedom ring from Stone Mountain of Georgia.                                                                                                                                                                                                                                                                                                                                                              
## [40] Let freedom ring from Lookout Mountain of Tennessee.                                                                                                                                                                                                                                                                                                                                                          
## [41] Let freedom ring from every hill and molehill of Mississippi.                                                                                                                                                                                                                                                                                                                                                 
## [42] From every mountainside, let freedom ring.                                                                                                                                                                                                                                                                                                                                                                    
## [43] And when this happens, when we allow freedom ring, when we let it ring from every village and every hamlet, from every state and every city, we will be able to speed up that day when all of God s children, black men and white men, Jews and Gentiles, Protestants and Catholics, will be able to join hands and sing in the words of the old Negro spiritual:                                             
## [44] Free at last! Free at last!                                                                                                                                                                                                                                                                                                                                                                                   
## [45]                                                                                                                                                                                                                                                                                                                                                                                                               
## [46] Thank God Almighty, we are free at last!

#Text Transformation----

Transformations are applied with the tm_map() function, for example to replace special characters in the text. We will replace “/”, “@” and “|” with a space.

toSpace <- content_transformer(function (x , pattern ) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\\|")
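The pipe is escaped because gsub() treats its pattern as a regular expression, where a bare | means alternation. A base-R sketch on a made-up string shows the three replacements, with fixed = TRUE as one possible alternative to escaping:

```r
# Made-up input containing the three characters replaced above
sample_line <- "rise/up@dawn|together"

step1 <- gsub("/", " ", sample_line)    # slash to space
step2 <- gsub("@", " ", step1)          # at sign to space
step3 <- gsub("\\|", " ", step2)        # escaped pipe matches a literal "|"

# fixed = TRUE matches the pattern literally, so no escaping is needed
step3_fixed <- gsub("|", " ", step2, fixed = TRUE)

print(step3)
identical(step3, step3_fixed)
```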

#Cleaning the text----

Using tm_map(), we can remove unnecessary white space, convert the text to lower case, remove common stopwords such as “the” and “we”, and perform many other steps to prepare our text.

We remove stopwords because they are so common that they carry almost no value for the analysis. You can also remove numbers and punctuation with the removeNumbers and removePunctuation transformations.

Another important preprocessing step is text stemming, which reduces words to their root form. In other words, this process removes suffixes from words to simplify them and recover their common origin. For example, a stemming process reduces the words “moving”, “moved” and “movement” to the root word, “move”.

However, we won’t apply stemming in this practice, so that the words keep their full, readable form.
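To see what a stemmer does to individual words before deciding whether to use it, a quick sketch with wordStem() from SnowballC can help. Real stemmers are rule-based, so the stems they return may differ from an idealized example; the choice of language = "english" here is an assumption of this sketch.

```r
library("SnowballC")

# Words to test; wordStem() works on a character vector
words <- c("moving", "moved", "movement")
stems <- wordStem(words, language = "english")

# Show each word next to the stem the algorithm returns
data.frame(word = words, stem = stems)
```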

# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))

# Remove numbers
docs <- tm_map(docs, removeNumbers)

# Remove common English stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))

# Remove your own stopwords
# specify your stopwords as a character vector
docs <- tm_map(docs, removeWords, c("blabla1", "blabla2")) 

# Remove punctuation
docs <- tm_map(docs, removePunctuation)

# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)

# Text stemming
#docs <- tm_map(docs, stemDocument)
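To watch the cleaning steps act on a single sentence, the sketch below runs the same transformations on a made-up one-document corpus. The sentence and the names toy and cleaned are illustrative only.

```r
library("tm")

# One made-up sentence; VCorpus() is a choice of this sketch,
# Corpus() as used above also works
toy <- VCorpus(VectorSource("In 1963, the Dream was NOT yet a Reality!"))

toy <- tm_map(toy, content_transformer(tolower))        # lower case
toy <- tm_map(toy, removeNumbers)                       # drop digits
toy <- tm_map(toy, removeWords, stopwords("english"))   # drop stopwords
toy <- tm_map(toy, removePunctuation)                   # drop punctuation
toy <- tm_map(toy, stripWhitespace)                     # collapse spaces

cleaned <- content(toy[[1]])
print(cleaned)
```

Printing the document after each step, rather than only at the end, is a simple way to check that the order of the transformations does what you expect.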

#Build a term-document matrix----

A term-document matrix is a table containing the frequency of each word. Row names are words (terms) and column names are documents. The TermDocumentMatrix() function from the tm package can be used as follows:

dtm <- TermDocumentMatrix(docs) #Create a term document matrix summary.

m <- as.matrix(dtm) #Unpack the summary into a matrix.

v <- sort(rowSums(m),decreasing=TRUE) #Count the words.

d <- data.frame(word = names(v),freq=v) #convert the count into a data frame.

head(d, 10) #Display top 10 words in terms of frequency.
##              word freq
## will         will   17
## freedom   freedom   13
## ring         ring   12
## dream       dream   11
## day           day   11
## let           let   11
## every       every    9
## one           one    8
## able         able    8
## together together    7
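The layout of the matrix is easier to see on a tiny example. The sketch below uses two made-up documents (not the speech) and shows that terms sit in the rows, which is why rowSums() gives the total count per word.

```r
library("tm")

# Two made-up documents
toy <- VCorpus(VectorSource(c("dream dream freedom", "freedom ring")))

toy_tdm <- TermDocumentMatrix(toy)
toy_m <- as.matrix(toy_tdm)

dim(toy_m)        # number of terms, then number of documents
rownames(toy_m)   # the terms
rowSums(toy_m)    # total count of each term across both documents
```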

#Generate the Word cloud----

From the term document matrix, we can generate a word cloud based on term frequency.

set.seed(1234) #For replicability
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words=200, random.order=FALSE, rot.per=0.35, 
          colors=brewer.pal(8, "Dark2"))

Arguments of the word cloud generator function are:
* words : the words to be plotted
* freq : their frequencies
* min.freq : words with frequency below min.freq will not be plotted
* max.words : maximum number of words to be plotted
* random.order : plot words in random order. If FALSE, they will be plotted in decreasing order of frequency
* rot.per : proportion of words rotated by 90 degrees (vertical text)
* colors : colors for words, from least to most frequent. Use, for example, colors = "black" for a single color.
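As a sketch of how these arguments change the result, the example below draws a plainer cloud from a small made-up frequency table and writes it to a PDF file. The data frame demo and the output path are hypothetical and not part of the speech analysis.

```r
library("wordcloud")

# Small made-up frequency table (not the speech results)
demo <- data.frame(word = c("freedom", "ring", "dream", "day", "hope"),
                   freq = c(13, 12, 11, 11, 2),
                   stringsAsFactors = FALSE)

# Hypothetical output file in the temporary directory
out_file <- file.path(tempdir(), "wordcloud-demo.pdf")

pdf(out_file, width = 6, height = 6)
set.seed(1234)
wordcloud(words = demo$word, freq = demo$freq,
          min.freq = 3,            # "hope" (freq 2) is left out
          random.order = FALSE,    # plotted in decreasing frequency
          rot.per = 0,             # no rotated words
          colors = "black")        # single color
dev.off()
```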


#Word association----  

You can look at the frequent terms in the term-document matrix as follows. In the example below, we want to find words that occur at least four times:

findFreqTerms(dtm, lowfreq = 4)
##  [1] "dream"    "day"      "nation"   "one"      "will"     "able"    
##  [7] "together" "freedom"  "every"    "mountain" "shall"    "faith"   
## [13] "free"     "let"      "ring"

You can analyze the association between terms (i.e., which terms correlate) using the findAssocs() function. The R code below identifies which words are associated with “freedom” in the “I Have a Dream” speech:

findAssocs(dtm, terms = "freedom", corlimit = 0.3)
## $freedom
##          let         ring  mississippi        stone mountainside        state 
##         0.89         0.86         0.34         0.34         0.34         0.32 
##        every     mountain 
##         0.32         0.32
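To get a feel for what these scores mean, the idea can be reproduced in base R, assuming the scores are plain correlations between the per-document counts of two terms. The matrix below is made up for illustration and is not taken from the speech.

```r
# Made-up term-document counts: rows are terms, columns are documents
toy_m <- rbind(freedom = c(2, 1, 0, 0),
               ring    = c(1, 1, 0, 0),
               dream   = c(0, 0, 1, 2))
colnames(toy_m) <- paste0("doc", 1:4)

# Correlation of "freedom" with every other term across the documents
others <- setdiff(rownames(toy_m), "freedom")
assoc <- sapply(others, function(term) cor(toy_m["freedom", ], toy_m[term, ]))

round(assoc, 2)
```

In this toy matrix, "ring" appears in the same documents as "freedom" and gets a positive score, while "dream" appears only where "freedom" does not and gets a negative one.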

#Word Frequency Plot----  

We can also plot the word frequencies as a bar graph:

barplot(d[1:10,]$freq, las = 2, names.arg = d[1:10,]$word,
        col ="lightblue", main ="Most frequent words",
        ylab = "Word frequencies")
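When the words are long, a horizontal layout can be easier to read. The sketch below is one possible variant; the data frame top holds a few values copied from the frequency table above and stands in for the first rows of d.

```r
# A few top words, standing in for the first rows of d
top <- data.frame(word = c("will", "freedom", "ring", "dream"),
                  freq = c(17, 13, 12, 11),
                  stringsAsFactors = FALSE)

# Null device so the sketch runs without a display; in an interactive
# session, omit this line and the dev.off() call
pdf(NULL)

# Reverse the order so the most frequent word ends up at the top
mids <- barplot(rev(top$freq), names.arg = rev(top$word),
                horiz = TRUE, las = 1, col = "lightblue",
                main = "Most frequent words", xlab = "Word frequencies")
dev.off()
```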