WordCloud for Social and Web analytics

How to Make a Wordcloud Using R

Recently I have been using R for some basic data visualisations, such as word clouds and heat maps. I don’t have a programming background, so at first glance R’s command-line environment seemed a little daunting. However, the ease with which I have been able to create some pretty amazing outputs with very little code has surprised me. In this post I will share the steps of a simple process, along with the small amount of code that is needed.

RStudio + Packages

First of all, you will need to install RStudio (and R itself, which RStudio runs on). The program gives you a nice interface to work in. Code can be typed in the window at the top left of the program, which is particularly useful if you want to save your code as a script, and sent to the console from there. Alternatively, you can simply type the code straight into the Console.

# Install required packages (uncomment on first run)
#install.packages(c("tm", "wordcloud", "SnowballC"))

# Load libraries
library(tm)

## Loading required package: NLP

library(wordcloud)

## Loading required package: RColorBrewer

library(SnowballC)
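
If you rerun the script often, you may prefer to install only the packages that are missing. Here is a minimal sketch in base R; the helper name `missing_packages` is my own invention and not part of any package.

```r
# Packages this tutorial relies on
needed <- c("tm", "wordcloud", "SnowballC")

# Hypothetical helper: return the entries of `pkgs` that are not yet installed
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to actually install them
  # install.packages(to_install)
}
```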

Load the Text

This is the point where you load the text from which you would like to create your word cloud. For this example I am using a speech given by the president at a summit.

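Create a new folder, e.g. ~/Desktop/test/, containing a speech.txt file. DirSource() reads every file in the folder, so keep only the text you want in the cloud there, and change the path in the code below to match your own folder.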

# Create a corpus variable
mooncloud <- Corpus(DirSource("C:/Users/112-user/Desktop/test/"))

# Make sure it has loaded properly - have a look!
#inspect(mooncloud)
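
Before building the corpus it can help to check that the folder really contains what you expect. Here is a small sketch in base R that sets up a throwaway folder; the folder location and the sample sentence are placeholders, not the speech used in this post.

```r
# Throwaway folder for the example (placeholder path under tempdir())
text_dir <- file.path(tempdir(), "test")
dir.create(text_dir, showWarnings = FALSE)

# Write a one-line placeholder text file
speech_file <- file.path(text_dir, "speech.txt")
writeLines("This is a placeholder sentence for the word cloud.", speech_file)

# List the text files the corpus would pick up
files_found <- list.files(text_dir, pattern = "\\.txt$")
print(files_found)

# Peek at the first few lines of the file
print(head(readLines(speech_file, warn = FALSE), 3))

# With tm loaded, the folder can then be passed to DirSource():
# mooncloud <- Corpus(DirSource(text_dir))
```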

Format and Clean the Text

These commands remove things you aren’t particularly interested in for the cloud, such as punctuation and common English words like conjunctions. They also normalise the case of the text; I am going with lowercase, but you can run various combinations of these transformations, including ones not listed here.

(Replace start)

# Strip unnecessary whitespace
mooncloud <- tm_map(mooncloud, stripWhitespace)
# Convert to lowercase (content_transformer keeps the documents as tm text documents)
mooncloud <- tm_map(mooncloud, content_transformer(tolower))
# Remove conjunctions etc.
mooncloud <- tm_map(mooncloud, removeWords, stopwords("english"))
# Remove commas etc. (before stemming, so trailing punctuation does not get in the stemmer's way)
mooncloud <- tm_map(mooncloud, removePunctuation)
# Reduce words to their common 'stem' by removing suffixes
mooncloud <- tm_map(mooncloud, stemDocument)

# (optional) Older versions of 'tm' could convert the documents to something other than text
# when a plain function such as tolower was applied; if wordcloud() complains, run this line
#mooncloud <- tm_map(mooncloud, PlainTextDocument)

(Replace end)
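
To see which words will end up largest in the cloud, you can count word frequencies yourself. The sketch below uses only base R on a made-up sentence, so it illustrates the idea rather than what tm does internally.

```r
# Made-up sample text, for illustration only
sample_text <- "the people want peace and the people want progress"

# Split on whitespace and count how often each word occurs
words <- unlist(strsplit(tolower(sample_text), "\\s+"))
word_freq <- sort(table(words), decreasing = TRUE)

# The most frequent words are the ones drawn largest
print(head(word_freq, 3))

# With tm loaded, TermDocumentMatrix(mooncloud) gives the same kind of
# counts for the cleaned corpus.
```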

Word Cloud Time!

Time to produce a word cloud. Run the following command and watch RStudio populate the ‘Plots’ window to the right of the console.

# Time to generate a wordcloud!
wordcloud(mooncloud
        , scale=c(5,0.5)     # Font size range: largest, smallest
        , max.words=100      # Plot at most the top n words
        , random.order=FALSE # Plot words in decreasing frequency
        , rot.per=0.35       # Proportion of words rotated 90 degrees
        , use.r.layout=FALSE # Use C++ collision detection
        , colors=brewer.pal(8, "Dark2"))
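
The layout of a word cloud is partly random, so two runs can look different. Here is a sketch of one way to make the result repeatable and save it to a file, assuming the wordcloud package is installed; the word counts and the output file name are made up for illustration.

```r
library(wordcloud)  # also loads RColorBrewer

# Made-up word frequencies, for illustration only
freq <- c(people = 12, peace = 9, future = 7, work = 5, country = 4, hope = 3)

# Placeholder output file under tempdir()
out_file <- file.path(tempdir(), "wordcloud.png")

set.seed(123)  # fix the random number generator so the layout should repeat
png(out_file, width = 800, height = 800)
wordcloud(words = names(freq), freq = freq,
          scale = c(5, 0.5), random.order = FALSE,
          rot.per = 0.35, colors = brewer.pal(8, "Dark2"))
dev.off()      # close the device so the file is written
```

Any seed value will do; what matters is using the same one each time you want the same picture.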

In addition: if you want to analyse texts in Azerbaijani, just replace the code above between (Replace start) and (Replace end) with the following script.

az_stopwords <- c("bir", "və", "ki", "bu", "ilə", "üçün", "da","də","öz","ancaq","hər") # Add more Azerbaijani stopwords

# Define a custom function to remove Azerbaijani stopwords
removeAzStopwords <- function(text) {
  text <- tolower(text)  # Convert text to lowercase for case insensitivity
  tokens <- strsplit(text, "\\s+")  # Tokenize each element of the text into words
  # Drop Azerbaijani stopwords and rebuild each element, so separate lines or documents are not merged
  vapply(tokens, function(words) paste(words[!words %in% az_stopwords], collapse = " "), character(1))
}

# Apply the custom function to your Corpus
mooncloud <- tm_map(mooncloud, content_transformer(removeAzStopwords))
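
Note that the script above replaces the whole cleaning block, so punctuation is no longer removed and a token such as "və," would not match the stopword list. Here is a minimal sketch of a variant that also strips punctuation, in base R; the function name `cleanAzText` and the sample sentence are my own illustrations.

```r
az_stopwords <- c("bir", "və", "ki", "bu", "ilə", "üçün", "da", "də", "öz", "ancaq", "hər")

# Hypothetical helper: lowercase, strip punctuation, then drop stopwords
cleanAzText <- function(text, stopwords = az_stopwords) {
  text <- tolower(text)
  text <- gsub("[[:punct:]]+", " ", text)   # replace punctuation with a space
  tokens <- strsplit(trimws(text), "\\s+")  # one vector of words per element
  vapply(tokens,
         function(words) paste(words[!words %in% stopwords], collapse = " "),
         character(1))
}

cleanAzText("Bu bir kitab, bu da dost.")  # "kitab dost"

# It can be applied to the corpus in the same way as the function above:
# mooncloud <- tm_map(mooncloud, content_transformer(cleanAzText))
```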

Source: https://lukesingham.com/how-to-make-a-word-cloud-using-r/


Tural Naghi

2023-09-14
