Romeo and Juliet: a word cloud for each act

Noelia Oses
December 19th, 2015

This app displays a word cloud for the act of Shakespeare's Romeo and Juliet that the user selects.
A word cloud draws each word at a size proportional to how often it occurs, so it shows the relative frequency of the words in the act at a glance.
From this the user can get an idea of which topics appear in the act and how prominent each one is.
When started, the app reads the text of the book from a file and finds where the acts begin:

booktext <<- readLines("pg1513.txt",encoding="UTF-8")
booktext <<- booktext[ which(booktext!="") ]
act_indeces <<- grep("ACT ",booktext)

The user has a numeric input widget to select the number of the act for which she wishes to see the word cloud.
For illustration purposes, we will define a variable to hold the act number:

actnumber <- 1
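
The slicing of the selected act is not shown on this slide, so the following is only a minimal sketch of how it could be done from the indices found above. It assumes that every match of "ACT " marks the start of an act (a real file may contain extra matches, for example in a table of contents) and uses a tiny made-up `booktext` so that it runs on its own. The names `act_start`, `act_end` and `act_text` are hypothetical.

```r
# Toy stand-in for the lines read from the file (assumption, not the real text)
booktext <- c("PROLOGUE",
              "ACT I", "Scene I. A public place.", "Two households",
              "ACT II", "Scene I. An open place.", "But soft")

# Same search as in the app: positions of the lines that contain "ACT "
act_indeces <- grep("ACT ", booktext)

actnumber <- 1

# An act runs from its own heading to the line before the next heading,
# or to the end of the text if it is the last act.
act_start <- act_indeces[actnumber]
act_end <- if (actnumber < length(act_indeces)) {
  act_indeces[actnumber + 1] - 1
} else {
  length(booktext)
}

act_text <- booktext[act_start:act_end]
```

With the toy input, `act_text` holds the three lines from "ACT I" up to, but not including, "ACT II".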

When the user selects an act the app performs the following steps:

– Transform the text of the act into a corpus.

– Transform all letters to lower case, remove punctuation, remove numbers, and remove the words “thy”, “thou”, “thee”, “the”, “and”, and “but”.

– Apply the 'TermDocumentMatrix' function to the corpus to construct a term-document matrix.

– Use the 'wordcloud' function to construct and render a word cloud of the words which have a minimum frequency of 5.
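
The first three steps above can be sketched with the 'tm' package as follows. This is an illustration under stated assumptions, not the app's actual code: the input `act_text` is a made-up two-line sample, each line is treated as one document, and the names `corpus`, `tdm` and `v` are chosen here for the example.

```r
library(tm)

# Made-up sample standing in for the lines of one act (assumption)
act_text <- c("Love is a smoke, and love is 2 sighs.",
              "But thou, love, art the sun!")

# Step 1: turn the text into a corpus, one document per line
corpus <- VCorpus(VectorSource(act_text))

# Step 2: clean the text. Lower-casing comes first so that
# capitalised words such as "But" match the lower-case removal list.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords,
                 c("thy", "thou", "thee", "the", "and", "but"))

# Step 3: build the term-document matrix and sum each term over all documents.
# By default TermDocumentMatrix drops terms shorter than three characters.
tdm <- TermDocumentMatrix(corpus)
v <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
```

Here `v` is a named vector of word frequencies, which is the shape of input that the 'wordcloud' function takes as its words and frequencies.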

(See the code on GitHub.)

Word cloud for act 1 of Romeo and Juliet:

# v is a named vector of word frequencies: names(v) are the words, v the counts
wordcloud(names(v), v, scale = c(5, 0.2), min.freq = 5, colors = brewer.pal(8, "Dark2"))