Romeo and Juliet: a word cloud for each act

Noelia Oses
December 19th, 2015

The app

  • This app displays a word cloud for an act of Shakespeare's Romeo and Juliet chosen by the user.
  • The word cloud shows the relative frequency of the words that appear in the act.
  • From this, the user can get an idea of which topics appear in the act and how prominent each one is.
  • When started, the app reads the text of the book from a file and finds where the acts begin:
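# Read the book, one element per line of text
booktext <<- readLines("pg1513.txt", encoding = "UTF-8")
# Drop empty lines
booktext <<- booktext[which(booktext != "")]
# Indices of the lines that contain "ACT " (the act headings)
act_indeces <<- grep("ACT ", booktext)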

User input

  • A numeric input widget lets the user select the number of the act whose word cloud she wishes to see.
  • For illustration purposes, we will define a variable to hold the act number:
actnumber <- 1
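
As an illustration of how the act boundaries and the act number could be combined, here is a minimal sketch that slices out the lines of one act. The toy `booktext` vector and the helper `get_act_text` are hypothetical stand-ins, not part of the app; the sketch also assumes that every match of "ACT " marks the start of an act, which may not hold for the real file (for example if it contains a table of contents).

```r
# Toy stand-in for the book text (hypothetical; the app reads pg1513.txt)
booktext <- c("ACT I",
              "Two households, both alike in dignity,",
              "In fair Verona, where we lay our scene,",
              "ACT II",
              "But soft, what light through yonder window breaks?")

# Same approach as the app: indices of the lines containing "ACT "
act_indeces <- grep("ACT ", booktext)

# Hypothetical helper: return the lines belonging to one act.
# An act runs from its heading to the line before the next heading,
# or to the end of the text for the last act.
get_act_text <- function(actnumber) {
  start <- act_indeces[actnumber]
  end <- if (actnumber < length(act_indeces)) {
    act_indeces[actnumber + 1] - 1
  } else {
    length(booktext)
  }
  booktext[start:end]
}

actnumber <- 1
act_text <- get_act_text(actnumber)
```

With the real text, `act_text` would then be the input to the calculations described on the next slide.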

Calculations

When the user selects an act the app performs the following steps:

– Transform the text of the act into a corpus.

– Transform all letters to lower case, remove punctuation, remove numbers, and remove the words “thy”, “thou”, “thee”, “the”, “and”, and “but”.

– Apply the 'TermDocumentMatrix' function to the corpus to construct a term-document matrix.

– Use the 'wordcloud' function to construct and render a word cloud of the words that occur at least 5 times.

(See the code on GitHub.)
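
The steps above can be sketched with the 'tm' package roughly as follows. This is only an illustration under stated assumptions, not necessarily the app's actual code: the toy `act_text` vector is hypothetical, `VCorpus` is one possible way to build the corpus, and summing the rows of the term-document matrix is one common way to obtain a named vector of word frequencies such as `v`.

```r
library(tm)

# Hypothetical stand-in for the lines of the selected act
act_text <- c("Romeo, Romeo! Wherefore art thou Romeo?",
              "The love of Romeo and Juliet, 2 lovers.")

# Transform the text into a corpus (one document per line)
corpus <- VCorpus(VectorSource(act_text))

# Lower case, then remove punctuation, numbers and the listed words
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords,
                 c("thy", "thou", "thee", "the", "and", "but"))

# Build the term-document matrix; by default tm keeps only
# terms of at least three characters
tdm <- TermDocumentMatrix(corpus)

# Total frequency of each term across all lines, most frequent first
v <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
```

The words are lower-cased before `removeWords` is applied so that capitalised forms such as "The" are also caught by the lower-case word list.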

Output

Word cloud for act 1 of Romeo and Juliet:

wordcloud(names(v), v, scale = c(5, 0.2), min.freq = 5, colors = brewer.pal(8, "Dark2"))

[Figure: word cloud for act 1 of Romeo and Juliet]