Word Cloud

Benjamin Smith
March 18, 2018

Introduction

The application is located at https://misterliver52.shinyapps.io/WordCloud2/.

  • Uses excerpted text from selected poets.
  • Builds the cloud from the poet the user selects.
  • Lets the user set the minimum word frequency.
  • Lets the user set the maximum number of words displayed.

Data Sets

The word cloud application is based on excerpts from selected 19th-century English-language poetic works by:

  • Henry Wadsworth Longfellow
  • William Blake
  • Samuel Taylor Coleridge
  • Emily Dickinson

All the works were sourced from Project Gutenberg (https://www.gutenberg.org) and are referenced in the server.R file in the application.

Word Cloud Function

The text is chosen based on the user's selection, loaded into a corpus, stripped of common words (a, the, an, etc.), and converted to a term-document matrix.

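library(tm)
library(wordcloud)
library(memoise)
# In the application, the text is read from the selected poet's file:
# text <- readLines(sprintf("./%s.txt.gz", poet), encoding = "UTF-8")
text <- "The quick brown fox jumps over the lazy dog."
# Build a corpus and drop common words (removeWords is case-sensitive)
myCorpus <- Corpus(VectorSource(text))
myCorpus <- tm_map(myCorpus, removeWords, c(stopwords("SMART"), "thy", "the"))
# Count term frequencies, keeping terms of any length
myDTM <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
print(myDTM)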
<<TermDocumentMatrix (terms: 7, documents: 1)>>
Non-/sparse entries: 7/0
Sparsity           : 0%
Maximal term length: 5
Weighting          : term frequency (tf)
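
The printed summary only reports the shape of the matrix. Before plotting, the matrix is typically collapsed to a single count per term. The following is a minimal sketch that reuses the sample sentence; the names m and freq are illustrative and are not taken from the application's server.R.

```r
library(tm)

text <- "The quick brown fox jumps over the lazy dog."
myCorpus <- Corpus(VectorSource(text))
myCorpus <- tm_map(myCorpus, removeWords, c(stopwords("SMART"), "thy", "the"))
myDTM <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))

# Convert the sparse matrix to a dense one: one row per term, one column per document
m <- as.matrix(myDTM)

# Sum across documents and sort so the most frequent terms come first
freq <- sort(rowSums(m), decreasing = TRUE)
print(freq)
```

The resulting named vector gives the words (its names) and their counts (its values), which is the form a word cloud function expects.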

Word Cloud UI

  • Select an author and click the Change button.
  • Select the minimum word frequency.
  • Select the maximum number of words in the cloud.
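
The two numeric controls can be read as a filter followed by a cap. The sketch below shows that logic in base R. It is an assumption about how the selections could be applied, not the application's actual code: the term counts are made up, and pick_terms is a hypothetical helper.

```r
# Hypothetical term counts (not taken from any of the poets' texts)
freq <- c(love = 9, night = 6, heart = 4, sea = 2, bird = 1)

# Hypothetical helper: keep terms that meet the minimum frequency,
# then cap the result at the maximum number of words
pick_terms <- function(freq, min_freq, max_words) {
  kept <- sort(freq[freq >= min_freq], decreasing = TRUE)
  head(kept, max_words)
}

# Values a user might choose with the two controls
shown <- pick_terms(freq, min_freq = 2, max_words = 3)
print(shown)
```

In practice the helper is not needed for plotting, since wordcloud() accepts comparable limits directly through its min.freq and max.words arguments.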