This application was created for the Developing Data Products Coursera class
Ever wondered which words appear more frequently than others in your favorite text? Here's an app for that!
The app outputs two neat plots that show word frequencies
You can adjust how many words are plotted and even take out those common "filler" words
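A minimal sketch of what those two controls could do, using base R only. The helper `top_words()`, the sample `words` vector, and the `stop_words` list are illustrative names made up for this example; the app's own code may work differently.

```r
# Hypothetical helper: keep the n most frequent words, optionally
# dropping common "filler" (stop) words first.
top_words <- function(words, n = 10, drop = character(0)) {
  kept <- words[!words %in% drop]                  # remove filler words
  counts <- sort(table(kept), decreasing = TRUE)   # most frequent first
  head(counts, n)                                  # keep the top n
}

# Illustrative data, not taken from the app.
words <- c("the", "cat", "and", "the", "dog", "and", "the", "cat", "the", "and")
stop_words <- c("the", "and")

all_top <- top_words(words, n = 2)
content_top <- top_words(words, n = 2, drop = stop_words)
```

With this sample data, `all_top` holds "the" and "and", while `content_top` holds "cat" and "dog" once the filler words are removed.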
The second plot is a word cloud, created with the wordcloud package
This is a graphical representation of relative word frequencies
The most common words are largest and appear closer to the center
Words are also colored according to their frequencies
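A sketch of how such a cloud might be drawn with `wordcloud::wordcloud()`. The counts and the color vector are invented for illustration, and the settings are assumptions rather than the app's actual ones. Setting `random.order = FALSE` plots words in decreasing frequency, which puts the most common ones near the center, and `colors` is applied from least to most frequent.

```r
# Made-up counts for illustration only.
freq <- c(data = 12, plot = 9, word = 7, cloud = 5, shiny = 3, text = 2)

# Colors ordered from least to most frequent words (an arbitrary choice).
cloud_colors <- c("grey60", "steelblue", "darkred")

# Draw only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(
    words = names(freq),
    freq = freq,
    min.freq = 1,          # keep even the rarest word in this toy data
    random.order = FALSE,  # most frequent words are plotted first, near the center
    colors = cloud_colors
  )
}
```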
The clean.word.count function does the heavy lifting for the application
It starts by converting any https link to http
Then it reads in the page and strips all numbers and punctuation
Finally, the words are checked against a dictionary before being counted
The next slide shows an example code snippet
text <- c("Th1is. I!s A?n EXA,,.MPLE 0of me3ssY! w00o00r!!ds")
## Get rid of numbers and punctuation. Make everything lower case.
text <- gsub("[[:punct:]]", "", text)
text <- gsub("[[:digit:]]", "", text)
text <- tolower(text)
## Split the character string into individual words at each space.
text <- strsplit(text, split = " ")
text
## [[1]]
## [1] "this" "is" "an" "example" "of" "messy" "words"
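The snippet above covers only the cleaning step. The other steps described for clean.word.count (the https-to-http rewrite and the dictionary check before counting) could be sketched roughly as follows. This is not the function's actual source: the URL is a placeholder, the page text is hard-coded so the sketch runs offline, and the `dictionary` vector is a toy stand-in for whatever word list the app uses.

```r
# Step 1: rewrite an https link as http (placeholder URL).
url <- "https://example.com/page.html"
url <- sub("^https://", "http://", url)

# Step 2: the app reads the page at this point, for example with readLines(url).
# A hard-coded string stands in here so no network access is needed.
page <- "The qu1ick brown fox, the la2zy dog; the xyzzy!"

# Step 3: strip punctuation and digits, lower-case, split on whitespace.
clean <- tolower(gsub("[[:punct:][:digit:]]", "", page))
words <- strsplit(clean, "[[:space:]]+")[[1]]

# Step 4: keep only words found in the (toy) dictionary, then count them.
dictionary <- c("the", "quick", "brown", "fox", "lazy", "dog")
counts <- table(words[words %in% dictionary])
```

Here the non-word "xyzzy" is dropped by the dictionary check, so it never reaches the counts.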