Introduction

  • This application was created for the Developing Data Products Coursera class

  • Ever wondered which words appear more frequently than others in your favorite text? Here's an app for that!

  • The app outputs two neat plots that show word frequencies

  • You can adjust how many words are plotted and even take out those common "filler" words

First Plot

  • You can see the application in action here

  • The first plot is a bar plot that has the frequencies of various words

  • Words are sorted by decreasing frequency, and a slider lets the user adjust the number of words shown in both plots

  • This is created using ggplot2
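
As a rough illustration of the bar plot described above, here is a minimal sketch. The `word_counts` data frame and the `n_words` value (standing in for the slider) are made up for this example and are not taken from the app's code; the plot is drawn only if ggplot2 is installed.

```r
# Hypothetical word counts standing in for the app's real data.
word_counts <- data.frame(
  word = c("the", "whale", "sea", "ship", "captain"),
  freq = c(120, 45, 30, 28, 15),
  stringsAsFactors = FALSE
)

n_words <- 3  # hypothetical stand-in for the slider value

# Sort by decreasing frequency and keep the top n_words rows.
top_words <- head(word_counts[order(-word_counts$freq), ], n_words)

# Draw the bar plot only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    top_words,
    ggplot2::aes(x = reorder(word, -freq), y = freq)
  ) +
    ggplot2::geom_col() +
    ggplot2::labs(x = "Word", y = "Frequency")
  print(p)
}
```

Reordering the `word` factor by `-freq` keeps the bars in the same decreasing order as the sorted data.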

Second Plot

  • The second plot outputs a "Word Cloud", created with the wordcloud package

  • This is a graphical representation of relative word frequencies

  • The most common words are largest and appear closer to the center

  • Words are also colored according to their frequencies
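
A word cloud along these lines could be drawn roughly as follows. This is a sketch under assumptions: the counts, the `n_words` limit, and the color choices are invented for illustration, and the app's actual arguments may differ. It runs the plotting call only if the wordcloud package is installed.

```r
# Hypothetical word counts; the app would supply its own.
word_counts <- data.frame(
  word = c("the", "whale", "sea", "ship", "captain"),
  freq = c(120, 45, 30, 28, 15),
  stringsAsFactors = FALSE
)

n_words <- 5  # hypothetical stand-in for the slider value

if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(42)  # the layout is partly random; fix the seed for a repeatable picture
  wordcloud::wordcloud(
    words = word_counts$word,
    freq = word_counts$freq,
    min.freq = 1,          # keep even rare words in this tiny example
    max.words = n_words,   # cap the number of words drawn
    random.order = FALSE,  # place the most frequent words first, near the center
    colors = c("grey50", "steelblue", "firebrick")  # least to most frequent
  )
}
```

Setting `random.order = FALSE` is what places the most common words toward the center, and `colors` is applied from the least to the most frequent words.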

Code Structure

  • The clean.word.count function does the heavy lifting for the application

  • It starts by converting any https link to http

  • Then it reads the page in and takes out all numbers and punctuation

  • Finally, the words are checked against a dictionary before being counted

  • The next slide has an example of a code snippet
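
The steps above can be sketched end to end as follows. This is not the app's `clean.word.count` itself: `to_http`, `count_clean_words`, and the toy `dictionary` are hypothetical names made up for this example, and the function takes lines of text directly rather than downloading a page so that it runs offline.

```r
# Hypothetical helper: rewrite an https URL as http.
to_http <- function(url) sub("^https://", "http://", url)

# Hypothetical stand-in for the cleaning and counting steps.
count_clean_words <- function(lines, dictionary) {
  text <- paste(lines, collapse = " ")
  text <- gsub("[[:punct:]]", "", text)  # drop punctuation
  text <- gsub("[[:digit:]]", "", text)  # drop digits
  text <- tolower(text)
  words <- strsplit(text, split = "[[:space:]]+")[[1]]
  words <- words[words %in% dictionary]  # keep dictionary words only
  sort(table(words), decreasing = TRUE)
}

dictionary <- c("the", "cat", "sat", "on", "mat")  # toy dictionary
page <- c("The cat sat on the mat.", "The c4at saw 2 xyzzy!")

counts <- count_clean_words(page, dictionary)
print(counts)
print(to_http("https://example.com/page"))
```

With a live page, the `lines` argument might come from something like `readLines()` on the converted URL; that detail is an assumption, since the slides do not say how the page is read.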

Cleaning Up Scraped Text

text <- "Th1is. I!s A?n EXA,,.MPLE 0of me3ssY! w00o00r!!ds"

## Remove punctuation and digits, then make everything lower case.
text <- gsub("[[:punct:]]", "", text)
text <- gsub("[[:digit:]]", "", text)
text <- tolower(text)

## Split the character string into words on single spaces.
text <- strsplit(text, split = " ")

text
## [[1]]
## [1] "this"    "is"      "an"      "example" "of"      "messy"   "words"
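
One thing to watch with the single-space split: when a token consists only of digits or punctuation, removing it leaves two spaces in a row, and the split then yields empty strings. The sketch below, using a made-up input string, shows one possible way to handle that before counting; it is an illustration, not necessarily what the app does.

```r
# Made-up input with a stand-alone dash and a stand-alone number.
text <- "one - two 22 two"
text <- gsub("[[:punct:]]", "", text)
text <- gsub("[[:digit:]]", "", text)
text <- tolower(text)

# A single-space split keeps empty strings where a token vanished.
naive <- strsplit(text, split = " ")[[1]]

# Splitting on runs of whitespace and dropping blanks avoids that.
words <- strsplit(text, split = "[[:space:]]+")[[1]]
words <- words[nzchar(words)]

# Count the remaining words, most frequent first.
word_freq <- sort(table(words), decreasing = TRUE)
print(word_freq)
```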