BU.520.650.Su19: Assignment 2

Description

The main goal of this assignment is to practice word cloud visualization, text mining, web scraping, R Markdown HTML output, the %>% operator, and writing R functions.

In this assignment, you will write an R function named wikiWebScraper that returns a word cloud visualization of the words in the content of the Wikipedia page for a given keyword. For example, wikiWebScraper("Data_analysis") returns a word cloud built from https://en.wikipedia.org/wiki/Data_analysis.
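As a minimal sketch of the keyword-to-page mapping described above, the URL can be built by appending the keyword to the English Wikipedia base address. The helper name wikiUrl is hypothetical and not part of the assignment; it only illustrates the first step that wikiWebScraper would need.

```r
# Hypothetical helper (not part of the assignment): build the Wikipedia
# URL for a keyword such as "Data_analysis".
wikiUrl <- function(keyword) {
  paste0("https://en.wikipedia.org/wiki/", keyword)
}

url <- wikiUrl("Data_analysis")
print(url)
```

Keeping the URL construction in one place makes it easy to check which page a keyword resolves to before any scraping happens.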

Note: To produce a page similar to this one, do the following:

  • Use output: rmdformats::readthedown to change the theme of the page.
  • You can use a code element or a mark element to highlight parts of the text in R Markdown.
  • In order to include code, but not have it run, you should set eval=FALSE.
  • Get the content of the element with the attribute ID=“bodyContent” using the following code:
 %>% html_nodes("#bodyContent")
  • Convert every word to lower case.
  • Remove numbers, punctuation, extra whitespace, and English stop words from the corpus.
  • Also remove the following custom stop words from the corpus:
myStopWords <- c("may", "now", "also", "many", "use", "used", "typically","given",
               "like", "will", "can", "often", "see", "one", "pdf", "issn", "journal",
               tolower(month.name))
  • Remove the character “–” using the following code:
   %>% tm_map(content_transformer(function(x) {
     x  %>%  gsub(pattern = "–", replacement = "") %>% return}))
  • Remove the character “•” as well.
  • Show exactly 15 words with the highest frequency.
  • Use the square root of each word's frequency as the measure of both the size and the color of the words in the visualization.
  • Use set.seed(5).
  • Use 8 colors from the Set1 palette in the brewer.pal() function to visualize the words.
  • The other parameters of wordcloud function should be random.order=FALSE and rot.per=0.35.
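The scraping and cleaning steps listed above can be sketched offline as follows. This is an illustration under stated assumptions, not a reference solution: it assumes the rvest and tm packages are installed, and it uses a small made-up HTML string (sampleHtml) in place of a real Wikipedia page so that it runs without a network connection. The variable names are hypothetical.

```r
library(rvest)  # read_html(), html_nodes(), html_text(), and the %>% operator
library(tm)     # corpus construction and cleaning

# Made-up HTML standing in for a downloaded page (assumption for the demo).
# \u2013 is the en dash and \u2022 is the bullet character.
sampleHtml <- paste0(
  "<html><body><div id='siteNotice'>Notice data</div>",
  "<div id='bodyContent'>",
  "<p>Data analysis \u2013 the process of inspecting data.</p>",
  "<p>\u2022 Data may also be used in 2019 models; data models can help.</p>",
  "</div></body></html>"
)

# Custom stop words copied from the list above.
myStopWords <- c("may", "now", "also", "many", "use", "used", "typically", "given",
                 "like", "will", "can", "often", "see", "one", "pdf", "issn", "journal",
                 tolower(month.name))

# Keep only the text inside the element with id "bodyContent".
pageText <- read_html(sampleHtml) %>%
  html_nodes("#bodyContent") %>%
  html_text()

# Clean the corpus: lower case, drop the en dash and bullet, then numbers,
# punctuation, stop words, and extra whitespace.
corpus <- VCorpus(VectorSource(pageText)) %>%
  tm_map(content_transformer(tolower)) %>%
  tm_map(content_transformer(function(x) gsub("[\u2013\u2022]", "", x))) %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(removeWords, stopwords("english")) %>%
  tm_map(removeWords, myStopWords) %>%
  tm_map(stripWhitespace)

# Count word frequencies and keep at most the 15 most frequent words.
freq <- sort(rowSums(as.matrix(TermDocumentMatrix(corpus))), decreasing = TRUE)
top15 <- head(freq, 15)

# Square root of the frequency, to be used for both size and color.
wordSize <- sqrt(top15)
print(wordSize)
```

From here, names(top15) and wordSize could be passed to the wordcloud function together with set.seed(5), the Set1 colors, and the random.order and rot.per settings listed above. Note that html_text() joins adjacent paragraphs without inserting a space, so on a real page some words at paragraph boundaries may fuse together.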

Examples

Data Mining

wikiWebScraper("Data_mining")

Data Analysis

wikiWebScraper("Data_analysis")

Big Data

wikiWebScraper("Big_data")

Data Wrangling

wikiWebScraper("Data_wrangling")

Data Visualization

wikiWebScraper("Data_visualization")

Data Science

wikiWebScraper("Data_science")

Conclusion

Jiawei Xia

6/13/2019