BU.520.650.Su19: Assignment 2

Description

The main goal of this assignment is to practice word cloud visualization, text mining, web scraping, R Markdown HTML output, the %>% operator, and writing R functions.

In this assignment, you are asked to write an R function named wikiWebScraper that returns a word cloud visualization of the words in the content of the Wikipedia page for a given keyword. For example, wikiWebScraper("Data_analysis") returns a word cloud built from https://en.wikipedia.org/wiki/Data_analysis.
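As a starting point, the mapping from keyword to page address could be sketched as below. The helper name wikiUrl is hypothetical and not part of the assignment; the replacement of spaces by underscores is an added convenience, based on how Wikipedia writes page titles in its URLs.

```r
# Hypothetical helper (not required by the assignment): build the
# English Wikipedia URL for a keyword such as "Data_analysis".
wikiUrl <- function(keyword) {
  # Assumption: spaces in the keyword are written as underscores in the URL.
  title <- gsub(" ", "_", keyword, fixed = TRUE)
  paste0("https://en.wikipedia.org/wiki/", title)
}

print(wikiUrl("Data_analysis"))
```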

Note: To produce a page similar to this one, do the following:

  • Use output: rmdformats::readthedown to change the theme of the page.

  • You can use the code element or the mark element to highlight parts of the text in R Markdown.

  • To include code without running it, set eval=FALSE in the chunk options.

  • Get the content of the element with the attribute ID=“bodyContent” using the following code:

  • Convert every word to lower case.

  • Remove numbers, punctuation, whitespace, and English stop words from the corpus.

  • Remove the following StopWords from the corpus.

  • Remove the character “–” using the following code:

  • Remove the character “•” as well.

  • Show exactly 15 words with the highest frequency.

  • Use the square root of the frequency to determine both the size and the color of the words in the visualization.

  • Use set.seed(5).

  • Use 8 colors from the Set1 palette in the brewer.pal() function to visualize the words.

  • Set the other parameters of the wordcloud function to random.order=FALSE and rot.per=0.35.
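The steps in the list above can be put together roughly as follows. This is a minimal sketch, not the reference solution: it assumes the rvest, tm, wordcloud, and RColorBrewer packages are installed, and the helper names cleanCorpus and topWordFreq, the extraStopwords argument (a placeholder for the additional stop word list), the gsub() pattern used for “–” and “•”, and min.freq = 1 are all assumptions made for illustration.

```r
library(rvest)         # read_html(), html_node(), html_text(), %>%
library(tm)            # VCorpus(), tm_map(), TermDocumentMatrix()
library(wordcloud)     # wordcloud()
library(RColorBrewer)  # brewer.pal()

# Hypothetical helper: turn raw page text into a cleaned tm corpus.
cleanCorpus <- function(text, extraStopwords = character(0)) {
  # Assumption: replace the en dash (U+2013) and the bullet (U+2022) with spaces.
  dropChars <- content_transformer(function(x) gsub("[\u2013\u2022]", " ", x))
  corpus <- VCorpus(VectorSource(text))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, dropChars)
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  if (length(extraStopwords) > 0) {
    corpus <- tm_map(corpus, removeWords, extraStopwords)
  }
  corpus <- tm_map(corpus, removePunctuation)
  tm_map(corpus, stripWhitespace)
}

# Hypothetical helper: the n most frequent terms as a named numeric vector.
# Note: TermDocumentMatrix() drops terms shorter than 3 characters by default.
topWordFreq <- function(corpus, n = 15) {
  tdm <- TermDocumentMatrix(corpus)
  freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
  head(freq, n)
}

wikiWebScraper <- function(keyword, extraStopwords = character(0)) {
  # Text of the element whose id is "bodyContent".
  text <- paste0("https://en.wikipedia.org/wiki/", keyword) %>%
    read_html() %>%
    html_node("#bodyContent") %>%
    html_text()
  freq <- topWordFreq(cleanCorpus(text, extraStopwords), 15)
  set.seed(5)
  # sqrt(freq) drives both size and color; min.freq = 1 is an assumption
  # so that none of the 15 words is dropped by the default threshold of 3.
  wordcloud(words = names(freq), freq = sqrt(freq), min.freq = 1,
            random.order = FALSE, rot.per = 0.35,
            colors = brewer.pal(8, "Set1"))
}
```

With these definitions, calling wikiWebScraper("Data_analysis") in a session with internet access would draw the word cloud for that page. Stop words are removed before punctuation here so that contractions such as "don't" still match the English stop word list; a different order is equally defensible.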

Conclusion

[Compare all six visualizations above and write two to three paragraphs about them.]

YourName

6/8/2019