BU.520.650.Su19: Assignment 2
Description
The main goal of this assignment is to practice wordcloud visualization, text mining, web scraping, R Markdown HTML output, the %>% operator, and building R functions.
In this assignment, you are asked to write an R function named wikiWebScraper that returns a wordcloud visualization of the words in the content of the Wikipedia page for a given keyword. For example, wikiWebScraper("Data_analysis") returns a wordcloud visualization built from https://en.wikipedia.org/wiki/Data_analysis.
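As a minimal sketch of the keyword-to-URL step described above, the helper below builds the page address that wikiWebScraper would read from. The name wikiUrl and the replacement of spaces with underscores are assumptions made for illustration; they are not part of the assignment text.

```r
# Hypothetical helper (not part of the assignment): map a keyword to the
# address of its English Wikipedia page.
wikiUrl <- function(keyword) {
  # Assumption: spaces in the keyword are written as underscores,
  # which is how Wikipedia page titles appear in URLs.
  keyword <- gsub(" ", "_", keyword, fixed = TRUE)
  paste0("https://en.wikipedia.org/wiki/", keyword)
}

print(wikiUrl("Data_analysis"))
```

Keeping the URL construction in its own small function makes it easy to check the address before any scraping is attempted.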
Note: In order to produce a page similar to this page, you should do the following:
Use output: rmdformats::readthedown to change the theme of the page.
You can use code-element or mark-element to make some parts of a text in R Markdown highlighted.
In order to include code, but not have it run, you should set eval=FALSE.
Get the content of the element with the attribute ID="bodyContent" using the following code:
Convert every word to lower case.
Remove the numbers, punctuation, whitespace, and English stop words from the corpus.
Remove the following custom stop words from the corpus as well:
    myStopWords <- c("may", "now", "also", "many", "use", "used", "typically", "given",
                     "like", "will", "can", "often", "see", "one", "pdf", "issn", "journal",
                     tolower(month.name))
Remove the character “–” using the following code:
    %>% tm_map(content_transformer(function(x) {
      x %>% gsub(pattern = "–", replacement = "")
    }))
Remove the character “•” as well.
Show exactly 15 words with the highest frequency.
Use the sqrt of frequency as the measure of both size and color of the words in the visualization.
Use set.seed(5).
Use 8 colors from the Set1 palette in the brewer.pal() function to visualize the words.
The other parameters of the wordcloud function should be random.order=FALSE and rot.per=0.35.
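The cleaning and plotting steps above can be sketched as follows. This is only an illustration under stated assumptions: the sampleText vector stands in for scraped page content, the names removeChars, corpus, freq, and top are hypothetical, and min.freq = 0 is added because the default min.freq of wordcloud() is 3, which could otherwise filter out words once their frequencies are square-rooted.

```r
library(tm)
library(wordcloud)
library(RColorBrewer)
library(magrittr)

# Assumption: a small in-memory text replaces the scraped Wikipedia content.
sampleText <- c(
  "Data analysis may also be used in January 2019 for inspecting, cleansing and modelling data.",
  "Data mining is a particular data analysis technique • focused on statistical modelling – and discovery.",
  "Business intelligence covers data analysis that relies heavily on aggregation and business information."
)

myStopWords <- c("may", "now", "also", "many", "use", "used", "typically", "given",
                 "like", "will", "can", "often", "see", "one", "pdf", "issn", "journal",
                 tolower(month.name))

# Hypothetical transformer: delete each listed character literally.
removeChars <- content_transformer(function(x, chars) {
  for (ch in chars) x <- gsub(ch, "", x, fixed = TRUE)
  x
})

corpus <- VCorpus(VectorSource(sampleText)) %>%
  tm_map(content_transformer(tolower)) %>%          # lower case
  tm_map(removeChars, c("–", "•")) %>%              # special characters
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(removeWords, c(stopwords("english"), myStopWords)) %>%
  tm_map(stripWhitespace)

# Word frequencies over the whole corpus, highest first; keep the top 15.
freq <- sort(rowSums(as.matrix(TermDocumentMatrix(corpus))), decreasing = TRUE)
top <- head(freq, 15)

set.seed(5)
wordcloud(words = names(top), freq = sqrt(top), min.freq = 0,
          random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Set1"))
```

Because wordcloud() assigns colors from the least to the most frequent word, passing sqrt(top) as freq drives both the size and the color of each word.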
Examples
Conclusion
[Compare all six visualizations above and write two to three paragraphs about them!]