BU.520.650.Su19: Assignment 2
Description
The main goal of this assignment is to practice wordcloud visualization, text mining, web scraping, R Markdown HTML output, the %>% operator, and building R functions.
In this assignment, you are asked to write an R function named wikiWebScraper that returns a wordcloud visualization of the words in the content of the Wikipedia page for a given keyword. For example, wikiWebScraper("Data_analysis") returns a wordcloud visualization built from https://en.wikipedia.org/wiki/Data_analysis.
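The scraping step described above could be sketched roughly as follows. This assumes the rvest package is installed; the helper names wikiUrl and wikiBodyText are hypothetical and not part of the assignment, and the selector relies on the article body sitting in the element with id bodyContent.

```r
library(rvest)  # read_html(), html_nodes(), html_text(); also re-exports %>%

# Hypothetical helper: build the article URL from a keyword such as "Data_analysis".
wikiUrl <- function(keyword) {
  paste0("https://en.wikipedia.org/wiki/", keyword)
}

# Hypothetical helper: download the page and return the text inside #bodyContent.
# Nothing is downloaded until this function is actually called.
wikiBodyText <- function(keyword) {
  wikiUrl(keyword) %>%
    read_html() %>%
    html_nodes("#bodyContent") %>%
    html_text()
}
```

Calling wikiBodyText("Data_analysis") needs network access and should return the page text as a character vector. Recent rvest versions also offer html_elements() as the newer name for html_nodes(); either should work here.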
Note: To produce a page similar to this one, you should do the following:
- Use `output: rmdformats::readthedown` to change the theme of the page.
- You can use a code element or a mark element to highlight parts of the text in R Markdown.
- To include code without running it, set `eval=FALSE`.
- Get the content of the element with the attribute `id="bodyContent"` using the following code:
    %>% html_nodes("#bodyContent")
- Convert every word to lower case.
- Remove numbers, punctuation, extra whitespace, and English stop words from the corpus.
- Remove the following custom stop words from the corpus:
    myStopWords <- c("may", "now", "also", "many", "use", "used", "typically", "given",
                     "like", "will", "can", "often", "see", "one", "pdf", "issn", "journal",
                     tolower(month.name))
- Remove the character "–" using the following code:
    %>% tm_map(content_transformer(function(x) {
      x %>% gsub(pattern = "–", replacement = "") %>% return}))
- Remove the character "•" as well.
- Show exactly 15 words with the highest frequency.
- Use the `sqrt` of frequency as the measure of both size and color of the words in the visualization.
- Use `set.seed(5)`.
- Use 8 colors from the `Set1` palette in the `brewer.pal()` function to visualize the words.
- The other parameters of the `wordcloud` function should be `random.order=FALSE` and `rot.per=0.35`.
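The cleaning and plotting requirements above can be sketched as below. This is one possible arrangement, not a reference solution: it assumes the tm, wordcloud, RColorBrewer, and magrittr packages are installed, and the helper names dropChar, topWords, and plotTopWords are hypothetical. The order of the cleaning steps is also an assumption; stop words are removed before punctuation so that entries containing apostrophes still match.

```r
library(tm)            # VCorpus(), tm_map(), TermDocumentMatrix()
library(wordcloud)     # wordcloud()
library(RColorBrewer)  # brewer.pal()
library(magrittr)      # %>%

myStopWords <- c("may", "now", "also", "many", "use", "used", "typically", "given",
                 "like", "will", "can", "often", "see", "one", "pdf", "issn", "journal",
                 tolower(month.name))

# Hypothetical helper: transformer that deletes one literal character.
dropChar <- function(ch) {
  content_transformer(function(x) gsub(ch, "", x, fixed = TRUE))
}

# Hypothetical helper: clean raw text and return the n most frequent words.
topWords <- function(text, n = 15) {
  corpus <- VCorpus(VectorSource(text)) %>%
    tm_map(content_transformer(tolower)) %>%
    tm_map(removeWords, stopwords("english")) %>%
    tm_map(removeWords, myStopWords) %>%
    tm_map(removeNumbers) %>%
    tm_map(removePunctuation) %>%
    tm_map(dropChar("–")) %>%
    tm_map(dropChar("•")) %>%
    tm_map(stripWhitespace)
  # Term counts, sorted from most to least frequent.
  freq <- sort(rowSums(as.matrix(TermDocumentMatrix(corpus))), decreasing = TRUE)
  # head() keeps exactly n rows even when frequencies are tied.
  head(data.frame(word = names(freq), freq = unname(freq)), n)
}

# Hypothetical helper: draw the cloud from the data frame returned by topWords().
plotTopWords <- function(top) {
  set.seed(5)
  wordcloud(words = top$word, freq = sqrt(top$freq),
            min.freq = 0,  # the default of 3 would drop words whose sqrt(freq) < 3
            random.order = FALSE, rot.per = 0.35,
            colors = brewer.pal(8, "Set1"))
}
```

Passing sqrt(top$freq) as freq lets wordcloud() derive both the size and the color of each word from the square root of its count. With these helpers, plotTopWords(topWords(pageText)) would draw the cloud for a character vector pageText holding the scraped text. By default TermDocumentMatrix() ignores words shorter than three characters, which may or may not match the intended output.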
Examples
Data Mining
wikiWebScraper("Data_mining")
Data Analysis
wikiWebScraper("Data_analysis")
Big Data
wikiWebScraper("Big_data")
Data Wrangling
wikiWebScraper("Data_wrangling")
Data Visualization
wikiWebScraper("Data_visualization")
Data Science
wikiWebScraper("Data_science")