In-Class Activity

Wiki Scraping

I scraped 10 random wiki pages off the web and put the results in a data frame to get the name of each article.

I did this by using my professor’s weird random wiki link 10 times and looping my data-harvesting task over each one.
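As a rough sketch of the setup, the list of links could be built by repeating one random-article URL 10 times. The URL here is an assumption (Wikipedia's `Special:Random` redirect), not necessarily the link my professor gave us, and `random_link` is just a name picked for illustration.

```r
# Assumed stand-in for the professor's random wiki link:
# Wikipedia's Special:Random redirects to a different article on each request.
random_link <- "https://en.wikipedia.org/wiki/Special:Random"

# Repeat the same link 10 times; each request should land on a new page.
wikis <- rep(random_link, 10)

# Empty character vector with one slot per page title, filled in by the loop.
wiki_pages <- character(length(wikis))
```

Since every entry in `wikis` is the same URL, the variety comes from the redirect, so re-running the loop gives different articles each time.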

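library(rvest)  # provides read_html(), html_element(), html_text2(), and %>%

# `wikis` is assumed to be a character vector holding the 10 random wiki URLs
wiki_pages <- character(length(wikis))  # one slot per article title

for (i in seq_along(wikis)) {
  wiki_pages[i] <-
    wikis[i] %>%
    read_html() %>%          # download and parse the page
    html_element("h1") %>%   # first <h1>, which holds the article title
    html_text2()             # pull out the cleaned-up text

  Sys.sleep(1)  # wait a second between requests to go easy on the server
}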

With the data frame in place, I also made a cool visual showing the number of characters in each wiki page.
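Here is a minimal sketch of how the data frame and the visual might fit together. It assumes the character count is taken over the article names and uses base R's `barplot()`, which may not be the plotting approach I actually used. The titles are made-up placeholders rather than real scraped results, and `wiki_df` and `n_chars` are names chosen for illustration.

```r
# Placeholder titles standing in for the scraped results (hypothetical values)
wiki_pages <- c("Example article one", "Second example", "Third")

# One row per article: the title and its length in characters
wiki_df <- data.frame(
  title = wiki_pages,
  n_chars = nchar(wiki_pages),
  stringsAsFactors = FALSE
)

# Horizontal bar chart of the character counts; las = 1 keeps labels horizontal
barplot(
  wiki_df$n_chars,
  names.arg = wiki_df$title,
  horiz = TRUE,
  las = 1,
  xlab = "Number of characters"
)
```

With real scraped titles, long names may get clipped in the left margin, so widening it with `par(mar = ...)` before plotting could help.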