library(rvest)      # read_html(), html_elements(), html_text2()
library(tidyverse)  # %>%, ggplot2, tibble()

# Ten copies of a URL that redirects to a random Wikipedia article
wikis <-
  rep("http://asayanalytics.com/bored", 10)

# Empty vector to hold the harvested page titles
wiki_pages <-
  c()
for (i in seq_along(wikis)) {
  # Grab each page's title: the text of its <h1> element
  wiki_pages[i] <-
    wikis[i] %>%
    read_html() %>%
    html_elements("h1") %>%
    html_text2()
  Sys.sleep(1)  # pause one second between requests to be polite
}
wiki_pages

Wiki Activity
I pulled multiple Wikipedia pages at random through a looping function. The process used rep() to build a vector of ten copies of a URL that redirects to a random Wikipedia page, then looped over that object to harvest each page's title. A vectorized alternative is sketched below.
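The same harvest can be written without an explicit for loop. This is a minimal sketch using purrr's map_chr(), not the original code; it assumes the wikis vector above and that each page carries a single relevant <h1> (html_element() keeps only the first match):

library(rvest)   # read_html(), html_element(), html_text2(); also re-exports %>%
library(purrr)   # map_chr()

# One title string per URL, with the same one-second courtesy pause
wiki_pages <- map_chr(wikis, function(url) {
  Sys.sleep(1)
  url %>%
    read_html() %>%
    html_element("h1") %>%   # first <h1> only, so the result has length one
    html_text2()
})

map_chr() guarantees a character vector the same length as wikis, which removes the need to grow wiki_pages inside a loop.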
Here is a visual of the distribution of the ten Wikipedia page title lengths, starting from the raw character counts:
wiki_pages %>% nchar()
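A quick numeric summary can complement the histogram below. A small sketch, assuming the wiki_pages vector built above:

library(magrittr)  # %>%

# Five-number summary (plus mean) of title lengths, in characters
wiki_pages %>%
  nchar() %>%
  summary()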
tibble(title = wiki_pages) %>%
  ggplot(aes(x = nchar(title))) +
  geom_histogram() +
  labs(title = "Character Count for Wikipedia titles",
       x = "Number of Characters",
       y = "Number of Pages")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.