library(rvest)      # read_html(), html_elements(), html_text2()
library(tidyverse)  # %>%, ggplot2, tibble()

# Ten copies of a URL that redirects to a random Wikipedia article
wikis <-
  rep("http://asayanalytics.com/bored", 10)

# Empty vector to hold the harvested page titles
wiki_pages <-
  c()
for (i in seq_along(wikis)) {
  # Grab each page's title: the text of its <h1> element
  wiki_pages[i] <-
    wikis[i] %>%
    read_html() %>%
    html_elements("h1") %>%
    html_text2()
  Sys.sleep(1)  # pause one second between requests to be polite
}
wiki_pages

Wiki Activity
I pulled multiple Wikipedia pages at random through a looping function. The process used rep() to build a vector of ten copies of a URL that redirects to a random Wikipedia page, then looped over that object to harvest each page's title. A vectorized alternative is sketched below.
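The same harvest can be written without an explicit for loop. This is a minimal sketch using purrr's map_chr(), not the original code; it assumes the wikis vector above and that each page carries a single relevant <h1> (html_element() keeps only the first match):

library(rvest)   # read_html(), html_element(), html_text2(); also re-exports %>%
library(purrr)   # map_chr()

# One title string per URL, with the same one-second courtesy pause
wiki_pages <- map_chr(wikis, function(url) {
  Sys.sleep(1)
  url %>%
    read_html() %>%
    html_element("h1") %>%   # first <h1> only, so the result has length one
    html_text2()
})

map_chr() guarantees a character vector the same length as wikis, which removes the need to grow wiki_pages inside a loop.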
Here is a visual of the distribution of the ten Wikipedia page title lengths, starting from the raw character counts:
wiki_pages %>% nchar()
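A quick numeric summary can complement the histogram below. A small sketch, assuming the wiki_pages vector built above:

library(magrittr)  # %>%

# Five-number summary (plus mean) of title lengths, in characters
wiki_pages %>%
  nchar() %>%
  summary()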
tibble(title = wiki_pages) %>%
  ggplot(aes(x = nchar(title))) +
  geom_histogram() +
  labs(title = "Character Count for Wikipedia titles",
       x = "Number of Characters",
       y = "Number of Pages")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.