In-Class Activity: Web Scraping

Wikipedia Web Scraping

Fifteen random Wikipedia articles were scraped from the web, producing a data frame that records the name of each article.
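This activity does not show how the titles were collected. As a rough illustration only, the sketch below shows one way such a scrape might be done, assuming the rvest package and Wikipedia's `Special:Random` URL. The helpers `extract_title()` and `scrape_random_titles()` are hypothetical names introduced here, not part of the original activity.

```r
library(rvest)

# Hypothetical helper (not from the original activity): return the text of
# the first <h1> heading in an already-parsed HTML page.
extract_title <- function(page) {
  html_text2(html_element(page, "h1"))
}

# Hypothetical helper: request n random articles and collect their titles
# in a data frame with the same column name used later, wiki_pages.
scrape_random_titles <- function(n = 15) {
  titles <- character(n)
  for (i in seq_len(n)) {
    page <- read_html("https://en.wikipedia.org/wiki/Special:Random")
    titles[i] <- extract_title(page)
    Sys.sleep(1)  # pause between requests to avoid hammering the server
  }
  data.frame(wiki_pages = titles)
}

# Requires network access, so it is left commented out here:
# wiki_pages_df <- scrape_random_titles(15)
```

Because each call lands on a different random article, a scrape like this would not reproduce the same 15 titles, which is one reason to load a saved copy of the results instead.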

Running Code

Load the data set that stores the names of the scraped Wikipedia articles.

library(tidyverse)

wiki_pages_df <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/grotjana_xavier_edu/Eckm9F_hfGRIsMAcLU5aLD0Bur7Kry1oe9BNmf54yOzllA?download=1")

You should now have a data frame that you can use to make a histogram. The histogram below shows the distribution of character lengths across the article names.

# nchar() counts the characters in each article title
wiki_pages_df %>% 
  ggplot(aes(x = nchar(wiki_pages))) +
  geom_histogram()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
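The message above is ggplot2 noting that it fell back to its default of 30 bins, which is a lot for only 15 titles. A minimal sketch of setting the bin width explicitly follows. The width of 5 characters is an assumed value for illustration, and `example_df` is a hypothetical stand-in with made-up titles so the block runs without the download. In practice you would use `wiki_pages_df` in its place.

```r
library(tidyverse)

# Hypothetical stand-in for wiki_pages_df; these titles are placeholders,
# not the scraped data.
example_df <- tibble(
  wiki_pages = c("Alan Turing", "Ohio", "Web scraping", "Histogram", "Cincinnati")
)

title_hist <- example_df %>%
  mutate(title_length = nchar(wiki_pages)) %>%   # characters per title
  ggplot(aes(x = title_length)) +
  geom_histogram(binwidth = 5, boundary = 0) +   # bins 5 characters wide, edges at multiples of 5
  labs(x = "Characters in article title", y = "Number of articles")

title_hist
```

Setting `binwidth` also silences the message. Try a few widths and keep the one that shows the shape of the distribution without leaving most bins empty.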