Wiki Titles Scraping

Author

J Markley

Load Libraries and Import the Data

library(tidyverse) # The tidyverse collection of packages
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
qwiki <- 
  read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/markleyj3_xavier_edu/EQhkckijJMZOmpqmctNlf9YBvNWCPWVwNHdEEzKPmLrRkg?download=1")
Rows: 10 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): title

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
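The messages above suggest declaring the column types or setting `show_col_types = FALSE`. As a minimal sketch of that option, the example below reads a small inline CSV instead of the SharePoint file; the two titles and the name `demo` are made up for illustration and are not part of the real data.

```r
library(readr)

# Hypothetical inline data standing in for the real CSV (illustration only).
# I() tells read_csv() to treat the string as literal data, not a file path.
demo <- read_csv(
  I("title\nFirst page\nSecond page"),
  col_types = cols(title = col_character()),  # declare the type explicitly
  show_col_types = FALSE                      # suppress the column specification message
)

demo
```

The same two arguments could be passed to the `read_csv()` call that creates `qwiki`, assuming the file keeps its single `title` column.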

I used mutate() to add a new column, length, that holds the number of characters in each title.

The titles are then plotted as a horizontal bar chart, ordered from shortest to longest.
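As a minimal sketch of the character count described above, the example below applies the same `mutate()` and `nchar()` step to three made-up titles. The tibble `toy` and its contents are hypothetical and only stand in for `qwiki`.

```r
library(dplyr)

# Hypothetical titles, used only for illustration (not the real qwiki data)
toy <- tibble(title = c("R", "Web scraping", "Data"))

toy_lengths <- toy %>%
  mutate(length = nchar(title)) %>%  # nchar() counts every character, spaces included
  arrange(length)                    # shortest title first

toy_lengths
```

Note that `nchar()` counts spaces and punctuation, so "Web scraping" has a length of 12.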

qwiki %>%
  mutate(length = nchar(title)) %>%  # character count per title
  ggplot(aes(x = reorder(title, length), y = length)) +
  geom_col(fill = "darkblue") +
  coord_flip() +  # horizontal bars keep long titles readable
  labs(title = "Length of Titles",
       x = "Wiki Page Title",
       y = "Number of Characters") +
  theme_minimal() +
  theme(
    # linewidth replaces the size argument deprecated in ggplot2 3.4.0
    panel.grid.major = element_line(color = "gray40", linewidth = 0.5),
    panel.grid.minor = element_line(color = "gray70", linewidth = 0.10),
    panel.background = element_rect(fill = "gray70")
  )