Random Wikipedia Pages

Author

Ender Wiggen

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

df <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/lynchs6_xavier_edu/EbG6CWV6Y29KvGGZq3IrYnoB6pbQKL8girxwgr0baWzUyA?download=1")

New names:
• `` -> `...1`
Rows: 10 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Random_Bits_and_Bobs
dbl (1): ...1

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Some poor sod forgot to leave out the row numbers when writing the CSV, so I am removing them here
df <- df[, -1]

head(df)

# A tibble: 6 × 1
  Random_Bits_and_Bobs
  <chr>               
1 Corma tamosi        
2 Savannah Bananas    
3 Tojeiro EE          
4 Molk-e Daraq        
5 Mounton             
6 Sherwood Dam

P.S. The poor sod from the comment was me.
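For next time, here is a minimal sketch of how the stray row-number column could be avoided at write time. It assumes the file was written with base R's `write.csv()`, which the notebook does not actually confirm. The `titles` data frame and the temporary file path are hypothetical stand-ins, not the real data or file.

```r
# Hypothetical stand-in data: three of the titles shown in head(df)
titles <- data.frame(
  Random_Bits_and_Bobs = c("Corma tamosi", "Savannah Bananas", "Tojeiro EE")
)

# Temporary file used only for this illustration
path <- tempfile(fileext = ".csv")

# write.csv() adds a row-number column by default;
# row.names = FALSE leaves it out
write.csv(titles, path, row.names = FALSE)

# Reading the file back gives a single column, so nothing needs dropping
reread <- read.csv(path)
names(reread)
```

Written this way, the `df <- df[, -1]` step would no longer be needed after reading the file.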

df %>% 
  ggplot(aes(x=nchar(Random_Bits_and_Bobs))) +
  geom_histogram()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
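The message above is ggplot2 asking for an explicit bin choice. Since title lengths are whole numbers, one reasonable option is `binwidth = 1`, which gives one bar per character count. The sketch below uses only the six titles visible in `head(df)` rather than the full 10-row file; the `titles` data frame, the `title_length` column, and the axis labels are names introduced here for illustration.

```r
library(ggplot2)

# Only the six titles visible in head(df); the real data has 10 rows
titles <- data.frame(
  Random_Bits_and_Bobs = c("Corma tamosi", "Savannah Bananas", "Tojeiro EE",
                           "Molk-e Daraq", "Mounton", "Sherwood Dam")
)

# Compute the length once so the axis can carry a readable label
titles$title_length <- nchar(titles$Random_Bits_and_Bobs)

p <- ggplot(titles, aes(x = title_length)) +
  geom_histogram(binwidth = 1) +  # one bar per character count
  labs(x = "Title length (characters)", y = "Number of titles")

p
```

With only 10 titles, the default of 30 bins leaves most bars empty, so an explicit `binwidth` also makes the plot easier to read.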

This is such an interesting histogram. Why wouldn't you want to know how the character counts are distributed across a random selection of Wikipedia titles?
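If the histogram alone is not enough, the same spread can be put into numbers. This is a rough sketch using base R on the six titles visible in `head(df)`, not the full data set; `title_length` is a name made up for this example.

```r
# The six titles visible in head(df); the real data has 10 rows
titles <- c("Corma tamosi", "Savannah Bananas", "Tojeiro EE",
            "Molk-e Daraq", "Mounton", "Sherwood Dam")

# Number of characters in each title (spaces and hyphens count)
title_length <- nchar(titles)

# Shortest and longest title, average length, and standard deviation
length_range <- range(title_length)
length_mean  <- mean(title_length)
length_sd    <- sd(title_length)

summary(title_length)
```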