On a blissful Sunday afternoon, I was listening to a radio interview on the Proxy War Project out of Virginia Tech. I said to myself: gee, I wonder if I could find data on that?
The first thing I found was the Wikipedia page. Probably not the most comprehensive, but it has a nice feature: The page creates a typology of proxy wars. So, I decided to scrape it and started some EDA.
I made very little effort to clean the data or make this page pretty. This is just a quick-and-dirty loop across the categories of wars, with a quick clean and a plot for each to see what we've got. A lot more could be done with this data if you're so inclined…
Scraping Wikipedia
# libraries
library(rvest)
library(tidyverse)

# Scrape the tables from the page
proxy_tables <- read_html('https://en.wikipedia.org/wiki/List_of_proxy_wars') %>%
  html_table(fill = TRUE) %>%
  set_names(c(
    'Caveat',
    'Series',
    'Pre-World War I proxy wars',
    'Inter-war period proxy wars',
    'Cold War proxy wars',
    'Modern proxy wars',
    'Ongoing proxy wars'
  ))
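To see what the scrape returns without hitting Wikipedia, here is a minimal sketch on a made-up page. The HTML, the war names, and the category names are all hypothetical. It only illustrates that html_table() called on a whole page gives back a list with one tibble per table, which set_names() then labels. The naming step in the real scrape assumes the page still has exactly seven tables in that order, so it is worth checking length(proxy_tables) if the page has changed.

```r
library(rvest)
library(tidyverse)

# A made-up page with two small tables (hypothetical content, not from Wikipedia)
toy_page <- minimal_html('
  <table>
    <tr><th>War</th><th>Dates</th></tr>
    <tr><td>Example War A</td><td>1950-1953</td></tr>
  </table>
  <table>
    <tr><th>War</th><th>Dates</th></tr>
    <tr><td>Example War B</td><td>2011-present</td></tr>
  </table>
')

# html_table() on a page returns a list with one tibble per <table>
toy_tables <- toy_page %>%
  html_table() %>%
  set_names(c('First category', 'Second category'))

names(toy_tables)
toy_tables[['First category']]
```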
The Wikipedia page displays the following caveat:
This article or section appears to be slanted towards recent events. Please try to keep recent events in historical perspective and add more content related to non-recent events. (October 2022) (Learn how and when to remove this template message)
Cleaning
I did some superficial cleaning by building a loop (purrr::map()) that iterates across the categories of proxy wars, which are currently stored in a list. Then I plotted each category to look at the durations.
setdiff(proxy_tables %>% names(), c('Caveat', 'Series')) %>%
  # iterate across
  map(~ proxy_tables[[.x]] %>%
    tibble() %>%
    janitor::clean_names() %>%
    separate(dates, into = c('start_year', 'end_year'), sep = "–") %>%
    mutate(across(
      start_year:end_year,
      ~ str_extract(.x, '[0-9]{4}') %>% as.numeric()
    )) %>%
    mutate(war = str_remove(war, "\\[.*\\]$")) %>%
    ggplot(aes(x = war, y = start_year)) +
    geom_segment(aes(xend = war, yend = end_year), color = "royalblue", alpha = 0.8) +
    # year start
    geom_point(color = "tomato", size = 3, alpha = 0.8) +
    # year end
    geom_point(aes(y = end_year), color = "royalblue", size = 3, alpha = 0.8) +
    coord_flip() +
    labs(x = "", y = "", title = .x)
  )
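To make the cleaning step easier to follow, here is a small sketch that runs the same separate / str_extract / str_remove sequence on two made-up rows. The war names and dates are hypothetical, chosen only to mimic the format of the scraped columns. It shows that a date range is split on the en dash, that only a four-digit year survives, and that an open-ended entry like "present" becomes NA, so such wars get a start point but no segment or end point in the plot.

```r
library(tidyverse)

# Hypothetical rows in the same shape as the cleaned-name columns (not real data)
toy_wars <- tibble(
  war   = c("Example War A[1]", "Example War B"),
  dates = c("1950–1953", "2011–present")
)

toy_clean <- toy_wars %>%
  # split the range on the en dash into two text columns
  separate(dates, into = c("start_year", "end_year"), sep = "–") %>%
  # keep the first four-digit run in each column; anything else becomes NA
  mutate(across(
    start_year:end_year,
    ~ as.numeric(str_extract(.x, "[0-9]{4}"))
  )) %>%
  # drop a trailing footnote marker such as [1]
  mutate(war = str_remove(war, "\\[.*\\]$"))

toy_clean
```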