Proxy Wars - Wiki Scrape

Author

Michael Davies

Published

June 5, 2023

Proxy wars

On a blissful Sunday afternoon, I was listening to a radio interview on the Proxy War Project out of Virginia Tech. Then I said to myself, "Gee, I wonder if I could find data on that?"

The first thing I found was the Wikipedia page. Probably not the most comprehensive source, but it has a nice feature: the page sorts proxy wars into a typology, with one table per category. So, I decided to scrape it and start some EDA.

I made very little effort to clean the data or make this page pretty. This is just a quick and dirty loop across the categories of wars: a quick clean and a plot for each, to see what we've got. A lot could be done with this data if you're so inclined…

Scraping Wikipedia

# libraries
library(rvest)
library(tidyverse)

# Scrape the tables from the page
proxy_tables <- 
  read_html('https://en.wikipedia.org/wiki/List_of_proxy_wars') %>%
  html_table(fill = TRUE) %>% 
  set_names(
    c(
      'Caveat',
      'Series', 
      'Pre-World War I proxy wars', 
      'Inter-war period proxy wars',
      'Cold War proxy wars',
      'Modern proxy wars',
      'Ongoing proxy wars'
    )) 

The Wikipedia page displays the following caveat:

This article or section appears to be slanted towards recent events. Please try to keep recent events in historical perspective and add more content related to non-recent events. (October 2022) (Learn how and when to remove this template message)
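The set_names() call above assumes the page returns exactly seven tables in that order, which may stop being true whenever the page is edited. A minimal sketch of a guard, run here against a made-up two-table HTML snippet so it works offline (the snippet, the `expected_names` vector, and the category labels are placeholders, not part of the original scrape):

```r
library(rvest)
library(purrr)

# Hypothetical stand-in for the Wikipedia page: two tiny tables
page <- minimal_html('
  <table>
    <tr><th>War</th><th>Dates</th></tr>
    <tr><td>Example War A</td><td>1950-1953</td></tr>
  </table>
  <table>
    <tr><th>War</th><th>Dates</th></tr>
    <tr><td>Example War B</td><td>1979-1989</td></tr>
  </table>
')

# html_table() on a whole document returns a list with one tibble per <table>
tables <- html_table(page)

# Placeholder labels; the real scrape uses the seven names listed above
expected_names <- c("First category", "Second category")

# Fail loudly if the number of tables no longer matches the labels
stopifnot(length(tables) == length(expected_names))

named_tables <- set_names(tables, expected_names)
names(named_tables)
```

For the real page, the same check would go between html_table() and set_names(), with the seven names in place of the placeholders.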

Cleaning

I did some superficial cleaning by building a loop (purrr::map()) that iterates across the categories of proxy wars, which are currently stored as a list of tables. Within the loop, I split each date range into a start year and an end year, then plotted each category to look at the durations.

setdiff(
  proxy_tables %>% 
    names(), 
  c('Caveat', 'Series')) %>% 
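  # iterate over each category table: clean it, then plot it
  map(
    ~ proxy_tables[[.x]] %>% 
      tibble() %>% 
      janitor::clean_names() %>% 
      # split the dates column into start and end on the en dash
      separate(
        dates, 
        into = c('start_year', 'end_year'), 
        sep = "–") %>%
      # keep the first four-digit year in each piece; pieces without one become NA
      mutate(across(
        start_year:end_year,
        ~ as.numeric(str_extract(.x, '[0-9]{4}')))) %>% 
      # drop trailing footnote markers such as "[1]"
      mutate(war = str_remove(war, "\\[.*\\]$")) %>% 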
      ggplot(aes(
        x = war, 
        y = start_year)) +
      geom_segment(aes(
        xend = war, 
        yend = end_year), 
        color = "royalblue",
        alpha = 0.8) +
      # year start
      geom_point(color = "tomato", 
                 size = 3,
                 alpha = 0.8) +
      # year end
      geom_point(aes(
        y = end_year), 
        color = "royalblue", 
        size = 3,
        alpha = 0.8) +  
      coord_flip() +
      labs(
        x = "", 
        y = "",
        title = .x) )
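[Output: a list of five plots, one per category (Pre-World War I, Inter-war period, Cold War, Modern, and Ongoing proxy wars). Each plot lists the wars along one axis and shows the start year (tomato point) and end year (royal blue point) joined by a line segment.]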
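To see what the cleaning steps do without hitting Wikipedia, here is a small sketch on a made-up table. The war names and dates are invented for illustration, and the `duration` column is an addition that is not in the code above. The en dash is written as "\u2013" so the example does not depend on file encoding.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical rows that mimic the shape of a scraped table (not real data)
toy <- tibble::tibble(
  war   = c("Example War A[1]", "Example War B[2][3]", "Example War C"),
  dates = c("1950\u20131953", "12 March 1979 \u2013 1989", "2011\u2013present")
)

toy_clean <- toy %>%
  # split on the en dash into a start piece and an end piece
  separate(dates, into = c("start_year", "end_year"), sep = "\u2013") %>%
  mutate(
    # keep the first four-digit year in each piece; "present" has none, so it becomes NA
    across(start_year:end_year, ~ as.numeric(str_extract(.x, "[0-9]{4}"))),
    # strip trailing footnote markers like "[1]" or "[2][3]"
    war = str_remove(war, "\\[.*\\]$"),
    # assumed extra step: duration in years, NA when there is no end year
    duration = end_year - start_year
  )

toy_clean
```

Note that a war with no parsable end year gets an NA duration and, in the plots, no segment or end point. Filling in the current year for such rows would be one way to show them, at the cost of hiding the difference between "ongoing" and "unparsed".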