Making Adjacency Matrices in R

The Pokemon Matrix

0. Introduction

Social Network Analysis typically relies on adjacency matrices: square matrices that encode relationships (edges) between individuals (nodes). However, if you grew up in the late 1990s like me, you might already know intuitively what an adjacency matrix does, even if you’ve never heard the term before. That’s because many (okay, some) kids of the late 1990s thought a lot about the relationships between Pokemon of different types.

Recently, I’ve been designing some network analysis R tutorials that work surprisingly well with Pokemon, so I decided to write up another tutorial describing how to build an adjacency matrix that describes how well each Pokemon will probably fare when matched up with another, depending on their respective types. If that sounds delightful, read on!
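To make the idea concrete, here is a minimal sketch of a weighted adjacency matrix for just three types, using the familiar "super effective" (2) and "not very effective" (0.5) multipliers. The object names are only for illustration and are not used later in the tutorial.

```r
# Three types, with rows as the attacking type and columns as the defending type
types <- c("Fire", "Water", "Grass")

adj <- matrix(
  c(0.5, 0.5, 2.0,   # Fire  attacking Fire, Water, Grass
    2.0, 0.5, 0.5,   # Water attacking Fire, Water, Grass
    0.5, 2.0, 0.5),  # Grass attacking Fire, Water, Grass
  nrow = 3, byrow = TRUE,
  dimnames = list(attacker = types, defender = types)
)

# Look up a single relationship (edge weight)
adj["Water", "Fire"]  # 2: Water is strong against Fire
```

Each cell is one directed edge, so the matrix does not need to be symmetric: Water hits Fire hard, but Fire does little to Water.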


Your tiny Starter Pokemon Nodes!


0.1 Type Relationships

In the Pokemon series, each type of Pokemon (e.g. Fire, Water, Grass, Electric, etc.) had different levels of weakness or resistance to moves of certain types. Fire was weak to Water; Water was weak to Electric; Grass was strong against Water, etc. Each Pokemon could have 1 to 2 types, predisposing them to be strong or weak against certain attacks, with an array of moves that (usually) matched those types. So, can we use

0.2 Source

For our exercise today, we’re going to - and I’m delighted to say this - build an adjacency matrix of Pokemon. This beautiful Kaggle dataset by Rounak Banik lists every Pokemon alongside its weakness to attacks of each type. (If you cite my tutorial, please be sure to cite their dataset too!) We can use it to create an edgelist pairing every Pokemon with every other, showing the expected damage inflicted by a moderate move (power = 40) given their types. We’re also going to rely on some images scraped from Bulbapedia.

This tutorial is entirely for fun, has no commercial value, and does not lay claim to any of the images or raw source data from Kaggle/Bulbapedia, which were used in keeping with a Creative Commons Attribution-NonCommercial-ShareAlike license.

0.3 Load Packages

To get started, please load the following packages!

library(tidyverse) # data wrangling
library(knitr) # viewing images
library(xml2) # webscraping
library(rvest) # webscraping
library(magick) # image properties



1. Get Images

1.1 Get Image URLs

First, we’re going to scrape a list of Pokemon images to use for later visualizations!

library(tidyverse)
library(xml2)
library(rvest)

# First, we're going to webscrape the image files for tiny images of each Pokemon.
# Source page, saved locally as pokedex.html:
# "https://bulbapedia.bulbagarden.net/wiki/List_of_Pok%C3%A9mon_by_National_Pok%C3%A9dex_number" 
read_html("pokedex.html") %>%
  xml_find_all(".//table") %>%
  xml_find_all(".//img") %>%
  with(
    tibble(
      # Extract names
      name =  xml_attr(., "alt"),
      # Get the URL for the miniature image
      image_mini = xml_attr(., "src") %>% str_replace(., "//archives", "https://archives"))) %>%
  # Add a unique ID
  mutate(id = str_extract(image_mini, "[/][0-9]{3}.*") %>% str_extract("[0-9]{3}") %>% parse_number()) %>%
  # Build a link for the bigger image too.
  mutate(page = name %>% paste("https://bulbapedia.bulbagarden.net/wiki/", 
                                    ., "_(Pok%C3%A9mon)", sep = "")) %>%
  write_csv("pokedex.csv")
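The id extraction above chains two regular expressions, so it may help to see it on its own. This is a small sketch using made-up URLs (the `archives.example.org` addresses are hypothetical and only mimic the shape of an image path).

```r
library(stringr)
library(readr)

# Hypothetical image URLs, made up for illustration only
urls <- c("https://archives.example.org/media/upload/a/a1/001MS.png",
          "https://archives.example.org/media/upload/b/b2/025MS.png")

# Step 1: keep everything from the first slash that is followed by three digits
tail_part <- str_extract(urls, "[/][0-9]{3}.*")

# Step 2: keep those three digits and convert them to a number
ids <- parse_number(str_extract(tail_part, "[0-9]{3}"))

ids  # 1 25
```

Note that the pattern keeps only the first three digits it finds, so a four-digit number would be truncated.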

1.2 Scrape Images (in Batches as Able)

Next, we’re going to download the images from Bulbapedia. (Note: this gets kind of hard, and needs to be done in small batches.)

library(magick)

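test <- read_csv("pokedex.csv")

# Create the output folder (no warning if it already exists)
dir.create("images", showWarnings = FALSE)
n <- length(test$image_mini)

for(i in seq_len(n)){
  # Print progress
  print(i)
  
  test$image_mini[i] %>% 
    image_read() %>%
    # Make the white background transparent
    image_transparent(color = "white") %>%
    image_write(
      path = paste("images/", test$id[i], ".png", sep = ""), 
      format = "png", quality = 100)  
  
  # Pause between requests to go easy on the server
  Sys.sleep(time = 1.1)
}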

rm(list=ls())
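One way to work in small batches is to split the row indices into chunks and loop over one chunk at a time. The helper below is a sketch under that assumption; `make_batches` and the batch size of 50 are made up for illustration, not part of the original workflow.

```r
# Split row indices 1..n into consecutive batches of at most `batch_size`
make_batches <- function(n, batch_size = 50) {
  idx <- seq_len(n)
  split(idx, ceiling(idx / batch_size))
}

batches <- make_batches(n = 120, batch_size = 50)
lengths(batches)  # batch sizes: 50, 50, 20

# To resume after an interruption, keep only ids without a saved file yet
# (ids 1 to 5 are hypothetical here)
ids <- 1:5
todo <- ids[!file.exists(file.path("images", paste0(ids, ".png")))]
```

You could then run the download loop over `batches[[1]]`, check the results, and move on to the next batch.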

2. Build Node Dataset

Next, to build our dataset, we first need to grab the set of nodes in our network.

# Reading in our Kaggle dataset,
read_csv("pokemon.csv") %>% 
  # Let's grab the pokemon id, name, generation, capture rate,
  select(name, id = pokedex_number, generation, capture_rate, type1, type2,
         # their base hp, attack, defense, sp_attack, sp_defense, speed, and an indicator of whether or not they are legendary
         hp, attack, defense, sp_attack, sp_defense, speed, legendary = is_legendary) %>%
  mutate(capture_rate = parse_number(capture_rate)) %>%
  # Distill this into distinct entries
  distinct() %>%
  # Last, join in the image file paths
  left_join(by = c("id", "name"), 
            y = read_csv("pokedex.csv") %>% 
              mutate(image = paste("images/", id, ".png", sep = "")) %>%
              select(id, name, image) %>%
              distinct()) %>%
  # and save to file
  write_csv("nodes.csv")
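Because the join matches on both id and name, any spelling difference between the two sources would silently leave the image column empty. Here is a small sketch of how one might check for that with toy data; the mismatch below is invented for illustration and says nothing about the real files.

```r
library(dplyr)

# Toy stand-ins for the Kaggle nodes and the scraped image list.
# The second name is deliberately spelled differently in the two tables.
kaggle <- tibble(id = c(1, 2), name = c("Bulbasaur", "Ivysaur"))
scraped <- tibble(id = c(1, 2), name = c("Bulbasaur", "ivysaur"),
                  image = c("images/1.png", "images/2.png"))

# The left join keeps both nodes, but the unmatched one gets an NA image
joined <- left_join(kaggle, scraped, by = c("id", "name"))

# anti_join lists the nodes that found no partner
unmatched <- anti_join(kaggle, scraped, by = c("id", "name"))
unmatched$name  # "Ivysaur"
```

If `unmatched` has any rows, joining on id alone may be the safer choice.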

3. Build Basic Edge Dataset

3.1 Get Weaknesses

Next, we need to get the list of each Pokemon’s weaknesses.

# First, get the list of pokemon's weaknesses
dat <- read_csv("pokemon.csv") %>% 
  select(gen = generation, id = pokedex_number, name, type1, type2, contains("against")) %>%
  # Now put all the potential opponent types as rows, not columns
  pivot_longer(cols = c(contains("against")), names_to = "against", values_to = "weakness") %>%
  mutate(against = str_remove(against, "against_")) %>%
  # Now make each row reflect a type entry; eg. Bulbasaur is grass and poison, so they get 2 rows
  pivot_longer(cols = c(type1, type2), values_to = "type", 
               names_to = "whatever", values_drop_na = TRUE) %>%
  # Keep just the key variables
  select(gen, id, name, type, against, weakness) %>%
  # Get just the distinct entries
  distinct()
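To see what the two reshaping steps do, here is the same idea on a single toy row. Only two of the `against_*` columns are included to keep the output short, and `slot` is an illustrative column name.

```r
library(dplyr)
library(tidyr)
library(stringr)

# One Pokemon, two types, and two opponent types
toy <- tibble(name = "Bulbasaur", type1 = "grass", type2 = "poison",
              against_fire = 2, against_water = 0.5)

long <- toy %>%
  # One row per opponent type
  pivot_longer(cols = contains("against"),
               names_to = "against", values_to = "weakness") %>%
  mutate(against = str_remove(against, "against_")) %>%
  # One row per own type
  pivot_longer(cols = c(type1, type2), values_to = "type",
               names_to = "slot", values_drop_na = TRUE) %>%
  select(name, type, against, weakness)

nrow(long)  # 4: two own types times two opponent types
```

The weakness value is repeated for each of the Pokemon's own types, because it describes the Pokemon as a whole rather than one of its types.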

3.2 Get Type Pairings

Now, we’re going to distill that into two data.frames: one for the hometeam (recipient of the attack) and one for the awayteam (attacker).

# Get the 'home team', as in the list of weaknesses for each member of the team
hometeam <- dat %>%
  select(from_gen = gen, from_id = id, from_name = name, 
         from_type = type, to_type = against, weakness)

# Get the set of unique opponents
awayteam <- dat %>%
  select(to_gen = gen, to_id = id, to_name = name, to_type = type) %>%
  distinct()

We’ll join these data.frames together to get the full list of Pokemon-pair-types, named pairs, where any pair of 2 Pokemon could yield up to 4 rows.

For example, imagine a match between 2 Bulbasaurs, which are each Grass and Poison types. (Any Pokemon has a max of 2 types.)

Here, Bulbasaur (grass/poison) vs. Bulbasaur (grass/poison) yields:

  • Grass vs. Poison

  • Poison vs. Grass

  • Grass vs. Grass

  • Poison vs. Poison

pairs <- hometeam %>%
  inner_join(by = c("to_type"), y = awayteam)
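The Bulbasaur example can be reproduced with toy versions of the two data.frames. This is only a sketch with hand-typed rows; the real hometeam and awayteam carry extra id and generation columns.

```r
library(dplyr)

# Bulbasaur as recipient: one row per own type and attacking type
hometeam <- tibble(
  from_name = "Bulbasaur",
  from_type = c("grass", "grass", "poison", "poison"),
  to_type   = c("grass", "poison", "grass", "poison"),
  weakness  = c(0.25, 1, 0.25, 1)
)

# Bulbasaur as attacker: one row per own type
awayteam <- tibble(to_name = "Bulbasaur", to_type = c("grass", "poison"))

# Match each weakness row to the attackers that have that type
pairs <- inner_join(hometeam, awayteam, by = "to_type")
nrow(pairs)  # 4
```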

3.3 Get Pokemon-Pairings

So, we’ll consolidate our pairs of Pokemon-pair-types into a simpler list of Pokemon-Pairs. To do so, we will take the maximum weakness value across the type pairings for each pair, so as to show the worst case scenario. (We take the max, rather than multiplying, because you’re never going to get attacked by a two-type move; only one-type moves exist.) We’ll name this data.frame el, since we’ve finally created a true edgelist (although not the final edgelist).

# For each unique pokemon pairing, please get max of their (up to 4) weaknesses
el <- pairs %>%
  group_by(from_gen, from_id, from_name, to_gen, to_id, to_name) %>%
  summarize(weakness = max(weakness)) %>%
  ungroup() 
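As a quick check of the max rule, here is a toy version of the step for the Bulbasaur versus Bulbasaur pairing. The rows are hand-typed for illustration.

```r
library(dplyr)

# Four type pairings for one Pokemon pair
pairs <- tibble(
  from_name = "Bulbasaur", to_name = "Bulbasaur",
  from_type = c("grass", "grass", "poison", "poison"),
  to_type   = c("grass", "poison", "grass", "poison"),
  weakness  = c(0.25, 1, 0.25, 1)
)

# Collapse to one row per pair, keeping the worst case for the recipient
el <- pairs %>%
  group_by(from_name, to_name) %>%
  summarize(weakness = max(weakness), .groups = "drop")

el$weakness  # 1
```

Here the worst case is a Poison move, which does normal damage, while a Grass move would only do a quarter.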

The observed weaknesses in the dataset include 0, 0.25, 0.50, 1, 2.0, and 4.0. You can check this using the hometeam data.frame.

hometeam$weakness %>% unique() %>% sort()
[1] 0.00 0.25 0.50 1.00 2.00 4.00

3.4 Get Stats per Pokemon-Pairing

Finally, let’s actually join in the stats of each Pokemon. This will help us eventually estimate how much damage will get incurred from an attack. We’ll save this as edgelist_base.csv, so that we can use it multiple times if anyone ever wants to.

el %>% 
  left_join(by = c("from_id" = "id"),
            y = read_csv("nodes.csv") %>% 
              select(id, from_hp = hp, 
                     from_attack = attack, from_defense = defense, 
                     from_sp_attack = sp_attack, from_sp_defense = sp_defense)) %>%
  left_join(by = c("to_id" = "id"),
            y = read_csv("nodes.csv") %>% 
              select(id, to_hp = hp, 
                     to_attack = attack, to_defense = defense, 
                     to_sp_attack = sp_attack, to_sp_defense = sp_defense))  %>%
    # Let's save this to file!
  write_csv("edgelist_base.csv")

# Remove extraneous data
rm(list= ls())

4. Calculate Damage

Next, we need to calculate damage! This is a big, tricky one. Fortunately, some like-minded folk on Bulbapedia figured out the damage functions for every Pokemon game. We’re going to use the various formulas from the newest game, Pokemon Legends: Arceus, both because it’s the newest and because it slightly simplified the stats.

4.1 Effort Level Bonus Function

To get started, we’re going to write a function that estimates, for any stat, the effort level bonus, a non-zero quantity that gets added to the base stat each time the Pokemon levels up. (Read here for more information.) We’re going to assume an effort level of 0, which means that their stats grow at the minimum rate, with no extra bells and whistles.

get_bonus = function(stat, level){
  # Assume an effort level (0 to 10) of ZERO, and therefore a multiplier of ZERO
  effort = 0
  
  # With a multiplier of zero, the bonus depends only on the level
  bonus <- (sqrt(stat) * effort + level) / 2.5
  
  bonus %>%
    round(digits = 0) %>%
    return()
}
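If you ever want to relax the zero-effort assumption, one option is to expose the multiplier as an argument. The function below is a hypothetical variant (the name `get_bonus_multiplier` and the example multiplier of 2 are made up here), using the same expression as above.

```r
# Hypothetical variant of get_bonus(): `multiplier` replaces the
# hard-coded effort value of 0
get_bonus_multiplier <- function(stat, level, multiplier = 0) {
  round((sqrt(stat) * multiplier + level) / 2.5, digits = 0)
}

# With the default multiplier of 0, only the level matters
get_bonus_multiplier(stat = 45, level = 5)                  # 2

# With an illustrative multiplier of 2, the stat contributes too
get_bonus_multiplier(stat = 45, level = 5, multiplier = 2)  # 7
```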

4.2 Stats-per-Level Function

Then, we’re going to write a second function that generates the projected stat for any given level, pre-bonus. Unfortunately, hp has a slightly different formula than all the other stats, so we’ll add a logical argument to the function called hp, which is set to FALSE by default to render the usual formula; if set to TRUE, it will calculate the stat for hp.

Note: Non-hp stats will get a nature argument, which we will hold as a constant at 1 for simplicity, meaning that they don’t have a particularly advantageous or disadvantageous nature.

get_stat = function(stat, hp = FALSE, level){
  # If the stat in question is HP,
  if(hp == TRUE){
    # Calculate it like so
    newstat = ( (level / 100 + 1) * stat + level)
    
  }else{
    # If not HP, then...
    # For simplicity, we'll assume a nature of 1
    nature = 1
    # Calculate statistic
    # (as written, this is the same expression as the HP branch, times nature)
    newstat = ((level / 100 + 1) * stat + level)*nature
    
  }
  return(newstat)
}
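To get a feel for the numbers, here is the level scaling from get_stat() written out as a standalone helper and applied to a base stat of 45 (Bulbasaur's base hp). The name `level_stat` is illustrative and not part of the tutorial's functions.

```r
# Same expression as in get_stat(), with nature defaulting to 1
level_stat <- function(base, level, nature = 1) {
  ((level / 100 + 1) * base + level) * nature
}

# Level 5: 1.05 * 45 + 5
level_stat(base = 45, level = 5)   # 52.25

# Level 50: 1.5 * 45 + 50
level_stat(base = 45, level = 50)  # 117.5
```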

4.3 Level-up Function

Now, let’s build a wrapper function for these, which will act on all of our stats at once.

get_levelup = function(data, level_from, level_to = level_from){
  # Take the raw dataset
  data %>%
    # Level up HP
    # First for defending Pokemon
    mutate_at(vars(from_hp),
              list(~get_stat(stat = ., hp = TRUE, level = level_from)  + 
                     get_bonus(stat = ., level = level_from) )) %>%
    # repeat for attacking Pokemon
    mutate_at(vars(to_hp),
              list(~get_stat(stat = ., hp = TRUE, level = level_to)  + 
                     get_bonus(stat = ., level = level_to) )) %>%
    # Level up other stats relevant to damage
    # here, hp is FALSE by default
    # First for defending Pokemon
    mutate_at(vars(from_defense, from_sp_defense, from_attack, from_sp_attack),
              list(~get_stat(stat = ., level = level_from) + 
                     get_bonus(stat = ., level = level_from))) %>%
    # repeat for attacking Pokemon
    mutate_at(vars(to_defense, to_sp_defense, to_attack, to_sp_attack),
              list(~get_stat(stat = ., level = level_to) + 
                     get_bonus(stat = ., level = level_to))) %>%
    # And return the dataset
    return()
}

4.4 Damage Calculator Function

Finally, we’ll write a function to get damage for each Pokemon pair, contingent on the Pokemon’s level, level-adjusted stats from above, the power of their attack, and the overall weakness of the receiving Pokemon to the attacking Pokemon’s type(s). We will use Bulbapedia’s Arceus formula to calculate attack damage, holding constant at 1 the number of targets, weather, status effects, and any other traits other than those specified below. As the default power setting, we’ll use power = 40, the power of ‘tackle,’ one of the most common starter moves.

In this function, we will calculate both the level-adjusted damage/damage_sp the move inflicts (*_sp indicates a special attack), as well as the number of turns it will take to reduce the defending Pokemon’s level-adjusted hp to zero.

get_damage = function(data, level_from, level_to = level_from, power = 40){
  data %>%
    mutate(
      # Calculate the estimated damage due to stats
      damage = (0.2 * to_attack + 3 * level_to + 20) / (from_defense + 50) * power * weakness,
      damage_sp = (0.2 * to_sp_attack + 3 * level_to + 20) / (from_sp_defense + 50) * power * weakness) %>%
    # Last, using the level-adjusted hp stat
    # Estimate the number of turns it will take to reduce the 'from' pokemon's hp to 0
    mutate(turns = ceiling(from_hp / damage),
           turns_sp = ceiling(from_hp / damage_sp)) %>%
    return()
}
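Here is the same damage expression worked through for a single pairing, as a sanity check on the arithmetic. The helper name `damage_one` and the stat values are made up for illustration.

```r
# Same expression as in get_damage(), for one attacker and one recipient
damage_one <- function(attack, defense, level, power = 40, weakness = 1) {
  (0.2 * attack + 3 * level + 20) / (defense + 50) * power * weakness
}

# Hypothetical level-adjusted stats: attack 60 against defense 50 at level 5,
# with the recipient doubly weak to the move
dmg <- damage_one(attack = 60, defense = 50, level = 5, weakness = 2)
dmg                   # 37.6

# Turns needed to knock out a recipient with 54.25 hp
ceiling(54.25 / dmg)  # 2
```

One thing to keep in mind: when weakness is 0 (an immunity), damage is 0, so dividing hp by it gives `Inf` turns.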

4.5 Save Functions

Finally, let’s save these functions together, in case we want to use them again later.

save(get_bonus, get_stat, get_levelup, get_damage, file = "functions.RData")

4.6 Calculate Damage for Level 5 Pokemon and Opponent

Knowing what we do now, let’s run these functions to produce our estimated damage and turns.

library(tidyverse)
load("functions.RData")

edges <- read_csv("edgelist_base.csv")

edges %>%
  get_levelup(level_from = 5) %>%
  get_damage(level_from = 5, power = 40) %>%
  # Select just relevant columns
  select(from = from_name, to = to_name, weakness, damage, turns, damage_sp, turns_sp) %>%
  write_csv("edgelist.csv")

4.7 Clean house

Great! We’re all done! Remove extra data.

rm(list=ls()) # remove data
gc() # clear memory
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 1212281 64.8    1971030 105.3  1971030 105.3
Vcells 2117731 16.2    8388608  64.0  4478879  34.2

5. Using the Dataset

And there you have it! Future tutorials using this dataset will be available soon.
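Since the edgelist has one row per ordered pair, it can be reshaped into a square matrix whenever you need the adjacency matrix itself. Below is a minimal base R sketch with three nodes; the turns values are made up for illustration and are not taken from edgelist.csv.

```r
nodes <- c("Bulbasaur", "Charmander", "Squirtle")

# Toy edgelist: every ordered pair of nodes, with made-up turns values
el <- expand.grid(from = nodes, to = nodes, stringsAsFactors = FALSE)
el$turns <- c(4, 5, 2, 2, 4, 5, 5, 2, 4)

# Empty square matrix, with rows as recipients and columns as attackers
adj <- matrix(NA_real_, nrow = 3, ncol = 3,
              dimnames = list(from = nodes, to = nodes))

# Fill each cell by matching the from/to names to the dimnames
adj[cbind(el$from, el$to)] <- el$turns

adj["Bulbasaur", "Charmander"]  # 2
```

Indexing by name rather than position means the row order of the edgelist does not matter.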

5.1 Ranking

We can use these data to calculate various quantities of interest about our nodes (in this case, our Pokemon). For example:

# Get starter pokemon
starters <- c("Bulbasaur", "Ivysaur", "Venusaur", 
  "Charmander", "Charmeleon", "Charizard", 
  "Squirtle", "Wartortle", "Blastoise")

# Get the corresponding image tags for these pokemon (their ids run from 1 to 9)
links <- paste("<img src='images/", seq_along(starters), ".png'>", sep = "")

# Get just the edges for starter pokemon
edges <- read_csv("edgelist.csv") %>% 
  filter(from %in% starters & to %in% starters) %>%
  mutate(from = factor(from, levels = starters, labels = links),
         to = factor(to, levels = starters, labels = links))

# Let's make a box highlighting the important results
box1 <- edges %>%
  filter(to %in% levels(edges$to)[4:6], from %in% levels(edges$from)[1:3])


box2 <- edges %>%
  filter(to %in% levels(edges$to)[1:3], from %in% levels(edges$from)[7:9])

# Let's use ggtext to visualize images
library(ggtext)

# Now visualize!
edges %>% 
  ggplot(mapping = aes(x = from, y = to, fill = turns, label = turns)) +
  # Use tiles to show the adjacency matrix
  geom_tile(color = "white") +
  geom_text(color = "#373737") +
  # Add boxes to highlight important results
  geom_tile(data = box1, fill = NA, color = "black", size = 1) +
  geom_tile(data = box2, fill = NA, color = "black", size = 1) +
  # Add a color scheme
  scale_fill_gradient(low = "white", high = "gold", trans = "log",
                      breaks = c(1, 2, 3, 5, 10, 15)) + 
  theme_classic(base_size = 14) +
  theme(axis.ticks = element_blank(), axis.line = element_blank(),
        axis.text.y = element_markdown(hjust = 1),
        axis.text.x = element_markdown(hjust = 0.5)) +
  labs(fill = "Turns\nRequired\nto KO",
       x = "Recipient", y = "Attacker",
       title = "Which Starter Pokemon is Most Likely to Win at Level 5?",
       subtitle = "Charizard and Venusaur, oddly, trump Blastoise.",
       caption = "Assuming Physical Attack of Worst Type (Power = 40, eg. Bubble)")
Sample Adjacency Matrix
