Making Adjacency Matrices in R
The Pokemon Matrix
0. Introduction
Social Network Analysis typically relies on adjacency matrices, square matrices which encode relationships (edges) between individuals (nodes). However, if you grew up in the late 1990s like me, you might already know intrinsically what an adjacency matrix does, even if you’ve never heard the word before. That’s because many (okay, some) kids of the late 1990s thought a lot about the relationships between Pokemon of different types.
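As a minimal sketch of the idea, here is a hand-typed 3 x 3 adjacency matrix for three types. The multipliers follow the familiar type chart and are entered by hand for illustration; they are not read from the dataset used later, and the object names (`types`, `adj`) are placeholders.

```r
# Three node labels (types), used for both rows and columns
types <- c("Fire", "Water", "Grass")

# Rows are the attacking type, columns the defending type.
# Each cell is the damage multiplier for that directed edge.
adj <- matrix(
  c(0.5, 0.5, 2.0,   # Fire attacking Fire, Water, Grass
    2.0, 0.5, 0.5,   # Water attacking Fire, Water, Grass
    0.5, 2.0, 0.5),  # Grass attacking Fire, Water, Grass
  nrow = 3, byrow = TRUE,
  dimnames = list(attacker = types, defender = types))

# Look up one directed relationship: Water attacking Fire
adj["Water", "Fire"]
```

Because the relationships are directed, the matrix is not symmetric: the Water-to-Fire cell differs from the Fire-to-Water cell.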
Recently, I’ve been designing some network analysis R tutorials that work surprisingly well with Pokemon, so I decided to write up another tutorial describing how to build an adjacency matrix that describes how well each Pokemon will probably fare when matched up with another, depending on their respective types. If that sounds delightful, read on!
Your tiny Starter Pokemon Nodes!
0.1 Type Relationships
In the Pokemon series, each type of Pokemon (e.g., Fire, Water, Grass, Electric) had different levels of weakness or resistance to moves of certain types. Fire was weak to Water; Water was weak to Electric; Grass was strong against Water; and so on. Each Pokemon could have 1 to 2 types, predisposing it to be strong or weak against certain attacks, with an array of moves that (usually) matched those types. So, can we use
0.2 Source
For our exercise today, we’re going to - and I’m delighted to say
this - use an adjacency matrix of Pokemon. This beautiful
Kaggle dataset by Rounak Banik contains an edgelist of every Pokemon
and the types of Pokemon they are weak to. (If you cite my tutorial,
please be sure to cite their dataset too!) So, we can use this to create
an edgelist pairing every Pokemon that is weak to the other, showing the
expected damage inflicted by a moderate move (power = 40)
on that type. We’re also going to rely on some images scraped from Bulbapedia.
This tutorial is entirely for fun, has no commercial value, and does not lay claim to any of the images or raw source data from Kaggle/Bulbapedia, which were used in keeping with a Creative Commons Attribution-NonCommercial-ShareAlike license.
0.3 Load Packages
To get started, please load the following packages!
library(tidyverse) # data wrangling
library(knitr) # viewing images
library(xml2) # webscraping
library(rvest) # webscraping
library(magick) # image properties
1. Get Images
1.1 Get Image URLs
First, we’re going to scrape a list of Pokemon images to use for later visualizations!
library(tidyverse)
library(xml2)
library(rvest)
# First, we're going to webscrape the image files for tiny images of each Pokemon.
# "https://bulbapedia.bulbagarden.net/wiki/List_of_Pok%C3%A9mon_by_National_Pok%C3%A9dex_number"
read_html("pokedex.html") %>%
  xml_find_all(".//table") %>%
  xml_find_all(".//img") %>%
  with(
    tibble(
      # Extract names
      name = xml_attr(., "alt"),
      # Get the URL for the miniature image
      image_mini = xml_attr(., "src") %>% str_replace(., "//archives", "https://archives"))) %>%
  # Add a unique ID
  mutate(id = str_extract(image_mini, "[/][0-9]{3}.*") %>% str_extract("[0-9]{3}") %>% parse_number()) %>%
  # Build a link for the bigger image too.
  mutate(page = name %>% paste("https://bulbapedia.bulbagarden.net/wiki/",
                               ., "_(Pok%C3%A9mon)", sep = "")) %>%
  write_csv("pokedex.csv")
1.2 Scrape Images (in Batches as Able)
Next, we’re going to scrape them from Bulbapedia. (Note: this gets kind of hard, and needs to be done in small amounts.)
library(magick)
# Read in the image URLs we just saved
test <- read_csv("pokedex.csv")
dir.create("images")
n <- length(test$image_mini)
for(i in seq_len(n)){
  # Print progress
  print(i)
  # Download each image, make its white background transparent, and save it
  test$image_mini[i] %>%
    image_read() %>%
    image_transparent(color = "white") %>%
    image_write(
      path = paste("images/", test$id[i], ".png", sep = ""),
      format = "png", quality = 100)
  # Pause between requests
  Sys.sleep(time = 1.1)
}
rm(list = ls())
2. Build Node Dataset
Next, to build our dataset, we must first grab the set of nodes in our network.
# Reading in our Kaggle dataset,
read_csv("pokemon.csv") %>%
  # Let's grab the pokemon id, name, generation, capture rate,
  select(name, id = pokedex_number, generation, capture_rate, type1, type2,
         # their base hp, attack, defense, sp_attack, sp_defense, speed, and an indicator of whether or not they are legendary
         hp, attack, defense, sp_attack, sp_defense, speed, legendary = is_legendary) %>%
  mutate(capture_rate = parse_number(capture_rate)) %>%
  # Distill this into distinct entries
  distinct() %>%
  # Last, join in the image urls
  left_join(by = c("id", "name"), y = read_csv("pokedex.csv") %>%
              mutate(image = paste("images/", id, ".png", sep = "")) %>%
              select(id, name, image) %>%
              distinct()) %>%
  # and save to file
  write_csv("nodes.csv")
3. Build Basic Edge Dataset
3.1 Get Weaknesses
Next, we need to get the list of each Pokemon’s weaknesses.
# First, get the list of pokemon's weaknesses
dat <- read_csv("pokemon.csv") %>%
  select(gen = generation, id = pokedex_number, name, type1, type2, contains("against")) %>%
  # Now put all the potential opponent types as rows, not columns
  pivot_longer(cols = c(contains("against")), names_to = "against", values_to = "weakness") %>%
  mutate(against = str_remove(against, "against_")) %>%
  # Now make each row reflect a type entry; eg. Bulbasaur is grass and poison, so they get 2 rows
  pivot_longer(cols = c(type1, type2), values_to = "type",
               names_to = "whatever", values_drop_na = TRUE) %>%
  # Keep just the key variables
  select(gen, id, name, type, against, weakness) %>%
  # Get just the distinct entries
  distinct()
3.2 Get Type Pairings
Now, we’re going to distill that into one data.frame for the hometeam (recipient of attack) and one for the awayteam (attacker).
# Get the 'home team', as in the list of weaknesses for each member of the team
hometeam <- dat %>%
  select(from_gen = gen, from_id = id, from_name = name,
         from_type = type, to_type = against, weakness)
# Get the set of unique opponents
awayteam <- dat %>%
  select(to_gen = gen, to_id = id, to_name = name, to_type = type) %>%
  distinct()
We’ll join these data.frames together to get the full list of
Pokemon-pair-types, named pairs, where any pair of 2
Pokemon could yield up to 4 rows.
For example, imagine a match between 2 Bulbasaurs, which are each Grass and Poison types. (Any Pokemon has a max of 2 types.)
Here, Bulbasaur (grass/poison) vs. Bulbasaur (grass/poison) yields:
Grass vs. Poison
Poison vs. Grass
Grass vs. Grass
Poison vs. Poison
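To make the four-row expansion concrete, here is a toy version of the join written in base R (`merge()` stands in for `inner_join()` so the snippet runs without the tidyverse). The two small data.frames and the weakness values in them are typed in by hand for illustration only.

```r
# Toy 'home team': Bulbasaur's two types, each paired with two attacking types
hometeam_toy <- data.frame(
  from_name = "Bulbasaur",
  from_type = rep(c("grass", "poison"), each = 2),
  to_type   = rep(c("grass", "poison"), times = 2),
  weakness  = c(0.25, 1, 0.25, 1),  # illustrative values
  stringsAsFactors = FALSE)

# Toy 'away team': the opposing Bulbasaur, one row per type
awayteam_toy <- data.frame(
  to_name = "Bulbasaur",
  to_type = c("grass", "poison"),
  stringsAsFactors = FALSE)

# Join on the attacking type: 2 home types x 2 away types = 4 rows
pairs_toy <- merge(hometeam_toy, awayteam_toy, by = "to_type")
nrow(pairs_toy)
```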
pairs <- hometeam %>%
  inner_join(by = c("to_type"), y = awayteam)
3.3 Get Pokemon-Pairings
So, we’ll consolidate our pairs of Pokemon-pair-types into a simpler
list of Pokemon-Pairs. To do so, we will take the max weaknesses value
for any type pairing, so as to show the worst case scenario. (We take
the max, rather than multiplying, because you’re never going to get
attacked by a two-type move; only one-type moves exist.) We’ll name this
data.frame el, since we’ve finally created a true edgelist
(although not the final edgelist).
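As a small worked example of the "take the max" rule, the sketch below collapses four hand-typed type-pair rows into one Pokemon-pair row. It uses base R's `aggregate()` rather than the tidyverse, and the data.frame `pairs_toy` and its values are illustrative assumptions, not rows pulled from the dataset.

```r
# Four type-pair rows for one defender/attacker pairing (illustrative values)
pairs_toy <- data.frame(
  from_name = "Bulbasaur",   # defender
  to_name   = "Gyarados",    # attacker
  from_type = c("grass", "grass", "poison", "poison"),
  to_type   = c("water", "flying", "water", "flying"),
  weakness  = c(0.5, 2, 0.5, 2),
  stringsAsFactors = FALSE)

# Keep the worst case: the largest weakness across the type pairs
el_toy <- aggregate(weakness ~ from_name + to_name, data = pairs_toy, FUN = max)
el_toy
```

Taking the max means the single row reflects the attacker's most effective type, which matches the worst-case framing above.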
# For each unique pokemon pairing, please get max of their (up to 4) weaknesses
el <- pairs %>%
  group_by(from_gen, from_id, from_name, to_gen, to_id, to_name) %>%
  summarize(weakness = max(weakness)) %>%
  ungroup()
The observed weaknesses in the dataset include 0, 0.25, 0.50, 1, 2.0,
and 4.0. You can check this using the hometeam
data.frame.
hometeam$weakness %>% unique() %>% sort()
[1] 0.00 0.25 0.50 1.00 2.00 4.00
3.4 Get Stats per Pokemon-Pairing
Finally, let’s actually join in the stats of each Pokemon. This will
help us eventually estimate how much damage will get incurred
from an attack. We’ll save this as edgelist_base.csv, so
that we can use it multiple times if anyone ever wants to.
el %>%
  # Join in the stats of the 'from' (defending) Pokemon
  left_join(by = c("from_id" = "id"),
            y = read_csv("nodes.csv") %>%
              select(id, from_hp = hp,
                     from_attack = attack, from_defense = defense,
                     from_sp_attack = sp_attack, from_sp_defense = sp_defense)) %>%
  # Join in the stats of the 'to' (attacking) Pokemon
  left_join(by = c("to_id" = "id"),
            y = read_csv("nodes.csv") %>%
              select(id, to_hp = hp,
                     to_attack = attack, to_defense = defense,
                     to_sp_attack = sp_attack, to_sp_defense = sp_defense)) %>%
  # Let's save this to file!
  write_csv("edgelist_base.csv")
# Remove extraneous data
rm(list = ls())
4. Calculate Damage
Next, we need to calculate damage! This is a big, tricky one. Fortunately, some like-minded folk documented the damage formulas for every Pokemon game on Bulbapedia. We’re going to use the formulas from the newest game, Pokemon Legends: Arceus, both because it’s the newest and because it slightly simplified the stats.
4.1 Effort Level Bonus Function
To get started, we’re going to write a function including these
constants that estimates for any stat the effort level bonus, a non-zero
quantity that gets added to the base stat each time the Pokemon levels
up. (Read here for more
information.) We’re going to assume an effort level of 0,
which means that they evolve at the minimum rate, with no extra bells
and whistles.
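As a quick arithmetic check on what an effort level of 0 implies, the sketch below evaluates the same bonus expression used in this section for a few levels. The base stat of 45 and the three levels are illustrative inputs chosen here, not values taken from the tutorial.

```r
# Illustrative inputs (assumptions): one base stat and three levels
stat <- 45
level <- c(5, 10, 50)

# Effort level of zero, so the base stat drops out of the bonus entirely
effort <- 0

# Same expression as the bonus function in this section, rounded to a whole number
bonus <- round((sqrt(stat) * effort + level) / 2.5, digits = 0)
bonus
```

With the multiplier at zero, the bonus depends only on level, so every Pokemon at the same level receives the same bonus.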
get_bonus = function(stat, level){
  # Assume an effort level (0 to 10) of ZERO, and therefore a multiplier of ZERO
  effort = 0
  # Calculate the bonus from the base stat, the effort multiplier, and the level
  bonus <- (sqrt(stat) * effort + level) / 2.5
  # Round to the nearest whole number and return
  bonus %>%
    round(digits = 0) %>%
    return()
}
4.2 Stats-per-Level Function
Then, we’re going to write a second function that generates the
projected stat for any given level, pre-bonus. Unfortunately, hp has a
slightly different formula than all the other stats, so we’ll add a
logical argument to the function called hp, which is set by
default to FALSE to render the usual formula; if set to
TRUE, it will calculate the stat for hp.
Note: Non-hp stats will get a nature
argument, which we will hold as a constant at 1 for
simplicity, meaning that they don’t have a particularly advantageous or
disadvantageous nature.
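For a sense of scale, here is the pre-bonus expression that this section's function applies, worked through by hand for a single Pokemon at level 5. The inputs (a base hp of 45 and a base attack of 49, Bulbasaur's base values) and the variable names are chosen here for illustration.

```r
# Illustrative inputs (assumptions)
base_hp <- 45
base_attack <- 49
level <- 5
nature <- 1  # neutral nature, held constant as in the text

# Pre-bonus hp at this level
hp_at_level <- (level / 100 + 1) * base_hp + level

# Pre-bonus attack at this level, scaled by nature
attack_at_level <- ((level / 100 + 1) * base_attack + level) * nature

c(hp = hp_at_level, attack = attack_at_level)
```

The effort level bonus from the previous section is added on top of these values afterwards.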
get_stat = function(stat, hp = FALSE, level){
  # If the stat in question is HP,
  if(hp == TRUE){
    # Calculate it like so
    newstat = ( (level / 100 + 1) * stat + level)
  }else{
    # If not HP, then...
    # For simplicity, we'll assume a nature of 1
    nature = 1
    # Calculate statistic (same expression as HP, scaled by nature)
    newstat = ((level / 100 + 1) * stat + level)*nature
  }
  return(newstat)
}
4.3 Level-up Function
Now, let’s build a wrapper function for these, which will act on all of our stats at once.
get_levelup = function(data, level_from, level_to = level_from){
  # Take the raw dataset
  data %>%
    # Level up HP
    # First for defending Pokemon
    mutate_at(vars(from_hp),
              list(~get_stat(stat = ., hp = TRUE, level = level_from) +
                     get_bonus(stat = ., level = level_from) )) %>%
    # repeat for attacking Pokemon
    mutate_at(vars(to_hp),
              list(~get_stat(stat = ., hp = TRUE, level = level_to) +
                     get_bonus(stat = ., level = level_to) )) %>%
    # Level up other stats relevant to damage
    # here, hp is FALSE by default
    # First for defending Pokemon
    mutate_at(vars(from_defense, from_sp_defense, from_attack, from_sp_attack),
              list(~get_stat(stat = ., level = level_from) +
                     get_bonus(stat = ., level = level_from))) %>%
    # repeat for attacking Pokemon
    mutate_at(vars(to_defense, to_sp_defense, to_attack, to_sp_attack),
              list(~get_stat(stat = ., level = level_to) +
                     get_bonus(stat = ., level = level_to))) %>%
    # And return the dataset
    return()
}
4.4 Damage Calculator Function
Finally, we’ll write a function to get damage for each Pokemon pair,
contingent on the Pokemon’s level,
level-adjusted stats from above, the power of
their attack, and the overall weakness of the receiving
Pokemon to the attacking Pokemon’s type(s). We will use Bulbapedia’s
Arceus formula to calculate attack damage, holding constant at 1 the
number of targets, weather, status effects, and any other traits other
than those specified below. As the default power setting, we’ll use
power = 40, the power of ‘tackle,’
one of the most common starter moves.
In this function, we will calculate both the level-adjusted
damage/damage_sp the move inflicts
(*_sp indicates a special attack), as well as the number of
turns it will take to reduce the defending Pokemon’s level-adjusted
hp to zero.
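Before wrapping this in a function, it can help to see the arithmetic for one pairing. The sketch below plugs made-up level-adjusted stats into the same damage and turns expressions this section uses; every input value here is a hypothetical chosen for round numbers.

```r
# Hypothetical level-adjusted inputs for one attacker/defender pair
to_attack    <- 60   # attacker's attack
from_defense <- 50   # defender's defense
from_hp      <- 54   # defender's hp
level_to     <- 5    # attacker's level
power        <- 40   # move power
weakness     <- 2    # defender is weak to the attacker's type

# Damage from a physical attack, using the expression from this section
damage <- (0.2 * to_attack + 3 * level_to + 20) / (from_defense + 50) * power * weakness

# Turns needed to bring the defender's hp to zero
turns <- ceiling(from_hp / damage)

c(damage = damage, turns = turns)
```

Here the numerator is 12 + 15 + 20 = 47, the denominator is 100, and 0.47 x 40 x 2 gives 37.6 damage per hit, so a defender with 54 hp falls on the second turn.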
get_damage = function(data, level_from, level_to = level_from, power = 40){
  data %>%
    mutate(
      # Calculate the estimated damage due to stats
      damage = (0.2 * to_attack + 3 * level_to + 20) / (from_defense + 50) * power * weakness,
      damage_sp = (0.2 * to_sp_attack + 3 * level_to + 20) / (from_sp_defense + 50) * power * weakness) %>%
    # Last, using the level-adjusted hp stat
    # Estimate the number of turns it will take to reduce the 'from' pokemon's hp to 0
    mutate(turns = ceiling(from_hp / damage),
           turns_sp = ceiling(from_hp / damage_sp)) %>%
    return()
}
4.5 Save Functions
Finally, let’s save these functions as a bunch together, in case we want to use them again later.
save(get_bonus, get_stat, get_levelup, get_damage, file = "functions.RData")
4.6 Calculate Damage for Level 5 Pokemon and Opponent
Knowing what we do now, let’s run these functions to produce our
estimated damage and turns.
library(tidyverse)
# Load the saved functions and the base edgelist
load("functions.RData")
edges <- read_csv("edgelist_base.csv")
edges %>%
  # Level up both Pokemon to level 5
  get_levelup(level_from = 5) %>%
  # Estimate damage and turns for a power-40 move
  get_damage(level_from = 5, power = 40) %>%
  # Select just relevant columns
  select(from = from_name, to = to_name, weakness, damage, turns, damage_sp, turns_sp) %>%
  write_csv("edgelist.csv")
4.7 Clean house
Great! We’re all done! Remove extra data.
rm(list=ls()) # remove data
gc() # clear memory
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 1212281 64.8    1971030 105.3  1971030 105.3
Vcells 2117731 16.2    8388608  64.0  4478879  34.2
5. Using the Dataset
And there you have it! Future tutorials using this dataset will be available soon.
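Since the edgelist stores one row per ordered pair, one way to view it as an actual adjacency matrix is to cross-tabulate it. The sketch below does this in base R with `xtabs()` on a tiny hand-made edgelist; the column names mirror the saved file, but the `turns` values are invented for illustration.

```r
# Tiny stand-in for the saved edgelist (turns values are made up)
edges_toy <- data.frame(
  from  = c("Bulbasaur", "Bulbasaur", "Charmander", "Charmander"),
  to    = c("Bulbasaur", "Charmander", "Bulbasaur", "Charmander"),
  turns = c(6, 2, 5, 4),
  stringsAsFactors = FALSE)

# Cross-tabulate: rows are 'from' (recipient), columns are 'to' (attacker),
# and each cell holds the turns for that ordered pair
adj_toy <- xtabs(turns ~ from + to, data = edges_toy)
adj_toy

# Turns needed for Charmander to knock out Bulbasaur
adj_toy["Bulbasaur", "Charmander"]
```

Note that `xtabs()` sums within each cell and fills missing pairs with 0, so this sketch assumes exactly one row per ordered pair.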
5.1 Ranking
We can use these data to calculate various quantities of interest about our nodes (in this case, our Pokemon). For example:
# Get starter pokemon
starters <- c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard",
              "Squirtle", "Wartortle", "Blastoise")
# Get the corresponding image links for these pokemon (ids 1 to 9)
links <- paste("<img src='images/", 1:length(starters), ".png'>", sep = "")
# Get just the edges for starter pokemon
edges <- read_csv("edgelist.csv") %>%
  filter(from %in% starters & to %in% starters) %>%
  mutate(from = factor(from, levels = starters, labels = links),
         to = factor(to, levels = starters, labels = links))
# Let's make a box highlighting the important results
box1 <- edges %>%
  filter(to %in% levels(edges$to)[4:6], from %in% levels(edges$from)[1:3])
box2 <- edges %>%
  filter(to %in% levels(edges$to)[1:3], from %in% levels(edges$from)[7:9])
# Let's use ggtext to visualize images
library(ggtext)
# Now visualize!
edges %>%
  ggplot(mapping = aes(x = from, y = to, fill = turns, label = turns)) +
  # Use tiles to show the adjacency matrix
  geom_tile(color = "white") +
  geom_text(color = "#373737") +
  # Add boxes to highlight important results
  geom_tile(data = box1, fill = NA, color = "black", size = 1) +
  geom_tile(data = box2, fill = NA, color = "black", size = 1) +
  # Add a color scheme
  scale_fill_gradient(low = "white", high = "gold", trans = "log",
                      breaks = c(1, 2, 3, 5, 10, 15)) +
  theme_classic(base_size = 14) +
  theme(axis.ticks = element_blank(), axis.line = element_blank(),
        axis.text.y = element_markdown(hjust = 1),
        axis.text.x = element_markdown(hjust = 0.5)) +
  labs(fill = "Turns\nRequired\nto KO",
       x = "Recipient", y = "Attacker",
       title = "Which Starter Pokemon is Most Likely to Win at Level 5?",
       subtitle = "Charizard and Venusaur, oddly, trump Blastoise.",
       caption = "Assuming Physical Attack of Worst Type (Power = 40, eg. Bubble)")
Sample Adjacency Matrix