DnD Monster Analysis

An investigation into the “stat blocks” of Dungeons and Dragons 5th edition creatures.

While consolidating my learning experience with R, I have put together a few graphs, functions and processes with the incidental aim of exploring the DnD5e creature stat blocks.

I intended to analyse the link between health points, armour class and challenge rating, as well as other factors.

Setting up my environment:

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.2.0
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(RColorBrewer)

While putting this together, I referred to my course notes, the Coursera videos, Stack Overflow and GitHub. ChatGPT was also quite useful when it came to getting over obstacles. Two out of three times, when I had serious “Arrrgh” moments and stalled, it gave me the solution in a timely fashion. The third time… well, read on to find out!

Finding data:

I looked on Kaggle and found several data sets. I decided to start with data which has a usability rating of 10.0:
* https://www.kaggle.com/datasets/patrickgomes/dungeons-and-dragons-5e-monsters (7kB): “Dd5e_monsters.csv”, web-scraped from an unspecified online platform.
* https://www.kaggle.com/datasets/mrpantherson/dnd-5e-monsters (17kB): “dnd_monsters.csv”, scraped from www.aidedd.org.
* https://www.kaggle.com/datasets/travistyler/dnd-5e-monster-manual-stats (195kB), three files: “roll_20_items.csv” (about items, not creatures), “aidedd_blocks2.csv” (from the website mentioned above) and “cleaned_monsters_basic.csv”.

Loading data:

unspecified <- read.csv("Dd5e_monsters.csv")
aidedd1 <- read.csv("dnd_monsters.csv")
aidedd2 <- read.csv("aidedd_blocks2.csv")
precleaned <- read.csv("cleaned_monsters_basic.csv")

I immediately saw that:
* aidedd1 and aidedd2 have the same number of rows (762), but aidedd2 has over three times as many variables (53 as opposed to 17)
* precleaned and unspecified have the same number of observations, but precleaned has 45 variables while unspecified has only 7.
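
A comparison like this can be made in one step by putting the data frames in a named list and asking each one for its dimensions. The sketch below uses two tiny made-up data frames (toy_a and toy_b are hypothetical stand-ins, not the Kaggle files) so that it runs on its own:

```r
# Hypothetical stand-ins for the loaded CSV files
toy_a <- data.frame(name = c("goblin", "orc", "troll"),
                    cr = c(0.25, 0.5, 5))
toy_b <- data.frame(name = c("goblin", "orc", "troll"),
                    cr = c(0.25, 0.5, 5),
                    hp = c(7, 15, 84),
                    ac = c(15, 13, 15))

datasets <- list(toy_a = toy_a, toy_b = toy_b)

# One row per data frame: number of rows and number of columns
shape <- t(sapply(datasets, dim))
colnames(shape) <- c("rows", "cols")
shape
```

With the real objects, the list would presumably hold unspecified, aidedd1, aidedd2 and precleaned instead.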

As a result, I decided to begin working with the aidedd2 dataset, since it has the most information.

remove(unspecified)
remove(aidedd1)
remove(precleaned)

Getting to know the data

colnames(aidedd2)

##  [1] "name"               "size"               "type"              
##  [4] "alignment"          "languages"          "ac"                
##  [7] "hp"                 "cr"                 "speed"             
## [10] "swim"               "fly"                "climb"             
## [13] "burrow"             "passive_perception" "darkvision"        
## [16] "truesight"          "tremorsense"        "blindsight"        
## [19] "strength"           "str_mod"            "dex"               
## [22] "dex_mod"            "con"                "con_mod"           
## [25] "intel"              "int_mod"            "wis"               
## [28] "wis_mod"            "cha"                "cha_mod"           
## [31] "str_save"           "dex_save"           "con_save"          
## [34] "int_save"           "wis_save"           "cha_save"          
## [37] "history"            "perception"         "stealth"           
## [40] "persuasion"         "insight"            "deception"         
## [43] "arcana"             "religion"           "acrobatics"        
## [46] "athletics"          "intimidation"       "senses"            
## [49] "attributes"         "actions"            "legendary_actions" 
## [52] "legendary"          "source"

This shows the various statistics I could work with.

head(aidedd2)

This gives the first few rows of the data.

First graphs

I had a feeling there was a clear relationship between CR (challenge rating) and a mix of AC (armour class) and HP (hit points), so I began with a simple scatter plot of those two, with CR shown on a colour scale. I chose jitter for the scatter plot so that it is easier to tell when there are a lot of data points in one area.

fewest_cols <- select(aidedd2, ac, hp, cr)
ggplot(data = fewest_cols) +
  geom_jitter(mapping = aes(x = hp, y = ac, color = cr)) +
  geom_smooth(mapping = aes(x = hp, y = ac )) +
  labs(title = "Hit points, Armour Class and Challenge rating", 
       subtitle = "As HP increase, so does AC and CR", 
       x = "Hit Points", y = "Armour Class")

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

The trend line shows there is clearly a relationship between the variables. The highest CRs are only given to creatures with very high HP and AC.

Given that AC governs how hard it is to hit the monsters, it might be of interest to scale the HP by their AC. That is, assuming the attacker has +5 to hit with their weapon, an AC of 20 is overcome roughly 25% of the time, an AC of 15 roughly 50% of the time, an AC of 10 roughly 75% of the time and so forth. The matching chance of a miss is (AC-5)/20, so I proposed a function whereby the resilience of a creature is its HP scaled by that chance: HP * (AC-5)/20
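
The arithmetic can be checked with a short sketch. The helper names hit_chance() and resilience() are made up for illustration. The sketch assumes an attack hits when d20 plus the bonus meets or exceeds the AC, and it ignores the special handling of natural 1s and 20s. Counting ties as hits puts each hit chance five percentage points above the round figures in the text, which does not change the shape of the resilience measure.

```r
# Chance that d20 + bonus meets or exceeds a target AC
# (natural 1 / natural 20 special cases are ignored here)
hit_chance <- function(ac, bonus = 5) {
  rolls <- 1:20
  sapply(ac, function(a) mean(rolls + bonus >= a))
}

# The resilience measure from the text: HP scaled by (AC - 5) / 20
resilience <- function(hp, ac) hp * (ac - 5) / 20

hit_chance(c(10, 15, 20))                 # 0.80 0.55 0.30
resilience(hp = 100, ac = c(10, 15, 20))  # 25 50 75
```

So a creature with 100 HP counts as three times as resilient at AC 20 as it does at AC 10 under this measure.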

fewest_cols %>%  
  mutate(resilience = hp * (ac - 5) / 20) %>% 
  ggplot() +
  geom_jitter(mapping = aes(x = resilience, y = cr)) +
  geom_smooth(mapping = aes(x = resilience, y = cr ), 
              method = "loess", formula = "y~x") +
  labs(title="Resilience and Challenge rating", 
       x = "Resilience", y = "Challenge Rating")

Given the curve here, I wondered about taking the square root of resilience.

fewest_cols %>% 
  mutate(sqrt_res = sqrt(hp * (ac - 5) / 20)) %>% 
  ggplot() +
  geom_jitter(mapping = aes(x = sqrt_res, y = cr)) +
  geom_smooth(mapping = aes(x = sqrt_res, y = cr ), 
              method = "loess", formula = "y~x") +
  labs(title = 
         "Square root of Resilience and Challenge rating", 
       x = "Square root of Resilience", y = "Challenge Rating")

Still curvy, and adds an extra step, which feels unnecessary. But still, nice to try out a different function.

So, let’s get (a little) more complex. Saving throws represent robustness and supersede the underlying stat modifiers for strength, wisdom and so forth, which means we should use the maximum of e.g. str_mod and str_save when rolling to resist a spell or physical effect. Throwing this into a graph looks like:

few_more_cols <- select(aidedd2, ac, hp, cr, str_mod, str_save, 
                     dex_mod, dex_save, con_mod, con_save, 
                     int_mod, int_save, wis_mod, wis_save,
                     cha_mod, cha_save) %>% 
  mutate(resilience = hp*(ac-5)/20) %>% 
  rowwise() %>% 
  mutate(sum_saves = pmax(str_mod, str_save) + pmax(dex_mod, dex_save) 
         + pmax(con_mod, con_save) + pmax(int_mod, int_save) 
         + pmax(wis_mod, wis_save) + pmax(cha_mod, cha_save)) 

ggplot(data = few_more_cols) +
  geom_jitter(mapping = aes(x = resilience, y = sum_saves, color = cr)) +
  geom_smooth(mapping = aes(x = resilience, y = sum_saves ), 
              method = "loess", formula = "y~x") +
  labs(title = "Resilience and Sum of Saving throw modifiers", 
       x = "Resilience", y = "Sum of Saves")

Creatures generally follow an upwards trend: saves increase as resilience increases. That is, until the last few incredibly resilient creatures, where I imagine the game designers were concerned about whether anything could harm or affect them. I also learned about the rowwise() function and pmax() versus max() (thank you, ChatGPT!).
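
The difference between max() and pmax() is easiest to see on a tiny example. The sketch below uses made-up modifiers for three hypothetical creatures (the toy tibble is not taken from the dataset):

```r
library(dplyr)

# Made-up modifiers and saves for three hypothetical creatures
toy <- tibble(str_mod  = c(2, 4, -1),
              str_save = c(0, 7, 3))

compared <- toy %>%
  mutate(col_max  = max(str_mod, str_save),   # one number across both columns
         pair_max = pmax(str_mod, str_save))  # element-wise, one value per row
compared

# With rowwise(), max() sees a single row at a time, so it matches pmax()
by_row <- toy %>%
  rowwise() %>%
  mutate(row_max = max(str_mod, str_save)) %>%
  ungroup()
by_row
```

Here col_max is 7 on every row, while pair_max and row_max are 2, 7 and 3. Because pmax() is already element-wise, it should give the same result with or without rowwise().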

A diverse bestiary?

Next, I wanted to think about the diversity of the world(s) in the game. I wondered if there was an over-representation of creatures of one particular type over others. Conversely, perhaps one type is under-utilised. Also, as a DM myself, I have noticed that certain types of creature have variants which hover around a specific CR.

creature_type_cr <- select(aidedd2, type, cr)

When I first tried this barchart, I got way too many tiny bars…

split_types <- separate(creature_type_cr, type, 
                        into = c("basic_type", "specific_type"), 
                        sep = " ", remove = FALSE, convert = TRUE, 
                        extra = "merge", fill = "right") %>% 
  group_by(basic_type)

counted_types <- split_types %>%
  count(basic_type)

Having split the creature types into general species and more specific labels, then counted them, I felt ready to go back to the barchart.
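
To see what separate() does with these arguments, here is a minimal sketch on made-up type strings. The strings are illustrative only and assume a "basic (specific)" style, so the real column may look different:

```r
library(tidyr)

# Made-up type strings, assumed to follow a "basic (specific)" pattern
toy_types <- data.frame(type = c("humanoid (any race)", "beast", "fiend (devil)"))

split_toy <- separate(toy_types, type,
                      into = c("basic_type", "specific_type"),
                      sep = " ", remove = FALSE,
                      extra = "merge",  # keep everything after the first space together
                      fill = "right")   # pad with NA when there is no second part
split_toy
```

The first word lands in basic_type, the remainder (spaces included) in specific_type, and a one-word type such as "beast" gets NA as its specific type.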

ggplot(data = counted_types, aes(x = reorder(basic_type, -n), y = n, 
       fill = basic_type)) +
  geom_bar(stat = "identity") +
  labs(title = "Bar chart showing number of creatures by basic type",
       x = "Creature Type", y = "Count")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        legend.position = "none")

I was not surprised to see humanoids as the overwhelmingly dominant type: they are the kinds who populate the cities and make societies (or the rampaging hordes tearing them down).
I did think I would see a lot more fey (seelie and unseelie courts from traditional folklore) since they are a large part of storytelling through human history.
It’s a shame the celestials are so outnumbered by the fiends, but perhaps this simply reflects the nature of storytelling in DnD: the heroes face off against unspeakable evil, with powerful good characters in the background as quest givers or distant authority figures.

This chart proved quite challenging as it was throwing an error message and dropping the fill aesthetic. I asked ChatGPT and, with a few prompts, it suggested the factor() function, although I had to suggest the reason myself, which ChatGPT confirmed (CR is stored as a number, including some decimal values, so R doesn’t automatically see it as a category which can group items).
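
A minimal sketch of that point, using a made-up vector of challenge ratings: a numeric column is treated as continuous, while factor() turns its distinct values into categories that a fill aesthetic can group by.

```r
# Made-up challenge ratings, including fractional ones
cr <- c(0.25, 0.5, 1, 1, 2, 0.25)

is.numeric(cr)          # TRUE: a numeric column maps to a continuous scale

cr_factor <- factor(cr)
levels(cr_factor)       # "0.25" "0.5" "1" "2"
nlevels(cr_factor)      # 4 distinct categories
```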

However, I then tried to get the bars arranged in descending order from left to right… ChatGPT had a LOT of trouble with this. It invented a URL for my data source and repeated the same code twice, claiming it would fix the error… I can’t be 100% sure I wasn’t doing something wrong in my instruction of it, or in following its instructions, of course, but I really feel it dropped the ball here somewhere. Which was disappointing after two excellent assists. Maybe, just maybe I shouldn’t have said “I am a bit worried about becoming too dependent on you. final question…” Is it possible ChatGPT gave me flawed advice to enable me to grow more without it as a crutch?! I may never know. In the end, I went back to the data source and created counted_types instead of thrashing around with recalcitrant code.

Overall, I am very pleased with the final result and feel it was worth the effort! (And again, I learned something new.)

Miscellaneous charts

Finally, to round off the R graphing experience, I thought of a few other data representations which may be of interest.

This is quite an attractive chart, which I would love to order by bar height… if anyone has any ideas how to do so, please let me know!

colours <- brewer.pal(11, "RdYlBu")
colours <- colorRampPalette(colours)(31)

ggplot(data = split_types) +
  geom_bar(mapping = aes(x = basic_type, 
                         fill = factor(cr, levels = rev(levels(factor(cr))))), 
           stat = "count") +
  scale_fill_manual(values = colours) +
  labs(title = "Number of creature types, coloured by CR rating", 
       x = "Creature Type", y = "Count", fill = "CR") +
  theme(panel.background = element_rect(fill = "black"),
        axis.text.x = element_text(angle = 45, hjust = 1))

This shows that humanoids span the full range of abilities, while beasts are always pretty low level. Dragons and fiends make up a lot of the higher level creatures, while giants and aberrations hold the middle ground.
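
One possible way to order a count-based bar chart by bar height is to reorder the factor levels of the x variable by frequency before plotting, for example with fct_infreq() from forcats. The sketch below uses a few made-up rows as a stand-in for split_types; it is an untested suggestion for the real data, not the method used above.

```r
library(ggplot2)
library(forcats)

# Made-up rows standing in for split_types
toy <- data.frame(
  basic_type = c("beast", "humanoid", "humanoid", "dragon",
                 "humanoid", "beast"),
  cr = c(0.25, 1, 5, 17, 0.5, 1)
)

# fct_infreq() orders the levels from most to least frequent
toy$basic_type <- fct_infreq(toy$basic_type)
levels(toy$basic_type)

ordered_plot <- ggplot(data = toy) +
  geom_bar(mapping = aes(x = basic_type, fill = factor(cr))) +
  labs(x = "Creature Type", y = "Count", fill = "CR")
ordered_plot
```

Since split_types was created with group_by(basic_type), it would probably need an ungroup() before reordering the levels inside a mutate().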

Apparently pie charts are rubbish? https://www.data-to-viz.com/caveat/pie.html But… I couldn’t resist trying it out in R.

pie(counted_types$n, labels = counted_types$basic_type, radius = 1.05, 
    main = "Creature types in DnD5e")

And, because I haven’t used this feature yet, facets! How does separating out legendary creatures affect that very first graph?

fewest_cols_legend <- select(aidedd2, ac, hp, cr, legendary)
ggplot(data = fewest_cols_legend) +
  geom_jitter(mapping = aes(x = hp, y = ac, color = cr)) +
  geom_smooth(mapping = aes(x = hp, y = ac ), 
              method = "loess", formula = "y~x") +
  labs(title = "Hit points, Armour Class and Challenge rating", 
       subtitle = "0 = non-legendary, 1 = legendary", 
       x = "Hit Points", y = "Armour Class") +
  facet_wrap(~ legendary)

All well and good, but the most familiar grid in DnD may arguably be the alignment chart. I wanted to know if I would see more devils than angels (ethically speaking), as I witnessed above with creature types.

fewest_cols_align <- select(aidedd2, ac, hp, cr, alignment)
ggplot(data = fewest_cols_align) +
  geom_jitter(mapping = aes(x = hp, y = ac, color = cr)) +
  geom_smooth(mapping = aes(x = hp, y = ac ), 
              method = "loess", formula = "y~x") +
  labs(title = "Hit points, Armour Class and Challenge rating", 
       subtitle = "Split by alignment", 
       x = "Hit Points", y = "Armour Class") +
  facet_wrap(~ alignment)

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : span too small.  fewer data values than degrees of freedom.

## Warning: Computation failed in `stat_smooth()`
## Caused by error in `predLoess()`:
## ! NA/NaN/Inf in foreign function call (arg 5)

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

Hideous! Clearly, the column needs cleaning… The nine alignments sit on a grid, with ethics (Lawful, Neutral and Chaotic) on one axis and morals (Good, Neutral and Evil) on the other.
As such, I shall filter out those which don’t fall neatly into one of the nine categories. Which feels a bit bad… Again, something for the future.

simple_align <- c("chaotic evil", "chaotic neutral", "chaotic good", 
                  "neutral evil", "neutral", "neutral good", 
                  "lawful evil", "lawful neutral", "lawful good")
fewest_cols_align_clean <- subset(fewest_cols_align, alignment %in% simple_align)
ggplot(data = fewest_cols_align_clean) +
  geom_jitter(mapping = aes(x = hp, y = ac, color = cr)) +
  geom_smooth(mapping = aes(x = hp, y = ac ), 
              method = "loess", formula = "y~x") +
  labs(title = "Hit points, Armour Class and Challenge rating", 
       subtitle = "Split by alignment", 
       x = "Hit Points", y = "Armour Class") +
  facet_wrap(~ alignment)

So the evil panels are definitely more populous than the others, fitting the narrative necessity. Lawful good has some surprisingly tough creatures.

Next steps

Things which I didn’t get around to doing:

* Finding a way to allocate creatures to the alignment grid without excluding any of them, possibly with a conditional mutate command, and arranging the alignment charts in their traditional layout (L, N, C on x; E, N, G on y).
* Arranging the “creature types with CR colouring” bar chart by size order.
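
As a starting point for the first item, here is a rough sketch of a conditional mutate using case_when(), followed by facet_grid() to lay the panels out as a grid. The rows and alignment strings are made up, and the "other" bucket is an assumption for anything that matches none of the keywords. The real column would likely need more careful rules, since a string such as "any non-good alignment" would wrongly match "good" here.

```r
library(dplyr)
library(ggplot2)

# Made-up rows; the alignment strings are illustrative only
toy <- tibble(
  alignment = c("lawful good", "chaotic evil", "neutral",
                "neutral evil", "unaligned", "any alignment"),
  hp = c(90, 120, 30, 60, 10, 45),
  ac = c(18, 16, 12, 14, 11, 13)
)

gridded <- toy %>%
  mutate(
    ethic = case_when(
      grepl("lawful", alignment)  ~ "lawful",
      grepl("chaotic", alignment) ~ "chaotic",
      grepl("neutral", alignment) ~ "neutral",
      TRUE                        ~ "other"
    ),
    moral = case_when(
      grepl("good", alignment)    ~ "good",
      grepl("evil", alignment)    ~ "evil",
      grepl("neutral", alignment) ~ "neutral",
      TRUE                        ~ "other"
    ),
    # Factor levels fix the order of the facet columns and rows
    ethic = factor(ethic, levels = c("lawful", "neutral", "chaotic", "other")),
    moral = factor(moral, levels = c("good", "neutral", "evil", "other"))
  )

grid_plot <- ggplot(gridded) +
  geom_point(aes(x = hp, y = ac)) +
  facet_grid(moral ~ ethic)
grid_plot
```

Nothing is filtered out this way: creatures that do not fit the nine alignments end up in the "other" row or column instead of being dropped.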