Overview

Our data is from D&D Beyond’s character creation feature on their website. Dungeons & Dragons, or D&D for short, is a fantasy tabletop role-playing game (TTRPG) originally created and designed by Gary Gygax and Dave Arneson in 1974. D&D derives from wargaming, but it differs by having each player create and control an individual character instead of a military formation. The characters traverse fantasy worlds and go on adventures, guided by a dungeon master (DM) who takes the characters and their players through a mostly improvised story. Our data contains characters created for the game, with multiple attributes, stats, skills, and even the country each user is from.

Loading data and libraries

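library(tidyverse)
library(RCurl)
# Download the raw tab-separated file from GitHub as a single string
y <- getURL("https://raw.githubusercontent.com/isaias-soto/CUNY_DAT607/refs/heads/main/Project%202/dnd_chars_all.tsv")
# Parse the downloaded text as a tab-delimited table
dnd <- read.csv(text = y, sep = "\t")
glimpse(dnd)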
## Rows: 10,894
## Columns: 35
## $ ip                 <chr> "", "", "6b5d3f4", "9b7218f", "9b7218f", "bf0845c",…
## $ finger             <chr> "ed15f9d", "ed15f9d", "d922658", "b5d19a0", "b5d19a…
## $ hash               <chr> "fe3ed6570067d2cd808bcee0a4396824", "aa656cef94740b…
## $ name               <chr> "ee1e382c", "ee1e382c", "f1f6ff43", "f92bdd74", "f9…
## $ race               <chr> "Hill Dwarf", "Hill Dwarf", "Human", "Fallen Aasima…
## $ background         <chr> "Guild Member - Justice", "Guild Member - Justice",…
## $ date               <chr> "2022-08-23T20:02:11Z", "2022-08-23T19:43:25Z", "20…
## $ class              <chr> "Sorcerer 13|Cleric 1", "Sorcerer 13|Cleric 1", "Fi…
## $ justClass          <chr> "Sorcerer|Cleric", "Sorcerer|Cleric", "Fighter", "S…
## $ subclass           <chr> "Clockwork Soul|Order Domain", "Clockwork Soul|Orde…
## $ level              <int> 14, 14, 13, 5, 5, 1, 20, 20, 4, 4, 20, 5, 11, 3, 1,…
## $ feats              <chr> "Fey Touched|War Caster|Metamagic Adept", "", "Heav…
## $ HP                 <int> 146, 133, 140, 34, 34, 10, 94, 116, 41, 35, 144, 42…
## $ AC                 <int> 10, 10, 21, 16, 16, 13, 16, 13, 15, 13, 16, 19, 18,…
## $ Str                <int> 9, 9, 20, 8, 8, 10, 13, 14, 14, 9, 13, 15, 15, 9, 1…
## $ Dex                <int> 11, 11, 12, 10, 10, 16, 16, 15, 16, 17, 16, 17, 19,…
## $ Con                <int> 20, 18, 19, 14, 14, 14, 13, 12, 15, 13, 13, 12, 15,…
## $ Int                <int> 14, 14, 14, 10, 10, 12, 15, 11, 11, 17, 15, 11, 14,…
## $ Wis                <int> 14, 14, 11, 16, 16, 8, 15, 17, 13, 10, 15, 10, 18, …
## $ Cha                <int> 20, 20, 10, 16, 16, 16, 14, 11, 11, 11, 14, 9, 11, …
## $ alignment          <chr> "", "", "CG", "Caltico Neutro", "Caltico Neutro", "…
## $ skills             <chr> "Arcana|Religion|Intimidation", "Arcana|Religion|In…
## $ weapons            <chr> "Crossbow, light|Dagger", "Crossbow, light|Dagger",…
## $ spells             <chr> "Alarm*1|Protection from Evil and Good*1|Command*1|…
## $ castingStat        <chr> "Cha", "Cha", "Int", "Cha", "Cha", "Cha", "Wis", "W…
## $ choices            <chr> "metamagic/Twinned Spell*Subtle Spell*Quickened Spe…
## $ country            <chr> "Canada", "Canada", "United States", "Brazil", "Bra…
## $ countryCode        <chr> "CA", "CA", "US", "BR", "BR", "CA", "CA", "CA", "CA…
## $ processedAlignment <chr> "", "", "CG", "", "", "", "", "CN", "LG", "CN", "",…
## $ good               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ lawful             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ processedRace      <chr> "Dwarf", "Dwarf", "Human", "Aasimar", "Aasimar", "H…
## $ processedSpells    <chr> "Alarm*1|Protection from Evil and Good*1|Command*1|…
## $ processedWeapons   <chr> "Crossbow, Light|Dagger", "Crossbow, Light|Dagger",…
## $ alias              <chr> "thirsty_davinci", "thirsty_davinci", "cool_bhabha"…

Tidying data

We are going to start by removing the columns ip, finger, hash, and name, since these relate to IP addresses and to identifying the devices users are operating, which we have no use for. We will also remove the alignment column, since it is a messier version of the standardized processedAlignment column. We will remove the columns good and lawful since they are empty. Finally, we’ll coerce the date column into data type “Date” so we can see trends over time. For column removal we’ll use the dplyr package, and for date-time coercion we’ll use the lubridate package.

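dnd_tidy <- dnd |> 
  # Drop identifier columns, the unstandardized alignment, and the empty columns
  select(-c(ip, finger, hash, name, alignment, good, lawful)) |>
  # Parse the timestamp string as a date-time, then keep only the calendar date
  mutate(date = as_date(parse_date_time(date, "Ymd HMS")))
glimpse(dnd_tidy)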
## Rows: 10,894
## Columns: 28
## $ race               <chr> "Hill Dwarf", "Hill Dwarf", "Human", "Fallen Aasima…
## $ background         <chr> "Guild Member - Justice", "Guild Member - Justice",…
## $ date               <date> 2022-08-23, 2022-08-23, 2022-08-22, 2022-08-22, 20…
## $ class              <chr> "Sorcerer 13|Cleric 1", "Sorcerer 13|Cleric 1", "Fi…
## $ justClass          <chr> "Sorcerer|Cleric", "Sorcerer|Cleric", "Fighter", "S…
## $ subclass           <chr> "Clockwork Soul|Order Domain", "Clockwork Soul|Orde…
## $ level              <int> 14, 14, 13, 5, 5, 1, 20, 20, 4, 4, 20, 5, 11, 3, 1,…
## $ feats              <chr> "Fey Touched|War Caster|Metamagic Adept", "", "Heav…
## $ HP                 <int> 146, 133, 140, 34, 34, 10, 94, 116, 41, 35, 144, 42…
## $ AC                 <int> 10, 10, 21, 16, 16, 13, 16, 13, 15, 13, 16, 19, 18,…
## $ Str                <int> 9, 9, 20, 8, 8, 10, 13, 14, 14, 9, 13, 15, 15, 9, 1…
## $ Dex                <int> 11, 11, 12, 10, 10, 16, 16, 15, 16, 17, 16, 17, 19,…
## $ Con                <int> 20, 18, 19, 14, 14, 14, 13, 12, 15, 13, 13, 12, 15,…
## $ Int                <int> 14, 14, 14, 10, 10, 12, 15, 11, 11, 17, 15, 11, 14,…
## $ Wis                <int> 14, 14, 11, 16, 16, 8, 15, 17, 13, 10, 15, 10, 18, …
## $ Cha                <int> 20, 20, 10, 16, 16, 16, 14, 11, 11, 11, 14, 9, 11, …
## $ skills             <chr> "Arcana|Religion|Intimidation", "Arcana|Religion|In…
## $ weapons            <chr> "Crossbow, light|Dagger", "Crossbow, light|Dagger",…
## $ spells             <chr> "Alarm*1|Protection from Evil and Good*1|Command*1|…
## $ castingStat        <chr> "Cha", "Cha", "Int", "Cha", "Cha", "Cha", "Wis", "W…
## $ choices            <chr> "metamagic/Twinned Spell*Subtle Spell*Quickened Spe…
## $ country            <chr> "Canada", "Canada", "United States", "Brazil", "Bra…
## $ countryCode        <chr> "CA", "CA", "US", "BR", "BR", "CA", "CA", "CA", "CA…
## $ processedAlignment <chr> "", "", "CG", "", "", "", "", "CN", "LG", "CN", "",…
## $ processedRace      <chr> "Dwarf", "Dwarf", "Human", "Aasimar", "Aasimar", "H…
## $ processedSpells    <chr> "Alarm*1|Protection from Evil and Good*1|Command*1|…
## $ processedWeapons   <chr> "Crossbow, Light|Dagger", "Crossbow, Light|Dagger",…
## $ alias              <chr> "thirsty_davinci", "thirsty_davinci", "cool_bhabha"…
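
Before dropping columns as empty or trusting a coerced date, it can help to check both assumptions directly. The block below is a minimal sketch on a small made-up data frame (the `toy` tibble and its values are hypothetical, not rows from the real file): it lists the columns in which every value is NA and counts how many timestamps failed to parse.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the raw data (hypothetical values, not from the real file)
toy <- tibble(
  date   = c("2022-08-23T20:02:11Z", "2022-08-22T09:15:00Z"),
  good   = c(NA, NA),
  lawful = c(NA, NA),
  level  = c(14L, 5L)
)

# Names of the columns in which every value is NA
all_na_cols <- toy |>
  select(where(~ all(is.na(.x)))) |>
  names()
all_na_cols

# Parse the timestamp strings, then keep only the calendar date
parsed <- as_date(parse_date_time(toy$date, "Ymd HMS"))
parsed

# Values that fail to parse become NA, so count them before relying on the column
n_failed <- sum(is.na(parsed))
n_failed
```

Running the same two checks against dnd and dnd_tidy would confirm that good and lawful really contain nothing but NA and that no dates were lost in the coercion.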

Analysis

Next, we’ll do some analysis on the newly cleaned dataset. We’ll examine the date variable to see when characters are being created, and the processedRace and countryCode variables to see which races are chosen most often and from where in the world.

ggplot(dnd_tidy, aes(x = date)) +
  geom_histogram(bins = 20)

summary(dnd_tidy$date)
##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "2018-04-06" "2019-03-14" "2019-11-30" "2020-02-20" "2021-01-03" "2022-08-23"
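
The summary above reports a mean (2020-02-20) that is later than the median (2019-11-30). Comparing the two is a quick, rough way to read the direction of skew without relying on the plot alone. The sketch below illustrates the idea on a handful of made-up dates (`toy_dates` is hypothetical); it is a heuristic, not a formal skewness statistic.

```r
# Toy creation dates (hypothetical): most are early, a few trail off late
toy_dates <- as.Date(c(
  "2018-05-01", "2018-08-15", "2018-11-20",
  "2019-01-10", "2019-03-05", "2019-06-18",
  "2020-09-01", "2021-12-01", "2022-08-01"
))

mean_date   <- mean(toy_dates)
median_date <- median(toy_dates)

# A mean later than the median suggests a longer tail toward later dates
skew_direction <- if (mean_date > median_date) "right" else "left or symmetric"
skew_direction
```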

The right-skewed histogram above shows that character creation drops off as time goes on: the mean date (2020-02-20) falls after the median (2019-11-30), so the tail extends toward later dates. Now let’s take a look at where in the world people are engaging with this website.

# Frequency tables for quick inspection at the console
processed_race_table <- table(dnd_tidy$processedRace)
country_code_table <- table(dnd_tidy$countryCode)
dnd_tidy |> 
  # Drop rows with a missing country code
  filter(!is.na(countryCode)) |>
  # Count characters per country
  count(countryCode) |>
  # Keep only countries with more than 200 characters
  filter(n > 200) |>
  ggplot(aes(x = countryCode, y = n)) +
  geom_col() +
  labs(title = "Top 3 countries that create characters on D&D Beyond",
       subtitle = "Canada, Great Britain, US", x = "Country", y = "Count")

The above plot shows the top 3 countries around the world that create the most characters using D&D Beyond. The US is the highest, followed by Canada and Great Britain. I filtered the counts so that the plot shows only countries where more than 200 characters were created by users.
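
A count threshold such as 200 only yields a "top 3" for this particular dataset. As an alternative sketch, the ranking can be requested directly with `slice_max()`, and the same counting pattern gives the most common races. The `toy` tibble below is made up for illustration, and filtering out empty-string country codes is an assumption about how missing countries might be stored.

```r
library(dplyr)

# Toy stand-in for dnd_tidy (hypothetical rows)
toy <- tibble(
  countryCode   = c(rep("US", 4), rep("CA", 3), rep("GB", 2), "BR", ""),
  processedRace = c(rep("Human", 5), rep("Elf", 3), rep("Dwarf", 2), "Aasimar")
)

# Rank countries by character count and keep the three largest
top_countries <- toy |>
  filter(!is.na(countryCode), countryCode != "") |>  # assumed: "" marks a missing country
  count(countryCode, sort = TRUE) |>
  slice_max(n, n = 3, with_ties = FALSE)
top_countries

# Tally races from most to least common
race_counts <- toy |>
  count(processedRace, sort = TRUE)
race_counts
```

Swapping `toy` for dnd_tidy would give the same kind of tables for the real data, and `top_countries` could be piped into the `ggplot()` call in place of the thresholded counts.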

Conclusion

In conclusion, we took this messy dataset, removed unnecessary columns, and coerced the date variable from type character into a date-time and then into a Date so we could see when characters were being created. We saw that character creation drops off over time. We also looked at where the users who created characters are from and identified the top 3 countries by number of characters created.