I have put a few hours into Dota 2, so I thought a short dive into hero starting stats would be interesting. Scraping the data I wanted was simple enough, but building a loop to scrape 124 separate pages, all rendered with JavaScript, made for a small challenge. I intend to analyze the heroes’ starting attributes and how they relate to starting attack damage, armor, and movement speed. I am also interested in several other questions: average starting HP by hero type, the distribution of starting attributes by type, and the top 5 fastest heroes by weapon and type. Insights from this analysis can help when constructing team makeups and when determining the strongest and weakest heroes at the start of the game. I will incorporate this data into my final project, which will use the Dota 2 API to gather my game history and analyze my past performances.
Libraries and Data
library(tidyverse)
library(httr)
library(rvest)
library(lubridate)
library(magrittr)
Rows: 124 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): hero_name, hero_type, hero_description, hero_weapon, hero_attack
dbl (7): hero_str, hero_agi, hero_int, hero_hp, hero_mana, hero_armor, hero_...
Data Structure
The data was mostly ready for analysis, but hero_armor and hero_movement were wrong for a few heroes. The attack, armor, and movement values all share the same class on the page, and so does “projectile speed”, a stat that some melee heroes lack. For those heroes the values landed in the wrong positions, so incorrect numbers were scraped for 9 heroes and I had to correct them manually. A tibble for reference:
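As a minimal sketch of how a manual correction like this could be applied, the block below patches a small lookup table of hand-checked values onto the scraped data with `dplyr::rows_update()`. The hero names and numbers are placeholders invented for illustration, not real Dota 2 values, and this is not necessarily how the original fix was done.

```r
library(dplyr)

# Toy stand-in for the scraped data. Names and numbers are placeholders,
# not real hero stats.
scraped <- tibble(
  hero_name     = c("Hero A", "Hero B", "Hero C"),
  hero_armor    = c(2.5, 305, 1.0),  # Hero B: values landed in the wrong columns
  hero_movement = c(300, 3.2, 310)
)

# Hand-checked values for the affected heroes only (also placeholders).
corrections <- tibble(
  hero_name     = "Hero B",
  hero_armor    = 3.2,
  hero_movement = 305
)

# rows_update() overwrites matching rows by key and raises an error if a
# key in `corrections` does not exist in `scraped`, which catches typos.
fixed <- rows_update(scraped, corrections, by = "hero_name")
fixed
```

Keeping the corrections in their own table leaves a record of which rows were changed by hand.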
There are 124 heroes in Dota 2, 60 of which are melee and 64 ranged. Here is how they break down by primary attribute:
heroes %>%
  ggplot(aes(x = hero_weapon, fill = hero_type)) +
  geom_bar() +
  labs(x = "Attack Type", y = "Number of Heroes", fill = "Hero Type") +
  theme_minimal()
As you can see, the majority of Strength heroes are melee, while the majority of Intelligence heroes are ranged. Agility and Universal heroes are split nearly evenly.
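To put numbers behind the bar chart, a small cross-tabulation works well. The sketch below assumes a data frame with the `hero_type` and `hero_weapon` columns from the column specification above; the helper name `weapon_by_type` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Count heroes per primary attribute and attack type, then spread the
# attack types into columns so each attribute is a single row.
weapon_by_type <- function(heroes) {
  heroes %>%
    count(hero_type, hero_weapon) %>%
    pivot_wider(
      names_from  = hero_weapon,
      values_from = n,
      values_fill = 0  # combinations with no heroes become 0 instead of NA
    )
}
```

Calling `weapon_by_type(heroes)` would return one row per hero type with one count column per attack type.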
I then took a look at average starting HP by Primary Attribute groups as well:
Strength heroes clearly have the highest starting HP, followed by Universal and then Intelligence and Agility. There are some outliers as well: the dot at the bottom of the Agility chart is Medusa, who starts with a measly 120 HP!
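The same comparison can be checked numerically. This is a rough sketch, assuming the `hero_type` and `hero_hp` columns from the data above; the helper name `hp_by_type` is hypothetical. Including the minimum makes a low outlier stand out in the table as well as in the plot.

```r
library(dplyr)

# Summarise starting HP per primary attribute, highest mean first.
hp_by_type <- function(heroes) {
  heroes %>%
    group_by(hero_type) %>%
    summarise(
      heroes    = n(),
      mean_hp   = mean(hero_hp),
      median_hp = median(hero_hp),
      min_hp    = min(hero_hp),
      .groups   = "drop"
    ) %>%
    arrange(desc(mean_hp))
}
```

Comparing `mean_hp` with `median_hp` shows how much a single extreme hero pulls a group's average.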
I then got curious about the correlation between starting Str, Agi, and Int and the stats they relate to, such as HP, mana, armor, and movement speed:
heroes %>%
  ggplot(aes(x = hero_str, y = hero_hp)) +
  geom_point() +
  # Linear fit without a confidence band
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(x = "Strength", y = "HP", title = "Correlation Between Strength and Starting HP") +
  theme_minimal() +
  theme(
    # linewidth replaces the size argument deprecated in ggplot2 3.4.0
    panel.grid.major = element_line(color = "gray40", linewidth = 0.5),
    panel.grid.minor = element_line(color = "gray70", linewidth = 0.10),
    panel.background = element_rect(fill = "gray70")
  )
The correlations between strength and HP, and between intelligence and mana, are undeniable, as you can see above. There is also a moderate correlation between starting agility and armor, and essentially no correlation between starting agility and movement speed.
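The plots can be backed up with Pearson correlation coefficients. The sketch below assumes the column names used in this data set (`hero_str`, `hero_agi`, `hero_int`, `hero_hp`, `hero_mana`, `hero_armor`, `hero_movement`); the helper name `attribute_correlations` is hypothetical, and labels such as "moderate" remain a judgment call whatever the numbers are.

```r
# Pearson correlation for each attribute/stat pair discussed above.
attribute_correlations <- function(heroes) {
  pairs <- list(
    c("hero_str", "hero_hp"),
    c("hero_int", "hero_mana"),
    c("hero_agi", "hero_armor"),
    c("hero_agi", "hero_movement")
  )
  data.frame(
    attribute = vapply(pairs, `[`, character(1), 1),
    stat      = vapply(pairs, `[`, character(1), 2),
    # cor() defaults to the Pearson coefficient
    r         = vapply(
      pairs,
      function(p) cor(heroes[[p[1]]], heroes[[p[2]]]),
      numeric(1)
    )
  )
}
```

`attribute_correlations(heroes)` returns one row per pair, with `r` near 1 or -1 for a strong linear relationship and near 0 for none.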
Movement and positioning are critical when playing Dota 2, so I was curious about who the fastest heroes are right out of the gate!
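One way to pull the fastest starters per group is `dplyr::slice_max()`. This is a sketch rather than the exact code behind the chart: it assumes the `hero_name`, `hero_type`, `hero_weapon`, and `hero_movement` columns, and the helper name `fastest_by_group` and its arguments are hypothetical.

```r
library(dplyr)

# Return the k fastest heroes within each level of `group_col`
# ("hero_type" or "hero_weapon").
fastest_by_group <- function(heroes, group_col = "hero_type", k = 5) {
  heroes %>%
    group_by(across(all_of(group_col))) %>%
    # with_ties = FALSE caps each group at exactly k rows; with ties kept,
    # heroes sharing the cutoff speed would all be returned.
    slice_max(hero_movement, n = k, with_ties = FALSE) %>%
    ungroup() %>%
    select(all_of(group_col), hero_name, hero_movement)
}
```

For example, `fastest_by_group(heroes, "hero_weapon")` would give the top 5 melee and top 5 ranged heroes by starting movement speed.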
There are fast heroes in each group, and if your playstyle is anything like mine, you will be selecting a hero with high movement speed and armor for the beginning of the game!