For this second part of DotaScience, we’ll do unsupervised learning: clustering and Principal Component Analysis (PCA). Dota 2 is a multiplayer online battle arena (MOBA) video game developed and published by Valve. It is played in matches between two teams of five players (called Radiant and Dire), with each team occupying and defending its own base on the map. Each of the ten players independently controls a powerful character, known as a hero. In this article, we will analyze all 117 heroes in Dota 2 with the unsupervised methods mentioned above.

Background

Objective

This project is based on an old Kaggle competition. You can find all the datasets and the competition here. We are not going to predict the match winner (I already did that, check the previous chapter here); instead we will do a deeper analysis of the heroes. It is important to know that this dataset was made one year ago, so recent updates of Dota are not represented in this analysis. Still, let’s hope we can find interesting insights that remain relevant to the current meta by clustering with k-means and reducing dimensionality with PCA.

Let’s begin

Data Import

The competition provides 5 datasets: test, train, hero_names, item_ids, and a submission example. We will use the hero_names.json dataset.
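As a minimal sketch of the import step, the snippet below parses hero records with `jsonlite::fromJSON()`. It assumes the file is a JSON array of hero objects with a `roles` array; the two records are typed in by hand for illustration and are not the real file. The name `hero_raw` is hypothetical.

```r
library(jsonlite)

# Hand-typed records in an assumed hero_names.json shape (illustrative only)
json_txt <- '[
  {"id": 1, "localized_name": "Anti-Mage", "primary_attr": "agi",
   "roles": ["Carry", "Escape", "Nuker"]},
  {"id": 2, "localized_name": "Axe", "primary_attr": "str",
   "roles": ["Initiator", "Durable", "Disabler"]}
]'

# For the real data this would be fromJSON("hero_names.json")
hero_raw <- fromJSON(json_txt)

# Collapse the list column of roles into one comma-separated string per hero
hero_raw$roles.c <- vapply(hero_raw$roles, paste, character(1), collapse = ",")
hero_raw$roles <- NULL

str(hero_raw)
```

Collapsing the roles into a single string keeps the data frame flat, which makes the later inspection and dummy-encoding steps easier.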

Data Wrangling / Pre-process

EDA and Feature Engineering

## Observations: 117
## Variables: 29
## $ id                <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15...
## $ name              <chr> "npc_dota_hero_antimage", "npc_dota_hero_axe", "n...
## $ localized_name    <chr> "Anti-Mage", "Axe", "Bane", "Bloodseeker", "Cryst...
## $ primary_attr      <chr> "agi", "str", "int", "agi", "int", "agi", "str", ...
## $ attack_type       <chr> "Melee", "Melee", "Ranged", "Melee", "Ranged", "R...
## $ img               <chr> "/apps/dota2/images/heroes/antimage_full.png?", "...
## $ icon              <chr> "/apps/dota2/images/heroes/antimage_icon.png", "/...
## $ base_health       <dbl> 200, 200, 200, 200, 200, 200, 200, 200, 200, 200,...
## $ base_health_regen <dbl> 0.25, 2.75, NA, NA, NA, 0.25, 1.00, 0.50, NA, NA,...
## $ base_mana         <dbl> 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 7...
## $ base_mana_regen   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ base_armor        <dbl> -1, -2, 1, 0, 0, -3, 2, 0, -1, -2, 0, 0, -1, -2, ...
## $ base_mr           <dbl> 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 2...
## $ base_attack_min   <dbl> 29, 24, 35, 33, 30, 19, 27, 12, 25, 9, 15, 22, 30...
## $ base_attack_max   <dbl> 33, 28, 41, 39, 36, 30, 37, 16, 30, 18, 21, 44, 4...
## $ base_str          <dbl> 23, 25, 23, 24, 18, 17, 22, 21, 18, 20, 19, 19, 1...
## $ base_agi          <dbl> 24, 20, 23, 24, 16, 29, 12, 36, 18, 24, 20, 29, 2...
## $ base_int          <dbl> 12, 18, 23, 18, 14, 15, 16, 14, 19, 13, 18, 19, 2...
## $ str_gain          <dbl> 1.3, 3.4, 2.6, 2.7, 2.2, 1.9, 3.7, 2.2, 2.2, 3.0,...
## $ agi_gain          <dbl> 3.2, 2.2, 2.6, 3.5, 1.6, 2.8, 1.4, 2.8, 3.7, 4.3,...
## $ int_gain          <dbl> 1.8, 1.6, 2.6, 1.7, 3.3, 1.4, 1.8, 1.4, 1.9, 1.1,...
## $ attack_range      <dbl> 150, 150, 400, 150, 600, 625, 150, 150, 630, 350,...
## $ projectile_speed  <dbl> 0, 900, 900, 900, 900, 1250, 0, 0, 900, 1300, 120...
## $ attack_rate       <dbl> 1.4, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.4, 1.7, 1.5,...
## $ move_speed        <dbl> 310, 295, 305, 295, 275, 285, 310, 300, 290, 280,...
## $ turn_rate         <dbl> 0.5, 0.6, 0.6, 0.5, 0.5, 0.7, 0.9, 0.6, 0.5, 0.6,...
## $ cm_enabled        <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T...
## $ legs              <dbl> 2, 2, 4, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 6, 2...
## $ roles.c           <chr> "Carry,Escape,Nuker", "Initiator,Durable,Disabler...

First, we remove the unused variables, then check for missing values and summarize what is left.
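A rough sketch of this step in base R follows. The dropped columns (`id`, `name`, `img`, `icon`, `cm_enabled`) are inferred by comparing the column listings before and after; the data frame `hero_df` is a tiny made-up stand-in, not the real data.

```r
# Tiny made-up stand-in for the imported hero table
hero_df <- data.frame(
  id = 1:3,
  name = c("npc_a", "npc_b", "npc_c"),
  localized_name = c("A", "B", "C"),
  img = "x.png",
  icon = "y.png",
  cm_enabled = TRUE,
  base_health_regen = c(0.25, NA, 2.75),
  base_armor = c(-1, 1, 0),
  stringsAsFactors = FALSE
)

# Drop identifier and image columns that carry no information for clustering
unused <- c("id", "name", "img", "icon", "cm_enabled")
hero_small <- hero_df[, setdiff(names(hero_df), unused)]

# Count missing values per remaining column
na_counts <- colSums(is.na(hero_small))
print(na_counts)
```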

##    localized_name      primary_attr       attack_type       base_health 
##                 0                 0                 0                 0 
## base_health_regen         base_mana   base_mana_regen        base_armor 
##                85                 0                 0                 0 
##           base_mr   base_attack_min   base_attack_max          base_str 
##                 0                 0                 0                 0 
##          base_agi          base_int          str_gain          agi_gain 
##                 0                 0                 0                 0 
##          int_gain      attack_range  projectile_speed       attack_rate 
##                 0                 0                 0                 0 
##        move_speed         turn_rate              legs           roles.c 
##                 0                 0                 0                 0
##  localized_name     primary_attr attack_type  base_health  base_health_regen
##  Length:117         agi:37       Melee :56   Min.   :200   Min.   :0.2500   
##  Class :character   int:42       Ranged:61   1st Qu.:200   1st Qu.:0.2500   
##  Mode  :character   str:38                   Median :200   Median :0.5000   
##                                              Mean   :200   Mean   :0.9766   
##                                              3rd Qu.:200   3rd Qu.:1.5000   
##                                              Max.   :200   Max.   :3.2500   
##                                                            NA's   :85       
##    base_mana  base_mana_regen   base_armor          base_mr     
##  Min.   :75   Min.   :0       Min.   :-3.00000   Min.   :10.00  
##  1st Qu.:75   1st Qu.:0       1st Qu.:-1.00000   1st Qu.:25.00  
##  Median :75   Median :0       Median : 0.00000   Median :25.00  
##  Mean   :75   Mean   :0       Mean   : 0.04701   Mean   :24.87  
##  3rd Qu.:75   3rd Qu.:0       3rd Qu.: 1.00000   3rd Qu.:25.00  
##  Max.   :75   Max.   :0       Max.   : 7.00000   Max.   :25.00  
##                                                                 
##  base_attack_min base_attack_max    base_str        base_agi    
##  Min.   : 9.00   Min.   :11.00   Min.   :12.00   Min.   : 0.00  
##  1st Qu.:22.00   1st Qu.:28.00   1st Qu.:19.00   1st Qu.:15.00  
##  Median :26.00   Median :34.00   Median :21.00   Median :18.00  
##  Mean   :26.53   Mean   :33.98   Mean   :21.21   Mean   :18.11  
##  3rd Qu.:30.00   3rd Qu.:39.00   3rd Qu.:23.00   3rd Qu.:22.00  
##  Max.   :62.00   Max.   :70.00   Max.   :30.00   Max.   :36.00  
##                                                                 
##     base_int        str_gain        agi_gain        int_gain    
##  Min.   :12.00   Min.   :1.300   Min.   :0.000   Min.   :1.000  
##  1st Qu.:16.00   1st Qu.:2.200   1st Qu.:1.500   1st Qu.:1.700  
##  Median :18.00   Median :2.600   Median :1.900   Median :2.000  
##  Mean   :18.97   Mean   :2.709   Mean   :2.111   Mean   :2.355  
##  3rd Qu.:22.00   3rd Qu.:3.100   3rd Qu.:2.600   3rd Qu.:3.100  
##  Max.   :30.00   Max.   :4.600   Max.   :4.800   Max.   :5.200  
##                                                                 
##   attack_range   projectile_speed  attack_rate      move_speed   
##  Min.   :140.0   Min.   :   0.0   Min.   :1.300   Min.   :270.0  
##  1st Qu.:150.0   1st Qu.: 900.0   1st Qu.:1.700   1st Qu.:290.0  
##  Median :350.0   Median : 900.0   Median :1.700   Median :295.0  
##  Mean   :349.5   Mean   : 942.7   Mean   :1.692   Mean   :297.9  
##  3rd Qu.:550.0   3rd Qu.:1000.0   3rd Qu.:1.700   3rd Qu.:305.0  
##  Max.   :700.0   Max.   :3000.0   Max.   :2.000   Max.   :330.0  
##                                                                  
##    turn_rate           legs         roles.c         
##  Min.   :0.5000   Min.   :0.000   Length:117        
##  1st Qu.:0.5000   1st Qu.:2.000   Class :character  
##  Median :0.5000   Median :2.000   Mode  :character  
##  Mean   :0.5919   Mean   :2.085                     
##  3rd Qu.:0.6000   3rd Qu.:2.000                     
##  Max.   :1.0000   Max.   :8.000                     
## 

From the summary above we can see that base_health, base_mana, and base_mana_regen each have only one value, so we’ll remove them. base_health_regen has 85 NAs, which we’ll change to 0. I’ll also convert roles.c into boolean columns, just as I did in the previous chapter of DotaScience.
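The three clean-up steps could look roughly like this in base R. This is a sketch on made-up rows, not the original code; the `roles<Name>` column naming follows the output shown in this article, and everything else is an assumption.

```r
# Made-up rows with the same kinds of problems as the real table
hero <- data.frame(
  localized_name = c("A", "B", "C"),
  base_health = 200, base_mana = 75, base_mana_regen = 0,
  base_health_regen = c(0.25, NA, 2.75),
  roles.c = c("Carry,Escape,Nuker", "Support,Nuker", "Carry,Pusher"),
  stringsAsFactors = FALSE
)

# 1. Drop columns that hold a single value for every hero
constant <- vapply(hero, function(col) length(unique(col)) == 1, logical(1))
hero <- hero[, !constant, drop = FALSE]

# 2. Replace missing health regen with 0
hero$base_health_regen[is.na(hero$base_health_regen)] <- 0

# 3. Turn the comma-separated roles into one 0/1 column per role
role_list <- strsplit(hero$roles.c, ",", fixed = TRUE)
for (r in sort(unique(unlist(role_list)))) {
  hero[[paste0("roles", r)]] <-
    as.integer(vapply(role_list, function(x) r %in% x, logical(1)))
}
hero$roles.c <- NULL

print(hero)
```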

##    localized_name      primary_attr       attack_type base_health_regen 
##                 0                 0                 0                 0 
##        base_armor           base_mr   base_attack_min   base_attack_max 
##                 0                 0                 0                 0 
##          base_str          base_agi          base_int          str_gain 
##                 0                 0                 0                 0 
##          agi_gain          int_gain      attack_range  projectile_speed 
##                 0                 0                 0                 0 
##       attack_rate        move_speed         turn_rate              legs 
##                 0                 0                 0                 0 
##        rolesCarry     rolesDisabler      rolesDurable       rolesEscape 
##                 0                 0                 0                 0 
##    rolesInitiator      rolesJungler        rolesNuker       rolesPusher 
##                 0                 0                 0                 0 
##      rolesSupport 
##                 0
## Observations: 117
## Variables: 29
## $ localized_name    <chr> "Anti-Mage", "Axe", "Bane", "Bloodseeker", "Cryst...
## $ primary_attr      <fct> agi, str, int, agi, int, agi, str, agi, agi, agi,...
## $ attack_type       <fct> Melee, Melee, Ranged, Melee, Ranged, Ranged, Mele...
## $ base_health_regen <dbl> 0.25, 2.75, 0.00, 0.00, 0.00, 0.25, 1.00, 0.50, 0...
## $ base_armor        <dbl> -1, -2, 1, 0, 0, -3, 2, 0, -1, -2, 0, 0, -1, -2, ...
## $ base_mr           <dbl> 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 2...
## $ base_attack_min   <dbl> 29, 24, 35, 33, 30, 19, 27, 12, 25, 9, 15, 22, 30...
## $ base_attack_max   <dbl> 33, 28, 41, 39, 36, 30, 37, 16, 30, 18, 21, 44, 4...
## $ base_str          <dbl> 23, 25, 23, 24, 18, 17, 22, 21, 18, 20, 19, 19, 1...
## $ base_agi          <dbl> 24, 20, 23, 24, 16, 29, 12, 36, 18, 24, 20, 29, 2...
## $ base_int          <dbl> 12, 18, 23, 18, 14, 15, 16, 14, 19, 13, 18, 19, 2...
## $ str_gain          <dbl> 1.3, 3.4, 2.6, 2.7, 2.2, 1.9, 3.7, 2.2, 2.2, 3.0,...
## $ agi_gain          <dbl> 3.2, 2.2, 2.6, 3.5, 1.6, 2.8, 1.4, 2.8, 3.7, 4.3,...
## $ int_gain          <dbl> 1.8, 1.6, 2.6, 1.7, 3.3, 1.4, 1.8, 1.4, 1.9, 1.1,...
## $ attack_range      <dbl> 150, 150, 400, 150, 600, 625, 150, 150, 630, 350,...
## $ projectile_speed  <dbl> 0, 900, 900, 900, 900, 1250, 0, 0, 900, 1300, 120...
## $ attack_rate       <dbl> 1.4, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.4, 1.7, 1.5,...
## $ move_speed        <dbl> 310, 295, 305, 295, 275, 285, 310, 300, 290, 280,...
## $ turn_rate         <dbl> 0.5, 0.6, 0.6, 0.5, 0.5, 0.7, 0.9, 0.6, 0.5, 0.6,...
## $ legs              <dbl> 2, 2, 4, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 6, 2...
## $ rolesCarry        <int> 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1...
## $ rolesDisabler     <int> 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1...
## $ rolesDurable      <int> 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0...
## $ rolesEscape       <int> 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1...
## $ rolesInitiator    <int> 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1...
## $ rolesJungler      <int> 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ rolesNuker        <int> 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ rolesPusher       <int> 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0...
## $ rolesSupport      <int> 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0...

From my experience of playing Dota (about 7 years), heroes with strength as their primary attribute tend to have high armor, higher health and regen, and are usually melee. Agility heroes have the fastest attack rate and movement speed. And int heroes have the most mana but lower armor and HP. However, that’s based on my experience; let’s see what the data says.

library(ggplot2)
library(cowplot)  # provides plot_grid()

# Boxplot of one stat by primary attribute (agi = green, int = blue, str = red)
attr_boxplot <- function(var, title = var) {
  ggplot(data = hero.df.new, aes(x = primary_attr, y = .data[[var]], fill = primary_attr)) +
    geom_boxplot(show.legend = FALSE) + theme_bw() + labs(title = title, y = var) +
    scale_fill_manual(values = c("green", "blue", "red")) +
    theme(plot.title = element_text(size = 10))
}

bp1 <- attr_boxplot("base_armor")
bp2 <- attr_boxplot("base_attack_max", title = "max_attack")
bp3 <- attr_boxplot("base_str")
bp4 <- attr_boxplot("base_agi")
bp5 <- attr_boxplot("base_int")
bp6 <- attr_boxplot("attack_range")
bp7 <- attr_boxplot("attack_rate")
bp8 <- attr_boxplot("move_speed")
bp9 <- attr_boxplot("base_health_regen", title = "health_regen")

# Arrange the nine boxplots in one grid
plot_grid(bp1, bp2, bp3, bp4, bp5, bp6, bp7, bp8, bp9)

Looks like most of my guesses are wrong. The highest attack_rate values are held by str heroes, most str heroes also have higher move speed, and most agi heroes have higher health regen. That’s why we need a deeper analysis based on data. It might be useful for professional players to have detailed knowledge about which heroes to use against certain heroes, which heroes are similar, and which heroes are suitable for certain conditions.

To solve the problem, we’ll do clustering: we’ll group heroes based on their similarity.

Clustering (K-means)

K-means only works on numeric values, so we’ll select only the numeric data. We also scale the data because most of the variables are on different scales.
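Here is a minimal sketch of that selection, scaling, and clustering step using base R’s `scale()` and `kmeans()`. The data is randomly generated, `centers = 4` mirrors the four clusters discussed in this article, and `nstart = 25` and the seed are my own assumptions rather than the original settings.

```r
set.seed(42)

# Randomly generated stand-in for the hero table (not real hero stats)
toy <- data.frame(
  name = paste0("hero", 1:30),
  primary_attr = factor(rep(c("agi", "int", "str"), each = 10)),
  base_armor = rnorm(30),
  attack_range = c(rnorm(10, 150, 10), rnorm(10, 600, 30), rnorm(10, 150, 10)),
  move_speed = rnorm(30, 300, 10),
  stringsAsFactors = FALSE
)

# Keep numeric columns only, then standardize each to mean 0 and sd 1
num_cols <- vapply(toy, is.numeric, logical(1))
toy_scaled <- scale(toy[, num_cols])

# K-means with 4 centers; several random starts reduce the risk of a poor local optimum
km <- kmeans(toy_scaled, centers = 4, nstart = 25)

# Cross-tabulate cluster membership against primary attribute
print(table(cluster = km$cluster, primary_attr = toy$primary_attr))
```

Without scaling, a variable such as attack_range (hundreds of units) would dominate the Euclidean distances and variables such as base_armor would barely influence the clusters.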

Cluster analysis

Let’s make a 3D plot to see how our heroes are clustered by their base stats.

There’s no clear distinction between the clusters based on their base stats. Clusters 2, 3, and 4 mostly group heroes by their primary attribute, but cluster 1 combines all attributes. Cluster 1 sits in the center, which means its heroes’ base stats are close to the average of all heroes. Cluster 1 heroes are the best picks when your team needs some balancing in terms of attributes.

Clusters based on heroes’ passive attributes

  • As we can see, cluster 1 has the highest base_armor and slow move_speed, but overall its attributes are around average. If we add our previous analysis of primary attributes, we can conclude that most heroes in cluster 1 are probably durable (tanky) heroes drawn from all primary attributes.
  • Cluster 2, however, has many characteristics of int heroes, for example: the lowest base_armor and health regen (because most int heroes focus on mana rather than health), low base attack, and the highest attack range (because most int heroes are ranged). If only we had variables like mana_regen or mana_skill_consumption, I believe we could separate the hero clusters even better.
  • Cluster 3 is a cluster of agi heroes. We can see that from the low base armor and low attack but high projectile_speed and fast attacks, since agi heroes are very dependent on speed. (Note: a low attack_rate means a higher attack speed, because attack_rate is the base time in seconds between attacks.) But somehow cluster 3 has the highest base_health_regen, which I thought would belong to str heroes.
  • Cluster 4 has many characteristics of str heroes: high armor, base attack, and attack rate. Str heroes depend on armor and health, have low speed, and are mostly melee. Everything is summarized in the table above.
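Cluster profiles like the ones described in the list above can be produced by averaging each stat within a cluster. The sketch below uses `aggregate()` on made-up numbers; the `cluster` column stands in for the k-means assignment and none of the values are real hero stats.

```r
# Made-up heroes with a cluster label already attached
clustered <- data.frame(
  cluster    = c(1, 1, 2, 2, 3, 3),
  base_armor = c(3, 5, -1, -3, 0, 2),
  move_speed = c(285, 295, 300, 290, 310, 320)
)

# Mean of each stat per cluster
profile <- aggregate(cbind(base_armor, move_speed) ~ cluster,
                     data = clustered, FUN = mean)
print(profile)

# Which cluster has the highest average armor?
top_armor <- profile$cluster[which.max(profile$base_armor)]
print(top_armor)
```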

After that, let’s see how our clusters separate heroes based on their roles.

  • Cluster 1: has the highest number of disabler and jungler heroes but the fewest carries. It also has many supports and initiators. I would say that heroes in cluster 1 are mostly semi-support heroes who help the carry kill enemies and initiate battles.
  • Cluster 2: has the fewest durable and initiator heroes but the most supports, and apparently all of them are nukers. This also matches the int characteristics, since most support heroes come from int.
  • Cluster 3: has the most carry, escape, and pusher heroes but few disablers, supports, junglers, and initiators. It looks like heroes in cluster 3 are meant to kill enemies.
  • Cluster 4: has the most disabler, durable, and initiator heroes. In the game, heroes in cluster 4 are most likely the ones who start the clash/battle and also try to disrupt enemies.
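Because the role columns are 0/1 flags, the mean of a flag within a cluster is the share of heroes in that cluster holding the role. The following sketch shows one way to compute such shares; the rows are invented and the `roles_by_hero` name is hypothetical.

```r
# Invented heroes: cluster label plus 0/1 role flags
roles_by_hero <- data.frame(
  cluster      = c(1, 1, 2, 2, 2, 3),
  rolesCarry   = c(0L, 1L, 0L, 0L, 0L, 1L),
  rolesNuker   = c(1L, 0L, 1L, 1L, 1L, 0L),
  rolesSupport = c(1L, 0L, 1L, 1L, 0L, 0L)
)

# Pick the role flag columns by their common prefix
role_cols <- grep("^roles", names(roles_by_hero), value = TRUE)

# Share of heroes per role within each cluster (rows = roles, columns = clusters)
role_share <- sapply(split(roles_by_hero[role_cols], roles_by_hero$cluster), colMeans)
print(round(role_share, 2))
```

A share of 1 in a column means every hero in that cluster has the role, which is the kind of pattern behind a statement such as “all of them are nukers”.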

Principal Component Analysis (PCA)

Another well-known unsupervised method is Principal Component Analysis, or PCA. PCA looks for correlation within our data and uses that redundancy to create a new matrix with just enough dimensions to explain most of the variance in the original data. The new variables created by PCA are called principal components. PCA can be used for dimensionality reduction, pattern discovery, identifying variables that are highly correlated with others, and visualizing high-dimensional data.
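To make the idea concrete, here is a small sketch with base R’s `prcomp()` (swapped in for illustration; it is not necessarily the function used for the results in this article). The data is simulated: two variables share a common signal and a third is pure noise, so most of the variance should land on the first component.

```r
set.seed(1)

# Simulated data: a and b are strongly correlated, c is independent noise
x <- rnorm(100)
toy <- data.frame(
  a = x + rnorm(100, sd = 0.1),
  b = x + rnorm(100, sd = 0.1),
  c = rnorm(100)
)

# PCA on standardized variables
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eig <- pca$sdev^2
pct <- 100 * eig / sum(eig)

print(round(data.frame(eigenvalue = eig, pct_var = pct, cum_pct = cumsum(pct)), 3))
```

With standardized inputs the eigenvalues sum to the number of variables, so each component’s share of variance is its eigenvalue divided by that count.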

But not all data is suitable for PCA. First of all, it needs lots of variables (dimensions). Our data only has 29 variables. Is that enough? Well, I don’t know. Uncorrelated variables are bad for PCA (also known as blind tasting), so if our data has lots of correlated variables (Logistic Machinery), we are good to do PCA.
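One quick way to check this is to compute the correlation matrix and list the variable pairs whose absolute correlation exceeds some threshold. The sketch below does that on simulated data; the 0.5 cutoff and the variable values are arbitrary choices for illustration.

```r
set.seed(7)

# Simulated data: str_gain depends on base_str, turn_rate is unrelated
x <- rnorm(200)
toy <- data.frame(
  base_str  = x,
  str_gain  = 0.6 * x + rnorm(200, sd = 0.5),
  turn_rate = rnorm(200)
)

cm <- cor(toy)

# Keep the lower triangle only so each pair appears once
cm[upper.tri(cm, diag = TRUE)] <- NA

# Pairs with |r| above the (arbitrary) 0.5 threshold
strong <- which(abs(cm) > 0.5, arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(cm)[strong[, "row"]],
  var2 = colnames(cm)[strong[, "col"]],
  r    = cm[strong]
)
print(pairs)
```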

Shall we do the PCA?

##                   base_health_regen   base_armor      base_mr base_attack_min
## base_health_regen       1.000000000  0.203201707  0.037806632     0.045388788
## base_armor              0.203201707  1.000000000  0.113862440     0.064023192
## base_mr                 0.037806632  0.113862440  1.000000000     0.040142519
## base_attack_min         0.045388788  0.064023192  0.040142519     1.000000000
## base_attack_max        -0.003970871  0.055500043  0.010240522     0.893240113
## base_str                0.063935162  0.028802221 -0.023228541     0.369776595
## base_agi                0.192101926 -0.057882319  0.134211230    -0.335981620
## base_int               -0.225781529  0.050904767 -0.072392125    -0.161371199
## str_gain               -0.003294229 -0.030688458 -0.012598434     0.376941947
## agi_gain                0.155012005  0.073730991  0.088243931    -0.233047934
## int_gain               -0.231043921  0.033731005 -0.057513149    -0.147328320
## attack_range           -0.300021591 -0.238699274 -0.115301408    -0.401369267
## projectile_speed       -0.129522453 -0.072783689  0.009624639    -0.316839246
## attack_rate             0.050410163 -0.009360779 -0.008872137     0.301260812
## move_speed             -0.001965692  0.200100097  0.084459026    -0.088089389
## turn_rate               0.095003863 -0.009865875  0.064372919    -0.061588180
## legs                    0.111219915  0.100181407  0.006891421     0.080637116
## rolesCarry              0.023715097  0.025627993  0.102028865    -0.086043698
## rolesDisabler           0.089460872  0.028510953 -0.055744577     0.164691029
## rolesDurable           -0.047932653 -0.096188531 -0.113310745     0.348264739
## rolesEscape             0.211210351  0.008257484  0.076080072     0.006606846
## rolesInitiator          0.159517586  0.046376557  0.084492654     0.348920049
## rolesJungler            0.048469020  0.004516243  0.035605456    -0.024889765
## rolesNuker             -0.186356834  0.063166455 -0.058195356    -0.026782712
## rolesPusher            -0.090405458  0.102423578 -0.161738474    -0.097872611
## rolesSupport           -0.148239094 -0.085189908 -0.121801500     0.041718605
##                   base_attack_max    base_str    base_agi    base_int
## base_health_regen    -0.003970871  0.06393516  0.19210193 -0.22578153
## base_armor            0.055500043  0.02880222 -0.05788232  0.05090477
## base_mr               0.010240522 -0.02322854  0.13421123 -0.07239213
## base_attack_min       0.893240113  0.36977660 -0.33598162 -0.16137120
## base_attack_max       1.000000000  0.31526680 -0.34034754 -0.06532644
## base_str              0.315266797  1.00000000 -0.31096115 -0.04989530
## base_agi             -0.340347540 -0.31096115  1.00000000 -0.16320210
## base_int             -0.065326444 -0.04989530 -0.16320210  1.00000000
## str_gain              0.366455547  0.55790840 -0.41821154 -0.19559598
## agi_gain             -0.236758450 -0.38728934  0.60636444 -0.29378019
## int_gain             -0.089326157 -0.19316768 -0.26112723  0.57287616
## attack_range         -0.328987090 -0.47840044 -0.07312285  0.40713022
## projectile_speed     -0.237965373 -0.15090064  0.01335694  0.15700565
## attack_rate           0.333343133  0.24360649 -0.24933086 -0.04219480
## move_speed           -0.086516964  0.05938339  0.08869069  0.07819383
## turn_rate            -0.021105033  0.05368861  0.04989100 -0.12897915
## legs                  0.100086695 -0.04216962 -0.09823658  0.07694346
## rolesCarry           -0.053766099 -0.13579613  0.38004588 -0.25066271
## rolesDisabler         0.144455903  0.23992060 -0.12816677  0.03453965
## rolesDurable          0.285238989  0.40254009 -0.16380730 -0.34224284
## rolesEscape           0.003528075 -0.06387739  0.20830304 -0.15408203
## rolesInitiator        0.281115187  0.41681097 -0.11478937 -0.21261266
## rolesJungler         -0.039426988  0.12007884 -0.06582593 -0.06232600
## rolesNuker           -0.001202266 -0.07909378 -0.18660811  0.26293153
## rolesPusher          -0.043324003 -0.17413072  0.04739182 -0.02039034
## rolesSupport          0.067107384 -0.01015775 -0.18298015  0.35746364
##                       str_gain     agi_gain    int_gain attack_range
## base_health_regen -0.003294229  0.155012005 -0.23104392  -0.30002159
## base_armor        -0.030688458  0.073730991  0.03373100  -0.23869927
## base_mr           -0.012598434  0.088243931 -0.05751315  -0.11530141
## base_attack_min    0.376941947 -0.233047934 -0.14732832  -0.40136927
## base_attack_max    0.366455547 -0.236758450 -0.08932616  -0.32898709
## base_str           0.557908404 -0.387289341 -0.19316768  -0.47840044
## base_agi          -0.418211544  0.606364436 -0.26112723  -0.07312285
## base_int          -0.195595976 -0.293780193  0.57287616   0.40713022
## str_gain           1.000000000 -0.379817532 -0.37545694  -0.49789693
## agi_gain          -0.379817532  1.000000000 -0.35245684  -0.03547236
## int_gain          -0.375456945 -0.352456843  1.00000000   0.65327394
## attack_range      -0.497896926 -0.035472358  0.65327394   1.00000000
## projectile_speed  -0.200648790  0.170035858  0.15417490   0.37329249
## attack_rate        0.311677414 -0.147849056 -0.11917729  -0.16130355
## move_speed        -0.016405833 -0.006214705 -0.04799792  -0.22926259
## turn_rate          0.052540428  0.073724058 -0.10352321  -0.12277283
## legs               0.061206589 -0.091415508  0.07801556  -0.07526661
## rolesCarry        -0.126050682  0.406179487 -0.37645229  -0.15365138
## rolesDisabler      0.284015887 -0.221388687  0.06150720  -0.01868355
## rolesDurable       0.526353642 -0.190438606 -0.45690790  -0.46486088
## rolesEscape       -0.035038169  0.232430432 -0.15984075  -0.20597604
## rolesInitiator     0.532358079 -0.160728152 -0.31205431  -0.42735711
## rolesJungler      -0.001570852 -0.025962103  0.06910333   0.03258230
## rolesNuker         0.031582917 -0.185462296  0.29785518   0.11954721
## rolesPusher       -0.180037697  0.076027842  0.04752906   0.15689751
## rolesSupport      -0.069145205 -0.236341970  0.37754004   0.34412254
##                   projectile_speed  attack_rate   move_speed    turn_rate
## base_health_regen     -0.129522453  0.050410163 -0.001965692  0.095003863
## base_armor            -0.072783689 -0.009360779  0.200100097 -0.009865875
## base_mr                0.009624639 -0.008872137  0.084459026  0.064372919
## base_attack_min       -0.316839246  0.301260812 -0.088089389 -0.061588180
## base_attack_max       -0.237965373  0.333343133 -0.086516964 -0.021105033
## base_str              -0.150900637  0.243606493  0.059383389  0.053688615
## base_agi               0.013356940 -0.249330862  0.088690694  0.049890998
## base_int               0.157005648 -0.042194803  0.078193827 -0.128979148
## str_gain              -0.200648790  0.311677414 -0.016405833  0.052540428
## agi_gain               0.170035858 -0.147849056 -0.006214705  0.073724058
## int_gain               0.154174899 -0.119177289 -0.047997916 -0.103523211
## attack_range           0.373292492 -0.161303555 -0.229262593 -0.122772834
## projectile_speed       1.000000000  0.009905370 -0.144177941  0.024733195
## attack_rate            0.009905370  1.000000000 -0.026135640  0.024505403
## move_speed            -0.144177941 -0.026135640  1.000000000  0.004763117
## turn_rate              0.024733195  0.024505403  0.004763117  1.000000000
## legs                  -0.072508095  0.076971519  0.120193542 -0.096266141
## rolesCarry             0.053725036 -0.006131039  0.092011631  0.021981449
## rolesDisabler         -0.112758464  0.147760770 -0.025292492  0.007057357
## rolesDurable          -0.152603233  0.252696754  0.050717549 -0.041884377
## rolesEscape           -0.102913206 -0.157650508 -0.169738919  0.227806036
## rolesInitiator        -0.215113728  0.289022349 -0.055825544  0.114059914
## rolesJungler          -0.013396425  0.081773961  0.172000050  0.004204537
## rolesNuker             0.085704873 -0.138125962 -0.070307053  0.018925859
## rolesPusher            0.016118303 -0.061624577  0.103583238 -0.241165599
## rolesSupport           0.016658319 -0.094060224 -0.070730597 -0.073687966
##                           legs   rolesCarry rolesDisabler rolesDurable
## base_health_regen  0.111219915  0.023715097   0.089460872 -0.047932653
## base_armor         0.100181407  0.025627993   0.028510953 -0.096188531
## base_mr            0.006891421  0.102028865  -0.055744577 -0.113310745
## base_attack_min    0.080637116 -0.086043698   0.164691029  0.348264739
## base_attack_max    0.100086695 -0.053766099   0.144455903  0.285238989
## base_str          -0.042169616 -0.135796128   0.239920600  0.402540091
## base_agi          -0.098236584  0.380045883  -0.128166766 -0.163807303
## base_int           0.076943463 -0.250662707   0.034539649 -0.342242839
## str_gain           0.061206589 -0.126050682   0.284015887  0.526353642
## agi_gain          -0.091415508  0.406179487  -0.221388687 -0.190438606
## int_gain           0.078015563 -0.376452293   0.061507197 -0.456907905
## attack_range      -0.075266607 -0.153651379  -0.018683553 -0.464860878
## projectile_speed  -0.072508095  0.053725036  -0.112758464 -0.152603233
## attack_rate        0.076971519 -0.006131039   0.147760770  0.252696754
## move_speed         0.120193542  0.092011631  -0.025292492  0.050717549
## turn_rate         -0.096266141  0.021981449   0.007057357 -0.041884377
## legs               1.000000000 -0.141204845   0.044562483 -0.151658732
## rolesCarry        -0.141204845  1.000000000  -0.235104759  0.150271924
## rolesDisabler      0.044562483 -0.235104759   1.000000000  0.136412241
## rolesDurable      -0.151658732  0.150271924   0.136412241  1.000000000
## rolesEscape        0.090581154  0.115248388  -0.140126436 -0.209118541
## rolesInitiator    -0.037722594 -0.103183962   0.468546824  0.270010509
## rolesJungler       0.060341925 -0.113252049  -0.001485407 -0.001337142
## rolesNuker         0.178473956 -0.265134358   0.097112971 -0.222522900
## rolesPusher       -0.008227833  0.204270765  -0.238462437 -0.106985128
## rolesSupport       0.005000015 -0.481535677   0.256815622 -0.190694043
##                    rolesEscape rolesInitiator rolesJungler   rolesNuker
## base_health_regen  0.211210351     0.15951759  0.048469020 -0.186356834
## base_armor         0.008257484     0.04637656  0.004516243  0.063166455
## base_mr            0.076080072     0.08449265  0.035605456 -0.058195356
## base_attack_min    0.006606846     0.34892005 -0.024889765 -0.026782712
## base_attack_max    0.003528075     0.28111519 -0.039426988 -0.001202266
## base_str          -0.063877386     0.41681097  0.120078845 -0.079093778
## base_agi           0.208303037    -0.11478937 -0.065825932 -0.186608113
## base_int          -0.154082030    -0.21261266 -0.062325997  0.262931527
## str_gain          -0.035038169     0.53235808 -0.001570852  0.031582917
## agi_gain           0.232430432    -0.16072815 -0.025962103 -0.185462296
## int_gain          -0.159840749    -0.31205431  0.069103331  0.297855181
## attack_range      -0.205976038    -0.42735711  0.032582305  0.119547211
## projectile_speed  -0.102913206    -0.21511373 -0.013396425  0.085704873
## attack_rate       -0.157650508     0.28902235  0.081773961 -0.138125962
## move_speed        -0.169738919    -0.05582554  0.172000050 -0.070307053
## turn_rate          0.227806036     0.11405991  0.004204537  0.018925859
## legs               0.090581154    -0.03772259  0.060341925  0.178473956
## rolesCarry         0.115248388    -0.10318396 -0.113252049 -0.265134358
## rolesDisabler     -0.140126436     0.46854682 -0.001485407  0.097112971
## rolesDurable      -0.209118541     0.27001051 -0.001337142 -0.222522900
## rolesEscape        1.000000000    -0.04520132 -0.001337142 -0.145037247
## rolesInitiator    -0.045201316     1.00000000 -0.040823413 -0.040112578
## rolesJungler      -0.001337142    -0.04082341  1.000000000 -0.327764146
## rolesNuker        -0.145037247    -0.04011258 -0.327764146  1.000000000
## rolesPusher       -0.106985128    -0.20427076  0.135121734 -0.124072917
## rolesSupport      -0.190694043    -0.08827139 -0.027192895  0.202024737
##                    rolesPusher rolesSupport
## base_health_regen -0.090405458 -0.148239094
## base_armor         0.102423578 -0.085189908
## base_mr           -0.161738474 -0.121801500
## base_attack_min   -0.097872611  0.041718605
## base_attack_max   -0.043324003  0.067107384
## base_str          -0.174130720 -0.010157749
## base_agi           0.047391822 -0.182980155
## base_int          -0.020390340  0.357463636
## str_gain          -0.180037697 -0.069145205
## agi_gain           0.076027842 -0.236341970
## int_gain           0.047529062  0.377540040
## attack_range       0.156897506  0.344122543
## projectile_speed   0.016118303  0.016658319
## attack_rate       -0.061624577 -0.094060224
## move_speed         0.103583238 -0.070730597
## turn_rate         -0.241165599 -0.073687966
## legs              -0.008227833  0.005000015
## rolesCarry         0.204270765 -0.481535677
## rolesDisabler     -0.238462437  0.256815622
## rolesDurable      -0.106985128 -0.190694043
## rolesEscape       -0.106985128 -0.190694043
## rolesInitiator    -0.204270765 -0.088271395
## rolesJungler       0.135121734 -0.027192895
## rolesNuker        -0.124072917  0.202024737
## rolesPusher        1.000000000 -0.109136486
## rolesSupport      -0.109136486  1.000000000

It turns out our variables are only weakly correlated overall, with a few exceptions: base str/agi/int are correlated with str/agi/int gain, and rolesCarry is negatively correlated with rolesSupport (about -0.48). That makes sense, since it is very rare for a carry to also be a support and vice versa. In other words, knowing that a hero is a carry already tells us something about whether it is a support, just as base str/agi/int tells us something about str/agi/int gain. This would cause multicollinearity if we did supervised learning, and one way to avoid that is PCA.
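To make "weakly correlated, with a few exceptions" concrete, here is a small Python sketch (the analysis itself was done in R) that filters a handful of pairs copied from the matrix above by absolute correlation. The `strong_pairs` helper and the 0.4 cut-off are assumptions chosen for illustration, not something used in the original analysis.

```python
# Pairwise correlations copied from the matrix printed above (rounded to 3 d.p.).
correlations = {
    ("rolesCarry", "rolesSupport"): -0.482,
    ("rolesDisabler", "rolesInitiator"): 0.469,
    ("int_gain", "rolesSupport"): 0.378,
    ("base_int", "rolesSupport"): 0.357,
    ("rolesJungler", "rolesNuker"): -0.328,
    ("legs", "rolesSupport"): 0.005,
}


def strong_pairs(corr, threshold):
    """Return (pair, r) tuples with |r| >= threshold, strongest first."""
    kept = [(pair, r) for pair, r in corr.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# The 0.4 cut-off is an arbitrary illustration, not a value from the article.
for (a, b), r in strong_pairs(correlations, 0.4):
    print(f"{a} ~ {b}: r = {r:+.3f}")
```

With this cut-off only the carry/support and disabler/initiator pairs survive, which matches the impression that most role and stat variables are close to uncorrelated.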

Build and interpret PCA

## 
## Call:
## PCA(X = for.pca, quali.sup = c(1:2), graph = F) 
## 
## 
## Eigenvalues
##                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6   Dim.7
## Variance               4.624   3.582   1.772   1.649   1.452   1.311   1.171
## % of var.             17.785  13.779   6.817   6.342   5.583   5.042   4.502
## Cumulative % of var.  17.785  31.564  38.381  44.723  50.306  55.348  59.851
##                        Dim.8   Dim.9  Dim.10  Dim.11  Dim.12  Dim.13  Dim.14
## Variance               1.127   1.059   0.941   0.855   0.777   0.759   0.695
## % of var.              4.335   4.073   3.620   3.289   2.987   2.920   2.675
## Cumulative % of var.  64.185  68.259  71.878  75.167  78.154  81.073  83.748
##                       Dim.15  Dim.16  Dim.17  Dim.18  Dim.19  Dim.20  Dim.21
## Variance               0.603   0.570   0.526   0.440   0.410   0.379   0.358
## % of var.              2.321   2.193   2.024   1.693   1.578   1.458   1.377
## Cumulative % of var.  86.069  88.262  90.286  91.979  93.557  95.016  96.393
##                       Dim.22  Dim.23  Dim.24  Dim.25  Dim.26
## Variance               0.284   0.218   0.215   0.136   0.085
## % of var.              1.091   0.838   0.826   0.525   0.328
## Cumulative % of var.  97.483  98.321  99.147  99.672 100.000
## 
## Individuals (the 10 first)
##                       Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2  
## 1                 |  6.261 | -1.173  0.254  0.035 | -3.118  2.320  0.248 |
## 2                 |  6.007 |  2.376  1.044  0.156 | -1.328  0.420  0.049 |
## 3                 |  3.906 |  0.304  0.017  0.006 |  1.484  0.525  0.144 |
## 4                 |  4.468 |  1.086  0.218  0.059 | -1.275  0.388  0.081 |
## 5                 |  4.792 | -1.209  0.270  0.064 |  2.046  0.999  0.182 |
## 6                 |  5.383 | -2.426  1.088  0.203 | -2.530  1.527  0.221 |
## 7                 |  5.223 |  2.466  1.124  0.223 |  1.041  0.259  0.040 |
## 8                 |  7.455 | -2.446  1.106  0.108 | -4.975  5.904  0.445 |
## 9                 |  3.957 | -2.011  0.747  0.258 | -0.421  0.042  0.011 |
## 10                |  6.277 | -1.661  0.510  0.070 | -3.464  2.864  0.305 |
##                    Dim.3    ctr   cos2  
## 1                  1.034  0.516  0.027 |
## 2                  0.352  0.060  0.003 |
## 3                  0.553  0.148  0.020 |
## 4                 -0.069  0.002  0.000 |
## 5                 -0.111  0.006  0.001 |
## 6                 -1.212  0.709  0.051 |
## 7                  2.307  2.567  0.195 |
## 8                  0.446  0.096  0.004 |
## 9                  0.711  0.244  0.032 |
## 10                 0.252  0.031  0.002 |
## 
## Variables (the 10 first)
##                      Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3    ctr
## base_health_regen |  0.181  0.712  0.033 | -0.337  3.161  0.113 |  0.426 10.254
## base_armor        |  0.067  0.097  0.004 | -0.066  0.121  0.004 |  0.216  2.641
## base_mr           |  0.033  0.023  0.001 | -0.193  1.040  0.037 |  0.262  3.865
## base_attack_min   |  0.701 10.621  0.491 |  0.190  1.006  0.036 | -0.020  0.023
## base_attack_max   |  0.634  8.702  0.402 |  0.231  1.493  0.053 | -0.049  0.137
## base_str          |  0.680  9.995  0.462 |  0.182  0.927  0.033 | -0.044  0.110
## base_agi          | -0.334  2.416  0.112 | -0.658 12.102  0.434 |  0.178  1.791
## base_int          | -0.373  3.009  0.139 |  0.583  9.499  0.340 |  0.047  0.127
## str_gain          |  0.784 13.276  0.614 |  0.157  0.684  0.025 | -0.038  0.080
## agi_gain          | -0.313  2.120  0.098 | -0.702 13.769  0.493 |  0.110  0.678
##                     cos2  
## base_health_regen  0.182 |
## base_armor         0.047 |
## base_mr            0.068 |
## base_attack_min    0.000 |
## base_attack_max    0.002 |
## base_str           0.002 |
## base_agi           0.032 |
## base_int           0.002 |
## str_gain           0.001 |
## agi_gain           0.012 |
## 
## Supplementary categories
##                       Dist    Dim.1   cos2 v.test    Dim.2   cos2 v.test  
## agi               |  2.303 | -0.892  0.150 -3.039 | -2.069  0.807 -8.006 |
## int               |  2.207 | -1.295  0.345 -4.855 |  1.690  0.586  7.195 |
## str               |  2.406 |  2.300  0.914  7.991 |  0.147  0.004  0.580 |
## Melee             |  1.856 |  1.645  0.785  7.896 | -0.706  0.144 -3.847 |
## Ranged            |  1.704 | -1.510  0.785 -7.896 |  0.648  0.144  3.847 |
##                    Dim.3   cos2 v.test  
## agi                0.077  0.001  0.424 |
## int                0.097  0.002  0.589 |
## str               -0.183  0.006 -1.025 |
## Melee              0.193  0.011  1.497 |
## Ranged            -0.177  0.011 -1.497 |

From the summary above, we need 15 dimensions to cover 86% of the variance in the data.

Dim 1 covers only 17.79% of the variance and Dim 2 only 13.78%. That's kind of low; I was expecting something like more than 30% in Dim 1. Let's visualize the percentage of variance covered by each PC.

PC 1 and PC 2 combined cover only about 31%. If we combine the first 10 dimensions, they cover 71.9% of the information in our data. We can certainly reduce the number of variables for future supervised learning, but the gain is not significant since our data has low multicollinearity in the first place.
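The percentages in the summary can be reproduced directly from the eigenvalues. The Python sketch below (the analysis itself was done in R) copies the 26 eigenvalues from the output above and computes the cumulative share of variance; `cumulative_share` and `dims_needed` are helper names invented for this illustration.

```python
from itertools import accumulate

# Eigenvalues (the "Variance" rows) copied from the PCA summary above.
eigenvalues = [
    4.624, 3.582, 1.772, 1.649, 1.452, 1.311, 1.171,
    1.127, 1.059, 0.941, 0.855, 0.777, 0.759, 0.695,
    0.603, 0.570, 0.526, 0.440, 0.410, 0.379, 0.358,
    0.284, 0.218, 0.215, 0.136, 0.085,
]


def cumulative_share(eigs):
    """Cumulative fraction of total variance after each component."""
    total = sum(eigs)
    return [running / total for running in accumulate(eigs)]


def dims_needed(eigs, target):
    """Smallest number of leading components whose share reaches target."""
    for k, share in enumerate(cumulative_share(eigs), start=1):
        if share >= target:
            return k
    return len(eigs)


for target in (0.70, 0.80, 0.86):
    print(f"{target:.0%} of variance -> {dims_needed(eigenvalues, target)} dimensions")
```

Because the PCA was run on standardized variables, the eigenvalues sum to roughly 26, the number of active variables, so each eigenvalue divided by 26 gives the "% of var." row.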

Let’s see how each numeric variable is represented by the PCA.

We can see that PC 1 and PC 2 are not enough to picture our data clearly (they only cover 31% anyway). But there is an interesting insight here: the difference between the 3 primary attributes can be explained by PC 1 and PC 2. The attributes are negatively related to each other, but not strongly.
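One way to back up the claim about primary attributes is the cos2 column of the summary, which measures how well a point is represented on a given axis. The Python sketch below (the analysis itself was done in R) adds the Dim.1 and Dim.2 cos2 values copied from the "Supplementary categories" and "Individuals" tables above; the dictionary layout and the `plane_quality` helper are assumptions for this illustration.

```python
# (cos2 on Dim.1, cos2 on Dim.2), copied from the PCA summary above.
supplementary_cos2 = {
    "agi": (0.150, 0.807),
    "int": (0.345, 0.586),
    "str": (0.914, 0.004),
    "Melee": (0.785, 0.144),
    "Ranged": (0.785, 0.144),
}

# First three individual heroes from the same summary, for comparison.
individual_cos2 = {
    "1": (0.035, 0.248),
    "2": (0.156, 0.049),
    "3": (0.006, 0.144),
}


def plane_quality(cos2_table):
    """Sum cos2 over Dim.1 and Dim.2: quality of representation on that plane."""
    return {name: round(d1 + d2, 3) for name, (d1, d2) in cos2_table.items()}


print("category centroids:", plane_quality(supplementary_cos2))
print("individual heroes: ", plane_quality(individual_cos2))
```

The category centroids all score above 0.9 on the first plane, while the individual heroes shown score below 0.3. So the first two PCs separate the primary-attribute groups well on average, even though they describe any single hero poorly.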

From the plot above we can see which variables contribute to which dimension/PC. For a clearer insight, let's draw a plot of each variable's contribution to both PCs.

From the plot above, we know that str_gain has the highest contribution to PC 1. In PC 2, agi_gain and int_gain have the highest contribution. The red line in the plot indicates the average contribution on each PC. If we take the variables that contribute above the average line, almost every variable contributes mainly to one PC and not the other. For example, str_gain has a high contribution to PC 1 but a low one to PC 2, and rolesCarry has a high contribution to PC 2 but a very low one to PC 1 (some variables, like int_gain, have a high contribution to both PCs, though). It means the PCs have successfully separated our data in terms of contribution, since PC 2 is built along a line perpendicular to PC 1.
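The contribution values can be recomputed from the summary: a variable's contribution to a PC is its squared coordinate divided by that PC's eigenvalue, times 100. The Python sketch below (the analysis itself was done in R) uses the ten variables printed in the summary and assumes the red line is the uniform baseline of 100 / 26, i.e. what every variable would contribute if all 26 active variables contributed equally. The helper names are invented for this illustration.

```python
# Eigenvalues of the first two dimensions, from the PCA summary above.
eigenvalue = {"Dim.1": 4.624, "Dim.2": 3.582}

# (Dim.1 coordinate, Dim.2 coordinate) for the ten variables in the summary.
coords = {
    "base_health_regen": (0.181, -0.337),
    "base_armor": (0.067, -0.066),
    "base_mr": (0.033, -0.193),
    "base_attack_min": (0.701, 0.190),
    "base_attack_max": (0.634, 0.231),
    "base_str": (0.680, 0.182),
    "base_agi": (-0.334, -0.658),
    "base_int": (-0.373, 0.583),
    "str_gain": (0.784, 0.157),
    "agi_gain": (-0.313, -0.702),
}

N_ACTIVE = 26                # active variables in the PCA (26 eigenvalues)
BASELINE = 100 / N_ACTIVE    # assumed meaning of the red line, about 3.85%


def contribution(coord, eig):
    """Contribution (in %) of a variable to one principal component."""
    return coord ** 2 / eig * 100


def above_baseline(dim_index, dim_name):
    """Variables whose contribution to the given PC exceeds the baseline."""
    return [
        name for name, xy in coords.items()
        if contribution(xy[dim_index], eigenvalue[dim_name]) > BASELINE
    ]


above = {"Dim.1": above_baseline(0, "Dim.1"), "Dim.2": above_baseline(1, "Dim.2")}
print(above)
```

Among these ten variables, the strength and base attack variables clear the baseline only on PC 1, and the agility and intelligence variables only on PC 2. Since the inputs are rounded to three decimals, the recomputed contributions differ slightly from the ctr column. Variables outside the first ten, such as int_gain and rolesCarry, are not printed in the summary and so are not covered here.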

Combining PCA and Clustering

Let’s see how our clusters are distributed in the PCA space.

From the plots we can conclude the following (note: remember, these conclusions are drawn from only 31% of the variance in the data):
- cluster 4 and cluster 2 are somewhat similar
- heroes in cluster 1 are the most unique
- cluster 1 is the opposite of cluster 4
- heroes in cluster 1 are mostly characterized by rolesSupport, int_gain, base_int, attack_range, and rolesNuker
- heroes in cluster 2 are mostly characterized by base_attack_max/min, base_str, str_gain, attack_rate, rolesDisabler, rolesInitiator, and rolesDurable
- heroes in cluster 3 are mostly characterized by base_agi, agi_gain, rolesCarry, rolesEscape, base_mr, move_speed, and rolesPusher
- heroes in cluster 4 are mostly characterized by base_health_regen, turn_rate, and base_armor

Those conclusions come from interpreting PC 1 and PC 2, which portray only 31% of the data. It's hard to draw conclusions from only 2 PCs, and I'm afraid they could be misleading. So from this analysis, I conclude that the Dota2 heroes data is not well suited to analysis with PCA.

Lastly, let's convert our PCA result to a data frame.

  • bonus: here is a look at how the hero clusters are distributed across PC 1, 2 and 3 (which portray only 38.4% of the data)

Conclusion

Finally, here are some insights we can get from unsupervised learning on the Dota2 heroes data:
- There's no clear distinction between the clusters based on base stats, but they differ slightly in roles and primary attribute.
- Cluster 1 is the only unique cluster, made up of a combination of all the heroes' primary attributes, while clusters 2-4 have many characteristics of the intelligence, agility, and strength primary attributes, respectively.
- We need 15 dimensions to cover 86% of the variance, or 26 dimensions to cover all of it. This means that if we use all the PCs to reduce the dimensionality of our main data, we only achieve a 13% variable reduction ((1 - (total.dimension/total.actual.variable)) * 100).
- Or, if 80% of the variance is enough for you, we only need 13 dimensions, which means we can drop about 57% of the variables and still retain 80% of the data ((1 - (13/30)) * 100). Keeping 15 dimensions (86% of the variance) corresponds to a 50% reduction ((1 - (15/30)) * 100).
- It's hard to draw conclusions from only the first 2 PCs. We still need a lot of dimensions to summarize our data clearly. Thus, the Dota2 heroes data is not well suited to analysis with PCA.
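As a quick check of the variable reduction formula used in the list above, here is a small Python sketch (the analysis itself was done in R). The total of 30 variables is the count used in the article's own formula; `reduction_pct` is a helper name invented for this illustration.

```python
TOTAL_VARS = 30  # total variable count taken from the article's own formula


def reduction_pct(kept_dims, total_vars=TOTAL_VARS):
    """Percentage of variables dropped when keeping kept_dims components."""
    return (1 - kept_dims / total_vars) * 100


for dims in (26, 15, 13):
    print(f"keep {dims} dimensions -> {reduction_pct(dims):.1f}% reduction")
```

Keeping all 26 components saves about 13% of the columns, 15 components saves 50%, and 13 components saves about 57%.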

Thank you!

Shadow Fiend by chroneco
