For this second part of DotaScience, we’ll do unsupervised learning: clustering and Principal Component Analysis (PCA). Dota 2 is a multiplayer online battle arena (MOBA) video game developed and published by Valve. It is played in matches between two teams (called Radiant and Dire) of five players, with each team occupying and defending its own base on the map. Each of the ten players independently controls a powerful character, known as a hero. In this article, we will analyze all 117 heroes in Dota 2 with the unsupervised learning methods mentioned above.
This project is based on an old Kaggle competition. You can find all the datasets and the competition here. We will not predict the match winner (I already did that; check the previous chapter here) but do a deeper analysis of the heroes instead. It is important to know that this dataset was made one year ago, so recent updates of Dota are not represented in this analysis. However, let’s hope we can find interesting insights that are still relevant to the current meta by clustering with k-means and reducing dimensionality with PCA.
You can load the packages into your workspace using the library() function.
The competition provides 5 datasets: test, train, hero_names, item_ids, and a submission example. We will use the hero_names.json dataset.
# The heroes are separated by role; I'll combine each hero's roles into one row per id, separated by commas
hero.df2 <- hero.df %>% group_by(id) %>%
  summarise(roles.c = paste(roles, collapse = ","))
hero.df <- merge(hero.df[,-c("roles")], hero.df2, by = "id")
hero.df <- hero.df[!duplicated(hero.df),]
# take a quick look
glimpse(hero.df)
## Observations: 117
## Variables: 29
## $ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15...
## $ name <chr> "npc_dota_hero_antimage", "npc_dota_hero_axe", "n...
## $ localized_name <chr> "Anti-Mage", "Axe", "Bane", "Bloodseeker", "Cryst...
## $ primary_attr <chr> "agi", "str", "int", "agi", "int", "agi", "str", ...
## $ attack_type <chr> "Melee", "Melee", "Ranged", "Melee", "Ranged", "R...
## $ img <chr> "/apps/dota2/images/heroes/antimage_full.png?", "...
## $ icon <chr> "/apps/dota2/images/heroes/antimage_icon.png", "/...
## $ base_health <dbl> 200, 200, 200, 200, 200, 200, 200, 200, 200, 200,...
## $ base_health_regen <dbl> 0.25, 2.75, NA, NA, NA, 0.25, 1.00, 0.50, NA, NA,...
## $ base_mana <dbl> 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 7...
## $ base_mana_regen <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ base_armor <dbl> -1, -2, 1, 0, 0, -3, 2, 0, -1, -2, 0, 0, -1, -2, ...
## $ base_mr <dbl> 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 2...
## $ base_attack_min <dbl> 29, 24, 35, 33, 30, 19, 27, 12, 25, 9, 15, 22, 30...
## $ base_attack_max <dbl> 33, 28, 41, 39, 36, 30, 37, 16, 30, 18, 21, 44, 4...
## $ base_str <dbl> 23, 25, 23, 24, 18, 17, 22, 21, 18, 20, 19, 19, 1...
## $ base_agi <dbl> 24, 20, 23, 24, 16, 29, 12, 36, 18, 24, 20, 29, 2...
## $ base_int <dbl> 12, 18, 23, 18, 14, 15, 16, 14, 19, 13, 18, 19, 2...
## $ str_gain <dbl> 1.3, 3.4, 2.6, 2.7, 2.2, 1.9, 3.7, 2.2, 2.2, 3.0,...
## $ agi_gain <dbl> 3.2, 2.2, 2.6, 3.5, 1.6, 2.8, 1.4, 2.8, 3.7, 4.3,...
## $ int_gain <dbl> 1.8, 1.6, 2.6, 1.7, 3.3, 1.4, 1.8, 1.4, 1.9, 1.1,...
## $ attack_range <dbl> 150, 150, 400, 150, 600, 625, 150, 150, 630, 350,...
## $ projectile_speed <dbl> 0, 900, 900, 900, 900, 1250, 0, 0, 900, 1300, 120...
## $ attack_rate <dbl> 1.4, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.4, 1.7, 1.5,...
## $ move_speed <dbl> 310, 295, 305, 295, 275, 285, 310, 300, 290, 280,...
## $ turn_rate <dbl> 0.5, 0.6, 0.6, 0.5, 0.5, 0.7, 0.9, 0.6, 0.5, 0.6,...
## $ cm_enabled <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T...
## $ legs <dbl> 2, 2, 4, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 6, 2...
## $ roles.c <chr> "Carry,Escape,Nuker", "Initiator,Durable,Disabler...
First, we remove the unused variables.
hero.df <- hero.df %>% select(-c("id","name","cm_enabled","img","icon"))
# I want to take a second look at our data; I found some NA in the glimpse above and I think we need to deal with them first
# change primary_attr and attack_type to factor
hero.df[,2:3] <- lapply(hero.df[,2:3], as.factor)
colSums(is.na(hero.df))
## localized_name primary_attr attack_type base_health
## 0 0 0 0
## base_health_regen base_mana base_mana_regen base_armor
## 85 0 0 0
## base_mr base_attack_min base_attack_max base_str
## 0 0 0 0
## base_agi base_int str_gain agi_gain
## 0 0 0 0
## int_gain attack_range projectile_speed attack_rate
## 0 0 0 0
## move_speed turn_rate legs roles.c
## 0 0 0 0
summary(hero.df)
## localized_name primary_attr attack_type base_health base_health_regen
## Length:117 agi:37 Melee :56 Min. :200 Min. :0.2500
## Class :character int:42 Ranged:61 1st Qu.:200 1st Qu.:0.2500
## Mode :character str:38 Median :200 Median :0.5000
## Mean :200 Mean :0.9766
## 3rd Qu.:200 3rd Qu.:1.5000
## Max. :200 Max. :3.2500
## NA's :85
## base_mana base_mana_regen base_armor base_mr
## Min. :75 Min. :0 Min. :-3.00000 Min. :10.00
## 1st Qu.:75 1st Qu.:0 1st Qu.:-1.00000 1st Qu.:25.00
## Median :75 Median :0 Median : 0.00000 Median :25.00
## Mean :75 Mean :0 Mean : 0.04701 Mean :24.87
## 3rd Qu.:75 3rd Qu.:0 3rd Qu.: 1.00000 3rd Qu.:25.00
## Max. :75 Max. :0 Max. : 7.00000 Max. :25.00
##
## base_attack_min base_attack_max base_str base_agi
## Min. : 9.00 Min. :11.00 Min. :12.00 Min. : 0.00
## 1st Qu.:22.00 1st Qu.:28.00 1st Qu.:19.00 1st Qu.:15.00
## Median :26.00 Median :34.00 Median :21.00 Median :18.00
## Mean :26.53 Mean :33.98 Mean :21.21 Mean :18.11
## 3rd Qu.:30.00 3rd Qu.:39.00 3rd Qu.:23.00 3rd Qu.:22.00
## Max. :62.00 Max. :70.00 Max. :30.00 Max. :36.00
##
## base_int str_gain agi_gain int_gain
## Min. :12.00 Min. :1.300 Min. :0.000 Min. :1.000
## 1st Qu.:16.00 1st Qu.:2.200 1st Qu.:1.500 1st Qu.:1.700
## Median :18.00 Median :2.600 Median :1.900 Median :2.000
## Mean :18.97 Mean :2.709 Mean :2.111 Mean :2.355
## 3rd Qu.:22.00 3rd Qu.:3.100 3rd Qu.:2.600 3rd Qu.:3.100
## Max. :30.00 Max. :4.600 Max. :4.800 Max. :5.200
##
## attack_range projectile_speed attack_rate move_speed
## Min. :140.0 Min. : 0.0 Min. :1.300 Min. :270.0
## 1st Qu.:150.0 1st Qu.: 900.0 1st Qu.:1.700 1st Qu.:290.0
## Median :350.0 Median : 900.0 Median :1.700 Median :295.0
## Mean :349.5 Mean : 942.7 Mean :1.692 Mean :297.9
## 3rd Qu.:550.0 3rd Qu.:1000.0 3rd Qu.:1.700 3rd Qu.:305.0
## Max. :700.0 Max. :3000.0 Max. :2.000 Max. :330.0
##
## turn_rate legs roles.c
## Min. :0.5000 Min. :0.000 Length:117
## 1st Qu.:0.5000 1st Qu.:2.000 Class :character
## Median :0.5000 Median :2.000 Mode :character
## Mean :0.5919 Mean :2.085
## 3rd Qu.:0.6000 3rd Qu.:2.000
## Max. :1.0000 Max. :8.000
##
From the summary above we can see that base_health, base_mana, and base_mana_regen each have only one value, so we’ll remove them. base_health_regen has 85 NAs; we’ll change them to 0. And I’ll convert roles.c into boolean values, just like what I did in the previous chapter of DotaScience.
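To make the roles conversion concrete, here is a minimal base R sketch of turning a comma-separated role string into one 0/1 column per role, without extra packages. The toy table, its hero names, and the names role_list, all_roles, role_flags, and toy_flags are made up for illustration; this is not necessarily how the code in this article does it.

```r
# Toy stand-in for the hero table (hypothetical heroes and roles)
toy <- data.frame(
  localized_name = c("Hero A", "Hero B", "Hero C"),
  roles.c = c("Carry,Escape,Nuker",
              "Initiator,Durable,Disabler",
              "Support,Disabler,Nuker"),
  stringsAsFactors = FALSE
)

# split every role string into a character vector of roles
role_list <- strsplit(toy$roles.c, ",", fixed = TRUE)

# every distinct role becomes one indicator column
all_roles <- sort(unique(unlist(role_list)))

# 1 if the hero has the role, 0 otherwise (one row per hero)
role_flags <- t(sapply(role_list, function(r) as.integer(all_roles %in% r)))
colnames(role_flags) <- paste0("roles", all_roles)

toy_flags <- cbind(toy["localized_name"], role_flags)
toy_flags
```

Each hero ends up with a single row, so the indicator columns can be bound to the other stats without worrying about duplicated ids.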
# remove `base_health`, `base_mana`, and `base_mana_regen` from df.
hero.df <- hero.df %>% select(-c("base_health", "base_mana", "base_mana_regen"))
# Change NA in base_health_regen to 0 (assigning a numeric 0 keeps the column numeric, so no conversion back is needed)
hero.df$base_health_regen[is.na(hero.df$base_health_regen)] <- 0
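# one row per hero-role pair, then one 0/1 indicator column per role
hero.df3 <- rbindlist(hero.data)
hero.df3$roles <- as.factor(hero.df3$roles)
hero.roles <- dummy.data.frame(hero.df3[,c("id","roles")])
# sum the indicators per hero id (funs() is deprecated in recent dplyr, so pass sum directly)
hero.roles <- hero.roles %>% group_by(id) %>%
  dplyr::summarise_all(sum)
hero.roles
# note: cbind assumes hero.df and hero.roles are in the same hero order
hero.df.new <- cbind(hero.df, hero.roles)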
# remove roles.c and id, we don't need them anymore
hero.df.new <- hero.df.new %>% select(-c("roles.c","id"))
# let's re-check our clean data
colSums(is.na(hero.df.new))
## localized_name primary_attr attack_type base_health_regen
## 0 0 0 0
## base_armor base_mr base_attack_min base_attack_max
## 0 0 0 0
## base_str base_agi base_int str_gain
## 0 0 0 0
## agi_gain int_gain attack_range projectile_speed
## 0 0 0 0
## attack_rate move_speed turn_rate legs
## 0 0 0 0
## rolesCarry rolesDisabler rolesDurable rolesEscape
## 0 0 0 0
## rolesInitiator rolesJungler rolesNuker rolesPusher
## 0 0 0 0
## rolesSupport
## 0
glimpse(hero.df.new)
## Observations: 117
## Variables: 29
## $ localized_name <chr> "Anti-Mage", "Axe", "Bane", "Bloodseeker", "Cryst...
## $ primary_attr <fct> agi, str, int, agi, int, agi, str, agi, agi, agi,...
## $ attack_type <fct> Melee, Melee, Ranged, Melee, Ranged, Ranged, Mele...
## $ base_health_regen <dbl> 0.25, 2.75, 0.00, 0.00, 0.00, 0.25, 1.00, 0.50, 0...
## $ base_armor <dbl> -1, -2, 1, 0, 0, -3, 2, 0, -1, -2, 0, 0, -1, -2, ...
## $ base_mr <dbl> 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 2...
## $ base_attack_min <dbl> 29, 24, 35, 33, 30, 19, 27, 12, 25, 9, 15, 22, 30...
## $ base_attack_max <dbl> 33, 28, 41, 39, 36, 30, 37, 16, 30, 18, 21, 44, 4...
## $ base_str <dbl> 23, 25, 23, 24, 18, 17, 22, 21, 18, 20, 19, 19, 1...
## $ base_agi <dbl> 24, 20, 23, 24, 16, 29, 12, 36, 18, 24, 20, 29, 2...
## $ base_int <dbl> 12, 18, 23, 18, 14, 15, 16, 14, 19, 13, 18, 19, 2...
## $ str_gain <dbl> 1.3, 3.4, 2.6, 2.7, 2.2, 1.9, 3.7, 2.2, 2.2, 3.0,...
## $ agi_gain <dbl> 3.2, 2.2, 2.6, 3.5, 1.6, 2.8, 1.4, 2.8, 3.7, 4.3,...
## $ int_gain <dbl> 1.8, 1.6, 2.6, 1.7, 3.3, 1.4, 1.8, 1.4, 1.9, 1.1,...
## $ attack_range <dbl> 150, 150, 400, 150, 600, 625, 150, 150, 630, 350,...
## $ projectile_speed <dbl> 0, 900, 900, 900, 900, 1250, 0, 0, 900, 1300, 120...
## $ attack_rate <dbl> 1.4, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.4, 1.7, 1.5,...
## $ move_speed <dbl> 310, 295, 305, 295, 275, 285, 310, 300, 290, 280,...
## $ turn_rate <dbl> 0.5, 0.6, 0.6, 0.5, 0.5, 0.7, 0.9, 0.6, 0.5, 0.6,...
## $ legs <dbl> 2, 2, 4, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 6, 2...
## $ rolesCarry <int> 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1...
## $ rolesDisabler <int> 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1...
## $ rolesDurable <int> 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0...
## $ rolesEscape <int> 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1...
## $ rolesInitiator <int> 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1...
## $ rolesJungler <int> 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ rolesNuker <int> 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ rolesPusher <int> 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0...
## $ rolesSupport <int> 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0...
From my experience of playing Dota (about 7 years), heroes with the strength attribute tend to have high armor, higher health and regen, and usually a melee attack type. Agility heroes have the fastest attack rate and movement speed. And intelligence heroes have the most mana but lower armor and HP. However, that’s based on my experience; let’s see what the data says.
bp1 <- ggplot(data = hero.df.new, aes(x = primary_attr, y = base_armor, fill = primary_attr)) +
geom_boxplot(show.legend = F) + theme_bw() + labs(title = "base_armor") +
scale_fill_manual(values = c("green","blue","red"))+ theme(plot.title = element_text(size=10))
bp2 <- ggplot(data = hero.df.new, aes(x = primary_attr, y = base_attack_max, fill = primary_attr)) +
geom_boxplot(show.legend = F) + theme_bw() + labs(title = "max_attack") +
scale_fill_manual(values = c("green","blue","red"))+ theme(plot.title = element_text(size=10))
bp3 <- ggplot(data = hero.df.new, aes(x = primary_attr, y = base_str, fill = primary_attr)) +
geom_boxplot(show.legend = F) + theme_bw() + labs(title = "base_str") +
scale_fill_manual(values = c("green","blue","red"))+ theme(plot.title = element_text(size=10))
bp4 <- ggplot(data = hero.df.new, aes(x = primary_attr, y = base_agi, fill = primary_attr)) +
geom_boxplot(show.legend = F) + theme_bw() + labs(title = "base_agi") +
scale_fill_manual(values = c("green","blue","red"))+ theme(plot.title = element_text(size=10))
bp5 <- ggplot(data = hero.df.new, aes(x = primary_attr, y = base_int, fill = primary_attr)) +
geom_boxplot(show.legend = F) + theme_bw() + labs(title = "base_int") +
scale_fill_manual(values = c("green","blue","red"))+ theme(plot.title = element_text(size=10))
bp6 <- ggplot(data = hero.df.new, aes(x = primary_attr, y = attack_range, fill = primary_attr)) +
geom_boxplot(show.legend = F) + theme_bw() + labs(title = "attack_range") +
scale_fill_manual(values = c("green","blue","red"))+ theme(plot.title = element_text(size=10))
bp7 <- ggplot(data = hero.df.new, aes(x = primary_attr, y = attack_rate, fill = primary_attr)) +
geom_boxplot(show.legend = F) + theme_bw() + labs(title = "attack_rate") +
scale_fill_manual(values = c("green","blue","red"))+ theme(plot.title = element_text(size=10))
bp8 <- ggplot(data = hero.df.new, aes(x = primary_attr, y = move_speed, fill = primary_attr)) +
geom_boxplot(show.legend = F) + theme_bw() + labs(title = "move_speed") +
scale_fill_manual(values = c("green","blue","red"))+ theme(plot.title = element_text(size=10))
bp9 <- ggplot(data = hero.df.new, aes(x = primary_attr, y = base_health_regen, fill = primary_attr)) +
geom_boxplot(show.legend = F) + theme_bw() + labs(title = "health_regen") +
scale_fill_manual(values = c("green","blue","red"))+ theme(plot.title = element_text(size=10))
plot_grid(bp1,bp2,bp3,bp4,bp5,bp6,bp7,bp8,bp9)
Looks like most of my guesses are wrong. The highest attack_rate values are held by str heroes, most str heroes also have higher move speed, and most agi heroes have higher health regen. That’s why we need a deeper analysis based on data. It might be useful for professional players to have detailed knowledge about which heroes to use against certain heroes, which heroes are similar, and which heroes are suitable for certain conditions.
To solve the problem, we’ll do clustering: we’ll group heroes based on their similarity.
k-means only works with numeric values, so we’ll select only the numeric data. We also scale the data because most of the variables are on different scales.
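To see why scaling matters for a distance-based method like k-means, here is a small sketch with two made-up columns on very different scales. The values and the names toy and toy_scaled are hypothetical, only the column names are borrowed from our data.

```r
# Two toy columns on very different scales (hypothetical values)
toy <- data.frame(move_speed = c(270, 295, 310, 330),
                  turn_rate  = c(0.5, 0.6, 0.5, 1.0))

# Euclidean distances on the raw data are dominated by move_speed
dist(toy)

# scale() centers every column to mean 0 and rescales it to sd 1
toy_scaled <- scale(toy)
round(colMeans(toy_scaled), 10)
apply(toy_scaled, 2, sd)

# after scaling, both columns contribute to the distances
dist(toy_scaled)
```

Without scaling, a difference of 20 in move_speed would outweigh any possible difference in turn_rate, so the clusters would mostly reflect the variables with the largest units.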
# clustering
df.num <- hero.df.new %>% select_if(is.numeric)
# I want to keep track of each hero by name
df.num <- df.num %>% `rownames<-`(hero.df.new$localized_name)
# scaling
df.num <- scale(df.num)
# build the loop from k=2 until k=15 by 1
k_values <- seq(2,15,1)
num_k <- length(k_values)
# make an empty table to save k, wss, and btratio
k.df <- tibble(k = rep(0, num_k), wss = rep(0, num_k), btratio = rep(0, num_k))
# evaluate k-means for a range of values of k
for(i in 1:num_k){
  k <- k_values[i]
  # build the k-means model for the current k (2 to 15); use k here, not the loop index i
  kmn <- kmeans(df.num, k)
  # store k values
  k.df[i, 'k'] <- k
  # store wss value
  k.df[i, 'wss'] <- kmn$tot.withinss
  # store btratio value
  k.df[i, 'btratio'] <- kmn$betweenss/kmn$totss
}
# draw the plot from loop
wss.p <- ggplot(data = k.df, aes(x = k, y = wss)) + geom_point() + geom_line()+
labs(title = "Within sum of square")
btratio.p <- ggplot(data = k.df, aes(x = k, y = btratio)) + geom_point() + geom_line()+
labs(title = "betweenss/totalss")
plot_grid(wss.p, btratio.p)
For the wss plot, we’re looking for the point where the decrease in wss starts to level off (diminishing returns). For btratio, likewise, we’re looking for the point where the increase is no longer as big as it was for the smaller values of K. This is also called the elbow method. In this case, we’ll choose K = 4 as the best K.
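The plot and tables that follow use a cluster column, so the final model has to be fitted with the chosen K and its labels attached to the data first. That step is not shown here; a minimal sketch of how it could look, using the built-in mtcars table as a stand-in for our hero data (toy.num, toy.km, and toy.df are made-up names, and the seed and nstart values are assumptions):

```r
# Stand-in for the scaled numeric hero matrix (df.num in this article)
toy.num <- scale(mtcars)

# k-means starts from random centers, so fix the seed to make it repeatable
set.seed(100)

# fit the final model with the chosen K; nstart tries several random starts
# and keeps the solution with the lowest total within-cluster sum of squares
toy.km <- kmeans(toy.num, centers = 4, nstart = 25)

# attach the cluster label to the original (unscaled) table as a factor
toy.df <- mtcars
toy.df$cluster <- as.factor(toy.km$cluster)

# how many rows ended up in each cluster
table(toy.df$cluster)
```

Storing the label as a factor keeps it out of numeric summaries and lets plotting functions treat it as a discrete color.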
Let’s make a 3D plot to see how our heroes are clustered by their base stats.
# run the plot with `plotly` package to make it interactive
library(plotly)
plot1 <- plot_ly(data=hero.df.new, x= ~base_str, y= ~base_agi, z= ~base_int,
color = ~cluster, hoverinfo = 'text', text = ~paste(
"</br> Hero_name: ",localized_name,
"</br> str: ", base_str,
"</br> agi: ", base_agi,
"</br> int: ", base_int)) %>%
add_markers() %>% layout(scene = list(xaxis = list(title = "base_str"),
yaxis = list(title = "base_agi"),
zaxis = list(title = "base_int")))
plot1
There’s no clear distinction between the clusters based on their base stats. Clusters 2, 3, and 4 mostly group heroes by their primary attribute, but cluster 1 has a combination of all attributes. Cluster 1 sits in the center, which means its heroes’ base stats are close to the average of all heroes. Cluster 1 heroes are the best picks when your team needs some balance in terms of attributes.
Clusters based on heroes’ passive attributes
hero.df.new %>% group_by(cluster) %>%
select(cluster, base_health_regen, base_armor, base_mr,base_attack_max,base_attack_min,
attack_range, projectile_speed, attack_rate, move_speed, turn_rate) %>%
summarise_if(is.numeric, "mean") %>%
mutate_if(is.numeric, .funs = "round", digits=3)
cluster 1 has the highest base_armor and slow move_speed. But overall, cluster 1 has average attribute values. If we add our previous analysis of primary attributes, we can conclude that most heroes in cluster 1 are probably durable (tanky) heroes from all existing primary attributes.
cluster 2, however, has many characteristics of int heroes, for example: the lowest base_armor and health regen (because most int heroes focus on mana rather than health), low base attack, and the highest attack range (because most int heroes have a ranged attack type). If only we had mana_regen or mana_skill_consumption kinds of variables, I believe we could separate the hero clusters even better.
cluster 3 is a cluster for agi heroes. We can see that from the low base armor and low attack but high projectile_speed and fast attack rate, since agi heroes are very dependent on speed. (Note: a lower attack_rate means a higher attack speed, because attack_rate is the time between attacks rather than the number of attacks per second.) But somehow cluster 3 has the highest base_health_regen, which I thought would belong to str heroes.
cluster 4 has many characteristics of str heroes: high armor, base attack, and attack_rate. Str heroes depend on armor and health but have low speed and are mostly melee. Everything is summarized in the table above.
After that, let’s see how our clusters separate heroes based on their roles.
hero.df.new %>% group_by(cluster) %>%
select(21:30) %>%
summarise_if(is.numeric, "mean") %>%
mutate_if(is.numeric, .funs = "round", digits=3)
cluster 1: has the highest share of disabler and jungler heroes but the lowest share of carries. It also has high support and initiator shares. I can say that heroes in cluster 1 are mostly semi-support heroes who help the carry kill enemies and initiate battles.
cluster 2: has the lowest durable and initiator shares but the most supports, and apparently all of them are nuker heroes. This also matches the int characteristics, since most support heroes come from int.
cluster 3: has the highest carry, escape, and pusher shares but low disabler, support, jungler, and initiator shares. Looks like heroes in cluster 3 are meant to kill enemies.
cluster 4: has the highest disabler, durable, and initiator shares. In the game, heroes in cluster 4 are most likely the ones who start the clash/battle and try to disrupt enemies.
Another well-known unsupervised method is Principal Component Analysis, or PCA. PCA looks for correlation within our data and uses that redundancy to create a new matrix with just enough dimensions to explain most of the variance in the original data. The new variables created by PCA are called principal components. PCA can be used for dimensionality reduction, pattern discovery, identifying variables that are highly correlated with others, and visualizing high-dimensional data.
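To make "explain most of the variance" concrete, here is a minimal sketch with base R's prcomp() on the built-in USArrests table, used only as a stand-in for our hero data (the names pca, eig, and var_table are made up). It computes the variance, percentage of variance, and cumulative percentage for each principal component.

```r
# Stand-in data: built-in USArrests (4 numeric columns)
# center and scale so every variable contributes equally
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# eigenvalues = variance carried by each principal component
eig <- pca$sdev^2

var_table <- data.frame(
  component  = paste0("PC", seq_along(eig)),
  eigenvalue = eig,
  pct_var    = 100 * eig / sum(eig),
  cum_pct    = 100 * cumsum(eig) / sum(eig)
)
var_table
```

With standardized variables the eigenvalues sum to the number of variables, so a component with an eigenvalue above 1 carries more variance than any single original variable. Dimensionality reduction then means keeping only the first few components, up to whatever cumulative percentage we consider enough.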
But not all data are suitable for PCA. First of all, it needs lots of variables (dimensions). Our data only has 29 variables. Is that enough? Well, I don’t know. Uncorrelated variables are bad for PCA (also known as blind tasting), so if our data has lots of correlated variables (Logistic Machinery), we are good to do PCA.
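One way to check this is a correlation matrix of the numeric columns. Below is a minimal sketch with the built-in mtcars table standing in for the hero data (cor_mat and cor_pairs are made-up names); it also lists the most strongly correlated pairs, so we don’t have to scan the whole matrix by eye. For our heroes, the call would presumably be cor() on the numeric columns of hero.df.new.

```r
# Stand-in data: built-in mtcars (all columns numeric)
cor_mat <- cor(mtcars)

# keep every pair only once by taking the upper triangle
idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
cor_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)

# strongest correlations first, regardless of sign
cor_pairs <- cor_pairs[order(-abs(cor_pairs$r)), ]
head(cor_pairs, 5)
```

Sorting by the absolute value keeps strong negative correlations near the top as well, since they are just as redundant as strong positive ones.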
## base_health_regen base_armor base_mr base_attack_min
## base_health_regen 1.000000000 0.203201707 0.037806632 0.045388788
## base_armor 0.203201707 1.000000000 0.113862440 0.064023192
## base_mr 0.037806632 0.113862440 1.000000000 0.040142519
## base_attack_min 0.045388788 0.064023192 0.040142519 1.000000000
## base_attack_max -0.003970871 0.055500043 0.010240522 0.893240113
## base_str 0.063935162 0.028802221 -0.023228541 0.369776595
## base_agi 0.192101926 -0.057882319 0.134211230 -0.335981620
## base_int -0.225781529 0.050904767 -0.072392125 -0.161371199
## str_gain -0.003294229 -0.030688458 -0.012598434 0.376941947
## agi_gain 0.155012005 0.073730991 0.088243931 -0.233047934
## int_gain -0.231043921 0.033731005 -0.057513149 -0.147328320
## attack_range -0.300021591 -0.238699274 -0.115301408 -0.401369267
## projectile_speed -0.129522453 -0.072783689 0.009624639 -0.316839246
## attack_rate 0.050410163 -0.009360779 -0.008872137 0.301260812
## move_speed -0.001965692 0.200100097 0.084459026 -0.088089389
## turn_rate 0.095003863 -0.009865875 0.064372919 -0.061588180
## legs 0.111219915 0.100181407 0.006891421 0.080637116
## rolesCarry 0.023715097 0.025627993 0.102028865 -0.086043698
## rolesDisabler 0.089460872 0.028510953 -0.055744577 0.164691029
## rolesDurable -0.047932653 -0.096188531 -0.113310745 0.348264739
## rolesEscape 0.211210351 0.008257484 0.076080072 0.006606846
## rolesInitiator 0.159517586 0.046376557 0.084492654 0.348920049
## rolesJungler 0.048469020 0.004516243 0.035605456 -0.024889765
## rolesNuker -0.186356834 0.063166455 -0.058195356 -0.026782712
## rolesPusher -0.090405458 0.102423578 -0.161738474 -0.097872611
## rolesSupport -0.148239094 -0.085189908 -0.121801500 0.041718605
## base_attack_max base_str base_agi base_int
## base_health_regen -0.003970871 0.06393516 0.19210193 -0.22578153
## base_armor 0.055500043 0.02880222 -0.05788232 0.05090477
## base_mr 0.010240522 -0.02322854 0.13421123 -0.07239213
## base_attack_min 0.893240113 0.36977660 -0.33598162 -0.16137120
## base_attack_max 1.000000000 0.31526680 -0.34034754 -0.06532644
## base_str 0.315266797 1.00000000 -0.31096115 -0.04989530
## base_agi -0.340347540 -0.31096115 1.00000000 -0.16320210
## base_int -0.065326444 -0.04989530 -0.16320210 1.00000000
## str_gain 0.366455547 0.55790840 -0.41821154 -0.19559598
## agi_gain -0.236758450 -0.38728934 0.60636444 -0.29378019
## int_gain -0.089326157 -0.19316768 -0.26112723 0.57287616
## attack_range -0.328987090 -0.47840044 -0.07312285 0.40713022
## projectile_speed -0.237965373 -0.15090064 0.01335694 0.15700565
## attack_rate 0.333343133 0.24360649 -0.24933086 -0.04219480
## move_speed -0.086516964 0.05938339 0.08869069 0.07819383
## turn_rate -0.021105033 0.05368861 0.04989100 -0.12897915
## legs 0.100086695 -0.04216962 -0.09823658 0.07694346
## rolesCarry -0.053766099 -0.13579613 0.38004588 -0.25066271
## rolesDisabler 0.144455903 0.23992060 -0.12816677 0.03453965
## rolesDurable 0.285238989 0.40254009 -0.16380730 -0.34224284
## rolesEscape 0.003528075 -0.06387739 0.20830304 -0.15408203
## rolesInitiator 0.281115187 0.41681097 -0.11478937 -0.21261266
## rolesJungler -0.039426988 0.12007884 -0.06582593 -0.06232600
## rolesNuker -0.001202266 -0.07909378 -0.18660811 0.26293153
## rolesPusher -0.043324003 -0.17413072 0.04739182 -0.02039034
## rolesSupport 0.067107384 -0.01015775 -0.18298015 0.35746364
## str_gain agi_gain int_gain attack_range
## base_health_regen -0.003294229 0.155012005 -0.23104392 -0.30002159
## base_armor -0.030688458 0.073730991 0.03373100 -0.23869927
## base_mr -0.012598434 0.088243931 -0.05751315 -0.11530141
## base_attack_min 0.376941947 -0.233047934 -0.14732832 -0.40136927
## base_attack_max 0.366455547 -0.236758450 -0.08932616 -0.32898709
## base_str 0.557908404 -0.387289341 -0.19316768 -0.47840044
## base_agi -0.418211544 0.606364436 -0.26112723 -0.07312285
## base_int -0.195595976 -0.293780193 0.57287616 0.40713022
## str_gain 1.000000000 -0.379817532 -0.37545694 -0.49789693
## agi_gain -0.379817532 1.000000000 -0.35245684 -0.03547236
## int_gain -0.375456945 -0.352456843 1.00000000 0.65327394
## attack_range -0.497896926 -0.035472358 0.65327394 1.00000000
## projectile_speed -0.200648790 0.170035858 0.15417490 0.37329249
## attack_rate 0.311677414 -0.147849056 -0.11917729 -0.16130355
## move_speed -0.016405833 -0.006214705 -0.04799792 -0.22926259
## turn_rate 0.052540428 0.073724058 -0.10352321 -0.12277283
## legs 0.061206589 -0.091415508 0.07801556 -0.07526661
## rolesCarry -0.126050682 0.406179487 -0.37645229 -0.15365138
## rolesDisabler 0.284015887 -0.221388687 0.06150720 -0.01868355
## rolesDurable 0.526353642 -0.190438606 -0.45690790 -0.46486088
## rolesEscape -0.035038169 0.232430432 -0.15984075 -0.20597604
## rolesInitiator 0.532358079 -0.160728152 -0.31205431 -0.42735711
## rolesJungler -0.001570852 -0.025962103 0.06910333 0.03258230
## rolesNuker 0.031582917 -0.185462296 0.29785518 0.11954721
## rolesPusher -0.180037697 0.076027842 0.04752906 0.15689751
## rolesSupport -0.069145205 -0.236341970 0.37754004 0.34412254
## projectile_speed attack_rate move_speed turn_rate
## base_health_regen -0.129522453 0.050410163 -0.001965692 0.095003863
## base_armor -0.072783689 -0.009360779 0.200100097 -0.009865875
## base_mr 0.009624639 -0.008872137 0.084459026 0.064372919
## base_attack_min -0.316839246 0.301260812 -0.088089389 -0.061588180
## base_attack_max -0.237965373 0.333343133 -0.086516964 -0.021105033
## base_str -0.150900637 0.243606493 0.059383389 0.053688615
## base_agi 0.013356940 -0.249330862 0.088690694 0.049890998
## base_int 0.157005648 -0.042194803 0.078193827 -0.128979148
## str_gain -0.200648790 0.311677414 -0.016405833 0.052540428
## agi_gain 0.170035858 -0.147849056 -0.006214705 0.073724058
## int_gain 0.154174899 -0.119177289 -0.047997916 -0.103523211
## attack_range 0.373292492 -0.161303555 -0.229262593 -0.122772834
## projectile_speed 1.000000000 0.009905370 -0.144177941 0.024733195
## attack_rate 0.009905370 1.000000000 -0.026135640 0.024505403
## move_speed -0.144177941 -0.026135640 1.000000000 0.004763117
## turn_rate 0.024733195 0.024505403 0.004763117 1.000000000
## legs -0.072508095 0.076971519 0.120193542 -0.096266141
## rolesCarry 0.053725036 -0.006131039 0.092011631 0.021981449
## rolesDisabler -0.112758464 0.147760770 -0.025292492 0.007057357
## rolesDurable -0.152603233 0.252696754 0.050717549 -0.041884377
## rolesEscape -0.102913206 -0.157650508 -0.169738919 0.227806036
## rolesInitiator -0.215113728 0.289022349 -0.055825544 0.114059914
## rolesJungler -0.013396425 0.081773961 0.172000050 0.004204537
## rolesNuker 0.085704873 -0.138125962 -0.070307053 0.018925859
## rolesPusher 0.016118303 -0.061624577 0.103583238 -0.241165599
## rolesSupport 0.016658319 -0.094060224 -0.070730597 -0.073687966
## legs rolesCarry rolesDisabler rolesDurable
## base_health_regen 0.111219915 0.023715097 0.089460872 -0.047932653
## base_armor 0.100181407 0.025627993 0.028510953 -0.096188531
## base_mr 0.006891421 0.102028865 -0.055744577 -0.113310745
## base_attack_min 0.080637116 -0.086043698 0.164691029 0.348264739
## base_attack_max 0.100086695 -0.053766099 0.144455903 0.285238989
## base_str -0.042169616 -0.135796128 0.239920600 0.402540091
## base_agi -0.098236584 0.380045883 -0.128166766 -0.163807303
## base_int 0.076943463 -0.250662707 0.034539649 -0.342242839
## str_gain 0.061206589 -0.126050682 0.284015887 0.526353642
## agi_gain -0.091415508 0.406179487 -0.221388687 -0.190438606
## int_gain 0.078015563 -0.376452293 0.061507197 -0.456907905
## attack_range -0.075266607 -0.153651379 -0.018683553 -0.464860878
## projectile_speed -0.072508095 0.053725036 -0.112758464 -0.152603233
## attack_rate 0.076971519 -0.006131039 0.147760770 0.252696754
## move_speed 0.120193542 0.092011631 -0.025292492 0.050717549
## turn_rate -0.096266141 0.021981449 0.007057357 -0.041884377
## legs 1.000000000 -0.141204845 0.044562483 -0.151658732
## rolesCarry -0.141204845 1.000000000 -0.235104759 0.150271924
## rolesDisabler 0.044562483 -0.235104759 1.000000000 0.136412241
## rolesDurable -0.151658732 0.150271924 0.136412241 1.000000000
## rolesEscape 0.090581154 0.115248388 -0.140126436 -0.209118541
## rolesInitiator -0.037722594 -0.103183962 0.468546824 0.270010509
## rolesJungler 0.060341925 -0.113252049 -0.001485407 -0.001337142
## rolesNuker 0.178473956 -0.265134358 0.097112971 -0.222522900
## rolesPusher -0.008227833 0.204270765 -0.238462437 -0.106985128
## rolesSupport 0.005000015 -0.481535677 0.256815622 -0.190694043
## rolesEscape rolesInitiator rolesJungler rolesNuker
## base_health_regen 0.211210351 0.15951759 0.048469020 -0.186356834
## base_armor 0.008257484 0.04637656 0.004516243 0.063166455
## base_mr 0.076080072 0.08449265 0.035605456 -0.058195356
## base_attack_min 0.006606846 0.34892005 -0.024889765 -0.026782712
## base_attack_max 0.003528075 0.28111519 -0.039426988 -0.001202266
## base_str -0.063877386 0.41681097 0.120078845 -0.079093778
## base_agi 0.208303037 -0.11478937 -0.065825932 -0.186608113
## base_int -0.154082030 -0.21261266 -0.062325997 0.262931527
## str_gain -0.035038169 0.53235808 -0.001570852 0.031582917
## agi_gain 0.232430432 -0.16072815 -0.025962103 -0.185462296
## int_gain -0.159840749 -0.31205431 0.069103331 0.297855181
## attack_range -0.205976038 -0.42735711 0.032582305 0.119547211
## projectile_speed -0.102913206 -0.21511373 -0.013396425 0.085704873
## attack_rate -0.157650508 0.28902235 0.081773961 -0.138125962
## move_speed -0.169738919 -0.05582554 0.172000050 -0.070307053
## turn_rate 0.227806036 0.11405991 0.004204537 0.018925859
## legs 0.090581154 -0.03772259 0.060341925 0.178473956
## rolesCarry 0.115248388 -0.10318396 -0.113252049 -0.265134358
## rolesDisabler -0.140126436 0.46854682 -0.001485407 0.097112971
## rolesDurable -0.209118541 0.27001051 -0.001337142 -0.222522900
## rolesEscape 1.000000000 -0.04520132 -0.001337142 -0.145037247
## rolesInitiator -0.045201316 1.00000000 -0.040823413 -0.040112578
## rolesJungler -0.001337142 -0.04082341 1.000000000 -0.327764146
## rolesNuker -0.145037247 -0.04011258 -0.327764146 1.000000000
## rolesPusher -0.106985128 -0.20427076 0.135121734 -0.124072917
## rolesSupport -0.190694043 -0.08827139 -0.027192895 0.202024737
## rolesPusher rolesSupport
## base_health_regen -0.090405458 -0.148239094
## base_armor 0.102423578 -0.085189908
## base_mr -0.161738474 -0.121801500
## base_attack_min -0.097872611 0.041718605
## base_attack_max -0.043324003 0.067107384
## base_str -0.174130720 -0.010157749
## base_agi 0.047391822 -0.182980155
## base_int -0.020390340 0.357463636
## str_gain -0.180037697 -0.069145205
## agi_gain 0.076027842 -0.236341970
## int_gain 0.047529062 0.377540040
## attack_range 0.156897506 0.344122543
## projectile_speed 0.016118303 0.016658319
## attack_rate -0.061624577 -0.094060224
## move_speed 0.103583238 -0.070730597
## turn_rate -0.241165599 -0.073687966
## legs -0.008227833 0.005000015
## rolesCarry 0.204270765 -0.481535677
## rolesDisabler -0.238462437 0.256815622
## rolesDurable -0.106985128 -0.190694043
## rolesEscape -0.106985128 -0.190694043
## rolesInitiator -0.204270765 -0.088271395
## rolesJungler 0.135121734 -0.027192895
## rolesNuker -0.124072917 0.202024737
## rolesPusher 1.000000000 -0.109136486
## rolesSupport -0.109136486 1.000000000
It turns out our data has very low correlations overall, but some variables, like the str/agi/int base stats, are correlated with the str/agi/int gains. rolesCarry also has a negative correlation with rolesSupport. It makes sense, since it is very rare for a carry to be a support and vice versa. The presence of carry roles can explain support roles, just as the str/agi/int base stats explain the str/agi/int gains. This would cause multicollinearity if we did supervised learning, and to avoid that, let’s do PCA.
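Why does PCA help with multicollinearity? The principal component scores are uncorrelated with each other by construction. A small sketch with four strongly correlated columns of the built-in mtcars table, used as a stand-in for our hero stats (x and pca are made-up names):

```r
# Four strongly correlated columns from the built-in mtcars table
x <- scale(mtcars[, c("mpg", "disp", "hp", "wt")])

# the original columns are highly correlated with each other
round(cor(x), 2)

# principal component scores of the same data
pca <- prcomp(x)

# the scores are uncorrelated: off-diagonal entries are (numerically) zero
round(cor(pca$x), 2)
```

So a supervised model fed with the component scores instead of the original columns no longer sees correlated predictors, at the cost of components that are harder to interpret than the raw stats.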
# we need to scale our numeric data first and separate the categorical variables from the numeric ones
# remove localized_name and cluster
for.pca <- hero.df.new[,-c(1,30)]
for.pca <- for.pca %>% mutate_if(is.numeric, .funs = "scale")
# run the PCA with the FactoMineR package; the two factor columns (primary_attr, attack_type) are supplementary
summary(PCA(X = for.pca, quali.sup = c(1:2), graph = F))
##
## Call:
## PCA(X = for.pca, quali.sup = c(1:2), graph = F)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7
## Variance 4.624 3.582 1.772 1.649 1.452 1.311 1.171
## % of var. 17.785 13.779 6.817 6.342 5.583 5.042 4.502
## Cumulative % of var. 17.785 31.564 38.381 44.723 50.306 55.348 59.851
## Dim.8 Dim.9 Dim.10 Dim.11 Dim.12 Dim.13 Dim.14
## Variance 1.127 1.059 0.941 0.855 0.777 0.759 0.695
## % of var. 4.335 4.073 3.620 3.289 2.987 2.920 2.675
## Cumulative % of var. 64.185 68.259 71.878 75.167 78.154 81.073 83.748
## Dim.15 Dim.16 Dim.17 Dim.18 Dim.19 Dim.20 Dim.21
## Variance 0.603 0.570 0.526 0.440 0.410 0.379 0.358
## % of var. 2.321 2.193 2.024 1.693 1.578 1.458 1.377
## Cumulative % of var. 86.069 88.262 90.286 91.979 93.557 95.016 96.393
## Dim.22 Dim.23 Dim.24 Dim.25 Dim.26
## Variance 0.284 0.218 0.215 0.136 0.085
## % of var. 1.091 0.838 0.826 0.525 0.328
## Cumulative % of var. 97.483 98.321 99.147 99.672 100.000
##
## Individuals (the 10 first)
## Dist Dim.1 ctr cos2 Dim.2 ctr cos2
## 1 | 6.261 | -1.173 0.254 0.035 | -3.118 2.320 0.248 |
## 2 | 6.007 | 2.376 1.044 0.156 | -1.328 0.420 0.049 |
## 3 | 3.906 | 0.304 0.017 0.006 | 1.484 0.525 0.144 |
## 4 | 4.468 | 1.086 0.218 0.059 | -1.275 0.388 0.081 |
## 5 | 4.792 | -1.209 0.270 0.064 | 2.046 0.999 0.182 |
## 6 | 5.383 | -2.426 1.088 0.203 | -2.530 1.527 0.221 |
## 7 | 5.223 | 2.466 1.124 0.223 | 1.041 0.259 0.040 |
## 8 | 7.455 | -2.446 1.106 0.108 | -4.975 5.904 0.445 |
## 9 | 3.957 | -2.011 0.747 0.258 | -0.421 0.042 0.011 |
## 10 | 6.277 | -1.661 0.510 0.070 | -3.464 2.864 0.305 |
## Dim.3 ctr cos2
## 1 1.034 0.516 0.027 |
## 2 0.352 0.060 0.003 |
## 3 0.553 0.148 0.020 |
## 4 -0.069 0.002 0.000 |
## 5 -0.111 0.006 0.001 |
## 6 -1.212 0.709 0.051 |
## 7 2.307 2.567 0.195 |
## 8 0.446 0.096 0.004 |
## 9 0.711 0.244 0.032 |
## 10 0.252 0.031 0.002 |
##
## Variables (the 10 first)
## Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr
## base_health_regen | 0.181 0.712 0.033 | -0.337 3.161 0.113 | 0.426 10.254
## base_armor | 0.067 0.097 0.004 | -0.066 0.121 0.004 | 0.216 2.641
## base_mr | 0.033 0.023 0.001 | -0.193 1.040 0.037 | 0.262 3.865
## base_attack_min | 0.701 10.621 0.491 | 0.190 1.006 0.036 | -0.020 0.023
## base_attack_max | 0.634 8.702 0.402 | 0.231 1.493 0.053 | -0.049 0.137
## base_str | 0.680 9.995 0.462 | 0.182 0.927 0.033 | -0.044 0.110
## base_agi | -0.334 2.416 0.112 | -0.658 12.102 0.434 | 0.178 1.791
## base_int | -0.373 3.009 0.139 | 0.583 9.499 0.340 | 0.047 0.127
## str_gain | 0.784 13.276 0.614 | 0.157 0.684 0.025 | -0.038 0.080
## agi_gain | -0.313 2.120 0.098 | -0.702 13.769 0.493 | 0.110 0.678
## cos2
## base_health_regen 0.182 |
## base_armor 0.047 |
## base_mr 0.068 |
## base_attack_min 0.000 |
## base_attack_max 0.002 |
## base_str 0.002 |
## base_agi 0.032 |
## base_int 0.002 |
## str_gain 0.001 |
## agi_gain 0.012 |
##
## Supplementary categories
## Dist Dim.1 cos2 v.test Dim.2 cos2 v.test
## agi | 2.303 | -0.892 0.150 -3.039 | -2.069 0.807 -8.006 |
## int | 2.207 | -1.295 0.345 -4.855 | 1.690 0.586 7.195 |
## str | 2.406 | 2.300 0.914 7.991 | 0.147 0.004 0.580 |
## Melee | 1.856 | 1.645 0.785 7.896 | -0.706 0.144 -3.847 |
## Ranged | 1.704 | -1.510 0.785 -7.896 | 0.648 0.144 3.847 |
## Dim.3 cos2 v.test
## agi 0.077 0.001 0.424 |
## int 0.097 0.002 0.589 |
## str -0.183 0.006 -1.025 |
## Melee 0.193 0.011 1.497 |
## Ranged -0.177 0.011 -1.497 |
From the summary above, we need 15 dimensions to cover 86% of the variance in the data.
plot.PCA(pca1, choix = "ind", invisible = "quali", habillage = 1, col.hab = c("green","blue","red"))
Dim 1 covers only 17.79% of the variance in the data and Dim 2 only 13.78%. That's rather low; I was expecting more than 30% in Dim 1. Let's visualize the percentage of variance covered by each PC.
PC 1 and PC 2 combined cover only about 31.6%. The first 10 dimensions together cover 71.9% of the information in our data. We can certainly reduce the number of variables for future supervised learning, but the gain is not significant, since our data has low multicollinearity in the first place.
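The percentages above can be recomputed from the eigenvalues alone. Here is a small sketch under a few assumptions: the eigenvalues are copied by hand from the Variance row of the summary (so they are rounded to 3 decimals), and dims_needed is a hypothetical helper name of mine. With a FactoMineR result the same values should be available in pca1$eig, but that is not shown in this article.

```r
# Eigenvalues copied from the "Variance" row of the PCA summary (rounded).
eigenvalues <- c(4.624, 3.582, 1.772, 1.649, 1.452, 1.311, 1.171,
                 1.127, 1.059, 0.941, 0.855, 0.777, 0.759, 0.695,
                 0.603, 0.570, 0.526, 0.440, 0.410, 0.379, 0.358,
                 0.284, 0.218, 0.215, 0.136, 0.085)

# Share of variance per component and the running total, in percent.
pct_var <- 100 * eigenvalues / sum(eigenvalues)
cum_pct <- cumsum(pct_var)

# Hypothetical helper: smallest number of components whose cumulative
# share reaches the target percentage.
dims_needed <- function(cum_pct, target) {
  which(cum_pct >= target)[1]
}

dims_80 <- dims_needed(cum_pct, 80)
dims_86 <- dims_needed(cum_pct, 86)
print(c(dims_80 = dims_80, dims_86 = dims_86))
```

This gives 13 components for an 80% target and 15 for an 86% target, which agrees with the cumulative row of the summary table.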
Let's see how each numeric variable is represented by the PCA.
We can see that PC 1 and PC 2 are not enough to picture our data clearly (together they cover only about 31% anyway). Still, there is an interesting insight here: the differences between the 3 primary attributes can be explained by PC 1 and PC 2. The attributes are negatively related to one another, though not strongly.
The plot above shows which variables contribute to which dimension/PC. For a clearer picture, let's plot the contribution of each variable to both PCs.
From the plot above, we know that str_gain has the highest contribution to PC 1. In PC 2, agi_gain and int_gain have the highest contributions. The red line in the plot indicates the average contribution on each PC. If we take the variables that contribute above the average line, almost every variable contributes mainly to one PC and not the other. For example, str_gain has a high contribution to PC 1 but a low one to PC 2, and rolesCarry has a high contribution to PC 2 but a very low one to PC 1 (some variables, like int_gain, contribute highly to both PCs, though). It means the PCs have successfully separated our data in terms of contribution, since PC 2 is built along a direction perpendicular to PC 1.
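To make the contribution idea concrete, here is a rough sketch using base R's prcomp on the built-in mtcars data as a stand-in for the hero table. It assumes the usual definition where a variable's contribution to a PC is its squared loading in percent, and it assumes the average line sits at 100 divided by the number of variables; the variable names are mine. The last line also checks the summary's own numbers: cos2 / eigenvalue for str_gain on Dim.1 is 0.614 / 4.624, which lands close to the reported ctr of 13.276.

```r
# Stand-in data: mtcars ships with R; the hero table is not reproduced here.
pca_fit  <- prcomp(mtcars, center = TRUE, scale. = TRUE)
loadings <- pca_fit$rotation   # each column is a unit-length direction vector

# Contribution of each variable to each component, in percent.
# Squared loadings in a column sum to 1, so each column sums to 100.
contrib  <- 100 * loadings^2
avg_line <- 100 / nrow(loadings)   # assumed "average contribution" reference

# Variables above the average line on PC1 and on PC2.
above_pc1 <- rownames(contrib)[contrib[, "PC1"] > avg_line]
above_pc2 <- rownames(contrib)[contrib[, "PC2"] > avg_line]
print(list(PC1 = above_pc1, PC2 = above_pc2))

# PC2 is perpendicular to PC1: their dot product is numerically zero.
dot_12 <- sum(loadings[, "PC1"] * loadings[, "PC2"])

# Same formula applied to the article's summary values for str_gain on Dim.1.
str_gain_ctr <- 100 * 0.614 / 4.624
```

A variable can still sit above the average line on both PCs, as int_gain does in our data; orthogonality constrains the directions as a whole, not each variable separately.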
Let's see how our clusters are distributed in the PCA.
# rebuild PCA. this time we include cluster
for.pca2 <- cbind(for.pca, cluster = hero.df.new$cluster)
# build PCA
pca2 <- PCA(for.pca2, quali.sup = c(1,2,29), graph = F)
From the plots we can conclude the following (note: remember that these conclusions rest on only about 31% of the variance in the data):
- cluster 4 and cluster 2 are somewhat similar
- heroes in cluster 1 are the most unique
- cluster 1 is the opposite of cluster 4
- heroes in cluster 1 are most strongly associated with the variables rolesSupport, int_gain, base_int, attack_range, and rolesNuker
- heroes in cluster 2 are most strongly associated with the variables base_attack_max/min, base_str, str_gain, attack_rate, rolesDisabler, rolesInitiator, and rolesDurable
- heroes in cluster 3 are most strongly associated with the variables base_agi, agi_gain, rolesCarry, rolesEscape, base_mr, move_speed, and rolesPusher
- heroes in cluster 4 are most strongly associated with base_health_regen, turn_rate, and base_armor
These conclusions come from interpreting PC 1 and PC 2, which together portray only about 31% of the variance in the data. It's hard to draw conclusions from only 2 PCs, because I'm afraid they would be misleading. So from this analysis, I conclude that the Dota 2 heroes data is not well suited to analysis with PCA.
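Statements like "cluster 1 is the opposite of cluster 4" come from reading the plot by eye. One way to back them up with numbers is to compute each cluster's average position on PC 1 and PC 2 and the distances between those averages. The sketch below is only an illustration under stated assumptions: it uses mtcars as stand-in data, a fresh k-means with 3 centers instead of this article's 4 hero clusters, and base R's prcomp instead of FactoMineR.

```r
# Stand-in data and clustering; not the hero clusters from this article.
set.seed(42)
scaled <- scale(mtcars)
km     <- kmeans(scaled, centers = 3, nstart = 25)

# Coordinates of every observation on the first two components.
scores <- as.data.frame(prcomp(scaled)$x[, 1:2])

# Average PC1/PC2 position of each cluster.
centroids <- aggregate(scores, by = list(cluster = km$cluster), FUN = mean)
print(centroids)

# Pairwise distances between cluster centroids in the PC1/PC2 plane:
# small values suggest similar clusters, large values opposite ones.
centroid_dist <- dist(centroids[, c("PC1", "PC2")])
print(round(centroid_dist, 2))
```

The same caveat applies as in the text: these distances only reflect the share of variance captured by the first two PCs.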
Lastly, let's convert our PCA result to a data frame and plot the first three dimensions.
df.pca <- data.frame(pca2$ind$coord[,1:5])
df.pca <- cbind(df.pca, cluster = hero.df.new$cluster, hero = hero.df.new$localized_name)
plot2 <- plot_ly(data=df.pca, x= ~Dim.1, y= ~Dim.2, z= ~Dim.3,
color = ~cluster, hoverinfo = 'text', text = ~paste(
"</br> Hero_name: ",hero,
"</br> Dim.1: ", Dim.1,
"</br> Dim.2: ", Dim.2,
"</br> Dim.3: ", Dim.3)) %>%
add_markers() %>% layout(scene = list(xaxis = list(title = "Dim.1"),
yaxis = list(title = "Dim.2"),
zaxis = list(title = "Dim.3")))
plot2
Finally, here are some insights we can get from unsupervised learning on the Dota 2 heroes data:
- There's no clear distinction between the clusters based on their base stats, but they differ slightly in their roles and primary attribute
- cluster 1 is the only distinctive cluster, made up of a combination of all the hero primary attributes, while clusters 2-4 share many characteristics with the intelligence, agility, and strength primary attributes, respectively
- We need 15 dimensions to cover 86% of the variance in the data, or 26 dimensions to cover all of it. It means that if we use all the PCs to reduce the dimensionality of our main data, we only achieve a 13% reduction in variables: (1 - total.dimension/total.actual.variable) * 100 = (1 - 26/30) * 100
- Or, if about 80% of the variance is enough for you, we only need 13 dimensions (81% cumulative), a reduction of about 57%: (1 - 13/30) * 100. Keeping 15 dimensions (86%) cuts the number of variables by 50%: (1 - 15/30) * 100
- It's hard to draw conclusions from only the first 2 PCs. We still need a lot of dimensions to summarize our data clearly. Thus, the Dota 2 heroes data is not well suited to analysis with PCA.
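The reduction percentages in the list above are simple arithmetic, sketched here so they can be rechecked. The helper name reduction_pct is hypothetical, and the total of 30 variables is taken from the formula used in this article.

```r
# Hypothetical helper: percentage of variables dropped when keeping
# `kept` components out of `total` original variables.
reduction_pct <- function(kept, total) {
  (1 - kept / total) * 100
}

total_vars <- 30  # variable count used in the article's formula

all_pcs  <- reduction_pct(26, total_vars)  # keep every component
keep_15  <- reduction_pct(15, total_vars)  # about 86% of the variance
keep_13  <- reduction_pct(13, total_vars)  # about 81% of the variance
print(round(c(all_pcs = all_pcs, keep_15 = keep_15, keep_13 = keep_13), 1))
```

Keeping 15 components gives exactly 50%, while 26 and 13 components give roughly 13% and 57%.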
Thank you!
Shadow Fiend by chroneco