In this small project, a clustering analysis is performed on fully-evolved Pokemon up until Generation 8. This can inform decisions on which Pokemon to choose when trying to build a balanced team.
Import packages and set seed for k-means clustering.
library(tidyverse)
library(dplyr)
library(stats)
library(ggplot2)
library(tidyr)
library(ggpubr)
set.seed(123)
options(dplyr.summarise.inform = FALSE)
Read file. The dataset was downloaded from Kaggle (https://www.kaggle.com/tlegrand/pokemon-with-stats-generation-8). It was modified to include a column indicating whether each Pokemon is fully evolved. The type columns contain some mistakes, but they are not used in this analysis, so they were left uncorrected.
pokemon <- read.csv('Pokemon_Gen_1-8.csv')
#Rename X column to fully_evolved, 1 means fully_evolved
pokemon <- pokemon %>%
rename(fully_evolved = X)
#Take only relevant columns, and choose only fully evolved pokemon
stats <- pokemon %>%
select(Name, HP, Attack, Defense, Sp..Attack, Sp..Defense, Speed, fully_evolved) %>%
filter(fully_evolved == 1)
#Check mean and standard deviation of the stats
colMeans(stats[,c('HP', 'Attack', 'Defense', 'Sp..Attack', 'Sp..Defense', 'Speed')])
## HP Attack Defense Sp..Attack Sp..Defense Speed
## 81.57096 95.91254 87.65182 87.60726 86.13861 79.99175
apply(stats[,c('HP', 'Attack', 'Defense', 'Sp..Attack', 'Sp..Defense', 'Speed')], 2, sd)
## HP Attack Defense Sp..Attack Sp..Defense Speed
## 25.36811 30.32922 30.24084 31.91203 25.66924 30.32117
#Scale stats to have mean = 0, sd = 1
stat_num <- scale(stats[,c('HP', 'Attack', 'Defense', 'Sp..Attack', 'Sp..Defense', 'Speed')],
center=TRUE, scale=TRUE)
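Scaling matters here because k-means relies on Euclidean distance, so a stat with a larger spread would otherwise dominate the clustering. The sketch below checks what `scale()` does on a small made-up data frame (the `toy` values are hypothetical and are not taken from the dataset).

```r
# Toy stand-in for the stat columns (values are made up for illustration)
toy <- data.frame(HP    = c(45, 60, 80, 100, 255),
                  Speed = c(30, 55, 90, 110, 150))

toy_scaled <- scale(toy, center = TRUE, scale = TRUE)

# Column means should be numerically zero and standard deviations one
round(colMeans(toy_scaled), 10)
apply(toy_scaled, 2, sd)

# The centring and scaling constants are stored as attributes,
# which is handy for mapping cluster centres back to raw stats
attr(toy_scaled, "scaled:center")
attr(toy_scaled, "scaled:scale")
```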
First of all, plot an elbow plot.
#Elbow plot for k-means clustering
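ws <- numeric(12)
for (i in 1:12) {
  kmclust <- kmeans(stat_num, centers = i, nstart = 20)
  #Total within-cluster sum of squares for i clusters
  ws[i] <- kmclust$tot.withinss
}
plot(1:12, ws, type = "b", xlab = "Number of clusters", ylab = "Total within-cluster sum of squares")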
There doesn’t seem to be a clear elbow. However, Pokemon can, in general, be classified by their stats into a few categories: well-rounded Pokemon with higher stats than usual (like the legendaries), sweepers (those better at attacking), walls (those better at defense than attack), tanks (those that are very robust to attacks and can survive for a long time thanks to high HP), and those that are generally lacklustre. Walls and tanks are quite similar, so in this case we can try 4 clusters.
kmclust <- kmeans(stat_num, centers = 4, nstart = 20)
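When the elbow is hard to judge by eye, one option is to look at the fraction of the within-cluster sum of squares removed by each additional cluster. The following is a minimal sketch on synthetic data (three made-up blobs, not the Pokemon stats); the names `toy`, `wss` and `rel_drop` are hypothetical.

```r
set.seed(123)

# Synthetic data: three well-separated 2-D blobs of 50 points each
toy <- rbind(matrix(rnorm(100, mean = 0),  ncol = 2),
             matrix(rnorm(100, mean = 6),  ncol = 2),
             matrix(rnorm(100, mean = 12), ncol = 2))

k_max <- 8
wss <- numeric(k_max)
for (k in seq_len(k_max)) {
  wss[k] <- kmeans(toy, centers = k, nstart = 20)$tot.withinss
}

# Fraction of the previous WSS removed by going from k - 1 to k clusters
rel_drop <- -diff(wss) / wss[-k_max]
data.frame(k = 2:k_max, rel_drop = round(rel_drop, 3))
```

A run of large drops followed by consistently small ones points to an elbow; if the drops shrink gradually, as seems to be the case for the Pokemon stats, the choice of k has to lean on domain knowledge instead.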
Because there are 6 stats to take into account, it’s hard to plot the clusters on a graph that uses all 6 stats. To make visualisation easier, a PCA was performed.
#Plot the clusters using principal components
stat_pca <- prcomp(stat_num)
summary(stat_pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 1.2496 1.1891 1.0776 0.9498 0.76748 0.61004
## Proportion of Variance 0.2603 0.2356 0.1935 0.1504 0.09817 0.06203
## Cumulative Proportion 0.2603 0.4959 0.6894 0.8398 0.93797 1.00000
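The first 2 principal components together account for about 50% (49.6%) of the variance, so a 2D plot of PC1 against PC2 shows only about half of the variation in the stats.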
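The proportions in the summary table follow directly from the component standard deviations: each component's variance is its squared standard deviation, divided by the total. A short check using the values printed above (the vector `sdev` is copied by hand from that output):

```r
# Standard deviations of the six components, copied from summary(stat_pca)
sdev <- c(1.2496, 1.1891, 1.0776, 0.9498, 0.76748, 0.61004)

# Each component's variance is its squared standard deviation
prop_var <- sdev^2 / sum(sdev^2)
round(prop_var, 4)

# Cumulative share; the second entry is the share covered by PC1 and PC2
round(cumsum(prop_var), 4)

# With six standardised variables the total variance is close to 6
sum(sdev^2)
```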
biplot(stat_pca)
Pokemon that are good at defense are usually relatively bad at attack, and vice versa. Defensive Pokemon also usually have lower speed. The directions of the eigenvectors obtained from the PCA capture these covariances quite well.
pca_df <- as.data.frame(stat_pca$x)
pca_df['Name'] <- stats$Name
pca_df['cluster'] <- kmclust$cluster
ggplot(pca_df, aes(x=PC1, y=PC2, color = as.factor(cluster))) + geom_point() +
labs(colour="Cluster")
#Add cluster column to the data frame with original unscaled stats
stats$cluster <- kmclust$cluster
#Plot scatterplot of stats showing what sort of values each cluster has
ggplot(stats, aes(x=HP, y=Speed, color = as.factor(cluster))) + geom_point() +
labs(colour="Cluster")
ggplot(stats, aes(x=Attack, y=Defense, color = as.factor(cluster))) + geom_point() + labs(colour="Cluster")
ggplot(stats, aes(x=Sp..Attack, y=Sp..Defense, color = as.factor(cluster))) + geom_point() + labs(colour="Cluster")
In general, Cluster 1 has relatively high HP and attack but poor speed. This cluster most likely represents the tanks.
Cluster 2 has low HP, defense and special defense but high speed. This cluster probably represents the sweepers.
Cluster 3 has high stats across the board. This probably represents the legendaries and overpowered Pokemon.
Cluster 4 generally has higher special defense than any of its other stats, at a level similar to that of Cluster 3. Their defense is not bad, but their HP is on the low side. They also have better special attack than physical attack. This cluster is probably made up of the relatively lacklustre Pokemon.
Let’s look at the mean of each stat for each cluster.
#Mean of each stats for each cluster
stats %>%
filter(cluster == 1) %>%
summarize(mean(HP), mean(Attack), mean(Defense), mean(Sp..Attack), mean(Sp..Defense), mean(Speed))
## mean(HP) mean(Attack) mean(Defense) mean(Sp..Attack) mean(Sp..Defense)
## 1 90.98824 110.0765 98.96471 67.29412 74.84706
## mean(Speed)
## 1 60.85294
stats %>%
filter(cluster == 2) %>%
summarize(mean(HP), mean(Attack), mean(Defense), mean(Sp..Attack), mean(Sp..Defense), mean(Speed))
## mean(HP) mean(Attack) mean(Defense) mean(Sp..Attack) mean(Sp..Defense)
## 1 68.32609 86.82065 65.07065 81.69022 69.60326
## mean(Speed)
## 1 99.08696
stats %>%
filter(cluster == 3) %>%
summarize(mean(HP), mean(Attack), mean(Defense), mean(Sp..Attack), mean(Sp..Defense), mean(Speed))
## mean(HP) mean(Attack) mean(Defense) mean(Sp..Attack) mean(Sp..Defense)
## 1 95.79339 118.6198 96.53719 122.2727 103.1322
## mean(Speed)
## 1 102.3554
stats %>%
filter(cluster == 4) %>%
summarize(mean(HP), mean(Attack), mean(Defense), mean(Sp..Attack), mean(Sp..Defense), mean(Speed))
## mean(HP) mean(Attack) mean(Defense) mean(Sp..Attack) mean(Sp..Defense)
## 1 74.81679 69.32824 96.48092 90.25954 108.3206
## mean(Speed)
## 1 57.35115
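The four near-identical blocks above could also be collapsed into a single grouped summary. The sketch below shows the idea on a made-up data frame (`toy` and its values are hypothetical); applied to `stats`, the `across()` selection would list all six stat columns.

```r
library(dplyr)

# Toy stand-in for `stats` with a cluster column (values are made up)
toy <- data.frame(
  Name    = c("A", "B", "C", "D"),
  HP      = c(100, 80, 60, 70),
  Speed   = c(40, 60, 120, 100),
  cluster = c(1, 1, 2, 2)
)

# One row per cluster, one column per stat
cluster_means <- toy %>%
  group_by(cluster) %>%
  summarise(across(c(HP, Speed), mean))

cluster_means
```

Having all cluster means in one table makes it easier to compare clusters stat by stat than scrolling between separate outputs.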
#Plot distribution of stats for each cluster, need to first create a data frame
#for each cluster
clus1 <- stats %>%
filter(cluster == 1) %>%
gather(key = 'stat', value = 'value', -c('Name', 'cluster', 'fully_evolved'))
plot1 <- ggplot(clus1, aes(x=value)) + geom_density() + facet_wrap(~stat) +
ggtitle('Stats of Cluster 1 Pokemon') +
xlab('Stats') + ylab('Density') +
theme(axis.text.x = element_text(angle = 45)) +
theme(plot.title = element_text(hjust = 0.5))
clus2 <- stats %>%
filter(cluster == 2) %>%
gather(key = 'stat', value = 'value', -c('Name', 'cluster', 'fully_evolved'))
plot2 <- ggplot(clus2, aes(x=value)) + geom_density() + facet_wrap(~stat) +
ggtitle('Stats of Cluster 2 Pokemon') +
xlab('Stats') + ylab('Density') +
theme(axis.text.x = element_text(angle = 45)) +
theme(plot.title = element_text(hjust = 0.5))
clus3 <- stats %>%
filter(cluster == 3) %>%
gather(key = 'stat', value = 'value', -c('Name', 'cluster', 'fully_evolved'))
plot3 <- ggplot(clus3, aes(x=value)) + geom_density() + facet_wrap(~stat) +
ggtitle('Stats of Cluster 3 Pokemon') +
xlab('Stats') + ylab('Density') +
theme(axis.text.x = element_text(angle = 45)) +
theme(plot.title = element_text(hjust = 0.5))
clus4 <- stats %>%
filter(cluster == 4) %>%
gather(key = 'stat', value = 'value', -c('Name', 'cluster', 'fully_evolved'))
plot4 <- ggplot(clus4, aes(x=value)) + geom_density() + facet_wrap(~stat) +
ggtitle('Stats of Cluster 4 Pokemon') +
xlab('Stats') + ylab('Density') +
theme(axis.text.x = element_text(angle = 45)) +
theme(plot.title = element_text(hjust = 0.5))
figure <- ggarrange(plot1, plot2, plot3, plot4,
ncol = 2, nrow = 2)
figure
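As an alternative to building one data frame and one plot per cluster, the data can be reshaped once and faceted by both cluster and stat. This is only a sketch on made-up values (`toy`, `toy_long` and `p` are hypothetical names), using `pivot_longer()`, which the tidyr documentation recommends over `gather()`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for `stats` (values are made up for illustration)
toy <- data.frame(
  Name    = paste0("P", 1:8),
  HP      = c(90, 100, 85, 95, 60, 70, 65, 75),
  Speed   = c(50, 60, 55, 65, 110, 120, 100, 130),
  cluster = rep(1:2, each = 4)
)

# Long format: one row per Pokemon and stat
toy_long <- toy %>%
  pivot_longer(cols = c(HP, Speed), names_to = "stat", values_to = "value")

# Rows are clusters, columns are stats, so every panel in a
# column shares the same x-axis and can be compared directly
p <- ggplot(toy_long, aes(x = value)) +
  geom_density() +
  facet_grid(cluster ~ stat, labeller = label_both) +
  xlab("Stats") + ylab("Density")

p
```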
Cluster 3 Pokemon are generally quite well-rounded. Judging by the means above, Cluster 4 Pokemon have the lowest attack and speed of the four clusters, although their defense and special defense are high. Comparing Clusters 1 and 2, Cluster 2 is generally better in speed and special attack, making it more suitable for fast offensive roles. Cluster 1 has higher HP, attack and defense but is much slower, which makes it relatively more suitable for defensive roles.
top_pokemon <- stats %>%
select(-fully_evolved) %>%
mutate(Total = HP + Attack + Defense + Sp..Attack + Sp..Defense + Speed) %>%
arrange(cluster, desc(Total)) %>%
group_by(cluster) %>%
top_n(20)
## Selecting by Total
print(top_pokemon, n = 200)
## # A tibble: 84 x 9
## # Groups: cluster [4]
## Name HP Attack Defense Sp..Attack Sp..Defense Speed cluster Total
## <chr> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 Aggron (Mega… 70 140 230 60 80 50 1 630
## 2 Steelix (Meg… 75 125 230 55 95 30 1 610
## 3 Pinsir (Mega… 65 155 120 65 90 105 1 600
## 4 Scizor (Mega… 70 150 140 65 100 75 1 600
## 5 Heracross (M… 80 185 115 40 105 75 1 600
## 6 Tyranitar 100 134 110 95 100 61 1 600
## 7 Metagross 80 135 130 95 90 70 1 600
## 8 Melmetal 135 143 143 80 65 34 1 600
## 9 Kangaskhan (… 105 125 100 60 100 100 1 590
## 10 Regirock 80 100 200 50 100 50 1 580
## 11 Cobalion 91 90 129 90 72 108 1 580
## 12 Regidrago 200 100 50 100 50 80 1 580
## 13 Glastrier 100 145 130 65 110 30 1 580
## 14 Tapu Bulu 70 130 115 85 95 75 1 570
## 15 Buzzwole 107 139 139 53 53 79 1 570
## 16 Kartana 59 181 131 59 31 109 1 570
## 17 Guzzlord 223 101 53 97 53 43 1 570
## 18 Stakataka 61 131 211 53 101 13 1 570
## 19 Banette (Meg… 64 165 75 93 83 75 1 555
## 20 Urshifu (Sin… 100 130 100 63 60 97 1 550
## 21 Urshifu (Rap… 100 130 100 63 60 97 1 550
## 22 Deoxys (Norm… 50 150 50 150 50 150 2 600
## 23 Deoxys (Atta… 50 180 20 180 20 150 2 600
## 24 Deoxys (Spee… 50 95 90 95 90 180 2 600
## 25 Lopunny (Meg… 65 136 94 54 96 135 2 580
## 26 Regieleki 80 100 50 100 50 200 2 580
## 27 Manectric (M… 70 75 80 135 80 135 2 575
## 28 Tapu Koko 70 115 85 95 75 130 2 570
## 29 Pheromosa 71 137 37 137 37 151 2 570
## 30 Archeops 75 140 65 112 65 110 2 567
## 31 Absol (Mega … 65 150 60 115 60 115 2 565
## 32 Sharpedo (Me… 70 140 70 110 65 105 2 560
## 33 Electivire 75 123 67 95 85 95 2 540
## 34 Darmanitan (… 105 160 55 30 55 135 2 540
## 35 Naganadel 73 73 73 127 73 121 2 540
## 36 Crobat 85 90 80 70 80 130 2 535
## 37 Porygon-Z 85 80 70 135 75 90 2 535
## 38 Noivern 85 70 80 97 80 123 2 535
## 39 Duraludon 70 95 115 120 50 85 2 535
## 40 Charizard 78 84 78 109 85 100 2 534
## 41 Typhlosion 78 84 78 109 85 100 2 534
## 42 Infernape 76 104 71 104 71 108 2 534
## 43 Delphox 75 69 72 114 100 104 2 534
## 44 Eternatus (E… 255 115 250 125 250 130 3 1125
## 45 Mewtwo (Mega… 106 190 100 154 100 130 3 780
## 46 Mewtwo (Mega… 106 150 70 194 120 140 3 780
## 47 Rayquaza (Me… 105 180 100 180 100 115 3 780
## 48 Kyogre (Prim… 100 150 90 180 160 90 3 770
## 49 Groudon (Pri… 100 180 160 150 90 90 3 770
## 50 Necrozma (Ul… 97 167 97 167 97 129 3 754
## 51 Arceus 120 120 120 120 120 120 3 720
## 52 Zacian (Crow… 92 170 115 80 115 148 3 720
## 53 Zamazenta (C… 92 130 145 80 145 128 3 720
## 54 Zygarde (Com… 216 100 121 91 95 85 3 708
## 55 Tyranitar (M… 100 164 150 95 120 71 3 700
## 56 Salamence (M… 95 145 130 120 90 120 3 700
## 57 Metagross (M… 80 145 150 105 110 110 3 700
## 58 Latias (Mega… 80 100 120 140 150 110 3 700
## 59 Latios (Mega… 80 130 100 160 120 110 3 700
## 60 Garchomp (Me… 108 170 115 120 95 92 3 700
## 61 Kyurem (Blac… 125 170 100 120 90 95 3 700
## 62 Kyurem (Whit… 125 120 90 170 100 95 3 700
## 63 Diancie (Meg… 50 160 110 160 110 110 3 700
## 64 Wishiwashi (… 45 140 130 140 135 30 4 620
## 65 Deoxys (Defe… 50 70 160 70 160 90 4 600
## 66 Cresselia 120 70 120 75 130 85 4 600
## 67 Diancie 50 100 150 100 150 50 4 600
## 68 Magearna 80 95 115 130 115 65 4 600
## 69 Slowbro (Meg… 95 75 180 130 80 30 4 590
## 70 Articuno 90 85 100 95 125 85 4 580
## 71 Suicune 100 75 115 90 115 85 4 580
## 72 Regice 80 50 100 100 200 50 4 580
## 73 Registeel 80 75 150 75 150 50 4 580
## 74 Uxie 75 75 130 75 130 95 4 580
## 75 Tapu Fini 70 75 115 95 130 85 4 570
## 76 Celesteela 97 101 103 107 101 61 4 570
## 77 Camerupt (Me… 70 120 100 145 105 20 4 560
## 78 Florges 78 65 68 112 154 75 4 552
## 79 Togekiss 85 50 95 120 115 80 4 545
## 80 Audino (Mega… 103 60 126 80 126 50 4 545
## 81 Kingdra 75 95 95 95 95 85 4 540
## 82 Blissey 255 10 10 75 135 55 4 540
## 83 Milotic 95 60 79 100 125 81 4 540
## 84 Darmanitan (… 105 30 105 140 105 55 4 540
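Above is a data frame showing the top 20 Pokemon in each cluster, ranked by their total stats. Because top_n keeps ties, some clusters contain more than 20 rows, which is why the table has 84 rows rather than 80. As expected, many legendaries can be found in Cluster 3.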
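If exactly 20 rows per cluster are wanted, `slice_max()` with `with_ties = FALSE` is one way to get them. The sketch below illustrates the difference on a made-up data frame (`toy` and its values are hypothetical), asking for the top 2 per cluster.

```r
library(dplyr)

# Toy data with a tie for second place in cluster 1 (values are made up)
toy <- data.frame(
  Name    = c("A", "B", "C", "D", "E", "F"),
  cluster = c(1, 1, 1, 2, 2, 2),
  Total   = c(600, 580, 580, 700, 650, 640)
)

# slice_max keeps ties by default, as top_n does,
# so cluster 1 returns three rows here
with_ties <- toy %>%
  group_by(cluster) %>%
  slice_max(Total, n = 2)

# with_ties = FALSE returns exactly n rows per group
exact <- toy %>%
  group_by(cluster) %>%
  slice_max(Total, n = 2, with_ties = FALSE)

nrow(with_ties)
nrow(exact)
```

Which of the tied rows is dropped with `with_ties = FALSE` depends on row order, so keeping ties, as the analysis above does, is the safer default when the ranking is meant to be read as a shortlist.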
Cluster 1 Pokemon are mainly those with high defense and low speed. They also generally have high attack, but their low speed makes many of them hard to use in offensive roles. Many Pokemon of traditionally defensive types, such as Steel and Rock, can be found in this cluster.
Cluster 2 Pokemon have high speed and remarkable offensive stats, so they are better suited to offensive roles than defensive ones.
Cluster 4 Pokemon have better special defense than those in the other clusters, but their speed is not as impressive. They are more suitable for defensive roles.
When building a balanced team, there should be both offensive and defensive Pokemon.