In this small project, a clustering analysis is performed on fully evolved Pokemon up to and including Generation 8. The results can inform which Pokemon to choose when building a balanced team.

Import packages and set seed for k-means clustering.

library(tidyverse)
library(dplyr)
library(stats)
library(ggplot2)
library(tidyr)
library(ggpubr)
set.seed(123)

options(dplyr.summarise.inform = FALSE)
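
The seed matters because kmeans() starts from randomly chosen centers. The following is a minimal sketch on made-up toy data (not the Pokemon dataset) showing that resetting the same seed before each call reproduces the same cluster assignment. The names `toy`, `fit_a` and `fit_b` are hypothetical.

```r
# Toy data standing in for the scaled stats (hypothetical, not the Pokemon data)
set.seed(1)
toy <- matrix(rnorm(200), ncol = 2)

# Same seed before each call -> same random starts -> same result
set.seed(123)
fit_a <- kmeans(toy, centers = 3, nstart = 5)
set.seed(123)
fit_b <- kmeans(toy, centers = 3, nstart = 5)

identical(fit_a$cluster, fit_b$cluster)
```

Note that the cluster numbers themselves are arbitrary labels, so a different seed can give the same groups under different numbers.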

Read file. The dataset was downloaded from Kaggle: https://www.kaggle.com/tlegrand/pokemon-with-stats-generation-8. The data was modified to add a column indicating whether each Pokemon is fully evolved. Some Pokemon types in the dataset are wrong, but the type columns are not used in this analysis, so these errors were left uncorrected.

pokemon <- read.csv('Pokemon_Gen_1-8.csv')
#Rename X column to fully_evolved, 1 means fully_evolved
pokemon <- pokemon %>%
  rename(fully_evolved = X)

#Take only relevant columns, and choose only fully evolved pokemon
stats <- pokemon %>%
  select(Name, HP, Attack, Defense, Sp..Attack, Sp..Defense, Speed, fully_evolved) %>%
  filter(fully_evolved == 1)
#Check mean and standard deviation of the stats
colMeans(stats[,c('HP', 'Attack', 'Defense', 'Sp..Attack', 'Sp..Defense', 'Speed')])
##          HP      Attack     Defense  Sp..Attack Sp..Defense       Speed 
##    81.57096    95.91254    87.65182    87.60726    86.13861    79.99175
apply(stats[,c('HP', 'Attack', 'Defense', 'Sp..Attack', 'Sp..Defense', 'Speed')], 2, sd)
##          HP      Attack     Defense  Sp..Attack Sp..Defense       Speed 
##    25.36811    30.32922    30.24084    31.91203    25.66924    30.32117
#Scale stats to have mean = 0, sd = 1
stat_num <- scale(stats[,c('HP', 'Attack', 'Defense', 'Sp..Attack', 'Sp..Defense', 'Speed')],
                  center=TRUE, scale=TRUE)
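
Scaling matters here because k-means uses Euclidean distance, so a stat with a larger spread would otherwise dominate the clustering. As a small sketch of what scale() does, the toy matrix below (made-up values, hypothetical names `toy`, `manual`, `scaled`) is standardised by hand and compared with scale().

```r
# Toy matrix with two made-up stat columns (not the Pokemon data)
toy <- cbind(HP = c(60, 80, 100, 120), Speed = c(30, 50, 90, 110))

# Standardise by hand: subtract each column mean, divide by each column sd
centered <- sweep(toy, 2, colMeans(toy), "-")
manual <- sweep(centered, 2, apply(toy, 2, sd), "/")

# scale() does the same thing in one call
scaled <- scale(toy, center = TRUE, scale = TRUE)

all.equal(manual, scaled, check.attributes = FALSE)
colMeans(scaled)      # numerically zero
apply(scaled, 2, sd)  # one for every column
```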

K-means clustering

First, draw an elbow plot of the total within-cluster sum of squares against the number of clusters.

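#Elbow plot for k-means clustering: total within-cluster sum of squares for k = 1 to 12
ws <- numeric(12)
for (i in 1:12) {
  kmclust <- kmeans(stat_num, centers = i, nstart = 20)
  ws[i] <- kmclust$tot.withinss
}

plot(1:12, ws, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")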

There doesn’t seem to be a clear elbow. However, based on their stats, Pokemon can generally be classified into a few categories: well-rounded Pokemon with higher stats than usual (like the legendaries), sweepers (better at attacking), walls (better at defense than attack), tanks (very robust and able to survive for a long time thanks to high HP), and those that are generally lacklustre. Walls and tanks are quite similar, so they can be treated as one group, which suggests trying 4 clusters.

kmclust <- kmeans(stat_num, centers = 4, nstart = 20)
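
When an elbow is hard to judge by eye, one rough numeric aid is to look at how much of the remaining within-cluster sum of squares each extra cluster removes. The sketch below does this on made-up toy data with three obvious groups (not the Pokemon stats). The names `toy`, `ks`, `wss` and `rel_drop` are hypothetical, and this is only an illustration, not the method used to pick 4 clusters above.

```r
set.seed(123)
# Hypothetical toy data: three well-separated 2-D blobs of 50 points each
toy <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 6), ncol = 2),
  matrix(rnorm(100, mean = 12), ncol = 2)
)

# Total within-cluster sum of squares for k = 1..8
ks <- 1:8
wss <- sapply(ks, function(k) kmeans(toy, centers = k, nstart = 20)$tot.withinss)

# Fraction of the remaining within-cluster SS removed by adding one more cluster
rel_drop <- -diff(wss) / head(wss, -1)
names(rel_drop) <- paste0(head(ks, -1), "->", ks[-1])
round(rel_drop, 2)
```

On data like this the drop should be large up to 3 clusters and small afterwards. When the drops shrink gradually instead, as in the elbow plot above, domain knowledge is a reasonable tie-breaker.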

Because there are 6 stats to take into account, it’s hard to plot the clusters in a single graph that uses all 6 stats. To make visualisation easier, a principal component analysis (PCA) was done.

#Plot the clusters using principal components
stat_pca <- prcomp(stat_num)
summary(stat_pca)
## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5     PC6
## Standard deviation     1.2496 1.1891 1.0776 0.9498 0.76748 0.61004
## Proportion of Variance 0.2603 0.2356 0.1935 0.1504 0.09817 0.06203
## Cumulative Proportion  0.2603 0.4959 0.6894 0.8398 0.93797 1.00000

The first 2 principal components together explain about 50% of the variance (cumulative proportion 0.4959).
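
The proportions in the summary can be recomputed from the standard deviations: each component's variance is its standard deviation squared, divided by the total variance. The sketch below uses the six standard deviations copied from the output above; `sdev`, `prop_var` and `cum_var` are names chosen here for illustration.

```r
# Standard deviations of the six components, copied from summary(stat_pca)
sdev <- c(1.2496, 1.1891, 1.0776, 0.9498, 0.76748, 0.61004)

# Variance of each component as a share of the total variance
prop_var <- sdev^2 / sum(sdev^2)
cum_var <- cumsum(prop_var)

round(rbind(prop_var, cum_var), 4)
```

The total variance is (up to rounding) 6, because each of the six scaled stats contributes a variance of 1.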

biplot(stat_pca)

Pokemon that are good at defense are usually relatively bad at attack, and vice versa. Defensive Pokemon also usually have lower speed. The directions of the eigenvectors obtained from the PCA capture these relationships quite well.

pca_df <- as.data.frame(stat_pca$x)
pca_df['Name'] <- stats$Name
pca_df['cluster'] <- kmclust$cluster
ggplot(pca_df, aes(x=PC1, y=PC2, color = as.factor(cluster))) + geom_point() +
  labs(colour="Cluster")

#Add cluster column to the data frame with original unscaled stats
stats$cluster <- kmclust$cluster
#Plot scatterplot of stats showing what sort of values each cluster has
ggplot(stats, aes(x=HP, y=Speed, color = as.factor(cluster))) + geom_point() +
  labs(colour="Cluster")

ggplot(stats, aes(x=Attack, y=Defense, color = as.factor(cluster))) + geom_point() + labs(colour="Cluster")

ggplot(stats, aes(x=Sp..Attack, y=Sp..Defense, color = as.factor(cluster))) + geom_point() + labs(colour="Cluster")

Generally, Cluster 1 has relatively high HP and attack but abysmal speed. This cluster most likely represents the tanks.

Cluster 2 has low HP, defense and special defense but high speed. This cluster probably represents the sweepers.

Cluster 3 has higher stats across all departments. This probably represents the legendaries and overpowered Pokemon.

Cluster 4 generally has higher special defense than any of its other stats, at a level comparable to the well-rounded Pokemon of Cluster 3. Their defense is not bad, but their HP is relatively low. They also have better special attack than physical attack. Judging from the scatterplots alone, this cluster is probably made up of the relatively lacklustre Pokemon.

Let’s look at the mean of each stat for each cluster.

#Mean of each stats for each cluster
stats %>%
  filter(cluster == 1) %>%
  summarize(mean(HP), mean(Attack), mean(Defense), mean(Sp..Attack), mean(Sp..Defense), mean(Speed))
##   mean(HP) mean(Attack) mean(Defense) mean(Sp..Attack) mean(Sp..Defense)
## 1 90.98824     110.0765      98.96471         67.29412          74.84706
##   mean(Speed)
## 1    60.85294
stats %>%
  filter(cluster == 2) %>%
  summarize(mean(HP), mean(Attack), mean(Defense), mean(Sp..Attack), mean(Sp..Defense), mean(Speed))
##   mean(HP) mean(Attack) mean(Defense) mean(Sp..Attack) mean(Sp..Defense)
## 1 68.32609     86.82065      65.07065         81.69022          69.60326
##   mean(Speed)
## 1    99.08696
stats %>%
  filter(cluster == 3) %>%
  summarize(mean(HP), mean(Attack), mean(Defense), mean(Sp..Attack), mean(Sp..Defense), mean(Speed))
##   mean(HP) mean(Attack) mean(Defense) mean(Sp..Attack) mean(Sp..Defense)
## 1 95.79339     118.6198      96.53719         122.2727          103.1322
##   mean(Speed)
## 1    102.3554
stats %>%
  filter(cluster == 4) %>%
  summarize(mean(HP), mean(Attack), mean(Defense), mean(Sp..Attack), mean(Sp..Defense), mean(Speed))
##   mean(HP) mean(Attack) mean(Defense) mean(Sp..Attack) mean(Sp..Defense)
## 1 74.81679     69.32824      96.48092         90.25954          108.3206
##   mean(Speed)
## 1    57.35115
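
The four filter-and-summarise calls above can also be collapsed into one grouped summary. Below is a minimal base R sketch on a made-up data frame (hypothetical `toy` with only two stats), using aggregate() to compute the mean of each stat per cluster.

```r
# Hypothetical stand-in for the `stats` data frame, with made-up values
toy <- data.frame(
  cluster = c(1, 1, 2, 2),
  HP      = c(90, 100, 60, 70),
  Speed   = c(50, 60, 100, 110)
)

# One row per cluster, one column of means per stat
cluster_means <- aggregate(cbind(HP, Speed) ~ cluster, data = toy, FUN = mean)
cluster_means
```

With dplyr, something like `stats %>% group_by(cluster) %>% summarise(across(HP:Speed, mean))` should give the same per-cluster means in a single table, assuming the six stat columns are adjacent as they are in the select() call earlier.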

Distribution plot of each stat for each cluster

#Plot distribution of stats for each cluster, need to first create a data frame
#for each cluster
clus1 <- stats %>%
  filter(cluster == 1) %>%
  gather(key = 'stat', value = 'value', -c('Name', 'cluster', 'fully_evolved'))
plot1 <- ggplot(clus1, aes(x=value)) + geom_density() + facet_wrap(~stat) +
  ggtitle('Stats of Cluster 1 Pokemon') +
  xlab('Stats') + ylab('Density') +
  theme(axis.text.x = element_text(angle = 45)) +
  theme(plot.title = element_text(hjust = 0.5))

clus2 <- stats %>%
  filter(cluster == 2) %>%
  gather(key = 'stat', value = 'value', -c('Name', 'cluster', 'fully_evolved'))
plot2 <- ggplot(clus2, aes(x=value)) + geom_density() + facet_wrap(~stat) +
  ggtitle('Stats of Cluster 2 Pokemon') +
  xlab('Stats') + ylab('Density') +
  theme(axis.text.x = element_text(angle = 45)) +
  theme(plot.title = element_text(hjust = 0.5))

clus3 <- stats %>%
  filter(cluster == 3) %>%
  gather(key = 'stat', value = 'value', -c('Name', 'cluster', 'fully_evolved'))
plot3 <- ggplot(clus3, aes(x=value)) + geom_density() + facet_wrap(~stat) +
  ggtitle('Stats of Cluster 3 Pokemon') +
  xlab('Stats') + ylab('Density') +
  theme(axis.text.x = element_text(angle = 45)) +
  theme(plot.title = element_text(hjust = 0.5))

clus4 <- stats %>%
  filter(cluster == 4) %>%
  gather(key = 'stat', value = 'value', -c('Name', 'cluster', 'fully_evolved'))
plot4 <- ggplot(clus4, aes(x=value)) + geom_density() + facet_wrap(~stat) +
  ggtitle('Stats of Cluster 4 Pokemon') +
  xlab('Stats') + ylab('Density') +
  theme(axis.text.x = element_text(angle = 45)) +
  theme(plot.title = element_text(hjust = 0.5))

figure <- ggarrange(plot1, plot2, plot3, plot4,
                    ncol = 2, nrow = 2)
figure
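
As an alternative to building one data frame and one plot per cluster, the same kind of figure can be sketched with a single long-format data frame and facet_grid(). The example below uses made-up toy data with two stats and two clusters. The names `toy`, `long` and `p` are hypothetical, and pivot_longer() is used as the newer tidyr counterpart of gather().

```r
library(ggplot2)
library(tidyr)

set.seed(123)
# Hypothetical stand-in for the `stats` data frame: two stats, two clusters
toy <- data.frame(
  Name    = paste0("mon", 1:40),
  cluster = rep(1:2, each = 20),
  HP      = round(rnorm(40, mean = 80, sd = 20)),
  Speed   = round(rnorm(40, mean = 80, sd = 25))
)

# Long format: one row per (Pokemon, stat) pair
long <- pivot_longer(toy, cols = c(HP, Speed),
                     names_to = "stat", values_to = "value")

# One row of panels per cluster, one column per stat
p <- ggplot(long, aes(x = value)) +
  geom_density() +
  facet_grid(cluster ~ stat, labeller = label_both) +
  xlab("Stats") + ylab("Density")
p
```

Putting all clusters on a shared x-axis like this makes it easier to compare the same stat across clusters, at the cost of smaller panels.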

Cluster 3 Pokemon are generally quite well-rounded. Cluster 4 Pokemon have the lowest attack and speed of the 4 clusters, although their special defense is high. Comparing Clusters 1 and 2, Cluster 2 is generally better in terms of special attack and speed, making it more suitable for offensive roles. Cluster 1, with higher HP and defense, is relatively more suitable for defensive roles.

top_pokemon <- stats %>%
  select(-fully_evolved) %>%
  mutate(Total = HP + Attack + Defense + Sp..Attack + Sp..Defense + Speed) %>%
  arrange(cluster, desc(Total)) %>%
  group_by(cluster) %>%
  top_n(20)
## Selecting by Total
print(top_pokemon, n = 200)
## # A tibble: 84 x 9
## # Groups:   cluster [4]
##    Name             HP Attack Defense Sp..Attack Sp..Defense Speed cluster Total
##    <chr>         <int>  <int>   <int>      <int>       <int> <int>   <int> <int>
##  1 Aggron (Mega…    70    140     230         60          80    50       1   630
##  2 Steelix (Meg…    75    125     230         55          95    30       1   610
##  3 Pinsir (Mega…    65    155     120         65          90   105       1   600
##  4 Scizor (Mega…    70    150     140         65         100    75       1   600
##  5 Heracross (M…    80    185     115         40         105    75       1   600
##  6 Tyranitar       100    134     110         95         100    61       1   600
##  7 Metagross        80    135     130         95          90    70       1   600
##  8 Melmetal        135    143     143         80          65    34       1   600
##  9 Kangaskhan (…   105    125     100         60         100   100       1   590
## 10 Regirock         80    100     200         50         100    50       1   580
## 11 Cobalion         91     90     129         90          72   108       1   580
## 12 Regidrago       200    100      50        100          50    80       1   580
## 13 Glastrier       100    145     130         65         110    30       1   580
## 14 Tapu Bulu        70    130     115         85          95    75       1   570
## 15 Buzzwole        107    139     139         53          53    79       1   570
## 16 Kartana          59    181     131         59          31   109       1   570
## 17 Guzzlord        223    101      53         97          53    43       1   570
## 18 Stakataka        61    131     211         53         101    13       1   570
## 19 Banette (Meg…    64    165      75         93          83    75       1   555
## 20 Urshifu (Sin…   100    130     100         63          60    97       1   550
## 21 Urshifu (Rap…   100    130     100         63          60    97       1   550
## 22 Deoxys (Norm…    50    150      50        150          50   150       2   600
## 23 Deoxys (Atta…    50    180      20        180          20   150       2   600
## 24 Deoxys (Spee…    50     95      90         95          90   180       2   600
## 25 Lopunny (Meg…    65    136      94         54          96   135       2   580
## 26 Regieleki        80    100      50        100          50   200       2   580
## 27 Manectric (M…    70     75      80        135          80   135       2   575
## 28 Tapu Koko        70    115      85         95          75   130       2   570
## 29 Pheromosa        71    137      37        137          37   151       2   570
## 30 Archeops         75    140      65        112          65   110       2   567
## 31 Absol (Mega …    65    150      60        115          60   115       2   565
## 32 Sharpedo (Me…    70    140      70        110          65   105       2   560
## 33 Electivire       75    123      67         95          85    95       2   540
## 34 Darmanitan (…   105    160      55         30          55   135       2   540
## 35 Naganadel        73     73      73        127          73   121       2   540
## 36 Crobat           85     90      80         70          80   130       2   535
## 37 Porygon-Z        85     80      70        135          75    90       2   535
## 38 Noivern          85     70      80         97          80   123       2   535
## 39 Duraludon        70     95     115        120          50    85       2   535
## 40 Charizard        78     84      78        109          85   100       2   534
## 41 Typhlosion       78     84      78        109          85   100       2   534
## 42 Infernape        76    104      71        104          71   108       2   534
## 43 Delphox          75     69      72        114         100   104       2   534
## 44 Eternatus (E…   255    115     250        125         250   130       3  1125
## 45 Mewtwo (Mega…   106    190     100        154         100   130       3   780
## 46 Mewtwo (Mega…   106    150      70        194         120   140       3   780
## 47 Rayquaza (Me…   105    180     100        180         100   115       3   780
## 48 Kyogre (Prim…   100    150      90        180         160    90       3   770
## 49 Groudon (Pri…   100    180     160        150          90    90       3   770
## 50 Necrozma (Ul…    97    167      97        167          97   129       3   754
## 51 Arceus          120    120     120        120         120   120       3   720
## 52 Zacian (Crow…    92    170     115         80         115   148       3   720
## 53 Zamazenta (C…    92    130     145         80         145   128       3   720
## 54 Zygarde (Com…   216    100     121         91          95    85       3   708
## 55 Tyranitar (M…   100    164     150         95         120    71       3   700
## 56 Salamence (M…    95    145     130        120          90   120       3   700
## 57 Metagross (M…    80    145     150        105         110   110       3   700
## 58 Latias (Mega…    80    100     120        140         150   110       3   700
## 59 Latios (Mega…    80    130     100        160         120   110       3   700
## 60 Garchomp (Me…   108    170     115        120          95    92       3   700
## 61 Kyurem (Blac…   125    170     100        120          90    95       3   700
## 62 Kyurem (Whit…   125    120      90        170         100    95       3   700
## 63 Diancie (Meg…    50    160     110        160         110   110       3   700
## 64 Wishiwashi (…    45    140     130        140         135    30       4   620
## 65 Deoxys (Defe…    50     70     160         70         160    90       4   600
## 66 Cresselia       120     70     120         75         130    85       4   600
## 67 Diancie          50    100     150        100         150    50       4   600
## 68 Magearna         80     95     115        130         115    65       4   600
## 69 Slowbro (Meg…    95     75     180        130          80    30       4   590
## 70 Articuno         90     85     100         95         125    85       4   580
## 71 Suicune         100     75     115         90         115    85       4   580
## 72 Regice           80     50     100        100         200    50       4   580
## 73 Registeel        80     75     150         75         150    50       4   580
## 74 Uxie             75     75     130         75         130    95       4   580
## 75 Tapu Fini        70     75     115         95         130    85       4   570
## 76 Celesteela       97    101     103        107         101    61       4   570
## 77 Camerupt (Me…    70    120     100        145         105    20       4   560
## 78 Florges          78     65      68        112         154    75       4   552
## 79 Togekiss         85     50      95        120         115    80       4   545
## 80 Audino (Mega…   103     60     126         80         126    50       4   545
## 81 Kingdra          75     95      95         95          95    85       4   540
## 82 Blissey         255     10      10         75         135    55       4   540
## 83 Milotic          95     60      79        100         125    81       4   540
## 84 Darmanitan (…   105     30     105        140         105    55       4   540

Above is a data frame showing the top 20 Pokemon in each cluster, ranked by their total stats. Because top_n() keeps ties at the cut-off, the table has 84 rows rather than 80. As expected, many legendaries can be found in Cluster 3.
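
To illustrate why a "top n per group" selection can return more than n rows per group, here is a small base R sketch on made-up data. It mimics the tie-keeping behaviour with a minimum rank. The helper `top_with_ties()` and the data frame `toy` are hypothetical and are not part of dplyr.

```r
# Made-up totals: cluster 1 has a tie for second place
toy <- data.frame(
  name    = c("a", "b", "c", "d", "e", "f"),
  cluster = c(1, 1, 1, 1, 2, 2),
  Total   = c(600, 580, 580, 500, 540, 530)
)

# Keep every row whose minimum rank within its cluster is at most n,
# so rows tied with the n-th best are all kept
top_with_ties <- function(df, n) {
  keep <- unlist(lapply(split(seq_len(nrow(df)), df$cluster), function(idx) {
    idx[rank(-df$Total[idx], ties.method = "min") <= n]
  }))
  df[sort(unname(keep)), ]
}

res <- top_with_ties(toy, 2)
res
table(res$cluster)
```

Here cluster 1 contributes three rows for n = 2 because "b" and "c" are tied. If exactly n rows per group are wanted, dplyr's slice_max() with `with_ties = FALSE` is one option.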

Cluster 1 Pokemon mainly have high defense and low speed. They also generally have high attack, but their low speed makes many of them hard to use in offensive roles. Many Pokemon of traditionally defensive types, such as Steel and Rock, can be found in this cluster.

Cluster 2 Pokemon have high speed and remarkable offensive stats, so they are better suited to offensive roles than defensive ones.

Cluster 4 has better special defense than the other clusters, but its speed is not as impressive. These Pokemon are more suitable for defensive roles.

When building a balanced team, there should be both offensive and defensive Pokemon.