Image courtesy of screenrant.com
In this code through, I am going to show how you can use k-means clustering on a dataset. Keeping with the theme of my last code through, I will show how to cluster data from Pokemon GO.
If you aren’t familiar with Pokemon, it’s a video game about collecting fantasy monsters that battle each other with special powers. We’re up to over a thousand of these pocket monsters now, and every time a new game is announced, the significant overlap of data nerds and Pokemon nerds datamine it to pull all the stats. Pokemon GO is the mobile version of this game, which was only on Nintendo consoles before 2016.
Pokemon GO also gets a bit heavy on inventory management, as there are so many Pokemon and an almost infinite number of team lineups of six Pokemon each. R comes in handy here, as you can load a dataset containing all the current Pokemon and use it to help you manage your own.
This code through explores how to cluster data using Shreya Sur965’s “Gotta Analyze ’Em All: The Ultimate Pokémon GO Dataset” along with the fviz_nbclust(), kmeans(), and fviz_cluster() functions. This helps me learn more about R and also incentivizes me to clear out my inventory of Pokémon because I am trying not to become a digital hoarder.
In doing this code through, I learned about k-means clustering and the elbow method, which helps you pick a sensible number of clusters and avoid creating too many groups.
Clustering is a powerful tool. It can show you unexpected patterns that lead to new understanding, or it can confirm what we already know in a form that is established in its accuracy and repeatability.
This is based on the work of Shreya Sur965, who created the “Gotta Analyze ’Em All: The Ultimate Pokémon GO Dataset” and hosted it on kaggle.com. I also compared the outcome of my data manipulation against Pokemon GO Database.
First off, we need to load a few libraries:
# LOAD LIBRARIES
library( tidyverse )  # data wrangling and graphing (includes dplyr, ggplot2, readr)
library( mclust )     # model-based cluster analysis
library( ggthemes )
library( pander )     # table formatting
# K-Means Clustering
#install.packages("factoextra")
library( factoextra ) # fviz_nbclust() and fviz_cluster()
library( cluster )
set.seed(1234)
Then we’ll read the downloaded pokemon.csv file and take a closer look at the dataset’s columns with glimpse().
Anyone familiar with Pokemon will recognize that the Pokemon are arranged by their number (pokemon_id) with Bulbasaur being the very first Pokemon.
# Loading the dataset
pogo <- read_csv(file="pokemon.csv" )
# column names and data types
glimpse(pogo)
## Rows: 1,007
## Columns: 24
## $ pokemon_id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13…
## $ pokemon_name <chr> "Bulbasaur", "Ivysaur", "Venusaur", "Char…
## $ base_attack <dbl> 118, 151, 198, 116, 158, 223, 94, 126, 17…
## $ base_defense <dbl> 111, 143, 189, 93, 126, 173, 121, 155, 20…
## $ base_stamina <dbl> 128, 155, 190, 118, 151, 186, 127, 153, 1…
## $ type <chr> "['Grass', 'Poison']", "['Grass', 'Poison…
## $ rarity <chr> "Standard", "Standard", "Standard", "Stan…
## $ charged_moves <chr> "['Sludge Bomb', 'Seed Bomb', 'Power Whip…
## $ fast_moves <chr> "['Vine Whip', 'Tackle']", "['Razor Leaf'…
## $ candy_required <dbl> NA, 25, 100, NA, 25, 100, NA, 25, 100, NA…
## $ distance <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1,…
## $ max_cp <dbl> 1275, 1943, 3112, 1121, 1891, 3305, 1082,…
## $ attack_probability <dbl> 0.1, 0.1, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1, 0…
## $ base_capture_rate <dbl> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -…
## $ base_flee_rate <dbl> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -…
## $ dodge_probability <dbl> 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15,…
## $ max_pokemon_action_frequency <dbl> 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1…
## $ min_pokemon_action_frequency <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0…
## $ found_egg <lgl> TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, T…
## $ found_evolution <lgl> FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FAL…
## $ found_wild <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
## $ found_research <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
## $ found_raid <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
## $ found_photobomb <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
Now that we understand how the dataset is built, we can move on to choosing which characteristics to isolate for clustering. I am going to focus on attack, defense, stamina, and maximum combat power (CP).
# create separate database "d1" from rest of "pogo" database
d1 <- pogo
# Prepare Data for Clustering
d3 <- d1 %>%
select("base_attack", "base_stamina", "base_defense", "max_cp")
d4 <- d1 %>%
select("pokemon_name", "base_attack", "base_defense", "base_stamina", "max_cp")
head(d4) %>% pander()
| pokemon_name | base_attack | base_defense | base_stamina | max_cp |
|---|---|---|---|---|
| Bulbasaur | 118 | 111 | 128 | 1275 |
| Ivysaur | 151 | 143 | 155 | 1943 |
| Venusaur | 198 | 189 | 190 | 3112 |
| Charmander | 116 | 93 | 118 | 1121 |
| Charmeleon | 158 | 126 | 151 | 1891 |
| Charizard | 223 | 173 | 186 | 3305 |
Pokemon, like baseball, is a stats game. Pokemon have, essentially, four main stats: Attack, Stamina, Defense, and Combat Power (CP). Attack is how much damage a Pokemon does in battle, Stamina is how long a Pokemon lasts in battle, and Defense is how much damage a Pokemon resists in battle. CP is the overall measure of a Pokemon’s strength and can be considered its power level. Knowing this, you’ll want to have Pokemon with the highest possible CP for any raid.
In the dataset I pulled from, max_cp is the maximum CP a Pokemon can reach. Note that max_cp in this dataset comes from higher-powered form variations such as Mega, Primal, Shadow, and so on, which shouldn’t be an issue if you already have those variations. If you don’t or aren’t sure, you can check the pokemon_name and max_cp against the same Pokemon in Pokemon GO Database, which will give you a drop-down menu of all the variations of that Pokemon.
Let’s take a peek at the summary statistics of attack, stamina, defense, and maximum combat power. This will help us understand what is and isn’t impressive for each stat.
# Summary statistics of chosen characteristics
summary( d3[ c("base_attack",
"base_stamina",
"base_defense",
"max_cp") ] )
## base_attack base_stamina base_defense max_cp
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 16
## 1st Qu.:119.0 1st Qu.:137.0 1st Qu.:103.0 1st Qu.:1306
## Median :165.0 Median :167.0 Median :142.0 Median :2304
## Mean :166.3 Mean :171.1 Mean :143.8 Mean :2310
## 3rd Qu.:211.0 3rd Qu.:193.0 3rd Qu.:179.0 3rd Qu.:3138
## Max. :414.0 Max. :496.0 Max. :505.0 Max. :9366
How does the data look when clustered?
# Perform Cluster Analysis
# library( mclust )
set.seed( 1234 )
fit <- Mclust( d3 )
d3$cluster <- as.factor( fit$classification )
plot( fit, what = "classification" )
Good! The data shows up well, with some correlation between the variables. Now let’s plot out max CP within each group.
d3$cluster <- as.factor( paste0("GROUP-", fit$classification) )
ggplot( d3, aes( x=max_cp ) ) +
geom_density( alpha = 0.5, fill="blue" ) +
xlab( "max cp" ) +
facet_wrap( ~ cluster, nrow=2 ) +
theme_minimal()
9 groups?! For my purposes in Pokemon, that’s too many. Let’s try to pare that down with K-Means Clustering.
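The first thing you’ll need to do is scale each variable to have a mean of 0 and a standard deviation of 1. K-means groups points by distance, so without scaling, max_cp (which runs into the thousands) would drown out the other three stats.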
# K-Means Clustering Packages
# library(factoextra)
# library(cluster)
#scale each variable to have a mean of 0 and sd of 1
d.scale <- scale(d3[,c("base_attack", "base_stamina", "base_defense", "max_cp")],
center=TRUE, scale=TRUE)
head(d.scale)
## base_attack base_stamina base_defense max_cp
## [1,] -0.8128315 -0.8982865 -0.63095462 -0.9248926
## [2,] -0.2571140 -0.3352733 -0.01584354 -0.3278316
## [3,] 0.5343624 0.3945586 0.86837863 0.7170251
## [4,] -0.8465114 -1.1068099 -0.97695460 -1.0625384
## [5,] -0.1392346 -0.4186827 -0.34262130 -0.3743095
## [6,] 0.9553605 0.3111492 0.56082310 0.8895293
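As a quick sanity check on what scale() is doing, the sketch below standardizes a tiny made-up data frame by hand and compares the result with scale(). The toy data frame and the names toy, by_hand, and scaled are assumptions for illustration, not objects from the Pokemon workflow.

```r
# Toy stand-in data (hypothetical; only there so the sketch runs on its own)
toy <- data.frame(base_attack = c(118, 151, 198, 116),
                  max_cp      = c(1275, 1943, 3112, 1121))

# Standardize by hand: subtract the column mean, divide by the column sd
by_hand <- sapply(toy, function(x) (x - mean(x)) / sd(x))

# scale() does the same thing column by column
scaled <- scale(toy, center = TRUE, scale = TRUE)

# The two approaches agree, and every column ends up with mean 0 and sd 1
all.equal(unname(by_hand), unname(scaled), check.attributes = FALSE)
round(colMeans(scaled), 10)
apply(scaled, 2, sd)
```

After scaling, a one-unit difference means "one standard deviation" for every stat, so no single column dominates the distance calculation.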
Now we can move on to figuring out the optimal number of clusters for the dataset. To do this, we can use the elbow method by plotting the data with the fviz_nbclust() function. With the within-cluster sum of squares method, this function plots the total within-cluster sum of squares for each candidate number of clusters, so you can see where adding more clusters stops helping.
The elbow method is a technique in cluster analysis where the elbow of the curve marks the number of clusters you should use. In a way, it highlights the spot where diminishing returns begin. As we saw earlier, Mclust() split my dataset into 9 groups, which is fine for something like our neighborhood analysis where the stakes are high. But when the stakes are as low as messing with datasets to find out more about my hobby, I only really need 3-5 groups. Thankfully, it looks like the elbow is around 3-5 clusters, so I will go with 4 clusters.
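The code behind the elbow plot is not shown here. With factoextra loaded, a call along the lines of fviz_nbclust(d.scale, kmeans, method = "wss") should draw it in one step. The sketch below does the same calculation in base R so the idea is visible: run kmeans() for k = 1 to 10 and record the total within-cluster sum of squares each time. The stand-in data (used only when d.scale does not exist) and the names k_values and wss are assumptions for illustration.

```r
set.seed(1234)

# Stand-in data so the sketch runs on its own (hypothetical, not the Pokemon data)
if (!exists("d.scale")) {
  d.scale <- scale(rbind(matrix(rnorm(200, mean = 0), ncol = 4),
                         matrix(rnorm(200, mean = 4), ncol = 4),
                         matrix(rnorm(200, mean = 8), ncol = 4)))
}

# Total within-cluster sum of squares for k = 1..10
k_values <- 1:10
wss <- sapply(k_values, function(k) {
  kmeans(d.scale, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is where adding clusters stops reducing wss by much
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Reading the elbow is a judgment call rather than a hard rule, which is why a range such as 3-5 is a reasonable answer.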
Now we can use the kmeans() function to perform clustering on the dataset. The function is called as kmeans(dataset name, centers = number of clusters you want, nstart = number of initial configurations).
We will set centers to 4 so that kmeans() sorts all the Pokemon into 4 groups, not 9. And we will set nstart to 25, which makes the algorithm try 25 random starting configurations and keep the one with the smallest total within-cluster variation.
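The kmeans() call itself is not shown in the text. Going by the description above and the km.pogo object used in later steps, it presumably looked like the sketch below. The stand-in matrix is hypothetical and is only created when d.scale does not already exist, so the block can run on its own.

```r
set.seed(1234)

# Stand-in data so the sketch runs on its own (hypothetical, not the Pokemon data)
if (!exists("d.scale")) {
  d.scale <- scale(matrix(rnorm(400), ncol = 4))
}

# 4 clusters, 25 random starts; kmeans() keeps the start with the
# smallest total within-cluster sum of squares
km.pogo <- kmeans(d.scale, centers = 4, nstart = 25)

# Printing the object shows cluster sizes, centers, and assignments
km.pogo
```

Note that the cluster numbers themselves are arbitrary labels, so a different seed can hand out the same groups under different numbers.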
## K-means clustering with 4 clusters of sizes 256, 50, 340, 361
##
## Cluster means:
## base_attack base_stamina base_defense max_cp
## 1 1.2063149 0.55982964 0.8231764 1.24646077
## 2 0.1227947 2.58030073 0.3178542 0.71654245
## 3 0.1224777 0.00486514 0.3095774 0.07743481
## 4 -0.9878082 -0.75896280 -0.9193412 -1.05609118
##
## Clustering vector:
## [1] 4 3 1 4 3 1 4 3 3 4 4 3 4 4 3 4 4 3 4 3 4 3 4 3 4 3 4 3 4 4 3 4 4 3 4 3 4
## [38] 3 4 2 4 3 4 3 3 4 3 4 3 4 3 4 3 4 3 4 3 4 1 4 4 3 4 3 1 4 3 1 4 4 3 4 3 4
## [75] 3 1 3 3 4 3 4 3 3 4 3 4 3 4 1 4 3 4 3 1 3 4 3 4 1 4 3 4 1 4 3 3 3 3 4 3 3
## [112] 1 2 3 2 4 3 4 3 4 3 3 1 3 3 3 1 3 4 1 2 4 4 2 1 1 3 3 1 4 1 1 2 1 1 1 4 3
## [149] 1 1 1 4 3 3 4 3 1 4 3 1 4 3 4 3 4 3 4 3 3 4 2 4 4 4 4 3 4 3 4 4 1 3 4 3 3
## [186] 3 4 4 3 4 4 3 4 4 3 1 3 4 3 3 4 2 3 4 3 3 3 3 4 3 3 1 3 1 3 4 1 4 3 4 3 4
## [223] 4 3 4 3 3 4 3 3 4 1 1 3 4 4 3 4 4 4 3 2 1 1 1 4 3 1 1 1 1 4 3 1 4 3 1 4 3
## [260] 1 4 3 4 3 4 4 3 4 4 4 4 3 4 4 3 4 3 4 3 4 4 1 4 3 4 3 4 3 1 4 3 4 4 4 3 4
## [297] 2 4 4 4 3 4 3 4 3 1 4 3 4 3 3 3 3 3 3 4 3 4 3 2 2 4 3 3 4 3 4 4 4 3 4 3 4
## [334] 3 3 3 3 3 4 2 4 3 4 3 4 3 4 1 4 1 3 3 4 3 4 3 3 3 3 4 4 3 4 3 2 4 3 3 3 4
## [371] 4 3 1 4 3 1 1 1 3 1 1 1 1 1 1 1 4 3 1 4 4 1 4 3 1 4 4 1 4 3 4 3 4 4 1 4 1
## [408] 3 1 4 3 4 3 3 4 3 4 4 3 4 3 4 2 3 4 2 4 3 3 1 4 3 4 4 3 4 3 4 4 4 3 3 4 3
## [445] 1 2 4 1 4 1 4 3 4 3 3 4 3 4 4 3 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 1
## [482] 1 1 1 1 1 2 2 3 1 1 1 1 1 4 3 3 4 3 1 4 3 1 4 3 4 3 1 4 3 4 3 4 3 4 3 4 2
## [519] 4 4 1 4 3 4 3 1 4 3 4 1 3 4 3 1 4 4 2 2 1 4 3 3 4 4 3 4 3 4 3 3 4 4 1 4 1
## [556] 3 4 3 4 3 3 4 3 3 3 3 1 4 3 4 3 4 3 4 3 3 4 3 1 4 3 4 3 1 4 3 3 4 1 4 2 4
## [593] 3 2 4 3 4 3 4 3 3 4 3 1 4 1 4 3 1 4 3 1 4 1 1 4 3 2 4 1 1 4 1 4 1 1 4 2 4
## [630] 2 3 3 4 3 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 4 3 1 4 3 1 4 3 3 4 3 4 4 3 4 4 3
## [667] 4 1 4 3 1 4 2 4 1 3 4 3 4 4 4 4 3 4 3 4 3 4 1 4 3 4 1 4 3 3 1 4 2 1 3 3 3
## [704] 4 3 1 3 4 3 4 3 4 1 4 1 1 1 2 1 1 1 4 3 1 4 3 1 4 3 1 4 4 1 4 3 4 3 1 4 1
## [741] 3 4 3 4 1 1 4 3 3 1 4 3 4 3 4 3 4 3 4 2 4 4 1 3 3 1 4 1 4 3 3 1 1 3 3 3 3
## [778] 3 3 1 1 4 3 1 1 1 1 1 4 4 1 1 1 1 1 1 1 1 2 1 1 1 3 1 1 1 1 4 1 4 3 1 4 3
## [815] 1 4 3 1 4 2 4 4 3 4 4 3 4 3 4 3 4 3 4 1 4 3 4 3 2 4 3 2 4 1 3 4 1 4 3 4 1
## [852] 4 1 4 1 4 3 1 4 4 1 1 3 1 1 1 3 4 1 3 3 4 1 1 3 3 3 4 2 3 3 3 3 1 4 3 1 1
## [889] 1 2 3 1 1 1 2 1 1 1 1 1 2 1 1 1 4 3 1 4 3 1 4 3 1 4 2 4 3 4 3 4 4 3 4 3 4
## [926] 3 4 4 1 3 4 4 1 4 1 1 4 2 4 3 4 1 4 3 4 3 4 3 3 4 3 4 3 4 3 4 4 3 4 3 3 4
## [963] 1 4 1 3 3 3 1 4 3 1 4 2 3 2 1 1 2 2 2 1 1 2 1 1 1 1 1 1 2 1 1 1 4 3 1 4 1
## [1000] 1 1 2 1 1 1 1 1
##
## Within cluster sum of squares by cluster:
## [1] 323.2009 236.0519 403.8299 364.8965
## (between_SS / total_SS = 67.0 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
There are 1007 Pokemon in this dataset, and now they are in groups of 50 to 361 Pokemon each.
To get a visualization of how kmeans() partitioned the data, use fviz_cluster():
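The plotting call is not shown in the text. With factoextra loaded it would presumably be something like fviz_cluster(km.pogo, data = d.scale). Because there are more than two variables, fviz_cluster() draws the points on the first two principal components. The base-R sketch below does that projection by hand as a rough equivalent; the stand-in objects are hypothetical and only created when d.scale and km.pogo do not exist.

```r
set.seed(1234)

# Stand-in objects so the sketch runs on its own (hypothetical data)
if (!exists("d.scale")) {
  d.scale <- scale(matrix(rnorm(400), ncol = 4))
}
if (!exists("km.pogo")) {
  km.pogo <- kmeans(d.scale, centers = 4, nstart = 25)
}

# Project the four scaled variables onto their first two principal components
pca <- prcomp(d.scale)
scores <- pca$x[, 1:2]

# Colour each point by its k-means cluster
cluster_ids <- sort(unique(km.pogo$cluster))
plot(scores, col = km.pogo$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2",
     main = "k-means clusters on the first two principal components")
legend("topright", legend = cluster_ids, col = cluster_ids,
       pch = 19, title = "Cluster")
```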
I can certainly see that 4 groups is more than enough, and how k-means clustering can be impacted by outliers. But we have 4 distinct, easily manageable groups!
Now that the data is clustered, we can use other functions, like aggregate(), to see the means of each group:
# use aggregate() function to find means of each cluster
# keep only the numeric columns; d3 also holds the mclust cluster factor
aggregate(d3[, c("base_attack", "base_stamina", "base_defense", "max_cp")],
by=list(cluster=km.pogo$cluster),
mean)
Right off the bat, I notice that cluster 1 has the highest maximum combat power and base attack; most likely that is where all our heaviest hitters and legendary Pokemon are! In contrast, cluster 4 has the lowest of every characteristic, making it the group that has the weakest Pokemon or those that are still in their non-evolved form.
Next up, we use cbind() to add cluster group assignments back to the dataset.
#add cluster assignment to original data
d5 <- cbind(d4,
cluster = km.pogo$cluster)
#view final data
head(d5)
I added the cluster information back to my “d4” dataset, since that one keeps the unscaled values from “d3” as well as the Pokemon names, which is useful! Because cbind() matches rows by position, you can add the cluster assignments to any variation of the original dataset, as long as it has the same observations in the same order.
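To make the row-matching point concrete, here is a minimal sketch with made-up values. The objects named, clusters, and labelled, and the placeholder names "A" through "D", are hypothetical and not part of the Pokemon data.

```r
# Toy stand-ins (hypothetical values)
named <- data.frame(pokemon_name = c("A", "B", "C", "D"),
                    max_cp       = c(1275, 1943, 3112, 1121))
clusters <- c(2, 2, 1, 2)   # one label per row, in the same row order

# cbind() matches purely by position, so row counts and order must agree
stopifnot(nrow(named) == length(clusters))
labelled <- cbind(named, cluster = clusters)
labelled
```

If the dataset had been sorted or filtered after clustering, the labels would silently land on the wrong rows, so it is worth attaching them before any reordering.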
We can revisit our maximum combat power variable plots by group, but this time with only 4 instead of 9.
ggplot( d5, aes( x=max_cp ) ) +
geom_density( alpha = 0.5, fill="blue" ) +
xlab( "max cp" ) +
facet_wrap( ~ cluster, nrow=2 ) +
theme_minimal()
Cluster 1 indeed has its density concentrated at the highest combat power values, and cluster 4 at the lowest.
Now for some graphs that show off cluster differences and patterns!
Starting off with seeing how each cluster ranks in terms of combat power and attack:
ggplot(d5, aes(x=max_cp,
y=base_attack,
color = as.factor(cluster))) + geom_point() + labs(colour="Cluster")
We can already see that combat power was likely the main component that determined the clustering. The difference between clusters 1 and 4 is also quite apparent! And cluster 3 turns out to be the mid-range Pokemon, not too weak and not too strong. The most interesting to me is cluster 2 covering the widest range of CP, looking like the group with the most variety and outliers.
Next up, attack vs stamina:
ggplot(d5, aes(x=base_stamina, y=base_attack, color = as.factor(cluster))) + geom_point() +
labs(colour="Cluster")
Interesting! Now I can see why group 2 was clustered in that way. Group 2 must have the Pokemon with the highest stamina.
Last one just to see if there are any other patterns, stamina vs defense:
ggplot(d5, aes(x=base_stamina, y=base_defense, color = as.factor(cluster))) + geom_point() +
labs(colour="Cluster")
Hmmmm, the least organized of the bunch, but still has that distinctive pattern of group 4 being weaker and group 1 being stronger. It just also has a long tail of cluster 2, showing off all the higher stamina Pokemon.
From these visualizations, I came to the following conclusion for group names:
To wrap it all up, let’s see what the top Pokemon are of each group. To figure that out, you just need to add all their characteristics together to make a “Total” column, then work with some classic dplyr functions to arrange the top 20 Pokemon of each cluster in descending order:
top_pokemon <- d5 %>%
select("pokemon_name", "base_attack", "base_defense", "base_stamina", "max_cp", cluster) %>%
mutate(Total =base_attack + base_defense + base_stamina + max_cp) %>%
arrange(cluster, desc(Total)) %>%
group_by(cluster) %>%
top_n(20, Total)
print(top_pokemon, n = 200)
## # A tibble: 80 × 7
## # Groups: cluster [4]
## pokemon_name base_attack base_defense base_stamina max_cp cluster Total
## <chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
## 1 Zacian 332 240 192 5696 1 6460
## 2 Palafin 322 196 225 5418 1 6161
## 3 Kyurem 310 183 245 5268 1 6006
## 4 Slaking 290 166 284 5069 1 5809
## 5 Regigigas 287 210 221 4972 1 5690
## 6 Calyrex 268 246 205 4845 1 5564
## 7 Roaring Moon 280 196 233 4821 1 5530
## 8 Zamazenta 250 292 192 4773 1 5507
## 9 Kyogre 270 228 205 4708 1 5411
## 10 Groudon 270 228 205 4708 1 5411
## 11 Necrozma 277 220 200 4689 1 5386
## 12 Solgaleo 255 191 264 4625 1 5335
## 13 Lunala 255 191 264 4625 1 5335
## 14 Great Tusk 249 209 251 4604 1 5313
## 15 Dialga 275 211 205 4620 1 5311
## 16 Reshiram 275 211 205 4620 1 5311
## 17 Zekrom 275 211 205 4620 1 5311
## 18 Arceus 238 238 237 4564 1 5277
## 19 Palkia 280 215 189 4565 1 5249
## 20 Meloetta 250 225 225 4544 1 5244
## 21 Eternatus 251 505 452 9366 2 10574
## 22 Iron Hands 245 177 319 4704 2 5445
## 23 Ursaluna 243 181 277 4410 2 5111
## 24 Zygarde 184 207 389 4258 2 5038
## 25 Ting-Lu 194 203 321 4041 2 4759
## 26 Giratina 187 225 284 3866 2 4562
## 27 Snorlax 190 169 330 3690 2 4379
## 28 Cetitan 208 123 347 3561 2 4239
## 29 Vaporeon 205 161 277 3563 2 4206
## 30 Bewear 226 141 260 3566 2 4193
## 31 Regidrago 202 101 400 3402 2 4105
## 32 Dondozo 176 178 312 3428 2 4094
## 33 Guzzlord 188 99 440 3303 2 4030
## 34 Copperajah 226 126 263 3409 2 4024
## 35 Blissey 129 169 496 3155 2 3949
## 36 Cresselia 152 258 260 3269 2 3939
## 37 Farigiraf 209 136 260 3261 2 3866
## 38 Hariyama 209 114 302 3236 2 3861
## 39 Aurorus 186 163 265 3206 2 3820
## 40 Braviary 213 137 242 3219 2 3811
## 41 Flygon 205 168 190 3044 3 3607
## 42 Durant 217 188 151 3043 3 3599
## 43 Crobat 194 178 198 3027 3 3597
## 44 Kingdra 194 194 181 3022 3 3591
## 45 Greninja 223 152 176 3037 3 3588
## 46 Klinklang 199 214 155 3017 3 3585
## 47 Carracosta 192 197 179 2999 3 3567
## 48 Dracozolt 195 165 207 2999 3 3566
## 49 Houndoom 224 144 181 3014 3 3563
## 50 Rabsca 201 178 181 3001 3 3561
## 51 Tauros 198 183 181 2998 3 3560
## 52 Pawmot 232 141 172 3014 3 3559
## 53 Breloom 241 144 155 3007 3 3547
## 54 Mismagius 211 187 155 2992 3 3545
## 55 Poliwrath 182 184 207 2958 3 3531
## 56 Toxtricity 224 140 181 2976 3 3521
## 57 Heliolisk 219 168 158 2974 3 3519
## 58 Rotom 204 219 137 2951 3 3511
## 59 Starmie 210 184 155 2957 3 3506
## 60 Leavanny 205 165 181 2952 3 3503
## 61 Weepinbell 172 92 163 1844 4 2271
## 62 Monferno 158 105 162 1801 4 2226
## 63 Murkrow 175 87 155 1787 4 2204
## 64 Krabby 181 124 102 1785 4 2192
## 65 Flaaffy 145 109 172 1740 4 2166
## 66 Anorith 176 100 128 1750 4 2154
## 67 Rufflet 150 97 172 1706 4 2125
## 68 Pancham 145 107 167 1703 4 2122
## 69 Larvesta 156 107 146 1712 4 2121
## 70 Luxio 159 95 155 1700 4 2109
## 71 Sableye 141 136 137 1688 4 2102
## 72 Trumbeak 159 100 146 1691 4 2096
## 73 Fletchinder 145 110 158 1681 4 2094
## 74 Yanma 154 94 163 1682 4 2093
## 75 Darumaka 153 86 172 1649 4 2060
## 76 Tranquill 144 107 158 1650 4 2059
## 77 Morgrem 145 102 163 1649 4 2059
## 78 Dolliv 137 131 141 1639 4 2048
## 79 Litleo 139 112 158 1631 4 2040
## 80 Poliwhirl 130 123 163 1623 4 2039
Now we can see which Pokemon are in which groups. It tracks that group 1 has all the legendary Pokemon, considering they are the strongest. Group 4 has the weaker, less rare Pokemon. Group 2 is still the most interesting group to me, as the Pokemon that have the highest stamina are not always the most obvious. Putting Eternatus, one of the Pokemon “gods”, in the same category as Snorlax, a more commonly found Pokemon known for almost always being asleep, is quite entertaining.
I didn’t have as clear an end goal for this data as in my last code through, which did help me finally get a Mega Rayquaza. However, this did help me come to a better understanding of how clustering works outside of Mclust() and gave me a better appreciation for all the data that goes into one of my favorite games.
Learn more about the dataset and k-means clustering with the following:
Shreya Sur965’s “Gotta Analyze ’Em All: The Ultimate Pokémon GO Dataset”
K-Means Clustering in R: Step-by-Step Example