This week we explore the ‘group_by’ function and investigate more patterns in our data. In particular, we will investigate things such as
Type Matchups
Legendary Pokemon by Generation
Abilities
…and more!
As usual, here is the summary of our current data set.
library(readr)
pokemon <- read_csv("C:\\Users\\danid\\OneDrive\\Desktop\\pokemon.csv")
## Rows: 801 Columns: 41
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): abilities, capture_rate, classfication, japanese_name, name, type1...
## dbl (34): against_bug, against_dark, against_dragon, against_electric, again...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(pokemon)
## abilities against_bug against_dark against_dragon
## Length:801 Min. :0.2500 Min. :0.250 Min. :0.0000
## Class :character 1st Qu.:0.5000 1st Qu.:1.000 1st Qu.:1.0000
## Mode :character Median :1.0000 Median :1.000 Median :1.0000
## Mean :0.9963 Mean :1.057 Mean :0.9688
## 3rd Qu.:1.0000 3rd Qu.:1.000 3rd Qu.:1.0000
## Max. :4.0000 Max. :4.000 Max. :2.0000
##
## against_electric against_fairy against_fight against_fire
## Min. :0.000 Min. :0.250 Min. :0.000 Min. :0.250
## 1st Qu.:0.500 1st Qu.:1.000 1st Qu.:0.500 1st Qu.:0.500
## Median :1.000 Median :1.000 Median :1.000 Median :1.000
## Mean :1.074 Mean :1.069 Mean :1.066 Mean :1.135
## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:2.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
##
## against_flying against_ghost against_grass against_ground
## Min. :0.250 Min. :0.000 Min. :0.250 Min. :0.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.500 1st Qu.:1.000
## Median :1.000 Median :1.000 Median :1.000 Median :1.000
## Mean :1.193 Mean :0.985 Mean :1.034 Mean :1.098
## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
##
## against_ice against_normal against_poison against_psychic
## Min. :0.250 Min. :0.000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.500 1st Qu.:1.000 1st Qu.:0.5000 1st Qu.:1.000
## Median :1.000 Median :1.000 Median :1.0000 Median :1.000
## Mean :1.208 Mean :0.887 Mean :0.9753 Mean :1.005
## 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:1.000
## Max. :4.000 Max. :1.000 Max. :4.0000 Max. :4.000
##
## against_rock against_steel against_water attack
## Min. :0.25 Min. :0.2500 Min. :0.250 Min. : 5.00
## 1st Qu.:1.00 1st Qu.:0.5000 1st Qu.:0.500 1st Qu.: 55.00
## Median :1.00 Median :1.0000 Median :1.000 Median : 75.00
## Mean :1.25 Mean :0.9835 Mean :1.058 Mean : 77.86
## 3rd Qu.:2.00 3rd Qu.:1.0000 3rd Qu.:1.000 3rd Qu.:100.00
## Max. :4.00 Max. :4.0000 Max. :4.000 Max. :185.00
##
## base_egg_steps base_happiness base_total capture_rate
## Min. : 1280 Min. : 0.00 Min. :180.0 Length:801
## 1st Qu.: 5120 1st Qu.: 70.00 1st Qu.:320.0 Class :character
## Median : 5120 Median : 70.00 Median :435.0 Mode :character
## Mean : 7191 Mean : 65.36 Mean :428.4
## 3rd Qu.: 6400 3rd Qu.: 70.00 3rd Qu.:505.0
## Max. :30720 Max. :140.00 Max. :780.0
##
## classfication defense experience_growth height_m
## Length:801 Min. : 5.00 Min. : 600000 Min. : 0.100
## Class :character 1st Qu.: 50.00 1st Qu.:1000000 1st Qu.: 0.600
## Mode :character Median : 70.00 Median :1000000 Median : 1.000
## Mean : 73.01 Mean :1054996 Mean : 1.164
## 3rd Qu.: 90.00 3rd Qu.:1059860 3rd Qu.: 1.500
## Max. :230.00 Max. :1640000 Max. :14.500
## NA's :20
## hp japanese_name name percentage_male
## Min. : 1.00 Length:801 Length:801 Min. : 0.00
## 1st Qu.: 50.00 Class :character Class :character 1st Qu.: 50.00
## Median : 65.00 Mode :character Mode :character Median : 50.00
## Mean : 68.96 Mean : 55.16
## 3rd Qu.: 80.00 3rd Qu.: 50.00
## Max. :255.00 Max. :100.00
## NA's :98
## pokedex_number sp_attack sp_defense speed
## Min. : 1 Min. : 10.00 Min. : 20.00 Min. : 5.00
## 1st Qu.:201 1st Qu.: 45.00 1st Qu.: 50.00 1st Qu.: 45.00
## Median :401 Median : 65.00 Median : 66.00 Median : 65.00
## Mean :401 Mean : 71.31 Mean : 70.91 Mean : 66.33
## 3rd Qu.:601 3rd Qu.: 91.00 3rd Qu.: 90.00 3rd Qu.: 85.00
## Max. :801 Max. :194.00 Max. :230.00 Max. :180.00
##
## type1 type2 weight_kg generation
## Length:801 Length:801 Min. : 0.10 Min. :1.00
## Class :character Class :character 1st Qu.: 9.00 1st Qu.:2.00
## Mode :character Mode :character Median : 27.30 Median :4.00
## Mean : 61.38 Mean :3.69
## 3rd Qu.: 64.80 3rd Qu.:5.00
## Max. :999.90 Max. :7.00
## NA's :20
## is_legendary
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.00000
## Mean :0.08739
## 3rd Qu.:0.00000
## Max. :1.00000
##
As of today (2026), there are a total of nine generations of Pokemon, although our data set only covers gen. 1 through gen. 7. Each generation provided fresh contributions to the franchise, such as new pocket monsters to play with, new abilities, and, in gen. 6, a new Pokemon type (fairy). One question we can ask about these new Pokemon is whether they have gotten stronger or weaker over time.
There are six stats attached to a Pokemon: HP, attack, defense, special attack, special defense, and speed.
While it could be argued that an increase in any of these stats could contribute to a Pokemon’s strength, for this query, we are specifically interested in whether Pokemon generally have gotten more fragile over time by way of their hp. Let’s investigate.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
hp <- pokemon |>
group_by(generation) |>
summarize(count_ = n(), mean_hp = mean(hp))
We notice a few things from this table. The number of Pokemon in the most recent generations in our data (gen. 6 and gen. 7) has dropped off considerably, to roughly half of the gen. 1 count. Also, mean hp does not appear to have varied much across generations. Let’s visualize this.
library(ggplot2)
hp |>
ggplot() +
geom_line(mapping = aes(x = generation, y = mean_hp)) +
expand_limits(y = 0) +
#theme_hc() +
labs(title = "Mean HP by Generation",
subtitle = "Includes evolutions",
x = "Generation", y = "HP") +
theme(plot.subtitle = element_text(colour = "darkgray"))
Based on our table and visualization, the average hp of Pokemon has slightly increased since gen. 4. However, it should be noted that the drop in the number of Pokemon per generation limits how much weight we can put on this conclusion. To put it simply, a mean computed from a smaller group is more sensitive to a few unusually high or low values than a mean computed from a larger group (again, peek at the counts for gen. 1 and gen. 6).
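One way to make the sample-size caveat concrete is to report the spread next to each mean. Below is a minimal sketch on a small made-up tibble (`toy` and its values are hypothetical, not taken from pokemon.csv); the same `summarize()` call should carry over to `pokemon`, assuming the `generation` and `hp` columns shown in the summary above.

```r
library(dplyr)

# Hypothetical toy data: generation 1 has four Pokemon, generation 2 has two
toy <- tibble(
  generation = c(1, 1, 1, 1, 2, 2),
  hp         = c(40, 60, 80, 100, 50, 90)
)

hp_spread <- toy |>
  group_by(generation) |>
  summarize(
    count_  = n(),
    mean_hp = mean(hp),
    sd_hp   = sd(hp),
    se_hp   = sd_hp / sqrt(count_)  # standard error of the mean
  )

hp_spread
```

Both toy groups have the same mean, but the smaller group has the larger standard error, which is the effect described above.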
Going back to what makes Pokemon stronger, a stat that relates directly to this is the attack stat.
Pokemon moves generally fall into three categories: physical, special, and status. The attack stat directly determines how much physical damage a Pokemon can deal in a turn. Most Pokemon lean toward either attack or special attack, and fewer are balanced between the two. For this query, we want to look into which typings have the highest attack stat regardless of generation.
library(dplyr)
offensive_typing <- pokemon|>
group_by(type1, type2) |>
summarize(count_ =n(), max_att = max(attack))
## `summarise()` has grouped output by 'type1'. You can override using the
## `.groups` argument.
Our table looks at every combination of typings that occurs in the data and finds the highest attack stat for each. Ideally, a Pokemon with at least one of the types would count toward that type’s group, so that a bug/flying Pokemon contributes its attack stat to both ‘bug’ and ‘flying’. Note, however, that the plot below puts only type1 on the axis, so each combination is shown under its primary type. Here’s the associated visualization:
library(ggplot2)
offensive_typing |>
ggplot() +
geom_boxplot(mapping = aes(x = type1, y = max_att)) +
coord_flip()+
labs(title="Max Attack Stat by Type") + # labels!
theme_minimal()
As we can see here, ‘bug’ typing has the greatest attack stat across typings. Note that each box summarizes the maximum attack of every type combination with that primary type, not the attack of every individual Pokemon. Looking deeper into bug types, the highest value is approx. in the 180s, while the lowest is approx. in the high 40s. The median is around 110; the lower half of the box (first quartile to median) spans approx. 80-110, and the upper half (median to third quartile) spans a wider range of approx. 110-150.
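The values read off a boxplot can be checked numerically with `quantile()`. This is a small sketch on made-up numbers (the vector below is hypothetical, chosen only to resemble the range described for bug types); on the real data one could pass the `max_att` values for a single primary type instead.

```r
# Hypothetical per-combination maximum attack values for one primary type
max_att <- c(45, 80, 95, 110, 125, 150, 185)

# Minimum, first quartile, median, third quartile, maximum
q <- quantile(max_att, probs = c(0, 0.25, 0.5, 0.75, 1))
q

# Width of the box (interquartile range)
box_width <- unname(q["75%"] - q["25%"])
box_width
```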
While this is what we can gather from our visualization, it should be noted that the number of Pokemon of each type is not equal across generations. Furthermore, it is often the case that, when Pokemon evolve, they take on a new type. Given this context, we can make general statements about Pokemon’s attack stats from these plots, but further investigation would be needed for rigor.
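As one possible follow-up, the rule described earlier, where a Pokemon counts toward each of its types, can be sketched by reshaping the two type columns into one before grouping. The tibble below is hypothetical toy data, and using `tidyr::pivot_longer()` is my assumption about a convenient way to do this, not the method used for the table above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one dual-type and two single-type Pokemon
toy <- tibble(
  name   = c("A", "B", "C"),
  type1  = c("bug", "bug", "flying"),
  type2  = c("flying", NA, NA),
  attack = c(90, 50, 120)
)

attack_by_type <- toy |>
  # One row per (Pokemon, type) so dual types appear once under each type
  pivot_longer(c(type1, type2), names_to = "slot", values_to = "type") |>
  filter(!is.na(type)) |>   # single-type Pokemon have no type2
  group_by(type) |>
  summarize(count_ = n(), max_att = max(attack))

attack_by_type
```

Here Pokemon "A" contributes to both the ‘bug’ and the ‘flying’ row, so each type ends up with a count of 2.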
For our next query, we are interested in the number of legendary Pokemon across generations.
Legendary Pokemon are rare Pokemon that typically appear once, at the end of a Pokemon game upon completion of the main story. There is usually only one Pokemon of this classification per game, and they often come with rare stats or abilities (a bonus perk is that they also often end up on the cover of that generation’s game). Below, let’s look at the legendaries that exist by generation and their type(s):
library(dplyr)
legendary_gen <-pokemon%>%
group_by(generation, name)%>%
filter(any(is_legendary == 1)) %>%
summarize(type1, type2)
## `summarise()` has grouped output by 'generation'. You can override using the
## `.groups` argument.
Here we have ordered the legendaries by ascending generation and can see their names and type(s). As expected, there are few legendary Pokemon compared to the total number of non-legendary Pokemon, and their types vary. Out of curiosity, what is the most common legendary type across generations?
library(ggplot2)
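# Named palette with one colour per type. "fighting", "poison", and "water" were
# missing from the original vector, so those types would fall back to the grey NA colour.
# Their hex values below are assumed placeholders, not taken from the same palette.
poke_colors <- c("bug"="#a3a624", "dark"="#444850", "dragon"="#5b65cf", "electric"="#fcdb0b", "fairy"="#ffb1fe", "fire"="#fe6227", "flying"="#90cafe", "ghost"="#6c476f", "grass"="#47b831", "ground"="#a77d36", "ice"="#69e3f8", "normal"="#8c9e9d", "psychic"="#fd708a", "rock"="#b6ba87", "steel"="#70abd0",
"fighting"="#c22e28", "poison"="#a33ea1", "water"="#6390f0")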
legendary_gen |>
ggplot() +
geom_bar(mapping = aes(x = generation, fill = type1)) +
theme_minimal() +
scale_fill_manual(values = poke_colors)
Wonderful! From our visualization, we can clearly see that ‘psychic’ typing dominates the legendary pool. Note that the fill here uses only the primary type column (type1); secondary types (type2) are not counted. It’s also interesting to look at other things! For example, by primary type, the first grass legendary didn’t appear until gen. 4, and the first dragon legendaries didn’t appear until gen. 3 (contextually, dragon typing was rare in the Pokemon franchise until later games).
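Observations like "most common type" and "first appearance" can also be read from a small summary table instead of the chart. A minimal sketch, using hypothetical toy rows (`toy_legendary` and its names are placeholders, not the real legendaries):

```r
library(dplyr)

# Hypothetical toy data standing in for the legendary table
toy_legendary <- tibble(
  name       = c("L1", "L2", "L3", "L4", "L5"),
  generation = c(1, 2, 3, 3, 4),
  type1      = c("psychic", "psychic", "dragon", "psychic", "grass")
)

legendary_types <- toy_legendary |>
  group_by(type1) |>
  summarize(
    count_    = n(),              # how many legendaries have this primary type
    first_gen = min(generation)   # earliest generation in which it appears
  ) |>
  arrange(desc(count_))

legendary_types
```

Applied to the real `legendary_gen` table (after `ungroup()`), the same grouping should give the per-type counts behind the bars.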
I hope you enjoy the colors from the official Pokemon type match ups :)
Speaking of Pokemon types, I wonder what the most and least popular type combinations are across gen. 1-7.
library(dplyr)
type_matchups <- pokemon|>
group_by(type1, type2) |>
summarize(count_ =n())
## `summarise()` has grouped output by 'type1'. You can override using the
## `.groups` argument.
That’s a lot of combinations, so what can we see? Just looking at the first several rows, we may notice that bug/normal does not exist. Although bug typing is quite popular across generations, we might guess that the lack of a bug/normal pairing is due to strategy. Bug-type moves are resisted by several types, such as fire, flying, and poison, and bug Pokemon themselves are weak to fire, flying, and rock attacks. Fire, flying, and rock are common typings and are easily found in the early stages of any Pokemon game. Following this line of thought, we could guess that adding the normal typing would cancel out bug’s resistance to fighting-type moves and would not add much value against its weak match ups (normal moves are not very effective against rock).
Pure normal and pure water are the most common pure types, and normal/flying is the most common dual type, with 26 Pokemon having this typing (note that the table counts ‘normal/flying’ and ‘flying/normal’ as separate combinations, so they appear separately in the visualization). The same type match up logic could explain why these typings are so common, and I think we could also argue that these types are the most accessible in the early game, when the mechanics reward you for battling and catching other Pokemon.
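If we wanted ‘normal/flying’ and ‘flying/normal’ to count as the same combination, one option is to sort the two types alphabetically within each row before counting. This is a sketch under that assumption, on hypothetical toy data; the helper columns `type_a` and `type_b` are names introduced here for illustration.

```r
library(dplyr)

# Hypothetical toy data: the same pair in both orders, plus two pure types
toy <- tibble(
  type1 = c("normal", "flying", "normal", "water"),
  type2 = c("flying", "normal", NA, NA)
)

pair_counts <- toy |>
  mutate(
    # Swap only when a second type exists and sorts before the first
    swap   = !is.na(type2) & type2 < type1,
    type_a = if_else(swap, type2, type1),
    type_b = if_else(swap, type1, type2)
  ) |>
  count(type_a, type_b, name = "count_")

pair_counts
```

Both orderings end up in a single flying/normal row, while pure types keep `NA` in `type_b`.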
Let’s see what this looks like.
library(forcats)
type_matchups |>
ggplot() +
# type_matchups has one row per combination, so weight by count_ to make
# bar heights (and the ordering) reflect the number of Pokemon, not the number of rows
geom_bar(mapping = aes(x = fct_rev(fct_infreq(type1, w = count_)), weight = count_, fill = type2)) +
labs(x = "Type1", y = "Count", title = "Pokemon Type Combinations") + # `title` must be lowercase
coord_flip() +
theme_minimal() +
scale_fill_manual(values = poke_colors)
Interesting stuff. Looking at the chart, if we find the bar for flying (on the y-axis, since the axes are flipped), it would appear that there are no bug/flying Pokemon. However, this isn’t true. The table treats type1 and type2 as ordered, and certain types are more likely to be listed as primary (type1) than as secondary (type2). Bug/flying Pokemon therefore show up in the ‘bug’ bar with a ‘flying’ fill, not in the ‘flying’ bar. The chart also lets us see which types appear more often as primary. For example, more Pokemon have a primary ‘bug’ typing than a primary ‘flying’ typing.
I happen to really like bug types, if you can’t tell. Therefore, for our last query, let’s look into the bug/flying Pokemon and their abilities.
library(dplyr)
missing_abilities <- pokemon%>%
group_by(abilities, type1, type2) %>%
filter(any((type1=="bug" & type2=="flying") | (type1=="flying" & type2=="bug"))) %>%
summarize(name)
## `summarise()` has grouped output by 'abilities', 'type1'. You can override
## using the `.groups` argument.
As we can see from our table, no Pokemon in the set with the type bug/flying has just one ability. With respect to the game mechanics, this likely increases the possible outcomes of Pokemon battles and adds to the number of strategies players can use throughout the game. Also, as a fun fact, many Pokemon are based on real animals and plants found in nature. This makes abilities such as ‘Swarm’ and ‘Compound Eyes’ cute wordplay moments based on what the Pokemon looks like and its design inspiration.
Finally, the least common ability combination is ‘Honey Gather’ and ‘Hustle’, while the most common ability for a bug/flying type in this set to have access to is ‘Swarm’. Referring again to our previous reasoning, Honey Gather is the ability of the Pokemon Combee, which is based on (you guessed it) a bee. Its evolution follows this design pattern, with its final form being the queen bee, Vespiquen. Hustle is an ability that increases the attack stat and lowers accuracy, and, while bees normally have great eyesight, they lack distance vision.
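To tally individual abilities instead of whole combinations, the `abilities` strings have to be split first. The sketch below assumes each value is stored as a bracketed, quoted list such as `['Swarm', 'Compound Eyes']`; that format is an assumption about the CSV, and the three strings are hypothetical examples, not rows from the data.

```r
# Hypothetical ability strings in the assumed "['A', 'B']" format
toy_abilities <- c(
  "['Swarm', 'Compound Eyes']",
  "['Honey Gather', 'Hustle']",
  "['Swarm', 'Tinted Lens']"
)

# Strip brackets and quotes, then split each string on commas
cleaned   <- gsub("\\[|\\]|'", "", toy_abilities)
abilities <- trimws(unlist(strsplit(cleaned, ",")))

# Count how often each individual ability occurs, most common first
ability_counts <- sort(table(abilities), decreasing = TRUE)
ability_counts
```

In this toy input, ‘Swarm’ comes out on top because it appears in two of the three strings.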