Week 3 Data Dive

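This week we explore the ‘group_by’ function and investigate more patterns in our data. In particular, we will look at HP across generations, attack stats by type, legendary Pokemon by generation, type combinations, and the abilities of bug/flying Pokemon.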

As usual, here is the summary of our current data set.

library(readr)
pokemon <- read_csv("C:\\Users\\danid\\OneDrive\\Desktop\\pokemon.csv")
## Rows: 801 Columns: 41
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): abilities, capture_rate, classfication, japanese_name, name, type1...
## dbl (34): against_bug, against_dark, against_dragon, against_electric, again...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(pokemon)
##   abilities          against_bug      against_dark   against_dragon  
##  Length:801         Min.   :0.2500   Min.   :0.250   Min.   :0.0000  
##  Class :character   1st Qu.:0.5000   1st Qu.:1.000   1st Qu.:1.0000  
##  Mode  :character   Median :1.0000   Median :1.000   Median :1.0000  
##                     Mean   :0.9963   Mean   :1.057   Mean   :0.9688  
##                     3rd Qu.:1.0000   3rd Qu.:1.000   3rd Qu.:1.0000  
##                     Max.   :4.0000   Max.   :4.000   Max.   :2.0000  
##                                                                      
##  against_electric against_fairy   against_fight    against_fire  
##  Min.   :0.000    Min.   :0.250   Min.   :0.000   Min.   :0.250  
##  1st Qu.:0.500    1st Qu.:1.000   1st Qu.:0.500   1st Qu.:0.500  
##  Median :1.000    Median :1.000   Median :1.000   Median :1.000  
##  Mean   :1.074    Mean   :1.069   Mean   :1.066   Mean   :1.135  
##  3rd Qu.:1.000    3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:2.000  
##  Max.   :4.000    Max.   :4.000   Max.   :4.000   Max.   :4.000  
##                                                                  
##  against_flying  against_ghost   against_grass   against_ground 
##  Min.   :0.250   Min.   :0.000   Min.   :0.250   Min.   :0.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:0.500   1st Qu.:1.000  
##  Median :1.000   Median :1.000   Median :1.000   Median :1.000  
##  Mean   :1.193   Mean   :0.985   Mean   :1.034   Mean   :1.098  
##  3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000  
##  Max.   :4.000   Max.   :4.000   Max.   :4.000   Max.   :4.000  
##                                                                 
##   against_ice    against_normal  against_poison   against_psychic
##  Min.   :0.250   Min.   :0.000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.500   1st Qu.:1.000   1st Qu.:0.5000   1st Qu.:1.000  
##  Median :1.000   Median :1.000   Median :1.0000   Median :1.000  
##  Mean   :1.208   Mean   :0.887   Mean   :0.9753   Mean   :1.005  
##  3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.:1.0000   3rd Qu.:1.000  
##  Max.   :4.000   Max.   :1.000   Max.   :4.0000   Max.   :4.000  
##                                                                  
##   against_rock  against_steel    against_water       attack      
##  Min.   :0.25   Min.   :0.2500   Min.   :0.250   Min.   :  5.00  
##  1st Qu.:1.00   1st Qu.:0.5000   1st Qu.:0.500   1st Qu.: 55.00  
##  Median :1.00   Median :1.0000   Median :1.000   Median : 75.00  
##  Mean   :1.25   Mean   :0.9835   Mean   :1.058   Mean   : 77.86  
##  3rd Qu.:2.00   3rd Qu.:1.0000   3rd Qu.:1.000   3rd Qu.:100.00  
##  Max.   :4.00   Max.   :4.0000   Max.   :4.000   Max.   :185.00  
##                                                                  
##  base_egg_steps  base_happiness     base_total    capture_rate      
##  Min.   : 1280   Min.   :  0.00   Min.   :180.0   Length:801        
##  1st Qu.: 5120   1st Qu.: 70.00   1st Qu.:320.0   Class :character  
##  Median : 5120   Median : 70.00   Median :435.0   Mode  :character  
##  Mean   : 7191   Mean   : 65.36   Mean   :428.4                     
##  3rd Qu.: 6400   3rd Qu.: 70.00   3rd Qu.:505.0                     
##  Max.   :30720   Max.   :140.00   Max.   :780.0                     
##                                                                     
##  classfication         defense       experience_growth    height_m     
##  Length:801         Min.   :  5.00   Min.   : 600000   Min.   : 0.100  
##  Class :character   1st Qu.: 50.00   1st Qu.:1000000   1st Qu.: 0.600  
##  Mode  :character   Median : 70.00   Median :1000000   Median : 1.000  
##                     Mean   : 73.01   Mean   :1054996   Mean   : 1.164  
##                     3rd Qu.: 90.00   3rd Qu.:1059860   3rd Qu.: 1.500  
##                     Max.   :230.00   Max.   :1640000   Max.   :14.500  
##                                                        NA's   :20      
##        hp         japanese_name          name           percentage_male 
##  Min.   :  1.00   Length:801         Length:801         Min.   :  0.00  
##  1st Qu.: 50.00   Class :character   Class :character   1st Qu.: 50.00  
##  Median : 65.00   Mode  :character   Mode  :character   Median : 50.00  
##  Mean   : 68.96                                         Mean   : 55.16  
##  3rd Qu.: 80.00                                         3rd Qu.: 50.00  
##  Max.   :255.00                                         Max.   :100.00  
##                                                         NA's   :98      
##  pokedex_number   sp_attack        sp_defense         speed       
##  Min.   :  1    Min.   : 10.00   Min.   : 20.00   Min.   :  5.00  
##  1st Qu.:201    1st Qu.: 45.00   1st Qu.: 50.00   1st Qu.: 45.00  
##  Median :401    Median : 65.00   Median : 66.00   Median : 65.00  
##  Mean   :401    Mean   : 71.31   Mean   : 70.91   Mean   : 66.33  
##  3rd Qu.:601    3rd Qu.: 91.00   3rd Qu.: 90.00   3rd Qu.: 85.00  
##  Max.   :801    Max.   :194.00   Max.   :230.00   Max.   :180.00  
##                                                                   
##     type1              type2             weight_kg        generation  
##  Length:801         Length:801         Min.   :  0.10   Min.   :1.00  
##  Class :character   Class :character   1st Qu.:  9.00   1st Qu.:2.00  
##  Mode  :character   Mode  :character   Median : 27.30   Median :4.00  
##                                        Mean   : 61.38   Mean   :3.69  
##                                        3rd Qu.: 64.80   3rd Qu.:5.00  
##                                        Max.   :999.90   Max.   :7.00  
##                                        NA's   :20                     
##   is_legendary    
##  Min.   :0.00000  
##  1st Qu.:0.00000  
##  Median :0.00000  
##  Mean   :0.08739  
##  3rd Qu.:0.00000  
##  Max.   :1.00000  
## 

Are Pokemon More Fragile Nowadays?

As of today (2026), there are a total of nine generations of Pokemon, although our data set only covers gen. 1 through 7. Each generation has brought fresh contributions to the franchise, such as new pocket monsters to play with, new abilities, and, in gen. 6, a new Pokemon type. One question we can ask is whether these newer Pokemon have gotten stronger or weaker over time.

There are six stats attached to a Pokemon:

  1. Health Points (HP)
  2. Attack
  3. Special Attack
  4. Defense
  5. Special Defense
  6. Speed

While it could be argued that an increase in any of these stats contributes to a Pokemon’s strength, for this query we are specifically interested in whether Pokemon have generally gotten more fragile over time, as measured by their HP. Let’s investigate.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
hp <- pokemon |> 
      group_by(generation) |>
      summarize(count_ = n(), mean_hp = mean(hp))

We notice a few things from this table. The number of Pokemon in the most recent generations (gen. 6 and 7) has dropped off considerably: gen. 6 has fewer than half as many Pokemon as gen. 1, and gen. 7 only slightly more than half. Also, mean HP does not appear to have varied much across generations. Let’s visualize this.

library(ggplot2)
hp |>
  ggplot() +
  geom_line(mapping = aes(x = generation, y = mean_hp)) +
  expand_limits(y = 0) +  
  #theme_hc() + 
  labs(title = "Mean HP by Generation",
       subtitle = "Includes evolutions",
       x = "Generation", y = "HP") +
  theme(plot.subtitle = element_text(colour = "darkgray"))

Based on our table and visualization, the HP stats of Pokemon have, on average, increased slightly since gen. 4. However, it should be noted that the drop in the number of Pokemon per generation affects how much weight we can put on this conclusion. To put it simply, a mean computed from a smaller group is more sensitive to a few unusually high or low values than a mean computed from a larger group (again, take a peek at gen. 1 and gen. 6).
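
To make the point about group size a little more concrete, here is a minimal sketch that adds the standard deviation and standard error of HP to the grouped summary. The `toy` data frame and its values are made up for illustration and are not taken from our data set; the column names `sd_hp` and `se_hp` are my own.

```r
library(dplyr)

# Toy stand-in for the `pokemon` table (made-up values, NOT the real data).
toy <- data.frame(
  generation = c(1, 1, 1, 1, 1, 1, 6, 6),
  hp         = c(45, 60, 80, 39, 58, 78, 50, 90)
)

hp_spread <- toy |>
  group_by(generation) |>
  summarize(
    count_  = n(),
    mean_hp = mean(hp),
    sd_hp   = sd(hp),
    se_hp   = sd_hp / sqrt(count_)  # standard error shrinks as count_ grows
  )

hp_spread
```

A generation with fewer Pokemon will tend to have a larger standard error, which is one way to judge how much a difference in mean HP between generations should be trusted.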

Hardest Hitters

Going back to what makes Pokemon stronger, one stat that relates directly to this is the attack stat.

Pokemon moves generally fall into three categories: physical, special and status. The attack stat directly determines how much physical damage a Pokemon can deal in a turn. Most Pokemon lean toward either attack or special attack, and fewer are balanced between the two. For this query, we want to look into which typings have the highest attack stat regardless of generation.

library(dplyr)
offensive_typing <- pokemon |>
  group_by(type1, type2) |>
  summarize(count_ = n(), max_att = max(attack))
## `summarise()` has grouped output by 'type1'. You can override using the
## `.groups` argument.

Our table covers every type combination that appears in the data and records the highest attack stat for each. Note that the visualization groups these combinations by primary type (type1) only, so a bug/flying Pokemon contributes to ‘bug’ but not to ‘flying’. Here’s the associated visualization:

library(ggplot2)
offensive_typing |>
  ggplot() +
  geom_boxplot(mapping = aes(x = type1, y = max_att)) +
  coord_flip()+
  labs(title="Max Attack Stat by Type") +  # labels!
  theme_minimal()

As we can see here, ‘bug’ typing has the greatest attack stat across typings. Keep in mind that each box summarizes the per-combination maximums from our table rather than individual Pokemon. Looking deeper into bug types, the highest value is in the 180s, while the lowest is in the high 40s. The median is around 110, and the box runs from roughly 80 (first quartile) to roughly 150 (third quartile), so the spread above the median is wider than the spread below it.

While this is what we can gather from our visualization, it should be noted that the types are not equally represented across generations. Furthermore, Pokemon often take on a new type when they evolve. With this context in mind, we can make general statements about Pokemon’s attack stats, but further investigation should be conducted for rigor.
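
As one possible direction for that further investigation, here is a rough sketch of how each Pokemon could be counted under both of its types instead of only its primary one. The `toy` data frame, the column name `type`, and the table name `by_any_type` are hypothetical and only for illustration; I am also assuming that a missing secondary type is stored as `NA`.

```r
library(dplyr)

# Toy stand-in for the `pokemon` table (made-up values, NOT the real data).
toy <- data.frame(
  name   = c("A", "B", "C", "D"),
  type1  = c("bug", "bug", "flying", "normal"),
  type2  = c("flying", NA, NA, "normal")[c(1, 2, 3, 1)],
  attack = c(90, 40, 70, 60)
)

# Stack the primary and secondary types into one `type` column, so a
# bug/flying Pokemon appears once under "bug" and once under "flying".
by_any_type <- bind_rows(
  toy |> select(name, type = type1, attack),
  toy |> select(name, type = type2, attack)
) |>
  filter(!is.na(type)) |>   # drop the empty secondary types
  group_by(type) |>
  summarize(count_ = n(), median_att = median(attack), max_att = max(attack))

by_any_type
```

With the real data, the same pattern would let a boxplot of `attack` by `type` reflect every Pokemon that carries a type, at the cost of counting dual-type Pokemon twice.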

Dragons, and Whales, and Bears, Oh My!: Legendary Pokemon By Gen.

For our next query, we are interested in the number of legendary Pokemon across generations.

Legendary Pokemon are rare Pokemon that typically appear once, near the end of a Pokemon game upon completion of the main story. There is usually only one Pokemon of this classification per game, and they often come with rare stats or abilities (as a bonus, they also often end up on the cover of that generation’s game). Below, let’s look at the legendaries that exist by generation and their type(s):

library(dplyr)
legendary_gen <-pokemon%>%
              group_by(generation, name)%>%
              filter(any(is_legendary == 1)) %>%
              summarize(type1, type2)
## `summarise()` has grouped output by 'generation'. You can override using the
## `.groups` argument.

Here the legendaries are ordered by ascending generation, with their names and type(s). As expected, there are few legendary Pokemon compared to the total number of non-legendary Pokemon, and their types vary. Out of curiosity, what is the most common legendary type across generations?

library(ggplot2)
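poke_colors <- c("bug" = "#a3a624", "dark" = "#444850", "dragon" = "#5b65cf",
                 "electric" = "#fcdb0b", "fairy" = "#ffb1fe", "fire" = "#fe6227",
                 "flying" = "#90cafe", "ghost" = "#6c476f", "grass" = "#47b831",
                 "ground" = "#a77d36", "ice" = "#69e3f8", "normal" = "#8c9e9d",
                 "psychic" = "#fd708a", "rock" = "#b6ba87", "steel" = "#70abd0",
                 # The three types below were missing from the palette, so they
                 # would fall back to the NA fill color. These hex values are
                 # placeholder assumptions, not official colors.
                 "fighting" = "#c03028", "poison" = "#a040a0", "water" = "#6890f0")
legendary_gen |>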
  ggplot() +  
  geom_bar(mapping = aes(x = generation, fill = type1)) +
  theme_minimal() +
  scale_fill_manual(values = poke_colors)

Wonderful! From our visualization here, we can clearly see that ‘psychic’ typing dominates the legendary pool. Note that the fill shows only the primary type (type1); secondary types (type2) are not displayed. Other patterns stand out too. For example, the first legendary with a primary grass type didn’t appear until gen. 4, and the first primary dragon legendaries didn’t appear until gen. 3 (contextually, dragon typing was rare in the Pokemon franchise until later games).

I hope you enjoy the colors from the official Pokemon type match ups :)
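
If we wanted the “most common legendary type” as a table rather than reading it off the bars, a sketch along these lines might work. The `toy` data frame is made up for illustration (it is not our data), and `legendary_types` is just a name I picked.

```r
library(dplyr)

# Toy stand-in for the `pokemon` table (made-up values, NOT the real data).
toy <- data.frame(
  name         = c("A", "B", "C", "D", "E"),
  generation   = c(1, 1, 2, 3, 3),
  type1        = c("psychic", "electric", "psychic", "dragon", "psychic"),
  is_legendary = c(1, 1, 1, 1, 0)
)

# count() is shorthand for group_by() + summarize(n = n());
# sort = TRUE puts the most common primary type first.
legendary_types <- toy |>
  filter(is_legendary == 1) |>
  count(type1, sort = TRUE, name = "count_")

legendary_types
```

Like the bar chart, this only looks at the primary type, so a legendary whose secondary type is psychic would not be counted under psychic here.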

Type Combinations

Speaking of Pokemon types, I wonder what the most and least popular type combinations are across gen. 1-7.

library(dplyr)
type_matchups <- pokemon|>
                    group_by(type1, type2) |>
                    summarize(count_ =n())
## `summarise()` has grouped output by 'type1'. You can override using the
## `.groups` argument.

That’s a lot of combinations, so what can we see? Just looking at the first several rows, we may notice that bug/normal does not exist. Although bug typing is quite popular across generations, we might guess that the lack of a bug/normal pairing is due to strategy. Bug-type moves are resisted by several types, such as fire, flying and poison, and the Pokemon themselves are weak to fire, flying and rock attacks. Fire, flying and rock are common typings and are easily found in the early stages of any Pokemon game. Following this line of thought, we could guess that adding the normal typing would cancel out bug’s resistance to fighting-type attacks and would not add much value to its weak match ups (normal moves are not very effective against rock).

Pure normal and pure water are the most common pure types, and normal/flying is the most common cross type, with 26 Pokemon having this typing (note that the table counts ‘normal/flying’ and ‘flying/normal’ as separate combinations). We could follow the same logic as before about type match ups to explain these commonalities, and I think we could also argue that these Pokemon types are the most accessible early in the game, when the mechanics reward you for battling and catching other Pokemon.

Let’s see what this looks like.

library(forcats)
type_matchups |>
  ggplot() +
  # type_matchups has one row per combination, so plot count_ directly
  # rather than letting geom_bar() count rows
  geom_col(mapping = aes(x = fct_reorder(type1, count_, .fun = sum),
                         y = count_, fill = type2)) +
  labs(x = "Type1", y = "Count", title = "Pokemon Type Combinations") +
  coord_flip() +
  theme_minimal() +
  scale_fill_manual(values = poke_colors)

Interesting stuff. Looking at the bar for flying (on the vertical axis, since the chart is flipped), it would appear that there are no bug/flying Pokemon. However, this isn’t true. The chart groups Pokemon by their primary type (type1), and certain types are more likely to be listed as primary than as secondary (type2), so bug/flying Pokemon show up under ‘bug’ rather than ‘flying’. The chart therefore also tells us which types appear more often as primary. For example, we can see that more Pokemon have a primary ‘bug’ typing than a primary ‘flying’ typing.
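
Since the order of type1 and type2 affects how combinations are counted, here is a small sketch of one way to treat ‘normal/flying’ and ‘flying/normal’ as the same combination: sort the two types alphabetically before grouping. The `toy` data frame and the helper columns `type_a` and `type_b` are hypothetical and only meant to illustrate the idea.

```r
library(dplyr)

# Toy stand-in for the `pokemon` table (made-up values, NOT the real data).
toy <- data.frame(
  type1 = c("normal", "flying", "bug", "water"),
  type2 = c("flying", "normal", "flying", NA)
)

combos <- toy |>
  mutate(
    # pmin()/pmax() pick the alphabetically first/last type in each row,
    # so the pair gets the same key regardless of the original order.
    type_a = if_else(is.na(type2), type1, pmin(type1, type2)),
    type_b = if_else(is.na(type2), NA_character_, pmax(type1, type2))
  ) |>
  count(type_a, type_b, name = "count_")

combos
```

In this toy example the two normal/flying rows collapse into a single flying/normal combination with a count of 2, while the pure water row keeps an empty second type.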

Bug Out!

I happen to really like bug types, if you can’t tell. Therefore, for our last query, let’s look into the bug/flying Pokemon and their abilities.

library(dplyr)
missing_abilities <- pokemon%>%
                     group_by(abilities, type1, type2) %>%
                     filter(any((type1=="bug" & type2=="flying") | (type1=="flying" & type2=="bug"))) %>%
                     summarize(name)
## `summarise()` has grouped output by 'abilities', 'type1'. You can override
## using the `.groups` argument.

As we can see from our table, no bug/flying Pokemon in the set has just one ability. With respect to the game mechanics, this likely increases the possible outcomes of Pokemon battles and adds to the number of strategies players can use throughout the game. Also, as a fun fact, Pokemon are based on real animals and plants found in nature. This makes abilities such as ‘Swarm’ and ‘Compound Eyes’ cute bits of wordplay based on what the Pokemon looks like and its design inspiration.

Finally, the least common ability combination is ‘Honey Gather’ and ‘Hustle’, while the most common ability available to a bug/flying type in this set is ‘Swarm’. Referring again to our previous reasoning, Honey Gather is the ability of the Pokemon Combee, which is based on (you guessed it) a bee. Its evolution line follows this design pattern, with its final evolution being the queen bee, Vespiquen. Hustle is an ability that increases the attack stat and lowers accuracy, which fits as well: while bees normally have great eyesight, they lack distance vision.
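
To count individual abilities rather than whole ability combinations, the `abilities` text would first need to be split apart. The sketch below assumes each value is stored as a bracketed, single-quoted list such as "['Swarm', 'Sniper']"; that format is an assumption on my part, and the `toy` rows are made up rather than taken from our data. Ability names that contain an apostrophe would need extra care.

```r
library(dplyr)

# Toy stand-in for the bug/flying rows (made-up values, NOT the real data).
toy <- data.frame(
  name      = c("A", "B", "C"),
  abilities = c("['Swarm', 'Sniper']",
                "['Swarm', 'Tinted Lens']",
                "['Honey Gather', 'Hustle']")
)

# Strip the brackets and quotes, then split on ", " to get one vector
# of ability names per Pokemon.
ability_list <- strsplit(gsub("\\[|\\]|'", "", toy$abilities), ", ", fixed = TRUE)

# One row per (Pokemon, ability) pair.
long <- data.frame(
  name    = rep(toy$name, lengths(ability_list)),
  ability = unlist(ability_list)
)

ability_counts <- long |>
  count(ability, sort = TRUE, name = "count_")

ability_counts
```

With one row per ability, the most and least common abilities can be read straight from the top and bottom of `ability_counts`.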