Pokémon & How Their Stats Compare

Pokémon is a large multimedia franchise with many individual aspects. Whether through the trading cards, the shows and movies, or the games, it has a large presence everywhere you look.

I have always been interested in the Pokémon games. While I have never played one (as Nintendo consoles are expensive), I have always been fascinated by how the designers make over a thousand different creatures feel and play completely differently from each other. In this document, I intend to analyze how specific aspects of a Pokémon’s stats affect and compare to one another.

To do this, we will need these libraries:

library(tidyverse)  # The tidyverse collection of packages
Warning: package 'tidyverse' was built under R version 4.4.3
Warning: package 'ggplot2' was built under R version 4.4.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(httr)       # Useful for web authentication
library(rvest)      # Useful tools for working with HTML and XML

Attaching package: 'rvest'

The following object is masked from 'package:readr':

    guess_encoding
library(polite)     # Promoting responsible web scraping
Warning: package 'polite' was built under R version 4.4.3

As for our data, I have it hosted here:

pokemonData = read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/campisec_xavier_edu/IQAeZikL_0i2TJcNjTgqBBQwAUwIthQAg4M8axnll64bvQM?download=1")
Rows: 1219 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): ID, Name, Type
dbl (7): Total, Hp, Attack, Defense, Sp.Atk, Sp.Def, Speed

Obtaining the data

Before I begin the analysis, I want to show how and where I obtained my data.

There are many websites where you can find the stats and traits of all the different Pokémon. I looked for a website that is open and easy to navigate, and ended up on:

https://pokemondb.net/pokedex/all

This website gives a great overview of all the data I will be working with, as well as more in-depth data available on each Pokémon's own page, which you can reach by clicking its name.
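As a rough illustration of how a table like this could be pulled with the libraries loaded earlier, here is a minimal sketch. It is not the exact code I used: the function name `extract_pokedex` is made up for this example, and the selector `table#pokedex` is an assumption about the page's markup that should be checked against the page source. The live request is left commented out so the block runs offline on a tiny hand-written page.

```r
library(rvest)   # HTML parsing; also provides the %>% pipe

# Pull the main stats table out of a parsed page.
# Assumption: the table carries id="pokedex" (check the page source first).
extract_pokedex <- function(page) {
  page %>%
    html_element("table#pokedex") %>%
    html_table()
}

# Live usage (not run here; needs network access and the polite package):
# session <- polite::bow("https://pokemondb.net/pokedex/all")
# pokedex <- extract_pokedex(polite::scrape(session))

# Offline check on a hand-written page with the same assumed structure
demo_page <- minimal_html('
  <table id="pokedex">
    <tr><th>#</th><th>Name</th><th>Total</th></tr>
    <tr><td>0001</td><td>Bulbasaur</td><td>318</td></tr>
  </table>')
demo_table <- extract_pokedex(demo_page)
print(demo_table)
```

Going through `polite::bow()` first means the site's robots.txt is consulted and requests are rate-limited, which is the reason for loading polite at all.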

Analysis: The different stats and how they interact

In the Pokémon games, a Pokémon's power is divided up into multiple “stats”. These stats are:

  • Hp: How much health a Pokémon has compared to others

  • Attack: How much power their physical attacks have

  • Defense: How much damage they can reduce from physical attacks

  • Sp.Atk (Special Attack): How much power their non-physical, usually more elemental attacks deal

  • Sp.Def (Special Defense): The same as Defense, but for Special Attacks.

  • Speed: In battle, whichever Pokémon has the higher speed gets to attack first.

You can see how this appears in the dataset:

head(pokemonData)
# A tibble: 6 × 10
  ID    Name                Type  Total    Hp Attack Defense Sp.Atk Sp.Def Speed
  <chr> <chr>               <chr> <dbl> <dbl>  <dbl>   <dbl>  <dbl>  <dbl> <dbl>
1 0001  Bulbasaur           "Gra…   318    45     49      49     65     65    45
2 0002  Ivysaur             "Gra…   405    60     62      63     80     80    60
3 0003  Venusaur            "Gra…   525    80     82      83    100    100    80
4 0003  Venusaur Mega Venu… "Gra…   625    80    100     123    122    120    80
5 0004  Charmander          "Fir…   309    39     52      43     60     50    65
6 0005  Charmeleon          "Fir…   405    58     64      58     80     65    80
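
The Total column is not one of the six stats listed above. It is the sum of those six stats, which can be checked against the rows printed by head(). A minimal sketch, with the six rows typed in by hand (the name `sample_stats` is mine and not part of the dataset):

```r
# The six rows printed by head(pokemonData), copied by hand (Name and Type omitted)
sample_stats <- data.frame(
  ID      = c("0001", "0002", "0003", "0003", "0004", "0005"),
  Total   = c(318, 405, 525, 625, 309, 405),
  Hp      = c(45, 60, 80, 80, 39, 58),
  Attack  = c(49, 62, 82, 100, 52, 64),
  Defense = c(49, 63, 83, 123, 43, 58),
  Sp.Atk  = c(65, 80, 100, 122, 60, 80),
  Sp.Def  = c(65, 80, 100, 120, 50, 65),
  Speed   = c(45, 60, 80, 80, 65, 80)
)

# Add up the six individual stats for each row
stat_sum <- with(sample_stats, Hp + Attack + Defense + Sp.Atk + Sp.Def + Speed)

# Compare against the Total column
all(stat_sum == sample_stats$Total)  # TRUE for these six rows
```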

I want to look at some of the relationships between some of the stats, specifically:

  1. Does an increase in Defense tend to coincide with an increase in Sp.Def?

  2. Does an increase in Attack tend to coincide with an increase in Sp.Atk?

  3. How do increases in Attack relate to Defense?

  4. How do increases in Sp.Atk relate to Sp.Def?

Q1: Defense vs Sp.Def

When I think “Tanky” Pokémon, I tend to think of a Pokémon with high health, defense, and special defense, but is there a relation between defense and special defense?

pokemonData %>% 
  ggplot( aes(x=Defense, y=Sp.Def) ) +
  geom_point() +
  geom_smooth() +
  labs(title="Pokémon Defense vs. Special Defense")
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

While the correlation tends to be positive, there are many outliers. However, if we set these outliers aside as special, gimmicky Pokémon or Pokémon that are more specialized, we can see that Defense and Sp.Def usually have a positive association. In fact, if we add an x=y line to the graph...

pokemonData %>% 
  ggplot( aes(x=Defense, y=Sp.Def) ) +
  geom_point() +
  geom_smooth() +
  geom_abline(slope=1, intercept=0, color="red") +  # reference line where Defense equals Sp.Def
  labs(title="Pokémon Defense vs. Special Defense")
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

We can see that a substantial number of entries lie on the line where x=y! This makes sense from a gameplay design standpoint: having many Pokémon with very similar Defense and Sp.Def allows them to serve as “generalists” that can withstand a beating from both types of attacks in a similar manner.
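
Rather than judging "a substantial number" by eye, the share of entries on or near the x=y line can be counted directly. The sketch below is only an illustration: the helper `share_on_diagonal` and its `tolerance` argument are names I made up, and the data are just the Defense and Sp.Def values from the six rows printed by head() earlier, so the numbers say nothing about the full dataset.

```r
# Defense and Sp.Def for the six rows printed by head(pokemonData)
sample_stats <- data.frame(
  Defense = c(49, 63, 83, 123, 43, 58),
  Sp.Def  = c(65, 80, 100, 120, 50, 65)
)

# Hypothetical helper: fraction of rows whose two stats differ by at most `tolerance`.
# tolerance = 0 counts only points sitting exactly on the x = y line.
share_on_diagonal <- function(x, y, tolerance = 0) {
  mean(abs(x - y) <= tolerance)
}

exact_share <- share_on_diagonal(sample_stats$Defense, sample_stats$Sp.Def)
near_share  <- share_on_diagonal(sample_stats$Defense, sample_stats$Sp.Def, tolerance = 5)

print(c(exact = exact_share, within_5 = near_share))
```

On the full data, the same helper could be called as `share_on_diagonal(pokemonData$Defense, pokemonData$Sp.Def)`.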

Q2: Attack vs. Sp.Atk

Going off of what we saw in the previous analysis, I believe that a graph of the Pokémon's offensive capabilities will look similar to the one of their defensive capabilities: a positive trend, with many outliers that serve as more specialist Pokémon.

pokemonData %>% 
  ggplot( aes(x=Attack, y=Sp.Atk) ) +
  geom_point() +
  geom_smooth() +
  labs(title="Pokémon Attack vs. Special Attack")
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

While the axes in this plot are different from the ones above, we can still see a similar positive association! In fact, we can even see many of the Pokémon with Attack=Sp.Atk forming an apparent line in the middle of the plot!

Q3 & Q4: Attack vs. Defense & Sp.Atk vs. Sp.Def

As both of these pairs compare an offensive stat with the defensive stat of the same category (physical or special), I opted to analyze them together.

I expect to see an even stronger linear correlation, as Pokémon that specialize in physical combat probably have both a high Attack and a high Defense compared to ones that use more special moves, and vice versa.

pokemonData %>% 
  ggplot( aes(x=Attack, y=Defense) ) +
  geom_point() +
  geom_smooth() +
  labs(title="Pokémon Attack vs. Defense")
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

pokemonData %>% 
  ggplot( aes(x=Sp.Atk, y=Sp.Def) ) +
  geom_point() +
  geom_smooth() +
  labs(title="Pokémon Special Attack vs. Special Defense")
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

These two plots tell a very similar story. Although they have completely different variables on their x and y axes, they share a very similar structure, with a clear positive association and many outliers that indicate gimmicky and specialist Pokémon.
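
One way to put rough numbers on how similar these associations are is a correlation matrix over the four stats. This is a minimal sketch, not part of my original analysis. It runs on the six rows printed by head() earlier (typed in by hand under the made-up name `sample_stats`), which is far too few rows to draw conclusions from, so it only shows the mechanics.

```r
# Attack, Defense, Sp.Atk and Sp.Def for the six rows printed by head(pokemonData)
sample_stats <- data.frame(
  Attack  = c(49, 62, 82, 100, 52, 64),
  Defense = c(49, 63, 83, 123, 43, 58),
  Sp.Atk  = c(65, 80, 100, 122, 60, 80),
  Sp.Def  = c(65, 80, 100, 120, 50, 65)
)

# Pairwise Pearson correlations; each cell compares the row stat with the column stat
cor_matrix <- round(cor(sample_stats), 2)
print(cor_matrix)
```

On the full data, the equivalent call would be `pokemonData %>% select(Attack, Defense, Sp.Atk, Sp.Def) %>% cor()`. The cells for Attack/Defense and Sp.Atk/Sp.Def correspond to the two plots in this section.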

Conclusion

After concluding my analysis, I can say that there are certainly many associations between the different stats, at least the attacking and defensive ones. While there are many Pokémon that serve as “gimmicks” or “specialists” that favor one stat over all others, most Pokémon share the trend of having similar values across stats as their numbers climb. Even when comparing a Pokémon with many points in both Attack and Defense to one with below-average points in those stats, they tend to distribute their stats in similar proportions.