2025-11-14

What determines the category of a Pokémon?

# Load the Pokédex data, then show one example Pokémon from each category
pokemon <- read.csv("Pokedex_Ver_SV2.csv")
pokemon[pokemon$Name %in% c("Bulbasaur", "Moltres", "Mew", "Mewtwo"), c("Name", "Category")]
         Name       Category
1   Bulbasaur       Ordinary
194   Moltres Semi-Legendary
199    Mewtwo      Legendary
202       Mew       Mythical

Potential variables to look at:

Base Stat Total (BST), Capture Rate, Base Experience gained when defeated, and Experience needed to reach level 100.

        Category   AvgBST AvgCaptureRate AvgExpGain AvgExpNeeded
1      Legendary 673.9783       43.78261   329.3696      1250000
2       Mythical 594.6667        9.50000   291.3667      1224648
3       Ordinary 416.8943      103.63143   142.3810      1036503
4 Semi-Legendary 575.7013       14.24675   278.8442      1250000

AvgBST: The average Base Stat Total.
AvgCaptureRate: The average capture rate (a higher number means the Pokémon is easier to catch).
AvgExpGain: The average base experience gained for defeating a Pokémon of that category.
AvgExpNeeded: The average experience needed to raise a Pokémon from level 1 to level 100.
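
A per-category table of averages like the one above could be produced with `aggregate()`. The sketch below is only an illustration: it uses a tiny made-up data frame, and the column names `BST`, `Capture_Rate`, `Base_Exp`, and `Exp_To_100` are assumptions, since the real headers of the CSV are not shown here.

```r
# Toy stand-in for the Pokédex data. The values are made up, and every
# column name except Category is an assumption about the real CSV.
toy <- data.frame(
  Category     = c("Ordinary", "Ordinary", "Legendary", "Legendary"),
  BST          = c(318, 405, 680, 670),
  Capture_Rate = c(45, 45, 3, 3),
  Base_Exp     = c(64, 142, 340, 335),
  Exp_To_100   = c(1059860, 1059860, 1250000, 1250000)
)

# One row per category, one averaged column per variable.
category_means <- aggregate(
  cbind(BST, Capture_Rate, Base_Exp, Exp_To_100) ~ Category,
  data = toy,
  FUN  = mean
)
print(category_means)
```

With the real data, `toy` would be replaced by the `pokemon` data frame and the column names adjusted to match the file.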

Base Stat Totals

The Base Stat Total (BST) of a Pokémon is the sum of its six base stats:

\[ \text{BST} = \text{HP} + \text{Attack} + \text{Defense} + \text{Sp. Atk} + \text{Sp. Def} + \text{Speed} \]
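
As a quick check of the formula, the sum can be computed row by row with `rowSums()`. This is a minimal sketch: the base stats are typed in by hand rather than read from the CSV, and the column names (`Sp_Atk`, `Sp_Def`, and so on) are assumptions about how the file labels them.

```r
# Base stats entered by hand for two Pokémon; column names are hypothetical.
stats <- data.frame(
  Name    = c("Bulbasaur", "Mewtwo"),
  HP      = c(45, 106),
  Attack  = c(49, 110),
  Defense = c(49, 90),
  Sp_Atk  = c(65, 154),
  Sp_Def  = c(65, 90),
  Speed   = c(45, 130)
)

# BST is the row-wise sum of the six stat columns.
stat_cols <- c("HP", "Attack", "Defense", "Sp_Atk", "Sp_Def", "Speed")
stats$BST <- rowSums(stats[, stat_cols])
print(stats[, c("Name", "BST")])
```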

The average, maximum, and minimum BST of each Pokémon category, visualized:

Summary Statistics of Base Stat Totals

As there seems to be an association between BST and Category, let us compare the categories properly. We can look at the sample mean and sample variance of their Base Stat Totals (BST); the table reports the standard deviation (SD), which is the square root of the sample variance.

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad\text{(sample mean)} \] \[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad\text{(sample variance)} \]
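
To make the two formulas concrete, here is a small worked example on a made-up vector of four BST values. It computes the mean and variance by hand and compares them with R's built-in `mean()`, `var()`, and `sd()`, which use the same \(n - 1\) denominator.

```r
# Four made-up BST values, purely for illustration.
x <- c(300, 400, 500, 600)
n <- length(x)

# Sample mean: sum of the values divided by n.
x_bar <- sum(x) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
s2 <- sum((x - x_bar)^2) / (n - 1)

# Standard deviation is the square root of the sample variance.
s <- sqrt(s2)

# The hand-computed values agree with the built-ins.
print(c(mean = x_bar, variance = s2, sd = s))
print(all.equal(c(x_bar, s2, s), c(mean(x), var(x), sd(x))))
```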

Mean and SD of BST for each Category
Category        Mean_BST  SD_BST
Legendary          674.0   119.6
Mythical           594.7    68.3
Ordinary           416.9   104.3
Semi-Legendary     575.7    38.0
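
A grouped summary of this shape can be built with `tapply()`. The sketch below runs on made-up values, and the column name `BST` is an assumption about the real data frame; it is one possible way to get such a table, not necessarily how the one above was produced.

```r
# Made-up BST values for two categories.
toy <- data.frame(
  Category = c("Ordinary", "Ordinary", "Ordinary", "Legendary", "Legendary"),
  BST      = c(300, 400, 500, 660, 700)
)

# Apply mean() and sd() to BST within each category.
mean_bst <- tapply(toy$BST, toy$Category, mean)
sd_bst   <- tapply(toy$BST, toy$Category, sd)

# Collect the results into one table, rounded to one decimal place.
bst_summary <- data.frame(
  Category = names(mean_bst),
  Mean_BST = round(as.vector(mean_bst), 1),
  SD_BST   = round(as.vector(sd_bst), 1)
)
print(bst_summary)
```

The same pattern would apply to Capture Rate by swapping in that column.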

Capture Rate

Summary Statistics of Capture Rate

There also seems to be a noticeable association between Capture Rate and Category. The sample mean and standard deviation (the square root of the sample variance) of the Capture Rates look like this:

Category        Mean_CaptureRate  SD_CaptureRate
Legendary                   43.8            84.2
Mythical                     9.5            15.0
Ordinary                   103.6            73.7
Semi-Legendary              14.2            17.2

Relationship Between Base Exp and Exp Needed

We can model the relationship between Base Experience gained (\(x\)) and Experience needed to reach level 100 (\(y\)) using a simple linear regression:

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]

  • \(\beta_0\): intercept (expected \(y\) when \(x = 0\))
  • \(\beta_1\): slope (change in \(y\) for a one-unit change in \(x\))
  • \(\varepsilon_i\): random error term

Regression Coefficients
                  Estimate  Std. Error  t value  Pr(>|t|)
Experience_Type 969069.799    8971.157  108.021         0
Base_Experience    580.033      48.759   11.896         0

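R-squared: 0.105, so Base Experience accounts for only about 10.5% of the variance in Experience needed. The first row of the coefficient table (labelled Experience_Type) appears to be the intercept \(\beta_0\); the Base_Experience row is the slope \(\beta_1\), meaning each additional point of Base Experience is associated with roughly 580 more experience points needed to reach level 100.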
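
For reference, a coefficient table and R-squared of this kind are what `summary()` returns for a model fitted with `lm()`. The sketch below fits the same form of model to six made-up points; the variable names `base_exp` and `exp_needed` are hypothetical, and the numbers it prints are unrelated to the results above.

```r
# Six made-up observations: x is base experience, y is experience needed.
base_exp   <- c(50, 100, 150, 200, 250, 300)
noise      <- c(-2000, 1500, 500, -1000, 2500, -1500)
exp_needed <- 1000000 + 500 * base_exp + noise

# Fit y = beta_0 + beta_1 * x by ordinary least squares.
fit <- lm(exp_needed ~ base_exp)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
# Row 1 is the intercept (beta_0), row 2 is the slope (beta_1).
coef_table <- summary(fit)$coefficients
r_squared  <- summary(fit)$r.squared

print(coef_table)
print(r_squared)
```

In `lm()` output the intercept row is labelled `(Intercept)` by default, so a different label in a report usually means the rows were renamed afterwards.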

Base Experience Gained + Experience needed to max level