Pokémon Stats: What Makes a Strong Pokémon?

Author

Mark Lukeman

Published

December 4, 2025

Introduction

In the Pokémon games, each Pokémon is assigned a set of base stats that show how strong it is designed to be in battle. These stats include HP, Attack, Defense, Special Attack, Special Defense, and Speed. Game designers can use these stats to create glass cannons, bulky walls, or fast utility Pokémon.

In this project I focus on Pokémon from Generations I–VI (747 entries after cleaning, including some alternate forms) and treat the Total base stat score (the sum of all six stats) as a measure of overall strength. I use this dataset to answer three questions:

Q1: Is there a correlation between a Pokémon’s Speed and its total base stat score? Do faster Pokémon tend to be stronger overall, or do higher Speed values come at the cost of other stats such as HP and Defense?

Q2: Have Pokémon become stronger over time? That is, do newer generations exhibit higher total base stats than earlier ones?

Q3: Are dual-type Pokémon statistically stronger than single-type Pokémon based on their total base stat scores?

Together, these questions help describe which characteristics are most associated with “strong” Pokémon and whether that has changed across generations.

Data, Licensing, and Ethics

The data for this project comes from the Kaggle dataset “The Complete Pokémon Dataset (Gen 1–6)” (pokemon.csv). Each row corresponds to a single Pokémon and includes its name, Pokédex number, generation, whether it is Legendary, its primary and secondary type, and all six base stats.

Unlike the original Pokédex, this dataset includes:

  • Base species
  • Alternate forms
  • Mega Evolutions
  • Primal forms

These additional forms inflate the dataset beyond the canonical 721 species from Generations I–VI.

Because the goal of this project is to analyze standard Pokémon design, I removed:

  • Mega Evolutions
  • Primal Forms
  • Any entry whose name contains a hyphen

Key points:

Source: Kaggle – “The Complete Pokémon Dataset (Gen 1–6)” (pokemon.csv). https://www.kaggle.com/datasets/abcsds/pokemon

License: The dataset is released under the MIT License, a permissive open-source license. It allows reuse, modification, and redistribution for any purpose, including commercial use, provided the original copyright and license notice are retained.

Because this dataset consists entirely of fictional characters, there are no privacy or personal data concerns. The dataset is provided as a downloadable CSV file on Kaggle, so no scraping was needed and there are no terms-of-service concerns. Overall, the licensing is clear and the dataset is safe to use.

Data Import, Cleaning, and Quality Checks

I store the CSV file in a data folder at the root of the project and use read_csv() to import it. I then use janitor’s clean_names() function to convert column names to snake_case, which makes the code cleaner and more reproducible.

library(tidyverse)
library(janitor)
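pokemon <- read_csv("data copy/pokemon.csv") %>% 
  clean_names() %>% 
  # Drop Mega Evolutions (note: "Mega" also matches the species Meganium)
  filter(!str_detect(name, "Mega")) %>% 
  # Drop Primal Groudon and Primal Kyogre
  filter(!str_detect(name, "Primal")) %>% 
  # Drop any name containing a hyphen (this removes Ho-Oh and Porygon-Z)
  filter(!str_detect(name, "-")) %>% 
  mutate(
    # Treat generation as a category for grouping and plotting
    generation = factor(generation),
    # Pokémon with no secondary type are single-type
    dual_type = if_else(is.na(type_2), "Single-Type", "Dual-Type"),
    dual_type = factor(dual_type),
    legendary = factor(legendary)
  )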

glimpse(pokemon)
Rows: 747
Columns: 14
$ number     <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
$ name       <chr> "Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Charmele…
$ type_1     <chr> "Grass", "Grass", "Grass", "Fire", "Fire", "Fire", "Water",…
$ type_2     <chr> "Poison", "Poison", "Poison", NA, NA, "Flying", NA, NA, NA,…
$ total      <dbl> 318, 405, 525, 309, 405, 534, 314, 405, 530, 195, 205, 395,…
$ hp         <dbl> 45, 60, 80, 39, 58, 78, 44, 59, 79, 45, 50, 60, 40, 45, 65,…
$ attack     <dbl> 49, 62, 82, 52, 64, 84, 48, 63, 83, 30, 20, 45, 35, 25, 90,…
$ defense    <dbl> 49, 63, 83, 43, 58, 78, 65, 80, 100, 35, 55, 50, 30, 50, 40…
$ sp_atk     <dbl> 65, 80, 100, 60, 80, 109, 50, 65, 85, 20, 25, 90, 20, 25, 4…
$ sp_def     <dbl> 65, 80, 100, 50, 65, 85, 64, 80, 105, 20, 25, 80, 20, 25, 8…
$ speed      <dbl> 45, 60, 80, 65, 80, 100, 43, 58, 78, 45, 30, 70, 50, 35, 75…
$ generation <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ legendary  <fct> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ dual_type  <fct> Dual-Type, Dual-Type, Dual-Type, Single-Type, Single-Type, …

After cleaning, the main variables used in this analysis are:

name: Pokémon name

number: Pokédex number

type_1, type_2: primary and secondary types

hp, attack, defense, sp_atk, sp_def, speed: base stats

total: sum of all six base stats

generation: generation (1–6)

dual_type: Single-Type vs. Dual-Type

legendary: TRUE/FALSE indicator

Data quality checks

To evaluate the quality of the data, I check the overall dimensions of the dataset, missing values, duplicate rows, and ranges of stats.

# Dimensions
dim(pokemon)
[1] 747  14
# Missing values
pokemon %>% summarise(across(everything(), ~sum(is.na(.))))
# A tibble: 1 × 14
  number  name type_1 type_2 total    hp attack defense sp_atk sp_def speed
   <int> <int>  <int>  <int> <int> <int>  <int>   <int>  <int>  <int> <int>
1      0     0      0    374     0     0      0       0      0      0     0
# ℹ 3 more variables: generation <int>, legendary <int>, dual_type <int>
# Duplicated rows
sum(duplicated(pokemon))
[1] 0
# Summary of stat distributions
pokemon %>% select(total, hp, attack, defense, sp_atk, sp_def, speed) %>% summary()
     total             hp             attack          defense      
 Min.   :180.0   Min.   :  1.00   Min.   :  5.00   Min.   :  5.00  
 1st Qu.:325.0   1st Qu.: 50.00   1st Qu.: 55.00   1st Qu.: 50.00  
 Median :430.0   Median : 65.00   Median : 74.00   Median : 67.00  
 Mean   :421.9   Mean   : 68.51   Mean   : 75.66   Mean   : 71.58  
 3rd Qu.:500.0   3rd Qu.: 80.00   3rd Qu.: 95.00   3rd Qu.: 89.50  
 Max.   :720.0   Max.   :255.00   Max.   :180.00   Max.   :230.00  
     sp_atk           sp_def           speed       
 Min.   : 10.00   Min.   : 20.00   Min.   :  5.00  
 1st Qu.: 45.00   1st Qu.: 50.00   1st Qu.: 45.00  
 Median : 65.00   Median : 65.00   Median : 65.00  
 Mean   : 69.82   Mean   : 69.84   Mean   : 66.48  
 3rd Qu.: 90.00   3rd Qu.: 85.00   3rd Qu.: 86.00  
 Max.   :180.00   Max.   :230.00   Max.   :180.00  
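  • The dataset contains 747 rows and the expected 14 columns (the 13 original columns plus the derived dual_type). The original file has 800 rows. Removing the Mega Evolutions and the two Primal forms accounts for most of the drop; the remaining rows are removed by the broad string filters, because the pattern "Mega" also matches the species Meganium and the hyphen filter drops Ho-Oh and Porygon-Z. Other alternate forms are kept, which is why the cleaned data still has more rows than the 721 canonical species.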

  • The only missing values appear in type_2 (374 rows), which is expected because many Pokémon have a single type.

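  • There are no exact duplicate rows, so no entry was copied twice. This does not mean each Pokédex number appears only once: the alternate forms kept in the data share a Pokédex number with their base species.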

  • The ranges of the base stats are consistent with typical Pokémon design and show no obvious data entry errors. (I double-checked several of the outlying values against the official Pokédex to confirm they were correct.)
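The checks above never confirm that the stored total column actually equals the sum of the six base stats. Below is a minimal base R sketch of that check, together with a base R count of missing values. It uses the first 12 rows printed by glimpse() (Bulbasaur through Butterfree) as sample data. The object names are illustrative and do not come from the original analysis.

```r
# Sample data: the first 12 rows printed by glimpse() above
# (Pokédex #1 Bulbasaur through #12 Butterfree)
rows <- data.frame(
  type_2  = c("Poison", "Poison", "Poison", NA, NA, "Flying",
              NA, NA, NA, NA, NA, "Flying"),
  total   = c(318, 405, 525, 309, 405, 534, 314, 405, 530, 195, 205, 395),
  hp      = c(45, 60, 80, 39, 58, 78, 44, 59, 79, 45, 50, 60),
  attack  = c(49, 62, 82, 52, 64, 84, 48, 63, 83, 30, 20, 45),
  defense = c(49, 63, 83, 43, 58, 78, 65, 80, 100, 35, 55, 50),
  sp_atk  = c(65, 80, 100, 60, 80, 109, 50, 65, 85, 20, 25, 90),
  sp_def  = c(65, 80, 100, 50, 65, 85, 64, 80, 105, 20, 25, 80),
  speed   = c(45, 60, 80, 65, 80, 100, 43, 58, 78, 45, 30, 70)
)

stat_cols <- c("hp", "attack", "defense", "sp_atk", "sp_def", "speed")

# Missing values per column (base R counterpart of the summarise() check)
na_counts <- colSums(is.na(rows))

# Row indices whose stored total disagrees with the sum of the six stats
total_mismatch <- which(rows$total != rowSums(rows[stat_cols]))

na_counts
total_mismatch  # integer(0) means every stored total is consistent
```

On the full data, the same test would be which(pokemon$total != rowSums(pokemon[stat_cols])). An empty result would confirm that Total is internally consistent.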

Overall, the dataset was already very clean. My main cleaning steps were renaming columns with clean_names(), removing Mega Evolutions and Primal forms (together with the other names caught by the string filters), converting generation and legendary to factors, and creating the derived dual_type factor.
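The string filters in the import step match a pattern anywhere in a name, so they can catch more than Mega and Primal forms. The short base R sketch below applies grepl() (the base counterpart of str_detect()) to a few illustrative names and compares the broad patterns with narrower ones. The Mega and Primal entries assume that the CSV stores a form as the species name followed by the form name (e.g. "VenusaurMega Venusaur"). That format is an assumption here, not something confirmed above.

```r
# Illustrative names; the Mega/Primal spellings assume the
# "<Species><Form name>" style described in the lead-in (an assumption)
pokemon_names <- c("Venusaur", "VenusaurMega Venusaur", "Meganium", "Yanmega",
                   "GroudonPrimal Groudon", "Ho-Oh", "Porygon-Z")

# The three patterns used in the import step (case-sensitive, like str_detect)
drop_broad <- grepl("Mega", pokemon_names, fixed = TRUE) |
  grepl("Primal", pokemon_names, fixed = TRUE) |
  grepl("-", pokemon_names, fixed = TRUE)

# Narrower patterns: a trailing space only matches the form part of a name
drop_narrow <- grepl("Mega ", pokemon_names, fixed = TRUE) |
  grepl("Primal ", pokemon_names, fixed = TRUE)

pokemon_names[drop_broad]   # also catches Meganium, Ho-Oh and Porygon-Z
pokemon_names[drop_narrow]  # only the Mega and Primal forms
```

Yanmega survives both versions because matching is case-sensitive and its name contains "mega" in lowercase. The narrower patterns are only as reliable as the naming assumption. Listing the removed names, for example with setdiff() between the raw and cleaned name columns, is a quick way to confirm what a filter actually drops.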

Question 1

Is Speed Correlated with Total Base Stats?

Speed is often considered one of the most important battle stats, but it may come at the cost of HP or defense. This question explores whether faster Pokémon are designed to be stronger overall.

scatter <- pokemon %>% 
  ggplot(aes(x = speed, y = total)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    x = "Speed",
    y = "Total Base Stats",
    title = "Speed vs Total Base Stats"
  )
scatter

ggsave("figs copy/rq1_speed_vs_total.png", scatter,
       width = 7, height = 5, dpi = 300)

The scatterplot shows each Pokémon’s Speed and Total base stats, along with a fitted trend line.

Correlation Between Speed and Total

cor_speed_total <- cor(pokemon$speed, pokemon$total)
q1_cor_test <- cor.test(pokemon$speed, pokemon$total)

cor_speed_total
[1] 0.5623069
q1_cor_test

    Pearson's product-moment correlation

data:  pokemon$speed and pokemon$total
t = 18.56, df = 745, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5111938 0.6094564
sample estimates:
      cor 
0.5623069 

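Speed has a moderate positive correlation with Total base stats (r ≈ 0.56, 95% CI 0.51 to 0.61). Pearson's r ranges from -1 to 1, so 0.56 indicates a clear but far from perfect linear relationship; squaring it shows that Speed accounts for about 32% of the variance in Total. The test reports p < 2.2e-16, so the correlation is statistically significant. Because Speed is itself one of the six components of Total, part of this correlation is built in rather than being a design choice. On average, faster Pokémon tend to have higher total stats, although the scatterplot shows that there is still considerable variation.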
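The t statistic and confidence interval in the cor.test() output both follow directly from r and the sample size. The short base R sketch below reproduces them from the printed values. It uses the standard t formula for a correlation and the Fisher z-transformation, which is how cor.test() builds its interval for Pearson's r.

```r
# Values copied from the cor.test() output above
r <- 0.5623069  # sample Pearson correlation
n <- 747        # number of Pokémon after cleaning
dof <- n - 2    # degrees of freedom

# t statistic for the null hypothesis that the true correlation is 0
t_stat <- r * sqrt(dof) / sqrt(1 - r^2)

# 95% confidence interval via Fisher's z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Share of the variance in Total linearly associated with Speed
r_squared <- r^2

round(c(t = t_stat, lower = ci[1], upper = ci[2], r2 = r_squared), 4)
```

The recomputed values match the printed t = 18.56 and the interval 0.511 to 0.609. This is a useful sanity check that the test was run on all 747 rows.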

Correlation Between Speed and Other Stats

speed_cor_table <- pokemon %>% 
  summarise(
    HP = cor(speed, hp),
    Attack = cor(speed, attack),
    Defense = cor(speed, defense),
    Sp.Atk = cor(speed, sp_atk),
    Sp.Def = cor(speed, sp_def)
  ) %>% 
  pivot_longer(everything(), names_to = "stat", values_to = "correlation")

speed_cor_table
# A tibble: 5 × 2
  stat    correlation
  <chr>         <dbl>
1 HP          0.169  
2 Attack      0.347  
3 Defense     0.00436
4 Sp.Atk      0.456  
5 Sp.Def      0.235  

The correlations with individual stats show that Speed is essentially uncorrelated with Defense (0.004) and only weakly correlated with HP (0.169) and Sp. Def (0.235). It is more strongly correlated with the offensive stats, Attack (0.347) and Sp. Atk (0.456). This pattern supports the idea that many fast Pokémon are designed to be offensive threats rather than tanky walls.
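Speed's correlation with every individual stat is below 0.56, yet its correlation with Total is 0.56. Part of the reason is that Total contains Speed itself. The small base R simulation below isolates this part-whole effect. It assumes six independent, normally distributed stats with equal variance, which real Pokémon stats are not, so it only illustrates the mechanism. Under that assumption, cor(speed, total) is about 1/sqrt(6) ≈ 0.41 even though Speed is unrelated to the other five stats, while the correlation with Total minus Speed is about 0.

```r
# Simulate six independent stats with equal variance (an unrealistic
# assumption chosen only to isolate the part-whole effect)
set.seed(1)
n <- 100000
sim <- as.data.frame(matrix(rnorm(n * 6, mean = 70, sd = 25), ncol = 6))
names(sim) <- c("hp", "attack", "defense", "sp_atk", "sp_def", "speed")
sim$total <- rowSums(sim)

# Speed vs a Total that includes Speed: close to 1 / sqrt(6)
cor_with_total <- cor(sim$speed, sim$total)

# Speed vs the sum of the other five stats only: close to 0
cor_with_rest <- cor(sim$speed, sim$total - sim$speed)

round(c(with_total = cor_with_total, with_rest = cor_with_rest,
        theory = 1 / sqrt(6)), 3)
```

On the real data, the comparable check is cor(pokemon$speed, pokemon$total - pokemon$speed), which removes the built-in share. Its value is not reported in this analysis.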

Question 2

Have Pokémon Become Stronger Across Generations?

If Pokémon have experienced “power creep,” we would expect later generations to have higher total base stats on average. To explore this, I compared total base stats across Generations I–VI and analyzed how each individual stat has changed over time.

violin <- pokemon %>%  
  ggplot(aes(x = generation, y = total)) +
  geom_violin(fill = "grey") +
  labs(
    title = "Total Base Stats Across Generations",
    x = "Generation", 
    y = "Total Base Stats"
  )

violin

ggsave("figs copy/Violin_Total_Base_Stats_Across_Generations.png", violin,
       width = 7, height = 5, dpi = 300)
boxp <- pokemon %>%  
  ggplot(aes(x = generation, y = total)) +
  geom_boxplot(width = 0.2) +
  labs(
    title = "Total Base Stats Across Generations",
    x = "Generation", 
    y = "Total Base Stats"
  )

boxp

ggsave("figs copy/Box_Total_Base_Stats_Across_Generations.png", boxp,
       width = 7, height = 5, dpi = 300)

The violin and box plots show how Total base stats vary across generations. In the violin plot, wider sections indicate ranges that contain more Pokémon. Later generations, especially Generations V and VI, have noticeably wider areas in the higher Total ranges.

The boxplot highlights the median and interquartile range. The gradual upward shift in medians and upper quartiles from earlier to later generations indicates that the typical Pokémon has become slightly stronger over time. Although strong and weak Pokémon exist in every generation, later generations contain a higher concentration of Pokémon with higher Total base stats. This supports the idea of mild power creep across Generations I–VI.

Mean Stats by Generation

mean_stats <- pokemon %>% 
  group_by(generation) %>% 
  summarise(
    mean_hp = mean(hp),
    mean_attack = mean(attack),
    mean_defense = mean(defense),
    mean_sp_atk = mean(sp_atk),
    mean_sp_def = mean(sp_def),
    mean_speed = mean(speed),
    mean_total = mean(total)
  )

mean_stats
# A tibble: 6 × 8
  generation mean_hp mean_attack mean_defense mean_sp_atk mean_sp_def mean_speed
  <fct>        <dbl>       <dbl>        <dbl>       <dbl>       <dbl>      <dbl>
1 1             64.2        72.5         68.2        67.1        66.0       68.9
2 2             70.1        67.5         69.0        63.6        71.0       60.9
3 3             65.0        74.0         69.1        68.5        66.8       63.0
4 4             72.7        80.0         77.3        74.8        76.4       70.1
5 5             71.6        82.2         72          71.9        68.4       68.2
6 6             68.5        74.8         76.3        73.2        74.5       65.9
# ℹ 1 more variable: mean_total <dbl>
line <- mean_stats %>%  
  select(-mean_total) %>% 
  pivot_longer(
    cols = -generation,
    names_to = "stat",
    values_to = "mean_value"
  ) %>% 
  ggplot(aes(x = generation, y = mean_value, group = stat, color = stat)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  labs(
    title = "Change in Average Base Stats Across Generations",
    x = "Generation", 
    y = "Mean stat value",
    color = "Stat"
  )

line

ggsave("figs copy/Average_base_stats.png", line,
       width = 7, height = 5, dpi = 300)

The mean stat table and line graph together show how individual base stats have changed across generations. Comparing Generation I with Generation VI, the defensive stats show the largest net gains (Defense +8.1, Sp. Def +8.5, HP +4.3). Sp. Atk rises by about 6 and Attack by about 2, while Speed falls slightly (-3.0). Attack is the most volatile stat: it dips in Generation II, peaks in Generations IV (80.0) and V (82.2), and then falls back to 74.8 in Generation VI. The line graph also highlights other notable points. Generation II has the lowest mean Speed (60.9), Generation V has the highest mean Attack (82.2), and Generation IV has the highest mean for every other stat.

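Together, these patterns suggest that Pokémon have become slightly stronger across generations. Because Total is the sum of the six stats, adding the six means in each row of the table gives the mean Total: about 407, 402, and 406 for Generations I–III, about 451 for Generation IV, and about 434 and 433 for Generations V and VI. On average, then, the increase arrives mostly as a jump at Generation IV rather than as a steady climb, and it is spread across defensive as well as offensive stats.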
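The printed tibble hides the mean_total column. However, Total is the sum of the six stats, and the mean of a sum equals the sum of the means, so each generation's mean Total can be recovered (up to rounding) by adding its six displayed means. The minimal base R sketch below does this using the rounded values copied from the mean_stats table. It also computes each stat's net change from Generation I to Generation VI.

```r
# Rounded generation means copied from the mean_stats table above
mean_stats <- data.frame(
  generation = 1:6,
  hp      = c(64.2, 70.1, 65.0, 72.7, 71.6, 68.5),
  attack  = c(72.5, 67.5, 74.0, 80.0, 82.2, 74.8),
  defense = c(68.2, 69.0, 69.1, 77.3, 72.0, 76.3),
  sp_atk  = c(67.1, 63.6, 68.5, 74.8, 71.9, 73.2),
  sp_def  = c(66.0, 71.0, 66.8, 76.4, 68.4, 74.5),
  speed   = c(68.9, 60.9, 63.0, 70.1, 68.2, 65.9)
)
stat_cols <- c("hp", "attack", "defense", "sp_atk", "sp_def", "speed")

# Mean of a sum = sum of the means, so each row sum approximates mean Total
approx_mean_total <- rowSums(mean_stats[stat_cols])

# Net change from Generation I (row 1) to Generation VI (row 6) for each stat
change_1_to_6 <- unlist(mean_stats[6, stat_cols]) - unlist(mean_stats[1, stat_cols])

round(approx_mean_total, 1)
round(change_1_to_6, 1)
```

Because the inputs are rounded to one decimal place, each recovered total can be off by a few tenths. The exact values are in mean_stats$mean_total in the original tibble.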

Question 3

Are Dual-Type Pokémon Stronger Than Single-Type Pokémon?

Dual typing can give Pokémon more coverage and defensive options. This question tests whether dual-type Pokémon also tend to have higher total stats.

boxp2 <- pokemon %>% 
  ggplot(aes(x = dual_type, y = total, fill = dual_type)) +
  geom_boxplot(show.legend = FALSE) +
  labs(
    title = "Single Type vs Dual Type Pokémon",
    x = "Type Category",
    y = "Total Base Stats"
  )

boxp2

ggsave("figs copy/single_duel_type.png", boxp2,
       width = 7, height = 5, dpi = 300)
pokemon %>%  
  group_by(dual_type) %>% 
  summarise(
    mean_total = mean(total),
    median_total = median(total),
    n = n()
  )
# A tibble: 2 × 4
  dual_type   mean_total median_total     n
  <fct>            <dbl>        <dbl> <int>
1 Dual-Type         438.          460   373
2 Single-Type       406.          405   374

The boxplot shows that dual-type Pokémon have a higher median Total than single-type Pokémon (460 vs. 405). The upper quartile is also noticeably higher for dual-types, suggesting that strong Pokémon are more common among them. This upward shift of the whole distribution indicates that dual-type Pokémon are generally designed with higher overall stat totals.

means <- pokemon %>%
  group_by(dual_type) %>%
  summarise(mean_total = mean(total))

percent_difference <- (means$mean_total[means$dual_type == "Dual-Type"] - means$mean_total[means$dual_type == "Single-Type"]) / means$mean_total[means$dual_type == "Single-Type"] * 100
percent_difference
[1] 8.023173
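The percent difference describes the size of the gap but not whether it is statistically significant, which is what Q3 asks. One option is Welch's two-sample t-test via base R's t.test(), which does not assume equal variances. The sketch below wraps it in a hypothetical helper, compare_type_groups(), and demonstrates it on the first 12 Pokémon shown by glimpse() (Bulbasaur through Butterfree). Those 12 rows only demonstrate the call and say nothing about the full dataset.

```r
# Hypothetical helper: compare Total between dual-type and single-type groups
compare_type_groups <- function(total, dual_type) {
  dual <- total[dual_type == "Dual-Type"]
  single <- total[dual_type == "Single-Type"]
  # Welch's two-sample t-test (t.test() defaults to var.equal = FALSE)
  test <- t.test(dual, single)
  list(
    mean_dual = mean(dual),
    mean_single = mean(single),
    median_dual = median(dual),
    median_single = median(single),
    p_value = test$p.value
  )
}

# Demo data: Total and type category for the first 12 rows shown by glimpse()
# (far too few rows to draw any conclusion)
total <- c(318, 405, 525, 309, 405, 534, 314, 405, 530, 195, 205, 395)
dual_type <- c(rep("Dual-Type", 3), rep("Single-Type", 2), "Dual-Type",
               rep("Single-Type", 5), "Dual-Type")

res <- compare_type_groups(total, dual_type)
str(res)
```

On the full data, the call would be compare_type_groups(pokemon$total, pokemon$dual_type). Its result is not reported in this analysis. wilcox.test() is a rank-based alternative if the skew in Total is a concern.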
type_strength <- pokemon %>%
  mutate(type_combo = if_else(is.na(type_2), type_1, paste(type_1, type_2, sep = "/"))) %>%
  group_by(type_combo) %>%
  summarise(
    mean_total = mean(total),
    n = n()
  ) %>%
  arrange(desc(mean_total))

type_strength
# A tibble: 149 × 3
   type_combo      mean_total     n
   <chr>                <dbl> <int>
 1 Dragon/Ice            687.     3
 2 Dragon/Electric       680      1
 3 Dragon/Fire           680      1
 4 Ghost/Dragon          680      2
 5 Psychic/Dark          680      1
 6 Steel/Dragon          680      1
 7 Water/Dragon          610      2
 8 Dragon/Psychic        600      2
 9 Fire/Steel            600      1
10 Fire/Water            600      1
# ℹ 139 more rows
offense <- pokemon %>%
  mutate(type_combo = if_else(is.na(type_2), type_1, paste(type_1, type_2, sep = "/"))) %>%
  group_by(type_combo) %>%
  summarise(
    mean_attack = mean(attack),
    mean_sp_atk = mean(sp_atk),
    offense_score = mean_attack + mean_sp_atk,
    n = n()
  ) %>%
  arrange(desc(offense_score))

offense
# A tibble: 149 × 5
   type_combo      mean_attack mean_sp_atk offense_score     n
   <chr>                 <dbl>       <dbl>         <dbl> <int>
 1 Psychic/Dark           160         170           330      1
 2 Dragon/Ice             140         140           280      3
 3 Dragon/Electric        150         120           270      1
 4 Dragon/Fire            120         150           270      1
 5 Steel/Dragon           120         150           270      1
 6 Psychic/Ghost          110         150           260      1
 7 Fire/Water             110         130           240      1
 8 Water/Dragon           108.        122.          230      2
 9 Dragon/Flying          122.        108.          230.     4
10 Rock/Dark              134          95           229      1
# ℹ 139 more rows
defense <- pokemon %>%
  mutate(type_combo = if_else(is.na(type_2), type_1, paste(type_1, type_2, sep = "/"))) %>%
  group_by(type_combo) %>%
  summarise(
    mean_hp = mean(hp),
    mean_defense = mean(defense),
    mean_sp_def = mean(sp_def),
    defense_score = mean_hp + mean_defense + mean_sp_def,
    n = n()
  ) %>%
  arrange(desc(defense_score))

defense
# A tibble: 149 × 6
   type_combo      mean_hp mean_defense mean_sp_def defense_score     n
   <chr>             <dbl>        <dbl>       <dbl>         <dbl> <int>
 1 Ghost/Dragon      150          110         110            370      2
 2 Rock/Fairy         50          150         150            350      2
 3 Steel/Ground       75          200          65            340      1
 4 Dragon/Electric   100          120         100            320      1
 5 Dragon/Fire       100          100         120            320      1
 6 Steel/Dragon      100          120         100            320      1
 7 Rock/Steel         50          144.        125.           319      3
 8 Dragon/Ice        125           93.3        93.3          312.     3
 9 Rock/Dark         100          110         100            310      1
10 Bug/Rock           46.7        147.        113.           307.     3
# ℹ 139 more rows

On average, dual-type Pokémon have approximately 8% higher Total base stats than single-type Pokémon. Interestingly, 7 of the 8 strongest type combinations include the Dragon type, with Dragon/Ice the strongest. The weakest combination appears to be Bug/Ghost, and the Bug type appears in 5 of the 8 weakest combinations. These rankings should be read with caution: each of the top 10 combinations is represented by at most 3 Pokémon, and several by a single Pokémon, so they largely reflect individual Pokémon rather than the type pairing itself.

I also ranked type combinations offensively and defensively: the offense score sums mean Attack and Sp. Atk, and the defense score sums mean HP, Defense, and Sp. Def. The strongest offensive combination is Psychic/Dark and the strongest defensive combination is Ghost/Dragon, although both rest on very few Pokémon (n = 1 and n = 2).
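The type_combo label used in these tables always puts type_1 first, so the same pair of types in a different order becomes a separate group. For example, Omanyte (Rock/Water) and Corsola (Water/Rock) are counted in different groups. The minimal base R sketch below builds an order-independent key with a hypothetical helper, combo_key(), which sorts the two types alphabetically and leaves single types unchanged.

```r
# Hypothetical helper: order-independent type-combination key
combo_key <- function(type_1, type_2) {
  vapply(seq_along(type_1), function(i) {
    # Single-type Pokémon keep their one type as the key
    if (is.na(type_2[i])) return(type_1[i])
    # Sort the pair so "Water/Rock" and "Rock/Water" give the same key
    paste(sort(c(type_1[i], type_2[i])), collapse = "/")
  }, character(1))
}

type_1 <- c("Rock", "Water", "Ghost", "Fire")
type_2 <- c("Water", "Rock", "Dragon", NA)
keys <- combo_key(type_1, type_2)
keys
```

In the dplyr pipelines above, this key could replace the paste() call, e.g. mutate(type_combo = combo_key(type_1, type_2)). The number of groups, and some of the means, would then change.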

Conclusion

This project used a cleaned dataset of 747 Pokémon from Generations I–VI to explore how Speed, generation, and typing relate to overall Pokémon strength. First, Speed showed a moderate positive correlation with total base stats (r ≈ 0.56), and faster Pokémon tend to have higher offensive stats in particular, consistent with their roles as sweepers or high-tempo attackers. Second, the generational comparison showed modest but noticeable power creep: later generations contain slightly stronger Pokémon on average, with the largest jump at Generation IV and gains in defensive as well as offensive stats. Finally, dual-type Pokémon had higher total base stats than single-type Pokémon on average (mean 438 vs. 406, about 8% higher; median 460 vs. 405).

Together, these findings show that Pokémon stat distributions are not random: they follow predictable design patterns shaped by gameplay balance, team roles, and the desire to keep each new generation fresh.