library(tidyverse)
library(janitor)
Pokémon Stats: What Makes a Strong Pokémon?
Introduction
In the Pokémon games, each Pokémon is assigned a set of base stats that show how strong it is designed to be in battle. These stats include HP, Attack, Defense, Special Attack, Special Defense, and Speed. Game designers can use these stats to create glass cannons, bulky walls, or fast utility Pokémon.
In this project I focus on Pokémon from Generations I–VI (747 Pokémon total) and treat the Total base stat score (the sum of all six stats) as a measure of overall strength. I use this dataset to answer three questions:
Q1: Is there a correlation between a Pokémon’s Speed and its total base stat score? Do faster Pokémon tend to be stronger overall, or do higher Speed values come at the cost of other stats such as HP and Defense?
Q2: Have Pokémon become stronger over time, i.e., do newer generations exhibit higher total base stats than earlier generations?
Q3: Are dual-type Pokémon statistically stronger than single-type Pokémon based on their total base stat scores?
Together, these questions help describe which characteristics are most associated with “strong” Pokémon and whether that has changed across generations.
Data, Licensing, and Ethics
The data for this project comes from the Kaggle dataset “The Complete Pokémon Dataset (Gen 1–6)” (pokemon.csv). Each row corresponds to a single Pokémon and includes its name, Pokédex number, generation, whether it is Legendary, its primary and secondary type, and all six base stats.
Unlike the original Pokédex, this dataset includes:
- Base species
- Alternate forms
- Mega Evolutions
- Primal forms
These additional forms inflate the dataset beyond the canonical 721 species from Generations I–VI.
Because the goal of this project is to analyze standard Pokémon design, I removed:
- Mega Evolutions
- Primal Forms
Key points:
Source: Kaggle – “The Complete Pokémon Dataset (Gen 1–6)” (pokemon.csv). https://www.kaggle.com/datasets/abcsds/pokemon
License: The dataset is released under the MIT License, which is a permissive open-source license. It allows reuse, modification, and redistribution, including for commercial purposes, as long as the copyright and license notice are retained and the original source is credited.
Because this dataset consists entirely of fictional characters, there are no privacy or personal data concerns. The dataset is provided directly as a downloadable CSV file on Kaggle, so using it does not violate any terms of service. Overall, the licensing is clear and the dataset is safe to use.
Data Import, Cleaning, and Quality Checks
I store the CSV file in the project's data folder (named "data copy" in the code below) and use read_csv() to import it. I then use janitor's clean_names() function to convert column names to snake_case, which makes the code cleaner and more reproducible.
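pokemon <- read_csv("data copy/pokemon.csv") %>%
  clean_names() %>%
  # Drop Mega Evolutions and Primal forms by substring match on the name.
  # Note: "Mega" also matches species names that start with it, such as "Meganium".
  filter(!str_detect(name, "Mega")) %>%
  filter(!str_detect(name, "Primal")) %>%
  # Drop any remaining name that contains a hyphen
  filter(!str_detect(name, "-")) %>%
  mutate(
    generation = factor(generation),
    # A missing secondary type marks a single-type Pokémon
    dual_type = if_else(is.na(type_2), "Single-Type", "Dual-Type"),
    dual_type = factor(dual_type),
    legendary = factor(legendary)
  )
glimpse(pokemon)
Rows: 747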
Columns: 14
$ number <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
$ name <chr> "Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Charmele…
$ type_1 <chr> "Grass", "Grass", "Grass", "Fire", "Fire", "Fire", "Water",…
$ type_2 <chr> "Poison", "Poison", "Poison", NA, NA, "Flying", NA, NA, NA,…
$ total <dbl> 318, 405, 525, 309, 405, 534, 314, 405, 530, 195, 205, 395,…
$ hp <dbl> 45, 60, 80, 39, 58, 78, 44, 59, 79, 45, 50, 60, 40, 45, 65,…
$ attack <dbl> 49, 62, 82, 52, 64, 84, 48, 63, 83, 30, 20, 45, 35, 25, 90,…
$ defense <dbl> 49, 63, 83, 43, 58, 78, 65, 80, 100, 35, 55, 50, 30, 50, 40…
$ sp_atk <dbl> 65, 80, 100, 60, 80, 109, 50, 65, 85, 20, 25, 90, 20, 25, 4…
$ sp_def <dbl> 65, 80, 100, 50, 65, 85, 64, 80, 105, 20, 25, 80, 20, 25, 8…
$ speed <dbl> 45, 60, 80, 65, 80, 100, 43, 58, 78, 45, 30, 70, 50, 35, 75…
$ generation <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ legendary <fct> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ dual_type <fct> Dual-Type, Dual-Type, Dual-Type, Single-Type, Single-Type, …
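The substring patterns in the filter above are broad: "Mega" matches any name containing those letters, and "-" matches any hyphenated name. A minimal sketch of a narrower filter follows. It assumes alternate forms are written as the species name fused to a form label followed by a space (for example "VenusaurMega Venusaur"). The toy names below are illustrative and have not been checked row by row against the CSV.

```r
library(dplyr)
library(stringr)

# Toy names for illustration only; the fused "SpeciesMega Species" format is an assumption
toy <- data.frame(
  name = c("Venusaur", "VenusaurMega Venusaur", "Meganium",
           "KyogrePrimal Kyogre", "Ho-oh"),
  stringsAsFactors = FALSE
)

# Match the form label plus its trailing space, so names that merely
# contain "Mega" or a hyphen are kept
form_pattern <- "(Mega|Primal) "

kept <- toy %>%
  filter(!str_detect(name, form_pattern))

kept$name
```

Applying a pattern like this to the real data would change the row count, so the dimension check should be rerun afterwards.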
After cleaning, the main variables used in this analysis are:
name: Pokémon name
number: Pokédex number
type_1, type_2: primary and secondary types
hp, attack, defense, sp_atk, sp_def, speed: base stats
total: sum of all six base stats
generation: generation (1–6)
dual_type: Single-Type vs Dual-Type
legendary: TRUE/FALSE indicator
Data quality checks
To evaluate the quality of the data, I check the overall dimensions of the dataset, missing values, duplicate rows, and ranges of stats.
# Dimensions
dim(pokemon)
[1] 747  14
# Missing values
pokemon %>% summarise(across(everything(), ~sum(is.na(.))))
# A tibble: 1 × 14
number name type_1 type_2 total hp attack defense sp_atk sp_def speed
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 0 0 0 374 0 0 0 0 0 0 0
# ℹ 3 more variables: generation <int>, legendary <int>, dual_type <int>
# Duplicated rows
sum(duplicated(pokemon))
[1] 0
# Summary of stat distributions
pokemon %>% select(total, hp, attack, defense, sp_atk, sp_def, speed) %>% summary()
     total            hp              attack           defense
Min. :180.0 Min. : 1.00 Min. : 5.00 Min. : 5.00
1st Qu.:325.0 1st Qu.: 50.00 1st Qu.: 55.00 1st Qu.: 50.00
Median :430.0 Median : 65.00 Median : 74.00 Median : 67.00
Mean :421.9 Mean : 68.51 Mean : 75.66 Mean : 71.58
3rd Qu.:500.0 3rd Qu.: 80.00 3rd Qu.: 95.00 3rd Qu.: 89.50
Max. :720.0 Max. :255.00 Max. :180.00 Max. :230.00
sp_atk sp_def speed
Min. : 10.00 Min. : 20.00 Min. : 5.00
1st Qu.: 45.00 1st Qu.: 50.00 1st Qu.: 45.00
Median : 65.00 Median : 65.00 Median : 65.00
Mean : 69.82 Mean : 69.84 Mean : 66.48
3rd Qu.: 90.00 3rd Qu.: 85.00 3rd Qu.: 86.00
Max. :180.00 Max. :230.00 Max. :180.00
The dataset contains 747 rows and the expected 14 columns. The row count is also expected: the original dataset has 800 rows, and the name filters removed the Mega Evolutions, the Primal forms, and the names containing a hyphen.
The only missing values appear in type_2, which is expected because many Pokémon are not dual-type.
There are no duplicated rows, which suggests that each Pokémon appears only once.
The ranges of the base stats are consistent with typical Pokémon design and do not show any obvious data entry errors. (I double-checked some of the outlier stats against the official Pokédex to confirm the values were correct.)
Overall, the dataset is already very clean. My main cleaning steps were renaming columns with clean_names(), removing the Mega and Primal versions of Pokémon, and creating the derived factor variables generation, dual_type, and legendary.
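One further check that fits here is confirming that total really equals the sum of the six base stats. The sketch below uses two toy rows (the second is deliberately inconsistent) instead of the project data. The column names follow the cleaned dataset, while total_check and mismatches are names introduced only for this illustration.

```r
# Toy rows for illustration; the real check would run on the `pokemon` data frame
toy <- data.frame(
  name    = c("A", "B"),
  total   = c(318, 400),
  hp      = c(45, 60),
  attack  = c(49, 62),
  defense = c(49, 63),
  sp_atk  = c(65, 80),
  sp_def  = c(65, 80),
  speed   = c(45, 60),
  stringsAsFactors = FALSE
)

stat_cols <- c("hp", "attack", "defense", "sp_atk", "sp_def", "speed")

# Recompute the total from the six stats and keep only rows that disagree
toy$total_check <- rowSums(toy[, stat_cols])
mismatches <- toy[toy$total != toy$total_check, c("name", "total", "total_check")]
mismatches
```

An empty result on the real data would indicate that the total column is internally consistent.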
Question 1
Correlation Between Speed and Total
cor_speed_total <- cor(pokemon$speed, pokemon$total)
q1_cor_test <- cor.test(pokemon$speed, pokemon$total)
cor_speed_total
[1] 0.5623069
q1_cor_test
Pearson's product-moment correlation
data: pokemon$speed and pokemon$total
t = 18.56, df = 745, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.5111938 0.6094564
sample estimates:
cor
0.5623069
Speed is positively correlated with Total base stats (r = 0.56, 95% confidence interval 0.51 to 0.61). Correlation coefficients fall between -1 and 1, so a value of 0.56 indicates a moderate positive relationship. The relationship is statistically significant: the test reports p < 2.2e-16. On average, faster Pokémon tend to have higher total stats, although Speed accounts for only about 32% of the variance in Total (r² ≈ 0.32), so there is still much variation. Because Speed is itself one of the six stats summed into Total, part of this correlation is built in.
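Since Speed is one of the six components of Total, correlating the two partly correlates Speed with itself. One way to gauge that overlap, sketched here as an assumption and not as part of the original analysis, is to correlate Speed with the remainder of the total. The data below are simulated purely to show the mechanics and are not the project's values.

```r
# Simulated data for illustration only; these are not the project's values
set.seed(1)
n <- 200
speed <- round(runif(n, 20, 120))
other <- round(runif(n, 150, 450))  # stand-in for the five non-Speed stats combined
toy <- data.frame(speed = speed, total = speed + other)

# Part-whole correlation: Speed is inside Total, so some overlap is built in
r_with_total <- cor(toy$speed, toy$total)

# Correlation with the remainder, which removes the built-in overlap
r_with_rest <- cor(toy$speed, toy$total - toy$speed)

c(with_total = r_with_total, with_rest = r_with_rest)
```

On the real data the equivalent call would be cor(pokemon$speed, pokemon$total - pokemon$speed).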
Correlation Between Speed and Other Stats
speed_cor_table <- pokemon %>%
  summarise(
    HP = cor(speed, hp),
    Attack = cor(speed, attack),
    Defense = cor(speed, defense),
    Sp.Atk = cor(speed, sp_atk),
    Sp.Def = cor(speed, sp_def)
  ) %>%
  # Reshape from one wide row to one row per stat
  pivot_longer(everything(), names_to = "stat", values_to = "correlation")
speed_cor_table
# A tibble: 5 × 2
stat correlation
<chr> <dbl>
1 HP 0.169
2 Attack 0.347
3 Defense 0.00436
4 Sp.Atk 0.456
5 Sp.Def 0.235
Looking at individual stats, Speed's correlation is much weaker with HP (0.17) and Defense (0.004) than with Attack (0.35) and Special Attack (0.46). This pattern supports the idea that many fast Pokémon are designed to be offensive threats rather than tanky walls.
Question 2
Have Pokémon Become Stronger Across Generations?
If Pokémon have experienced “power creep,” we would expect later generations to have higher total base stats on average. To explore this, I compared total base stats across Generations I–VI and analyzed how each individual stat has changed over time.
violin <- pokemon %>%
  ggplot(aes(x = generation, y = total)) +
  geom_violin(fill = "grey") +
  labs(
    title = "Total Base Stats Across Generations",
    x = "Generation",
    y = "Total Base Stats"
  )
violin
ggsave("figs copy/Violin_Total_Base_Stats_Across_Generations.png", violin,
       width = 7, height = 5, dpi = 300)
boxp <- pokemon %>%
  ggplot(aes(x = generation, y = total)) +
  geom_boxplot(width = 0.2) +
  labs(
    title = "Total Base Stats Across Generations",
    x = "Generation",
    y = "Total Base Stats"
  )
boxp
ggsave("figs copy/Box_Total_Base_Stats_Across_Generations.png", boxp,
       width = 7, height = 5, dpi = 300)
The violin and box plots show how Total base stats vary across generations. The violin plot shows the full distribution of totals, where wider sections mark ranges containing more Pokémon. Later generations, especially Generations V and VI, have noticeably wider areas in the higher total stat ranges.
The boxplot highlights the median and interquartile range. The gradual upward shift in medians and upper quartiles from earlier to later generations indicates that the typical Pokémon has become slightly stronger over time. Although strong and weak Pokémon exist in every generation, later generations contain a higher concentration of Pokémon with higher Total base stats. This supports the idea of mild power creep across Generations I–VI.
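The plots describe the shift visually. A formal test could complement them. The sketch below runs a Kruskal-Wallis rank-sum test, chosen here as an assumption because it does not require normally distributed totals; it is not part of the original analysis. The data are simulated only so that the block runs on its own.

```r
# Simulated totals for illustration only; replace `toy` with the `pokemon` data frame
set.seed(42)
toy <- data.frame(
  generation = factor(rep(1:6, each = 30)),
  total      = round(rnorm(180, mean = 420, sd = 100))
)

# Kruskal-Wallis rank-sum test: do the total distributions differ across generations?
kw <- kruskal.test(total ~ generation, data = toy)
kw$statistic
kw$parameter  # degrees of freedom: number of groups minus 1
kw$p.value
```

A small p-value on the real data would indicate that at least one generation's distribution differs, without saying which one or in which direction.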
Mean Stats by Generation
mean_stats <- pokemon %>%
  group_by(generation) %>%
  summarise(
    mean_hp = mean(hp),
    mean_attack = mean(attack),
    mean_defense = mean(defense),
    mean_sp_atk = mean(sp_atk),
    mean_sp_def = mean(sp_def),
    mean_speed = mean(speed),
    mean_total = mean(total)
  )
mean_stats
# A tibble: 6 × 8
generation mean_hp mean_attack mean_defense mean_sp_atk mean_sp_def mean_speed
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 64.2 72.5 68.2 67.1 66.0 68.9
2 2 70.1 67.5 69.0 63.6 71.0 60.9
3 3 65.0 74.0 69.1 68.5 66.8 63.0
4 4 72.7 80.0 77.3 74.8 76.4 70.1
5 5 71.6 82.2 72 71.9 68.4 68.2
6 6 68.5 74.8 76.3 73.2 74.5 65.9
# ℹ 1 more variable: mean_total <dbl>
line <- mean_stats %>%
  select(-mean_total) %>%
  pivot_longer(
    cols = -generation,
    names_to = "stat",
    values_to = "mean_value"
  ) %>%
  ggplot(aes(x = generation, y = mean_value, group = stat, color = stat)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  labs(
    title = "Change in Average Base Stats Across Generations",
    x = "Generation",
    y = "Mean stat value",
    color = "Stat"
  )
line
ggsave("figs copy/Average_base_stats.png", line,
       width = 7, height = 5, dpi = 300)
The mean stat table and line graph together show how individual base stats have changed across generations. Comparing Generation I with Generation VI, the largest gains are in Special Defense (66.0 to 74.5) and Defense (68.2 to 76.3), followed by Special Attack (67.1 to 73.2) and HP (64.2 to 68.5). Attack ends only slightly higher (72.5 to 74.8), and Speed is slightly lower (68.9 to 65.9). The changes are not steady: most stats peak in Generation IV, and the line graph makes the more volatile swings in Attack easy to see. It also highlights trends such as Generation II having the lowest average Speed (60.9) and Generation V having the highest mean Attack (82.2).
Together, these patterns suggest that Pokémon have become slightly stronger across generations, but through uneven, stat-specific gains rather than a steady increase in every stat.
Question 3
Are Dual-Type Pokémon Stronger Than Single-Type Pokémon?
Dual typing can give Pokémon more coverage and defensive options. This question tests whether dual-type Pokémon also tend to have higher total stats.
boxp2 <- pokemon %>%
  ggplot(aes(x = dual_type, y = total, fill = dual_type)) +
  geom_boxplot(show.legend = FALSE) +
  labs(
    title = "Single Type vs Dual Type Pokémon",
    x = "Type Category",
    y = "Total base stats"
  )
boxp2
ggsave("figs copy/single_duel_type.png", boxp2,
       width = 7, height = 5, dpi = 300)
pokemon %>%
  group_by(dual_type) %>%
  summarise(
    mean_total = mean(total),
    median_total = median(total),
    n = n()
  )
# A tibble: 2 × 4
dual_type mean_total median_total n
<fct> <dbl> <dbl> <int>
1 Dual-Type 438. 460 373
2 Single-Type 406. 405 374
The boxplot shows that dual-type Pokémon tend to have higher median total stats than single-type Pokémon. The upper quartile is also noticeably higher for dual-types, suggesting that strong Pokémon are more common among them. The overall shift upward in the distribution indicates that dual-type Pokémon are generally designed with higher overall stat totals.
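The comparison above is descriptive. To judge whether a difference of this size could arise by chance, a two-sample test is one option. The sketch below uses Welch's t-test, which is an assumption on my part and not something the original analysis ran. The data are simulated placeholders so the block is self-contained.

```r
# Simulated totals for illustration only; replace `toy` with the `pokemon` data frame
set.seed(7)
toy <- data.frame(
  dual_type = factor(rep(c("Dual-Type", "Single-Type"), each = 50)),
  total     = round(c(rnorm(50, mean = 440, sd = 110),
                      rnorm(50, mean = 405, sd = 110)))
)

# t.test() with a formula runs Welch's unequal-variance test by default
tt <- t.test(total ~ dual_type, data = toy)
tt$estimate  # group means
tt$conf.int  # 95% confidence interval for the difference in means
tt$p.value
```

If the totals look strongly skewed, wilcox.test(total ~ dual_type, data = toy) is a rank-based alternative with the same formula interface.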
means <- pokemon %>%
  group_by(dual_type) %>%
  summarise(mean_total = mean(total))
dual_mean <- means$mean_total[means$dual_type == "Dual-Type"]
single_mean <- means$mean_total[means$dual_type == "Single-Type"]
# Percent by which the dual-type mean exceeds the single-type mean
percent_difference <- (dual_mean - single_mean) / single_mean * 100
percent_difference
[1] 8.023173
type_strength <- pokemon %>%
  mutate(type_combo = if_else(is.na(type_2), type_1, paste(type_1, type_2, sep = "/"))) %>%
  group_by(type_combo) %>%
  summarise(
    mean_total = mean(total),
    n = n()
  ) %>%
  arrange(desc(mean_total))
type_strength
# A tibble: 149 × 3
type_combo mean_total n
<chr> <dbl> <int>
1 Dragon/Ice 687. 3
2 Dragon/Electric 680 1
3 Dragon/Fire 680 1
4 Ghost/Dragon 680 2
5 Psychic/Dark 680 1
6 Steel/Dragon 680 1
7 Water/Dragon 610 2
8 Dragon/Psychic 600 2
9 Fire/Steel 600 1
10 Fire/Water 600 1
# ℹ 139 more rows
offense <- pokemon %>%
  mutate(type_combo = if_else(is.na(type_2), type_1, paste(type_1, type_2, sep = "/"))) %>%
  group_by(type_combo) %>%
  summarise(
    mean_attack = mean(attack),
    mean_sp_atk = mean(sp_atk),
    offense_score = mean_attack + mean_sp_atk,
    n = n()
  ) %>%
  arrange(desc(offense_score))
offense
# A tibble: 149 × 5
type_combo mean_attack mean_sp_atk offense_score n
<chr> <dbl> <dbl> <dbl> <int>
1 Psychic/Dark 160 170 330 1
2 Dragon/Ice 140 140 280 3
3 Dragon/Electric 150 120 270 1
4 Dragon/Fire 120 150 270 1
5 Steel/Dragon 120 150 270 1
6 Psychic/Ghost 110 150 260 1
7 Fire/Water 110 130 240 1
8 Water/Dragon 108. 122. 230 2
9 Dragon/Flying 122. 108. 230. 4
10 Rock/Dark 134 95 229 1
# ℹ 139 more rows
defense <- pokemon %>%
  mutate(type_combo = if_else(is.na(type_2), type_1, paste(type_1, type_2, sep = "/"))) %>%
  group_by(type_combo) %>%
  summarise(
    mean_hp = mean(hp),
    mean_defense = mean(defense),
    mean_sp_def = mean(sp_def),
    defense_score = mean_hp + mean_defense + mean_sp_def,
    n = n()
  ) %>%
  arrange(desc(defense_score))
defense
# A tibble: 149 × 6
type_combo mean_hp mean_defense mean_sp_def defense_score n
<chr> <dbl> <dbl> <dbl> <dbl> <int>
1 Ghost/Dragon 150 110 110 370 2
2 Rock/Fairy 50 150 150 350 2
3 Steel/Ground 75 200 65 340 1
4 Dragon/Electric 100 120 100 320 1
5 Dragon/Fire 100 100 120 320 1
6 Steel/Dragon 100 120 100 320 1
7 Rock/Steel 50 144. 125. 319 3
8 Dragon/Ice 125 93.3 93.3 312. 3
9 Rock/Dark 100 110 100 310 1
10 Bug/Rock 46.7 147. 113. 307. 3
# ℹ 139 more rows
On average, dual-type Pokémon have approximately 8% higher Total base stats than single-type Pokémon. Interestingly, 7 of the 8 strongest type combinations include the Dragon type, with Dragon/Ice ranking highest. The weakest combination appears to be Bug/Ghost, and the Bug type appears in 5 of the 8 weakest combinations. These rankings should be read with caution, because most of the top combinations contain only one to three Pokémon (see the n column).
I also looked for the best type combinations offensively and defensively. The offense score sums mean Attack and Sp. Atk, and the defense score sums mean HP, Defense, and Sp. Def. The most offensively powerful combination is Psychic/Dark and the most defensively powerful is Ghost/Dragon.
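Many of the top-ranked combinations in the tables above rest on a single Pokémon, so one Pokémon's stats can decide the ranking. A minimal sketch of restricting the ranking to better-populated combinations follows. The threshold min_n and the toy rows are hypothetical, chosen only to show the filter.

```r
library(dplyr)

# Toy summary rows for illustration; values are placeholders, not project results
toy_strength <- data.frame(
  type_combo = c("X/Y", "Y/Z", "Z", "X"),
  mean_total = c(680, 520, 450, 410),
  n          = c(1, 6, 40, 35),
  stringsAsFactors = FALSE
)

min_n <- 5  # hypothetical threshold; choose a value that suits the analysis

# Keep only combinations with enough Pokémon, then rank by mean total
ranked <- toy_strength %>%
  filter(n >= min_n) %>%
  arrange(desc(mean_total))
ranked
```

The same two lines (filter, then arrange) could be appended to type_strength, offense, or defense to see how the rankings change.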
Conclusion
This project used a cleaned dataset of 747 Pokémon from Generations I–VI to explore how Speed, generation, and typing relate to overall Pokémon strength. First, Speed showed a moderate positive relationship with total base stats (r = 0.56), indicating that faster Pokémon are often given higher offensive or overall stat totals to support their roles as sweepers or high-tempo attackers. Second, the generational comparison showed modest but noticeable power creep: later generations contain slightly stronger Pokémon on average, with gains that vary by stat and generation rather than uniform inflation across all categories. Finally, dual-type Pokémon had higher total base stats than single-type Pokémon, with a mean about 8% higher.
Together, these findings demonstrate that Pokémon stat distributions are not random; they follow predictable design patterns shaped by gameplay balance, team roles, and the changes each generation introduces to keep the game fresh.