# A tibble: 115 × 8
Pokemon National_No Type Height Weight Catch_rate Egg_Groups Abilities
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Hitmonlee #0106 Fighting… "4'11… 109.8… "Catch ra… "Hatch ti… Limber /…
2 Cubone #0104 Ground /… "1'04… 14.3 … "Catch ra… "Hatch ti… Rock Hea…
3 Exeggcute #0102 Grass / … "1'04… 5.5 l… "Catch ra… "Hatch ti… Chloroph…
4 Voltorb #0100 Electric… "1'08… 22.9 … "Catch ra… "Hatch ti… Soundpro…
5 Krabby #0098 Water / … "1'04… 14.3 … "Catch ra… "Hatch ti… Hyper Cu…
6 Drowzee #0096 Psychic … "3'03… 71.4 … "Catch ra… "Hatch ti… Insomnia…
7 Onix #0095 Rock / G… "28'1… 463.0… "Catch ra… "Hatch ti… Rock Hea…
8 Gastly #0092 Ghost / … "4'03… 0.2 l… "Catch ra… "Hatch ti… Levitate…
9 Shellder #0090 Water / … "1'00… 8.8 l… "Catch ra… "Hatch ti… Shell Ar…
10 Grimer #0088 Poison /… "2'11… 66.1 … "Catch ra… "Hatch ti… Stench /…
# ℹ 105 more rows
Web Scraping and Analysis of Pokemon Data
Introduction and Question
I am a big fan of Pokemon, and I want to analyze what affects a Pokemon’s stats and how those stats have changed over time. In particular, I would like to know how attributes like height, weight, and catch rate vary with the type of Pokemon. There are a lot of Pokemon types, and I have always wondered whether certain types, like Grass types, tend to have better catch rates or lower weights. With this analysis, Pokemon enthusiasts can learn about the Pokemon types so that, when playing the video game, they can make better decisions about which Pokemon to catch and which Pokemon to fight in order to train their own. For example, some Pokemon moves do more damage against taller Pokemon. When deciding whether to teach a move like that to your Pokemon, it would be useful to know if you are going to be fighting a lot of tall Pokemon. If your Pokemon usually fights Fire types and this analysis shows that Fire types are tall, then you would be more likely to teach it the move. This analysis will also be useful to people who are simply curious about the main differences between Pokemon types. Overall, understanding how these attributes differ by type and how they have changed over time can help us anticipate how the next generation of Pokemon games might turn out, while also helping current players play smarter. I hope you enjoy my analysis!
Method
To answer my question about Pokemon types and attributes, I collected data for a little over 100 Pokemon, including their types, their abilities, and attributes like height, weight, and catch rate. I then created a series of visualizations that compare Pokemon across categories like type to see how their stats differ. To get the data, I web-scraped a series of Bulbapedia pages. Bulbapedia is a wiki for Pokemon information. Here is an example Bulbapedia page for the Pokemon called Aggron: Aggron_Bulbapedia. As you can see, the table on the right-hand side of Aggron’s page has the data we are interested in, like type, abilities, and height, so I used web scraping to find those elements and grab the text from them.
To scrape data for over 100 Pokemon, I used a loop that programmatically scraped each Pokemon’s webpage in turn. Each iteration of the loop followed these steps:
1. Scrape the fields we are after (name, type/s, catch rate, height, weight, egg groups) from one Pokemon’s webpage (e.g. Aggron’s webpage).
2. Add the new data to an existing data frame that starts out blank (as we loop through more Pokemon, the data frame grows until it holds all 100 or so Pokemon).
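The two steps above can be sketched roughly as follows. This is a minimal sketch, not the exact code used for this report: the helper `scrape_one_pokemon()` and the CSS classes `.pokemon-name` and `.national-no` are hypothetical placeholders (Bulbapedia’s real markup is different), and the pages are built from inline HTML so that the example runs without an internet connection.

```r
library(rvest)  # minimal_html(), html_element(), html_text2()
library(dplyr)  # tibble(), bind_rows()

# Hypothetical helper: pull two fields out of one parsed page.
# The CSS classes are placeholders, not Bulbapedia's real markup.
scrape_one_pokemon <- function(page) {
  tibble(
    Pokemon     = page %>% html_element(".pokemon-name") %>% html_text2(),
    National_No = page %>% html_element(".national-no") %>% html_text2()
  )
}

# Stand-in pages built from inline HTML so the sketch runs offline.
# In a real run, each element would come from read_html() on a page URL.
pages <- list(
  minimal_html('<p class="pokemon-name">Cubone</p><p class="national-no">#0104</p>'),
  minimal_html('<p class="pokemon-name">Voltorb</p><p class="national-no">#0100</p>')
)

# Step 2: start with an empty data frame and append one row per page
Pokemon_data_df <- tibble()
for (i in seq_along(pages)) {
  # Step 1: scrape the fields from one page
  new_row <- scrape_one_pokemon(pages[[i]])
  Pokemon_data_df <- bind_rows(Pokemon_data_df, new_row)
}

Pokemon_data_df
```

Growing a data frame inside a loop is fine for 100 or so pages. For a much larger scrape, collecting the rows in a list and calling `bind_rows()` once at the end would avoid copying the data frame on every iteration.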
After looping through the webpages and adding the new data to the data frame each time, the final data frame is a neat table that can be used for analysis, provided some of the fields are cleaned up for correct data types and formatting. A portion of the raw data is shown in the tibble preview at the top of this report.
The data is suitable for my analysis because each Pokemon is recorded in one row, and each column holds one attribute of that Pokemon. This makes it easy to summarize the Pokemon by type or by some other attribute so that we can answer the questions posed in the introduction.
Preparing the Data for Analysis
Before we can analyze the data, we need to clean it. We want to make sure that the data types are correct for each Pokemon stat and that the numeric fields are ready for math. I also added a column to record whether the Pokemon is a Legendary Pokemon, and a column to record whether the Pokemon has a 4 times weakness. The new columns can help us see things about each Pokemon that were not apparent in the original data frame. Below is the code used to clean the data:
library(tidyverse) # The tidyverse collection of packages
# Make a hand-built (not exhaustive) character vector of type combinations with
# 4 times weaknesses. We will use this to help us create a new column called
# Has_four_times_weakness when we use mutate.
Types_with_4x_weakness <- c(
  "Bug / Steel", "Steel / Bug",
  "Grass / Ice", "Ice / Grass",
  "Ice / Steel", "Steel / Ice",
  "Rock / Ground", "Ground / Rock",
  "Ground / Fire", "Fire / Ground",
  "Rock / Water", "Water / Rock",
  "Water / Ground", "Ground / Water",
  "Grass / Ground", "Ground / Grass",
  "Water / Flying", "Flying / Water",
  "Dragon / Flying", "Flying / Dragon",
  "Ground / Dragon", "Dragon / Ground",
  "Flying / Ground", "Ground / Flying",
  "Bug / Flying", "Flying / Bug",
  "Fire / Flying", "Flying / Fire",
  "Ice / Flying", "Flying / Ice",
  "Rock / Ice", "Ice / Rock",
  "Dragon / Fighting", "Fighting / Dragon",
  "Electric / Steel", "Steel / Electric",
  "Fire / Steel", "Steel / Fire",
  "Rock / Steel", "Steel / Rock"
)
# Use mutate to modify existing columns and make new ones
Pokemon_data_df <-
  Pokemon_data_df %>%
  mutate(
    # Make the National_No a number so that we can use it as a metric for when a
    # Pokemon was released. Newer Pokemon have a higher National_No.
    National_No = as.numeric(str_remove(National_No, "#")),
    # Remove Unknown for Pokemon that only have 1 type and not 2
    Type = ifelse(str_detect(Type, "Unknown"),
                  str_remove(Type, " / Unknown"), Type),
    # Change the height to inches instead of ft'in" format
    Height = str_replace_all(Height, "[′']", "'"),    # normalize foot marks
    Height = str_replace_all(Height, "[″\"]", "\""),  # normalize inch marks
    Height = str_replace_all(Height, "\"", ""),       # drop the trailing inch mark
    Height = str_split_fixed(Height, "'", 2),         # split into a 2-column matrix (feet, inches)
    Height = as.numeric(Height[, 1]) * 12 + as.numeric(Height[, 2]),
    # Change the weight to a number instead of a character and get rid of " lbs."
    # (the dot is escaped so it matches a literal period)
    Weight = as.numeric(str_remove(Weight, " lbs\\.")),
    # Extract the catch rate number (0-255 -> higher number means easier to catch)
    Catch_rate = as.numeric(str_extract(Catch_rate, "(?<=Catch rate\\n)\\d+")),
    # Extract the hatch time (the number is how many cycles you must complete to
    # hatch an egg. A cycle is usually about 255 steps in the game's overworld.)
    Egg_Groups = as.numeric(str_extract(Egg_Groups, "(?<=Hatch time\\n)\\d+")),
    # Also make any Legendary Pokemon NA because you cannot get an egg for them
    Egg_Groups = ifelse(Catch_rate == 3 | Egg_Groups == 120, NA, Egg_Groups),
    # Make a column that records if the Pokemon is Legendary or not
    Legendary = ifelse(is.na(Egg_Groups), "Legendary", "Non-Legendary"),
    # Create a column that records if the Pokemon has a 4 times weakness.
    # 4 times weaknesses are incredibly dangerous because, for example,
    # a Fire type move can do 4 times damage to a Bug/Steel type Pokemon.
    Has_four_times_weakness = ifelse(Type %in% Types_with_4x_weakness, 1, 0)
  )
Pokemon_data_df
# A tibble: 115 × 10
Pokemon National_No Type Height Weight Catch_rate Egg_Groups Abilities
<chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 Hitmonlee 106 Fighting 59 110. 45 25 Limber /…
2 Cubone 104 Ground 16 14.3 190 20 Rock Hea…
3 Exeggcute 102 Grass / … 16 5.5 90 20 Chloroph…
4 Voltorb 100 Electric 20 22.9 190 20 Soundpro…
5 Krabby 98 Water 16 14.3 225 20 Hyper Cu…
6 Drowzee 96 Psychic 39 71.4 190 20 Insomnia…
7 Onix 95 Rock / G… 346 463 45 25 Rock Hea…
8 Gastly 92 Ghost / … 51 0.2 190 20 Levitate…
9 Shellder 90 Water 12 8.8 190 20 Shell Ar…
10 Grimer 88 Poison 35 66.1 190 20 Stench /…
# ℹ 105 more rows
# ℹ 2 more variables: Legendary <chr>, Has_four_times_weakness <dbl>
Above is the cleaned version of the data frame.
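As a quick check on the height conversion, here is a small worked example. It is only a sketch: `height_to_inches()` is a hypothetical helper that repeats the same steps as the cleaning code outside of `mutate()`, and the two input strings are written by hand in the feet-and-inches format the code expects (one with straight quote marks, one with prime marks).

```r
library(stringr)

# Hypothetical helper: convert strings like 4'11" to a number of inches
height_to_inches <- function(height) {
  height <- str_replace_all(height, "[\u2032']", "'")  # normalize foot marks
  height <- str_remove_all(height, "[\u2033\"]")       # drop inch marks
  parts <- str_split_fixed(height, "'", 2)             # column 1 = feet, column 2 = inches
  as.numeric(parts[, 1]) * 12 + as.numeric(parts[, 2])
}

# 4 ft 11 in = 4 * 12 + 11 = 59 inches; 1 ft 4 in = 1 * 12 + 4 = 16 inches
example_heights <- height_to_inches(c("4'11\"", "1\u203204\u2033"))
example_heights
```

These two results match the cleaned heights shown for Hitmonlee (59) and Cubone (16) in the table above.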
Visuals and Analysis
Note: Since many of the Pokemon in the data set have 2 types, the charts below often show categories with things like fire/flying, normal/grass, water/electric, or ground/steel. For my analysis, I often use a single type to describe a group of Pokemon such as saying “Dragon type Pokemon have higher heights.” This statement refers to any Pokemon with Dragon as one of the 2 types. Now let’s begin the analysis!😊
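To make the note above concrete, one way to pick out every Pokemon that has a given type as either of its two types is to search the Type string for that type name. This is a small sketch on hand-written type strings, and the column name `Is_dragon` is made up for the example.

```r
library(dplyr)
library(stringr)

# A few type strings in the same "Type1 / Type2" format as the cleaned data
type_examples <- tibble(
  Type = c("Dragon / Flying", "Ground / Dragon", "Water", "Fire / Flying")
)

# TRUE whenever Dragon appears as either of the two types
type_examples <- type_examples %>%
  mutate(Is_dragon = str_detect(Type, "Dragon"))

type_examples
```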
First, I looked at the distribution of heights across the different Pokemon types. I created a box plot that sorts the Pokemon types by their median height so that the differences are easy to see.
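A box plot sorted by median can be built along these lines. This is a sketch under my own assumptions rather than the exact chart code: it uses only the seven single-type Pokemon from the cleaned preview above, and the axis labels are placeholders.

```r
library(ggplot2)

# Seven single-type Pokemon taken from the cleaned data preview
heights <- data.frame(
  Pokemon = c("Hitmonlee", "Cubone", "Voltorb", "Krabby", "Drowzee", "Shellder", "Grimer"),
  Type    = c("Fighting", "Ground", "Electric", "Water", "Psychic", "Water", "Poison"),
  Height  = c(59, 16, 20, 16, 39, 12, 35)
)

# reorder() sorts the Type levels by the median Height within each type
type_order <- levels(reorder(heights$Type, heights$Height, FUN = median))

height_plot <- ggplot(
  heights,
  aes(x = Height, y = reorder(Type, Height, FUN = median))
) +
  geom_boxplot() +
  labs(x = "Height (inches)", y = "Type")

height_plot
```

With so few rows, most types show up as a single line instead of a box, which is the same effect described for the sparse type combinations in the real chart.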
Results and Explanation: The chart shows that many of the type combinations with the greatest median heights include Dragon or Flying. Type combinations that include Normal or Fairy sit towards the bottom of the chart, so those Pokemon tend to be much shorter. Interestingly, the two type combinations with the greatest median heights that also have a lot of data (a colored box instead of just a single value) both include Dragon. This suggests that Dragon types are often taller than other types of Pokemon. So when we are playing Pokemon and looking for tall Pokemon to bring onto our team for a quest or some other purpose, Dragon types are a strong contender. It is also interesting to note that Rock/Ground Pokemon have the greatest range of heights in this data set, which suggests that their heights are less consistent. Even though the median height is not the highest for this type combination, a team of Rock/Ground types could be very tall if you look for the right Pokemon when building your team. Here is an example of a tall Pokemon below that I find pretty hilarious (Alolan Exeggutor).
Next, I looked at weight distributions across the different Pokemon types. I again created a box plot that shows the weight distribution for each type combination.
Results and Explanation: The results show that Dragon types have some of the largest weight values, which makes sense since a lot of the Dragon types in the data set are also tall and large in general, as observed in the previous box plot. Flying, Rock, and Ground types also show up a lot in the type combinations towards the top of the box plot, meaning those Pokemon types are often heavier than other types. Type combinations like Rock/Ground, Fire/Flying, and Dragon/Flying have the largest median weights while still having a meaningful number of data points. Ground and Rock Pokemon are expected to be heavy given their names, but the Flying types are a surprise. Many of the Flying types in our data are Legendary Pokemon that tend to be large, so Flying types can appear heavier here than they would across the Pokedex as a whole (all 1,000 or so Pokemon). Overall though, these type combinations tell you which types to look out for when battling. Certain Pokemon moves do more damage against heavier Pokemon, so if you are fighting Rock, Ground, or Flying types, you should opt to use weight-based moves more often. On the other hand, if you are fighting Fairy types, you may want to avoid moves that do better against heavy Pokemon, since Fairy types have a very low median weight compared to the other types while still showing a lot of data in the box plot (purple box towards the bottom of the chart). Below is one of the heaviest Pokemon from Generation 4, called Giratina. This Pokemon is a Legendary and it weighs about 1,650 lbs!!
The third area of the data I explored was the catch rate. To do this, I made a bar chart that shows the average catch rate for each of the Pokemon types. I made the color scale diverging so that bars with greater catch rates are darker green, signifying that Pokemon from that group are easier to catch. Bars in darker red signify that Pokemon from that group are harder to catch.
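A bar chart of average catch rates with a diverging red-to-green fill could be sketched like this. The data are the seven single-type Pokemon from the cleaned preview above. The specific colors and the choice to center the scale on the overall mean are my assumptions for this example, not necessarily the settings used in the real chart.

```r
library(dplyr)
library(ggplot2)

# Seven single-type Pokemon taken from the cleaned data preview
catch <- tibble(
  Type       = c("Fighting", "Ground", "Electric", "Water", "Psychic", "Water", "Poison"),
  Catch_rate = c(45, 190, 190, 225, 190, 190, 190)
)

# Average catch rate per type
catch_summary <- catch %>%
  group_by(Type) %>%
  summarise(Mean_catch_rate = mean(Catch_rate))

# Diverging fill: red below the midpoint, green above it
catch_plot <- ggplot(
  catch_summary,
  aes(x = Mean_catch_rate, y = reorder(Type, Mean_catch_rate), fill = Mean_catch_rate)
) +
  geom_col() +
  scale_fill_gradient2(
    low = "darkred", mid = "white", high = "darkgreen",
    midpoint = mean(catch_summary$Mean_catch_rate)
  ) +
  labs(x = "Average catch rate", y = "Type", fill = "Catch rate")

catch_plot
```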
Results and Explanation: In this chart you can see that Water, Bug, and Poison types show up the most in the type combinations with the highest average catch rates. This means that those Pokemon are the easiest to catch, which is likely because many Water, Poison, and Bug type Pokemon are small and do not have an evolution. They are not as strong, and therefore they are easy to catch. Dragon types are clearly the hardest to catch, with all of the lowest catch rates belonging to type combinations that include Dragon. With all of this information in mind, I would suggest that you use regular Pokeballs and Great Balls on Water, Bug, and Poison type Pokemon. Save the Master Ball and the Ultra Balls for the Dragon types since they have consistently low catch rates. You are going to need them. 😉
Next, I looked at how a Pokemon’s height and weight are related across different categories of Pokemon. I split the Pokemon by whether they have a 4 times weakness and by whether they are Legendary. To visualize the trends, I created a faceted scatter plot. I made a color scale that puts the heavier and taller Pokemon in a brown color to emphasize that they are unusual.
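A faceted scatter plot of this kind could be sketched as follows. The six rows are invented toy values, not real Pokemon, and coloring the points by weight on a grey-to-brown scale is only my guess at one way to make the big Pokemon stand out.

```r
library(dplyr)
library(ggplot2)

# Invented toy rows for illustration only (not real Pokemon)
toy_pokemon <- tibble(
  Height = c(16, 59, 40, 120, 210, 180),
  Weight = c(14, 110, 60, 400, 900, 650),
  Legendary = c("Non-Legendary", "Non-Legendary", "Non-Legendary",
                "Legendary", "Legendary", "Legendary"),
  Has_four_times_weakness = c(0, 1, 0, 0, 1, 0)
) %>%
  mutate(Weakness = if_else(Has_four_times_weakness == 1,
                            "Has 4x Weakness", "No 4x Weakness"))

# Number of points that land in each facet
facet_counts <- toy_pokemon %>% count(Weakness, Legendary)

# Rows of facets = weakness group, columns of facets = Legendary status
size_plot <- ggplot(toy_pokemon, aes(x = Height, y = Weight, colour = Weight)) +
  geom_point(size = 3) +
  scale_colour_gradient(low = "grey60", high = "brown") +
  facet_grid(Weakness ~ Legendary) +
  labs(x = "Height (inches)", y = "Weight (lbs)")

size_plot
```

Counting the rows per facet, as `facet_counts` does, is a quick way to see how much data sits behind each panel before reading too much into it.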
Results and Explanation: In this chart, you can see that, on the whole, Legendary Pokemon tend to be heavier and taller, while Non-Legendary Pokemon usually stay below 300 lbs and 150 inches. For Pokemon with a 4x weakness, the difference is less obvious, but it is worth pointing out that the number of brown dots is almost the same for Pokemon with and without a 4x weakness, at about 7-8 brown dots each in the top and bottom facets. There are far more Pokemon in the “No 4x Weakness” category though, meaning the ratio of brown dots to total dots is higher for 4x weakness Pokemon. We could say that Pokemon with a 4x weakness are more often heavy and tall, but with some suspicion because of the small amount of data in each facet (especially with only 3 dots in [Has 4x weakness ~ Legendary]). Overall, the data suggests that Legendary Pokemon, and more tentatively Pokemon with a 4x weakness, tend to have greater heights and weights. One possible explanation is that 4x weak Pokemon might need the extra size to make up for their weaknesses. Legendary Pokemon are also known for being big and difficult to defeat. I would suggest that when you are playing a Pokemon game and want to build a team of tanks (tall and heavy Pokemon), adding Legendary Pokemon or those with a 4x weakness would be a good way to go. Also, if you want to build a lightweight team or a short team, avoid adding Legendary Pokemon to your party because they are often very tall and extremely heavy, as observed in this scatter plot.
Below is my favorite heavy Pokemon. His name is Aggron. He weighs almost 800 lbs!!😮
The last thing I looked at in my analysis of the Pokemon data was how Pokemon stats differ over time. I used the National_No to create two groups of Pokemon, since the National_No tracks the order in which Pokemon were released. In this final chart, I consider any Pokemon with a National_No > 350 to be “new” and any other Pokemon to be “old.” I created a faceted bar chart to show the differences in the various stats between new and old Pokemon.
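The new versus old comparison can be sketched like this. The two “old” rows are Cubone and Voltorb from the cleaned preview above, while the two “new” rows are invented values used only to give the second group something to average. The column names `Era`, `Stat`, and `Average` are made up for the example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

stats <- tibble(
  National_No = c(104, 100, 400, 450),  # last two rows are invented
  Height      = c(16, 20, 40, 60),
  Weight      = c(14.3, 22.9, 100, 200),
  Catch_rate  = c(190, 190, 45, 3)
)

era_summary <- stats %>%
  # Split the Pokemon into two groups at National_No 350
  mutate(Era = if_else(National_No > 350, "New", "Old")) %>%
  group_by(Era) %>%
  # Average every stat within each group
  summarise(across(c(Height, Weight, Catch_rate), mean)) %>%
  # One row per (Era, Stat) pair so each stat can get its own facet
  pivot_longer(-Era, names_to = "Stat", values_to = "Average")

era_plot <- ggplot(era_summary, aes(x = Era, y = Average, fill = Era)) +
  geom_col() +
  facet_wrap(~ Stat, scales = "free_y")

era_plot
```

Using `scales = "free_y"` gives each stat its own y-axis, which matters here because catch rate, height, and weight are measured on very different scales.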
Results and Explanation: Overall, this bar chart shows us that there is a clear difference between stats for new vs old Pokemon in this data set. For almost every statistic, the average value has increased over time, except for the catch rate. The lower catch rate among the new Pokemon is most likely due to the newer generations having more Legendary Pokemon, which have extremely low catch rates (this is evident from the same chart under Legendary(%), where the “New” category has a much higher percentage of Legendary Pokemon). For height and weight, the values for the “New” category are around double those for the “Old” category. Again, this could be because more Legendary Pokemon are present in the newer generations, but it is also probable that the introduction of types like Steel has made way for heavier and taller Pokemon, since Steel types are often big and heavy. Besides those attributes, hatch time is slightly larger for new Pokemon as well, but that cannot be attributed to more Legendary Pokemon because they do not have eggs. As generations of Pokemon games have progressed, getting steps in while traveling the overworld has become easier, so it makes sense that the hatch times have been increased to account for that. There is also a higher percentage of Pokemon with 4x weaknesses among the newer Pokemon, which could be attributable to the introduction of Fairy types. Fairy types are super-effective against Dragon types and Dark types, making the once overpowered type combination of Dragon/Dark lackluster because of its 4x weakness to Fairy. More dual type Pokemon in the newer generations can also contribute to the higher percentage of Pokemon with 4x weaknesses for “New” Pokemon. Overall, many of these stats differ because more things have been introduced to Pokemon, whether that be more Legendary Pokemon, more mobility in the game, or more types.
Conclusion
After analyzing this data set, it appears that certain types like Dragon and Flying are great for finding tall Pokemon, while Normal and Fairy types end up with the shorter Pokemon. Dragon, Rock, and Ground types are great for heavy Pokemon, while Fairy types tend to be a bit lighter. This information is helpful when you are teaching Pokemon weight- or height-based moves, or creating a team that you want to be tall or heavy. Dragon types definitely stick out as being the tallest and heaviest when compared to other Pokemon types. That is one reason why Dragon types have been so strong since the start of Pokemon. They are often the type used for Legendary Pokemon and are associated with high stats, which goes along with greater height and weight. Fairy types stick out more for being smaller in both height and weight. If you want a lightweight team, Fairy types are great, and they also have a lot of great evolved forms that are still powerful in attack and speed stats, so they have many viable strategies in competitive play. Another thing to note is that having a 4x weakness or being a Legendary is associated with greater weight and height in this data set, so keep that in mind when choosing your team.
Other things I got out of analyzing the data set are that catch rates are the highest for Water, Bug, and Poison type Pokemon and the lowest for Dragon types. This can tell you which types of Pokemon to use your best Pokeballs on (save those Ultra Balls for Dragon types since they have the lowest catch rates). Lastly, it appears that most Pokemon stats have changed on average over time. Notable changes are increases in height, weight, and Legendary percentage, and a decrease in catch rates. Many of the differences can be attributed to things that were introduced into the Pokemon games over the years, whether that be more types like Fairy and Steel, more Legendary Pokemon, or more mobility in the overworld of the game. That is the last thing that I looked into, but there is so much more to dive into here, so feel free to take my analysis even further if you are inclined.
This concludes my analysis of the Pokemon data. If you would like to access the webpages that I scraped the data from, visit this link, where you can click on any Pokemon in the list to get to the webpages I accessed: Bulbapedia_Pokemon_List
I hope that reading my analysis gave you some interesting insights into what kinds of things affect Pokemon attributes. I also hope you now have an idea of which Pokemon types to target and use for certain activities in the video game or in the trading card game. It was fun looking into the data to learn more about my favorite Pokemon buddies. I hope you can use this information to improve your Pokemon gameplay and your enjoyment of this wonderful franchise. So long! Gotta catch ’em all!!