library(tidyverse) # For Tidyverse functions
library(skimr) # For better summary statistics
Analysis of Pokemon Data
Introduction
The data I will be analyzing covers Pokemon from Generations 1 through 7. It records each Pokemon’s stats, types, Pokedex data, and other miscellaneous information. Each row represents a single Pokemon or one form of a Pokemon, and each column represents a characteristic of that Pokemon.
pokemon <- read.csv("https://myxavier-my.sharepoint.com/:x:/g/personal/scolaj_xavier_edu/ET-7-aXThtZMv6Fx2uQtW4wBaRKIKD_dPQaoYlKvbQpQ6g?download=1")
Research Question
Which type has, on average, the highest defensive stats (i.e. HP, Defense, and Special Defense)? Steel-types are generally thought to be great defensive Pokemon because Steel has few type weaknesses, many type resistances, and generally high bulk (i.e. high HP, Defense, and Special Defense stats). However, other types could have better bulk and simply get a bad rap because of their type weaknesses. I also know that some types are naturally bulkier because a higher percentage of their members are Legendary Pokemon, so this analysis will only deal with non-Legendary Pokemon.
Analysis
I intend to answer this question by creating a bar chart of a new variable that averages a Pokemon’s HP, Defense, and Special Defense to get a rough estimate of how bulky that Pokemon is. This makes sense because a Pokemon can be hit by either physical or special moves, which take the Defense or Special Defense stat into account respectively, while HP determines how much damage it can absorb in either case. The result is a rough sense of bulkiness, however imperfect it is at representing the true survivability of the Pokemon. To exclude Legendary Pokemon, I will filter out every row whose egg.group1 variable is Undiscovered. All Legendary Pokemon belong to the Undiscovered egg group since, being intended to be rare, they are unable to breed. While this will unfortunately remove some non-Legendary Pokemon that also belong to that egg group, it is the closest thing to an exact way to filter Legendary Pokemon out of the data, as all other variables are either too generic, too specific, or numeric.
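Because the egg-group filter also drops some non-Legendary Pokemon, it can help to see how many rows it removes from each type before trusting the averages. The following is a minimal sketch on a toy data frame with made-up rows (the `toy` and `filter_cost` names are hypothetical and not part of the analysis); the same pipeline could be pointed at the real data, assuming it has the `type1` and `egg.group1` columns used here.

```r
library(dplyr)

# Toy rows with made-up values, for illustration only (not the real dataset).
toy <- data.frame(
  type1      = c("Steel", "Steel", "Fairy", "Fairy", "Fairy"),
  egg.group1 = c("Mineral", "Undiscovered", "Fairy", "Undiscovered", "Undiscovered")
)

# Count, per primary type, how many rows the filter would keep and drop.
filter_cost <- toy %>%
  group_by(type1) %>%
  summarise(
    kept    = sum(egg.group1 != "Undiscovered"),
    dropped = sum(egg.group1 == "Undiscovered")
  )

print(filter_cost)
```

A type that loses a large share of its rows to the filter deserves extra caution when reading its average.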
# Bulk is the mean of the three defensive stats, so divide by 3
pokemon$bulk <- (pokemon$defense + pokemon$spdefense + pokemon$hp) / 3
pokemon %>%
  filter(`egg.group1` != "Undiscovered") %>% # Drop Legendary Pokemon (and other non-breeding Pokemon)
  ggplot(aes(x = type1, y = bulk)) +
  geom_bar(stat = "summary", fun = mean) + # Bar height is the mean bulk per type
  labs(title = "Average 'Bulkiness' of Each Type's Non-Legendary Pokemon",
       x = "Primary Type", y = "Average 'Bulk'")
This graph supports my assumption: Steel-types have the highest average bulk. However, there was a surprise among the runners-up. Dragon-types make sense given how many of them appear late in the game, and Rock-types are also commonly thought of as bulky, but Fairy-types fell within that same range, which surprised me since I did not expect them to be on par with Dragon- and Rock-types!
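Bar heights can be hard to compare by eye, so a sorted table of the per-type means is a useful cross-check on the ranking. Below is a minimal sketch on a toy data frame with made-up stat values (the `toy` and `bulk_by_type` names are hypothetical, and the numbers are not from the real dataset). Bulk is taken here as the mean of HP, Defense, and Special Defense, and the column names are assumed to match those used in the analysis.

```r
library(dplyr)

# Toy data with made-up stat values, for illustration only (not the real dataset).
toy <- data.frame(
  type1      = c("Steel", "Steel", "Bug", "Bug", "Dragon"),
  egg.group1 = c("Mineral", "Undiscovered", "Bug", "Bug", "Dragon"),
  hp         = c(60, 100, 40, 50, 70),
  defense    = c(120, 150, 30, 40, 80),
  spdefense  = c(60, 150, 20, 30, 75)
)

bulk_by_type <- toy %>%
  filter(egg.group1 != "Undiscovered") %>%        # same Legendary proxy as the chart
  mutate(bulk = (hp + defense + spdefense) / 3) %>%
  group_by(type1) %>%
  summarise(mean_bulk = mean(bulk), n = n()) %>%  # n shows how many Pokemon back each mean
  arrange(desc(mean_bulk))

print(bulk_by_type)
```

Including the count `n` alongside each mean shows when a type's average rests on only a handful of Pokemon.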