Introduction to the topic of choice
I started playing Pokemon many years ago as a kid and was hooked on the franchise almost immediately. I followed the game series until about 5th grade, and it still holds a special, nostalgic place in my heart. As a kid I didn’t know much about Pokemon stats; I only cared whether a Pokemon looked cool to me. Now that I am nearing graduation, I would like to revisit my childhood using analytics! The franchise has since added many new Pokemon, which makes it fun to see what the different stats mean and how they correlate with one another. The goal of this analysis is to gain more analytical insight into the Pokemon I grew up with and to discover whether there are any correlations, such as whether the weight of a Pokemon correlates with any of its stats.
Data Source and General Information
This data was sourced from Kaggle and covers roughly 800 Pokemon from the franchise, which has come a long way since the original 151 Pokemon. It is also important to note that this dataset comes from the Pokemon games (Game Boy, DS, etc.) and not from Pokemon cards or Pokemon Go.
Data Variable Names and Description
Below are some of the variables and their descriptions. These Pokemon stats can be thought of as “genetics” or raw attributes, which are then used to determine how much damage a Pokemon can deal and how well it can defend.
| Variable | Description |
|---|---|
| Name | Contains the English and Japanese name for each Pokemon |
| Type 1 | Every Pokemon has a primary type, which determines its weaknesses and resistances to attacks |
| Type 2 | Some Pokemon have a second type (dual type) |
| Total | This is the sum of all stats of a given Pokemon |
| HP | Health Points |
| Attack | How powerful a physical move will be |
| Defense | How well a given Pokemon defends against physical moves |
| SP. Attack | How powerful a special move will be |
| SP. Defense | How well a given Pokemon defends against a special move |
| Speed | Decides which Pokemon acts first in a given battle |
Let’s compute a quick summary statistic to see how many Pokemon of a certain type are in this dataset. The summary below shows that only 52 of the Pokemon are fire type.
| Type | How many times it appears |
|---|---|
| fire | 52 |
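A count like this can be sketched in base R with `table()`. The data frame below is a tiny hand-made stand-in so the snippet runs on its own, and the column name `type1` is an assumption about how the Kaggle file labels the primary type.

```r
# Toy stand-in for the Kaggle data; the column name `type1` is assumed.
pokemon_demo <- data.frame(
  name  = c("Charmander", "Vulpix", "Squirtle", "Bulbasaur", "Psyduck"),
  type1 = c("fire", "fire", "water", "grass", "water"),
  stringsAsFactors = FALSE
)

# Count how many Pokemon fall under each primary type, largest first.
type_counts <- sort(table(pokemon_demo$type1), decreasing = TRUE)
print(type_counts)

# Count a single type directly with a logical comparison.
fire_count <- sum(pokemon_demo$type1 == "fire")
print(fire_count)
```

On the full dataset, the same `sum(... == "fire")` line would give the single-type count reported in the table.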
Descriptive Analysis
Now we would like to see which types of Pokemon make up this dataset. There are many different types, so let’s look at the number of Pokemon of each type.
Here I made two visualizations, one for primary type and one for dual types. The dual type visualization has one big NA bar, but that is simply because so many Pokemon do not have a second type. It’s also interesting to see how many water and normal type Pokemon there are compared to other types.
I also wanted to see the type makeup of all legendary Pokemon, so I included it at the end!
A given Pokemon has a variety of stats. An interesting question to investigate is whether any two stats correlate with each other. Let’s check whether attack and weight are correlated: if a Pokemon is heavier, does it hit harder?
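One way to put a number on this question is the Pearson correlation. The sketch below uses made-up values, not the Kaggle data, and borrows the column names `attack` and `weight_kg` from the `lm()` call shown later in this report. It also includes one missing weight, since the regression output reports rows dropped for missingness.

```r
# Made-up numbers for illustration; not taken from the Kaggle file.
demo <- data.frame(
  attack    = c(45, 60, 80, 100, 120, 55),
  weight_kg = c(6.0, 20.0, 55.0, 90.0, NA, 12.5)
)

# Pearson correlation, using only rows where both values are present.
r <- cor(demo$attack, demo$weight_kg, use = "complete.obs")

# cor.test() reports the same estimate plus a p-value and confidence interval.
ct <- cor.test(demo$attack, demo$weight_kg)

print(r)
print(ct$p.value)
```

A value of `r` near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or none.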
Let’s now look at the total variable, which adds up all of a Pokemon’s stats: HP, Attack, Defense, and more. The total is a good overall indicator of how strong a Pokemon is.
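As a rough sketch, and assuming the total is the sum of the six base stats listed in the variable table, it could be recomputed with `rowSums()`. The column names and the two rows of values below are placeholders for illustration.

```r
# Placeholder base stats for two Pokemon; column names are assumed.
stats_demo <- data.frame(
  hp         = c(45, 39),
  attack     = c(49, 52),
  defense    = c(49, 43),
  sp_attack  = c(65, 60),
  sp_defense = c(65, 50),
  speed      = c(45, 65)
)

# Add up the six base stats row by row to get each Pokemon's total.
stat_cols <- c("hp", "attack", "defense", "sp_attack", "sp_defense", "speed")
stats_demo$total <- rowSums(stats_demo[, stat_cols])
print(stats_demo)
```

Comparing a recomputed total against the dataset's own Total column is also a quick sanity check on the data.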
The distribution is pretty interesting: you can clearly see two peaks before it levels off. One of the biggest game mechanics in this franchise is evolution, which is similar to metamorphosis in real life. Evolution in Pokemon is more than just a visual change because it also increases stats, which is likely where the second peak comes from! Most, but not all, Pokemon have an evolved form. It is pretty cool to see the evolution aspect of the game show up in this distribution.
Here is another visualization showing how attack and defense scale with one another for normal and legendary Pokemon.
Sentiment Analysis using Secondary Data source from Twitter API
The data for the sentiment analysis came from scraping Twitter. 300 tweets were collected by searching for any mention of the word “pokemon”, restricted to tweets in English. The scraping was done in a separate R script and the results were saved as a CSV. The CSV was then hosted on a personal OneDrive to be used in this R Markdown document.
Here we have the recent sentiments from Pokemon tweets, categorized into positive and negative columns according to the words used in each tweet.
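The scraping and scoring script is not shown in this report, so the following is only a minimal base R sketch of word-based sentiment tallying. The tweets and the six-word lexicon are invented for illustration; a real analysis would use a published sentiment lexicon and the collected CSV.

```r
# Hypothetical tweets and a tiny hand-made lexicon, for illustration only.
tweets <- c(
  "I love the new pokemon game",
  "this pokemon update is terrible and boring",
  "pokemon is fun"
)
lexicon <- data.frame(
  word      = c("love", "fun", "great", "terrible", "boring", "bad"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Split each tweet into lowercase words.
words <- unlist(strsplit(tolower(tweets), "[^a-z]+"))
words <- words[words != ""]

# Look up each word in the lexicon; words not in the lexicon become NA
# and are dropped by table().
matched <- lexicon$sentiment[match(words, lexicon$word)]
sentiment_counts <- table(factor(matched, levels = c("negative", "positive")))
print(sentiment_counts)
```

Because each word is scored in isolation, this approach ignores negation and sarcasm, so the resulting counts are best read as a rough signal.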
It is pretty interesting to see how negative the recent tweets are compared to the positives. I don’t follow all of the most recent Pokemon games and news, but after a bit of research it seems that the fan base has mixed feelings about the direction of the newest game releases, which is reflected in our sentiment analysis of the Twitter data.
Prescriptive Analysis
One way to start investigating is to look at which variables interact with and influence one another. In this analysis I examine how the basic stats of a Pokemon (HP, Defense, Attack, SP. Attack, SP. Defense, and Speed) relate to another variable, such as the weight or the base happiness of a Pokemon.
Weight is the measure of the mass of a Pokemon, usually given in kilograms. Every Pokemon species has a specific weight that is displayed in the in-game Pokedex. Happiness is a hidden value from 0 to 255 that is given to every Pokemon once caught. This stat reflects how well you have “tamed” your Pokemon. Certain game mechanics, such as evolution, are triggered when a Pokemon reaches a certain level of happiness.
The summary below is used to analyze which basic stats of a Pokemon are associated with its weight. We interpret it by looking at each coefficient and its p-value: p-values below 0.05 are treated as significant and anything above 0.05 as insignificant. Ranked from most to least significant, the significant predictors are HP, defense, attack, and then speed. Special attack and special defense do not appear to have any significant influence in this case! The multiple R-squared of 0.3071 also tells us that these stats together explain only about 31% of the variation in weight.
##
## Call:
## lm(formula = weight_kg ~ hp + attack + sp_attack + sp_defense +
##     defense + speed, data = pokemon)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -155.03  -39.55   -9.21   20.55  933.99
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -125.1125    12.6723  -9.873  < 2e-16 ***
## hp             1.2213     0.1431   8.534  < 2e-16 ***
## attack         0.4865     0.1357   3.586 0.000357 ***
## sp_attack      0.1605     0.1320   1.216 0.224344
## sp_defense     0.0429     0.1619   0.265 0.791146
## defense        0.9572     0.1450   6.601  7.6e-11 ***
## speed         -0.3045     0.1354  -2.249 0.024824 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 91.38 on 774 degrees of freedom
##   (20 observations deleted due to missingness)
## Multiple R-squared:  0.3071, Adjusted R-squared:  0.3018
## F-statistic: 57.18 on 6 and 774 DF,  p-value: < 2.2e-16
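Instead of reading the significance ranking off the printout by eye, the p-values can be pulled out of the fitted model and sorted. The sketch below uses simulated data with two predictors so that it runs on its own; it is not the Pokemon data, and the coefficients it produces have no connection to the summary above.

```r
# Simulated data so the example is self-contained; not the Pokemon data.
set.seed(1)
n <- 100
demo <- data.frame(hp = runif(n, 20, 150), speed = runif(n, 10, 130))
demo$weight_kg <- 0.8 * demo$hp + rnorm(n, sd = 20)

# Fit a linear model of weight on the two predictors.
fit <- lm(weight_kg ~ hp + speed, data = demo)

# summary(fit)$coefficients is a matrix with a "Pr(>|t|)" column.
coefs <- summary(fit)$coefficients
p_values <- coefs[-1, "Pr(>|t|)"]  # drop the intercept row
ranked <- sort(p_values)           # smallest p-value (most significant) first

print(ranked)
print(names(ranked)[ranked < 0.05])
```

Applied to the weight model, the same `sort()` step would order the six stats by p-value directly.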
The next summary is similar to the previous one: it uses the same predictors (the basic stats of a Pokemon) and checks whether any of them are significant for the base happiness of a Pokemon.
In this case special attack, attack, and defense are significant for base happiness, each with a small negative coefficient. The variables that are not significant are HP, special defense, and speed. With a multiple R-squared of 0.09215, the model explains only about 9% of the variation in base happiness.
##
## Call:
## lm(formula = base_happiness ~ hp + attack + sp_attack + sp_defense +
##     defense + speed, data = pokemon)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -76.120  -1.679   2.883   7.179  84.943
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 82.90914    2.57735  32.168  < 2e-16 ***
## hp           0.02598    0.02910   0.893 0.372180
## attack      -0.09475    0.02745  -3.452 0.000586 ***
## sp_attack   -0.09344    0.02667  -3.504 0.000485 ***
## sp_defense   0.01226    0.03283   0.373 0.708979
## defense     -0.06290    0.02919  -2.155 0.031472 *
## speed       -0.02375    0.02728  -0.871 0.384181
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.74 on 794 degrees of freedom
## Multiple R-squared:  0.09215, Adjusted R-squared:  0.08529
## F-statistic: 13.43 on 6 and 794 DF,  p-value: 1.517e-14
Conclusion
Overall I really enjoyed the openness of this project, which allowed us to choose a data set that is most interesting to us. This analysis was a fun and interesting way for me to learn and to see one of my favorite childhood games through an analytical lens.