Pokemon Exploratory Data Analysis

Introduction

This project is an exploratory data analysis of the first six generations of Pokemon. Using seven stats (Total, Health Points (HP), Attack, Defense, Special Attack, Special Defense, and Speed), it conducts three forms of analysis. First, box plots show the distribution of each stat, grouped by Pokemon type. Second, bar graphs show the mean of each stat, also grouped by type. Third, the mean of each stat is calculated again, this time grouping Pokemon by generation instead of by type. The report closes with regressions relating height and weight to Attack and Total, and generation and type to Attack.

Pokemon Stat Guide:

HP - determines how much damage a Pokemon can take before it faints.

Attack - determines how much damage is dealt when using a physical attack.

Defense - determines how much damage is received when hit with a physical attack.

Special Attack - determines how much damage is dealt when using a special move.

Special Defense - determines how much damage is received when hit with a special attack.

Speed - determines the order in which Pokemon act in a battle.

Total - the sum of the six stats above (HP, Attack, Defense, Special Attack, Special Defense, and Speed).

Pokemon Generation Guide:

Generation 1 - 1996-1999

Generation 2 - 1999-2002

Generation 3 - 2002-2006

Generation 4 - 2006-2010

Generation 5 - 2010-2013

Generation 6 - 2013-2016

Distribution Analysis

This section shows the distribution of each stat by type using box plots. Each box plot shows the median, the middle 50% of values (the box), the range of typical values (the whiskers), and any outliers for that type. Together, these show which types have the strongest and weakest Pokemon, and whether the values within each type are evenly distributed or widely spread.
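The numbers behind each box can also be inspected directly. The sketch below is a minimal base-R illustration, assuming a data frame with the `Type.1` and `Attack` columns used in the report's regressions; the toy values are invented and are not real Pokemon stats. `boxplot(..., plot = FALSE)` returns the five box statistics per group without drawing, along with the points it flags as outliers (those more than 1.5 times the interquartile range beyond the box).

```r
# Hypothetical toy data: invented values, NOT real Pokemon stats.
toy <- data.frame(
  Type.1 = rep(c("Bug", "Dragon", "Water"), each = 6),
  Attack = c(45, 50, 55, 60, 65, 150,     # Bug, with one extreme value
             90, 100, 110, 120, 130, 134, # Dragon
             48, 52, 60, 65, 70, 75)      # Water
)

# plot = FALSE computes the box plot statistics without drawing anything.
bp <- boxplot(Attack ~ Type.1, data = toy, plot = FALSE)

# Rows of bp$stats: lower whisker, lower hinge, median, upper hinge, upper whisker.
stats <- bp$stats
colnames(stats) <- bp$names
rownames(stats) <- c("lower_whisker", "lower_hinge", "median",
                     "upper_hinge", "upper_whisker")
print(stats)

# bp$out holds the outlying values; bp$group gives the box each one belongs to.
outliers_per_type <- table(factor(bp$names[bp$group], levels = bp$names))
print(outliers_per_type)
```

Dropping `plot = FALSE` draws the same box plots with base graphics; the report's figures were likely made with `ggplot2` instead, which is not shown here.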

Stat Attributes by Type

This section uses bar graphs to show the average level of each stat by type. Across the six generations, some types have been favored over others, so the number of Pokemon per type is uneven. If stats were summed, a type with more Pokemon could appear stronger simply because there are more values to add together. For example, there are more than twice as many Water types as Dragon types, so when total levels are added together, Water types seem stronger than Dragon types, which is not the case. To account for this, I took the average of each stat for each type, which gives a more accurate comparison of the levels between types.
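A small worked example shows why sums mislead when group sizes differ. This is a sketch with invented toy values (not real Pokemon stats), assuming a data frame with `Type.1` and `Total` columns; `aggregate()` with a formula computes one summary per type.

```r
# Hypothetical toy data: invented values, NOT real Pokemon stats.
toy <- data.frame(
  Type.1 = c(rep("Water", 6), rep("Dragon", 2)),
  Total  = c(300, 320, 400, 480, 500, 530,  # many Water entries
             600, 600)                      # few Dragon entries
)

# Summing rewards whichever type simply has more Pokemon...
sums <- aggregate(Total ~ Type.1, data = toy, FUN = sum)
# ...while the mean compares types on a per-Pokemon basis.
means <- aggregate(Total ~ Type.1, data = toy, FUN = mean)

print(sums)   # Water's sum is larger only because it has more rows
print(means)  # Dragon's average is higher
```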

Stat Attributes by Generation

This section uses bar graphs to show the average level of each stat by generation (1-6). Some people speculate that Pokemon have gotten stronger over the generations, so this analysis helps determine whether that claim holds up or is pure speculation. As in the previous section, the average of each stat was taken for each generation, so that generations with more Pokemon do not appear stronger simply because they contain more Pokemon.
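Several stats can be averaged by generation in one call by putting `cbind()` on the left-hand side of an `aggregate()` formula. The sketch below uses invented toy values (not real stats) and only three stat columns; with the full data set, the remaining columns seen in the regression output (`Sp..Atk`, `Sp..Def`, `Speed`, `Total`) would be added the same way. Note that the formula interface drops rows with missing values by default.

```r
# Hypothetical toy data: invented values, NOT real Pokemon stats.
toy <- data.frame(
  Generation = c(1, 1, 2, 2, 3, 3),
  HP         = c(45, 60, 55, 75, 50, 80),
  Attack     = c(49, 80, 60, 90, 70, 100),
  Defense    = c(49, 70, 50, 80, 60, 90)
)

# One row per generation, one column per averaged stat.
gen_means <- aggregate(cbind(HP, Attack, Defense) ~ Generation,
                       data = toy, FUN = mean)
print(gen_means)
```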

Regression Plots

These regression plots and summary statistics show how a Pokemon's height and weight relate to its Attack and Total levels. The first two regression plots show a non-linear relationship between the independent variables (height and weight) and the dependent variable (Attack). The last two show that there is also a non-linear relationship between the same independent variables and the dependent variable (Total). To account for this non-linearity, squared terms for height and weight were included in the regressions, along with an additional height term (Height3).
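A minimal sketch of how such a polynomial regression can be specified in R. It assumes that `Weight2`, `Height2`, and `Height3` in the output are weight squared, height squared, and height cubed; the report does not show how those columns were built. It runs on simulated data rather than the Pokemon data, and uses `I()` to compute the powers inside the formula instead of storing them as extra columns. Passing `data =` also yields shorter coefficient names than the `PokemonHeightWeight$...` prefixes in the output.

```r
set.seed(1)  # simulated data only; NOT the Pokemon data set
n <- 200
toy <- data.frame(Height = runif(n, 0.2, 6), Weight = runif(n, 1, 900))

# Simulate a curved (non-linear) relationship between height and Attack.
toy$Attack <- 40 + 35 * toy$Height - 5 * toy$Height^2 + 0.2 * toy$Height^3 +
  0.03 * toy$Weight + rnorm(n, sd = 20)

# Inside a formula, I() makes ^ mean an arithmetic power,
# so these are squared and cubed terms rather than interactions.
fit <- lm(Attack ~ Weight + Height + I(Weight^2) + I(Height^2) + I(Height^3),
          data = toy)
print(summary(fit))
```

An alternative is `poly(Height, 3)`, which uses orthogonal polynomials: the fitted values are the same, but the individual coefficients are not directly comparable to raw powers.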

(Figures: four scatter plots fitted with LOESS smoothing curves, formula y ~ x: Attack against height and weight, then Total against height and weight.)

Regression Results

The first regression examines the effect of height and weight on a Pokemon's Attack level. Height, squared height, and Height3 are statistically significant at the 0.001 level, and weight at the 0.01 level (p = 0.0016), but squared weight is not significant (p = 0.145). The R-squared of 0.37 (adjusted 0.367) indicates that height and weight explain roughly 37% of the variance in Attack, a moderate fit.

The second regression examines the effect of height and weight on a Pokemon's Total level. Height, squared height, and Height3 are statistically significant at the 0.001 level, and weight at the 0.05 level (p = 0.021), but squared weight is not significant (p = 0.364). With an R-squared of 0.51 (adjusted 0.507), height and weight explain about half of the variance in Total.
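Significance and R-squared statements like these can be read directly from the `Pr(>|t|)` column and the R-squared line of `summary()` on an `lm` fit. The sketch below pulls them out programmatically so that each term is checked against the chosen level instead of being assumed to clear it. It uses simulated data, and the variables `x1`, `x2`, and `y` are hypothetical.

```r
set.seed(42)  # simulated data only; NOT the Pokemon data set
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 2 + 1.5 * toy$x1 + rnorm(n)  # x2 has no true effect on y

s <- summary(lm(y ~ x1 + x2, data = toy))

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
p_values <- coef(s)[, "Pr(>|t|)"]
print(round(p_values, 4))

# Test every term against the significance level explicitly.
alpha <- 0.01
print(p_values < alpha)

# R-squared and adjusted R-squared, as printed in the summary footer.
print(c(r_squared = s$r.squared, adj_r_squared = s$adj.r.squared))
```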

The third regression examines the effect of generation on a Pokemon's Attack level. The generation coefficient (about 1 Attack point per generation) is not statistically significant at the 0.05 level (p = 0.146), and the R-squared is only 0.003, indicating that a Pokemon's generation is not a good predictor of its attack level.

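The fourth regression models Attack as a function of a Pokemon's primary type (Type.1). Bug is the baseline type, so the intercept (70.97) is the average Attack of Bug types, and each other coefficient is the difference between that type's average Attack and Bug's. Dragon (+41.2), Fighting (+25.8), Ground (+24.8), Rock (+21.9), Steel (+21.7), and Dark (+17.4) have significantly higher Attack than Bug at the 0.01 level, and Fire (+13.8) at the 0.05 level. Overall, type explains about 12% of the variance in Attack (R-squared 0.117).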
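In a regression on a single categorical predictor, R's default treatment contrasts make the first factor level the baseline. For `Type.1` that is Bug alphabetically, which is why Bug has no row in the output. A small sketch with invented toy values (not real stats) confirms that the intercept equals the baseline group's mean and each coefficient equals a difference in group means:

```r
# Hypothetical toy data: invented values, NOT real Pokemon stats.
toy <- data.frame(
  Type.1 = c("Bug", "Bug", "Bug", "Dragon", "Dragon", "Water", "Water"),
  Attack = c(40, 50, 60, 120, 140, 70, 80)
)

fit <- lm(Attack ~ Type.1, data = toy)
print(coef(fit))

# The character column becomes a factor with alphabetical levels, so Bug is
# the baseline: the intercept is Bug's mean, and each other coefficient is
# that type's mean minus Bug's mean.
group_means <- tapply(toy$Attack, toy$Type.1, mean)
print(group_means - group_means[["Bug"]])
```

`relevel()` on a factor column can choose a different baseline; the fitted values stay the same, and only the comparisons reported by the coefficients change.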

## Call:
## lm(formula = PokemonHeightWeight$Attack ~ PokemonHeightWeight$Weight..lbs. + 
##     PokemonHeightWeight$Height.m. + PokemonHeightWeight$Weight2 + 
##     PokemonHeightWeight$Height2 + PokemonHeightWeight$Height3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -78.867 -16.636  -1.929  14.768  90.866 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       4.229e+01  2.415e+00  17.508  < 2e-16 ***
## PokemonHeightWeight$Weight..lbs.  3.757e-02  1.184e-02   3.173  0.00157 ** 
## PokemonHeightWeight$Height.m.     3.646e+01  3.398e+00  10.730  < 2e-16 ***
## PokemonHeightWeight$Weight2      -9.409e-06  6.442e-06  -1.460  0.14458    
## PokemonHeightWeight$Height2      -5.632e+00  7.230e-01  -7.789 2.12e-14 ***
## PokemonHeightWeight$Height3       2.324e-01  4.192e-02   5.545 4.01e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.81 on 794 degrees of freedom
## Multiple R-squared:  0.3714, Adjusted R-squared:  0.3674 
## F-statistic: 93.82 on 5 and 794 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = PokemonHeightWeight$Total ~ PokemonHeightWeight$Weight..lbs. + 
##     PokemonHeightWeight$Height.m. + PokemonHeightWeight$Weight2 + 
##     PokemonHeightWeight$Height2 + PokemonHeightWeight$Height3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -331.75  -53.80   -7.22   46.00  299.21 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       2.698e+02  7.881e+00  34.234  < 2e-16 ***
## PokemonHeightWeight$Weight..lbs.  8.924e-02  3.863e-02   2.310   0.0212 *  
## PokemonHeightWeight$Height.m.     1.716e+02  1.109e+01  15.471  < 2e-16 ***
## PokemonHeightWeight$Weight2      -1.909e-05  2.102e-05  -0.908   0.3640    
## PokemonHeightWeight$Height2      -2.494e+01  2.360e+00 -10.569  < 2e-16 ***
## PokemonHeightWeight$Height3       9.930e-01  1.368e-01   7.259 9.29e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 84.24 on 794 degrees of freedom
## Multiple R-squared:   0.51,  Adjusted R-squared:  0.5069 
## F-statistic: 165.3 on 5 and 794 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = PokemonHeightWeight$Attack ~ PokemonHeightWeight$Generation)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -74.681 -24.681  -3.676  19.314 113.335 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     75.6601     2.5662  29.483   <2e-16 ***
## PokemonHeightWeight$Generation   1.0052     0.6907   1.455    0.146    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.43 on 798 degrees of freedom
## Multiple R-squared:  0.002647,   Adjusted R-squared:  0.001397 
## F-statistic: 2.118 on 1 and 798 DF,  p-value: 0.146
## 
## Call:
## lm(formula = Attack ~ Type.1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -68.704 -22.715  -3.469  18.847 118.544 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     70.9710     3.7123  19.118  < 2e-16 ***
## Type.1Dark      17.4161     6.6675   2.612 0.009172 ** 
## Type.1Dragon    41.1540     6.5952   6.240 7.17e-10 ***
## Type.1Electric  -1.8801     5.9492  -0.316 0.752066    
## Type.1Fairy     -9.4416     8.3497  -1.131 0.258497    
## Type.1Fighting  25.8068     7.0000   3.687 0.000243 ***
## Type.1Fire      13.7982     5.6629   2.437 0.015048 *  
## Type.1Flying     7.7790    15.8590   0.491 0.623912    
## Type.1Ghost      2.8102     6.5952   0.426 0.670152    
## Type.1Grass      2.2433     5.2312   0.429 0.668170    
## Type.1Ground    24.7790     6.5952   3.757 0.000185 ***
## Type.1Ice        1.7790     7.3077   0.243 0.807729    
## Type.1Normal     2.4984     4.8461   0.516 0.606317    
## Type.1Poison     3.7076     6.9096   0.537 0.591709    
## Type.1Psychic    0.4851     5.5194   0.088 0.929983    
## Type.1Rock      21.8926     5.9492   3.680 0.000249 ***
## Type.1Steel     21.7327     7.0000   3.105 0.001974 ** 
## Type.1Water      3.1808     4.7193   0.674 0.500513    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.84 on 782 degrees of freedom
## Multiple R-squared:  0.1166, Adjusted R-squared:  0.09736 
## F-statistic:  6.07 on 17 and 782 DF,  p-value: 2.231e-13

Conclusion

This exploratory data analysis can help an aspiring Pokemon trainer strategically catch Pokemon that improve their odds of winning battles. After reading this report, a trainer will know which types and generations best serve their interests. Whether a trainer is looking for a Pokemon that can deal a lot of damage quickly, has a defense strong enough to withstand the strongest Pokemon, has enough health points to outlast a barrage of attacks, or has speed stats high enough to attack first each round, this is a great place to research which Pokemon are best suited for each part of that trainer's game plan and strategy.

Christopher Fleming

December 22, 2019