Overview

For this analysis I’m going to dive into Pokemon stats and whether or not each Pokemon is marked as “Legendary”. I will focus on creating a logistic regression model and a decision tree model to help predict whether or not a Pokemon will be Legendary based on its base stats (Attack, Defense, Sp_Atk, Sp_Def, HP and Speed).

Sample of Data

##   ID                  Name Type_1 Type_2 Total HP Attack Defense Sp_Atk
## 1  1             Bulbasaur  Grass Poison   318 45     49      49     65
## 2  2               Ivysaur  Grass Poison   405 60     62      63     80
## 3  3              Venusaur  Grass Poison   525 80     82      83    100
## 4  4 VenusaurMega Venusaur  Grass Poison   625 80    100     123    122
## 5  5            Charmander   Fire          309 39     52      43     60
## 6  6            Charmeleon   Fire          405 58     64      58     80
##   Sp_Def Speed Generation Legendary
## 1     65    45          1     FALSE
## 2     80    60          1     FALSE
## 3    100    80          1     FALSE
## 4    120    80          1     FALSE
## 5     50    65          1     FALSE
## 6     65    80          1     FALSE

Logistic Model

Test with all six predictor (independent) variables first, then see whether the model can be improved by dropping terms.
## 
## Call:
## glm(formula = Legendary ~ Attack + Defense + Sp_Atk + Sp_Def + 
##     HP + Speed, family = "binomial", data = pkdf)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.26688  -0.14919  -0.03675  -0.00546   2.11519  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -21.877464   2.442548  -8.957  < 2e-16 ***
## Attack        0.017868   0.006629   2.695  0.00703 ** 
## Defense       0.033009   0.008294   3.980 6.89e-05 ***
## Sp_Atk        0.035763   0.007141   5.008 5.50e-07 ***
## Sp_Def        0.043027   0.009013   4.774 1.81e-06 ***
## HP            0.034396   0.008776   3.919 8.89e-05 ***
## Speed         0.050447   0.009592   5.259 1.44e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 450.90  on 799  degrees of freedom
## Residual deviance: 176.89  on 793  degrees of freedom
## AIC: 190.89
## 
## Number of Fisher Scoring iterations: 8
Test after deleting the Defense variable:
## 
## Call:
## glm(formula = Legendary ~ Attack + Sp_Atk + Sp_Def + HP + Speed, 
##     family = "binomial", data = pkdf)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.94686  -0.17799  -0.04890  -0.01067   3.13987  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -18.812637   2.003383  -9.390  < 2e-16 ***
## Attack        0.027386   0.005931   4.618 3.88e-06 ***
## Sp_Atk        0.030284   0.006517   4.647 3.37e-06 ***
## Sp_Def        0.053891   0.008839   6.097 1.08e-09 ***
## HP            0.027741   0.007496   3.701 0.000215 ***
## Speed         0.040585   0.008607   4.715 2.42e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 450.90  on 799  degrees of freedom
## Residual deviance: 192.46  on 794  degrees of freedom
## AIC: 204.46
## 
## Number of Fisher Scoring iterations: 8
Test after also deleting the Speed variable:
## 
## Call:
## glm(formula = Legendary ~ Attack + Sp_Atk + Sp_Def + HP, family = "binomial", 
##     data = pkdf)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.87102  -0.24077  -0.08417  -0.02674   2.74552  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -14.735341   1.456811 -10.115  < 2e-16 ***
## Attack        0.029982   0.005642   5.314 1.07e-07 ***
## Sp_Atk        0.038977   0.006249   6.237 4.45e-10 ***
## Sp_Def        0.043578   0.007496   5.813 6.12e-09 ***
## HP            0.021061   0.006744   3.123  0.00179 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 450.90  on 799  degrees of freedom
## Residual deviance: 220.07  on 795  degrees of freedom
## AIC: 230.07
## 
## Number of Fisher Scoring iterations: 8
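
The three fits above could be produced along the following lines. This is only a sketch: the real `pkdf` is not reproduced here, so the code builds a synthetic stand-in with the same column names and made-up values, and the model names (`full`, `no_defense`, `no_speed`) are my own labels rather than anything from the original analysis. The numbers it prints will not match the summaries above.

```r
# Synthetic stand-in for pkdf (ASSUMPTION: made-up values, real column names)
set.seed(42)
n <- 400
stats <- c("Attack", "Defense", "Sp_Atk", "Sp_Def", "HP", "Speed")
pkdf <- as.data.frame(matrix(pmax(10, round(rnorm(n * 6, 70, 25))), ncol = 6))
names(pkdf) <- stats

# Simulated outcome: higher total stats make Legendary more likely
lp <- -16 + 0.03 * rowSums(pkdf)
pkdf$Legendary <- rbinom(n, size = 1, prob = plogis(lp)) == 1

# Full model, using the same formula as the first Call shown above
full <- glm(Legendary ~ Attack + Defense + Sp_Atk + Sp_Def + HP + Speed,
            family = "binomial", data = pkdf)

# Drop Defense, then also drop Speed, reusing the previous fit each time
no_defense <- update(full, . ~ . - Defense)
no_speed <- update(no_defense, . ~ . - Speed)

# Side-by-side comparison: lower AIC indicates a better trade-off
aic_table <- AIC(full, no_defense, no_speed)
print(aic_table)
```

Using `update()` keeps each reduced model tied to the previous one, which makes it harder to accidentally change the data or family between fits.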

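Every predictor left in this final model is significant, so it is still a usable, simpler model. Note, however, that dropping variables did not improve the fit: AIC rose from 190.89 for the full model to 204.46 without Defense and to 230.07 without Defense and Speed, so by AIC the full model is the strongest of the three.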
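
As a quick check on the three summaries, the reported AIC values can be reproduced by hand. For a logistic model on 0/1 outcomes, AIC equals the residual deviance plus twice the number of estimated coefficients. The sketch below uses only the figures printed above; the data frame and column names are my own.

```r
# Residual deviances and coefficient counts as printed in the three summaries
fits <- data.frame(
  model = c("all six stats", "without Defense", "without Defense and Speed"),
  residual_deviance = c(176.89, 192.46, 220.07),
  n_coefficients = c(7, 6, 5)  # predictors plus the intercept
)

# AIC = residual deviance + 2 * number of coefficients (binary response)
fits$aic <- fits$residual_deviance + 2 * fits$n_coefficients
print(fits)

# The model with the lowest AIC
best <- fits$model[which.min(fits$aic)]
print(best)
```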

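Looking at the results from this model, we can interpret the coefficient of Attack (0.029982) as follows: if Attack increases by 1 point while the other stats stay fixed, the odds of this Pokemon being Legendary are multiplied by exp(0.029982) ≈ 1.0304, an increase of about 3.04%. This is a change in the odds, not in the probability itself.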
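
The 3.04% figure comes from exponentiating the coefficient. A minimal sketch, using the estimates printed in the last summary above (the variable names are mine):

```r
# Coefficients copied from the four-predictor model summary
coefs <- c(Attack = 0.029982, Sp_Atk = 0.038977,
           Sp_Def = 0.043578, HP = 0.021061)

# exp(coefficient) is the multiplicative change in odds per 1-point increase
odds_ratio <- exp(coefs)

# Express as a percentage change in odds
pct_change <- round((odds_ratio - 1) * 100, 2)
print(pct_change)
```

The same step applied to a fitted model object would be `exp(coef(model))`.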

Decision Tree Model

Putting the same predictor variables into a single decision tree.
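
The write-up does not say which package built the tree. One common option is `rpart`, so the sketch below assumes that package, assumes all six base stats were used as predictors, and again runs on a synthetic stand-in for `pkdf` with made-up values.

```r
library(rpart)  # ASSUMPTION: the original may have used a different package

# Synthetic stand-in for pkdf (made-up values, real column names)
set.seed(42)
n <- 400
stats <- c("Attack", "Defense", "Sp_Atk", "Sp_Def", "HP", "Speed")
pkdf <- as.data.frame(matrix(pmax(10, round(rnorm(n * 6, 70, 25))), ncol = 6))
names(pkdf) <- stats
lp <- -16 + 0.03 * rowSums(pkdf)
pkdf$Legendary <- factor(rbinom(n, size = 1, prob = plogis(lp)) == 1,
                         levels = c(FALSE, TRUE))

# Classification tree on the six base stats
tree <- rpart(Legendary ~ Attack + Defense + Sp_Atk + Sp_Def + HP + Speed,
              data = pkdf, method = "class")
print(tree)

# Predicted class for each row, cross-tabulated against the actual label
pred <- predict(tree, newdata = pkdf, type = "class")
print(table(predicted = pred, actual = pkdf$Legendary))
```

Scoring the tree on the same rows it was trained on flatters it; a held-out test set would give a fairer picture.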

Comparing the Logistic Model and the Decision Tree

Final Thoughts

Using the model, we can try predicting two Pokemon by ID number and check whether they really are Legendary.

Pokemon 59
- Model Prediction: Most likely not legendary
- Actual: Not legendary

Pokemon 799
- Model Prediction: Most likely legendary
- Actual: Legendary
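
The stats for Pokemon 59 and 799 are not shown in this write-up, so the mechanics are illustrated here with the six rows from the sample of data instead. The sketch assumes the four-predictor logistic model and a 0.5 cut-off for calling a Pokemon Legendary; neither choice is confirmed above.

```r
# Coefficients copied from the four-predictor model summary
coefs <- c(`(Intercept)` = -14.735341, Attack = 0.029982,
           Sp_Atk = 0.038977, Sp_Def = 0.043578, HP = 0.021061)

# The six rows from the sample of data (only the columns the model uses)
sample_rows <- data.frame(
  Name = c("Bulbasaur", "Ivysaur", "Venusaur",
           "VenusaurMega Venusaur", "Charmander", "Charmeleon"),
  HP = c(45, 60, 80, 80, 39, 58),
  Attack = c(49, 62, 82, 100, 52, 64),
  Sp_Atk = c(65, 80, 100, 122, 60, 80),
  Sp_Def = c(65, 80, 100, 120, 50, 65)
)

# Linear predictor = intercept + sum(coefficient * stat); plogis maps it to [0, 1]
X <- cbind(1, as.matrix(sample_rows[, c("Attack", "Sp_Atk", "Sp_Def", "HP")]))
sample_rows$prob <- plogis(drop(X %*% coefs))

# ASSUMPTION: classify as Legendary when the probability exceeds 0.5
sample_rows$predicted <- sample_rows$prob > 0.5
print(sample_rows[, c("Name", "prob", "predicted")])
```

With a fitted model object, `predict(model, newdata = ..., type = "response")` performs the same calculation.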

Looking at both of these line charts, we can see that the logistic model performs better and is therefore the better choice for predicting whether a Pokemon will be Legendary.
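
The charts are not reproduced here, and the write-up does not name the metric they plot. If they are ROC-style curves, one way to reduce each curve to a single number is the area under it (AUC). The helper below is a hypothetical sketch that computes AUC from ranks; the labels and scores are toy values invented for illustration, not results from either model.

```r
# AUC via the rank (Mann-Whitney) formula:
# the share of (positive, negative) pairs where the positive scores higher
auc <- function(scores, labels) {
  r <- rank(scores)
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example (made-up values): TRUE marks a Legendary
labels  <- c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
model_a <- c(0.05, 0.10, 0.20, 0.70, 0.40, 0.90)  # ranks every pair correctly
model_b <- c(0.05, 0.60, 0.20, 0.50, 0.40, 0.90)  # gets one pair wrong

auc_a <- auc(model_a, labels)
auc_b <- auc(model_b, labels)
print(c(model_a = auc_a, model_b = auc_b))
```

A higher AUC means the model ranks Legendary Pokemon above non-Legendary ones more consistently.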