For this analysis I’m going to dive into Pokemon ratings and whether or not a Pokemon is marked as “Legendary”. I will focus on building a logistic regression model and a decision tree model to predict whether a Pokemon is Legendary based on its stat variables.
##   ID                  Name Type_1 Type_2 Total HP Attack Defense Sp_Atk
## 1  1             Bulbasaur  Grass Poison   318 45     49      49     65
## 2  2               Ivysaur  Grass Poison   405 60     62      63     80
## 3  3              Venusaur  Grass Poison   525 80     82      83    100
## 4  4 VenusaurMega Venusaur  Grass Poison   625 80    100     123    122
## 5  5            Charmander   Fire          309 39     52      43     60
## 6  6            Charmeleon   Fire          405 58     64      58     80
##   Sp_Def Speed Generation Legendary
## 1     65    45          1     FALSE
## 2     80    60          1     FALSE
## 3    100    80          1     FALSE
## 4    120    80          1     FALSE
## 5     50    65          1     FALSE
## 6     65    80          1     FALSE
##
## Call:
## glm(formula = Legendary ~ Attack + Defense + Sp_Atk + Sp_Def +
## HP + Speed, family = "binomial", data = pkdf)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.26688 -0.14919 -0.03675 -0.00546 2.11519
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -21.877464 2.442548 -8.957 < 2e-16 ***
## Attack 0.017868 0.006629 2.695 0.00703 **
## Defense 0.033009 0.008294 3.980 6.89e-05 ***
## Sp_Atk 0.035763 0.007141 5.008 5.50e-07 ***
## Sp_Def 0.043027 0.009013 4.774 1.81e-06 ***
## HP 0.034396 0.008776 3.919 8.89e-05 ***
## Speed 0.050447 0.009592 5.259 1.44e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 450.90 on 799 degrees of freedom
## Residual deviance: 176.89 on 793 degrees of freedom
## AIC: 190.89
##
## Number of Fisher Scoring iterations: 8
##
## Call:
## glm(formula = Legendary ~ Attack + Sp_Atk + Sp_Def + HP + Speed,
## family = "binomial", data = pkdf)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.94686 -0.17799 -0.04890 -0.01067 3.13987
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -18.812637 2.003383 -9.390 < 2e-16 ***
## Attack 0.027386 0.005931 4.618 3.88e-06 ***
## Sp_Atk 0.030284 0.006517 4.647 3.37e-06 ***
## Sp_Def 0.053891 0.008839 6.097 1.08e-09 ***
## HP 0.027741 0.007496 3.701 0.000215 ***
## Speed 0.040585 0.008607 4.715 2.42e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 450.90 on 799 degrees of freedom
## Residual deviance: 192.46 on 794 degrees of freedom
## AIC: 204.46
##
## Number of Fisher Scoring iterations: 8
##
## Call:
## glm(formula = Legendary ~ Attack + Sp_Atk + Sp_Def + HP, family = "binomial",
## data = pkdf)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.87102 -0.24077 -0.08417 -0.02674 2.74552
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -14.735341 1.456811 -10.115 < 2e-16 ***
## Attack 0.029982 0.005642 5.314 1.07e-07 ***
## Sp_Atk 0.038977 0.006249 6.237 4.45e-10 ***
## Sp_Def 0.043578 0.007496 5.813 6.12e-09 ***
## HP 0.021061 0.006744 3.123 0.00179 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 450.90 on 799 degrees of freedom
## Residual deviance: 220.07 on 795 degrees of freedom
## AIC: 230.07
##
## Number of Fisher Scoring iterations: 8
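We can see from these models that dropping Defense and then Speed still produces a workable model: every remaining predictor (Attack, Sp_Atk, Sp_Def and HP) is significant. The simpler model does give up some fit, though, since AIC rises from 190.89 for the full model to 230.07 for the final one.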
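One way to weigh this choice is to compare the three fits using only the deviances printed in the summaries above. The following is a minimal sketch, not the code used to produce the output; the vector names are my own, and the likelihood-ratio test is an extra check that the original analysis does not report.

```r
# Residual deviances and parameter counts (intercept included), copied from
# the three model summaries above. The names are illustrative.
resid_dev <- c(full = 176.89, no_defense = 192.46, no_defense_speed = 220.07)
n_params  <- c(full = 7,      no_defense = 6,      no_defense_speed = 5)

# For a 0/1 response the residual deviance equals -2 * log-likelihood,
# so AIC = residual deviance + 2 * number of parameters.
aic <- resid_dev + 2 * n_params
print(aic)

# Likelihood-ratio test for dropping Speed: the rise in deviance is compared
# with a chi-squared distribution on 1 degree of freedom.
lr_stat <- unname(resid_dev["no_defense_speed"] - resid_dev["no_defense"])
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(c(lr_stat = lr_stat, p_value = p_value))
```

The recomputed AIC values match the ones in the summaries, and the p-value for dropping Speed is well below 0.001, which suggests Speed carries information that the remaining stats do not fully replace.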
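Looking at the results from the final model (Attack, Sp_Atk, Sp_Def and HP), we can interpret the coefficient of Attack as follows: if Attack increases by 1 point while the other stats stay fixed, the odds of the Pokemon being Legendary increase by about 3.04%, because exp(0.029982) ≈ 1.0304. Note that this is a change in the odds, not in the probability itself.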
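The 3.04% figure comes from exponentiating the coefficient. A short sketch of that arithmetic, using the estimates from the final summary above (the variable names are my own):

```r
# Coefficients copied from the final model summary above.
coefs <- c(Attack = 0.029982, Sp_Atk = 0.038977, Sp_Def = 0.043578, HP = 0.021061)

# exp(coefficient) is the multiplicative change in the odds of being
# Legendary for a one-point increase in that stat, other stats held fixed.
odds_ratio <- exp(coefs)

# Express the same thing as a percentage change in the odds.
pct_change <- 100 * (odds_ratio - 1)
print(round(pct_change, 2))
```

If the fitted model object is available, `exp(coef(model))` should give the same odds ratios directly.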
Next, I put all the same variables into a single decision tree.
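For readers who want to try this step, here is a rough sketch of fitting a classification tree. It makes several assumptions: the source does not say which tree package was used, so `rpart` is my choice; I take "the same variables" to mean the six stats from the first logistic model; and the data frame `toy` is simulated stand-in data, not the Pokemon dataset, so its output says nothing about real Pokemon.

```r
library(rpart)  # assumption: the tree package is not named in the source

# Simulated stand-in data, NOT the Pokemon dataset: six stats drawn at random,
# with a made-up rule so that high-stat rows tend to be labelled TRUE.
set.seed(1)
n <- 400
stats <- c("Attack", "Defense", "Sp_Atk", "Sp_Def", "HP", "Speed")
toy <- as.data.frame(matrix(round(runif(n * 6, 20, 150)), ncol = 6,
                            dimnames = list(NULL, stats)))
toy$Legendary <- factor(rowSums(toy[stats]) + rnorm(n, sd = 40) > 600)

# Classification tree on the six stat predictors.
tree_fit <- rpart(Legendary ~ Attack + Defense + Sp_Atk + Sp_Def + HP + Speed,
                  data = toy, method = "class")

# Predicted class for each row, and a confusion table against the labels.
tree_pred <- predict(tree_fit, newdata = toy, type = "class")
print(table(predicted = tree_pred, actual = toy$Legendary))
```

With the real data, `toy` would be replaced by `pkdf`, the data frame named in the `glm` calls above.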
If we use the model, we can try to predict whether two Pokemon, picked by ID number, are Legendary.
Pokemon 59
- Model Prediction: Most likely not legendary
- Actual: Not legendary
Pokemon 799
- Model Prediction: Most likely legendary
- Actual: Legendary
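To show how such a prediction is computed, the sketch below applies the final logistic model's coefficients by hand. The stats for Pokemon 59 and 799 are not shown in this write-up, so it uses Bulbasaur from the data preview instead. The 0.5 cutoff is an assumption for illustration; the analysis does not state which threshold was used.

```r
# Coefficients of the final model, copied from the summary above.
beta <- c(Intercept = -14.735341, Attack = 0.029982, Sp_Atk = 0.038977,
          Sp_Def = 0.043578, HP = 0.021061)

# Bulbasaur's stats from the first row of the data preview
# (the leading 1 multiplies the intercept).
bulbasaur <- c(Intercept = 1, Attack = 49, Sp_Atk = 65, Sp_Def = 65, HP = 45)

# Linear predictor (log-odds), then the inverse logit to get a probability.
log_odds <- sum(beta * bulbasaur[names(beta)])
prob <- plogis(log_odds)

# Assumed cutoff of 0.5, for illustration only.
predicted_legendary <- prob > 0.5
print(c(log_odds = log_odds, probability = prob))
print(predicted_legendary)
```

Bulbasaur's predicted probability is far below the cutoff, which agrees with its `FALSE` label in the preview. With the fitted model object in hand, `predict(model, newdata = ..., type = "response")` should return the same kind of probability without the manual arithmetic.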
Looking at both of these line charts, we can see that the logistic model performs better and is therefore the better choice for predicting whether a Pokemon is Legendary.