I got this Pokemon dataset from Kaggle. This data set includes 721 Pokemon, including their number, name, first and second type, and basic stats: HP, Attack, Defense, Special Attack, Special Defense, and Speed.
library(Zelig)
## Loading required package: survival
library(ZeligChoice)
library(faraway)
##
## Attaching package: 'faraway'
## The following objects are masked from 'package:survival':
##
## rats, solder
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(survival)
library(readr)
Pokemon=read_csv("Desktop/Pokemon.csv")
## Parsed with column specification:
## cols(
## `#` = col_double(),
## Name = col_character(),
## `Type 1` = col_character(),
## `Type 2` = col_character(),
## Total = col_double(),
## HP = col_double(),
## Attack = col_double(),
## Defense = col_double(),
## `Sp. Atk` = col_double(),
## `Sp. Def` = col_double(),
## Speed = col_double(),
## Generation = col_double(),
## Legendary = col_logical()
## )
head(Pokemon)
## # A tibble: 6 x 13
## `#` Name `Type 1` `Type 2` Total HP Attack Defense `Sp. Atk`
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 Bulb… Grass Poison 318 45 49 49 65
## 2 2 Ivys… Grass Poison 405 60 62 63 80
## 3 3 Venu… Grass Poison 525 80 82 83 100
## 4 3 Venu… Grass Poison 625 80 100 123 122
## 5 4 Char… Fire <NA> 309 39 52 43 60
## 6 5 Char… Fire <NA> 405 58 64 58 80
## # … with 4 more variables: `Sp. Def` <dbl>, Speed <dbl>, Generation <dbl>,
## # Legendary <lgl>
#recoding Legendary to factor
Pokemon$Legendary=as.factor(Pokemon$Legendary)
library(texreg)
## Version: 1.36.23
## Date: 2017-03-03
## Author: Philip Leifeld (University of Glasgow)
##
## Please cite the JSS article in your publications -- see citation("texreg").
library(magrittr)
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:texreg':
##
## extract
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
##
## extract
## The following object is masked from 'package:texreg':
##
## extract
We will start with our main explanatory variable–Legendary (False or True)–and add features to try to best explain hitting points. We will then choose our working model based on the lowest AIC/BIC score.
reg1=zelig(HP~Legendary,model = "poisson",data=Pokemon,cite=F)
reg2=zelig(HP~Legendary+Attack,model = "poisson",data=Pokemon,cite=F)
reg3=zelig(HP~Legendary+Attack+Defense,model = "poisson",data=Pokemon,cite=F)
reg4=zelig(HP~Legendary+Attack+Defense+Speed,model = "poisson",data=Pokemon,cite=F)
htmlreg(list(reg1,reg2,reg3,reg4))
| Model 1 | Model 2 | Model 3 | Model 4 | ||
|---|---|---|---|---|---|
| (Intercept) | 4.21*** | 3.89*** | 3.86*** | 3.86*** | |
| (0.00) | (0.01) | (0.01) | (0.02) | ||
| LegendaryTRUE | 0.32*** | 0.15*** | 0.15*** | 0.15*** | |
| (0.01) | (0.01) | (0.01) | (0.02) | ||
| Attack | 0.00*** | 0.00*** | 0.00*** | ||
| (0.00) | (0.00) | (0.00) | |||
| Defense | 0.00*** | 0.00*** | |||
| (0.00) | (0.00) | ||||
| Speed | -0.00 | ||||
| (0.00) | |||||
| AIC | 11319.91 | 10422.74 | 10404.06 | 10406.06 | |
| BIC | 11329.28 | 10436.80 | 10422.80 | 10429.49 | |
| Log Likelihood | -5657.95 | -5208.37 | -5198.03 | -5198.03 | |
| Deviance | 6506.50 | 5607.34 | 5586.66 | 5586.66 | |
| Num. obs. | 800 | 800 | 800 | 800 | |
| p < 0.001, p < 0.01, p < 0.05 | |||||
Based on both AIC and BIC, the lowest score for both is reg3, which we will use for our simulation and analysis!
Let us set our counterfactuals for Pokemon Legendary being false or true
reg3$setx(Legendary="FALSE")
reg3$setx1(Legendary="TRUE")
Simulating our Data:
reg3$sim()
Plotting our Data:
reg3$graph()
Here, we set up a first difference simulation for Legendary where all other features in regression 3 are set to their default values (i.e. mode, median, etc.). We set X to be Legendary that is false, and X1 to be Legendary that is true.
From looking at the visuals, we can see that Legendary being false has an expected value of scoring about 10.4 less hitting points than if Legendary being true. Breaking it down more, the Legendary being false has an expected value of scoring about 67.6 hitting points whereas the Legendary being false has an expected value of scoring about 78 hitting points.
reg3$get_qi(xvalue="x1",qi="fd")%>%
data.frame()%>%
summary()
## .
## Min. : 6.898
## 1st Qu.: 9.949
## Median :10.726
## Mean :10.759
## 3rd Qu.:11.598
## Max. :14.733