The model used in this research is directly related to “Week 8 Data Dive” found in this account’s RMarkdown notebooks.
Last week we asked how attack predicted speed. This week, with multiple regression, we consider how several stats together predict speed.
library(readr)
pokemon <- read_csv("pokemon.csv")
## Rows: 801 Columns: 41
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): abilities, capture_rate, classfication, japanese_name, name, type1...
## dbl (34): against_bug, against_dark, against_dragon, against_electric, again...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
linear_model <- lm(speed ~ attack, data = pokemon)
summary(linear_model)
##
## Call:
## lm(formula = speed ~ attack, data = pokemon)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -64.451 -20.673  -1.769  18.475 108.231
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.65016    2.50646   16.62   <2e-16 ***
## attack       0.31705    0.02976   10.65   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27.07 on 799 degrees of freedom
## Multiple R-squared: 0.1244, Adjusted R-squared: 0.1233
## F-statistic: 113.5 on 1 and 799 DF, p-value: < 2.2e-16
In multiple regression, the slope for attack is the expected change in speed for a 1-point increase in attack, holding the other predictors in the model constant.
Game designers are primarily concerned with providing variability in game play to keep the player experience fresh and exciting. Last week, we discussed the possibility of roles in Pokemon such as tanks (high attack and defense, low speed) that allow players to adopt various strategies in battle. Testing this idea calls for adding stats such as defense and weight to the model. The line of thinking is that ‘tankier’ Pokemon may literally be bigger so that they can act as a shield for their team in battle.
\(Speed = \beta_0 + \beta_1 \cdot Attack + \beta_2 \cdot Defense + \beta_3 \cdot Weight + \epsilon\)
Each slope now answers its own question. For attack: how does speed change when defense and weight are held constant? For defense: how does speed change when attack and weight are held constant? Lastly, for weight: how does speed change when attack and defense are held constant?
\(H_0: \beta_1 = \beta_2 = \beta_3 = 0\)
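For reference, the usual companion to this null hypothesis and the statistic that tests it can be written as follows. The symbols \(n\) (rows used in the fit) and \(p\) (number of predictors, here 3) are notation introduced for this sketch and do not appear in the original notebook.

\(H_A: \text{at least one } \beta_j \neq 0, \quad j \in \{1, 2, 3\}\)

\(F = \dfrac{R^2 / p}{(1 - R^2) / (n - p - 1)}\)

Under \(H_0\) and the standard linear-model assumptions, \(F\) follows an F distribution with \(p\) and \(n - p - 1\) degrees of freedom. This is the F-statistic that `summary()` prints on the last line of its output.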
The existence of tanks in Pokemon allows for some predictions related to our model. We can expect that attack will likely be a weak predictor of speed, defense will likely have a negative slope, and weight will likely be a strong negative predictor (heavier Pokemon are likely slower). The overall R-squared should increase compared to simple regression, because R-squared cannot decrease when predictors are added to a model fit on the same observations. The real question is whether defense and weight explain enough additional variation in speed to matter.
multi_model <- lm(speed ~ attack + defense + weight_kg, data = pokemon)
summary(multi_model)
##
## Call:
## lm(formula = speed ~ attack + defense + weight_kg, data = pokemon)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -70.906 -19.649  -2.118  17.454 109.265
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.619150   2.922226  15.953  < 2e-16 ***
## attack       0.419326   0.034222  12.253  < 2e-16 ***
## defense     -0.165284   0.036526  -4.525 6.98e-06 ***
## weight_kg   -0.013897   0.009834  -1.413    0.158
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26.48 on 777 degrees of freedom
## (20 observations deleted due to missingness)
## Multiple R-squared: 0.1642, Adjusted R-squared: 0.1609
## F-statistic: 50.87 on 3 and 777 DF, p-value: < 2.2e-16
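To make "holding the other variables constant" concrete, here is a minimal sketch that plugs the fitted coefficients from the output above into the model equation. The helper `predict_speed()` and the stat values (100 attack, 100 defense, 100 kg) are made up for illustration and are not part of the original analysis.

```r
# Coefficients copied from the summary(multi_model) output above.
coefs <- c(intercept = 46.619150, attack = 0.419326,
           defense = -0.165284, weight_kg = -0.013897)

# Hypothetical helper (not in the original notebook): predicted speed
# for given attack, defense, and weight values.
predict_speed <- function(attack, defense, weight_kg) {
  unname(coefs["intercept"] +
           coefs["attack"] * attack +
           coefs["defense"] * defense +
           coefs["weight_kg"] * weight_kg)
}

# Two made-up Pokemon that differ only by one point of attack.
base   <- predict_speed(attack = 100, defense = 100, weight_kg = 100)
bumped <- predict_speed(attack = 101, defense = 100, weight_kg = 100)

# With defense and weight fixed, the difference is exactly the attack slope.
bumped - base
```

With the fitted `multi_model` object available, `predict(multi_model, newdata = ...)` should give the same predictions without copying coefficients by hand.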
The focus of this model is on the sign of each slope, the p-values, and the R-squared and adjusted R-squared. A positive slope means speed tends to increase as that predictor increases, with the other predictors held constant, while a negative slope means speed tends to decrease. Each p-value tests whether that predictor's slope differs from zero given the other predictors in the model (a small p-value is evidence that it does). Lastly, R-squared conveys the share of variation in speed that the model explains, and the adjusted R-squared penalizes R-squared for each added predictor, so it rises only when a new predictor improves the fit by more than chance alone would be expected to.
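As a rough check on how these quantities relate, the sketch below recomputes several of them from the numbers printed in the summary. The variable names (`r2`, `df_resid`, and so on) are chosen here for illustration, and because the inputs are rounded, the results should agree with the printed summary only up to rounding.

```r
# Values copied from the summary(multi_model) output above.
r2       <- 0.1642          # Multiple R-squared
df_resid <- 777             # residual degrees of freedom
p        <- 3               # predictors: attack, defense, weight_kg
n        <- df_resid + p + 1  # rows used in the fit (801 minus the 20 dropped)

# Adjusted R-squared shrinks R-squared according to the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F statistic for H0: all three slopes equal zero.
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

# t value and two-sided p-value for weight_kg from its estimate and std. error.
t_weight <- -0.013897 / 0.009834
p_weight <- 2 * pt(-abs(t_weight), df = df_resid)

round(c(adj_r2 = adj_r2, f_stat = f_stat,
        t_weight = t_weight, p_weight = p_weight), 4)
```

The adjusted R-squared formula uses only R-squared, the sample size, and the number of predictors, so it adjusts for model size and not for outliers.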
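Attack and defense have very small p-values (both below 0.001), so we reject the null hypothesis that each slope is zero: both are statistically significant predictors of speed. Weight, however, has a large p-value (0.158), so we fail to reject its null hypothesis. Weight is not a statistically significant predictor of speed after controlling for attack and defense. Compared with our predictions, defense does have a negative slope, but attack is far from a weak predictor, and weight's negative slope is not distinguishable from zero. Our R-squared of 0.164 means that attack, defense, and weight together explain about 16.4% of the variation in speed. The adjusted R-squared (0.161) is close to the multiple R-squared, which indicates that the penalty for using three predictors is small relative to the sample size. Note that 20 observations were dropped for missing values, so this model was fit on 781 rows and not on the 801 used in the simple regression. The two R-squared values (0.124 and 0.164) are therefore not computed on exactly the same data.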
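The multiple regression output reports 20 observations deleted due to missingness, while the simple regression used all 801 rows. One way to compare the two models fairly is to refit both on the same complete rows and run a partial F test. The sketch below does that. The function `compare_on_complete_cases()` is a hypothetical helper, not part of the original notebook, and it assumes the data frame has columns named `speed`, `attack`, `defense`, and `weight_kg`.

```r
# Hypothetical helper: refit both models on the rows that have no missing
# values in any of the four variables, then compare them.
compare_on_complete_cases <- function(data) {
  vars <- c("speed", "attack", "defense", "weight_kg")
  complete <- data[complete.cases(data[, vars]), ]

  simple <- lm(speed ~ attack, data = complete)
  multi  <- lm(speed ~ attack + defense + weight_kg, data = complete)

  list(
    n_used    = nrow(complete),
    r2_simple = summary(simple)$r.squared,
    r2_multi  = summary(multi)$r.squared,
    # Partial F test: do defense and weight_kg jointly improve on attack alone?
    f_test    = anova(simple, multi)
  )
}

# Usage, assuming `pokemon` has been loaded as earlier in this notebook:
# compare_on_complete_cases(pokemon)
```

Because both models see the same rows here, `r2_multi` can never be smaller than `r2_simple`, and the `anova()` table indicates whether the gain is larger than chance would suggest.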