ANOVA and Regression Models

Our Pokemon data set has several continuous columns. Some of the most valuable are those describing a Pokemon’s stats (i.e. attack, speed, defense, etc.). A Pokemon’s stats are a direct measure of its proficiency in battle and strongly influence how a battle between trainers is likely to turn out. Trainers may adopt an offensive play style, purposely choosing Pokemon with higher-than-average attack or special attack stats. Other trainers may play more defensively and opt for Pokemon with higher hp or defense stats. Either way, every Pokemon has a niche in the competitive scene.

Players who compete in live tournaments often have an ‘ace’ on their team with a niche ability or an offensive edge. Players interested in watching or participating in the competitive scene are therefore likely to care most about a Pokemon’s attack stat, which will be our response variable. A categorical variable that may be related to a Pokemon’s attack stat is its type, which will be our explanatory variable.

H0: The population mean Attack stat is equal across all three groups (Pokemon types)

H1: At least one of the three types has a different mean Attack stat

Before running the test, we know that the size of the F-statistic (variation between groups / variation within groups) will help determine whether we reject or fail to reject the null hypothesis. A large F-statistic means the group means differ by more than the variability within groups would explain, which is evidence against the null hypothesis. An F-statistic near 1 or below means the between-group differences are about what we would expect from within-group noise alone, so we would fail to reject the null hypothesis. The decision itself rests on the p-value (probability value): a small p-value, below our significance level of .05, is strong evidence against the null hypothesis.
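To make the ratio concrete, here is a minimal sketch in R that computes the F-statistic by hand and compares it with `aov()`. The data are simulated: the `toy` data frame, its group labels, and the chosen means and standard deviation are assumptions for illustration only, not values from our Pokemon data.

```r
# Synthetic example: group labels and attack values are made up,
# not taken from the Pokemon data set.
set.seed(42)
toy <- data.frame(
  type1  = factor(rep(c("fire", "grass", "water"), each = 20)),
  attack = c(rnorm(20, mean = 80, sd = 25),
             rnorm(20, mean = 75, sd = 25),
             rnorm(20, mean = 72, sd = 25))
)

k <- nlevels(toy$type1)   # number of groups
n <- nrow(toy)            # total number of observations

grand_mean  <- mean(toy$attack)
group_means <- tapply(toy$attack, toy$type1, mean)
group_sizes <- tapply(toy$attack, toy$type1, length)

# Between-group variation: how far each group mean sits from the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group variation: spread of each observation around its own group mean
ss_within <- sum((toy$attack - ave(toy$attack, toy$type1))^2)

# F is the ratio of the two mean squares (sum of squares / degrees of freedom)
f_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_manual <- pf(f_manual, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# The same quantities taken from aov()
aov_table <- summary(aov(attack ~ type1, data = toy))[[1]]
f_aov <- aov_table[["F value"]][1]
p_aov <- aov_table[["Pr(>F)"]][1]

print(c(manual = f_manual, aov = f_aov))
```

The two values printed at the end should agree, which shows that the F value reported by `aov()` is the between-group mean square divided by the within-group mean square.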

ANOVA assumes independent observations, approximately normal residuals, and equal variances across groups. Although our null hypothesis may look similar to the ‘equal variances’ assumption, they are different statements: the null hypothesis concerns the group means, while ‘equal variances’ means that the population standard deviations of the groups are roughly similar.
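These assumptions can be checked informally before trusting the ANOVA table. The sketch below uses base R only and simulated data (the `toy` data frame and its values are hypothetical, not our Pokemon data). Bartlett’s test and the Shapiro-Wilk test are one possible choice of diagnostics, not necessarily the checks used for this report.

```r
# Synthetic example: values are simulated for illustration only.
set.seed(1)
toy <- data.frame(
  type1  = factor(rep(c("fire", "grass", "water"), each = 30)),
  attack = c(rnorm(30, mean = 80, sd = 25),
             rnorm(30, mean = 75, sd = 25),
             rnorm(30, mean = 72, sd = 25))
)
fit <- aov(attack ~ type1, data = toy)

# Equal variances: compare the group standard deviations directly.
# A common rule of thumb treats a largest-to-smallest ratio under 2 as acceptable.
group_sd <- tapply(toy$attack, toy$type1, sd)
sd_ratio <- max(group_sd) / min(group_sd)

# Bartlett's test: null hypothesis is that all group variances are equal
bartlett <- bartlett.test(attack ~ type1, data = toy)

# Shapiro-Wilk test: null hypothesis is that the residuals are normal
shapiro <- shapiro.test(residuals(fit))

print(round(group_sd, 1))
print(c(sd_ratio   = sd_ratio,
        bartlett_p = bartlett$p.value,
        shapiro_p  = shapiro$p.value))
```

Large p-values from these tests mean there is no strong sign that an assumption is violated. They do not prove that the assumption holds, so it is worth looking at a plot of the residuals as well.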

## Rows: 801 Columns: 41
##              Df Sum Sq Mean Sq F value Pr(>F)
## type1         2   2632  1316.0   1.587  0.207
## Residuals   241 199911   829.5

The ANOVA does not give us enough evidence to reject our null hypothesis. The relatively small F-statistic of 1.587 and the large p-value of .207 (p > .05) tell us that the between-group variation (differences in mean attack across types) is not much larger than the within-group variation (natural variability inside each type). Furthermore, our plot of attack by type shows relatively similar medians and similar spreads. We notice a few outliers among the water types, but they are not dramatic enough to produce a significant result.
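The F value in the table can be reproduced from its other columns. Each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares:

\[ MS_{between} = \frac{SS_{between}}{df_{between}} = \frac{2632}{2} = 1316.0 \]

\[ MS_{within} = \frac{SS_{within}}{df_{within}} = \frac{199911}{241} \approx 829.5 \]

\[ F = \frac{MS_{between}}{MS_{within}} = \frac{1316.0}{829.5} \approx 1.59 \]

This matches the reported 1.587 up to rounding of the table entries. The degrees of freedom also tell us the size of the comparison: \(df_{between} = k - 1 = 2\) gives \(k = 3\) types, and \(df_{within} = N - k = 241\) gives \(N = 244\) Pokemon. The p-value of .207 is the upper-tail area of the F distribution with 2 and 241 degrees of freedom beyond the observed F.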

Linear Regression Models

Competitively viable Pokemon have high base stats, useful typings, and strong abilities. One of the most useful stats for a Pokemon is its speed. Pokemon with high speed stats tend to move first in battle, which is a huge advantage. In fact, there is a plethora of moves, such as ‘Agility’ and ‘Dragon Dance’, that raise a Pokemon’s speed (and sometimes other stats), while ‘Trick Room’ even gives extremely slow Pokemon the chance to move first. Either way, Pokemon with both high attack and high speed stats are incredibly valuable on a team. Knowing this, how does speed change, on average, when attack increases by one point?

Simple linear regression describes the relationship between two numeric variables. Our model takes this form:

\(Speed = \beta_0 + \beta_1 \cdot Attack + \epsilon\)

Here \(\beta_0\) is our intercept, \(\beta_1\) is our slope, and \(\epsilon\) is the random noise. Before doing our calculations, we know that a slope greater than zero means higher attack is associated with higher speed, while a slope less than zero means higher attack is associated with lower speed. If our slope is indistinguishable from zero, then attack is not a useful linear predictor of speed.

\(H_0: \beta_1 = 0\)

\(H_1: \beta_1 \neq 0\)

In other words, our null hypothesis states that attack has no linear relationship with speed, and our alternative hypothesis states that it does. If higher attack goes with higher speed, for example, then we can expect a positive slope, a small p-value, and a regression line that tilts upwards.
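The test of \(H_0: \beta_1 = 0\) can be sketched in R as follows. The data here are simulated, so the `toy` data frame, the assumed true intercept of 40, the assumed true slope of 0.3, and the noise level are all hypothetical choices for illustration, not estimates from our Pokemon data.

```r
# Synthetic data for illustration only; intercept, slope and noise are assumed.
set.seed(123)
n <- 200
attack <- round(runif(n, min = 20, max = 150))
speed  <- 40 + 0.3 * attack + rnorm(n, mean = 0, sd = 25)
toy <- data.frame(attack = attack, speed = speed)

fit   <- lm(speed ~ attack, data = toy)
coefs <- summary(fit)$coefficients

slope    <- coefs["attack", "Estimate"]
slope_se <- coefs["attack", "Std. Error"]
slope_t  <- coefs["attack", "t value"]
slope_p  <- coefs["attack", "Pr(>|t|)"]

# The t value is the slope estimate divided by its standard error
t_manual <- slope / slope_se

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = n - 2, lower.tail = FALSE)

print(c(slope = slope, t = slope_t, p = slope_p))
```

A small p-value for the `attack` row is evidence against \(H_0\). How steep the slope is and how much variation the model explains are separate questions, answered by the estimate itself and by R-squared.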

## 
## Call:
## lm(formula = speed ~ attack, data = pokemon)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -64.451 -20.673  -1.769  18.475 108.231 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 41.65016    2.50646   16.62   <2e-16 ***
## attack       0.31705    0.02976   10.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.07 on 799 degrees of freedom
## Multiple R-squared:  0.1244, Adjusted R-squared:  0.1233 
## F-statistic: 113.5 on 1 and 799 DF,  p-value: < 2.2e-16
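Several numbers in this output are linked by simple arithmetic, which is worth checking by hand. The t value for attack is the slope estimate divided by its standard error, and in a regression with a single predictor its square equals the overall F-statistic:

\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{0.31705}{0.02976} \approx 10.65, \qquad t^2 \approx 113.5 = F \]

The fitted line also gives a predicted speed for any attack value. For a Pokemon with an attack of 100:

\[ \widehat{Speed} = 41.65016 + 0.31705 \cdot 100 \approx 73.4 \]

Finally, with one predictor the correlation between attack and speed is the square root of R-squared, taking the sign of the slope:

\[ r = \sqrt{0.1244} \approx 0.35 \]

The residual standard error of 27.07 indicates that individual Pokemon typically sit roughly 27 speed points away from the fitted line.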

From our model, we can see a small positive slope (0.317), a very small p-value (< 2e-16), and a low R-squared (0.124). We therefore reject the null hypothesis: on average, speed rises by about 0.32 points for each additional point of attack. The relationship is weak, however, because attack explains only about 12% of the variation in speed. This means that Pokemon with high attack are not consistently fast, and that Pokemon with high speed are not consistently strong attackers. In the context of the game’s design, this makes a lot of sense. If Pokemon with high attack stats had consistently high speed stats, they would likely become the meta and make many other Pokemon nearly useless in battle. This could eventually make the game stale and even discourage players whose favorite Pokemon do not fit the meta. Instead, the weak relationship lets the game designers balance roles within the game. For example, Pokemon with high attack and low speed may also have higher defense or special defense, making them tanks that protect others on the team. Game balancing is tricky, but providing variance in stats so that Pokemon can fulfill different roles is a smart method for keeping gameplay fresh and players happy.