We wanted to predict a Pokémon’s combat power after evolution
(cp_new) using what is known before evolution. Because the
response is a numeric variable, linear regression is a natural
method. We first fit a simple regression model with only pre-evolution
combat power (cp) as the predictor. Then we fit a multiple regression model
that adds other pre-evolution variables (hp, species, the weak and strong
attack values, weight, and height). Finally, we examined diagnostic plots
to check the model assumptions.
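Written out, the two models have the form below. This is a sketch in standard regression notation rather than something taken from the analysis itself: the symbols \beta and \varepsilon and the indicator notation are assumptions. Caterpie is treated as the reference species, because R's default treatment coding drops the alphabetically first factor level.

```latex
\begin{align}
\text{cp\_new}_i &= \beta_0 + \beta_1\,\text{cp}_i + \varepsilon_i \\[4pt]
\text{cp\_new}_i &= \beta_0 + \beta_1\,\text{cp}_i + \beta_2\,\text{hp}_i
   + \beta_3\,\mathbf{1}[\text{species}_i = \text{Eevee}]
   + \beta_4\,\mathbf{1}[\text{species}_i = \text{Pidgey}]
   + \beta_5\,\mathbf{1}[\text{species}_i = \text{Weedle}] \nonumber \\
 &\quad + \beta_6\,\text{attack\_weak\_value}_i
   + \beta_7\,\text{attack\_strong\_value}_i
   + \beta_8\,\text{weight}_i + \beta_9\,\text{height}_i + \varepsilon_i
\end{align}
```

The second model has ten coefficients, so with 75 observations it leaves 75 - 10 = 65 residual degrees of freedom, which matches the summary output for the multiple regression.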
# Load the dataset
pokemon <- read.csv("https://www.openintro.org/data/csv/pokemon_go.csv")
# Keep only the variables we will use
pokemon_data <- pokemon[, c("species", "cp", "hp", "weight", "height",
                            "attack_weak_value", "attack_strong_value", "cp_new")]
# Structure and basic summaries
str(pokemon)
## 'data.frame': 75 obs. of 27 variables:
## $ name : chr "Pidgey1" "Pidgey2" "Pidgey3" "Pidgey4" ...
## $ species : chr "Pidgey" "Pidgey" "Pidgey" "Pidgey" ...
## $ cp : int 384 366 353 338 242 129 10 25 24 161 ...
## $ hp : int 56 54 55 51 45 35 10 14 13 35 ...
## $ weight : num 2.31 1.67 1.94 1.73 1.44 2.07 0.92 2.72 2.07 1.45 ...
## $ height : num 0.34 0.29 0.3 0.31 0.27 0.35 0.25 0.37 0.32 0.31 ...
## $ power_up_stardust : int 2500 2500 3000 3000 1900 800 200 200 200 1000 ...
## $ power_up_candy : int 2 2 3 3 2 1 1 1 1 1 ...
## $ attack_weak : chr "Tackle" "Quick Attack" "Quick Attack" "Tackle" ...
## $ attack_weak_type : chr "Normal" "Normal" "Normal" "Normal" ...
## $ attack_weak_value : int 12 10 10 12 10 10 12 12 10 12 ...
## $ attack_strong : chr "Aerial Ace" "Twister" "Aerial Ace" "Air Cutter" ...
## $ attack_strong_type : chr "Flying" "Dragon" "Flying" "Flying" ...
## $ attack_strong_value : int 30 25 30 30 30 30 30 25 25 25 ...
## $ cp_new : int 694 669 659 640 457 243 15 47 47 305 ...
## $ hp_new : int 84 81 83 79 69 52 13 21 21 54 ...
## $ weight_new : num 2.6 1.93 3.51 30 1.42 30 30 2.63 3.27 30 ...
## $ height_new : num 1.24 1.05 1.11 1.12 0.98 1.27 0.9 1.35 1.16 1.14 ...
## $ power_up_stardust_new : int 2500 2500 3000 3000 1900 800 200 200 200 1000 ...
## $ power_up_candy_new : int 2 2 3 3 2 1 1 1 1 1 ...
## $ attack_weak_new : chr "Steel Wing" "Wing Attack" "Wing Attack" "Steel Wing" ...
## $ attack_weak_type_new : chr "Steel" "Flying" "Flying" "Steel" ...
## $ attack_weak_value_new : int 15 9 9 15 9 9 9 15 9 15 ...
## $ attack_strong_new : chr "Air Cutter" "Air Cutter" "Air Cutter" "Air Cutter" ...
## $ attack_strong_type_new : chr "Flying" "Flying" "Flying" "Flying" ...
## $ attack_strong_value_new: int 30 30 30 30 25 30 30 30 25 30 ...
## $ notes : chr "" "" "" "" ...
summary(pokemon)
## name species cp hp
## Length:75 Length:75 Min. : 10.0 Min. :10.00
## Class :character Class :character 1st Qu.: 94.0 1st Qu.:30.50
## Mode :character Mode :character Median :169.0 Median :43.00
## Mean :197.2 Mean :40.69
## 3rd Qu.:250.5 3rd Qu.:50.00
## Max. :619.0 Max. :74.00
## weight height power_up_stardust power_up_candy
## Min. : 0.780 Min. :0.2000 Min. : 200 Min. :1.000
## 1st Qu.: 1.720 1st Qu.:0.2800 1st Qu.: 800 1st Qu.:1.000
## Median : 2.240 Median :0.3000 Median :1600 Median :2.000
## Mean : 2.750 Mean :0.3017 Mean :1616 Mean :1.733
## 3rd Qu.: 3.325 3rd Qu.:0.3200 3rd Qu.:2500 3rd Qu.:2.000
## Max. :10.420 Max. :0.3900 Max. :3000 Max. :3.000
## attack_weak attack_weak_type attack_weak_value attack_strong
## Length:75 Length:75 Min. : 5.000 Length:75
## Class :character Class :character 1st Qu.: 6.000 Class :character
## Mode :character Mode :character Median :10.000 Mode :character
## Mean : 9.293
## 3rd Qu.:12.000
## Max. :12.000
## attack_strong_type attack_strong_value cp_new hp_new
## Length:75 Min. :15.00 Min. : 10.0 Min. : 10.00
## Class :character 1st Qu.:15.00 1st Qu.: 138.5 1st Qu.: 43.50
## Mode :character Median :25.00 Median : 226.0 Median : 54.00
## Mean :24.47 Mean : 356.5 Mean : 56.39
## 3rd Qu.:30.00 3rd Qu.: 465.0 3rd Qu.: 68.50
## Max. :70.00 Max. :1646.0 Max. :165.00
## weight_new height_new power_up_stardust_new power_up_candy_new
## Min. : 0.020 Min. :0.4400 Min. : 200 Min. :1.000
## 1st Qu.: 2.450 1st Qu.:0.6750 1st Qu.: 800 1st Qu.:1.000
## Median : 3.880 Median :0.9500 Median :1600 Median :2.000
## Mean : 7.931 Mean :0.9017 Mean :1616 Mean :1.733
## 3rd Qu.: 6.310 3rd Qu.:1.1200 3rd Qu.:2500 3rd Qu.:2.000
## Max. :30.000 Max. :1.3500 Max. :3000 Max. :3.000
## attack_weak_new attack_weak_type_new attack_weak_value_new
## Length:75 Length:75 Min. : 5.00
## Class :character Class :character 1st Qu.: 6.00
## Mode :character Mode :character Median : 9.00
## Mean :10.01
## 3rd Qu.:15.00
## Max. :15.00
## attack_strong_new attack_strong_type_new attack_strong_value_new
## Length:75 Length:75 Min. : 15.00
## Class :character Class :character 1st Qu.: 15.00
## Mode :character Mode :character Median : 25.00
## Mean : 26.33
## 3rd Qu.: 30.00
## Max. :100.00
## notes
## Length:75
## Class :character
## Mode :character
##
##
##
# Check for missing values
colSums(is.na(pokemon))
## name species cp
## 0 0 0
## hp weight height
## 0 0 0
## power_up_stardust power_up_candy attack_weak
## 0 0 0
## attack_weak_type attack_weak_value attack_strong
## 0 0 0
## attack_strong_type attack_strong_value cp_new
## 0 0 0
## hp_new weight_new height_new
## 0 0 0
## power_up_stardust_new power_up_candy_new attack_weak_new
## 0 0 0
## attack_weak_type_new attack_weak_value_new attack_strong_new
## 0 0 0
## attack_strong_type_new attack_strong_value_new notes
## 0 0 0
# Mean and SD of post-evolution CP by species
aggregate(cp_new ~ species, data = pokemon_data, FUN = mean)
## species cp_new
## 1 Caterpie 142.1000
## 2 Eevee 1375.1667
## 3 Pidgey 366.3077
## 4 Weedle 139.1000
aggregate(cp_new ~ species, data = pokemon_data, FUN = sd)
## species cp_new
## 1 Caterpie 75.51814
## 2 Eevee 217.36183
## 3 Pidgey 210.50893
## 4 Weedle 81.37237
# Pairwise plots for main numeric variables
pairs(pokemon_data[, c("cp", "hp", "weight", "height",
                       "attack_weak_value", "attack_strong_value", "cp_new")],
      main = "Relationships Between Key Variables")
# Boxplot to compare evolved CP across species
boxplot(cp_new ~ species, data = pokemon_data,
        main = "Post-evolution CP by Species",
        xlab = "Species", ylab = "Post-evolution CP")
# Simple linear regression: cp_new predicted by cp
model_simple <- lm(cp_new ~ cp, data = pokemon_data)
summary(model_simple)
##
## Call:
## lm(formula = cp_new ~ cp, data = pokemon_data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -214.444  -64.516    1.893   64.961  271.386
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -119.34551   19.81529  -6.023 6.35e-08 ***
## cp             2.41351    0.08127  29.698  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 101 on 73 degrees of freedom
## Multiple R-squared: 0.9236, Adjusted R-squared: 0.9225
## F-statistic: 882 on 1 and 73 DF, p-value: < 2.2e-16
# Scatterplot with the fitted regression line
# (abline() draws on the current plot, so it must run in the same chunk as plot())
plot(pokemon_data$cp, pokemon_data$cp_new,
     main = "Pre-evolution CP vs Post-evolution CP",
     xlab = "Pre-evolution CP",
     ylab = "Post-evolution CP",
     pch = 19, col = "darkblue")
abline(model_simple, col = "red", lwd = 2)
# Multiple regression including other predictors we listed in the proposal
model_multiple <- lm(cp_new ~ cp + hp + species +
                       attack_weak_value + attack_strong_value +
                       weight + height,
                     data = pokemon_data)
summary(model_multiple)
##
## Call:
## lm(formula = cp_new ~ cp + hp + species + attack_weak_value +
## attack_strong_value + weight + height, data = pokemon_data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -127.008  -21.934    7.178   23.632   90.068
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)         146.8432    63.2187   2.323 0.023332 *
## cp                    2.1472     0.1114  19.271  < 2e-16 ***
## hp                   -3.2811     0.8010  -4.096 0.000119 ***
## speciesEevee        691.6379    57.1247  12.108  < 2e-16 ***
## speciesPidgey       150.7572    24.8301   6.072 7.29e-08 ***
## speciesWeedle         2.1916    17.0968   0.128 0.898398
## attack_weak_value    -1.8753     2.9031  -0.646 0.520565
## attack_strong_value  -6.2783     0.9177  -6.841 3.30e-09 ***
## weight              -14.5025     8.9700  -1.617 0.110770
## height              -11.7565   232.8217  -0.050 0.959882
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 37.4 on 65 degrees of freedom
## Multiple R-squared: 0.9907, Adjusted R-squared: 0.9894
## F-statistic: 765.8 on 9 and 65 DF, p-value: < 2.2e-16
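# Check regression assumptions with the four default lm diagnostic plots
# (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage)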
par(mfrow = c(2, 2))
plot(model_multiple)
par(mfrow = c(1, 1))
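The diagnostic plots are read by eye, so a few numeric summaries can help back up the visual impression. The sketch below is not part of the original analysis. The helper name diagnostic_summary, the 4/n cutoff for Cook's distance (a common rule of thumb), and the simulated data are all assumptions made for illustration. It uses only base R functions (shapiro.test, cooks.distance).

```r
# Hypothetical helper (not from the original analysis): numeric companions
# to the diagnostic plots of a fitted lm object.
diagnostic_summary <- function(fit) {
  res  <- residuals(fit)
  cook <- cooks.distance(fit)
  list(
    shapiro_p   = shapiro.test(res)$p.value,    # Shapiro-Wilk test of residual normality
    max_cook    = max(cook),                    # largest Cook's distance
    n_high_cook = sum(cook > 4 / length(res))   # points above the 4/n rule of thumb
  )
}

# Simulated stand-in data (assumed values, not the real dataset)
set.seed(42)
toy_diag <- data.frame(x = runif(75, 10, 600))
toy_diag$y <- -100 + 2.4 * toy_diag$x + rnorm(75, sd = 100)

toy_diag_fit <- lm(y ~ x, data = toy_diag)
diag_out <- diagnostic_summary(toy_diag_fit)
print(diag_out)
```

With the objects in this report, the call would be diagnostic_summary(model_multiple). A small Shapiro-Wilk p-value or several high Cook's distances would be a reason to look at the corresponding plots more closely, not proof on its own that the model is invalid.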
# Compare the simple and multiple models
adj_r2_simple <- summary(model_simple)$adj.r.squared
adj_r2_multiple <- summary(model_multiple)$adj.r.squared
adj_r2_simple
## [1] 0.9225108
adj_r2_multiple
## [1] 0.9893633
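Adjusted R-squared is a descriptive comparison. Because the simple model is nested inside the multiple model (same response, same rows, with cp in both), a partial F-test gives a formal test of whether the extra predictors improve the fit. The sketch below is an assumption-laden illustration, not part of the original analysis: compare_nested is a hypothetical helper, and the data are simulated so that the block runs on its own.

```r
# Hypothetical helper: partial F-test for two nested lm fits.
# anova() on two lm objects returns one row per model; row 2 holds the test.
compare_nested <- function(small, large) {
  tab <- anova(small, large)
  list(
    df_added = tab$Df[2],           # number of extra coefficients in the larger model
    f_stat   = tab$F[2],            # partial F statistic
    p_value  = tab[["Pr(>F)"]][2]   # p-value for the extra terms jointly
  )
}

# Simulated stand-in data (assumed values, not the real dataset)
set.seed(1)
toy_nested <- data.frame(
  cp    = runif(60, 10, 600),
  group = factor(rep(c("a", "b", "c"), each = 20))
)
toy_nested$cp_new <- 2 * toy_nested$cp +
  ifelse(toy_nested$group == "b", 150, 0) + rnorm(60, sd = 30)

fit_small <- lm(cp_new ~ cp, data = toy_nested)
fit_large <- lm(cp_new ~ cp + group, data = toy_nested)

nested_out <- compare_nested(fit_small, fit_large)
print(nested_out)
```

With the objects in this report, the call would be compare_nested(model_simple, model_multiple), which tests the eight added coefficients jointly.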
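The descriptive statistics and plots showed that Pokémon with higher combat power before evolution usually had higher combat power after evolution. The simple linear regression confirmed this: the slope for cp was about 2.41 (p < 2e-16) and the adjusted R-squared was a little above 0.92, so pre-evolution CP by itself explains roughly 92% of the variation in post-evolution CP. Adding hp, species, the two attack values, weight, and height raised the adjusted R-squared to about 0.99 and reduced the residual standard error from 101 to 37.4. In the multiple model, cp, hp, attack_strong_value, and the Eevee and Pidgey indicators (relative to Caterpie, the reference species) were significant at the 0.05 level, while attack_weak_value, weight, height, and the Weedle indicator were not. The diagnostic plots for the multiple regression did not show major problems, so we found no strong evidence that the model assumptions were violated. Overall, linear regression was an appropriate method for this question, the visuals supported the fitted relationships, and the final model fits evolved CP very well on these 75 observations.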
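The fit statistics above are computed on the same observations used to fit the models, so they can overstate how well the model predicts new Pokémon. One way to check this is leave-one-out cross-validation, which for a linear model can be computed without refitting by dividing each residual by one minus its hat value. The sketch below is an illustration under stated assumptions, not part of the original analysis: loocv_rmse is a hypothetical helper and the data are simulated.

```r
# Hypothetical helper: leave-one-out RMSE for an lm fit.
# For least squares, the held-out prediction error of observation i equals
# e_i / (1 - h_i), where e_i is the residual and h_i the hat value.
loocv_rmse <- function(fit) {
  e <- residuals(fit)
  h <- hatvalues(fit)
  sqrt(mean((e / (1 - h))^2))
}

# Simulated stand-in data (assumed values, not the real dataset)
set.seed(7)
toy_cv <- data.frame(cp = runif(75, 10, 600))
toy_cv$cp_new <- -100 + 2.4 * toy_cv$cp + rnorm(75, sd = 100)

toy_cv_fit <- lm(cp_new ~ cp, data = toy_cv)

in_sample_rmse     <- sqrt(mean(residuals(toy_cv_fit)^2))
out_of_sample_rmse <- loocv_rmse(toy_cv_fit)
print(c(in_sample = in_sample_rmse, loocv = out_of_sample_rmse))
```

With the objects in this report, loocv_rmse(model_simple) and loocv_rmse(model_multiple) could be compared directly. The shortcut breaks down if any hat value equals 1, for example when a factor level has a single observation.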