Choosing Methods

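We wanted to predict a Pokémon’s combat power after evolution (cp_new) using only what is known before evolution. Because the response is numeric and we expected it to rise roughly in proportion to pre-evolution combat power, linear regression was a reasonable starting point. We first fit a simple regression model with pre-evolution combat power (cp) as the only predictor. We then fit a multiple regression model that adds the other pre-evolution variables (hp, species, the weak and strong attack values, weight, and height). Finally, we used diagnostic plots to check whether the regression assumptions (linearity, constant variance, roughly normal residuals, and no overly influential points) looked reasonable for the multiple regression model.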

# Load the dataset
pokemon <- read.csv("https://www.openintro.org/data/csv/pokemon_go.csv")

# Keep only the variables we will use
pokemon_data <- pokemon[, c("species", "cp", "hp", "weight", "height",
                            "attack_weak_value", "attack_strong_value", "cp_new")]

Descriptive Statistics and Visuals

# Structure and basic summaries
str(pokemon)
## 'data.frame':    75 obs. of  27 variables:
##  $ name                   : chr  "Pidgey1" "Pidgey2" "Pidgey3" "Pidgey4" ...
##  $ species                : chr  "Pidgey" "Pidgey" "Pidgey" "Pidgey" ...
##  $ cp                     : int  384 366 353 338 242 129 10 25 24 161 ...
##  $ hp                     : int  56 54 55 51 45 35 10 14 13 35 ...
##  $ weight                 : num  2.31 1.67 1.94 1.73 1.44 2.07 0.92 2.72 2.07 1.45 ...
##  $ height                 : num  0.34 0.29 0.3 0.31 0.27 0.35 0.25 0.37 0.32 0.31 ...
##  $ power_up_stardust      : int  2500 2500 3000 3000 1900 800 200 200 200 1000 ...
##  $ power_up_candy         : int  2 2 3 3 2 1 1 1 1 1 ...
##  $ attack_weak            : chr  "Tackle" "Quick Attack" "Quick Attack" "Tackle" ...
##  $ attack_weak_type       : chr  "Normal" "Normal" "Normal" "Normal" ...
##  $ attack_weak_value      : int  12 10 10 12 10 10 12 12 10 12 ...
##  $ attack_strong          : chr  "Aerial Ace" "Twister" "Aerial Ace" "Air Cutter" ...
##  $ attack_strong_type     : chr  "Flying" "Dragon" "Flying" "Flying" ...
##  $ attack_strong_value    : int  30 25 30 30 30 30 30 25 25 25 ...
##  $ cp_new                 : int  694 669 659 640 457 243 15 47 47 305 ...
##  $ hp_new                 : int  84 81 83 79 69 52 13 21 21 54 ...
##  $ weight_new             : num  2.6 1.93 3.51 30 1.42 30 30 2.63 3.27 30 ...
##  $ height_new             : num  1.24 1.05 1.11 1.12 0.98 1.27 0.9 1.35 1.16 1.14 ...
##  $ power_up_stardust_new  : int  2500 2500 3000 3000 1900 800 200 200 200 1000 ...
##  $ power_up_candy_new     : int  2 2 3 3 2 1 1 1 1 1 ...
##  $ attack_weak_new        : chr  "Steel Wing" "Wing Attack" "Wing Attack" "Steel Wing" ...
##  $ attack_weak_type_new   : chr  "Steel" "Flying" "Flying" "Steel" ...
##  $ attack_weak_value_new  : int  15 9 9 15 9 9 9 15 9 15 ...
##  $ attack_strong_new      : chr  "Air Cutter" "Air Cutter" "Air Cutter" "Air Cutter" ...
##  $ attack_strong_type_new : chr  "Flying" "Flying" "Flying" "Flying" ...
##  $ attack_strong_value_new: int  30 30 30 30 25 30 30 30 25 30 ...
##  $ notes                  : chr  "" "" "" "" ...
summary(pokemon)
##      name             species                cp              hp       
##  Length:75          Length:75          Min.   : 10.0   Min.   :10.00  
##  Class :character   Class :character   1st Qu.: 94.0   1st Qu.:30.50  
##  Mode  :character   Mode  :character   Median :169.0   Median :43.00  
##                                        Mean   :197.2   Mean   :40.69  
##                                        3rd Qu.:250.5   3rd Qu.:50.00  
##                                        Max.   :619.0   Max.   :74.00  
##      weight           height       power_up_stardust power_up_candy 
##  Min.   : 0.780   Min.   :0.2000   Min.   : 200      Min.   :1.000  
##  1st Qu.: 1.720   1st Qu.:0.2800   1st Qu.: 800      1st Qu.:1.000  
##  Median : 2.240   Median :0.3000   Median :1600      Median :2.000  
##  Mean   : 2.750   Mean   :0.3017   Mean   :1616      Mean   :1.733  
##  3rd Qu.: 3.325   3rd Qu.:0.3200   3rd Qu.:2500      3rd Qu.:2.000  
##  Max.   :10.420   Max.   :0.3900   Max.   :3000      Max.   :3.000  
##  attack_weak        attack_weak_type   attack_weak_value attack_strong     
##  Length:75          Length:75          Min.   : 5.000    Length:75         
##  Class :character   Class :character   1st Qu.: 6.000    Class :character  
##  Mode  :character   Mode  :character   Median :10.000    Mode  :character  
##                                        Mean   : 9.293                      
##                                        3rd Qu.:12.000                      
##                                        Max.   :12.000                      
##  attack_strong_type attack_strong_value     cp_new           hp_new      
##  Length:75          Min.   :15.00       Min.   :  10.0   Min.   : 10.00  
##  Class :character   1st Qu.:15.00       1st Qu.: 138.5   1st Qu.: 43.50  
##  Mode  :character   Median :25.00       Median : 226.0   Median : 54.00  
##                     Mean   :24.47       Mean   : 356.5   Mean   : 56.39  
##                     3rd Qu.:30.00       3rd Qu.: 465.0   3rd Qu.: 68.50  
##                     Max.   :70.00       Max.   :1646.0   Max.   :165.00  
##    weight_new       height_new     power_up_stardust_new power_up_candy_new
##  Min.   : 0.020   Min.   :0.4400   Min.   : 200          Min.   :1.000     
##  1st Qu.: 2.450   1st Qu.:0.6750   1st Qu.: 800          1st Qu.:1.000     
##  Median : 3.880   Median :0.9500   Median :1600          Median :2.000     
##  Mean   : 7.931   Mean   :0.9017   Mean   :1616          Mean   :1.733     
##  3rd Qu.: 6.310   3rd Qu.:1.1200   3rd Qu.:2500          3rd Qu.:2.000     
##  Max.   :30.000   Max.   :1.3500   Max.   :3000          Max.   :3.000     
##  attack_weak_new    attack_weak_type_new attack_weak_value_new
##  Length:75          Length:75            Min.   : 5.00        
##  Class :character   Class :character     1st Qu.: 6.00        
##  Mode  :character   Mode  :character     Median : 9.00        
##                                          Mean   :10.01        
##                                          3rd Qu.:15.00        
##                                          Max.   :15.00        
##  attack_strong_new  attack_strong_type_new attack_strong_value_new
##  Length:75          Length:75              Min.   : 15.00         
##  Class :character   Class :character       1st Qu.: 15.00         
##  Mode  :character   Mode  :character       Median : 25.00         
##                                            Mean   : 26.33         
##                                            3rd Qu.: 30.00         
##                                            Max.   :100.00         
##     notes          
##  Length:75         
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
# Check for missing values
colSums(is.na(pokemon))
##                    name                 species                      cp 
##                       0                       0                       0 
##                      hp                  weight                  height 
##                       0                       0                       0 
##       power_up_stardust          power_up_candy             attack_weak 
##                       0                       0                       0 
##        attack_weak_type       attack_weak_value           attack_strong 
##                       0                       0                       0 
##      attack_strong_type     attack_strong_value                  cp_new 
##                       0                       0                       0 
##                  hp_new              weight_new              height_new 
##                       0                       0                       0 
##   power_up_stardust_new      power_up_candy_new         attack_weak_new 
##                       0                       0                       0 
##    attack_weak_type_new   attack_weak_value_new       attack_strong_new 
##                       0                       0                       0 
##  attack_strong_type_new attack_strong_value_new                   notes 
##                       0                       0                       0
# Mean and SD of post-evolution CP by species
aggregate(cp_new ~ species, data = pokemon_data, FUN = mean)
##    species    cp_new
## 1 Caterpie  142.1000
## 2    Eevee 1375.1667
## 3   Pidgey  366.3077
## 4   Weedle  139.1000
aggregate(cp_new ~ species, data = pokemon_data, FUN = sd)
##    species    cp_new
## 1 Caterpie  75.51814
## 2    Eevee 217.36183
## 3   Pidgey 210.50893
## 4   Weedle  81.37237
# Pairwise plots for main numeric variables
pairs(pokemon_data[, c("cp", "hp", "weight", "height",
                       "attack_weak_value", "attack_strong_value", "cp_new")],
      main = "Relationships Between Key Variables")

# Boxplot to compare evolved CP across species
boxplot(cp_new ~ species, data = pokemon_data,
        main = "Post-evolution CP by Species",
        xlab = "Species", ylab = "Post-evolution CP")

Regression / Association

# Simple linear regression: cp_new predicted by cp
model_simple <- lm(cp_new ~ cp, data = pokemon_data)
summary(model_simple)
## 
## Call:
## lm(formula = cp_new ~ cp, data = pokemon_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -214.444  -64.516    1.893   64.961  271.386 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -119.34551   19.81529  -6.023 6.35e-08 ***
## cp             2.41351    0.08127  29.698  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 101 on 73 degrees of freedom
## Multiple R-squared:  0.9236, Adjusted R-squared:  0.9225 
## F-statistic:   882 on 1 and 73 DF,  p-value: < 2.2e-16
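
As a worked example of reading the fitted line, the coefficients printed above give roughly cp_new = -119.35 + 2.41 * cp. The sketch below hard-codes those two estimates instead of refitting the model, and wraps them in a hypothetical helper, predict_cp_new, that is not part of the original analysis. The three cp values are chosen only for illustration (the sample minimum, a mid-range value, and the sample maximum).

```r
# Coefficients copied from summary(model_simple) above.
b0 <- -119.34551  # intercept
b1 <- 2.41351     # slope for cp

# Hypothetical helper (not part of the original analysis):
# point prediction of post-evolution CP from pre-evolution CP.
predict_cp_new <- function(cp) b0 + b1 * cp

# Sample minimum, an illustrative mid-range value, sample maximum
cp_values <- c(10, 200, 619)
predictions <- predict_cp_new(cp_values)
print(data.frame(cp = cp_values, predicted_cp_new = predictions))

# Pre-evolution CP at which the fitted line crosses zero
zero_crossing <- -b0 / b1
print(zero_crossing)
```

At cp = 200 the line predicts a post-evolution CP of about 363. At the sample minimum of cp = 10 the prediction is negative, which is impossible for combat power, so the simple model should be read with caution at the low end of the range. With the fitted object in memory, predict(model_simple, newdata = data.frame(cp = 200)) should give the same point estimate up to rounding.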
# Scatterplot with the fitted regression line
# (abline() draws on the open plot, so both calls must run in the same chunk)
plot(pokemon_data$cp, pokemon_data$cp_new,
     main = "Pre-evolution CP vs Post-evolution CP",
     xlab = "Pre-evolution CP",
     ylab = "Post-evolution CP",
     pch = 19, col = "darkblue")
abline(model_simple, col = "red", lwd = 2)

# Multiple regression including other predictors we listed in the proposal
model_multiple <- lm(cp_new ~ cp + hp + species +
                       attack_weak_value + attack_strong_value +
                       weight + height,
                     data = pokemon_data)
summary(model_multiple)
## 
## Call:
## lm(formula = cp_new ~ cp + hp + species + attack_weak_value + 
##     attack_strong_value + weight + height, data = pokemon_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -127.008  -21.934    7.178   23.632   90.068 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         146.8432    63.2187   2.323 0.023332 *  
## cp                    2.1472     0.1114  19.271  < 2e-16 ***
## hp                   -3.2811     0.8010  -4.096 0.000119 ***
## speciesEevee        691.6379    57.1247  12.108  < 2e-16 ***
## speciesPidgey       150.7572    24.8301   6.072 7.29e-08 ***
## speciesWeedle         2.1916    17.0968   0.128 0.898398    
## attack_weak_value    -1.8753     2.9031  -0.646 0.520565    
## attack_strong_value  -6.2783     0.9177  -6.841 3.30e-09 ***
## weight              -14.5025     8.9700  -1.617 0.110770    
## height              -11.7565   232.8217  -0.050 0.959882    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 37.4 on 65 degrees of freedom
## Multiple R-squared:  0.9907, Adjusted R-squared:  0.9894 
## F-statistic: 765.8 on 9 and 65 DF,  p-value: < 2.2e-16
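
The coefficient table has rows for Eevee, Pidgey, and Weedle but none for Caterpie. That is what R's default treatment coding produces: the first factor level (alphabetically) becomes the baseline and the other levels get indicator columns. The sketch below shows this on a four-element factor built only for illustration; it does not use the Pokémon data.

```r
# The four species in the data, as a factor (levels sort alphabetically).
species <- factor(c("Caterpie", "Eevee", "Pidgey", "Weedle"))

# Design matrix that lm() would build for a species-only formula.
design <- model.matrix(~ species)
print(design)

# The first level is the baseline and gets no column of its own.
baseline <- levels(species)[1]
print(baseline)
```

Under this coding, the speciesEevee estimate of about 692 is read as the expected difference in cp_new between an Eevee and a Caterpie that share the same values of the other predictors.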

Model Diagnostics and Comparison

# Check regression assumptions
par(mfrow = c(2, 2))
plot(model_multiple)

par(mfrow = c(1, 1))
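
The four diagnostic plots are judged by eye, so it can help to pair them with a few numeric checks. The sketch below is one possible set of checks using base R only. It runs on simulated toy data (the object names toy_fit, x, and y are made up for this example), so its output says nothing about the Pokémon model; to apply it to this analysis, replace toy_fit with model_multiple.

```r
set.seed(1)  # simulated toy data, NOT the pokemon_go data

n <- 75
x <- runif(n, 10, 600)
y <- -100 + 2.4 * x + rnorm(n, sd = 50)
toy_fit <- lm(y ~ x)

# Normality of residuals: a small p-value would suggest non-normal residuals.
normality <- shapiro.test(residuals(toy_fit))
print(normality$p.value)

# Influence: flag points whose Cook's distance exceeds the common 4/n rule of thumb.
cooks <- cooks.distance(toy_fit)
influential <- which(cooks > 4 / n)
print(influential)

# Constant variance (rough check): correlation between |residuals| and fitted values.
# A value far from zero hints that the spread changes with the fitted value.
spread_trend <- cor(abs(residuals(toy_fit)), fitted(toy_fit))
print(spread_trend)
```

The 4/n cutoff is a rule of thumb rather than a formal test, and flagged points are candidates for a closer look, not automatic deletions.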
# Compare the simple and multiple models
adj_r2_simple <- summary(model_simple)$adj.r.squared
adj_r2_multiple <- summary(model_multiple)$adj.r.squared

adj_r2_simple
## [1] 0.9225108
adj_r2_multiple
## [1] 0.9893633
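
Adjusted R-squared shows that the larger model fits better, but not whether the improvement is larger than chance would produce. Because the simple model is nested inside the multiple model, a partial F-test can answer that. The sketch below approximates the test from the rounded residual standard errors and degrees of freedom printed in the two summaries, so the result is approximate; with both fitted objects in memory, anova(model_simple, model_multiple) would give the exact version.

```r
# Values copied from the two summary() outputs above (rounded as printed).
rse_simple <- 101
df_simple <- 73
rse_multiple <- 37.4
df_multiple <- 65

# Residual sum of squares = (residual standard error)^2 * residual df
rss_simple <- rse_simple^2 * df_simple
rss_multiple <- rse_multiple^2 * df_multiple

# Partial F statistic for the 8 extra coefficients in the larger model
df_extra <- df_simple - df_multiple
f_stat <- ((rss_simple - rss_multiple) / df_extra) / (rss_multiple / df_multiple)
p_value <- pf(f_stat, df_extra, df_multiple, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

With these inputs the F statistic comes out at roughly 58 on 8 and 65 degrees of freedom, which suggests that the added predictors jointly improve the fit well beyond what chance alone would explain.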

Summary of Findings

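The descriptive statistics and plots showed that Pokémon with higher combat power before evolution usually had higher combat power after evolution. The simple linear regression confirmed this: the slope for cp was 2.41 (p < 2e-16) and the adjusted R-squared was 0.9225, so pre-evolution CP alone explains about 92% of the variation in post-evolution CP. Adding hp, species, the two attack values, weight, and height raised the adjusted R-squared to 0.9894 and lowered the residual standard error from 101 to 37.4. In the multiple regression, cp, hp, attack_strong_value, and the Eevee and Pidgey indicators (measured against Caterpie, the baseline species) were significant at the 0.001 level, while attack_weak_value, weight, height, and the Weedle indicator were not significant at the 0.05 level. The diagnostic plots for the multiple regression did not show major problems, so we found no strong evidence that the model assumptions were violated. Overall, regression was an appropriate method for this question, the plots were consistent with the regression results, and the final model fits the 75 observed Pokémon very well. Because the model was evaluated on the same data used to fit it, its accuracy on new Pokémon has not been tested.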