Pokemon Project

Author

Micaela Tejerina

Load the libraries

library(tidyverse)
library(ggfortify)

Brief Introduction

The data for this project come from PokeAPI, a public database. The dataset lists Pokemon species along with attributes such as primary and secondary type, height, weight, and other characteristics. I want to know whether a Pokemon’s height is related to its weight: are taller Pokemon usually heavier, and how strong is that relationship?

Source: PokeAPI (https://pokeapi.co/)

Loading and Cleaning Data

setwd("~/Data 110")
pokemon<-readr::read_csv("pokemon_data_pokeapi.csv")
# source: PokeAPI

Replace missing Type2 values with “No Secondary Type”

# Assign back to `pokemon` so the replacement carries into later steps
pokemon <- pokemon %>%
  mutate(Type2 = ifelse(is.na(Type2), "No Secondary Type", Type2))

Filter out rows with missing Height or Weight values

pokemon<-pokemon%>%
  filter(!is.na(`Height (m)`),!is.na(`Weight (kg)`))
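
For comparison, here is a minimal sketch showing that the filter() approach above keeps the same rows that tidyr’s drop_na() would. The `toy` tibble and its values are made up purely for illustration and are not taken from the PokeAPI file.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (NOT from the real dataset), with one missing
# height and one missing weight
toy <- tibble(
  Name = c("A", "B", "C"),
  `Height (m)` = c(0.7, NA, 1.2),
  `Weight (kg)` = c(6.9, 13.0, NA)
)

# Approach used in this project: keep rows where both values are present
by_filter <- toy %>%
  filter(!is.na(`Height (m)`), !is.na(`Weight (kg)`))

# Alternative: drop rows with NA in the named columns
by_drop_na <- toy %>%
  drop_na(`Height (m)`, `Weight (kg)`)

# Both approaches keep only the complete row
by_filter
by_drop_na
```

Either version works; filter() makes the condition explicit, while drop_na() is shorter when several columns are involved.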

Check the cleaned data summary

summary(pokemon)
     Name           Pokedex Number    Type1              Type2          
 Length:905         Min.   :  1    Length:905         Length:905        
 Class :character   1st Qu.:227    Class :character   Class :character  
 Mode  :character   Median :453    Mode  :character   Mode  :character  
                    Mean   :453                                         
                    3rd Qu.:679                                         
                    Max.   :905                                         
 Classification       Height (m)      Weight (kg)      Abilities        
 Length:905         Min.   : 0.100   Min.   :  0.10   Length:905        
 Class :character   1st Qu.: 0.500   1st Qu.:  8.50   Class :character  
 Mode  :character   Median : 1.000   Median : 28.00   Mode  :character  
                    Mean   : 1.193   Mean   : 64.29                     
                    3rd Qu.: 1.500   3rd Qu.: 65.50                     
                    Max.   :20.000   Max.   :999.90                     
   Generation    Legendary Status  
 Min.   :1.000   Length:905        
 1st Qu.:2.000   Class :character  
 Median :4.000   Mode  :character  
 Mean   :4.177                     
 3rd Qu.:6.000                     
 Max.   :8.000                     

Scatterplot: Height vs Weight

#Scatterplot with regression line
ggplot(pokemon, aes(x = `Height (m)`, y = `Weight (kg)`, color = Type1)) + 
  geom_point(alpha = 0.7, size = 3) + 
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed") + 
  labs(title = "Pokemon Height vs Weight", subtitle = "Each point represents one Pokemon species",
  x = "Height (m)",
  y = "Weight (kg)", 
  color = "Primary Type",
  caption = "Source: PokeAPI (https://pokeapi.co/)" ) +
  theme_minimal(base_size = 12) 
`geom_smooth()` using formula = 'y ~ x'

Correlation Between Height and Weight

#Calculate correlation between Height and Weight
cor(pokemon$`Height (m)`, pokemon$`Weight (kg)`) 
[1] 0.6424369

Linear Regression Analysis

#Simple linear regression: Weight predicted by Height
model<- lm(`Weight (kg)` ~ `Height (m)`, data = pokemon)

#Display summary results
summary(model)

Call:
lm(formula = `Weight (kg)` ~ `Height (m)`, data = pokemon)

Residuals:
    Min      1Q  Median      3Q     Max 
-493.10  -26.74  -13.95   -1.88 1003.50 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -9.818      4.229  -2.321   0.0205 *  
`Height (m)`   62.133      2.466  25.191   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 91.42 on 903 degrees of freedom
Multiple R-squared:  0.4127,    Adjusted R-squared:  0.4121 
F-statistic: 634.6 on 1 and 903 DF,  p-value: < 2.2e-16
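
As a quick cross-check, in a simple linear regression with one predictor the Multiple R-squared equals the square of the correlation between the two variables. The sketch below uses only the correlation value printed earlier by cor(); the variable names `r` and `r_squared` are introduced here for illustration.

```r
# Correlation between Height and Weight, copied from the cor() output
r <- 0.6424369

# In a one-predictor linear model, R-squared is the squared correlation
r_squared <- r^2

round(r_squared, 4)
# [1] 0.4127
```

This matches the Multiple R-squared of 0.4127 reported in the model summary.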

Regression Equation

The regression yields the equation Weight = -9.82 + 62.13(Height). For each 1-meter increase in height, a Pokemon’s weight increases by 62.13 kilograms on average. The intercept of -9.82 has no physical meaning (a Pokemon’s height cannot be 0, and a weight cannot be negative), but it is still needed to position the regression line.
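
To make the equation concrete, here is a small sketch that plugs heights into it by hand. The coefficients are copied from the summary(model) output; `predict_weight` is a hypothetical helper written for this illustration and is not part of the original analysis.

```r
# Coefficients copied from the summary(model) output
intercept <- -9.818
slope <- 62.133

# Hypothetical helper: predicted weight (kg) for a given height (m)
predict_weight <- function(height_m) {
  intercept + slope * height_m
}

# Predicted weights for Pokemon that are 1 m and 2 m tall
predict_weight(c(1, 2))
# [1]  52.315 114.448
```

The two predictions differ by exactly the slope, 62.133 kg, which is what "per 1-meter increase" means. With the fitted model available, predict() on `model` should give the same values up to rounding of the coefficients.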

Model Interpretation

Because the p-value for Height is below 2e-16, far under the 0.05 threshold, Height is a statistically significant predictor of Weight. The R² value is 0.4127 (adjusted R² = 0.4121), meaning height explains about 41.3% of the variation in Pokemon weight. Together with the correlation of 0.64, this shows a positive, moderately strong relationship: taller Pokemon tend to be heavier, in line with what we would expect in real life. The residual standard error of 91.42 kg shows that individual predictions can still be far off.

Diagnostic Plot

autoplot(model, 1:4, nrow = 2, ncol = 2)

In the Residuals vs Fitted plot, points scattered randomly around 0 are a sign of a good fit. In the Q-Q plot, points that fall roughly along a straight line indicate that the residuals are approximately normally distributed. The Q-Q plot here looks like a rough straight line, so the linear regression assumptions seem reasonable as a first approximation. However, the residual summary (median -13.95, maximum 1003.50) points to a few very heavy Pokemon with large positive residuals, so the fit should be read with some caution.

Conclusion/Reflection

In this project, I wanted to see whether a Pokemon’s height is related to its weight. Based on both the scatterplot and the regression, the answer is yes: the two have a positive, moderately strong relationship. For the data cleaning, I replaced missing Type2 values with “No Secondary Type” using mutate(), and filtered out rows with missing Height or Weight values using filter(!is.na(`Height (m)`), !is.na(`Weight (kg)`)). I used the cleaning approach from the course rather than base R’s na.omit() or tidyr’s drop_na(). Finally, if I had more time, I would check whether different Pokemon types (Grass, Water, Fire, etc.) change the Height-Weight relationship, or try the Plotly package, which we learned in class.