Pokemon Project

Author

Micaela Tejerina

Load the libraries

library(tidyverse)
library(ggfortify)

Brief Introduction

To start off, the dataset I used for this project comes from a public database called PokeAPI. It includes quantitative and categorical variables for each Pokemon species, such as primary and secondary type, height, weight, and other characteristics. Out of curiosity, I wanted to know whether a Pokemon’s height is tied to its weight. This raises the question of whether taller Pokemon are usually heavier, and how strong that relationship is.

Source: PokeAPI (https://pokeapi.co/)

Loading and Cleaning Data

setwd("~/Data 110")
pokemon<-readr::read_csv("pokemon_data_pokeapi.csv")
# source: PokeAPI

Replace missing Type2 values with “No Secondary Type”

# Assign back to `pokemon` so the replacement is kept for later steps
pokemon <- pokemon %>%
  mutate(Type2 = ifelse(is.na(Type2), "No Secondary Type", Type2))

Filter out rows with missing Height or Weight values

pokemon<-pokemon%>%
  filter(!is.na(`Height (m)`),!is.na(`Weight (kg)`))
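A quick way to confirm that both cleaning steps did what was intended is to count missing values before and after. The sketch below uses a small made-up tibble (the names and values in `toy` are illustrative assumptions, not rows from the PokeAPI file) and applies the same mutate() and filter() steps.

```r
library(dplyr)

# Toy data with made-up values (NOT taken from the PokeAPI file)
toy <- tibble(
  Name = c("A", "B", "C", "D"),
  Type2 = c("Poison", NA, "Flying", NA),
  `Height (m)` = c(0.7, NA, 1.7, 0.4),
  `Weight (kg)` = c(6.9, 8.5, NA, 6.0)
)

# Count missing values per column before cleaning
na_before <- colSums(is.na(toy))

# Same two cleaning steps as in the report
toy_clean <- toy %>%
  mutate(Type2 = ifelse(is.na(Type2), "No Secondary Type", Type2)) %>%
  filter(!is.na(`Height (m)`), !is.na(`Weight (kg)`))

# Count missing values per column after cleaning
na_after <- colSums(is.na(toy_clean))

na_before
na_after
```

Running the same two colSums() calls on the real `pokemon` data frame would show whether any missing values remain after cleaning.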

Check the cleaned data summary

summary(pokemon)
     Name           Pokedex Number    Type1              Type2          
 Length:905         Min.   :  1    Length:905         Length:905        
 Class :character   1st Qu.:227    Class :character   Class :character  
 Mode  :character   Median :453    Mode  :character   Mode  :character  
                    Mean   :453                                         
                    3rd Qu.:679                                         
                    Max.   :905                                         
 Classification       Height (m)      Weight (kg)      Abilities        
 Length:905         Min.   : 0.100   Min.   :  0.10   Length:905        
 Class :character   1st Qu.: 0.500   1st Qu.:  8.50   Class :character  
 Mode  :character   Median : 1.000   Median : 28.00   Mode  :character  
                    Mean   : 1.193   Mean   : 64.29                     
                    3rd Qu.: 1.500   3rd Qu.: 65.50                     
                    Max.   :20.000   Max.   :999.90                     
   Generation    Legendary Status  
 Min.   :1.000   Length:905        
 1st Qu.:2.000   Class :character  
 Median :4.000   Mode  :character  
 Mean   :4.177                     
 3rd Qu.:6.000                     
 Max.   :8.000                     

Scatterplot: Height vs Weight

#Scatterplot with regression line
ggplot(pokemon, aes(x = `Height (m)`, y = `Weight (kg)`, color = Type1)) + 
  geom_point(alpha = 0.7, size = 3) + 
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dotdash") + 
  labs(title = "Pokemon Height vs Weight", subtitle = "Each point represents one Pokemon species",
  x = "Height (m)",
  y = "Weight (kg)", 
  color = "Primary Type",
  caption = "Source: PokeAPI (https://pokeapi.co/)" ) +
  theme_minimal(base_size = 12)  
`geom_smooth()` using formula = 'y ~ x'

The scatterplot above plots height in meters against weight in kilograms. Each point represents one Pokemon species, colored by its primary type. The dot-dashed black line is a linear regression fit to the data, indicating that weight generally increases with height.

Correlation Between Height and Weight

Height and Weight have a moderately strong positive correlation: their correlation coefficient is about 0.64, which is clearly positive but well short of a perfect correlation of 1.

#Calculate correlation between Height and Weight
cor(pokemon$`Height (m)`, pokemon$`Weight (kg)`) 
[1] 0.6424369
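To get a feel for how strong r = 0.64 really is, it helps to square it: for a straight-line fit with one predictor, the squared correlation equals the share of variation explained. The sketch below only reuses numbers printed in this report (the cor() result and the 905 rows shown in the summary); the variable names are my own.

```r
# Values copied from the output in this report
r <- 0.6424369   # result of cor()
n <- 905         # number of rows in the cleaned data

# In simple linear regression, R-squared equals the squared correlation
r_squared <- r^2

# t statistic for testing whether the correlation differs from 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

round(r_squared, 4)
round(t_stat, 2)
```

So a correlation of 0.64 corresponds to roughly 41% of the variation in weight being associated with height, which is a meaningful relationship but leaves a lot unexplained.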

Linear Regression Analysis

#Simple linear regression: Weight predicted by Height
model<- lm(`Weight (kg)` ~ `Height (m)`, data = pokemon)

#Display summary results
summary(model)

Call:
lm(formula = `Weight (kg)` ~ `Height (m)`, data = pokemon)

Residuals:
    Min      1Q  Median      3Q     Max 
-493.10  -26.74  -13.95   -1.88 1003.50 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -9.818      4.229  -2.321   0.0205 *  
`Height (m)`   62.133      2.466  25.191   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 91.42 on 903 degrees of freedom
Multiple R-squared:  0.4127,    Adjusted R-squared:  0.4121 
F-statistic: 634.6 on 1 and 903 DF,  p-value: < 2.2e-16

Regression Equation

The fitted regression model is Weight = -9.82 + 62.13(Height). For every one-meter increase in a Pokemon’s height, its predicted weight increases by 62.13 kilograms on average. The intercept of -9.82 has no physical meaning (no Pokemon is 0 meters tall, and a weight cannot be negative), but it is still needed mathematically to position the regression line, so the line can still be informative.
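As a worked example of the equation, the sketch below plugs a few heights into the fitted line. The coefficients are copied from the summary(model) output; `predict_weight()` is a hypothetical helper written for this illustration, and the heights are the minimum, median, and third quartile from the data summary plus an arbitrary 2 m.

```r
# Coefficients copied from the summary(model) output in this report
intercept <- -9.818
slope <- 62.133

# Hypothetical helper (not part of the original analysis)
predict_weight <- function(height_m) {
  intercept + slope * height_m
}

# Minimum, median, and third-quartile heights from the summary, plus 2 m
heights <- c(0.1, 1.0, 1.5, 2.0)
predicted <- predict_weight(heights)
data.frame(height_m = heights, predicted_kg = predicted)

# Height at which the fitted line crosses zero weight
zero_crossing <- -intercept / slope
zero_crossing
```

For the smallest height in the data (0.1 m) the line predicts a negative weight, which is a reminder to read it as an average trend rather than an exact rule for very small Pokemon.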

Model Interpretation

Because the p-value for Height is below 2e-16, far below 0.05, Height is a statistically significant predictor of Weight. The R² value is 0.4127 (adjusted R² = 0.4121), meaning height explains about 41.3% of the variation in Pokemon weight. Overall, this shows that Height and Weight have a moderately strong positive relationship: taller Pokemon tend to be heavier, in agreement with what we would expect in real life.

Diagnostic Plot

autoplot(model, 1:4, nrow = 2, ncol = 2)

These diagnostic plots help check whether the linear model fits the data properly. In the Residuals vs Fitted plot, points scattered randomly around 0 with no clear pattern are a sign that the errors behave well. In the QQ plot, points that follow a roughly straight line suggest that the residuals are approximately normally distributed. To me the plots look broadly acceptable, although the residual summary from the regression output (median -13.95, maximum 1003.50) suggests some right skew and a few very large residuals, so I’d say the model fits the data reasonably rather than perfectly.
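The visual checks can also be backed up with numbers by looking at the residual quantiles directly. The sketch below uses simulated data (not the Pokemon data; the seed, sample size, and coefficients are arbitrary assumptions) with deliberately right-skewed noise, to show what the pattern looks like numerically.

```r
set.seed(110)  # arbitrary seed so the simulation is reproducible

# Simulated data (NOT the Pokemon data): a linear trend plus right-skewed noise
n <- 500
x <- runif(n, 0.1, 3)
y <- 5 + 60 * x + 40 * (rexp(n) - 1)

sim_model <- lm(y ~ x)
res <- resid(sim_model)

# Residuals from a model with an intercept average to zero by construction
mean_res <- mean(res)

# A median below zero together with a long upper tail hints at right skew
res_quantiles <- quantile(res, c(0, 0.25, 0.5, 0.75, 1))
res_quantiles
```

Calling quantile(resid(model)) on the fitted Pokemon model would give the same five numbers shown in the Residuals section of summary(model), so the two views can be compared side by side.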

Conclusion/Reflection

In this project, I wanted to see whether a Pokemon’s height is related to its weight. Based on both the scatterplot and the regression, the answer is yes: the two have a positive, moderately strong relationship. For data cleaning, I replaced missing Type2 values with “No Secondary Type” using mutate(), then filtered out rows with missing height or weight values using filter(!is.na(`Height (m)`), !is.na(`Weight (kg)`)). I used this cleaning approach from the course rather than na.omit() or drop_na() because it keeps the cleaning process visible and controlled. Finally, if I had more time, I would have checked whether different Pokemon types (Grass, Water, Fire, etc.) make a difference in the height-weight relationship, or tried using the Plotly package, which we learned in class.