Pokemon Project
Load the libraries
library(tidyverse)
library(ggfortify)
Brief Introduction
To start off, the dataset I used for this project comes from a public database called PokeAPI. It includes quantitative and categorical variables for each Pokemon species, such as primary and secondary type, height, weight, and other characteristics. Out of curiosity, I wanted to know whether there is a relationship between a Pokemon’s height and its weight. This raises the question of whether taller Pokemon are usually heavier, and how strong that relationship is.
Source: PokeAPI (https://pokeapi.co/)
Loading and Cleaning Data
setwd("~/Data 110")
pokemon <- readr::read_csv("pokemon_data_pokeapi.csv")
# source: PokeAPI
Replace missing Type2 values with “No Secondary Type”
# Assign back to `pokemon` so the replacement is kept
pokemon <- pokemon %>%
  mutate(Type2 = ifelse(is.na(Type2), "No Secondary Type", Type2))
Filter out rows with missing Height or Weight values
pokemon <- pokemon %>%
  filter(!is.na(`Height (m)`), !is.na(`Weight (kg)`))
Check the cleaned data summary
summary(pokemon)
     Name           Pokedex Number    Type1              Type2
 Length:905         Min.   :  1    Length:905         Length:905
 Class :character   1st Qu.:227    Class :character   Class :character
 Mode  :character   Median :453    Mode  :character   Mode  :character
                    Mean   :453
                    3rd Qu.:679
                    Max.   :905
 Classification       Height (m)      Weight (kg)      Abilities
 Length:905         Min.   : 0.100   Min.   :  0.10   Length:905
 Class :character   1st Qu.: 0.500   1st Qu.:  8.50   Class :character
 Mode  :character   Median : 1.000   Median : 28.00   Mode  :character
                    Mean   : 1.193   Mean   : 64.29
                    3rd Qu.: 1.500   3rd Qu.: 65.50
                    Max.   :20.000   Max.   :999.90
   Generation    Legendary Status
 Min.   :1.000   Length:905
 1st Qu.:2.000   Class :character
 Median :4.000   Mode  :character
 Mean   :4.177
 3rd Qu.:6.000
 Max.   :8.000
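One way to double-check the cleaning steps is to count the missing values left in each column. The sketch below runs the same mutate() and filter() calls on a small made-up tibble (the rows and the names `toy`, `cleaned`, and `na_counts` are hypothetical, not taken from PokeAPI), so it can be run without the CSV file.

```r
library(dplyr)

# Hypothetical toy rows, not real PokeAPI records
toy <- tibble::tibble(
  Name = c("A", "B", "C", "D"),
  Type2 = c(NA, "Flying", NA, "Poison"),
  `Height (m)` = c(1.0, NA, 0.5, 2.0),
  `Weight (kg)` = c(10, 20, NA, 80)
)

# Same cleaning steps as in the report
cleaned <- toy %>%
  mutate(Type2 = ifelse(is.na(Type2), "No Secondary Type", Type2)) %>%
  filter(!is.na(`Height (m)`), !is.na(`Weight (kg)`))

# Count remaining missing values per column; every count should be 0
na_counts <- colSums(is.na(cleaned))
na_counts
```

Running the same colSums(is.na(...)) call on the real `pokemon` data frame would show whether any missing values survived the cleaning.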
Scatterplot: Height vs Weight
# Scatterplot with regression line
ggplot(pokemon, aes(x = `Height (m)`, y = `Weight (kg)`, color = Type1)) +
  geom_point(alpha = 0.7, size = 3) +
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dotdash") +
  labs(
    title = "Pokemon Height vs Weight",
    subtitle = "Each point represents one Pokemon species",
    x = "Height (m)",
    y = "Weight (kg)",
    color = "Primary Type",
    caption = "Source: PokeAPI (https://pokeapi.co/)"
  ) +
  theme_minimal(base_size = 12)
`geom_smooth()` using formula = 'y ~ x'
The scatterplot above plots height in meters against weight in kilograms. Each point represents one Pokémon species, colored by its primary type. The dot-dashed black line shows a linear regression fit to the data, indicating that weight generally increases with height.
Correlation Between Height and Weight
Height and Weight have a moderately strong positive correlation: the correlation coefficient is about 0.64, well above 0 but clearly short of a perfect correlation of 1.
# Calculate correlation between Height and Weight
cor(pokemon$`Height (m)`, pokemon$`Weight (kg)`)
[1] 0.6424369
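cor() returns only the point estimate. As a possible extension, cor.test() from base R also reports a confidence interval and a p-value for the Pearson correlation. The sketch below uses made-up heights and weights (the `toy` data frame is hypothetical, not PokeAPI data) just to show the calls.

```r
# Made-up illustrative values, not real PokeAPI records
toy <- data.frame(
  height = c(0.3, 0.5, 0.7, 1.0, 1.2, 1.5, 2.0, 3.5),
  weight = c(2, 8.5, 12, 28, 40, 65, 120, 300)
)

# Pearson correlation test (the default method)
ct <- cor.test(toy$height, toy$weight)

ct$estimate   # same value that cor() would return
ct$conf.int   # 95% confidence interval for the correlation
ct$p.value    # p-value for the null hypothesis of zero correlation
```

On the real data the same call would be cor.test(pokemon$`Height (m)`, pokemon$`Weight (kg)`).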
Linear Regression Analysis
# Simple linear regression: Weight predicted by Height
model <- lm(`Weight (kg)` ~ `Height (m)`, data = pokemon)
# Display summary results
summary(model)

Call:
lm(formula = `Weight (kg)` ~ `Height (m)`, data = pokemon)

Residuals:
    Min      1Q  Median      3Q     Max
-493.10  -26.74  -13.95   -1.88 1003.50

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    -9.818      4.229  -2.321   0.0205 *
`Height (m)`   62.133      2.466  25.191   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 91.42 on 903 degrees of freedom
Multiple R-squared:  0.4127,  Adjusted R-squared:  0.4121
F-statistic: 634.6 on 1 and 903 DF,  p-value: < 2.2e-16
Regression Equation
The fitted regression model is Weight = -9.82 + 62.13(Height). For every one-meter increase in a Pokemon’s height, its weight increases by 62.13 kilograms on average. The intercept of -9.82 has no physical meaning (no Pokemon is 0 meters tall, and a weight cannot be negative), but it is still part of the regression line mathematically, so the line remains informative over the range of observed heights.
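To make the equation concrete, here is a small sketch that plugs heights into the fitted line. The coefficients are copied from the summary(model) output; the helper name `predict_weight` is hypothetical and only for illustration.

```r
# Coefficients copied from the summary(model) output
intercept <- -9.818
slope <- 62.133

# Hypothetical helper: predicted weight (kg) for a given height (m)
predict_weight <- function(height_m) {
  intercept + slope * height_m
}

# Predictions at the minimum observed height (0.1 m), 1 m, and 2 m
predict_weight(c(0.1, 1.0, 2.0))

# Height at which the fitted line crosses zero weight
zero_crossing <- -intercept / slope
zero_crossing
```

At the smallest height in the data (0.1 m) the line predicts a slightly negative weight, which illustrates why the intercept region should not be read literally.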
Model Interpretation
Because the p-value for Height is below 2e-16, far under 0.05, Height is a statistically significant predictor of Weight. The R² value is 0.413 (adjusted R² = 0.412), meaning height explains about 41.3% of the variation in Pokemon weight. Overall, this shows that Height and Weight have a positive, moderately strong relationship: taller Pokemon tend to be heavier, in agreement with what we would expect in real life.
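In a simple linear regression with one predictor, R² equals the square of the Pearson correlation, which gives a quick consistency check between the two outputs in this report. The sketch below only copies the reported numbers; the variable names are hypothetical.

```r
# Values copied from the output earlier in this report
r <- 0.6424369                 # result of cor()
r_squared_reported <- 0.4127   # Multiple R-squared from summary(model)

# Squaring the correlation should reproduce the reported R-squared
r_squared_from_r <- round(r^2, 4)
r_squared_from_r

# TRUE when the two agree to four decimal places
isTRUE(all.equal(r_squared_from_r, r_squared_reported))
```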
Diagnostic Plot
autoplot(model, which = 1:4, nrow = 2, ncol = 2)
These plots help check whether the linear model fits the data properly. In the Residuals vs Fitted plot, the points look randomly scattered around 0, which is what we expect when a linear model is appropriate. The QQ plot looks roughly like a straight line, which suggests the residuals are approximately normally distributed. That said, the residuals in the model summary range from -493.10 to 1003.50, so a few Pokemon are predicted poorly. Overall, I’d say these plots indicate that the model fits the data reasonably well.
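For readers who want to see what the residual plots are built from, the sketch below computes residuals by hand on a made-up data set (`toy` and `toy_model` are hypothetical, not the report's data) and draws the base-R versions of the first two diagnostic plots.

```r
# Made-up illustrative values, not real PokeAPI records
toy <- data.frame(
  height = c(0.3, 0.5, 0.7, 1.0, 1.2, 1.5, 2.0, 3.5),
  weight = c(2, 8.5, 12, 28, 40, 65, 120, 300)
)

toy_model <- lm(weight ~ height, data = toy)

# Residual = observed value minus fitted value
manual_resid <- toy$weight - fitted(toy_model)

# With an intercept, least-squares residuals always average to about 0,
# so the pattern in the plot matters more than the mean
mean(manual_resid)

# Base-R versions of Residuals vs Fitted and the normal QQ plot
plot(toy_model, which = 1:2)
```

Because the mean of the residuals is zero by construction, the thing to look for in the Residuals vs Fitted plot is curvature or a funnel shape rather than where the points are centered.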
Conclusion/Reflection
In this project, I wanted to see whether a Pokemon’s height is related to its weight. Based on both the scatterplot and the regression, the answer is yes: the two have a positive, moderately strong relationship (r ≈ 0.64, R² ≈ 0.41). For data cleaning, I replaced missing Type2 values with “No Secondary Type” using mutate(), then filtered out rows with missing height or weight values using filter(!is.na(`Height (m)`), !is.na(`Weight (kg)`)). I used this cleaning approach from the course rather than na.omit() or drop_na() because it keeps the cleaning process visible and controlled. Finally, if I had more time, I would have checked whether different Pokemon types (Grass, Water, Fire, etc.) make a difference in the height-weight relationship, or tried using the Plotly package, which we learned in class.
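For comparison, the same cleaning can also be written with the tidyr helpers replace_na() and drop_na(). This is only a sketch of the alternative, not the approach used in this project, and it runs on a made-up tibble (`toy`, `with_filter`, and `with_tidyr` are hypothetical names).

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows, not real PokeAPI records
toy <- tibble::tibble(
  Name = c("A", "B", "C", "D"),
  Type2 = c(NA, "Flying", NA, "Poison"),
  `Height (m)` = c(1.0, NA, 0.5, 2.0),
  `Weight (kg)` = c(10, 20, NA, 80)
)

# Approach used in this project: mutate() + filter()
with_filter <- toy %>%
  mutate(Type2 = ifelse(is.na(Type2), "No Secondary Type", Type2)) %>%
  filter(!is.na(`Height (m)`), !is.na(`Weight (kg)`))

# Alternative: replace_na() + drop_na() restricted to the two columns
with_tidyr <- toy %>%
  mutate(Type2 = replace_na(Type2, "No Secondary Type")) %>%
  drop_na(`Height (m)`, `Weight (kg)`)

# Both approaches keep the same rows and the same Type2 values
identical(with_filter$Name, with_tidyr$Name)
identical(with_filter$Type2, with_tidyr$Type2)
```

Naming the columns inside drop_na() matters here: calling it with no arguments would drop any row with a missing value in any column.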