# Load necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Read the dataset
pokemon_data <- read.csv("PokemonStats.csv")
# Here are three sets of variable combinations I propose:
# 1. Combat Readiness:
#    Response Variable: Total (as a measure of overall combat readiness)
#    Explanatory Variables: HP (health points), Attack, Defense, SpAtk (special attack), SpDef (special defense), Speed
#    New Variable: Combat Efficiency (the average of Attack, Defense, and Speed)
# 2. Size Comparison:
#    Response Variable: Weight
#    Explanatory Variables: Height
#    New Variable: Size Index (the product of Height and Weight)
# 3. Special vs. Regular Combat:
#    Response Variable: SpAtk (special attack)
#    Explanatory Variables: Attack, Defense
#    New Variable: Special Dominance (the ratio of Special Attack to Regular Attack)
# Creating new variables
pokemon_data$Combat_Efficiency <- (pokemon_data$Attack + pokemon_data$Defense + pokemon_data$Speed) / 3
pokemon_data$Size_Index <- pokemon_data$Height * pokemon_data$Weight
pokemon_data$Special_Dominance <- pokemon_data$SpAtk / ifelse(pokemon_data$Attack == 0, 1, pokemon_data$Attack)
# Displaying the dataset with the new variables
head(pokemon_data[, c("Total", "HP", "Attack", "Defense", "SpAtk", "SpDef", "Speed", "Combat_Efficiency", "Height", "Weight", "Size_Index", "Special_Dominance")])
## Total HP Attack Defense SpAtk SpDef Speed Combat_Efficiency Height Weight
## 1 318 45 49 49 65 65 45 47.66667 0.7 6.9
## 2 405 60 62 63 80 80 60 61.66667 1.0 13.0
## 3 525 80 82 83 100 100 80 81.66667 2.0 100.0
## 4 625 80 100 123 122 120 80 101.00000 2.4 155.5
## 5 309 39 52 43 60 50 65 53.33333 0.6 8.5
## 6 405 58 64 58 80 65 80 67.33333 1.1 19.0
## Size_Index Special_Dominance
## 1 4.83 1.326531
## 2 13.00 1.290323
## 3 200.00 1.219512
## 4 373.20 1.220000
## 5 5.10 1.153846
## 6 20.90 1.250000
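# A possible alternative for the ratio above (a sketch, not part of the original
# analysis): replacing a zero Attack with 1 makes the ratio equal to SpAtk itself,
# which is hard to interpret. The hypothetical helper safe_ratio() returns NA in
# that case instead, so such rows can be inspected or dropped explicitly.
safe_ratio <- function(numerator, denominator) {
  # NA where the denominator is zero, the plain ratio otherwise
  ifelse(denominator == 0, NA_real_, numerator / denominator)
}
# Example with made-up values: the second denominator is zero, so that ratio is NA
safe_ratio(c(65, 80, 100), c(49, 0, 82))
# Possible use: pokemon_data$Special_Dominance <- safe_ratio(pokemon_data$SpAtk, pokemon_data$Attack)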
# Visualizations for Combat Readiness
combat_vars <- c("HP", "Attack", "Defense", "SpAtk", "SpDef", "Speed", "Combat_Efficiency")
par(mfrow=c(1,2))
for (var in combat_vars) {
plot(pokemon_data[[var]], pokemon_data$Total, xlab=var, ylab="Total", main=paste("Total vs", var))
}
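# The loop above draws seven plots into a 1 x 2 grid, so they spill over several
# pages. One possible tidy-up (a sketch; plot_against() and its arguments are
# hypothetical names, not from the original analysis) picks a grid large enough
# for all panels and restores the previous graphics settings afterwards.
plot_against <- function(data, response, vars, n_col = 3) {
  n_row <- ceiling(length(vars) / n_col)   # rows needed to fit every panel
  old_par <- par(mfrow = c(n_row, n_col))  # par() returns the previous settings
  on.exit(par(old_par))                    # restore them when the function exits
  for (var in vars) {
    plot(data[[var]], data[[response]], xlab = var, ylab = response,
         main = paste(response, "vs", var))
  }
  invisible(c(n_row, n_col))               # grid dimensions, for reference
}
# Possible use: plot_against(pokemon_data, "Total", combat_vars)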




#Observations:
#All variables (HP, Attack, Defense, SpAtk, SpDef, Speed) have a positive correlation with the "Total" value. This is expected since "Total" is the sum of these individual stats.
#The newly created "Combat Efficiency" variable also demonstrates a strong positive correlation with "Total." This variable, being an average of Attack, Defense, and Speed, provides a consolidated view of a Pokémon's combat capabilities.
#Insights and Significance:
#A Pokémon's overall combat readiness ("Total") can be effectively gauged by its individual stats. The higher the individual stats, the higher the overall combat readiness.
#The "Combat Efficiency" variable provides a summarized metric to assess the combat readiness of a Pokémon without considering its health or special abilities.
#Further Questions:
#How do Pokémon of different primary types (Type1) compare in terms of Combat Efficiency?
#Are there specific Pokémon types that excel in one stat but perform poorly in others?
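# The first question above could be approached with a grouped mean. A minimal
# sketch, assuming the data has a Type1 column as the question suggests; the
# helper name mean_by_group() and the toy values below are made up for illustration.
mean_by_group <- function(data, value = "Combat_Efficiency", group = "Type1") {
  # Mean of `value` within each level of `group`, highest first
  means <- sapply(split(data[[value]], data[[group]]), mean, na.rm = TRUE)
  sort(means, decreasing = TRUE)
}
toy_types <- data.frame(Type1 = c("A", "A", "B"), Combat_Efficiency = c(40, 60, 80))
mean_by_group(toy_types)
# Possible use: mean_by_group(pokemon_data)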
# We'll now calculate the correlation coefficients for these relationships
# Calculating correlation coefficients for the Combat Readiness set
correlations <- cor(pokemon_data[, c("Total", combat_vars)])
correlations_combat_readiness <- correlations["Total", -1] # Excluding 'Total' vs 'Total' correlation
correlations_combat_readiness
## HP Attack Defense SpAtk
## 0.6552852 0.7322339 0.6360577 0.7204245
## SpDef Speed Combat_Efficiency
## 0.7195645 0.5607774 0.8885669
#Interpretation:
#All explanatory variables have a positive correlation with "Total", as observed in the scatter plots.
#"Combat Efficiency" has the strongest correlation with "Total" (about 0.89). This is largely by construction: it averages three of the six stats that are summed to give "Total".
#"Speed" has the weakest correlation (about 0.56). This does not mean Speed contributes less to "Total", since every stat adds to the sum point for point; it only means Speed tracks the overall sum less closely than stats like Attack or SpAtk.
#Based on the visualizations:
#The correlation values make sense, especially the strong correlation of "Combat Efficiency" with "Total". This is because "Combat Efficiency" is derived from several stats that directly contribute to "Total".
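# The part-whole relationship described above can be checked directly. A sketch
# using the six rows printed earlier (values copied from that output): if Total
# equals the sum of the six base stats, any average of some of those stats is
# correlated with Total by construction.
first_six <- data.frame(
  Total   = c(318, 405, 525, 625, 309, 405),
  HP      = c(45, 60, 80, 80, 39, 58),
  Attack  = c(49, 62, 82, 100, 52, 64),
  Defense = c(49, 63, 83, 123, 43, 58),
  SpAtk   = c(65, 80, 100, 122, 60, 80),
  SpDef   = c(65, 80, 100, 120, 50, 65),
  Speed   = c(45, 60, 80, 80, 65, 80)
)
base_stats <- c("HP", "Attack", "Defense", "SpAtk", "SpDef", "Speed")
# TRUE when every row's Total matches the sum of its base stats
all(rowSums(first_six[, base_stats]) == first_six$Total)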
# Visualizations for the "Size Comparison" set, showcasing the relationship between "Weight" (response variable) and "Height" & "Size Index" (explanatory variables).
# Visualizations for Size Comparison
plot(pokemon_data$Height, pokemon_data$Weight, xlab="Height", ylab="Weight", main="Weight vs Height")

plot(pokemon_data$Size_Index, pokemon_data$Weight, xlab="Size Index", ylab="Weight", main="Weight vs Size Index")

# Observations:
#There's a clear positive correlation between "Height" and "Weight". As Pokémon get taller, they generally also get heavier.
#The "Size Index", which is a product of Height and Weight, also has a strong positive correlation with "Weight". This is expected since the "Size Index" is directly derived from "Weight".
#Insights and Significance:
#Height is a reasonable predictor of a Pokémon's weight. The "Size Index" also tracks weight, but because it is computed from weight it cannot be treated as an independent determinant.
#Pokémon that are taller are also generally heavier, which can have implications in combat scenarios where size and weight might play a role.
#Further Questions:
#How do different Pokémon types vary in terms of size and weight? Are certain types generally larger or smaller than others?
#Is there a specific range of heights or weights where Pokémon types diversify the most?
# Defining the explanatory variables for Size Comparison set
explanatory_vars_size <- c("Height", "Size_Index")
# Removing rows with NA values in the relevant columns
pokemon_data_clean <- pokemon_data[complete.cases(pokemon_data[, c("Weight", explanatory_vars_size)]), ]
# Calculating correlation coefficients
correlations_size_comparison <- cor(pokemon_data_clean[, c("Weight", explanatory_vars_size)])
correlations_size <- correlations_size_comparison["Weight", -1] # Excluding 'Weight' vs 'Weight' correlation
correlations_size
## Height Size_Index
## 0.6459702 0.7395989
# Interpretation:
# Both "Height" and "Size Index" have a positive correlation with "Weight", which was evident in the scatter plots.
# The "Size Index" has a stronger correlation with "Weight" (about 0.74) than "Height" alone (about 0.65). Part of this is built in: "Size Index" is Height multiplied by Weight, so it contains the response variable itself.
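# The NA filtering and correlation steps above can also be folded into one call.
# A sketch on made-up values: cor() with use = "complete.obs" drops rows that
# contain NA before computing, which should match filtering with
# complete.cases() first.
toy_size <- data.frame(
  Weight = c(6.9, 13, NA, 155.5, 8.5, 19),
  Height = c(0.7, 1.0, 2.0, 2.4, 0.6, 1.1)
)
cor_filtered <- cor(toy_size[complete.cases(toy_size), ])
cor_inline <- cor(toy_size, use = "complete.obs")
# TRUE when both approaches give the same correlation matrix
all.equal(cor_filtered, cor_inline)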
# Visualizations for the "Special vs. Regular Combat" set, showcasing the relationship between "SpAtk" (response variable) and "Attack" & "Defense" & "Special Dominance" (explanatory variables)
# Visualizations for Special vs. Regular Combat
combat_vars2 <- c("Attack", "Defense", "Special_Dominance")
par(mfrow=c(1,3))
for (var in combat_vars2) {
plot(pokemon_data[[var]], pokemon_data$SpAtk, xlab=var, ylab="SpAtk", main=paste("SpAtk vs", var))
}

#Observations:
#There's a positive correlation between "Attack" and "SpAtk". Pokémon with higher regular attack values also tend to have higher special attack values.
#The relationship between "Defense" and "SpAtk" seems weaker but still positive. Pokémon with higher defense might have slightly higher special attack values.
#"Special Dominance", the ratio of Special Attack to Regular Attack, shows a looser, more scattered pattern. Pokémon with a higher ratio generally have higher "SpAtk" values, which is expected because "SpAtk" is the numerator of the ratio.
#Insights and Significance:
#A Pokémon's special attack tends to rise with its regular attack and defense stats, but the plots show an association only, not that one stat drives the other.
#"Special Dominance" provides an interesting perspective, highlighting Pokémon that lean more towards special attacks relative to their regular attack capabilities.
#Further Questions:
#Are there specific Pokémon types that have a higher "Special Dominance" value, indicating a preference for special attacks?
#How does the "Special Dominance" value impact other combat stats like defense or speed?
# Defining the explanatory variables for Special vs. Regular Combat set
explanatory_vars_combat <- c("Attack", "Defense", "Special_Dominance")
# Calculating correlation coefficients for the Special vs. Regular Combat set
correlations_combat <- cor(pokemon_data[, c("SpAtk", explanatory_vars_combat)])
correlations_special_combat <- correlations_combat["SpAtk", -1] # Excluding 'SpAtk' vs 'SpAtk' correlation
correlations_special_combat
## Attack Defense Special_Dominance
## 0.3357883 0.2242839 0.4769087
#Interpretation:
#Both "Attack" and "Defense" have a positive correlation with "SpAtk", though the correlations are modest. This indicates that while there's a relationship, other factors might influence a Pokémon's special attack capabilities.
#The "Special Dominance" variable has a more pronounced correlation with "SpAtk" (about 0.48). Because the ratio is computed from "SpAtk", part of this correlation is built in, so it says more about how the metric is defined than about special attack prowess.
#The correlation values align with our observations. While "Attack" and "Defense" show positive trends with "SpAtk", it's the "Special Dominance" that provides a more distinct relationship.
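# Because these correlations are modest, an interval around each estimate may
# say more than the point value alone. A sketch using stats::cor.test(); the
# wrapper name cor_with_ci() is hypothetical.
cor_with_ci <- function(x, y, conf_level = 0.95) {
  test <- cor.test(x, y, conf.level = conf_level)  # Pearson correlation by default
  c(estimate = unname(test$estimate),
    lower = test$conf.int[1],
    upper = test$conf.int[2])
}
# Possible use: cor_with_ci(pokemon_data$Attack, pokemon_data$SpAtk)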
# Calculating confidence intervals for response variables
ci_total <- t.test(pokemon_data$Total)$conf.int
ci_weight <- t.test(pokemon_data$Weight)$conf.int # t.test() drops NA values itself; it has no na.rm argument
ci_spAtk <- t.test(pokemon_data$SpAtk)$conf.int
ci_total
## [1] 434.3358 448.0780
## attr(,"conf.level")
## [1] 0.95
ci_weight
## [1] 65.71203 80.84136
## attr(,"conf.level")
## [1] 0.95
ci_spAtk
## [1] 71.02427 74.73788
## attr(,"conf.level")
## [1] 0.95
# Creating a dataframe for visualization
ci_data <- data.frame(
Variable = c("Total", "Weight", "SpAtk"),
Mean = c(mean(pokemon_data$Total), mean(pokemon_data$Weight, na.rm=TRUE), mean(pokemon_data$SpAtk)),
Lower_Bound = c(ci_total[1], ci_weight[1], ci_spAtk[1]),
Upper_Bound = c(ci_total[2], ci_weight[2], ci_spAtk[2])
)
# Visualizing confidence intervals using ggplot2
ggplot(ci_data, aes(x = Variable, y = Mean, ymin = Lower_Bound, ymax = Upper_Bound)) +
geom_pointrange(color = "blue") +
labs(title = "95% Confidence Intervals for Response Variables", y = "Value") +
theme_minimal()

#Interpretation:
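#Total: Treating the dataset as a sample, we are 95% confident that the average "Total" value for the entire population of Pokémon (if we had access to data for all Pokémon ever created) would fall between 434.34 and 448.08.
#Weight: We are 95% confident that the average "Weight" value, based on the Pokémon with a recorded weight, would fall between 65.71 and 80.84.
#SpAtk: We are 95% confident that the average "SpAtk" value for the entire Pokémon population would fall between 71.02 and 74.74.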
#Insights and Significance:
#The confidence intervals provide us with a range in which we expect the true population mean to lie, based on our sample data. For instance, even though our sample might give an average "Total" value, the true average for all Pokémon might be slightly different. The confidence interval captures this uncertainty.
#Further Questions:
#What factors or attributes might cause deviations in the "Total" and "SpAtk" values outside these confidence intervals?
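# The intervals above come from the t distribution. A sketch of the same
# calculation by hand (mean_ci() is a hypothetical helper): the sample mean plus
# or minus a t quantile times the standard error, after dropping missing values.
mean_ci <- function(x, conf_level = 0.95) {
  x <- x[!is.na(x)]                                   # ignore missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                               # standard error of the mean
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)  # two-sided critical value
  c(lower = mean(x) - t_crit * se, upper = mean(x) + t_crit * se)
}
# Possible use: mean_ci(pokemon_data$Weight)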
#CONCLUSIONS:
#Combat Readiness:
#All individual stats (HP, Attack, Defense, SpAtk, SpDef, Speed) contribute positively to a Pokémon's overall combat readiness, represented by the "Total" value.
#Our "Combat Efficiency" metric, an average of Attack, Defense, and Speed, provides a strong indicator of a Pokémon's combat capabilities.
#Size Comparison:
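#Height correlates positively with Weight (about 0.65). "Size Index" (product of Height and Weight) correlates more strongly (about 0.74), but it is computed from Weight, so that relationship is partly built in.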
#Generally, taller Pokémon tend to be heavier.
#Special vs. Regular Combat:
#A Pokémon's special attack ("SpAtk") correlates only modestly with its regular Attack (about 0.34) and Defense (about 0.22).
#The "Special Dominance" metric (ratio of Special Attack to Regular Attack) provides an indicator of a Pokémon's tendency to rely on special attacks.
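#Confidence Intervals:
#For "Total", the average value for the entire population of Pokémon is likely to be between 434.34 and 448.08.
#For "Weight", the average value is likely to be between 65.71 and 80.84, based on the Pokémon with a recorded weight.
#For "SpAtk", the average value for the entire population is likely to be between 71.02 and 74.74.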
#Further Investigations:
#Type Analysis: It would be insightful to understand how different Pokémon types (e.g., Water, Fire, Grass) vary in terms of combat stats, size, and special vs. regular combat tendencies.
#Outliers: As with any dataset, outliers can provide intriguing insights. Identifying and analyzing Pokémon with stats that deviate significantly from the average could uncover unique characteristics or rarities.
#Weight Data Limitations: We need to explore why some "Weight" values are missing (NA) and determine whether these can be filled in or whether specific Pokémon types or categories lack weight data.
#Further Statistical Analysis: Advanced statistical techniques, like regression analysis, could help in understanding the predictive power of various stats on a Pokémon's overall combat readiness.
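# As a starting point for the regression idea above, a sketch with stats::lm().
# fit_stats_model() is a hypothetical helper; which predictors to include is an
# open choice that the analysis above does not settle.
fit_stats_model <- function(data, response, predictors) {
  # Build a formula such as Weight ~ Height from character column names
  model_formula <- reformulate(predictors, response = response)
  lm(model_formula, data = data)
}
# Possible use:
# size_model <- fit_stats_model(pokemon_data, "Weight", "Height")
# summary(size_model)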