knitr::include_graphics("~/Desktop/RWD/094.png")
The data I have chosen for my final project is the "Pokemon with stats" data set from Kaggle. It has 800 entries, which include Mega Evolutions (for example, "VenusaurMega Venusaur") as well as regular Pokemon. For each entry it records the name, the primary and secondary type (Type 1 and Type 2), the total of the base stats, HP, Attack, Defense, Special Attack (Sp. Atk), Special Defense (Sp. Def), Speed, the generation the Pokemon belongs to, and whether it is Legendary.
The data was collected from several sources, including Bulbapedia, PokemonDB, and Serebii, and was compiled by Alberto Barradas.
Apart from the Type 2 column, which is empty (NA) for Pokemon that have only one type, there were no missing values or duplicate rows, and the data was overall very clean. I renamed the `Type 1` column to `Type` because it was easier to work with.
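One way to back up the missing-value and duplicate checks is to count NAs per column and count rows that exactly repeat an earlier row. The sketch below uses a hypothetical helper, `check_quality()`. So that it runs on its own, it rebuilds a few columns of the six rows printed by `head(pokemon)`. On the real data you would call `check_quality(pokemon)`. `check.names = FALSE` keeps the spaces in the real column names.

```r
# Hypothetical helper: count missing values per column and exact duplicate rows
check_quality <- function(df) {
  list(
    na_per_column  = colSums(is.na(df)),  # number of NAs in each column
    duplicate_rows = sum(duplicated(df))  # rows identical to an earlier row
  )
}

# A few columns of the first six rows, as printed by head(pokemon)
sample_rows <- data.frame(
  `Type 1` = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire"),
  `Type 2` = c("Poison", "Poison", "Poison", "Poison", NA, NA),
  HP       = c(45, 60, 80, 80, 39, 58),
  Attack   = c(49, 62, 82, 100, 52, 64),
  check.names = FALSE  # keep "Type 1" / "Type 2" as written in the CSV
)

quality <- check_quality(sample_rows)
quality$na_per_column   # Type 2 is NA for the single-type Pokemon
quality$duplicate_rows
```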
I chose this data set because I love Pokemon. I've been playing Pokemon games since I was about four years old, and I find it interesting to research my favorite Pokemon with my newfound data knowledge. This data set also makes it very easy to run statistical tests and create fun, exciting plots.
library(readr)
library(gganimate)
## Loading required package: ggplot2
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(gifski)
library(ggplot2)
library(ggiraph)
setwd("~/Desktop/RWD")
pokemon <- read_csv("pokemon.csv")
## Rows: 800 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Name, Type 1, Type 2
## dbl (9): #, Total, HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, Generation
## lgl (1): Legendary
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(pokemon)
## # A tibble: 6 × 13
## `#` Name `Type 1` `Type 2` Total HP Attack Defense `Sp. Atk` `Sp. Def`
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 Bulbas… Grass Poison 318 45 49 49 65 65
## 2 2 Ivysaur Grass Poison 405 60 62 63 80 80
## 3 3 Venusa… Grass Poison 525 80 82 83 100 100
## 4 3 Venusa… Grass Poison 625 80 100 123 122 120
## 5 4 Charma… Fire <NA> 309 39 52 43 60 50
## 6 5 Charme… Fire <NA> 405 58 64 58 80 65
## # ℹ 3 more variables: Speed <dbl>, Generation <dbl>, Legendary <lgl>
glimpse(pokemon)
## Rows: 800
## Columns: 13
## $ `#` <dbl> 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9, 9, 10, 11, 12, 13, 14, …
## $ Name <chr> "Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur"…
## $ `Type 1` <chr> "Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire",…
## $ `Type 2` <chr> "Poison", "Poison", "Poison", "Poison", NA, NA, "Flying", "…
## $ Total <dbl> 318, 405, 525, 625, 309, 405, 534, 634, 634, 314, 405, 530,…
## $ HP <dbl> 45, 60, 80, 80, 39, 58, 78, 78, 78, 44, 59, 79, 79, 45, 50,…
## $ Attack <dbl> 49, 62, 82, 100, 52, 64, 84, 130, 104, 48, 63, 83, 103, 30,…
## $ Defense <dbl> 49, 63, 83, 123, 43, 58, 78, 111, 78, 65, 80, 100, 120, 35,…
## $ `Sp. Atk` <dbl> 65, 80, 100, 122, 60, 80, 109, 130, 159, 50, 65, 85, 135, 2…
## $ `Sp. Def` <dbl> 65, 80, 100, 120, 50, 65, 85, 85, 115, 64, 80, 105, 115, 20…
## $ Speed <dbl> 45, 60, 80, 80, 65, 80, 100, 100, 100, 43, 58, 78, 78, 45, …
## $ Generation <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Legendary <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
str(pokemon)
## spc_tbl_ [800 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ # : num [1:800] 1 2 3 3 4 5 6 6 6 7 ...
## $ Name : chr [1:800] "Bulbasaur" "Ivysaur" "Venusaur" "VenusaurMega Venusaur" ...
## $ Type 1 : chr [1:800] "Grass" "Grass" "Grass" "Grass" ...
## $ Type 2 : chr [1:800] "Poison" "Poison" "Poison" "Poison" ...
## $ Total : num [1:800] 318 405 525 625 309 405 534 634 634 314 ...
## $ HP : num [1:800] 45 60 80 80 39 58 78 78 78 44 ...
## $ Attack : num [1:800] 49 62 82 100 52 64 84 130 104 48 ...
## $ Defense : num [1:800] 49 63 83 123 43 58 78 111 78 65 ...
## $ Sp. Atk : num [1:800] 65 80 100 122 60 80 109 130 159 50 ...
## $ Sp. Def : num [1:800] 65 80 100 120 50 65 85 85 115 64 ...
## $ Speed : num [1:800] 45 60 80 80 65 80 100 100 100 43 ...
## $ Generation: num [1:800] 1 1 1 1 1 1 1 1 1 1 ...
## $ Legendary : logi [1:800] FALSE FALSE FALSE FALSE FALSE FALSE ...
## - attr(*, "spec")=
## .. cols(
## .. `#` = col_double(),
## .. Name = col_character(),
## .. `Type 1` = col_character(),
## .. `Type 2` = col_character(),
## .. Total = col_double(),
## .. HP = col_double(),
## .. Attack = col_double(),
## .. Defense = col_double(),
## .. `Sp. Atk` = col_double(),
## .. `Sp. Def` = col_double(),
## .. Speed = col_double(),
## .. Generation = col_double(),
## .. Legendary = col_logical()
## .. )
## - attr(*, "problems")=<externalptr>
# Rename Type 1 to Type to make it cleaner
pokemon1 <- pokemon %>%
  rename(Type = `Type 1`)
# Average each stat within each primary type
pokemon_type_avg <- pokemon1 %>%
  group_by(Type) %>%
  summarise(avg_HP = mean(HP),
            avg_Attack = mean(Attack),
            avg_Defense = mean(Defense),
            avg_Sp_Atk = mean(`Sp. Atk`),
            avg_Sp_Def = mean(`Sp. Def`),
            avg_Speed = mean(Speed)) %>%
  ungroup()
library(tidyr)
# Reshape to long format (one row per type and stat) so the six stats
# stack into a single bar per type instead of drawing over each other
pokemon_type_long <- pokemon_type_avg %>%
  pivot_longer(cols = -Type, names_to = "Stat", values_to = "Average") %>%
  mutate(Stat = recode(Stat, avg_HP = "HP", avg_Attack = "Attack",
                       avg_Defense = "Defense", avg_Sp_Atk = "Sp. Atk",
                       avg_Sp_Def = "Sp. Def", avg_Speed = "Speed"))
ggplot(pokemon_type_long, aes(x = Type, y = Average, fill = Stat)) +
  geom_col(position = "stack", width = 0.5) +
  # Give every stat its own color
  scale_fill_manual(values = c("HP" = "#FF8E72", "Attack" = "#66A182", "Defense" = "#00BFC4",
                               "Sp. Atk" = "#DBBBF5", "Sp. Def" = "#9D79BC", "Speed" = "#F7B2BD")) +
  theme_classic() +
  labs(title = "Average Stats by Pokemon Type", x = "Type", y = "Average Stat") +
  coord_flip()
# Animate our plot: reshape to long format, then show one type per state
pokemon_type_avg %>%
  tidyr::pivot_longer(cols = -Type, names_to = "Stat", values_to = "Average") %>%
  mutate(Stat = recode(Stat, avg_HP = "HP", avg_Attack = "Attack",
                       avg_Defense = "Defense", avg_Sp_Atk = "Sp. Atk",
                       avg_Sp_Def = "Sp. Def", avg_Speed = "Speed")) %>%
  ggplot(aes(x = Type, y = Average, fill = Stat)) +
  geom_col(position = "stack", width = 0.5) +
  scale_fill_manual(values = c("HP" = "#FF8E72", "Attack" = "#66A182", "Defense" = "#00BFC4",
                               "Sp. Atk" = "#DBBBF5", "Sp. Def" = "#9D79BC", "Speed" = "#F7B2BD")) +
  theme_classic() +
  labs(title = "Average Stats by Pokemon Type", x = "Type", y = "Average Stat") +
  coord_flip() +
  # Each state shows one type's bar; ease_aes() sets how frames are tweened
  transition_states(Type, transition_length = 2, state_length = 1) +
  ease_aes("linear")
This stacked bar chart shows each Pokemon type's average HP, Attack,
Defense, Special Attack, Special Defense, and Speed. I used gganimate
to create an animated plot that transitions between states, one for
each Pokemon type. Each stat is represented by its own color. This plot
is an interesting and appealing way to show the average stats of each
Pokemon type, and the animation adds some pizzazz to it.
# Filter the data for Steel type
steel_data <- filter(pokemon1, Type == "Steel")
# Calculate the correlation between Attack and Defense for Steel type
correlation <- cor(steel_data$Attack, steel_data$Defense)
# Visualize the relationship between Attack and Defense for Steel type
ggplot(steel_data, aes(x = Attack, y = Defense)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Relationship between Attack and Defense for Steel Type",
x = "Attack", y = "Defense",
subtitle = paste("Correlation coefficient:", round(correlation, 2)))
## `geom_smooth()` using formula = 'y ~ x'
For Steel types, the correlation coefficient between Attack and Defense is 0.37, indicating a moderate positive correlation: as Attack increases, Defense tends to increase as well. However, there are some very prominent outliers, as the plot shows. There are also many exceptions, since many other factors likely play a significant role in determining a Pokemon's overall strength.
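`cor()` returns only the coefficient. Base R's `cor.test()` goes further and runs an actual correlation test, adding a p-value and a confidence interval. The sketch below is minimal. On the real data you would pass `steel_data$Attack` and `steel_data$Defense`. To keep it self-contained, it uses the Attack and Defense values of the first 14 rows printed by `glimpse(pokemon)`. Those rows are not Steel types, so the numbers illustrate the call only.

```r
# Attack and Defense of the first 14 rows shown by glimpse(pokemon)
attack  <- c(49, 62, 82, 100, 52, 64, 84, 130, 104, 48, 63, 83, 103, 30)
defense <- c(49, 63, 83, 123, 43, 58, 78, 111, 78, 65, 80, 100, 120, 35)

# Pearson correlation test: coefficient, confidence interval and p-value
result <- cor.test(attack, defense, method = "pearson")

result$estimate   # the coefficient, the same value cor(attack, defense) gives
result$conf.int   # 95% confidence interval for the true correlation
result$p.value    # small values are evidence of a non-zero correlation
```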
#Filter Data to just include dragon types
dragon_data <- filter(pokemon1, Type == "Dragon")
# Calculate Correlation
correlation2 <- cor(dragon_data$Attack, dragon_data$Defense)
ggplot(dragon_data, aes(x = Attack, y = Defense)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Relationship between Attack and Defense for Dragon Type",
x = "Attack", y = "Defense",
subtitle = paste("Correlation coefficient:", round(correlation2, 2)))
## `geom_smooth()` using formula = 'y ~ x'
The correlation coefficient for Dragon-type Pokemon is shown in the subtitle of this plot. What surprised me is that the Dragon-type points seem to follow the trend line much more closely than the Steel-type points do.
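Computing every type's coefficient in one grouped summary removes the need to track `correlation` and `correlation2` by hand. The dplyr sketch below shows the idea. On the real data the input would be `filter(pokemon1, Type %in% c("Steel", "Dragon"))`. The small stand-in tibble is built from the first seven rows of the data (four Grass, three Fire) so that the example runs on its own.

```r
library(dplyr)

# Stand-in data: Type, Attack and Defense of the first seven rows
sample_types <- tibble(
  Type    = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  Attack  = c(49, 62, 82, 100, 52, 64, 84),
  Defense = c(49, 63, 83, 123, 43, 58, 78)
)

# One Attack/Defense correlation (and sample size) per type
type_correlations <- sample_types %>%
  group_by(Type) %>%
  summarise(n = n(),
            correlation = cor(Attack, Defense)) %>%
  ungroup()

type_correlations
```

Keeping `n` next to each coefficient matters here, because a correlation based on only a few Pokemon can swing a lot.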
# Filter for Steel and Dragon types
SvD <- filter(pokemon1, Type %in% c("Steel", "Dragon"))
# Create Plot
ggplot(SvD, aes(x = Attack, y = Defense, color = Type)) +
geom_point(size = 3, alpha = 0.7) +
scale_color_manual(values = c("#2978A0", "#315659")) +
labs(title = "Steel vs. Dragon Type",
x = "Attack", y = "Defense",
color = "Type") +
theme_bw() +
theme(panel.grid.major = element_line(colour = "grey90", linewidth = 0.5),
axis.text = element_text(size = 12),
axis.title = element_text(size = 14, face = "bold"))
# Add animation to the plot and store it in a new object
poke <- ggplot(SvD, aes(x = Attack, y = Defense, color = Type)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_manual(values = c("#2978A0", "#315659")) +
  labs(title = "Steel vs. Dragon Type",
       x = "Attack", y = "Defense",
       color = "Type") +
  theme_bw() +
  # linewidth replaces the size argument deprecated in ggplot2 3.4.0
  theme(panel.grid.major = element_line(colour = "grey90", linewidth = 0.5),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14, face = "bold")) +
  transition_states(Type,
                    transition_length = 2,
                    state_length = 1)
# Animate our plot
animate(poke, fps = 10, nframes = 100)
This graph compares the Attack and Defense stats of Steel and Dragon types. Steel-type Pokemon generally have higher Defense than Dragon types, while Dragon types tend to deal more attack damage. Based on these two stats, I'd argue that Steel-type Pokemon are better than Dragon-type Pokemon, although Attack and Defense alone can't decide which type is best overall.
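A scatter plot suggests that Steel types have higher Defense. A Welch two-sample t-test is one way to check whether that gap is bigger than chance alone would produce. The sketch below uses base R's `t.test()`. On the real data the call would be `t.test(Defense ~ Type, data = SvD)`. The stand-in data frame reuses the first seven rows (Grass vs. Fire) only so that the code runs by itself.

```r
# Stand-in data: Type and Defense of the first seven rows
svd_sample <- data.frame(
  Type    = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  Defense = c(49, 63, 83, 123, 43, 58, 78)
)

# Welch two-sample t-test (t.test uses var.equal = FALSE by default)
defense_test <- t.test(Defense ~ Type, data = svd_sample)

defense_test$estimate  # mean Defense in each group, in alphabetical group order
defense_test$p.value   # probability of a gap this large if the true means were equal
```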
Pokemon is a top-rated game franchise known worldwide. It was popularized by an anime series following a boy named Ash Ketchum on his journey to become a Pokemon Master! So what is a Pokemon? According to the article "Parents Guide to Pokemon," "Pokémon are creatures of all shapes and sizes who live in the wild or alongside their human partners (called 'Trainers')." Trainers must catch their Pokemon and train them through battles so they can evolve to their highest level. There are tons of Pokemon, each with different types and abilities. There are also many different games, which the article describes this way: "In many games, the player takes on the role of a young Trainer whose journey involves traveling from place to place, catching and training Pokémon, and battling against other Trainers' Pokémon on a quest to become the Pokémon League Champion." (Parents Guide to Pokemon) Pokemon makes a comeback every couple of years and has a fan base that spans many ages and types of people.
My first plot shows the average stats of each Pokemon type, with each color corresponding to a different stat. This plot was fascinating, and one pattern I found was that certain types of Pokemon had much better stats than others; for example, Dragon Pokemon had high Attack while Steel Pokemon had high Defense. This is useful to know when building your team in the Pokemon games, as it helps you put together a more well-rounded team. A boxplot would have been a good addition, as there are many obvious outliers that would be worth understanding. I animated this plot to show one type at a time, which looks very cool and gives you more time to focus on each one.
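The boxplot idea could look something like the sketch below. A boxplot of one stat per type shows the spread. It also flags outliers, meaning points more than 1.5 × IQR beyond the box. On the real data, `pokemon1` would be the input, and any stat column can be swapped in for `Attack`. The tiny stand-in data frame is there only so the example runs on its own.

```r
library(ggplot2)

# Stand-in for pokemon1: Type and Attack of the first seven rows
stat_sample <- data.frame(
  Type   = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  Attack = c(49, 62, 82, 100, 52, 64, 84)
)

# One box per type; points beyond 1.5 * IQR from the box are drawn in red
attack_box <- ggplot(stat_sample, aes(x = Type, y = Attack)) +
  geom_boxplot(outlier.colour = "red") +
  theme_classic() +
  labs(title = "Attack by Pokemon Type", x = "Type", y = "Attack") +
  coord_flip()

attack_box
```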
Next, I explored whether Dragon or Steel types have a stronger correlation between Attack and Defense. For Steel types the coefficient was 0.37: there is a positive relationship, but it is not strong. The Dragon-type coefficient was computed separately (`correlation2`). I then plotted Attack against Defense for both types together, since many players consider these two types the best, and animated the plot so it shows one type at a time.