knitr::include_graphics("~/Desktop/RWD/094.png")

The data I have chosen for my final project is the Pokemon with stats data set from Kaggle. This data set has 800 rows and 13 columns. The columns cover each Pokemon’s Pokedex number, name, primary and secondary type, total stats, HP, attack, defense, special attack, special defense, speed, generation, and whether it is a legendary Pokemon.

This data was collected from several sources, including Bulbapedia, PokemonDB, and Serebii, and was published by Alberto Barradas.

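Overall the data was very clean. As the output below shows, Type 2 is NA for Pokemon that only have one type, and the Pokedex number (#) repeats for alternate forms such as Mega Venusaur, so neither of these is an error in the data. I renamed the `Type 1` column to Type, as it was easier to use in my code.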

I chose this data set because I love Pokemon. I’ve been playing Pokemon games since I was about four years old, and I find it interesting to research my favorite Pokemon with my newfound data knowledge. This data set also makes it very easy to run statistical tests and create fun, exciting plots.

Load libraries!!

library(readr)
library(gganimate)
## Loading required package: ggplot2
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(gifski)
library(ggplot2)
library(ggiraph)

Set the Working Directory, Read In the Data, and Explore!

setwd("~/Desktop/RWD")
pokemon <- read_csv("pokemon.csv")
## Rows: 800 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Name, Type 1, Type 2
## dbl (9): #, Total, HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, Generation
## lgl (1): Legendary
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(pokemon)
## # A tibble: 6 × 13
##     `#` Name    `Type 1` `Type 2` Total    HP Attack Defense `Sp. Atk` `Sp. Def`
##   <dbl> <chr>   <chr>    <chr>    <dbl> <dbl>  <dbl>   <dbl>     <dbl>     <dbl>
## 1     1 Bulbas… Grass    Poison     318    45     49      49        65        65
## 2     2 Ivysaur Grass    Poison     405    60     62      63        80        80
## 3     3 Venusa… Grass    Poison     525    80     82      83       100       100
## 4     3 Venusa… Grass    Poison     625    80    100     123       122       120
## 5     4 Charma… Fire     <NA>       309    39     52      43        60        50
## 6     5 Charme… Fire     <NA>       405    58     64      58        80        65
## # ℹ 3 more variables: Speed <dbl>, Generation <dbl>, Legendary <lgl>
glimpse(pokemon)
## Rows: 800
## Columns: 13
## $ `#`        <dbl> 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9, 9, 10, 11, 12, 13, 14, …
## $ Name       <chr> "Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur"…
## $ `Type 1`   <chr> "Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire",…
## $ `Type 2`   <chr> "Poison", "Poison", "Poison", "Poison", NA, NA, "Flying", "…
## $ Total      <dbl> 318, 405, 525, 625, 309, 405, 534, 634, 634, 314, 405, 530,…
## $ HP         <dbl> 45, 60, 80, 80, 39, 58, 78, 78, 78, 44, 59, 79, 79, 45, 50,…
## $ Attack     <dbl> 49, 62, 82, 100, 52, 64, 84, 130, 104, 48, 63, 83, 103, 30,…
## $ Defense    <dbl> 49, 63, 83, 123, 43, 58, 78, 111, 78, 65, 80, 100, 120, 35,…
## $ `Sp. Atk`  <dbl> 65, 80, 100, 122, 60, 80, 109, 130, 159, 50, 65, 85, 135, 2…
## $ `Sp. Def`  <dbl> 65, 80, 100, 120, 50, 65, 85, 85, 115, 64, 80, 105, 115, 20…
## $ Speed      <dbl> 45, 60, 80, 80, 65, 80, 100, 100, 100, 43, 58, 78, 78, 45, …
## $ Generation <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Legendary  <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
str(pokemon)
## spc_tbl_ [800 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ #         : num [1:800] 1 2 3 3 4 5 6 6 6 7 ...
##  $ Name      : chr [1:800] "Bulbasaur" "Ivysaur" "Venusaur" "VenusaurMega Venusaur" ...
##  $ Type 1    : chr [1:800] "Grass" "Grass" "Grass" "Grass" ...
##  $ Type 2    : chr [1:800] "Poison" "Poison" "Poison" "Poison" ...
##  $ Total     : num [1:800] 318 405 525 625 309 405 534 634 634 314 ...
##  $ HP        : num [1:800] 45 60 80 80 39 58 78 78 78 44 ...
##  $ Attack    : num [1:800] 49 62 82 100 52 64 84 130 104 48 ...
##  $ Defense   : num [1:800] 49 63 83 123 43 58 78 111 78 65 ...
##  $ Sp. Atk   : num [1:800] 65 80 100 122 60 80 109 130 159 50 ...
##  $ Sp. Def   : num [1:800] 65 80 100 120 50 65 85 85 115 64 ...
##  $ Speed     : num [1:800] 45 60 80 80 65 80 100 100 100 43 ...
##  $ Generation: num [1:800] 1 1 1 1 1 1 1 1 1 1 ...
##  $ Legendary : logi [1:800] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `#` = col_double(),
##   ..   Name = col_character(),
##   ..   `Type 1` = col_character(),
##   ..   `Type 2` = col_character(),
##   ..   Total = col_double(),
##   ..   HP = col_double(),
##   ..   Attack = col_double(),
##   ..   Defense = col_double(),
##   ..   `Sp. Atk` = col_double(),
##   ..   `Sp. Def` = col_double(),
##   ..   Speed = col_double(),
##   ..   Generation = col_double(),
##   ..   Legendary = col_logical()
##   .. )
##  - attr(*, "problems")=<externalptr>

Clean The Data!

# Rename type to make it cleaner 
pokemon1 <- pokemon %>%
  rename(Type = "Type 1")
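
Before relying on the data, it helps to count missing values and repeated rows rather than judging by eye. The sketch below is only an illustration: it rebuilds the six rows visible in the head() output above (Name is left out because it is truncated there), and the object names poke_head, na_counts, n_repeated_ids, and n_duplicate_rows are my own. On the full data the same calls would be run on pokemon1.

```r
# First six rows of the data, copied from the head() output above.
# check.names = FALSE keeps column names such as "#" and "Type 2" intact.
poke_head <- data.frame(
  `#`      = c(1, 2, 3, 3, 4, 5),
  `Type 1` = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire"),
  `Type 2` = c("Poison", "Poison", "Poison", "Poison", NA, NA),
  Total    = c(318, 405, 525, 625, 309, 405),
  HP       = c(45, 60, 80, 80, 39, 58),
  check.names = FALSE
)

# Number of missing values in each column
na_counts <- colSums(is.na(poke_head))

# Rows whose Pokedex number repeats an earlier row (alternate forms)
n_repeated_ids <- sum(duplicated(poke_head[["#"]]))

# Rows that are exact copies of an earlier row across every column
n_duplicate_rows <- sum(duplicated(poke_head))

print(na_counts)
print(n_repeated_ids)
print(n_duplicate_rows)
```

In this small sample only Type 2 has missing values, and the repeated Pokedex number belongs to a Mega form rather than a duplicated row.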

Group and summarise the data to prepare for the first visual

pokemon_type_avg <- pokemon1 %>%
  group_by(Type) %>%
  summarise(avg_HP = mean(HP),
            avg_Attack = mean(Attack),
            avg_Defense = mean(Defense),
            avg_Sp_Atk = mean(`Sp. Atk`),
            avg_Sp_Def = mean(`Sp. Def`),
            avg_Speed = mean(Speed)) %>%
  ungroup()

Create our plot showing the average stats of each Pokemon type

ggplot(pokemon_type_avg, aes(x = Type)) +
  geom_col(aes(y = avg_HP, fill = "HP"), width = 0.5) +
  geom_col(aes(y = avg_Attack, fill = "Attack"), width = 0.5) +
  geom_col(aes(y = avg_Defense, fill = "Defense"), width = 0.5) +
  geom_col(aes(y = avg_Sp_Atk, fill = "Sp. Atk"), width = 0.5) +
  geom_col(aes(y = avg_Sp_Def, fill = "Sp. Def"), width = 0.5) +
  geom_col(aes(y = avg_Speed, fill = "Speed"), width = 0.5) +
  # One distinct color per stat so the legend entries can be told apart
  scale_fill_manual(values = c("HP" = "#FF8E72", "Attack" = "#66A182", "Defense" = "#00BFC4", 
                               "Sp. Atk" = "#DBBBF5", "Sp. Def" = "#8E7DBE", "Speed" = "#F7B2BD")) +
  theme_classic() +
  labs(title = "Average Stats by Pokemon Type", x = "Type", y = "Average Stat") +
  coord_flip()

Animate Our Visual

# Animate our plot 
ggplot(pokemon_type_avg, aes(x = Type, y = 0)) +
  geom_col(aes(y = avg_HP, fill = "HP"), width = 0.5) +
  geom_col(aes(y = avg_Attack, fill = "Attack"), width = 0.5) +
  geom_col(aes(y = avg_Defense, fill = "Defense"), width = 0.5) +
  geom_col(aes(y = avg_Sp_Atk, fill = "Sp. Atk"), width = 0.5) +
  geom_col(aes(y = avg_Sp_Def, fill = "Sp. Def"), width = 0.5) +
  geom_col(aes(y = avg_Speed, fill = "Speed"), width = 0.5) +
  # One distinct color per stat so the legend entries can be told apart
  scale_fill_manual(values = c("HP" = "#FF8E72", "Attack" = "#66A182", "Defense" = "#00BFC4", 
                               "Sp. Atk" = "#DBBBF5", "Sp. Def" = "#8E7DBE", "Speed" = "#F7B2BD")) +
  theme_classic() +
  labs(title = "Average Stats by Pokemon Type", x = "Type", y = "Average Stat") +
  coord_flip() +
  transition_states(Type, transition_length = 2, state_length = 1) +
  ease_aes("linear")

This bar chart shows each Pokemon type’s average HP, attack, defense, special attack, special defense, and speed. Each stat is represented by a particular color. Because every stat is drawn as its own geom_col layer starting from zero, the bars are overlaid rather than truly stacked, so a longer bar drawn later can hide a shorter one. I used gganimate to create an animated plot that transitions between states based on the type of Pokemon. This plot is an interesting and appealing way to show the average stats of each Pokemon type, and the animation adds some pizzazz to it.
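
One way to get bars that sit side by side (or genuinely stack) is to reshape the averages into long format and use a single geom_col layer. The sketch below is an assumption about how that could look, not the plot used above: TypeA and TypeB and their numbers are placeholder values, and type_avg_toy, type_avg_long, and p_stats are names I made up for the example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Placeholder averages for two made-up types; these are NOT real results
type_avg_toy <- tibble(
  Type        = c("TypeA", "TypeB"),
  avg_HP      = c(80, 65),
  avg_Attack  = c(110, 90),
  avg_Defense = c(85, 125)
)

# One row per Type/stat pair, so fill can be mapped to a real column
type_avg_long <- type_avg_toy %>%
  pivot_longer(cols = starts_with("avg_"),
               names_to = "Stat",
               names_prefix = "avg_",
               values_to = "Average")

# A single geom_col layer; position = "dodge" places the stats side by side
# (position = "stack" would stack them instead)
p_stats <- ggplot(type_avg_long, aes(x = Type, y = Average, fill = Stat)) +
  geom_col(position = "dodge", width = 0.7) +
  coord_flip() +
  theme_classic() +
  labs(title = "Average Stats by Pokemon Type", x = "Type", y = "Average Stat")

print(p_stats)
```

With the real data, pokemon_type_avg would take the place of type_avg_toy, and the legend would come from the Stat column instead of hand-typed fill labels.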

Let’s explore one of the two best Pokemon types: Steel!

# Filter the data for Steel type
steel_data <- filter(pokemon1, Type == "Steel")

# Calculate the correlation between Attack and Defense for Steel type
correlation <- cor(steel_data$Attack, steel_data$Defense)

# Visualize the relationship between Attack and Defense for Steel type
ggplot(steel_data, aes(x = Attack, y = Defense)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Relationship between Attack and Defense for Steel Type", 
       x = "Attack", y = "Defense", 
       subtitle = paste("Correlation coefficient:", round(correlation, 2)))
## `geom_smooth()` using formula = 'y ~ x'

The correlation coefficient is 0.37, indicating a weak-to-moderate positive correlation. In other words, as attack increases, defense tends to increase as well. However, there are some very prominent outliers, as shown in our plot. There are also many exceptions, as many other factors likely play a significant role in determining a Pokemon’s overall strength.
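
Note that cor() only returns the coefficient. For an actual significance test, base R has cor.test(), which also reports a p-value and a confidence interval. The sketch below is a stand-in, not the Steel analysis: it uses the first 14 Attack and Defense values visible in the glimpse() output above, and the names attack, defense, and result are my own.

```r
# First 14 Attack and Defense values from the glimpse() output above.
# These are the first rows of the full data set, not the Steel subset.
attack  <- c(49, 62, 82, 100, 52, 64, 84, 130, 104, 48, 63, 83, 103, 30)
defense <- c(49, 63, 83, 123, 43, 58, 78, 111, 78, 65, 80, 100, 120, 35)

# Pearson correlation test: H0 is that the true correlation is zero
result <- cor.test(attack, defense, method = "pearson")

print(result$estimate)  # sample correlation coefficient
print(result$p.value)   # p-value of the test
print(result$conf.int)  # 95% confidence interval for the correlation
```

For the Steel analysis, the same call would take steel_data$Attack and steel_data$Defense as its two arguments.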

Let’s compute the same correlation for Dragon-type Pokemon

# Filter the data for Dragon type
dragon_data <- filter(pokemon1, Type == "Dragon")

# Calculate Correlation
correlation2 <- cor(dragon_data$Attack, dragon_data$Defense)

ggplot(dragon_data, aes(x = Attack, y = Defense)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Relationship between Attack and Defense for Dragon Type", 
       x = "Attack", y = "Defense", 
       subtitle = paste("Correlation coefficient:", round(correlation2, 2)))
## `geom_smooth()` using formula = 'y ~ x'

For Dragon-type Pokemon, the coefficient shown in the subtitle comes from correlation2; reusing correlation would just repeat the Steel value of 0.37. The Dragon points look far more tightly grouped around the trend line than the Steel points, so I would not expect the two coefficients to match.
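
Computing the coefficient for every type in one grouped step avoids keeping a separate variable per type, which makes it harder to mix the values up. This is only a sketch under assumptions: it uses the seven Grass and Fire rows visible in the glimpse() output above as a stand-in, and poke_sample and cor_by_type are names I invented for it.

```r
library(dplyr)

# First seven rows from the glimpse() output above (Type 1, Attack, Defense)
poke_sample <- tibble(
  Type    = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  Attack  = c(49, 62, 82, 100, 52, 64, 84),
  Defense = c(49, 63, 83, 123, 43, 58, 78)
)

# One row per type: group size and the Attack/Defense correlation
cor_by_type <- poke_sample %>%
  group_by(Type) %>%
  summarise(n = n(),
            r = cor(Attack, Defense),
            .groups = "drop")

print(cor_by_type)
```

On the full data, replacing poke_sample with pokemon1 and filtering to Steel and Dragon would give both coefficients side by side in one table.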

Let’s compare the stats of the two strongest Pokemon types

# Filter for Steel and Dragon types 
SvD <- filter(pokemon1, Type %in% c("Steel", "Dragon"))

# Create Plot
ggplot(SvD, aes(x = Attack, y = Defense, color = Type)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_manual(values = c("#2978A0", "#315659")) +
  labs(title = "Steel vs. Dragon Type", 
       x = "Attack", y = "Defense", 
       color = "Type") +
  theme_bw() +
  theme(panel.grid.major = element_line(colour = "grey90", linewidth = 0.5),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14, face = "bold"))

# Add animation to the plot and store it in a new object
poke <- ggplot(SvD, aes(x = Attack, y = Defense, color = Type)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_manual(values = c("#2978A0", "#315659")) +
  labs(title = "Steel vs. Dragon Type", 
       x = "Attack", y = "Defense", 
       color = "Type") +
  theme_bw() +
  theme(panel.grid.major = element_line(colour = "grey90", linewidth = 0.5),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14, face = "bold")) +
  transition_states(Type, 
                    transition_length = 2, 
                    state_length = 1)
# Animate our plot
animate(poke, fps = 10, nframes = 100)

Overview of Our Second Graph

This graph shows Steel versus Dragon types’ defense and attack stats. Steel-type Pokemon tend to have higher defense than Dragon types, while Dragon types tend to have higher attack. To me, this makes Steel Pokemon the best, and better than Dragon-type Pokemon, although the plot compares only attack and defense.
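
A visual comparison like this can be backed up with group means. The sketch below shows the idea on stand-in data only: it uses the seven Grass and Fire rows visible in the glimpse() output above, since no Steel or Dragon rows appear in the printed output, and type_sample and type_means are hypothetical names.

```r
library(dplyr)

# Stand-in rows taken from the glimpse() output above (not Steel or Dragon)
type_sample <- tibble(
  Type    = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  Attack  = c(49, 62, 82, 100, 52, 64, 84),
  Defense = c(49, 63, 83, 123, 43, 58, 78)
)

# Mean Attack and mean Defense for each type
type_means <- type_sample %>%
  group_by(Type) %>%
  summarise(across(c(Attack, Defense), mean), .groups = "drop")

print(type_means)
```

Running the same summary on SvD would put numbers behind the claim that one type leads in defense and the other in attack.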

Conclusion

Pokemon is a top-rated game franchise known worldwide. It was popularized by an anime series following a boy named Ash Ketchum and his journey to become a Pokemon master! So what is a Pokemon? According to the article “Parents Guide to Pokemon,” “Pokémon are creatures of all shapes and sizes who live in the wild or alongside their human partners (called “Trainers”).” Trainers must catch their Pokemon and train them by battling so they can evolve to their highest level. There are tons of Pokemon, each with different types and abilities. There are also many different games, which the article describes this way: “In many games, the player takes on the role of a young Trainer whose journey involves traveling from place to place, catching and training Pokémon, and battling against other Trainers’ Pokémon on a quest to become the Pokémon League Champion.” (Parents Guide to Pokemon) Pokemon makes a comeback every couple of years and has a fan base that spans many ages and types of people.

My first plot shows the average stats of each Pokemon type. Each bar represents a different stat, and its color indicates which stat it is. This plot was fascinating, and one pattern I found was that certain types of Pokemon had much better stats than others; for example, Dragon Pokemon had a high attack while Steel Pokemon had a high defense. This is great to know when building your team in the Pokemon games, as it helps you put together a more well-rounded team. A boxplot would have been a good addition, as there are many obvious outliers that would be worth understanding. I animated this plot to show one type at a time; it looks very cool and gives you more time to focus on each one.
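
The boxplot mentioned above could look something like the sketch below. It is an assumption about one reasonable layout, drawn from the seven Grass and Fire rows visible in the glimpse() output rather than the full data, and box_sample and p_box are names I chose for the example.

```r
library(ggplot2)

# Stand-in rows taken from the glimpse() output above
box_sample <- data.frame(
  Type   = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  Attack = c(49, 62, 82, 100, 52, 64, 84)
)

# One box per type; points beyond the whiskers are drawn in red as outliers
p_box <- ggplot(box_sample, aes(x = Type, y = Attack)) +
  geom_boxplot(outlier.colour = "red") +
  coord_flip() +
  theme_classic() +
  labs(title = "Attack by Pokemon Type", x = "Type", y = "Attack")

print(p_box)
```

With pokemon1 in place of box_sample, each of the types would get its own box, and unusually strong or weak Pokemon would stand out as individual points.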

Next, I chose to explore whether the Dragon or Steel type had more of a correlation between attack and defense. Steel types showed a positive correlation of 0.37; although one was there, it was not strong. The Dragon coefficient is computed separately as correlation2 and should be read from that variable rather than from the Steel result. I then plotted attack against defense for these two types together, as they’re dubbed the best. I animated the plot so it would show one type at a time.