Project 1 - Pokemon

Author

Efren Martinez

Introduction

Pokemon

My data set is about Pokemon, the popular Japanese video game franchise about “pocket monsters” that humans capture, train, and battle with. The data were collected over one week in December 2016. Each row records an interaction a trainer had with a Pokemon they captured, along with traits of both. For the trainer, the variables include their region and subregion; for the Pokemon, they include its level, gender, typing (both primary and secondary), nature, and the Poke Ball used to capture it. I’m excited to compare whether trainers in different regions have a favorite typing. That would tell me whether there is a “meta” across regions, meaning a set of Pokemon types that fans consider the best or simply favor. The data come from the fan-made site http://serebii.net/.

Load the libraries

library(treemap)
library(tidyverse)
library(RColorBrewer)
setwd("C:/Users/Marti/OneDrive/Desktop/MC-DV")
Pokemon <- read_csv("pokemon.csv")

Preview the Pokemon Dataset

head(Pokemon)

# A tibble: 6 × 15
  Date       Time  Pokemon `Trainer Region` `Trainer Subregion` `Pokemon Region`
  <chr>      <tim> <chr>   <chr>            <chr>               <chr>           
1 12/13/2016 17:28 Oricor… South Korea      <NA>                <NA>            
2 12/13/2016 17:30 Zubat   United States    Texas               GER             
3 12/13/2016 17:31 Carbink United States    Oklahoma            <NA>            
4 12/13/2016 17:33 Klefki  United States    Connecticut         <NA>            
5 12/13/2016 17:34 Luvdisc United States    <NA>                <NA>            
6 12/13/2016 17:35 Roggen… United Kingdom   <NA>                SPA             
# ℹ 9 more variables: Level <dbl>, `Level Met` <dbl>, Gender <chr>,
#   Type1 <chr>, Type2 <chr>, Nature <chr>, Pokeball <chr>, `Held Item` <lgl>,
#   `Perfect IVs` <dbl>

Clean up the data!

Converting the column headers to lowercase and joining separated words with underscores makes the variable names uniform and easier to find later.

names(Pokemon) <- tolower(names(Pokemon))
names(Pokemon) <- gsub(" ","_",names(Pokemon))
# gsub replaces each space in the headers with an underscore
head(Pokemon)

# A tibble: 6 × 15
  date       time  pokemon trainer_region trainer_subregion pokemon_region level
  <chr>      <tim> <chr>   <chr>          <chr>             <chr>          <dbl>
1 12/13/2016 17:28 Oricor… South Korea    <NA>              <NA>              13
2 12/13/2016 17:30 Zubat   United States  Texas             GER                8
3 12/13/2016 17:31 Carbink United States  Oklahoma          <NA>              10
4 12/13/2016 17:33 Klefki  United States  Connecticut       <NA>              29
5 12/13/2016 17:34 Luvdisc United States  <NA>              <NA>              16
6 12/13/2016 17:35 Roggen… United Kingdom <NA>              SPA               10
# ℹ 8 more variables: level_met <dbl>, gender <chr>, type1 <chr>, type2 <chr>,
#   nature <chr>, pokeball <chr>, held_item <lgl>, perfect_ivs <dbl>
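
As a small sketch of the same two renaming steps on a made-up data frame (the `toy` object and its values are only for illustration, not part of the Pokemon data), both calls can also be chained into one line:

```r
# Toy headers, made up for illustration (not the full dataset)
toy <- data.frame(
  `Trainer Region` = c("Japan", "France"),
  `Perfect IVs`    = c(3, 0),
  check.names = FALSE  # keep the spaces in the names exactly as written
)

# Inner call lower-cases every name; outer call swaps each space for an underscore
names(toy) <- gsub(" ", "_", tolower(names(toy)))
print(names(toy))
```

Note that gsub replaces every space it finds, so a header containing two consecutive spaces would end up with two underscores.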

Combine Types

I will combine type1 and type2 into a single column so that I can visualize them together.

Pokemon <- Pokemon |>
 mutate(type_combined = if_else(!is.na(type2), paste(type1, type2, sep = "_"), type1))
head(Pokemon)

# A tibble: 6 × 16
  date       time  pokemon trainer_region trainer_subregion pokemon_region level
  <chr>      <tim> <chr>   <chr>          <chr>             <chr>          <dbl>
1 12/13/2016 17:28 Oricor… South Korea    <NA>              <NA>              13
2 12/13/2016 17:30 Zubat   United States  Texas             GER                8
3 12/13/2016 17:31 Carbink United States  Oklahoma          <NA>              10
4 12/13/2016 17:33 Klefki  United States  Connecticut       <NA>              29
5 12/13/2016 17:34 Luvdisc United States  <NA>              <NA>              16
6 12/13/2016 17:35 Roggen… United Kingdom <NA>              SPA               10
# ℹ 9 more variables: level_met <dbl>, gender <chr>, type1 <chr>, type2 <chr>,
#   nature <chr>, pokeball <chr>, held_item <lgl>, perfect_ivs <dbl>,
#   type_combined <chr>
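
The is.na check matters because paste would otherwise turn a missing type2 into the text "NA". Here is a minimal sketch on two made-up rows (the `toy` tibble is an assumption for illustration only):

```r
library(dplyr)

# Made-up rows for illustration; type2 is NA for a single-type Pokemon
toy <- tibble(
  pokemon = c("Zubat", "Luvdisc"),
  type1   = c("Poison", "Water"),
  type2   = c("Flying", NA)
)

toy <- toy |>
  mutate(type_combined = if_else(!is.na(type2),
                                 paste(type1, type2, sep = "_"),  # dual type
                                 type1))                          # single type stays as-is

print(toy$type_combined)

# Without the check, paste() would produce "Water_NA" for the single-type row
print(paste("Water", NA, sep = "_"))
```

One thing to keep in mind with this approach is that the order of the two types is preserved, so a pair listed as A_B and the same pair listed as B_A would be counted as two different combined types.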

Grouping Trainer Regions

Now we will group the data by trainer region and combined type, and count the rows in each group to see how many Pokemon of each type were caught in each region.

# Count Pokemon per trainer region and combined type
region_summary <- Pokemon |>
  group_by(trainer_region, type_combined) |>
  summarise(
    pokemon_count = n()
  ) |>
  arrange(trainer_region, desc(pokemon_count))

`summarise()` has grouped output by 'trainer_region'. You can override using
the `.groups` argument.
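
The message above is informational: the result is still grouped by trainer_region. As a hedged sketch on made-up rows (the `toy` tibble is an assumption), setting `.groups = "drop"` returns an ungrouped table and silences the message, and count() is a shorthand that gives the same counts:

```r
library(dplyr)

# Made-up rows for illustration
toy <- tibble(
  trainer_region = c("Japan", "Japan", "Japan", "France"),
  type_combined  = c("Water", "Water", "Normal", "Water")
)

toy_summary <- toy |>
  group_by(trainer_region, type_combined) |>
  summarise(pokemon_count = n(), .groups = "drop") |>  # ungrouped output, no message
  arrange(trainer_region, desc(pokemon_count))

# count() wraps group_by() + summarise(n()) and also returns ungrouped output
toy_count <- toy |>
  count(trainer_region, type_combined, name = "pokemon_count")

print(toy_summary)
```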

Bar graph for Trends of Typing per Region

Now we will plot the counts as a stacked bar graph using ggplot.

# Plot trends of combined typing per region
# region_summary is already counted, so map pokemon_count to y and use geom_col()
ggplot(region_summary, aes(x = type_combined, y = pokemon_count, fill = trainer_region)) +
  geom_col(position = "stack") +
  labs(title = "Trends of Typing per Region", x = "Pokemon Type", y = "Pokemon Count", fill = "Trainer Region") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
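
When the data passed to ggplot is already a summary table such as region_summary, the choice between geom_bar() and geom_col() matters: geom_bar() counts rows, while geom_col() draws a y value that you supply. A minimal sketch on made-up, already-summarised counts (the `toy_summary` values are assumptions for illustration):

```r
library(ggplot2)

# Made-up, already-summarised counts for illustration
toy_summary <- data.frame(
  trainer_region = c("Japan", "France", "Japan"),
  type_combined  = c("Water", "Water", "Normal"),
  pokemon_count  = c(5, 3, 2)
)

# geom_bar() counts rows: every region/type pair contributes a height of 1
p_rows <- ggplot(toy_summary, aes(x = type_combined, fill = trainer_region)) +
  geom_bar(position = "stack")

# geom_col() draws the supplied y value: segments are 5, 3 and 2 tall
p_counts <- ggplot(toy_summary,
                   aes(x = type_combined, y = pokemon_count, fill = trainer_region)) +
  geom_col(position = "stack")

# layer_data() exposes the values ggplot2 computed for each layer
rows_drawn   <- layer_data(p_rows)$count
counts_drawn <- with(layer_data(p_counts), ymax - ymin)
print(rows_drawn)
print(counts_drawn)
```

With the row-counting version, a bar's height reflects how many regions reported a type rather than how many Pokemon were caught.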

Review and Pivot

Since this is too much data to take in from one visualization, we will narrow it down to make it easier to digest.

# Summarize the count of Pokemon per region
region_counts <- Pokemon |>
  group_by(trainer_region) |>
  summarise(count = n()) |>
  arrange(desc(count)) |>
  slice(1:5) # Get the top five regions
head(region_counts)

# A tibble: 5 × 2
  trainer_region count
  <chr>          <int>
1 United States    153
2 Japan            128
3 Germany           43
4 United Kingdom    32
5 France            27
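
The arrange-then-slice pattern above can also be written with slice_max() from dplyr. This is a sketch on made-up counts (the `toy_counts` values are assumptions), with `with_ties = FALSE` set so that exactly n rows come back even when counts are tied:

```r
library(dplyr)

# Made-up counts for illustration
toy_counts <- tibble(
  trainer_region = c("France", "Japan", "Germany", "United States"),
  count          = c(4, 9, 6, 12)
)

# Pattern used above: sort descending, then keep the first rows
top_sorted <- toy_counts |>
  arrange(desc(count)) |>
  slice(1:2)

# Alternative: keep the n rows with the largest count in one step
top_max <- toy_counts |>
  slice_max(count, n = 2, with_ties = FALSE)

print(top_max)
```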

Filter the Regions

We will use filter to keep only the rows from the top five regions.

# Filter the original data set to include only the top five regions
top_five_regions <- Pokemon |>
  filter(trainer_region %in% region_counts$trainer_region)
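
As a small illustration of how this filter behaves, here is a sketch on made-up rows (the `toy` tibble and `keep_regions` vector are assumptions; `keep_regions` stands in for region_counts$trainer_region). Rows whose region is not in the list, including rows with a missing region, are dropped:

```r
library(dplyr)

# Made-up rows for illustration
toy <- tibble(
  pokemon        = c("Zubat", "Klefki", "Carbink", "Luvdisc"),
  trainer_region = c("Japan", "Brazil", "France", NA)
)

# Stand-in for region_counts$trainer_region
keep_regions <- c("Japan", "France")

# %in% returns TRUE/FALSE per row; NA %in% keep_regions is FALSE, so that row is dropped
kept <- toy |>
  filter(trainer_region %in% keep_regions)

print(kept)
```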

We will also count the Pokemon per combined type.

# Summarize the count of Pokemon per type
type_counts <- top_five_regions |>
  group_by(type_combined) |>
  summarise(count = n()) |>
  arrange(desc(count)) |>
  slice(1:5) # Get the top five types
head(type_counts)

# A tibble: 5 × 2
  type_combined count
  <chr>         <int>
1 Water            39
2 Normal           29
3 Normal_Flying    23
4 Dark_Normal      20
5 Psychic          19

Filter the Types of Pokemon

We will then keep only the rows whose combined type is one of the top five.

# Filter the dataset to include only the top five types
top_five_types <- top_five_regions |>
  filter(type_combined %in% type_counts$type_combined)

Second Attempt to Plot

Now we will plot again, using only the top five types in the top five regions.

# Plot trends of combined typing for the top five types within the top five regions
ggplot(top_five_types, aes(x = type_combined, fill = trainer_region)) +
  geom_bar(position = "stack") +
  labs(title = "Trends of Top Five Types in Top Five Regions",
       x = "Combined Type", y = "Count", fill = "Region",
       caption = "Source: https://serebii.net/") +
  scale_fill_brewer(palette = "Accent")

I used “tolower”, “gsub”, and “mutate” to clean up the data and shape the table the way I needed before building my visualization. This visualization represents what could be considered the “meta”: the favorite types of Pokemon across the most popular trainer regions. In it, you can see some outliers, such as the United Kingdom being completely off the Water Pokemon trend and France not being a fan of the Normal_Flying type. Overall, though, most regions use a lot of Water types, followed by Normal types. One thing I wish I knew how to do better is fit more information into a chart, for example by letting the viewer filter through the different types, while keeping the visual small and limited to these top five trainer regions. These results could help the Pokemon Company decide whether to give “buffs”, or added benefits, to other Pokemon to make them more popular or useful, or to double down on these Pokemon since the trends show they are the fan favorites.
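
One possible way to show more detail while keeping each panel small is faceting, which draws one small chart per region instead of stacking everything into one bar. This is only a sketch on made-up counts (the `toy` data frame is an assumption; the real input would be something like top_five_types):

```r
library(ggplot2)

# Made-up counts for illustration
toy <- data.frame(
  trainer_region = rep(c("Japan", "France", "Germany"), each = 2),
  type_combined  = rep(c("Water", "Normal"), times = 3),
  count          = c(8, 5, 2, 4, 3, 3)
)

p_facets <- ggplot(toy, aes(x = type_combined, y = count, fill = type_combined)) +
  geom_col(show.legend = FALSE) +   # the x axis already names each type
  facet_wrap(~ trainer_region) +    # one small panel per region
  labs(x = "Combined Type", y = "Count")

# Number of panels ggplot2 will draw (one per region)
n_panels <- length(unique(layer_data(p_facets)$PANEL))
print(n_panels)
```

Each region gets its own panel with a shared y axis, so a region that skips a type shows up as a missing or short bar rather than a thin slice of a stack.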