Project 1 - Pokemon
Introduction
My data set is about Pokemon, the popular Japanese video game franchise about “pocket monsters” that humans capture, train, and battle with. This data set was captured over a week in December of 2016. It records each interaction a trainer had with a Pokemon they captured, along with the traits of both. For the trainer, the variables include their region and subregion; for the Pokemon, they include its level, gender, typing (both type 1 and type 2), nature, and which Poke Ball was used to capture it. I’m excited to compare whether trainers in different regions have a favorite typing. This would tell me whether there is a “meta” across regions, meaning types of Pokemon that are considered the best or are favored by the fans. The source of the data is a fan-made site, http://serebii.net/.
Load the libraries
# tidyverse supplies read_csv(), the dplyr verbs, and ggplot2
library(treemap)
library(tidyverse)
library(RColorBrewer)
# Point R at the folder that holds pokemon.csv, then read it in
setwd("C:/Users/Marti/OneDrive/Desktop/MC-DV")
Pokemon <- read_csv("pokemon.csv")
Preview the Pokemon Dataset
head(Pokemon)
# A tibble: 6 × 15
Date Time Pokemon `Trainer Region` `Trainer Subregion` `Pokemon Region`
<chr> <tim> <chr> <chr> <chr> <chr>
1 12/13/2016 17:28 Oricor… South Korea <NA> <NA>
2 12/13/2016 17:30 Zubat United States Texas GER
3 12/13/2016 17:31 Carbink United States Oklahoma <NA>
4 12/13/2016 17:33 Klefki United States Connecticut <NA>
5 12/13/2016 17:34 Luvdisc United States <NA> <NA>
6 12/13/2016 17:35 Roggen… United Kingdom <NA> SPA
# ℹ 9 more variables: Level <dbl>, `Level Met` <dbl>, Gender <chr>,
# Type1 <chr>, Type2 <chr>, Nature <chr>, Pokeball <chr>, `Held Item` <lgl>,
# `Perfect IVs` <dbl>
Clean up the data!
Converting the headers to lowercase and joining separated words with underscores makes the variable names uniform and easier to find later.
names(Pokemon) <- tolower(names(Pokemon))
names(Pokemon) <- gsub(" ","_",names(Pokemon))
# gsub replaces each space in the headers with an underscore
head(Pokemon)
# A tibble: 6 × 15
date time pokemon trainer_region trainer_subregion pokemon_region level
<chr> <tim> <chr> <chr> <chr> <chr> <dbl>
1 12/13/2016 17:28 Oricor… South Korea <NA> <NA> 13
2 12/13/2016 17:30 Zubat United States Texas GER 8
3 12/13/2016 17:31 Carbink United States Oklahoma <NA> 10
4 12/13/2016 17:33 Klefki United States Connecticut <NA> 29
5 12/13/2016 17:34 Luvdisc United States <NA> <NA> 16
6 12/13/2016 17:35 Roggen… United Kingdom <NA> SPA 10
# ℹ 8 more variables: level_met <dbl>, gender <chr>, type1 <chr>, type2 <chr>,
# nature <chr>, pokeball <chr>, held_item <lgl>, perfect_ivs <dbl>
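The renaming step can be checked on a small vector before it is applied to the whole data set. This is a minimal sketch; `raw_names` is a hypothetical vector holding a few headers copied from the preview above.

```r
# Hypothetical vector of headers, copied from the preview above
raw_names <- c("Trainer Region", "Level Met", "Perfect IVs", "Type1")

# Lower-case first, then replace each space with an underscore.
# fixed = TRUE treats " " as a literal string rather than a regular expression.
clean_names <- gsub(" ", "_", tolower(raw_names), fixed = TRUE)

print(clean_names)
```

Headers that contain no space, such as Type1, are only lower-cased.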
Combine Types
I will combine type1 and type2 into a single column so they can be visualized together. Pokemon with no second type keep their type1 value.
Pokemon <- Pokemon |>
mutate(type_combined = if_else(!is.na(type2), paste(type1, type2, sep = "_"), type1))
head(Pokemon)
# A tibble: 6 × 16
date time pokemon trainer_region trainer_subregion pokemon_region level
<chr> <tim> <chr> <chr> <chr> <chr> <dbl>
1 12/13/2016 17:28 Oricor… South Korea <NA> <NA> 13
2 12/13/2016 17:30 Zubat United States Texas GER 8
3 12/13/2016 17:31 Carbink United States Oklahoma <NA> 10
4 12/13/2016 17:33 Klefki United States Connecticut <NA> 29
5 12/13/2016 17:34 Luvdisc United States <NA> <NA> 16
6 12/13/2016 17:35 Roggen… United Kingdom <NA> SPA 10
# ℹ 9 more variables: level_met <dbl>, gender <chr>, type1 <chr>, type2 <chr>,
# nature <chr>, pokeball <chr>, held_item <lgl>, perfect_ivs <dbl>,
# type_combined <chr>
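The `if_else()` guard matters because `paste()` turns a missing value into the literal text "NA". Here is a small sketch on a hypothetical two-row table (`toy` is not part of the real data) that shows both cases:

```r
library(dplyr)

# Hypothetical toy rows: Psyduck has no second type
toy <- tibble(
  pokemon = c("Zubat", "Psyduck"),
  type1   = c("Poison", "Water"),
  type2   = c("Flying", NA)
)

# Join the two types only when type2 is present; otherwise keep type1
toy <- toy |>
  mutate(type_combined = if_else(!is.na(type2),
                                 paste(type1, type2, sep = "_"),
                                 type1))

print(toy)

# Without the guard, the missing type leaks into the label
print(paste("Water", NA, sep = "_"))
```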
Grouping Trainer Regions
Now we will group the data by trainer region and combined type, and count the rows in each group to see how many Pokemon of each type were caught in each region.
# Summarize data by region
region_summary <- Pokemon |>
group_by(trainer_region, type_combined) |>
summarise(
pokemon_count = n()
) |>
arrange(trainer_region, desc(pokemon_count))
`summarise()` has grouped output by 'trainer_region'. You can override using
the `.groups` argument.
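The `summarise()` message above is informational: the result is still grouped by trainer_region. One way to get the same counts without leftover grouping is `count()`, sketched here on a hypothetical toy table (`toy` and `toy_summary` are illustrative names, not objects from this project).

```r
library(dplyr)

# Hypothetical toy data: one row per captured Pokemon
toy <- tibble(
  trainer_region = c("Japan", "Japan", "Japan", "France"),
  type_combined  = c("Water", "Water", "Normal", "Water")
)

# count() groups, tallies, and ungroups in one step;
# name = sets the name of the count column
toy_summary <- toy |>
  count(trainer_region, type_combined, name = "pokemon_count") |>
  arrange(trainer_region, desc(pokemon_count))

print(toy_summary)
```

Passing `.groups = "drop"` to `summarise()` would be another way to get an ungrouped result.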
Bar graph for Trends of Typing per Region
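Now we will plot the summary as a stacked bar graph using ggplot.
# Plot trends of combined typing per region.
# region_summary is already aggregated, so pokemon_count is mapped to y and
# drawn with geom_col(); geom_bar() would count summary rows instead of Pokemon.
ggplot(region_summary, aes(x = type_combined, y = pokemon_count, fill = trainer_region)) +
geom_col(position = "stack") +
labs(title = "Trends of Typing per Region", x = "Pokemon Type", y = "Pokemon Count", fill = "Trainer Region") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Review and Pivot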
Since this seems like too much data for one visualization, we will try to cut it down to be able to digest it better.
# Summarize the count of Pokemon per region
region_counts <- Pokemon |>
group_by(trainer_region) |>
summarise(count = n()) |>
arrange(desc(count)) |>
slice(1:5) # Get the top five regions
head(region_counts)
# A tibble: 5 × 2
trainer_region count
<chr> <int>
1 United States 153
2 Japan 128
3 Germany 43
4 United Kingdom 32
5 France 27
Filter the Regions
We will use filter() to keep only the rows from the top five regions.
# Filter the original data set to include only the top five regions
top_five_regions <- Pokemon |>
filter(trainer_region %in% region_counts$trainer_region)
We will also get the count of Pokemon per type within those regions.
# Summarize the count of Pokemon per type
type_counts <- top_five_regions |>
group_by(type_combined) |>
summarise(count = n()) |>
arrange(desc(count)) |>
slice(1:5) # Get the top five types
head(type_counts)
# A tibble: 5 × 2
type_combined count
<chr> <int>
1 Water 39
2 Normal 29
3 Normal_Flying 23
4 Dark_Normal 20
5 Psychic 19
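The rank-then-filter pattern used in this section (count, sort, keep the top rows, then filter the raw data with `%in%`) can be tried on a small example. This is a sketch on hypothetical toy data that keeps the top two regions instead of five.

```r
library(dplyr)

# Hypothetical toy data: one row per captured Pokemon
toy <- tibble(
  trainer_region = c("Japan", "Japan", "Japan", "France", "France", "Chile")
)

# Rank regions by number of rows and keep the two largest
top_regions <- toy |>
  count(trainer_region, sort = TRUE, name = "count") |>
  slice_head(n = 2)

# Keep only the rows whose region appears in the ranked table
toy_top <- toy |>
  filter(trainer_region %in% top_regions$trainer_region)

print(top_regions)
print(toy_top)
```

Like `slice(1:5)`, `slice_head()` breaks ties by row order, so regions tied at the cutoff are not all kept.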
Filter the Types of Pokemon
We will filter for the top five types of Pokemon
# Filter the dataset to include only the top five types
top_five_types <- top_five_regions |>
filter(type_combined %in% type_counts$type_combined)
Second Attempt to Plot
Now we will plot again, showing only the top five types in the top five regions.
# Plot trends of combined typing per region for the top five types within the top five regions
ggplot(top_five_types, aes(x = type_combined, fill = trainer_region)) +
geom_bar(position = "stack") +
labs(title = "Trends of Top Five Types in Top Five Regions",
x = "Combined Type", y = "Count", fill = "Region",
caption = "Source: https://serebii.net/"
) +
scale_fill_brewer(palette = "Accent")
I used “tolower”, “gsub”, and “mutate” during cleanup to get the data and the table into the shape I needed for my visualization. This visualization represents what could be called the “meta”: the favorite types of Pokemon across the five regions with the most records. In it, you can see some outliers, such as the United Kingdom being completely off the Water Pokemon trend and France not being fans of the Normal_Flying type. Otherwise, the overall trend is that most regions use a lot of Water types, followed by Normal types.
One thing I wish I knew how to do better is to fit more information into a chart that could filter through the different types, while keeping the visual small and limited to these top five trainer regions. This could help the Pokemon Company review the trends and either give “buffs” (added benefits) to other Pokemon to make them more popular or useful, or double down on these Pokemon, since the trends show they are the fan favorites.
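One possible direction for showing more types without crowding a single chart is small multiples: one panel per region with `facet_wrap()`. This is only a sketch; `toy_counts` and its numbers are hypothetical and are not taken from the real data set.

```r
library(ggplot2)

# Hypothetical pre-counted toy data (not the real results)
toy_counts <- data.frame(
  trainer_region = rep(c("Japan", "France"), each = 3),
  type_combined  = rep(c("Water", "Normal", "Psychic"), times = 2),
  count          = c(5, 3, 2, 4, 1, 2)
)

# One panel per region; each panel shows that region's counts by type
p <- ggplot(toy_counts, aes(x = type_combined, y = count)) +
  geom_col() +
  facet_wrap(~ trainer_region) +
  labs(x = "Combined Type", y = "Count")

print(p)
```

Each region gets its own baseline this way, so types are easier to compare within a region than in a stacked bar.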