How to Start:

Pokemon

For this assessment you will use a dataset about Pokemon. I wrangled these data from two different sources on Kaggle:

pokemon<-read.csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/pokemonMid2.csv")

Background information:

For decades, kids all over the world have been discovering the enchanting world of Pokémon (an abbreviation for Pocket Monsters). Many of those children become lifelong fans. Today, the Pokémon family of products includes video games, the Pokémon Trading Card Game (TCG), an animated series, movies, toys, books, and much more.

Pokémon are creatures of all shapes and sizes who live in the wild or alongside their human partners (called “Trainers”). During their adventures, Pokémon grow and become more experienced and even, on occasion, evolve into stronger Pokémon. Hundreds of known Pokémon inhabit the Pokémon universe, with untold numbers waiting to be discovered!

Source: https://www.pokemon.com/us/parents-guide/

Variables

Learn about the variables available:

str(pokemon)
## 'data.frame':    751 obs. of  17 variables:
##  $ PokeDexNumber  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ PokeName       : chr  "Bulbasaur" "Ivysaur" "Venusaur" "Charmander" ...
##  $ Type           : chr  "Grass" "Grass" "Grass" "Fire" ...
##  $ OtherType      : chr  "Poison" "Poison" "Poison" "" ...
##  $ SumOfAttack    : int  318 405 525 309 405 534 314 405 530 195 ...
##  $ HitPoints      : int  45 60 80 39 58 78 44 59 79 45 ...
##  $ AttStrg        : int  49 62 82 52 64 84 48 63 83 30 ...
##  $ DefStrg        : int  49 63 83 43 58 78 65 80 100 35 ...
##  $ SpAttStrg      : int  65 80 100 60 80 109 50 65 85 20 ...
##  $ SpDefStrg      : int  65 80 100 50 65 85 64 80 105 20 ...
##  $ Speed          : int  45 60 80 65 80 100 43 58 78 45 ...
##  $ Generation     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Legendary      : chr  "False" "False" "False" "False" ...
##  $ capture_rate   : int  45 45 45 45 45 45 45 45 45 255 ...
##  $ height_m       : num  0.7 1 2 0.6 1.1 1.7 0.5 1 1.6 0.3 ...
##  $ weight_kg      : num  6.9 13 100 8.5 19 90.5 9 22.5 85.5 2.9 ...
##  $ percentage_male: num  88.1 88.1 88.1 88.1 88.1 88.1 88.1 88.1 88.1 50 ...

Question 1: Generations

When Professor Smalley was a child, she used to collect Pokemon cards. In the first generation of Pokemon there were “150 or more to see”.

Question: (4 points) Which generation introduced the most new Pokémon and how many were in that generation?

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
pokemon%>%
  count(Generation)%>%
  arrange(desc(n))
##   Generation   n
## 1          5 164
## 2          1 151
## 3          3 140
## 4          4 116
## 5          2  99
## 6          6  81

Look up the PokéRap after class if you’ve never heard the song.

Question 2: Types

Question: (4 points) Looking only at the main type, how many different types of Pokemon are there? Which type has the most species?

species<-pokemon%>%
  count(Type)%>%
  arrange(desc(n))

## HOW MANY TYPES
dim(species)
## [1] 18  2
## COUNT SPECIES BY TYPE
head(species)
##      Type   n
## 1   Water 107
## 2  Normal  94
## 3   Grass  66
## 4     Bug  65
## 5 Psychic  52
## 6    Fire  48
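
The number of types can also be computed directly instead of being read from the dimensions of the counted table. The following is a minimal sketch on a made-up data frame called toy (an assumption for illustration, not the assessment data) that has the same Type column:

```r
library(dplyr)

# Hypothetical toy data standing in for the pokemon data frame
toy <- data.frame(
  PokeName = c("A", "B", "C", "D", "E"),
  Type     = c("Water", "Fire", "Water", "Grass", "Water")
)

# Number of distinct main types
nTypes <- n_distinct(toy$Type)

# Type with the most species (ties, if any, are all kept)
topType <- toy %>%
  count(Type) %>%
  slice_max(n, n = 1)

nTypes
topType
```

Swapping toy for pokemon should give the same answers as the dim() and head() approach above.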

Question 3: Gotta Catch Em All! (Catch Rate)

Each species of Pokémon has a catch rate that applies to all its members. When a Poké Ball is thrown at a wild Pokémon, the game uses that Pokémon’s catch rate in a formula to determine the chances of catching that Pokémon. Higher catch rates mean that the Pokémon is easier to catch, up to a maximum of 255.

Source: https://bulbapedia.bulbagarden.net/wiki/Catch_rate

Question: (4 points) Create a histogram for the distribution of capture_rate. Test different numbers of bins. What number of bins best illustrates the shape of the data?

NOTE: There is no single right answer. This is about articulating your point and providing support for your choice.

ggplot(data=pokemon, aes(x=capture_rate))+
  geom_histogram(bins=7)
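
One way to support a choice of bins is to draw the same histogram at several settings and compare them. This is only a sketch: the toy data frame and the candidate values in binChoices are assumptions for illustration, not values from the assessment.

```r
library(ggplot2)

# Hypothetical capture-rate values standing in for pokemon$capture_rate
toy <- data.frame(
  capture_rate = c(3, 3, 45, 45, 45, 60, 75, 90, 120, 190, 255, 255)
)

# Candidate bin counts to compare (illustrative choices)
binChoices <- c(5, 10, 20)

# Build one histogram per bin count, titled with its setting
plots <- lapply(binChoices, function(b) {
  ggplot(data = toy, aes(x = capture_rate)) +
    geom_histogram(bins = b) +
    ggtitle(paste("bins =", b))
})

# Print each plot in turn
for (p in plots) print(p)
```

Too few bins can hide separate peaks, while too many can make the plot look noisy, so the comparison itself is the supporting evidence.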

Question 4: Fastest

Question: (4 points) Create a new data frame to show which Pokemon type has the fastest average speed.

State which Type has the fastest average speed and what that speed is.

fastest<-pokemon%>%
  group_by(Type)%>%
  summarise(avgSpeed=mean(Speed, na.rm=TRUE))%>%
  arrange(desc(avgSpeed))

## TOP 6
head(fastest)
## # A tibble: 6 × 2
##   Type     avgSpeed
##   <chr>       <dbl>
## 1 Flying      102. 
## 2 Electric     84.2
## 3 Dragon       78.1
## 4 Psychic      77.2
## 5 Dark         75.4
## 6 Fire         74.0

Question 5: Legendary!

Legendary Pokémon are a group of incredibly rare and often very powerful Pokémon, generally featured prominently in the legends and myths of the Pokémon world.

Source: https://bulbapedia.bulbagarden.net/wiki/Legendary_Pok%C3%A9mon

Question: (4 points) Create a new data frame to summarise the following information for each Legendary status group:

  • Number of Pokemon
  • Average Speed
  • Average Hit Points
  • Average Capture Rate

State your observations.

legendary<-pokemon%>%
  group_by(Legendary)%>%
  summarise(n=n(), 
            avgSpeed=mean(Speed, na.rm = TRUE), 
            avgHP=mean(HitPoints, na.rm=TRUE), 
            avgCR=mean(capture_rate, na.rm=TRUE))

## Legendary
legendary
## # A tibble: 2 × 5
##   Legendary     n avgSpeed avgHP  avgCR
##   <chr>     <int>    <dbl> <dbl>  <dbl>
## 1 False       692     63.9  66.6 105.  
## 2 True         59     98.3  93.2   6.56

Question 6: Compare Speed

Question: (4 points) Create a side-by-side boxplot to show the distribution of speeds by Legendary. Use the proper aesthetic so that each box is filled with a color for its Legendary status.

State your observations.

ggplot(data=pokemon, aes(x=Legendary, y=Speed, fill=Legendary))+
  geom_boxplot()

Question 7: Shape of Distribution

Question: (4 points) Now, looking only at water type Pokemon, describe the shape of the distribution for the sum of attack points (SumOfAttack). Use the most appropriate geometries to create plot(s).

Be sure to include commentary on symmetry/skew, modality, spread, and outliers.

water<-pokemon%>%
  filter(Type=="Water")

## BIMODAL SHAPE
ggplot(data=water, aes(x=SumOfAttack))+
  geom_density()

## OUTLIERS?
ggplot(data=water, aes(x=SumOfAttack))+
  geom_boxplot()
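
The points a boxplot draws as outliers follow the 1.5 × IQR rule, which can also be checked numerically. The sketch below uses made-up values (an assumption, not the water-type data) and quantile(), whose quartiles can differ slightly from the hinges that geom_boxplot() uses:

```r
# Hypothetical SumOfAttack values (not taken from the assessment data)
x <- c(300, 320, 330, 340, 350, 700)

# Quartiles and interquartile range
q1  <- quantile(x, 0.25, names = FALSE)
q3  <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1

# Fences: points more than 1.5 * IQR beyond the quartiles are flagged
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

outliers <- x[x < lower | x > upper]
outliers
```

Here only the value 700 falls outside the fences. Replacing x with water$SumOfAttack would apply the same check to the real data.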

Question 8: Imperial System

Question: (4 points) Please convert the values for height and weight to the imperial system, which is standard in the United States.

  • 1 m (meter) = 39.3701 inches
  • 1 kg (kilogram) = 2.205 pounds

Create a new data frame to accomplish this so that we can use these values in the subsequent Questions 9 and 10.

imperial<-pokemon%>%
  mutate(height_in=height_m*39.3701, 
         weight_lb =weight_kg*2.205)

str(imperial)
## 'data.frame':    751 obs. of  19 variables:
##  $ PokeDexNumber  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ PokeName       : chr  "Bulbasaur" "Ivysaur" "Venusaur" "Charmander" ...
##  $ Type           : chr  "Grass" "Grass" "Grass" "Fire" ...
##  $ OtherType      : chr  "Poison" "Poison" "Poison" "" ...
##  $ SumOfAttack    : int  318 405 525 309 405 534 314 405 530 195 ...
##  $ HitPoints      : int  45 60 80 39 58 78 44 59 79 45 ...
##  $ AttStrg        : int  49 62 82 52 64 84 48 63 83 30 ...
##  $ DefStrg        : int  49 63 83 43 58 78 65 80 100 35 ...
##  $ SpAttStrg      : int  65 80 100 60 80 109 50 65 85 20 ...
##  $ SpDefStrg      : int  65 80 100 50 65 85 64 80 105 20 ...
##  $ Speed          : int  45 60 80 65 80 100 43 58 78 45 ...
##  $ Generation     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Legendary      : chr  "False" "False" "False" "False" ...
##  $ capture_rate   : int  45 45 45 45 45 45 45 45 45 255 ...
##  $ height_m       : num  0.7 1 2 0.6 1.1 1.7 0.5 1 1.6 0.3 ...
##  $ weight_kg      : num  6.9 13 100 8.5 19 90.5 9 22.5 85.5 2.9 ...
##  $ percentage_male: num  88.1 88.1 88.1 88.1 88.1 88.1 88.1 88.1 88.1 50 ...
##  $ height_in      : num  27.6 39.4 78.7 23.6 43.3 ...
##  $ weight_lb      : num  15.2 28.7 220.5 18.7 41.9 ...
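
As a quick sanity check, the first row can be converted by hand. This sketch uses only the conversion factors from the question and the first height_m and weight_kg values shown in the str() output (0.7 m and 6.9 kg); the variable names in_per_m and lb_per_kg are illustrative.

```r
# Conversion factors given in the question
in_per_m  <- 39.3701
lb_per_kg <- 2.205

# First row of the data: 0.7 m tall, 6.9 kg
height_in <- 0.7 * in_per_m
weight_lb <- 6.9 * lb_per_kg

round(height_in, 1)  # 27.6
round(weight_lb, 1)  # 15.2
```

Both values agree with the first entries of height_in and weight_lb in the output above.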

Question 9:

Question: (4 points) It appears that the distributions for height and weight are skewed. When there is skew in your data, what metric should be used for center? Support your answer.

Using this metric, summarise your data to find the center value for the height and weight for each Type.

HINT: BE CAREFUL WITH NAs.

imperial%>%
  group_by(Type)%>%
  summarise(medHeight=median(height_in, na.rm=TRUE), 
            medWeight=median(weight_lb, na.rm=TRUE))
## # A tibble: 18 × 3
##    Type     medHeight medWeight
##    <chr>        <dbl>     <dbl>
##  1 Bug           31.5      32.0
##  2 Dark          39.4      63.9
##  3 Dragon        70.9     218. 
##  4 Electric      23.6      33.5
##  5 Fairy         23.6      16.5
##  6 Fighting      47.2      88.2
##  7 Fire          39.4      84.3
##  8 Flying        59.1     139. 
##  9 Ghost         41.3      33.1
## 10 Grass         31.5      32.0
## 11 Ground        43.3     150. 
## 12 Ice           43.3     122. 
## 13 Normal        37.4      54.4
## 14 Poison        43.3      59.5
## 15 Psychic       33.5      41.3
## 16 Rock          47.2     132. 
## 17 Steel         35.4     133. 
## 18 Water         39.4      62.8
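
To see why the median is the safer measure of center under skew, and why the NA hint matters, consider this small sketch. The vector w is made up for illustration and is not taken from the assessment data.

```r
# Hypothetical right-skewed weights with one missing value
w <- c(10, 12, 15, 18, 400, NA)

mean(w)                  # NA: a single missing value makes the result NA
mean(w, na.rm = TRUE)    # 91: pulled upward by the one heavy value
median(w, na.rm = TRUE)  # 15: stays with the bulk of the data
```

The mean lands far above four of the five observed values, while the median still describes a typical one.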

Question 10: Height Weight Ratio

Question: (4 points) Create a new column for the height (in) / weight (lbs) ratio for each Pokemon. Then find the Pokemon type with the highest average height-weight ratio.

HINT: BE CAREFUL WITH NAs.

ratio<-imperial%>%
  mutate(ratio=height_in/weight_lb)%>%
  group_by(Type)%>%
  summarise(avgRatio=mean(ratio, na.rm=TRUE))

## FIRST 6 TYPES (ALPHABETICAL, NOT SORTED BY RATIO)
head(ratio)
## # A tibble: 6 × 2
##   Type     avgRatio
##   <chr>       <dbl>
## 1 Bug         1.25 
## 2 Dark        0.818
## 3 Dragon      0.955
## 4 Electric    3.61 
## 5 Fairy       3.00 
## 6 Fighting    0.553
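
Because the summarised rows come back in alphabetical order by Type, head() alone shows the first six types, not the six largest ratios. To read off the highest average, the result needs to be sorted first. This is a minimal sketch on made-up averages; toyRatio and its values are assumptions, not the assessment results.

```r
library(dplyr)

# Hypothetical per-type averages (not the assessment results)
toyRatio <- data.frame(
  Type     = c("Bug", "Dark", "Electric"),
  avgRatio = c(1.2, 0.8, 3.6)
)

# Sort from highest to lowest so the first row is the answer
sorted <- toyRatio %>%
  arrange(desc(avgRatio))

head(sorted, 1)
```

Appending arrange(desc(avgRatio)) to the ratio pipeline would sort the real table in the same way.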