Filtering and Sorting with Pokemon Data in Python

Data Analysis in Python

Author: John Karuitha

Affiliations: Karatina University, School of Business; Graduate School of Business Administration, University of the Witwatersrand

1 Background

In this project, I analyze Pokemon data, which is available on Kaggle. The project was part of a project-based course created by FreeCodeCamp and available on YouTube.

Tip

Please visit my rpubs site to see more data projects. Alternatively, copy and paste the link <www.rpubs.com/Karuitha> into your browser. You can also view my LinkedIn profile for my skills and education. My Tableau Public profile contains my data visualizations.

Important

Skills & Technologies Applied: Python, Quarto, Data Science, and Machine Learning.

Important

Python is a widely-used programming language known for its simplicity and readability. Created by Guido van Rossum, Python emphasizes code clarity and comes with an extensive standard library. It supports multiple programming paradigms and has a thriving ecosystem of third-party libraries and frameworks. Python is popular for web development, scientific computing, data analysis, and machine learning. Its clean syntax and active community make it an excellent choice for developers of all levels to build a variety of applications efficiently.

2 Objective

The purpose is to illustrate the basics of data analysis using the Python language.

3 Pokemon Data

I start by loading packages.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

Next, I read the data.

pokemon = pd.read_csv("pokemon.csv")
pokemon.head()
abilities against_bug against_dark against_dragon against_electric against_fairy against_fight against_fire against_flying against_ghost ... percentage_male pokedex_number sp_attack sp_defense speed type1 type2 weight_kg generation is_legendary
0 ['Overgrow', 'Chlorophyll'] 1.0 1.0 1.0 0.5 0.5 0.5 2.0 2.0 1.0 ... 88.1 1 65 65 45 grass poison 6.9 1 0
1 ['Overgrow', 'Chlorophyll'] 1.0 1.0 1.0 0.5 0.5 0.5 2.0 2.0 1.0 ... 88.1 2 80 80 60 grass poison 13.0 1 0
2 ['Overgrow', 'Chlorophyll'] 1.0 1.0 1.0 0.5 0.5 0.5 2.0 2.0 1.0 ... 88.1 3 122 120 80 grass poison 100.0 1 0
3 ['Blaze', 'Solar Power'] 0.5 1.0 1.0 1.0 0.5 1.0 0.5 1.0 1.0 ... 88.1 4 60 50 65 fire NaN 8.5 1 0
4 ['Blaze', 'Solar Power'] 0.5 1.0 1.0 1.0 0.5 1.0 0.5 1.0 1.0 ... 88.1 5 80 65 80 fire NaN 19.0 1 0

5 rows × 41 columns

Let us look at the number of rows and columns in the data.

pokemon.shape
(801, 41)

The data has 801 rows and 41 columns. Let us see the names of the columns.

pokemon.columns
Index(['abilities', 'against_bug', 'against_dark', 'against_dragon',
       'against_electric', 'against_fairy', 'against_fight', 'against_fire',
       'against_flying', 'against_ghost', 'against_grass', 'against_ground',
       'against_ice', 'against_normal', 'against_poison', 'against_psychic',
       'against_rock', 'against_steel', 'against_water', 'attack',
       'base_egg_steps', 'base_happiness', 'base_total', 'capture_rate',
       'classfication', 'defense', 'experience_growth', 'height_m', 'hp',
       'japanese_name', 'name', 'percentage_male', 'pokedex_number',
       'sp_attack', 'sp_defense', 'speed', 'type1', 'type2', 'weight_kg',
       'generation', 'is_legendary'],
      dtype='object')

4 Exploratory Data Analysis

We explore the data by getting the basic information about the data.

pokemon.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 801 entries, 0 to 800
Data columns (total 41 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   abilities          801 non-null    object 
 1   against_bug        801 non-null    float64
 2   against_dark       801 non-null    float64
 3   against_dragon     801 non-null    float64
 4   against_electric   801 non-null    float64
 5   against_fairy      801 non-null    float64
 6   against_fight      801 non-null    float64
 7   against_fire       801 non-null    float64
 8   against_flying     801 non-null    float64
 9   against_ghost      801 non-null    float64
 10  against_grass      801 non-null    float64
 11  against_ground     801 non-null    float64
 12  against_ice        801 non-null    float64
 13  against_normal     801 non-null    float64
 14  against_poison     801 non-null    float64
 15  against_psychic    801 non-null    float64
 16  against_rock       801 non-null    float64
 17  against_steel      801 non-null    float64
 18  against_water      801 non-null    float64
 19  attack             801 non-null    int64  
 20  base_egg_steps     801 non-null    int64  
 21  base_happiness     801 non-null    int64  
 22  base_total         801 non-null    int64  
 23  capture_rate       801 non-null    object 
 24  classfication      801 non-null    object 
 25  defense            801 non-null    int64  
 26  experience_growth  801 non-null    int64  
 27  height_m           781 non-null    float64
 28  hp                 801 non-null    int64  
 29  japanese_name      801 non-null    object 
 30  name               801 non-null    object 
 31  percentage_male    703 non-null    float64
 32  pokedex_number     801 non-null    int64  
 33  sp_attack          801 non-null    int64  
 34  sp_defense         801 non-null    int64  
 35  speed              801 non-null    int64  
 36  type1              801 non-null    object 
 37  type2              417 non-null    object 
 38  weight_kg          781 non-null    float64
 39  generation         801 non-null    int64  
 40  is_legendary       801 non-null    int64  
dtypes: float64(21), int64(13), object(7)
memory usage: 256.7+ KB
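The info() output shows that a few columns (height_m, percentage_male, type2, weight_kg) have fewer than 801 non-null entries. A minimal sketch of how the missing values per column could be counted directly is given below. The toy frame and its values are made up for illustration and are not taken from the Pokemon file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; names and values are illustrative only.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "type2": ["poison", None, None, "flying"],
    "height_m": [0.7, np.nan, 1.1, 2.0],
})

# isna() marks missing cells; sum() counts them per column.
missing = toy.isna().sum().sort_values(ascending=False)
print(missing)
```

On the real data, the same call would be pokemon.isna().sum().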

Next, let's look at summary statistics for the numeric columns.

pokemon.describe()
against_bug against_dark against_dragon against_electric against_fairy against_fight against_fire against_flying against_ghost against_grass ... height_m hp percentage_male pokedex_number sp_attack sp_defense speed weight_kg generation is_legendary
count 801.000000 801.000000 801.000000 801.000000 801.000000 801.000000 801.000000 801.000000 801.000000 801.000000 ... 781.000000 801.000000 703.000000 801.000000 801.000000 801.000000 801.000000 781.000000 801.000000 801.000000
mean 0.996255 1.057116 0.968789 1.073970 1.068976 1.065543 1.135456 1.192884 0.985019 1.034020 ... 1.163892 68.958801 55.155761 401.000000 71.305868 70.911361 66.334582 61.378105 3.690387 0.087391
std 0.597248 0.438142 0.353058 0.654962 0.522167 0.717251 0.691853 0.604488 0.558256 0.788896 ... 1.080326 26.576015 20.261623 231.373075 32.353826 27.942501 28.907662 109.354766 1.930420 0.282583
min 0.250000 0.250000 0.000000 0.000000 0.250000 0.000000 0.250000 0.250000 0.000000 0.250000 ... 0.100000 1.000000 0.000000 1.000000 10.000000 20.000000 5.000000 0.100000 1.000000 0.000000
25% 0.500000 1.000000 1.000000 0.500000 1.000000 0.500000 0.500000 1.000000 1.000000 0.500000 ... 0.600000 50.000000 50.000000 201.000000 45.000000 50.000000 45.000000 9.000000 2.000000 0.000000
50% 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 65.000000 50.000000 401.000000 65.000000 66.000000 65.000000 27.300000 4.000000 0.000000
75% 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 2.000000 1.000000 1.000000 1.000000 ... 1.500000 80.000000 50.000000 601.000000 91.000000 90.000000 85.000000 64.800000 5.000000 0.000000
max 4.000000 4.000000 2.000000 4.000000 4.000000 4.000000 4.000000 4.000000 4.000000 4.000000 ... 14.500000 255.000000 100.000000 801.000000 194.000000 230.000000 180.000000 999.900000 7.000000 1.000000

8 rows × 34 columns

5 Questions

In this section, we pose questions that we will attempt to answer from the data.

5.1 How Many Pokemons Exist with an Attack Value Greater than 150?

Let's explore the distribution of Pokemon attack values.

sns.boxplot(data=pokemon, x="attack")
<AxesSubplot:xlabel='attack'>

Now let's count the Pokemons that meet this condition.

pokemon.loc[pokemon['attack'] > 150, 'name'].shape
(16,)

We see there are 16 Pokemons that meet this condition. Note that the query method gives the same 16 rows; its shape also reports 41 columns because query returns whole rows rather than only the name column.

pokemon.query('attack > 150').shape
(16, 41)
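The equivalence between a boolean mask, loc, and query can be checked on a small example. The sketch below uses a hypothetical four-row frame (the names and attack values are invented) and counts the rows with attack above 150 in three ways.

```python
import pandas as pd

# Hypothetical toy frame; values are illustrative only.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "attack": [160, 150, 90, 181],
})

mask = toy["attack"] > 150                 # boolean Series, one entry per row
n_mask = int(mask.sum())                   # True counts as 1 when summed
n_loc = len(toy.loc[mask])                 # filter rows with loc, then count
n_query = len(toy.query("attack > 150"))   # same filter as a string expression
print(n_mask, n_loc, n_query)
```

All three counts agree; note that a value of exactly 150 is excluded by the strict inequality.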

5.2 Select Pokemons with a Speed of 10 or Less

We use both the loc method and the query method to extract the data. Note that in both cases we get the same 6 results.

pokemon.loc[pokemon['speed'] <= 10, ["name", "speed"]]
name speed
212 Shuckle 5
327 Trapinch 10
437 Bonsly 10
445 Munchlax 5
596 Ferroseed 10
770 Pyukumuku 5
pokemon.query("speed <= 10")[["name", "speed"]]
name speed
212 Shuckle 5
327 Trapinch 10
437 Bonsly 10
445 Munchlax 5
596 Ferroseed 10
770 Pyukumuku 5

5.3 How Many Pokemons have a Special Defense (sp_defense) Value of 25 or Less?

As before, we can use either the loc method or the query method.

pokemon.loc[pokemon['sp_defense'] <= 25, ['name', 'sp_defense']].shape
(17, 2)

The query method gives the same 17 results.

pokemon.query('sp_defense <= 25').shape
(17, 41)

5.4 Select all the Legendary Pokemon

Note that is_legendary is an indicator column that stores 1 for legendary Pokemon and 0 otherwise (int64 in the info() output). Hence, summing the column counts the legendary Pokemon.

pokemon['is_legendary'].sum()
70

Here, we take the whole is_legendary column. The sum method adds up the 0s and 1s to return the 70 Pokemons that are legendary. We could also count them with an explicit filter.

pokemon.loc[pokemon['is_legendary'] == True].shape
(70, 41)

We can simplify this filter, but not by passing the column straight to loc. Because is_legendary is stored as int64 rather than bool, pokemon.loc[pokemon['is_legendary']] treats the 0s and 1s as row labels and returns rows 0 and 1 repeated, 801 rows in total, instead of the 70 legendary rows. Converting the column to a boolean mask first gives the intended result.

pokemon.loc[pokemon['is_legendary'].astype(bool)].shape
(70, 41)

The query method returns the same 70 rows.

pokemon.query('is_legendary == True').shape
(70, 41)
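Since the info() output lists is_legendary as int64, it is worth seeing how loc treats an integer column compared with a boolean mask. The following is a minimal sketch on a hypothetical four-row frame; the names are placeholders.

```python
import pandas as pd

# Hypothetical toy frame with a 0/1 indicator stored as integers.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "is_legendary": [0, 1, 0, 1],
})

# Integers are interpreted as row labels, so this picks rows 0, 1, 0, 1.
by_label = toy.loc[toy["is_legendary"]]

# A boolean mask keeps only the rows where the flag is 1.
by_mask = toy.loc[toy["is_legendary"].astype(bool)]

print(by_label["name"].tolist())  # ['A', 'B', 'A', 'B']
print(by_mask["name"].tolist())   # ['B', 'D']
```

Comparing explicitly (== 1) or converting with astype(bool) both produce a mask; passing the raw integer column does not.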

5.5 Find the Outlier Based on Attack and Defense Values.

Based on the scatter plot below, we see there is an outlier in the bottom right corner. This is our target in this section.

sns.scatterplot(data=pokemon, x="defense", y="attack")
<AxesSubplot:xlabel='defense', ylabel='attack'>

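Let's sort the data by defense to find this point. In the result, row 212 pairs the joint-highest defense (230) with an attack of only 10; index 212 is Shuckle in the speed table of section 5.2.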

pokemon.sort_values(by="defense", ascending=False)[
    ["attack", "defense"]].head()
attack defense
305 140 230
212 10 230
207 125 230
376 100 200
712 117 184
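Sorting works when the outlier sits at an extreme of one axis. One alternative, sketched here as an assumption rather than the method used in the course, is to rank rows by the gap between defense and attack. The attack and defense values below mirror the first four rows of the sorted output; the names are placeholders.

```python
import pandas as pd

# Attack/defense pairs copied from the sorted output; names are placeholders.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "attack": [140, 10, 125, 100],
    "defense": [230, 230, 230, 200],
})

# A large positive gap means high defense with low attack,
# i.e. the bottom right of a defense (x) versus attack (y) plot.
gap = toy["defense"] - toy["attack"]
outlier = toy.loc[gap.idxmax()]
print(outlier["name"], outlier["attack"], outlier["defense"])
```

On the full data the equivalent would be pokemon.loc[(pokemon['defense'] - pokemon['attack']).idxmax()].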

6 Advanced Selection

In this section, we get deeper into selection of data. This will involve a selection that has more than one condition (using & or |). As in the previous section, we pose questions and attempt to solve them.

6.1 How Many Fire-Flying Pokemons are There?

Note that the round brackets () around each condition are essential. The & operator binds more tightly than ==, so without the brackets the expression is evaluated in the wrong order and pandas raises an error.

# pokemon.head()[['type1']]

pokemon.loc[(pokemon['type1'] == 'fire') & (pokemon['type2'] == 'flying')]
abilities against_bug against_dark against_dragon against_electric against_fairy against_fight against_fire against_flying against_ghost ... percentage_male pokedex_number sp_attack sp_defense speed type1 type2 weight_kg generation is_legendary
5 ['Blaze', 'Solar Power'] 0.25 1.0 1.0 2.0 0.5 0.5 0.5 1.0 1.0 ... 88.1 6 159 115 100 fire flying 90.5 1 0
145 ['Pressure', 'Flame Body'] 0.25 1.0 1.0 2.0 0.5 0.5 0.5 1.0 1.0 ... NaN 146 125 85 90 fire flying 60.0 1 1
249 ['Pressure', 'Regenerator'] 0.25 1.0 1.0 2.0 0.5 0.5 0.5 1.0 1.0 ... NaN 250 110 154 90 fire flying 199.0 2 1
661 ['Flame Body', 'Gale Wings'] 0.25 1.0 1.0 2.0 0.5 0.5 0.5 1.0 1.0 ... 50.0 662 56 52 84 fire flying 16.0 6 0
662 ['Flame Body', 'Gale Wings'] 0.25 1.0 1.0 2.0 0.5 0.5 0.5 1.0 1.0 ... 50.0 663 74 69 126 fire flying 24.5 6 0
740 ['Dancer'] 0.25 1.0 1.0 2.0 0.5 0.5 0.5 1.0 1.0 ... 24.6 741 98 70 93 fire flying 3.4 7 0

6 rows × 41 columns
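The role of the parentheses can be seen on a small numeric example. This sketch uses hypothetical attack and defense values (numeric columns are used here so that the failure is the familiar "truth value of a Series is ambiguous" error) and compares the bracketed and unbracketed forms.

```python
import pandas as pd

# Hypothetical toy frame; values are illustrative only.
toy = pd.DataFrame({
    "attack": [120, 80, 150],
    "defense": [130, 140, 60],
})

# Each comparison is wrapped in parentheses, then combined with &.
both = (toy["attack"] > 100) & (toy["defense"] > 100)
print(int(both.sum()))

# Without parentheses, 100 & toy["defense"] is evaluated first and the
# result becomes a chained comparison, which needs a single True/False.
try:
    toy["attack"] > 100 & toy["defense"] > 100
    raised = False
except ValueError:
    raised = True
print(raised)
```

Only the first row satisfies both conditions, and the unbracketed version raises a ValueError.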

6.2 How Many Poison Pokemons do we Have Across Both Types?

Here, let us start with the query method. Note that column names are not quoted inside the query string, but string values such as 'poison' are. If a column name contains spaces, we must surround it with backticks (`).

pokemon.query("type1 == 'poison' | type2 == 'poison'")
abilities against_bug against_dark against_dragon against_electric against_fairy against_fight against_fire against_flying against_ghost ... percentage_male pokedex_number sp_attack sp_defense speed type1 type2 weight_kg generation is_legendary
0 ['Overgrow', 'Chlorophyll'] 1.00 1.0 1.0 0.5 0.50 0.50 2.0 2.0 1.0 ... 88.1 1 65 65 45 grass poison 6.9 1 0
1 ['Overgrow', 'Chlorophyll'] 1.00 1.0 1.0 0.5 0.50 0.50 2.0 2.0 1.0 ... 88.1 2 80 80 60 grass poison 13.0 1 0
2 ['Overgrow', 'Chlorophyll'] 1.00 1.0 1.0 0.5 0.50 0.50 2.0 2.0 1.0 ... 88.1 3 122 120 80 grass poison 100.0 1 0
12 ['Shield Dust', 'Run Away'] 0.50 1.0 1.0 1.0 0.50 0.25 2.0 2.0 1.0 ... 50.0 13 20 20 50 bug poison 3.2 1 0
13 ['Shed Skin'] 0.50 1.0 1.0 1.0 0.50 0.25 2.0 2.0 1.0 ... 50.0 14 25 25 35 bug poison 10.0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
746 ['Merciless', 'Limber', 'Regenerator'] 0.50 1.0 1.0 2.0 0.50 0.50 0.5 1.0 1.0 ... 50.0 747 43 52 45 poison water 8.0 7 0
747 ['Merciless', 'Limber', 'Regenerator'] 0.50 1.0 1.0 2.0 0.50 0.50 0.5 1.0 1.0 ... 50.0 748 53 142 35 poison water 14.5 7 0
756 ['Corrosion', 'Oblivious'] 0.25 1.0 1.0 1.0 0.25 0.50 0.5 1.0 1.0 ... 88.1 757 71 40 77 poison fire 4.8 7 0
757 ['Corrosion', 'Oblivious'] 0.25 1.0 1.0 1.0 0.25 0.50 0.5 1.0 1.0 ... 0.0 758 111 60 117 poison fire 22.2 7 0
792 ['Beast Boost'] 0.50 1.0 1.0 1.0 0.50 1.00 0.5 0.5 1.0 ... NaN 793 127 131 103 rock poison 55.5 7 1

64 rows × 41 columns

Now let's do the same with the loc method.

pokemon.loc[(pokemon['type1'] == 'poison') | (pokemon['type2'] == 'poison')]
abilities against_bug against_dark against_dragon against_electric against_fairy against_fight against_fire against_flying against_ghost ... percentage_male pokedex_number sp_attack sp_defense speed type1 type2 weight_kg generation is_legendary
0 ['Overgrow', 'Chlorophyll'] 1.00 1.0 1.0 0.5 0.50 0.50 2.0 2.0 1.0 ... 88.1 1 65 65 45 grass poison 6.9 1 0
1 ['Overgrow', 'Chlorophyll'] 1.00 1.0 1.0 0.5 0.50 0.50 2.0 2.0 1.0 ... 88.1 2 80 80 60 grass poison 13.0 1 0
2 ['Overgrow', 'Chlorophyll'] 1.00 1.0 1.0 0.5 0.50 0.50 2.0 2.0 1.0 ... 88.1 3 122 120 80 grass poison 100.0 1 0
12 ['Shield Dust', 'Run Away'] 0.50 1.0 1.0 1.0 0.50 0.25 2.0 2.0 1.0 ... 50.0 13 20 20 50 bug poison 3.2 1 0
13 ['Shed Skin'] 0.50 1.0 1.0 1.0 0.50 0.25 2.0 2.0 1.0 ... 50.0 14 25 25 35 bug poison 10.0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
746 ['Merciless', 'Limber', 'Regenerator'] 0.50 1.0 1.0 2.0 0.50 0.50 0.5 1.0 1.0 ... 50.0 747 43 52 45 poison water 8.0 7 0
747 ['Merciless', 'Limber', 'Regenerator'] 0.50 1.0 1.0 2.0 0.50 0.50 0.5 1.0 1.0 ... 50.0 748 53 142 35 poison water 14.5 7 0
756 ['Corrosion', 'Oblivious'] 0.25 1.0 1.0 1.0 0.25 0.50 0.5 1.0 1.0 ... 88.1 757 71 40 77 poison fire 4.8 7 0
757 ['Corrosion', 'Oblivious'] 0.25 1.0 1.0 1.0 0.25 0.50 0.5 1.0 1.0 ... 0.0 758 111 60 117 poison fire 22.2 7 0
792 ['Beast Boost'] 0.50 1.0 1.0 1.0 0.50 1.00 0.5 0.5 1.0 ... NaN 793 127 131 103 rock poison 55.5 7 1

64 rows × 41 columns

There are 64 such Pokemons.
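The Pokemon columns have no spaces in their names, so backticks are not needed above. To illustrate the backtick rule, the sketch below assumes a hypothetical frame whose columns are named "type 1" and "type 2"; neither name exists in the Pokemon file.

```python
import pandas as pd

# Hypothetical frame with spaces in the column names.
toy = pd.DataFrame({
    "type 1": ["poison", "grass", "bug", "water"],
    "type 2": ["water", "poison", "flying", "ice"],
})

# Backticks wrap column names that are not valid Python identifiers;
# string values stay in ordinary quotes.
poison = toy.query("`type 1` == 'poison' | `type 2` == 'poison'")
print(len(poison))
```

Using len() on the filtered frame returns the count directly instead of reading it off the printed table.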

6.3 Which Pokemon of Type1 == Ice has the Strongest defense?

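Here we pick the Pokemons whose type1 is ice and sort them by defense. Because the code below sorts with ascending=True, the weakest defenders come first: Smoochum has the lowest defense among the ice types, not the highest. To answer the question as posed, pass ascending=False and read the first row.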

pokemon.loc[(pokemon["type1"] == "ice")].sort_values(
    by="defense", ascending=True)[['name', 'type1']].head()
name type1
237 Smoochum ice
123 Jynx ice
612 Cubchoo ice
219 Swinub ice
224 Delibird ice
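To find the strongest rather than the weakest defender, the sort direction has to be descending. The sketch below shows two ways of doing this on a hypothetical frame; the names and defense values are invented and do not reproduce the result for the real data.

```python
import pandas as pd

# Hypothetical toy frame; values are illustrative only.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "type1": ["ice", "ice", "water", "ice"],
    "defense": [15, 184, 200, 80],
})

ice = toy.loc[toy["type1"] == "ice"]

# Option 1: sort in descending order and take the first row.
strongest = ice.sort_values(by="defense", ascending=False).head(1)

# Option 2: nlargest returns the top rows by a column without a full sort call.
same = ice.nlargest(1, "defense")

print(strongest["name"].iloc[0], same["name"].iloc[0])
```

Including the defense column in the selection also makes the ranking visible in the output.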

6.4 Which is the Most Common Type1 among Legendary Pokemons?

Here, we keep the legendary Pokemons and count the values of type1. We see that psychic is the most common type, with 17 legendary Pokemons.

pokemon.loc[pokemon['is_legendary'] == 1]['type1'].value_counts()
psychic     17
dragon       7
water        6
steel        6
electric     5
fire         5
rock         4
grass        4
normal       3
dark         3
bug          3
ice          2
ground       2
ghost        1
flying       1
fairy        1
Name: type1, dtype: int64

We can combine this to create a bar graph.

pokemon.loc[pokemon['is_legendary'] == 1]['type1'].value_counts().plot(
    kind="bar", color="steelblue", title="Pokemons by type 1")
<AxesSubplot:title={'center':'Pokemons by type 1'}>
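When only the most common category is needed, it can be read from the counts programmatically instead of from the table or the plot. This is a minimal sketch on a hypothetical frame; the rows are invented.

```python
import pandas as pd

# Hypothetical toy frame; values are illustrative only.
toy = pd.DataFrame({
    "type1": ["psychic", "dragon", "psychic", "water", "psychic", "dragon"],
    "is_legendary": [1, 1, 1, 0, 1, 0],
})

# Select the type1 column for legendary rows in a single loc call, then count.
counts = toy.loc[toy["is_legendary"] == 1, "type1"].value_counts()

top_type = counts.idxmax()   # label with the highest count
top_count = int(counts.max())
print(top_type, top_count)
```

Passing the column name as the second argument to loc avoids the chained [..][..] indexing used above, with the same result.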

6.5 What is the Most Powerful Pokemon by Attack for the First 3 Generations, of Type Water?

Here, I keep the water-type Pokemons from the first three generations and sort them by attack in descending order. Gyarados comes out on top.

# Combine both conditions in one mask to avoid chained boolean indexing
mask = pokemon['generation'].isin([1, 2, 3]) & (pokemon['type1'] == "water")
pokemon.loc[mask, ["name", "attack"]].sort_values(
    by="attack", ascending=False).head()
name attack
129 Gyarados 155
381 Kyogre 150
259 Swampert 150
318 Sharpedo 140
98 Kingler 130
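The combination of isin with a second condition can be tried on a small frame. The sketch below is built on hypothetical rows (names, generations, and attack values are invented) and applies both conditions through a single mask.

```python
import pandas as pd

# Hypothetical toy frame; values are illustrative only.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "type1": ["water", "water", "fire", "water", "water"],
    "generation": [1, 3, 2, 5, 2],
    "attack": [130, 150, 160, 170, 155],
})

# isin tests membership in a list; & requires both conditions to hold.
mask = toy["generation"].isin([1, 2, 3]) & (toy["type1"] == "water")

top = toy.loc[mask, ["name", "attack"]].sort_values(by="attack", ascending=False)
print(top["name"].tolist())  # ['E', 'B', 'A']
```

Row D has the highest attack but is excluded by the generation filter, and row C is excluded by the type filter.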

7 Conclusion

In this project, I analyzed Pokemon data. The purpose was to illustrate the basics of data analysis using the Python language, especially the subsetting of data.