1 Background

In this project, I analyze Pokemon data, which is available on Kaggle. The project was part of a project-based course created by freeCodeCamp and published on YouTube.

Tip

Please visit my RPubs site to see more data projects. Alternatively, copy and paste the link <www.rpubs.com/Karuitha> into your browser. You can also view my LinkedIn profile for my skills and education. My Tableau Public profile contains my data visualizations.

Important

Skills & Technologies Applied: Python, Quarto, Data Science, and Machine Learning.

Important

Python is a widely-used programming language known for its simplicity and readability. Created by Guido van Rossum, Python emphasizes code clarity and comes with an extensive standard library. It supports multiple programming paradigms and has a thriving ecosystem of third-party libraries and frameworks. Python is popular for web development, scientific computing, data analysis, and machine learning. Its clean syntax and active community make it an excellent choice for developers of all levels to build a variety of applications efficiently.

2 Objective

The purpose is to illustrate the basics of data analysis using the Python language.

3 Pokemon Data

I start by loading packages.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

Next, I read the data.

pokemon = pd.read_csv("pokemon.csv")
pokemon.head()

	abilities	against_bug	against_dark	against_dragon	against_electric	against_fairy	against_fight	against_fire	against_flying	against_ghost	...	percentage_male	pokedex_number	sp_attack	sp_defense	speed	type1	type2	weight_kg	generation
0	['Overgrow', 'Chlorophyll']	1.0	1.0	1.0	0.5	0.5	0.5	2.0	2.0	1.0	...	88.1	1	65	65	45	grass	poison	6.9	1
1	['Overgrow', 'Chlorophyll']	1.0	1.0	1.0	0.5	0.5	0.5	2.0	2.0	1.0	...	88.1	2	80	80	60	grass	poison	13.0	1
2	['Overgrow', 'Chlorophyll']	1.0	1.0	1.0	0.5	0.5	0.5	2.0	2.0	1.0	...	88.1	3	122	120	80	grass	poison	100.0	1
3	['Blaze', 'Solar Power']	0.5	1.0	1.0	1.0	0.5	1.0	0.5	1.0	1.0	...	88.1	4	60	50	65	fire	NaN	8.5	1
4	['Blaze', 'Solar Power']	0.5	1.0	1.0	1.0	0.5	1.0	0.5	1.0	1.0	...	88.1	5	80	65	80	fire	NaN	19.0	1

5 rows × 41 columns

Let us look at the number of rows and columns in the data.

pokemon.shape

(801, 41)

The data has 801 rows and 41 columns. Let us see the names of the columns.

pokemon.columns

Index(['abilities', 'against_bug', 'against_dark', 'against_dragon',
       'against_electric', 'against_fairy', 'against_fight', 'against_fire',
       'against_flying', 'against_ghost', 'against_grass', 'against_ground',
       'against_ice', 'against_normal', 'against_poison', 'against_psychic',
       'against_rock', 'against_steel', 'against_water', 'attack',
       'base_egg_steps', 'base_happiness', 'base_total', 'capture_rate',
       'classfication', 'defense', 'experience_growth', 'height_m', 'hp',
       'japanese_name', 'name', 'percentage_male', 'pokedex_number',
       'sp_attack', 'sp_defense', 'speed', 'type1', 'type2', 'weight_kg',
       'generation', 'is_legendary'],
      dtype='object')

4 Exploratory Data Analysis

We start exploring the data with a summary of each column's data type and non-null count.

pokemon.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 801 entries, 0 to 800
Data columns (total 41 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   abilities          801 non-null    object 
 1   against_bug        801 non-null    float64
 2   against_dark       801 non-null    float64
 3   against_dragon     801 non-null    float64
 4   against_electric   801 non-null    float64
 5   against_fairy      801 non-null    float64
 6   against_fight      801 non-null    float64
 7   against_fire       801 non-null    float64
 8   against_flying     801 non-null    float64
 9   against_ghost      801 non-null    float64
 10  against_grass      801 non-null    float64
 11  against_ground     801 non-null    float64
 12  against_ice        801 non-null    float64
 13  against_normal     801 non-null    float64
 14  against_poison     801 non-null    float64
 15  against_psychic    801 non-null    float64
 16  against_rock       801 non-null    float64
 17  against_steel      801 non-null    float64
 18  against_water      801 non-null    float64
 19  attack             801 non-null    int64  
 20  base_egg_steps     801 non-null    int64  
 21  base_happiness     801 non-null    int64  
 22  base_total         801 non-null    int64  
 23  capture_rate       801 non-null    object 
 24  classfication      801 non-null    object 
 25  defense            801 non-null    int64  
 26  experience_growth  801 non-null    int64  
 27  height_m           781 non-null    float64
 28  hp                 801 non-null    int64  
 29  japanese_name      801 non-null    object 
 30  name               801 non-null    object 
 31  percentage_male    703 non-null    float64
 32  pokedex_number     801 non-null    int64  
 33  sp_attack          801 non-null    int64  
 34  sp_defense         801 non-null    int64  
 35  speed              801 non-null    int64  
 36  type1              801 non-null    object 
 37  type2              417 non-null    object 
 38  weight_kg          781 non-null    float64
 39  generation         801 non-null    int64  
 40  is_legendary       801 non-null    int64  
dtypes: float64(21), int64(13), object(7)
memory usage: 256.7+ KB
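The Non-Null Count column above implies that type2, percentage_male, height_m, and weight_kg contain missing values. A minimal sketch of counting them directly follows. It uses a small made-up frame: the name `toy` and its values are assumptions for illustration, not rows read from pokemon.csv.

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative values, not the real dataset) with a few gaps.
toy = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander", "Squirtle"],
    "type2": ["poison", np.nan, np.nan],
    "height_m": [0.7, 0.6, np.nan],
})

# Number of missing values per column, largest first.
missing = toy.isna().sum().sort_values(ascending=False)
print(missing)
```

On the real data, the same call would be `pokemon.isna().sum()`.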

Next, let's look at summary statistics for the numeric columns.

pokemon.describe()

	against_bug	against_dark	against_dragon	against_electric	against_fairy	against_fight	against_fire	against_flying	against_ghost	against_grass	...	height_m	hp	percentage_male	pokedex_number	sp_attack	sp_defense	speed	weight_kg	generation	is_legendary
count	801.000000	801.000000	801.000000	801.000000	801.000000	801.000000	801.000000	801.000000	801.000000	801.000000	...	781.000000	801.000000	703.000000	801.000000	801.000000	801.000000	801.000000	781.000000	801.000000	801.000000
mean	0.996255	1.057116	0.968789	1.073970	1.068976	1.065543	1.135456	1.192884	0.985019	1.034020	...	1.163892	68.958801	55.155761	401.000000	71.305868	70.911361	66.334582	61.378105	3.690387	0.087391
std	0.597248	0.438142	0.353058	0.654962	0.522167	0.717251	0.691853	0.604488	0.558256	0.788896	...	1.080326	26.576015	20.261623	231.373075	32.353826	27.942501	28.907662	109.354766	1.930420	0.282583
min	0.250000	0.250000	0.000000	0.000000	0.250000	0.000000	0.250000	0.250000	0.000000	0.250000	...	0.100000	1.000000	0.000000	1.000000	10.000000	20.000000	5.000000	0.100000	1.000000	0.000000
25%	0.500000	1.000000	1.000000	0.500000	1.000000	0.500000	0.500000	1.000000	1.000000	0.500000	...	0.600000	50.000000	50.000000	201.000000	45.000000	50.000000	45.000000	9.000000	2.000000	0.000000
50%	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	...	1.000000	65.000000	50.000000	401.000000	65.000000	66.000000	65.000000	27.300000	4.000000	0.000000
75%	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	2.000000	1.000000	1.000000	1.000000	...	1.500000	80.000000	50.000000	601.000000	91.000000	90.000000	85.000000	64.800000	5.000000	0.000000
max	4.000000	4.000000	2.000000	4.000000	4.000000	4.000000	4.000000	4.000000	4.000000	4.000000	...	14.500000	255.000000	100.000000	801.000000	194.000000	230.000000	180.000000	999.900000	7.000000	1.000000

8 rows × 34 columns

5 Questions

In this section, we pose questions that we will attempt to answer from the data.

5.1 How Many Pokemons Exist with an Attack Value Greater than 150

Let's explore the distribution of Pokemon attack values.

sns.boxplot(data=pokemon, x="attack")

<AxesSubplot:xlabel='attack'>

Now let's count the Pokemon that meet this condition.

pokemon.loc[pokemon['attack'] > 150, 'name'].shape

(16,)

We see there are 16 Pokemon that meet this condition. Note that we can also use the query method to get the same count; here the shape also reports all 41 columns because no column is selected.

pokemon.query('attack > 150').shape

(16, 41)
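The equivalence of the two approaches can be checked on a small example. This is only a sketch on made-up data; the frame `toy` and its values are assumptions for illustration.

```python
import pandas as pd

# Toy frame with placeholder names and attack values.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "attack": [160, 150, 185, 90],
})

# Boolean mask: True where attack is strictly greater than 150.
mask = toy["attack"] > 150

via_loc = toy.loc[mask]
via_query = toy.query("attack > 150")

# Summing a boolean mask counts the True values.
n_matches = int(mask.sum())
same_rows = via_loc.equals(via_query)
print(n_matches, same_rows)
```

Summing the mask is a compact way to count matches without building the filtered frame first.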

5.2 Select Pokemons with a Speed of 10 or Less

We use both the loc method and the query method to extract the data. Note that in both cases we get the same 6 results.

pokemon.loc[pokemon['speed'] <= 10, ["name", "speed"]]

	name	speed
212	Shuckle	5
327	Trapinch	10
437	Bonsly	10
445	Munchlax	5
596	Ferroseed	10
770	Pyukumuku	5

pokemon.query("speed <= 10")[["name", "speed"]]

	name	speed
212	Shuckle	5
327	Trapinch	10
437	Bonsly	10
445	Munchlax	5
596	Ferroseed	10
770	Pyukumuku	5

5.3 How Many Pokemons have a Special Defense (sp_defense) Value of 25 or Less?

As usual, we follow the same rules, either using loc or query methods.

pokemon.loc[pokemon['sp_defense'] <= 25, ['name', 'sp_defense']].shape

(17, 2)

The query method gives the same 17 results.

pokemon.query('sp_defense <= 25').shape

(17, 41)

5.4 Select all the Legendary Pokemon

Note that is_legendary is a 0/1 indicator (stored as int64, as the info() output shows). Hence, we can count the legendary Pokemon by summing the column.

pokemon['is_legendary'].sum()

Here, we take the whole is_legendary column. The sum function adds up the ones to return the 70 Pokemon that are legendary. We could also filter the rows explicitly; comparing with True works because 1 == True in Python.

pokemon.loc[pokemon['is_legendary'] == True].shape

(70, 41)

It is tempting to simplify this filter by passing the column straight to loc, as in pokemon.loc[pokemon['is_legendary']]. That does not filter the data: is_legendary is stored as int64, so loc treats its 0 and 1 values as row labels and returns rows 0 and 1 over and over, 801 rows × 41 columns in total. Converting the column to a boolean mask first gives the intended selection:

pokemon.loc[pokemon['is_legendary'].astype(bool)]

With a boolean mask, loc keeps only the rows where the mask is True, which matches the comparison-based filter above. The query method gives the same result as well.

pokemon.query('is_legendary == True').shape

(70, 41)
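The difference between an integer indicator and a boolean mask inside loc is easy to miss, so here is a minimal sketch on made-up data. The frame `toy` and its values are assumptions for illustration; only the int64 dtype of is_legendary is taken from the info() output above.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "is_legendary": [0, 1, 0, 1],  # integers, like the real column
})

# Integers are treated as row labels: this returns rows 0, 1, 0, 1.
by_label = toy.loc[toy["is_legendary"]]

# Converting to bool gives a proper mask: only rows where the flag is 1.
by_mask = toy.loc[toy["is_legendary"].astype(bool)]

print(by_label["name"].tolist())
print(by_mask["name"].tolist())
```

The label-based result has as many rows as the original frame, which is a useful warning sign that no filtering took place.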

5.5 Find the Outlier Based on Attack and Defense Values.

Based on the scatter plot below, we see there is an outlier in the bottom right corner. This is our target in this section.

sns.scatterplot(data=pokemon, x="defense", y="attack")

<AxesSubplot:xlabel='defense', ylabel='attack'>

Let's sort the data by defense to find this value.

pokemon.sort_values(by="defense", ascending=False)[
    ["attack", "defense"]].head()

	attack	defense
305	140	230
212	10	230
207	125	230
376	100	200
712	117	184
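In the sorted output, row 212 stands out with an attack of 10 against a defense of 230; index 212 is Shuckle in the speed table of section 5.2. One way to pick out such a point programmatically is to look at the gap between defense and attack. This is a sketch, and the gap criterion is an assumption of this example rather than part of the original analysis.

```python
import pandas as pd

# The five rows shown in the sorted output above, keyed by their row index.
top = pd.DataFrame(
    {"attack": [140, 10, 125, 100, 117],
     "defense": [230, 230, 230, 200, 184]},
    index=[305, 212, 207, 376, 712],
)

# Gap between defense and attack; the outlier has by far the largest gap.
gap = top["defense"] - top["attack"]
outlier_index = gap.idxmax()
print(outlier_index, gap.loc[outlier_index])
```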

6 Advanced Selection

In this section, we get deeper into selection of data. This will involve a selection that has more than one condition (using & or |). As in the previous section, we pose questions and attempt to solve them.

6.1 How Many Fire-Flying Pokemons are There?

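Note that the round brackets () around each condition are essential. The & operator binds more tightly than ==, so without them Python evaluates 'fire' & pokemon['type2'] first and the expression fails or becomes ambiguous.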

# pokemon.head()[['type1']]

pokemon.loc[(pokemon['type1'] == 'fire') & (pokemon['type2'] == 'flying')]

	abilities	against_bug	against_dark	against_dragon	against_electric	against_fairy	against_fight	against_fire	against_flying	against_ghost	...	percentage_male	pokedex_number	sp_attack	sp_defense	speed	type1	type2	weight_kg	generation	is_legendary
5	['Blaze', 'Solar Power']	0.25	1.0	1.0	2.0	0.5	0.5	0.5	1.0	1.0	...	88.1	6	159	115	100	fire	flying	90.5	1	0
145	['Pressure', 'Flame Body']	0.25	1.0	1.0	2.0	0.5	0.5	0.5	1.0	1.0	...	NaN	146	125	85	90	fire	flying	60.0	1	1
249	['Pressure', 'Regenerator']	0.25	1.0	1.0	2.0	0.5	0.5	0.5	1.0	1.0	...	NaN	250	110	154	90	fire	flying	199.0	2	1
661	['Flame Body', 'Gale Wings']	0.25	1.0	1.0	2.0	0.5	0.5	0.5	1.0	1.0	...	50.0	662	56	52	84	fire	flying	16.0	6	0
662	['Flame Body', 'Gale Wings']	0.25	1.0	1.0	2.0	0.5	0.5	0.5	1.0	1.0	...	50.0	663	74	69	126	fire	flying	24.5	6	0
740	['Dancer']	0.25	1.0	1.0	2.0	0.5	0.5	0.5	1.0	1.0	...	24.6	741	98	70	93	fire	flying	3.4	7	0

6 rows × 41 columns
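An alternative that sidesteps the bracket issue is to give each condition a name before combining them. The sketch below uses made-up data; the frame `toy`, its values, and the mask names are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "type1": ["fire", "fire", "water"],
    "type2": ["flying", None, "flying"],
})

# Without parentheses, Python would parse
#     toy["type1"] == "fire" & toy["type2"] == "flying"
# as a chained comparison around ("fire" & toy["type2"]),
# because & binds more tightly than ==.
is_fire = toy["type1"] == "fire"
is_flying = toy["type2"] == "flying"

# Each mask is already a boolean Series, so no extra brackets are needed.
fire_flying = toy.loc[is_fire & is_flying]
n_fire_flying = len(fire_flying)
print(n_fire_flying)
```

Taking `len()` of the filtered frame answers the "how many" question directly.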

6.2 How Many Poison Pokemons do we Have Across Both Types

Here, let us start with the query method. Note that column names are not quoted inside the query string, but string values such as 'poison' are. If a column name contains spaces, we must surround it with backticks (`).

pokemon.query("type1 == 'poison' | type2 == 'poison'")

	abilities	against_bug	against_dark	against_dragon	against_electric	against_fairy	against_fight	against_fire	against_flying	against_ghost	...	percentage_male	pokedex_number	sp_attack	sp_defense	speed	type1	type2	weight_kg	generation	is_legendary
0	['Overgrow', 'Chlorophyll']	1.00	1.0	1.0	0.5	0.50	0.50	2.0	2.0	1.0	...	88.1	1	65	65	45	grass	poison	6.9	1	0
1	['Overgrow', 'Chlorophyll']	1.00	1.0	1.0	0.5	0.50	0.50	2.0	2.0	1.0	...	88.1	2	80	80	60	grass	poison	13.0	1	0
2	['Overgrow', 'Chlorophyll']	1.00	1.0	1.0	0.5	0.50	0.50	2.0	2.0	1.0	...	88.1	3	122	120	80	grass	poison	100.0	1	0
12	['Shield Dust', 'Run Away']	0.50	1.0	1.0	1.0	0.50	0.25	2.0	2.0	1.0	...	50.0	13	20	20	50	bug	poison	3.2	1	0
13	['Shed Skin']	0.50	1.0	1.0	1.0	0.50	0.25	2.0	2.0	1.0	...	50.0	14	25	25	35	bug	poison	10.0	1	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
746	['Merciless', 'Limber', 'Regenerator']	0.50	1.0	1.0	2.0	0.50	0.50	0.5	1.0	1.0	...	50.0	747	43	52	45	poison	water	8.0	7	0
747	['Merciless', 'Limber', 'Regenerator']	0.50	1.0	1.0	2.0	0.50	0.50	0.5	1.0	1.0	...	50.0	748	53	142	35	poison	water	14.5	7	0
756	['Corrosion', 'Oblivious']	0.25	1.0	1.0	1.0	0.25	0.50	0.5	1.0	1.0	...	88.1	757	71	40	77	poison	fire	4.8	7	0
757	['Corrosion', 'Oblivious']	0.25	1.0	1.0	1.0	0.25	0.50	0.5	1.0	1.0	...	0.0	758	111	60	117	poison	fire	22.2	7	0
792	['Beast Boost']	0.50	1.0	1.0	1.0	0.50	1.00	0.5	0.5	1.0	...	NaN	793	127	131	103	rock	poison	55.5	7	1

64 rows × 41 columns

Now let's do the same with the loc method.

pokemon.loc[(pokemon['type1'] == 'poison') | (pokemon['type2'] == 'poison')]

	abilities	against_bug	against_dark	against_dragon	against_electric	against_fairy	against_fight	against_fire	against_flying	against_ghost	...	percentage_male	pokedex_number	sp_attack	sp_defense	speed	type1	type2	weight_kg	generation	is_legendary
0	['Overgrow', 'Chlorophyll']	1.00	1.0	1.0	0.5	0.50	0.50	2.0	2.0	1.0	...	88.1	1	65	65	45	grass	poison	6.9	1	0
1	['Overgrow', 'Chlorophyll']	1.00	1.0	1.0	0.5	0.50	0.50	2.0	2.0	1.0	...	88.1	2	80	80	60	grass	poison	13.0	1	0
2	['Overgrow', 'Chlorophyll']	1.00	1.0	1.0	0.5	0.50	0.50	2.0	2.0	1.0	...	88.1	3	122	120	80	grass	poison	100.0	1	0
12	['Shield Dust', 'Run Away']	0.50	1.0	1.0	1.0	0.50	0.25	2.0	2.0	1.0	...	50.0	13	20	20	50	bug	poison	3.2	1	0
13	['Shed Skin']	0.50	1.0	1.0	1.0	0.50	0.25	2.0	2.0	1.0	...	50.0	14	25	25	35	bug	poison	10.0	1	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
746	['Merciless', 'Limber', 'Regenerator']	0.50	1.0	1.0	2.0	0.50	0.50	0.5	1.0	1.0	...	50.0	747	43	52	45	poison	water	8.0	7	0
747	['Merciless', 'Limber', 'Regenerator']	0.50	1.0	1.0	2.0	0.50	0.50	0.5	1.0	1.0	...	50.0	748	53	142	35	poison	water	14.5	7	0
756	['Corrosion', 'Oblivious']	0.25	1.0	1.0	1.0	0.25	0.50	0.5	1.0	1.0	...	88.1	757	71	40	77	poison	fire	4.8	7	0
757	['Corrosion', 'Oblivious']	0.25	1.0	1.0	1.0	0.25	0.50	0.5	1.0	1.0	...	0.0	758	111	60	117	poison	fire	22.2	7	0
792	['Beast Boost']	0.50	1.0	1.0	1.0	0.50	1.00	0.5	0.5	1.0	...	NaN	793	127	131	103	rock	poison	55.5	7	1

64 rows × 41 columns

There are 64 such Pokemons.
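The Pokemon columns have no spaces in their names, so the backtick rule does not come up in this dataset. A minimal sketch of how it would look is shown below. The column names "type 1" and "type 2" and all values are hypothetical, chosen only to show the quoting.

```python
import pandas as pd

# Hypothetical frame whose column names contain spaces.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "type 1": ["poison", "grass", "bug", "fire"],
    "type 2": ["water", "poison", "flying", "steel"],
})

# Backticks quote the column names; single quotes mark the string values.
poison = toy.query("`type 1` == 'poison' | `type 2` == 'poison'")
n_poison = len(poison)
print(n_poison)
```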

6.3 Which Pokemon of Type1 == Ice has the Strongest defense?

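Here we pick the ice-type Pokemon and sort them by defense. Note that the call below uses ascending=True, so the table lists the ice Pokemon with the weakest defense first, and Smoochum is the weakest of them, not the strongest. To answer the question as posed, pass ascending=False so that the strongest defender appears in the first row.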

pokemon.loc[(pokemon["type1"] == "ice")].sort_values(
    by="defense", ascending=True)[['name', 'type1']].head()

	name	type1
237	Smoochum	ice
123	Jynx	ice
612	Cubchoo	ice
219	Swinub	ice
224	Delibird	ice
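The direction of the sort decides whether the first row is the weakest or the strongest defender. A minimal sketch of the descending version follows. It runs on made-up data: the names and defense values in `toy` are placeholders, not values from the dataset.

```python
import pandas as pd

# Placeholder rows; only the column names mirror the real data.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "type1": ["ice", "ice", "ice", "water"],
    "defense": [15, 184, 80, 200],
})

ice = toy.loc[toy["type1"] == "ice"]

# Descending sort puts the strongest defender in the first row.
strongest_name = ice.sort_values(by="defense", ascending=False)["name"].iloc[0]

# nlargest is an equivalent shortcut for "top n by a numeric column".
same_name = ice.nlargest(1, "defense")["name"].iloc[0]
print(strongest_name, same_name)
```

Including the defense column in the displayed output makes it easier to confirm that the sort ran in the intended direction.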

6.4 Which is the Most Common Type1 Among Legendary Pokemon?

Here, we pick the legendary Pokemon and count the values of type1. We see that psychic is the most common primary type, with 17 legendary Pokemon.

pokemon.loc[pokemon['is_legendary'] == 1]['type1'].value_counts()

psychic     17
dragon       7
water        6
steel        6
electric     5
fire         5
rock         4
grass        4
normal       3
dark         3
bug          3
ice          2
ground       2
ghost        1
flying       1
fairy        1
Name: type1, dtype: int64

We can combine this to create a bar graph.

pokemon.loc[pokemon['is_legendary'] == 1]['type1'].value_counts().plot(
    kind="bar", color="steelblue", title="Pokemons by type 1")

<AxesSubplot:title={'center':'Pokemons by type 1'}>
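To extract the most common type as a value instead of reading it off the table or the chart, idxmax can be applied to the counts. This is a sketch on made-up data; the frame `toy` and its values are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "type1": ["psychic", "dragon", "psychic", "water", "psychic", "dragon"],
    "is_legendary": [1, 1, 1, 0, 1, 0],
})

# Count type1 among legendary rows only, selecting rows and column in one loc call.
counts = toy.loc[toy["is_legendary"] == 1, "type1"].value_counts()

# idxmax returns the label (here, the type) with the highest count.
most_common = counts.idxmax()
print(most_common, counts.loc[most_common])
```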

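6.5 What is the most Powerful Pokemon by Attack for the first 3 Generations, of Type Water?

Here, I select the water-type Pokemon from the first three generations and sort them by attack in descending order. Gyarados comes out on top. Both conditions go into a single boolean mask, which avoids the reindexing warning pandas raises when a second mask built on the full frame is applied to an already filtered one.

# Water-type Pokemon from generations 1 to 3, strongest attack first
early_water = pokemon['generation'].isin([1, 2, 3]) & (pokemon['type1'] == "water")
pokemon.loc[early_water, ["name", "attack"]].sort_values(
    by="attack", ascending=False).head()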

	name	attack
129	Gyarados	155
381	Kyogre	150
259	Swampert	150
318	Sharpedo	140
98	Kingler	130
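When only the single strongest Pokemon is needed, nlargest can replace the sort-then-head pattern. The sketch below runs on made-up data; the frame `toy`, its names, and its values are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "type1": ["water", "water", "fire", "water"],
    "generation": [1, 3, 2, 5],
    "attack": [155, 150, 170, 200],
})

# One mask for both conditions: generation in 1..3 and primary type water.
early_water = toy["generation"].isin([1, 2, 3]) & (toy["type1"] == "water")

# Top row by attack among the selected rows.
strongest = toy.loc[early_water, ["name", "attack"]].nlargest(1, "attack")
print(strongest)
```

Row D has the highest attack overall but is excluded by the generation condition, which shows why the filter has to be applied before ranking.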

7 Conclusion

In this project, I analyzed Pokemon data. The purpose was to illustrate the basics of data analysis using the Python language, especially the subsetting of data.