title: “EDA_Phyton” author: “Andrea B” date: “2023-11-16” output: pdf_document: pdf_print: paged —
Este conjunto de datos incluye 721 Pokémon, incluido su número, nombre, primer y segundo tipo, y estadísticas básicas: HP, Ataque, Defensa, Ataque Especial, Defensa Especial y Velocidad. Ha sido de gran utilidad para enseñar estadística a los niños.
Para la visualización del conjunto de datos vamos a cargar la siguiente libreria:
library(readr)
Importar la base de datos del archivo Pokemon
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
Pokemon=pd.read_csv(r"C:\Users\Andrea\Desktop\Pokemon.csv")
print(Pokemon)
## # Name Type 1 ... Speed Generation Legendary
## 0 1 Bulbasaur Grass ... 45 1 False
## 1 2 Ivysaur Grass ... 60 1 False
## 2 3 Venusaur Grass ... 80 1 False
## 3 3 VenusaurMega Venusaur Grass ... 80 1 False
## 4 4 Charmander Fire ... 65 1 False
## .. ... ... ... ... ... ... ...
## 795 719 Diancie Rock ... 50 6 True
## 796 719 DiancieMega Diancie Rock ... 110 6 True
## 797 720 HoopaHoopa Confined Psychic ... 70 6 True
## 798 720 HoopaHoopa Unbound Psychic ... 80 6 True
## 799 721 Volcanion Fire ... 70 6 True
##
## [800 rows x 13 columns]
A continuación se realizará el conteo del número de pokemon que se encuentran en cada generación.
Pokemon.value_counts('Type 1')
## Type 1
## Water 112
## Normal 98
## Grass 70
## Bug 69
## Psychic 57
## Fire 52
## Electric 44
## Rock 44
## Ghost 32
## Ground 32
## Dragon 32
## Dark 31
## Poison 28
## Fighting 27
## Steel 27
## Ice 24
## Fairy 17
## Flying 4
## Name: count, dtype: int64
Pokemon.describe()
## # Total HP ... Sp. Def Speed Generation
## count 800.000000 800.00000 800.000000 ... 800.000000 800.000000 800.00000
## mean 362.813750 435.10250 69.258750 ... 71.902500 68.277500 3.32375
## std 208.343798 119.96304 25.534669 ... 27.828916 29.060474 1.66129
## min 1.000000 180.00000 1.000000 ... 20.000000 5.000000 1.00000
## 25% 184.750000 330.00000 50.000000 ... 50.000000 45.000000 2.00000
## 50% 364.500000 450.00000 65.000000 ... 70.000000 65.000000 3.00000
## 75% 539.250000 515.00000 80.000000 ... 90.000000 90.000000 5.00000
## max 721.000000 780.00000 255.000000 ... 230.000000 180.000000 6.00000
##
## [8 rows x 9 columns]
Pokemon["Total"].plot(kind='hist',figsize=(10,8))
Dentro del analisis realizado se relaciona el codigo a continuacion del conteo de los pokemon legendarios arrojando los siguientes datos:
Pokemon["Attack"].isin(["True", "False"])
## 0 False
## 1 False
## 2 False
## 3 False
## 4 False
## ...
## 795 False
## 796 False
## 797 False
## 798 False
## 799 False
## Name: Attack, Length: 800, dtype: bool
Pokemon.info()
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 800 entries, 0 to 799
## Data columns (total 13 columns):
## # Column Non-Null Count Dtype
## --- ------ -------------- -----
## 0 # 800 non-null int64
## 1 Name 800 non-null object
## 2 Type 1 800 non-null object
## 3 Type 2 414 non-null object
## 4 Total 800 non-null int64
## 5 HP 800 non-null int64
## 6 Attack 800 non-null int64
## 7 Defense 800 non-null int64
## 8 Sp. Atk 800 non-null int64
## 9 Sp. Def 800 non-null int64
## 10 Speed 800 non-null int64
## 11 Generation 800 non-null int64
## 12 Legendary 800 non-null bool
## dtypes: bool(1), int64(9), object(3)
## memory usage: 75.9+ KB
Pokemon["Attack"].sum()
## 63201
Pokemon.loc[Pokemon["Generation"]].shape
## (800, 13)
Dentro del analisis de datos de Pokemon vamos a realizar la comparación del tipo más común de pokemon legendario.
Pokemon Legendario
Pokemon.loc[Pokemon["Legendary"],"Type 1"].value_counts()
## Type 1
## Psychic 14
## Dragon 12
## Fire 5
## Electric 4
## Water 4
## Rock 4
## Steel 4
## Ground 4
## Grass 3
## Ice 2
## Normal 2
## Ghost 2
## Dark 2
## Flying 2
## Fairy 1
## Name: count, dtype: int64
Pokemon.loc[Pokemon["Legendary"],"Type 1"].value_counts().plot(kind="bar")
Pokemon No Legendario
ordinary = Pokemon[Pokemon["Legendary"] == False].reset_index(drop=True)
print(ordinary.shape)
## (735, 13)
ordinary.head()
## # Name Type 1 ... Speed Generation Legendary
## 0 1 Bulbasaur Grass ... 45 1 False
## 1 2 Ivysaur Grass ... 60 1 False
## 2 3 Venusaur Grass ... 80 1 False
## 3 3 VenusaurMega Venusaur Grass ... 80 1 False
## 4 4 Charmander Fire ... 65 1 False
##
## [5 rows x 13 columns]
ordinary = Pokemon[Pokemon["Legendary"] == False].value_counts()
ordinary = Pokemon[Pokemon["Legendary"] == False].value_counts().plot(kind="bar")
Durante el desarrollo de este proyecto se pudo visualizar como estan cada uno catalogados cada uno de los pokemons a continuación se muestra un grafico de acuerdo a su categoria.
Pokemon["Type 1"].value_counts().plot(kind="pie",autopct="%1.1f%%",cmap='tab20c',figsize=(10,8))
Su distribución total.
Pokemon.head()
## # Name Type 1 ... Speed Generation Legendary
## 0 1 Bulbasaur Grass ... 45 1 False
## 1 2 Ivysaur Grass ... 60 1 False
## 2 3 Venusaur Grass ... 80 1 False
## 3 3 VenusaurMega Venusaur Grass ... 80 1 False
## 4 4 Charmander Fire ... 65 1 False
##
## [5 rows x 13 columns]
Pokemon["Total"].plot(kind='box',vert=False,figsize=(10,5))
Distribución de pokemon legendarios.
Pokemon["Legendary"].value_counts().plot(kind="pie",autopct="%1.1f%%",cmap="Set3",figsize=(10,8))
Podemos visualizar cual es el pokemon mas poderoso de las 3 primeras generaciones, del tipo agua.
Pokemon["Generation"].value_counts(sort=False).plot(kind="bar")
(
(Pokemon["Generation"]==1)|
(Pokemon["Generation"]==2) |
(Pokemon["Generation"]==3)
).sum()
## 432
Pokemon.loc[
(Pokemon["Type 1"]=="Water")&
Pokemon["Generation"].isin([1,2,3])
].sort_values(by="Total",ascending=False).head()
## # Name Type 1 ... Speed Generation Legendary
## 422 382 KyogrePrimal Kyogre Water ... 90 3 True
## 421 382 Kyogre Water ... 90 3 True
## 141 130 GyaradosMega Gyarados Water ... 81 1 False
## 283 260 SwampertMega Swampert Water ... 70 3 False
## 12 9 BlastoiseMega Blastoise Water ... 78 1 False
##
## [5 rows x 13 columns]
El Pokémon legendario ultrapoderoso se muestra a continuación.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(14, 7))
sns.scatterplot(data=Pokemon, x="Defense", y="Attack", hue='Legendary', ax=ax)
ax.annotate(
"Who's this guy?", xy=(140, 150), xytext=(160, 150), color='red',
arrowprops=dict(arrowstyle="->", color='red')
)
sns.set_palette("husl", 8)
ax = sns.distplot(Pokemon['Attack'])
## <string>:1: UserWarning:
##
## `distplot` is a deprecated function and will be removed in seaborn v0.14.0.
##
## Please adapt your code to use either `displot` (a figure-level function with
## similar flexibility) or `histplot` (an axes-level function for histograms).
##
## For a guide to updating your code to use the new functions, please see
## https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
##
## C:\Users\Andrea\AppData\Local\Programs\Python\PYTHON~1\Lib\site-packages\seaborn\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
## if pd.api.types.is_categorical_dtype(vector):
## C:\Users\Andrea\AppData\Local\Programs\Python\PYTHON~1\Lib\site-packages\seaborn\_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
## with pd.option_context('mode.use_inf_as_na', True):
ax.set_title("Pokemon Attack Distribution", fontdict={'fontsize': 16})
sns.set_palette("husl", 8)
ax = sns.distplot(Pokemon['Defense'])
## <string>:1: UserWarning:
##
## `distplot` is a deprecated function and will be removed in seaborn v0.14.0.
##
## Please adapt your code to use either `displot` (a figure-level function with
## similar flexibility) or `histplot` (an axes-level function for histograms).
##
## For a guide to updating your code to use the new functions, please see
## https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
##
## C:\Users\Andrea\AppData\Local\Programs\Python\PYTHON~1\Lib\site-packages\seaborn\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
## if pd.api.types.is_categorical_dtype(vector):
## C:\Users\Andrea\AppData\Local\Programs\Python\PYTHON~1\Lib\site-packages\seaborn\_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
## with pd.option_context('mode.use_inf_as_na', True):
ax.set_title("Pokemon Defense Distribution", fontdict={'fontsize': 16})
mean_attack_generation = Pokemon.groupby("Generation")["Attack"].mean().sort_values()
print(mean_attack_generation)
## Generation
## 2 72.028302
## 6 75.804878
## 1 76.638554
## 3 81.625000
## 5 82.066667
## 4 82.867769
## Name: Attack, dtype: float64
mean_defense_generation = Pokemon.groupby("Generation")["Defense"].mean().sort_values()
print(mean_defense_generation)
## Generation
## 1 70.861446
## 5 72.327273
## 2 73.386792
## 3 74.100000
## 6 76.682927
## 4 78.132231
## Name: Defense, dtype: float64