DS Salaries Analysis

Load data

import os
from zipfile import ZipFile

from kaggle.api.kaggle_api_extended import KaggleApi

# Authenticate with Kaggle credentials
api = KaggleApi()
api.authenticate()

# Download the dataset archive into the current directory
api.dataset_download_files('ruchi798/data-science-job-salaries')

# The extracted data will be saved in the following folder:
with ZipFile('data-science-job-salaries.zip') as zf:
    zf.extractall('data/')

# Remove the archive once its contents have been extracted
os.remove('data-science-job-salaries.zip')
import pandas as pd

data = pd.read_csv('data/ds_salaries.csv')
# Drop the first column of the CSV and keep the remaining ones
data = data.iloc[:, 1:]
data.head()
   work_year  experience_level  employment_type  job_title                   salary  salary_currency  salary_in_usd  employee_residence  remote_ratio  company_location  company_size
0  2020       MI                FT               Data Scientist              70000   EUR              79833          DE                  0             DE                L
1  2020       SE                FT               Machine Learning Scientist  260000  USD              260000         JP                  0             JP                S
2  2020       SE                FT               Big Data Engineer           85000   GBP              109024         GB                  50            GB                M
3  2020       MI                FT               Product Data Analyst        20000   USD              20000          HN                  0             HN                S
4  2020       SE                FT               Machine Learning Engineer   150000  USD              150000         US                  50            US                L
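The `iloc[:, 1:]` step in the loading code discards the first column of the CSV. As a minimal sketch, assuming that column is an unnamed row index (an inference from the code, not something the output confirms), the same columns can be obtained by passing `index_col=0` to `read_csv`. The two-row CSV excerpt below is a hypothetical stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical excerpt, assuming the raw CSV starts with an unnamed index column.
raw_csv = io.StringIO(
    ",work_year,experience_level,job_title,salary_in_usd\n"
    "0,2020,MI,Data Scientist,79833\n"
    "1,2020,SE,Machine Learning Scientist,260000\n"
)

# Approach used in the loading code: read everything, then slice off the first column.
sliced = pd.read_csv(raw_csv).iloc[:, 1:]

# Alternative: treat the first column as the row index while reading.
raw_csv.seek(0)
indexed = pd.read_csv(raw_csv, index_col=0)

# Both approaches leave the same data columns.
print(list(sliced.columns) == list(indexed.columns))
```

With `index_col=0` the intent is stated at read time, whereas the slice silently drops whatever happens to be first, so the slice is only safe when the file layout is known.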

Exploratory data analysis

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   work_year           607 non-null    int64 
 1   experience_level    607 non-null    object
 2   employment_type     607 non-null    object
 3   job_title           607 non-null    object
 4   salary              607 non-null    int64 
 5   salary_currency     607 non-null    object
 6   salary_in_usd       607 non-null    int64 
 7   employee_residence  607 non-null    object
 8   remote_ratio        607 non-null    int64 
 9   company_location    607 non-null    object
 10  company_size        607 non-null    object
dtypes: int64(4), object(7)
memory usage: 52.3+ KB
data.describe()
         work_year        salary  salary_in_usd  remote_ratio
count   607.000000  6.070000e+02     607.000000     607.00000
mean   2021.405272  3.240001e+05  112297.869852      70.92257
std       0.692133  1.544357e+06   70957.259411      40.70913
min    2020.000000  4.000000e+03    2859.000000       0.00000
25%    2021.000000  7.000000e+04   62726.000000      50.00000
50%    2022.000000  1.150000e+05  101570.000000     100.00000
75%    2022.000000  1.650000e+05  150000.000000     100.00000
max    2022.000000  3.040000e+07  600000.000000     100.00000
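The `salary` column is expressed in each row's own `salary_currency` (the first rows mix EUR, USD and GBP), so its mean and maximum combine different units. `salary_in_usd` is the comparable figure. As a possible next step, the sketch below summarises `salary_in_usd` by `experience_level`. It uses only the five rows shown in the `data.head()` output, so the numbers illustrate the technique and are not results for the full dataset.

```python
import pandas as pd

# The five rows displayed by data.head(), reduced to the columns used here.
sample = pd.DataFrame({
    "experience_level": ["MI", "SE", "SE", "MI", "SE"],
    "salary_in_usd": [79833, 260000, 109024, 20000, 150000],
})

# Median USD salary and number of rows per experience level.
by_level = sample.groupby("experience_level")["salary_in_usd"].agg(["median", "count"])
print(by_level)
```

On the full DataFrame, the same call would be applied to `data` instead of `sample`. The median is used here because the wide spread between the 75th percentile and the maximum suggests a right-skewed distribution.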

The echo: false option disables the printing of code (only output is displayed).
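As a hedged illustration, assuming this page is rendered with Quarto, the option can be set for a single cell with a `#|` comment on the first line of that cell. The small DataFrame here is a hypothetical placeholder built from values in the first rows shown earlier.

```python
#| echo: false
# Quarto cell option (assumes this cell lives in a Quarto document):
# the rendered page shows the printed output but hides this code.
import pandas as pd

summary = pd.DataFrame({"salary_in_usd": [79833, 260000, 109024]}).describe()
print(summary)
```

Run outside Quarto, the `#|` line is an ordinary Python comment and has no effect.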