Data Scientist Salaries 2023 from Kaggle

This analysis uses a dataset with information on data scientist salaries as of 2023; the records cover work years 2020 through 2023. The dataset includes key variables describing employment and compensation: experience level, employment type, job title, salary and currency, salary in USD, employee residence, remote ratio, company location, and company size.

This dataset offers a broad view of compensation trends in data science and supports detailed analysis of how variables such as experience, employment type, and geographic location, among others, may influence these professionals' salaries.

1. Descriptive analysis

Loading libraries
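The library calls themselves are not shown in this report. A minimal sketch of what the setup step may have looked like, inferred from the startup messages below and from the functions used later (vif, case_when, ggplot, comma_format); the exact package list and load order are assumptions:

```r
# Assumed package list: car (vif), MASS, dplyr (%>%, case_when),
# ggplot2 (plots) and scales (comma_format). car and MASS come before
# dplyr because the messages say dplyr masks objects from both.
pkgs <- c("car", "MASS", "dplyr", "ggplot2", "scales")

# Attach the packages that are installed and report any that are missing.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
for (p in pkgs[installed]) {
  library(p, character.only = TRUE)
}
if (any(!installed)) {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}
```

Loading dplyr last means its select and recode take precedence over the MASS and car versions, which matches the masking messages.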

## Loading required package: carData
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
## 
##     recode
## The following object is masked from 'package:MASS':
## 
##     select
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Load the dataset
salaries <- read.csv("ds_salaries.csv")

### 1. Exploratory Data Analysis and Underlying Information ####
# View the first rows
head(salaries)
##   work_year experience_level employment_type                job_title salary
## 1      2023               SE              FT Principal Data Scientist  80000
## 2      2023               MI              CT              ML Engineer  30000
## 3      2023               MI              CT              ML Engineer  25500
## 4      2023               SE              FT           Data Scientist 175000
## 5      2023               SE              FT           Data Scientist 120000
## 6      2023               SE              FT        Applied Scientist 222200
##   salary_currency salary_in_usd employee_residence remote_ratio
## 1             EUR         85847                 ES          100
## 2             USD         30000                 US          100
## 3             USD         25500                 US          100
## 4             USD        175000                 CA          100
## 5             USD        120000                 CA          100
## 6             USD        222200                 US            0
##   company_location company_size
## 1               ES            L
## 2               US            S
## 3               US            S
## 4               CA            M
## 5               CA            M
## 6               US            L
# Descriptive statistics
summary(salaries)
##    work_year    experience_level   employment_type     job_title        
##  Min.   :2020   Length:3755        Length:3755        Length:3755       
##  1st Qu.:2022   Class :character   Class :character   Class :character  
##  Median :2022   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2022                                                           
##  3rd Qu.:2023                                                           
##  Max.   :2023                                                           
##      salary         salary_currency    salary_in_usd    employee_residence
##  Min.   :    6000   Length:3755        Min.   :  5132   Length:3755       
##  1st Qu.:  100000   Class :character   1st Qu.: 95000   Class :character  
##  Median :  138000   Mode  :character   Median :135000   Mode  :character  
##  Mean   :  190696                      Mean   :137570                     
##  3rd Qu.:  180000                      3rd Qu.:175000                     
##  Max.   :30400000                      Max.   :450000                     
##   remote_ratio    company_location   company_size      
##  Min.   :  0.00   Length:3755        Length:3755       
##  1st Qu.:  0.00   Class :character   Class :character  
##  Median :  0.00   Mode  :character   Mode  :character  
##  Mean   : 46.27                                        
##  3rd Qu.:100.00                                        
##  Max.   :100.00

Histograms for the numeric variables

hist(salaries$salary_in_usd, breaks=50, main="Distribution of Salaries in USD", xlab="Salary (USD)", ylab="Frequency", xaxt="n")
axis(1, at=seq(0, max(salaries$salary_in_usd), by=20000), las=2, cex.axis=0.7, labels=format(seq(0, max(salaries$salary_in_usd), by=20000), big.mark=",", scientific=FALSE))

The distribution is positively skewed (right-skewed), with possible outliers in the right tail.
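The visual impression of right skew can be backed by a number. A minimal sketch of a moment-based skewness coefficient in base R; the helper name skewness and the toy values are assumptions for illustration, not part of the original analysis:

```r
# Moment-based sample skewness: mean cubed deviation divided by the
# 3/2 power of the mean squared deviation.
# Positive values indicate a longer right tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^(3 / 2)
}

# Hypothetical toy salaries, not taken from the dataset.
toy <- c(50000, 60000, 65000, 70000, 80000, 250000)
skewness(toy)         # positive: one large value stretches the right tail
skewness(c(1, 2, 3))  # 0: symmetric sample
```

On the real data the call would be skewness(salaries$salary_in_usd).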

Boxplots to compare salaries by experience level and employment type

boxplot(salary_in_usd ~ experience_level, data = salaries, yaxt = "n",
        main = "Salary Distribution by Experience Level",
        xlab = "Experience Level",
        ylab = "Salary in USD")

# Redraw the y axis with comma-separated tick labels
axis(2, at = axTicks(2), labels = format(axTicks(2), big.mark = ",", scientific = FALSE))

Boxplots comparing salary distribution by employment type

boxplot(salary_in_usd ~ employment_type, data = salaries, yaxt = 'n',
        main = "Salary Distribution by Employment Type",
        xlab = "Employment Type",
        ylab = "Salary in USD")

# Redraw the y axis with comma-separated tick labels
salario_ticks <- pretty(salaries$salary_in_usd)
axis(2, at = salario_ticks, labels = format(salario_ticks, big.mark = ",", scientific = FALSE))

Normality test for ‘salary_in_usd’

shapiro.test(salaries$salary_in_usd)
## 
##  Shapiro-Wilk normality test
## 
## data:  salaries$salary_in_usd
## W = 0.98273, p-value < 2.2e-16

W = 0.98273: the Shapiro-Wilk statistic is close to 1, so the departure from normality is modest in size.

p-value < 2.2e-16 leads us to reject the null hypothesis.

Because the p-value is extremely small, we reject the null hypothesis of normality and conclude that there is statistically significant evidence that the salary distribution is not normal. With 3755 observations the test detects even small departures, which is consistent with W being close to 1.
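With thousands of observations, even modest skew produces a tiny Shapiro-Wilk p-value, so a Q-Q plot is a useful complement to the test. A sketch on simulated data; the log-normal sample and its parameters are assumptions for illustration and do not come from the dataset:

```r
set.seed(1)
# Simulated right-skewed "salaries" (log-normal); purely illustrative.
sim <- rlnorm(3000, meanlog = 11.7, sdlog = 0.4)

res <- shapiro.test(sim)
res$statistic  # W, between 0 and 1
res$p.value    # very small for clearly skewed data at this sample size

# Visual check: points bending away from the line in the upper tail
# indicate right skew.
qqnorm(sim, main = "Normal Q-Q plot (simulated data)")
qqline(sim, col = "red")
```

The same two plotting calls applied to salaries$salary_in_usd would show where the real distribution departs from normality.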

Frequency of categories

table(salaries$experience_level)
## 
##   EN   EX   MI   SE 
##  320  114  805 2516

EN: 320 entries for individuals at entry level (“Entry level”).

EX: 114 entries for individuals at executive level (“Executive level”).

MI: 805 entries for individuals at mid level (“Mid level”).

SE: 2516 entries for individuals at senior level (“Senior level”).

This indicates that most of the data correspond to people at the “Senior” level, followed by those at the “Mid” level. The “Entry” and “Executive” levels are far less represented in this dataset.
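table() lists the codes alphabetically (EN, EX, MI, SE), which is not the seniority order. One possible way to get EN < MI < SE < EX in tables and boxplots is an ordered factor; the sample vector below is hypothetical:

```r
# Hypothetical sample of experience codes, not the real column.
lvl <- c("SE", "EN", "MI", "SE", "EX", "SE", "MI")

# Order the codes by seniority instead of alphabetically.
lvl_ordered <- factor(lvl, levels = c("EN", "MI", "SE", "EX"), ordered = TRUE)
table(lvl_ordered)
```

Applied to the dataset, this would mean assigning the factor to salaries$experience_level before tabulating or plotting.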

prop.table(table(salaries$employment_type))
## 
##          CT          FL          FT          PT 
## 0.002663116 0.002663116 0.990146471 0.004527297

CT (“Contract”): about 0.266% of the dataset.

FL (“Freelance”): also about 0.266% of the dataset.

FT (“Full-time”): the dominant category, about 99.01% of the dataset.

PT (“Part-time”): about 0.453% of the dataset.

These proportions show that the vast majority of records are full-time employees (“Full-time”). Contract, freelance, and part-time work are far less represented.
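Proportions this small are easier to judge as row counts. A short sketch that converts the proportions printed above back into counts, using the 3755 rows reported by summary(salaries):

```r
n <- 3755  # number of rows reported by summary(salaries)
props <- c(CT = 0.002663116, FL = 0.002663116,
           FT = 0.990146471, PT = 0.004527297)

counts <- round(props * n)
counts                 # rows per employment type
round(100 * props, 3)  # the same proportions as percentages
```

This gives 10 contract, 10 freelance, 3718 full-time, and 17 part-time rows, so any comparison involving the three small groups rests on very few observations.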

Trends over time

ggplot(salaries, aes(x = work_year, y = salary_in_usd)) + 
  geom_line(stat = "summary", fun = "mean")

The mean salary_in_usd shows an upward trend over the years.

Relationship between salary and experience level.

ggplot(salaries, aes(x = experience_level, y = salary_in_usd)) + 
  geom_boxplot() +
  scale_y_continuous(labels = comma_format())

The most noticeable outliers appear in two experience levels: MI (mid-level) and SE (senior-level).

Subgroup analysis: mean salary in USD by experience level

salaries %>%
  group_by(experience_level) %>%
  summarise(Mean_Salary = mean(salary_in_usd, na.rm = TRUE))
## # A tibble: 4 × 2
##   experience_level Mean_Salary
##   <chr>                  <dbl>
## 1 EN                    78546.
## 2 EX                   194931.
## 3 MI                   104526.
## 4 SE                   153051.

EX (executive level) clearly has the highest mean salary_in_usd.

Outlier detection

salaries$job_category <- case_when(
  salaries$job_title %in% c('Data Scientist', 'Senior Data Scientist', 'Junior Data Scientist', 'Lead Data Scientist', 'Principal Data Scientist', 'Data Science Lead', 'Data Science Manager', 'Data Science Tech Lead', 'Data Science Consultant', 'Data Science Engineer', 'Staff Data Scientist') ~ 'Data Scientist',
  salaries$job_title %in% c('Data Engineer', 'Senior Data Engineer', 'Lead Data Engineer', 'Principal Data Engineer', 'Data DevOps Engineer', 'Data Infrastructure Engineer', 'Data Operations Engineer', 'Data Quality Engineer', 'Software Data Engineer') ~ 'Data Engineer',
  salaries$job_title %in% c('Machine Learning Engineer', 'Senior Machine Learning Engineer', 'Lead Machine Learning Engineer', 'Principal Machine Learning Engineer', 'Machine Learning Infrastructure Engineer', 'Machine Learning Manager', 'Machine Learning Research Engineer', 'Machine Learning Researcher', 'Machine Learning Scientist', 'Machine Learning Software Engineer') ~ 'Machine Learning Professional',
  salaries$job_title %in% c('Data Analyst', 'Senior Data Analyst', 'Lead Data Analyst', 'Principal Data Analyst', 'Data Analytics Consultant', 'Data Analytics Engineer', 'Data Analytics Lead', 'Data Analytics Manager', 'Data Analytics Specialist', 'Business Data Analyst', 'Finance Data Analyst', 'Financial Data Analyst', 'Marketing Data Analyst', 'Product Data Analyst', 'BI Analyst', 'BI Data Analyst', 'BI Developer', 'Insight Analyst', 'Staff Data Analyst') ~ 'Data Analyst',
  salaries$job_title %in% c('Data Architect', 'Senior Data Architect', 'Lead Data Architect', 'Principal Data Architect', 'Cloud Data Architect', 'Big Data Architect', 'Big Data Engineer', 'Cloud Data Engineer', 'Cloud Database Engineer', 'Azure Data Engineer', 'ETL Developer', 'ETL Engineer', 'Data Manager', 'Data Lead', 'Data Modeler', 'Data Operations Analyst', 'Data Specialist', 'Data Strategist', 'Head of Data', 'Head of Data Science', 'Director of Data Science', 'NLP Engineer', 'Power BI Developer', 'Research Engineer', 'Research Scientist', 'Compliance Data Analyst', 'Computer Vision Engineer', 'Computer Vision Software Engineer', 'Deep Learning Engineer', 'Deep Learning Researcher', 'ML Engineer', 'MLOps Engineer', 'Machine Learning Developer', '3D Computer Vision Researcher', 'AI Developer', 'AI Programmer', 'AI Scientist', 'Analytics Engineer', 'Applied Data Scientist', 'Applied Machine Learning Engineer', 'Applied Machine Learning Scientist', 'Applied Scientist', 'Autonomous Vehicle Technician', 'Marketing Data Engineer', 'Product Data Scientist', 'Manager Data Management', 'Staff Data Scientist') ~ 'Other',
  TRUE ~ 'Unknown'  
)

ggplot(salaries, aes(x = job_category, y = salary_in_usd)) +
  geom_boxplot() +
  scale_y_continuous(labels = scales::comma_format()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Outliers are visible in Data Analyst, Data Scientist, and Other; these values need to be reviewed.
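Boxplots draw a point as an outlier when it lies more than 1.5 times the interquartile range beyond the box. A minimal sketch of how those points could be listed for review, using hypothetical salaries for a single job category:

```r
# Hypothetical salaries for one job category.
x <- c(60000, 75000, 80000, 90000, 95000, 110000, 120000, 400000)

# boxplot.stats() applies the 1.5 * IQR whisker rule used by boxplot().
out <- boxplot.stats(x)$out
out

# The same idea written out with quantile(); its quartile definition
# differs slightly from the hinges used by boxplot.stats().
q <- quantile(x, c(0.25, 0.75))
fence <- q[2] + 1.5 * (q[2] - q[1])
x[x > fence]
```

On the dataset, the same calls could be run per job_category to list the rows to inspect.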

2. Model Identification, Estimation, and Validation

Build a linear model and inspect it

model <- lm(salary_in_usd ~ experience_level + employment_type + job_title + remote_ratio + company_location + company_size, data = salaries)
# Model summary
#summary(model)

Collinearity check

vif(model)
##                          GVIF Df GVIF^(1/(2*Df))
## experience_level     2.023672  3        1.124665
## employment_type      3.150707  3        1.210788
## job_title         8769.100430 92        1.050580
## remote_ratio         1.117212  1        1.056983
## company_location 10786.523771 71        1.067580
## company_size         2.127758  2        1.207760
# Residuals vs. fitted plot, used to spot outliers
plot(model, which = 1)

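The collinearity check shows a very high raw GVIF for job_title (8769.10) and for company_location (10786.52), which suggests these terms overlap with other predictors in the model and could cause multicollinearity problems. Both are factors with many levels (92 and 71 degrees of freedom), however, and their adjusted values GVIF^(1/(2*Df)) are only about 1.05 and 1.07, so the raw figures should be read with caution.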
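For terms with more than one degree of freedom, vif() reports a generalized VIF, and the GVIF^(1/(2*Df)) column is the quantity usually compared across terms. A short sketch that recomputes it from the GVIF and Df values printed in the table above:

```r
# GVIF and Df copied from the vif(model) output.
gvif <- c(experience_level = 2.023672, employment_type = 3.150707,
          job_title = 8769.100430, remote_ratio = 1.117212,
          company_location = 10786.523771, company_size = 2.127758)
df <- c(3, 3, 92, 1, 71, 2)

adj <- gvif^(1 / (2 * df))  # comparable across terms with different Df
round(adj, 6)
round(adj^2, 3)             # squared, on the scale of an ordinary VIF
```

Squaring the adjusted value puts it on the same scale as the usual VIF rules of thumb.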

The residuals vs. fitted plot was checked for outliers; it shows a funnel-shaped pattern, which suggests that the residual variance may increase with the fitted values, a sign of heteroscedasticity.
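A funnel shape can also be checked numerically. The sketch below is a Breusch-Pagan-style check in base R (the n * R^2 form, with the fitted values as the only auxiliary regressor), run on simulated data; the data, the variable names, and the choice of this variant are assumptions, not part of the original analysis:

```r
set.seed(42)
# Simulated data whose error spread grows with x (an assumption for the demo).
x <- runif(500, 1, 10)
y <- 2 + 3 * x + rnorm(500, sd = 0.5 * x)
fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the fitted values.
aux <- lm(resid(fit)^2 ~ fitted(fit))
lm_stat <- length(x) * summary(aux)$r.squared  # n * R^2
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)
p_value  # small values point to heteroscedasticity
```

Replacing fit with the salary model would apply the same check to its residuals.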

Residual analysis

Checking homoscedasticity and normality of the residuals

# Data frame with the fitted values and the standardized residuals.

residuals_df <- data.frame(
  Fitted = fitted(model),
  Residuals = rstandard(model)
)


ggplot(residuals_df, aes(x = Fitted, y = Residuals)) +
  geom_point() +
  geom_smooth(method = "loess", color = "red") +
  scale_x_continuous(labels = scales::comma) +
  labs(x = "Fitted Values", y = "Standardized Residuals", title = "Standardized Residuals vs. Fitted Values")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 21 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 21 rows containing missing values (`geom_point()`).

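The residuals may be problematic: the smoothed trend is not horizontal and the points deviate from a constant band around zero, which suggests that the linearity and constant-variance assumptions do not hold well. This plot does not assess normality directly; a normal Q-Q plot of the residuals is the usual tool for that.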
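Normality of the residuals is usually judged from a normal Q-Q plot. A minimal sketch using the built-in cars data as a stand-in for the salary model (the stand-in model is an assumption; with the real model, fit would be model):

```r
# Built-in 'cars' data stands in for the salary model here.
fit <- lm(dist ~ speed, data = cars)
res <- rstandard(fit)

# rstandard() can return NaN when an observation has leverage 1 (for
# example, the only row in a factor level); drop those before plotting.
res <- res[is.finite(res)]

qqnorm(res, main = "Normal Q-Q plot of standardized residuals")
qqline(res, col = "red")
```

Non-finite standardized residuals of this kind are a likely source of the 21 rows removed in the warnings above.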

3. Assessing variable importance

  summary(model)
## 
## Call:
## lm(formula = salary_in_usd ~ experience_level + employment_type + 
##     job_title + remote_ratio + company_location + company_size, 
##     data = salaries)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -142770  -28986   -4855   22917  374890 
## 
## Coefficients:
##                                                     Estimate Std. Error t value
## (Intercept)                                         40922.53   50598.28   0.809
## experience_levelEX                                  85863.27    5826.97  14.736
## experience_levelMI                                  19836.33    3451.68   5.747
## experience_levelSE                                  45753.73    3256.03  14.052
## employment_typeFL                                  -50903.92   26146.89  -1.947
## employment_typeFT                                   -1858.64   17998.88  -0.103
## employment_typePT                                  -19565.68   22052.18  -0.887
## job_titleAI Developer                              113718.59   42033.93   2.705
## job_titleAI Programmer                               2202.74   52821.35   0.042
## job_titleAI Scientist                               42513.47   39179.25   1.085
## job_titleAnalytics Engineer                         24183.06   38374.39   0.630
## job_titleApplied Data Scientist                     56789.61   41354.10   1.373
## job_titleApplied Machine Learning Engineer          27507.25   50933.46   0.540
## job_titleApplied Machine Learning Scientist         26527.06   40579.43   0.654
## job_titleApplied Scientist                          59883.91   38673.05   1.548
## job_titleAutonomous Vehicle Technician              39349.80   77759.88   0.506
## job_titleAzure Data Engineer                        31834.64   62506.90   0.509
## job_titleBI Analyst                                  9323.83   41144.25   0.227
## job_titleBI Data Analyst                            -6371.31   40182.43  -0.159
## job_titleBI Data Engineer                          -29052.08   60920.66  -0.477
## job_titleBI Developer                                1558.63   40318.41   0.039
## job_titleBig Data Architect                         25798.45   50870.09   0.507
## job_titleBig Data Engineer                          25251.64   41272.17   0.612
## job_titleBusiness Data Analyst                       2807.35   39229.61   0.072
## job_titleBusiness Intelligence Engineer             37239.29   44846.83   0.830
## job_titleCloud Data Architect                      115791.68   60904.06   1.901
## job_titleCloud Data Engineer                        70412.91   53170.83   1.324
## job_titleCloud Database Engineer                    20524.17   43626.95   0.470
## job_titleCompliance Data Analyst                   -59623.68   65531.96  -0.910
## job_titleComputer Vision Engineer                   57393.35   39684.33   1.446
## job_titleComputer Vision Software Engineer          18275.26   43906.61   0.416
## job_titleData Analyst                               -6138.17   38104.38  -0.161
## job_titleData Analytics Consultant                  20852.07   51858.36   0.402
## job_titleData Analytics Engineer                     5522.40   43300.62   0.128
## job_titleData Analytics Lead                       144045.70   52995.80   2.718
## job_titleData Analytics Manager                     14738.93   39408.23   0.374
## job_titleData Analytics Specialist                 -41910.71   50716.20  -0.826
## job_titleData Architect                             28723.88   38389.12   0.748
## job_titleData DevOps Engineer                       15398.90   61711.63   0.250
## job_titleData Engineer                              21075.10   38105.00   0.553
## job_titleData Infrastructure Engineer               56822.49   42704.81   1.331
## job_titleData Lead                                  75589.29   50716.20   1.490
## job_titleData Management Specialist                  5049.79   67467.84   0.075
## job_titleData Manager                               -3282.80   39092.23  -0.084
## job_titleData Modeler                              -18010.71   50716.20  -0.355
## job_titleData Operations Analyst                   -45295.76   44853.63  -1.010
## job_titleData Operations Engineer                  -21191.35   40922.59  -0.518
## job_titleData Quality Analyst                      -44843.99   42210.69  -1.062
## job_titleData Science Consultant                     -106.74   39399.57  -0.003
## job_titleData Science Engineer                      -8079.04   43780.30  -0.185
## job_titleData Science Lead                          44975.63   41628.62   1.080
## job_titleData Science Manager                       62655.75   38591.15   1.624
## job_titleData Science Tech Lead                    240791.68   60904.06   3.954
## job_titleData Scientist                             26172.47   38112.87   0.687
## job_titleData Scientist Lead                        52765.62   51830.54   1.018
## job_titleData Specialist                            -5221.25   40158.44  -0.130
## job_titleData Strategist                           -35683.03   50955.83  -0.700
## job_titleDeep Learning Engineer                     22522.52   42752.12   0.527
## job_titleDeep Learning Researcher                   37340.04   61269.80   0.609
## job_titleDirector of Data Science                   62702.23   40996.16   1.529
## job_titleETL Developer                              14346.77   40988.02   0.350
## job_titleETL Engineer                               11284.45   50849.06   0.222
## job_titleFinance Data Analyst                      -23533.96   61039.04  -0.386
## job_titleFinancial Data Analyst                      -562.12   46984.48  -0.012
## job_titleHead of Data                               69427.65   41409.76   1.677
## job_titleHead of Data Science                       32948.17   41436.29   0.795
## job_titleHead of Machine Learning                   -6336.93   60951.27  -0.104
## job_titleInsight Analyst                           -15428.94   50820.90  -0.304
## job_titleLead Data Analyst                          -3815.25   43510.63  -0.088
## job_titleLead Data Engineer                         33501.53   43850.68   0.764
## job_titleLead Data Scientist                        40190.42   41450.07   0.970
## job_titleLead Machine Learning Engineer             30383.25   47055.36   0.646
## job_titleMachine Learning Developer                 12926.39   43230.21   0.299
## job_titleMachine Learning Engineer                  42041.42   38198.43   1.101
## job_titleMachine Learning Infrastructure Engineer   48987.64   40756.75   1.202
## job_titleMachine Learning Manager                   27837.24   46949.44   0.593
## job_titleMachine Learning Research Engineer         13905.57   44927.68   0.310
## job_titleMachine Learning Researcher                28144.30   43033.98   0.654
## job_titleMachine Learning Scientist                 60633.28   39349.46   1.541
## job_titleMachine Learning Software Engineer         85937.64   41276.16   2.082
## job_titleManager Data Management                    -8155.87   60917.64  -0.134
## job_titleMarketing Data Analyst                     90305.29   53171.70   1.698
## job_titleMarketing Data Engineer                    87194.55   68176.93   1.279
## job_titleML Engineer                                46202.36   38928.25   1.187
## job_titleMLOps Engineer                             20111.59   44866.09   0.448
## job_titleNLP Engineer                               44065.81   42469.34   1.038
## job_titlePower BI Developer                          8626.34   60819.12   0.142
## job_titlePrincipal Data Analyst                     23741.43   50920.42   0.466
## job_titlePrincipal Data Architect                   -3329.95   60776.55  -0.055
## job_titlePrincipal Data Engineer                    58519.16   50793.65   1.152
## job_titlePrincipal Data Scientist                   94728.30   41806.98   2.266
## job_titlePrincipal Machine Learning Engineer        56844.13   60917.64   0.933
## job_titleProduct Data Analyst                        8353.10   44701.64   0.187
## job_titleProduct Data Scientist                    -16476.74   68119.34  -0.242
## job_titleResearch Engineer                          51821.92   38906.24   1.332
## job_titleResearch Scientist                         56254.01   38463.74   1.463
## job_titleSoftware Data Engineer                     38064.99   54763.64   0.695
## job_titleStaff Data Analyst                       -141792.57   61154.49  -2.319
## job_titleStaff Data Scientist                      -31664.45   63207.64  -0.501
## remote_ratio                                          -21.05      16.82  -1.252
## company_locationAL                                 -49465.71   66971.40  -0.739
## company_locationAM                                 -26642.17   55034.34  -0.484
## company_locationAR                                 -17667.72   39812.74  -0.444
## company_locationAS                                 -15866.72   41906.24  -0.379
## company_locationAT                                 -24228.88   34697.39  -0.698
## company_locationAU                                  13960.85   31612.73   0.442
## company_locationBA                                  -7430.57   57955.60  -0.128
## company_locationBE                                  -5875.03   36674.85  -0.160
## company_locationBO                                 -78039.42   58277.70  -1.339
## company_locationBR                                 -36837.72   30807.91  -1.196
## company_locationBS                                  -4247.23   90001.77  -0.047
## company_locationCA                                  30215.47   28590.97   1.057
## company_locationCF                                 -29615.50   43791.20  -0.676
## company_locationCH                                   8300.04   35303.73   0.235
## company_locationCL                                 -42929.79   55148.61  -0.778
## company_locationCN                                   4682.10   55445.26   0.084
## company_locationCO                                 -28363.63   37179.08  -0.763
## company_locationCR                                  37340.47   67074.00   0.557
## company_locationCZ                                 -59335.83   40474.22  -1.466
## company_locationDE                                   1247.34   28780.76   0.043
## company_locationDK                                 -34747.35   39231.59  -0.886
## company_locationDZ                                  51873.19   56727.54   0.914
## company_locationEE                                 -58514.57   44380.25  -1.318
## company_locationEG                                 -96278.54   56002.60  -1.719
## company_locationES                                 -38479.82   28639.95  -1.344
## company_locationFI                                 -70183.39   39952.77  -1.757
## company_locationFR                                 -21210.60   29278.86  -0.724
## company_locationGB                                   1664.78   28313.33   0.059
## company_locationGH                                 -47114.22   56583.13  -0.833
## company_locationGR                                 -16344.16   30957.25  -0.528
## company_locationHK                                 -20010.69   55154.27  -0.363
## company_locationHN                                 -22953.85   60031.85  -0.382
## company_locationHR                                   6208.11   39124.96   0.159
## company_locationHU                                 -47691.32   43791.52  -1.089
## company_locationID                                 -23738.76   43804.62  -0.542
## company_locationIE                                  11011.87   32973.08   0.334
## company_locationIL                                 165472.09   44248.33   3.740
## company_locationIN                                 -41228.78   28731.48  -1.435
## company_locationIQ                                  73361.64   58634.92   1.251
## company_locationIR                                  68394.93   58591.22   1.167
## company_locationIT                                 -21574.41   39520.98  -0.546
## company_locationJP                                  28594.42   34282.33   0.834
## company_locationKE                                  -3150.84   44934.70  -0.070
## company_locationLT                                  13186.74   43781.90   0.301
## company_locationLU                                 -21220.47   39604.69  -0.536
## company_locationLV                                 -62628.01   36742.73  -1.705
## company_locationMA                                 -89649.62   57389.00  -1.562
## company_locationMD                                 -41852.39   57356.78  -0.730
## company_locationMK                                -122179.01   57964.18  -2.108
## company_locationMT                                 -50553.87   55137.88  -0.917
## company_locationMX                                  -4810.68   31838.21  -0.151
## company_locationMY                                 -23131.46   55216.38  -0.419
## company_locationNG                                  52664.69   37213.12   1.415
## company_locationNL                                 -15599.82   31276.03  -0.499
## company_locationNZ                                  32032.76   58976.03   0.543
## company_locationPH                                  -2275.09   55086.25  -0.041
## company_locationPK                                 -46107.52   37883.63  -1.217
## company_locationPL                                 -43009.43   35484.50  -1.212
## company_locationPR                                  51579.07   36739.75   1.404
## company_locationPT                                 -55330.78   30780.42  -1.798
## company_locationRO                                 -27557.80   44494.86  -0.619
## company_locationRU                                 -39803.51   40136.68  -0.992
## company_locationSE                                   9354.43   45439.55   0.206
## company_locationSG                                 -58235.99   41434.85  -1.405
## company_locationSI                                 -40000.82   36964.75  -1.082
## company_locationSK                                -116218.17   66181.31  -1.756
## company_locationTH                                 -50007.00   39971.50  -1.251
## company_locationTR                                 -69637.22   35225.33  -1.977
## company_locationUA                                 -67667.13   38508.31  -1.757
## company_locationUS                                  50443.14   28115.27   1.794
## company_locationVN                                 -48138.98   55214.29  -0.872
## company_sizeM                                        1649.94    2809.86   0.587
## company_sizeS                                      -24299.47    5249.44  -4.629
##                                                   Pr(>|t|)    
## (Intercept)                                       0.418699    
## experience_levelEX                                 < 2e-16 ***
## experience_levelMI                                9.85e-09 ***
## experience_levelSE                                 < 2e-16 ***
## employment_typeFL                                 0.051631 .  
## employment_typeFT                                 0.917759    
## employment_typePT                                 0.375007    
## job_titleAI Developer                             0.006855 ** 
## job_titleAI Programmer                            0.966739    
## job_titleAI Scientist                             0.277950    
## job_titleAnalytics Engineer                       0.528612    
## job_titleApplied Data Scientist                   0.169760    
## job_titleApplied Machine Learning Engineer        0.589188    
## job_titleApplied Machine Learning Scientist       0.513343    
## job_titleApplied Scientist                        0.121598    
## job_titleAutonomous Vehicle Technician            0.612858    
## job_titleAzure Data Engineer                      0.610575    
## job_titleBI Analyst                               0.820738    
## job_titleBI Data Analyst                          0.874025    
## job_titleBI Data Engineer                         0.633474    
## job_titleBI Developer                             0.969165    
## job_titleBig Data Architect                       0.612085    
## job_titleBig Data Engineer                        0.540688    
## job_titleBusiness Data Analyst                    0.942954    
## job_titleBusiness Intelligence Engineer           0.406387    
## job_titleCloud Data Architect                     0.057354 .  
## job_titleCloud Data Engineer                      0.185496    
## job_titleCloud Database Engineer                  0.638064    
## job_titleCompliance Data Analyst                  0.362967    
## job_titleComputer Vision Engineer                 0.148195    
## job_titleComputer Vision Software Engineer        0.677267    
## job_titleData Analyst                             0.872033    
## job_titleData Analytics Consultant                0.687637    
## job_titleData Analytics Engineer                  0.898523    
## job_titleData Analytics Lead                      0.006598 ** 
## job_titleData Analytics Manager                   0.708422    
## job_titleData Analytics Specialist                0.408645    
## job_titleData Architect                           0.454371    
## job_titleData DevOps Engineer                     0.802965    
## job_titleData Engineer                            0.580243    
## job_titleData Infrastructure Engineer             0.183409    
## job_titleData Lead                                0.136197    
## job_titleData Management Specialist               0.940340    
## job_titleData Manager                             0.933080    
## job_titleData Modeler                             0.722515    
## job_titleData Operations Analyst                  0.312632    
## job_titleData Operations Engineer                 0.604602    
## job_titleData Quality Analyst                     0.288133    
## job_titleData Science Consultant                  0.997838    
## job_titleData Science Engineer                    0.853603    
## job_titleData Science Lead                        0.280036    
## job_titleData Science Manager                     0.104554    
## job_titleData Science Tech Lead                   7.85e-05 ***
## job_titleData Scientist                           0.492310    
## job_titleData Scientist Lead                      0.308727    
## job_titleData Specialist                          0.896561    
## job_titleData Strategist                          0.483802    
## job_titleDeep Learning Engineer                   0.598354    
## job_titleDeep Learning Researcher                 0.542274    
## job_titleDirector of Data Science                 0.126237    
## job_titleETL Developer                            0.726342    
## job_titleETL Engineer                             0.824388    
## job_titleFinance Data Analyst                     0.699849    
## job_titleFinancial Data Analyst                   0.990455    
## job_titleHead of Data                             0.093708 .  
## job_titleHead of Data Science                     0.426577    
## job_titleHead of Machine Learning                 0.917201    
## job_titleInsight Analyst                          0.761454    
## job_titleLead Data Analyst                        0.930132    
## job_titleLead Data Engineer                       0.444923    
## job_titleLead Data Scientist                      0.332306    
## job_titleLead Machine Learning Engineer           0.518520    
## job_titleMachine Learning Developer               0.764948    
## job_titleMachine Learning Engineer                0.271142    
## job_titleMachine Learning Infrastructure Engineer 0.229462    
## job_titleMachine Learning Manager                 0.553272    
## job_titleMachine Learning Research Engineer       0.756952    
## job_titleMachine Learning Researcher              0.513153    
## job_titleMachine Learning Scientist               0.123431    
## job_titleMachine Learning Software Engineer       0.037412 *  
## job_titleManager Data Management                  0.893502    
## job_titleMarketing Data Analyst                   0.089525 .  
## job_titleMarketing Data Engineer                  0.200999    
## job_titleML Engineer                              0.235362    
## job_titleMLOps Engineer                           0.653994    
## job_titleNLP Engineer                             0.299530    
## job_titlePower BI Developer                       0.887218    
## job_titlePrincipal Data Analyst                   0.641068    
## job_titlePrincipal Data Architect                 0.956309    
## job_titlePrincipal Data Engineer                  0.249359    
## job_titlePrincipal Data Scientist                 0.023520 *  
## job_titlePrincipal Machine Learning Engineer      0.350815    
## job_titleProduct Data Analyst                     0.851778    
## job_titleProduct Data Scientist                   0.808887    
## job_titleResearch Engineer                        0.182955    
## job_titleResearch Scientist                       0.143686    
## job_titleSoftware Data Engineer                   0.487052    
## job_titleStaff Data Analyst                       0.020473 *  
## job_titleStaff Data Scientist                     0.616431    
## remote_ratio                                      0.210786    
## company_locationAL                                0.460193    
## company_locationAM                                0.628344    
## company_locationAR                                0.657235    
## company_locationAS                                0.704989    
## company_locationAT                                0.485040    
## company_locationAU                                0.658790    
## company_locationBA                                0.897989    
## company_locationBE                                0.872739    
## company_locationBO                                0.180625    
## company_locationBR                                0.231884    
## company_locationBS                                0.962364    
## company_locationCA                                0.290666    
## company_locationCF                                0.498901    
## company_locationCH                                0.814142    
## company_locationCL                                0.436362    
## company_locationCN                                0.932707    
## company_locationCO                                0.445578    
## company_locationCR                                0.577763    
## company_locationCZ                                0.142732    
## company_locationDE                                0.965433    
## company_locationDK                                0.375840    
## company_locationDZ                                0.360554    
## company_locationEE                                0.187427    
## company_locationEG                                0.085668 .  
## company_locationES                                0.179172    
## company_locationFI                                0.079061 .  
## company_locationFR                                0.468847    
## company_locationGB                                0.953116    
## company_locationGH                                0.405095    
## company_locationGR                                0.597561    
## company_locationHK                                0.716766    
## company_locationHN                                0.702216    
## company_locationHR                                0.873935    
## company_locationHU                                0.276203    
## company_locationID                                0.587905    
## company_locationIE                                0.738425    
## company_locationIL                                0.000187 ***
## company_locationIN                                0.151383    
## company_locationIQ                                0.210958    
## company_locationIR                                0.243157    
## company_locationIT                                0.585170    
## company_locationJP                                0.404288    
## company_locationKE                                0.944102    
## company_locationLT                                0.763286    
## company_locationLU                                0.592125    
## company_locationLV                                0.088374 .  
## company_locationMA                                0.118343    
## company_locationMD                                0.465630    
## company_locationMK                                0.035114 *  
## company_locationMT                                0.359276    
## company_locationMX                                0.879907    
## company_locationMY                                0.675297    
## company_locationNG                                0.157091    
## company_locationNL                                0.617966    
## company_locationNZ                                0.587061    
## company_locationPH                                0.967059    
## company_locationPK                                0.223653    
## company_locationPL                                0.225568    
## company_locationPR                                0.160434    
## company_locationPT                                0.072325 .  
## company_locationRO                                0.535727    
## company_locationRU                                0.321411    
## company_locationSE                                0.836908    
## company_locationSG                                0.159964    
## company_locationSI                                0.279266    
## company_locationSK                                0.079164 .  
## company_locationTH                                0.210992    
## company_locationTR                                0.048129 *  
## company_locationUA                                0.078968 .  
## company_locationUS                                0.072873 .  
## company_locationVN                                0.383345    
## company_sizeM                                     0.557108    
## company_sizeS                                     3.81e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 47370 on 3582 degrees of freedom
## Multiple R-squared:  0.4616, Adjusted R-squared:  0.4357 
## F-statistic: 17.85 on 172 and 3582 DF,  p-value: < 2.2e-16

The coefficients for experience_levelEX, experience_levelMI, and experience_levelSE are significant, with very low p-values. This suggests that experience level has a considerable effect on salary.

Among the job titles, job_titleAI Developer, job_titleData Analytics Lead, and job_titleData Science Tech Lead also have significant positive coefficients. These titles are therefore associated with higher salaries than the reference job title, holding the other variables fixed.
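With well over a hundred dummy coefficients, reading the summary table by eye is error-prone. The following is a minimal sketch of how the significant terms and the sign of their estimates could be pulled out programmatically. It uses a small synthetic data set (`toy`, `toy_model` and the simulated salaries are assumptions made for illustration, not the Kaggle data or the fitted `model`).

```r
# Synthetic data for illustration only; NOT the Kaggle salary data
set.seed(1)
toy <- data.frame(
  experience_level = factor(rep(c("EN", "MI", "SE"), each = 20)),
  company_size     = factor(rep(c("L", "S"), times = 30))
)
toy$salary_in_usd <- 60000 +
  30000 * (toy$experience_level == "MI") +
  70000 * (toy$experience_level == "SE") -
  15000 * (toy$company_size == "S") +
  rnorm(nrow(toy), sd = 5000)

toy_model <- lm(salary_in_usd ~ experience_level + company_size, data = toy)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(toy_model))

# Keep terms with p < 0.05, together with the estimate (to check its sign)
sig <- coefs[coefs[, "Pr(>|t|)"] < 0.05, c("Estimate", "Pr(>|t|)"), drop = FALSE]

# Sort from most to least significant
sig <- sig[order(sig[, "Pr(>|t|)"]), , drop = FALSE]
print(sig)
```

Applied to the real model, the same filter on `coef(summary(model))` would list every significant term, and the `Estimate` column would confirm whether each effect is positive or negative relative to the reference level.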

4. Making Predictions

Prediction 1

new_data1 <- data.frame(experience_level = "SE", employment_type = "FT", job_title = "Data Scientist", remote_ratio = 100, company_location = "US", company_size = "L")
predicted_salary1 <- predict(model, newdata = new_data1)
predicted_salary1
##        1 
## 159328.3

First prediction (predicted_salary1): for a profile with experience level “SE” (senior), employment type “FT” (full-time), job title “Data Scientist”, fully remote work (remote_ratio = 100), at a large company (company_size “L”) located in the US, the estimated salary is approximately 159,328.30 USD.
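A point estimate on its own hides how uncertain the prediction is, and the model summary reports a residual standard error of 47,370. The sketch below shows how `predict()` can return a 95% prediction interval alongside the fitted value. It runs on synthetic data (`toy`, `toy_model` and `toy_profile` are assumptions for illustration, not the Kaggle data).

```r
# Synthetic data for illustration only; NOT the Kaggle salary data
set.seed(1)
toy <- data.frame(
  experience_level = factor(rep(c("EN", "MI", "SE"), each = 20)),
  company_size     = factor(rep(c("L", "S"), times = 30))
)
toy$salary_in_usd <- 60000 +
  30000 * (toy$experience_level == "MI") +
  70000 * (toy$experience_level == "SE") -
  15000 * (toy$company_size == "S") +
  rnorm(nrow(toy), sd = 5000)

toy_model <- lm(salary_in_usd ~ experience_level + company_size, data = toy)

# Hypothetical profile to predict for
toy_profile <- data.frame(experience_level = "SE", company_size = "L")

# interval = "prediction" returns a matrix with columns fit, lwr, upr
pred <- predict(toy_model, newdata = toy_profile,
                interval = "prediction", level = 0.95)
print(pred)
```

With the real model, the equivalent call would be `predict(model, newdata = new_data1, interval = "prediction")`. The `lwr` and `upr` columns give the range in which an individual salary for that profile is expected to fall.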

Prediction 2

new_data2 <- data.frame(experience_level = "MI", employment_type = "PT", job_title = "Data Analyst", remote_ratio = 50, company_location = "CA", company_size = "S")
predicted_salary2 <- predict(model, newdata = new_data2)
predicted_salary2

Second prediction (predicted_salary2): for a profile with less experience, “MI” (mid-level), in part-time employment (“PT”), with job title “Data Analyst”, a remote-work ratio of 50%, at a small company (company_size “S”) located in Canada (“CA”), the estimated salary is approximately 39,918.56 USD.
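Because every categorical predictor is stored as a set of factor levels, `predict()` stops with an error if a new profile contains a level the model never saw during fitting. The sketch below checks a profile against the levels recorded in the fitted model before predicting. The helper `check_levels()` is hypothetical, and the data are synthetic (assumptions for illustration, not the Kaggle data).

```r
# Synthetic data for illustration only; NOT the Kaggle salary data
set.seed(1)
toy <- data.frame(
  experience_level = factor(rep(c("EN", "MI", "SE"), each = 20)),
  company_size     = factor(rep(c("L", "S"), times = 30))
)
toy$salary_in_usd <- 60000 +
  30000 * (toy$experience_level == "MI") +
  70000 * (toy$experience_level == "SE") -
  15000 * (toy$company_size == "S") +
  rnorm(nrow(toy), sd = 5000)

toy_model <- lm(salary_in_usd ~ experience_level + company_size, data = toy)

# Hypothetical helper: returns a named list of levels in `newdata`
# that are absent from the factor levels stored in the fitted model
check_levels <- function(model, newdata) {
  unseen <- list()
  for (v in names(model$xlevels)) {
    bad <- setdiff(as.character(newdata[[v]]), model$xlevels[[v]])
    if (length(bad) > 0) unseen[[v]] <- bad
  }
  unseen
}

ok_profile  <- data.frame(experience_level = "MI", company_size = "S")
bad_profile <- data.frame(experience_level = "EX", company_size = "S")

print(check_levels(toy_model, ok_profile))   # empty list: safe to predict
print(check_levels(toy_model, bad_profile))  # flags the unseen level "EX"
```

The same check could be run as `check_levels(model, new_data2)` before calling `predict()`. It does not tell how many training rows support a given combination of levels, so a sparsely represented profile can still yield an unreliable estimate.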