This analysis uses a dataset with information on the salaries of data scientists as of 2023; the work_year column spans 2020 to 2023. The dataset has 3,755 rows and the following variables, which describe different aspects of each position and its pay: work_year, experience_level, employment_type, job_title, salary, salary_currency, salary_in_usd, employee_residence, remote_ratio, company_location, and company_size.
The dataset gives a broad view of compensation trends in data science and supports detailed analysis of how variables such as experience, employment type, and geographic location, among others, can influence these professionals' salaries.
Loading libraries
## Loading required package: carData
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Load the dataset
salaries <- read.csv("ds_salaries.csv")
### 1. Exploratory Data Analysis and Underlying Insights ####
# View the first rows
head(salaries)
## work_year experience_level employment_type job_title salary
## 1 2023 SE FT Principal Data Scientist 80000
## 2 2023 MI CT ML Engineer 30000
## 3 2023 MI CT ML Engineer 25500
## 4 2023 SE FT Data Scientist 175000
## 5 2023 SE FT Data Scientist 120000
## 6 2023 SE FT Applied Scientist 222200
## salary_currency salary_in_usd employee_residence remote_ratio
## 1 EUR 85847 ES 100
## 2 USD 30000 US 100
## 3 USD 25500 US 100
## 4 USD 175000 CA 100
## 5 USD 120000 CA 100
## 6 USD 222200 US 0
## company_location company_size
## 1 ES L
## 2 US S
## 3 US S
## 4 CA M
## 5 CA M
## 6 US L
# Descriptive statistics
summary(salaries)
## work_year experience_level employment_type job_title
## Min. :2020 Length:3755 Length:3755 Length:3755
## 1st Qu.:2022 Class :character Class :character Class :character
## Median :2022 Mode :character Mode :character Mode :character
## Mean :2022
## 3rd Qu.:2023
## Max. :2023
## salary salary_currency salary_in_usd employee_residence
## Min. : 6000 Length:3755 Min. : 5132 Length:3755
## 1st Qu.: 100000 Class :character 1st Qu.: 95000 Class :character
## Median : 138000 Mode :character Median :135000 Mode :character
## Mean : 190696 Mean :137570
## 3rd Qu.: 180000 3rd Qu.:175000
## Max. :30400000 Max. :450000
## remote_ratio company_location company_size
## Min. : 0.00 Length:3755 Length:3755
## 1st Qu.: 0.00 Class :character Class :character
## Median : 0.00 Mode :character Mode :character
## Mean : 46.27
## 3rd Qu.:100.00
## Max. :100.00
hist(salaries$salary_in_usd, breaks=50, main="Distribution of Salaries in USD", xlab="Salary (USD)", ylab="Frequency", xaxt="n")
# Custom x axis: one tick every 20,000 USD, labelled with thousands separators
usd_ticks <- seq(0, max(salaries$salary_in_usd), by=20000)
axis(1, at=usd_ticks, las=2, cex.axis=0.7, labels=format(usd_ticks, big.mark=",", scientific=FALSE))
The distribution is right-skewed (positively skewed), with possible outliers in the upper tail.
boxplot(salary_in_usd ~ experience_level, data = salaries, yaxt = "n",
        main = "Salary Distribution by Experience Level",
        xlab = "Experience Level",
        ylab = "Salary in USD")
axis(2, at = axTicks(2), labels = format(axTicks(2), big.mark = ",", scientific = FALSE))
boxplot(salary_in_usd ~ employment_type, data = salaries, yaxt = 'n',
        main = "Salary Distribution by Employment Type",
        xlab = "Employment Type",
        ylab = "Salary in USD")
salario_ticks <- pretty(salaries$salary_in_usd)
axis(2, at = salario_ticks, labels = format(salario_ticks, big.mark = ",", scientific = FALSE))
shapiro.test(salaries$salary_in_usd)
##
## Shapiro-Wilk normality test
##
## data: salaries$salary_in_usd
## W = 0.98273, p-value < 2.2e-16
W = 0.98273: the statistic is close to 1, so the departure from normality is modest in size, but with 3,755 observations the test detects even small deviations.
p-value < 2.2e-16: this leads us to reject the null hypothesis of normality.
Because the p-value is extremely small, we reject the null hypothesis and conclude that there is
statistically significant evidence that the salaries do not follow a normal distribution.
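A common follow-up to a rejected normality test on right-skewed data is to check whether a log transform brings the variable closer to symmetry. The sketch below is only an illustration on simulated data: `sim_salary` and the `skewness()` helper are assumptions introduced here, not objects from the analysis, and nothing in the report states that a log transform was applied.

```r
# Simulated right-skewed values (an assumption; stands in for salaries$salary_in_usd)
set.seed(1)
sim_salary <- rlnorm(3000, meanlog = log(130000), sdlog = 0.4)

# Moment-based sample skewness; positive values indicate a longer right tail
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}
skew_raw <- skewness(sim_salary)
skew_log <- skewness(log(sim_salary))

# Shapiro-Wilk on the raw and log scales (the test accepts 3 to 5000 observations)
p_raw <- shapiro.test(sim_salary)$p.value
p_log <- shapiro.test(log(sim_salary))$p.value

c(skew_raw = skew_raw, skew_log = skew_log, p_raw = p_raw, p_log = p_log)
```

On the real data the same two calls would use `salaries$salary_in_usd` in place of `sim_salary`; whether the log scale is adequate there would still need to be checked.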
table(salaries$experience_level)
##
## EN EX MI SE
## 320 114 805 2516
EN: 320 entries for individuals at entry level.
EX: 114 entries for individuals at executive level.
MI: 805 entries for individuals at mid level.
SE: 2,516 entries for individuals at senior level.
Most of the data therefore corresponds to senior-level professionals, followed by mid-level ones. The entry and executive levels have far less representation in this dataset.
prop.table(table(salaries$employment_type))
##
## CT FL FT PT
## 0.002663116 0.002663116 0.990146471 0.004527297
CT (“Contract”): approximately 0.266% of the dataset.
FL (“Freelance”): also approximately 0.266% of the dataset.
FT (“Full-time”): the dominant category, approximately 99.01% of the dataset.
PT (“Part-time”): approximately 0.453% of the dataset.
These proportions show that the vast majority of records correspond to full-time employees. Contract, freelance, and part-time positions have far less representation.
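The percentages quoted above can be produced directly rather than converted by hand. The following is a minimal sketch; the counts (10, 10, 3718, 17) are reconstructed from the reported proportions and n = 3755, not read from the CSV file, and `employment_type` here is a stand-alone vector used only for illustration.

```r
# Counts implied by the proportions above and n = 3755 (reconstructed, not read from the file)
employment_type <- rep(c("CT", "FL", "FT", "PT"), times = c(10, 10, 3718, 17))

# Share of each category, expressed as a percentage with three decimals
pct <- round(100 * prop.table(table(employment_type)), 3)
pct
```

With the real data, the same expression would be applied to `salaries$employment_type`.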
ggplot(salaries, aes(x = work_year, y = salary_in_usd)) +
geom_line(stat = "summary", fun = "mean")
The mean salary_in_usd shows an upward trend across work years.
ggplot(salaries, aes(x = experience_level, y = salary_in_usd)) +
geom_boxplot() +
scale_y_continuous(labels = scales::comma_format())
The most conspicuous outliers across experience_level appear in two groups:
MI (mid-level) and SE (senior-level).
salaries %>%
group_by(experience_level) %>%
summarise(Mean_Salary = mean(salary_in_usd, na.rm = TRUE))
## # A tibble: 4 × 2
## experience_level Mean_Salary
## <chr> <dbl>
## 1 EN 78546.
## 2 EX 194931.
## 3 MI 104526.
## 4 SE 153051.
EX (executive level) clearly has the highest mean salary_in_usd.
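Because experience_level is stored as character, tables and plots list the levels alphabetically (EN, EX, MI, SE) rather than by seniority. One possible way to get the career order is to set the factor levels explicitly. The sketch below uses a toy data frame, `toy`, built from the four group means reported above; it is an illustration, not part of the original analysis.

```r
# Toy data frame using the four group means reported above (one row per level)
toy <- data.frame(
  experience_level = c("EN", "EX", "MI", "SE"),
  mean_salary = c(78546, 194931, 104526, 153051)
)

# Impose the career order instead of the default alphabetical order
toy$experience_level <- factor(
  toy$experience_level,
  levels = c("EN", "MI", "SE", "EX")
)

# Sorting by the factor now follows seniority
toy_sorted <- toy[order(toy$experience_level), ]
toy_sorted
```

Keeping the factor unordered (no `ordered = TRUE`) leaves EN as the first level, so a model fitted with it would still use treatment contrasts with EN as the baseline.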
salaries$job_category <- case_when(
salaries$job_title %in% c('Data Scientist', 'Senior Data Scientist', 'Junior Data Scientist', 'Lead Data Scientist', 'Principal Data Scientist', 'Data Science Lead', 'Data Science Manager', 'Data Science Tech Lead', 'Data Science Consultant', 'Data Science Engineer', 'Staff Data Scientist') ~ 'Data Scientist',
salaries$job_title %in% c('Data Engineer', 'Senior Data Engineer', 'Lead Data Engineer', 'Principal Data Engineer', 'Data DevOps Engineer', 'Data Infrastructure Engineer', 'Data Operations Engineer', 'Data Quality Engineer', 'Software Data Engineer') ~ 'Data Engineer',
salaries$job_title %in% c('Machine Learning Engineer', 'Senior Machine Learning Engineer', 'Lead Machine Learning Engineer', 'Principal Machine Learning Engineer', 'Machine Learning Infrastructure Engineer', 'Machine Learning Manager', 'Machine Learning Research Engineer', 'Machine Learning Researcher', 'Machine Learning Scientist', 'Machine Learning Software Engineer') ~ 'Machine Learning Professional',
salaries$job_title %in% c('Data Analyst', 'Senior Data Analyst', 'Lead Data Analyst', 'Principal Data Analyst', 'Data Analytics Consultant', 'Data Analytics Engineer', 'Data Analytics Lead', 'Data Analytics Manager', 'Data Analytics Specialist', 'Business Data Analyst', 'Finance Data Analyst', 'Financial Data Analyst', 'Marketing Data Analyst', 'Product Data Analyst', 'BI Analyst', 'BI Data Analyst', 'BI Developer', 'Insight Analyst', 'Staff Data Analyst') ~ 'Data Analyst',
salaries$job_title %in% c('Data Architect', 'Senior Data Architect', 'Lead Data Architect', 'Principal Data Architect', 'Cloud Data Architect', 'Big Data Architect', 'Big Data Engineer', 'Cloud Data Engineer', 'Cloud Database Engineer', 'Azure Data Engineer', 'ETL Developer', 'ETL Engineer', 'Data Manager', 'Data Lead', 'Data Modeler', 'Data Operations Analyst', 'Data Specialist', 'Data Strategist', 'Head of Data', 'Head of Data Science', 'Director of Data Science', 'NLP Engineer', 'Power BI Developer', 'Research Engineer', 'Research Scientist', 'Compliance Data Analyst', 'Computer Vision Engineer', 'Computer Vision Software Engineer', 'Deep Learning Engineer', 'Deep Learning Researcher', 'ML Engineer', 'MLOps Engineer', 'Machine Learning Developer', '3D Computer Vision Researcher', 'AI Developer', 'AI Programmer', 'AI Scientist', 'Analytics Engineer', 'Applied Data Scientist', 'Applied Machine Learning Engineer', 'Applied Machine Learning Scientist', 'Applied Scientist', 'Autonomous Vehicle Technician', 'Marketing Data Engineer', 'Product Data Scientist', 'Manager Data Management') ~ 'Other',
TRUE ~ 'Unknown'
)
ggplot(salaries, aes(x = job_category, y = salary_in_usd)) +
geom_boxplot() +
scale_y_continuous(labels = scales::comma_format()) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Outliers are visible in the Data Analyst, Data Scientist, and Other categories; these values will need to be reviewed.
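To review those outliers, one option is to list the observations that fall outside the 1.5 × IQR rule that boxplots use to draw individual points. The sketch below is an assumption-laden illustration on simulated values (`sim` is hypothetical, not taken from ds_salaries.csv); it relies on `boxplot.stats()` from base R.

```r
set.seed(42)
# Simulated salaries for one category (hypothetical values, not from ds_salaries.csv)
sim <- c(rnorm(200, mean = 120000, sd = 25000), 400000, 430000)

# Tukey's rule: points beyond 1.5 * IQR from the hinges are reported in $out
stats <- boxplot.stats(sim)
outliers <- stats$out

# Equivalent manual bounds computed from the lower and upper hinges
hinge_spread <- diff(stats$stats[c(2, 4)])
lower <- stats$stats[2] - 1.5 * hinge_spread
upper <- stats$stats[4] + 1.5 * hinge_spread
flagged <- sim[sim < lower | sim > upper]

sort(outliers)
```

Applied per category, the same call could be wrapped in `tapply(salaries$salary_in_usd, salaries$job_category, function(x) boxplot.stats(x)$out)`.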
Build a linear model and inspect it
model <- lm(salary_in_usd ~ experience_level + employment_type + job_title + remote_ratio + company_location + company_size, data = salaries)
# Model summary
#summary(model)
vif(model)
## GVIF Df GVIF^(1/(2*Df))
## experience_level 2.023672 3 1.124665
## employment_type 3.150707 3 1.210788
## job_title 8769.100430 92 1.050580
## remote_ratio 1.117212 1 1.056983
## company_location 10786.523771 71 1.067580
## company_size 2.127758 2 1.207760
# Residuals vs. fitted plot, used to identify outliers
plot(model, which = 1)
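Collinearity was checked with vif(). job_title has an extremely high raw GVIF (8769.10), and company_location an even higher one (10786.52). Both are factors with many levels (92 and 71 degrees of freedom), and the raw GVIF grows with the number of levels, so the comparable measure is GVIF^(1/(2*Df)). That value lies between 1.05 and 1.21 for every term, so the output by itself does not indicate a serious multicollinearity problem.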
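The adjusted column of the vif() output can be reproduced by hand, which makes clear why a GVIF in the thousands does not by itself signal trouble. The sketch below copies the GVIF and Df values from the output above; the vector names `gvif`, `df_terms`, and `adj` are introduced here for illustration.

```r
# GVIF and degrees of freedom copied from the vif() output above
gvif <- c(experience_level = 2.023672, employment_type = 3.150707,
          job_title = 8769.100430, remote_ratio = 1.117212,
          company_location = 10786.523771, company_size = 2.127758)
df_terms <- c(3, 3, 92, 1, 71, 2)

# Adjusted measure, comparable across terms with different Df
adj <- gvif^(1 / (2 * df_terms))
round(adj, 4)

# Squaring it gives a value on the scale of an ordinary VIF
round(adj^2, 3)
```

The squared values can then be compared against whatever VIF rule of thumb the analyst prefers.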
The residuals-versus-fitted plot was checked for outliers. It shows a funnel-shaped pattern, which suggests that the variance of the residuals increases with the fitted values, a sign of heteroscedasticity.
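A funnel shape is a visual cue; a formal check can complement it. The sketch below computes a studentized Breusch-Pagan-type statistic by hand with base R, regressing the squared residuals on the fitted values. It runs on simulated data (`x`, `y`, and `fit` are hypothetical), and it is one possible check, not the test used in this report. Packaged alternatives such as `car::ncvTest()` or `lmtest::bptest()` exist.

```r
set.seed(7)
# Simulated data whose error spread grows with x (assumption for illustration)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)
fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the fitted values
sq_res <- residuals(fit)^2
fit_vals <- fitted(fit)
aux <- lm(sq_res ~ fit_vals)

# Studentized statistic: n * R^2, compared with a chi-squared with 1 degree of freedom
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
c(statistic = bp_stat, p.value = bp_p)
```

A small p-value indicates that the residual variance changes with the fitted values, consistent with the funnel pattern.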
Check homoscedasticity and normality of the residuals
# Data frame with the fitted values and the standardized residuals
residuals_df <- data.frame(
Fitted = fitted(model),
Residuals = rstandard(model)
)
ggplot(residuals_df, aes(x = Fitted, y = Residuals)) +
geom_point() +
geom_smooth(method = "loess", color = "red") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Fitted Values", y = "Standardized Residuals", title = "Standardized Residuals vs. Fitted Values")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 21 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 21 rows containing missing values (`geom_point()`).
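The loess curve is not horizontal, so the residuals are not centered on zero across the range of fitted values. This points to problems with the model assumptions (linearity or constant variance). The residuals may also depart from normality, but this plot cannot establish that on its own; normality is usually judged from a Q-Q plot of the residuals.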
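Normality of residuals is usually assessed with a normal Q-Q plot of the standardized residuals. The sketch below contrasts two simulated models, one with normal errors and one with right-skewed errors; all object names are hypothetical, and the data are not the salary data. For a fitted `lm` object, `plot(model, which = 2)` draws the equivalent plot.

```r
set.seed(3)
n <- 300
x <- runif(n)

# Two simulated responses: normal errors versus right-skewed (centered exponential) errors
y_normal <- 1 + 2 * x + rnorm(n)
y_skewed <- 1 + 2 * x + (rexp(n) - 1)
fit_normal <- lm(y_normal ~ x)
fit_skewed <- lm(y_skewed ~ x)

# Q-Q coordinates of the standardized residuals (plot.it = FALSE returns them without drawing)
qq_normal <- qqnorm(rstandard(fit_normal), plot.it = FALSE)
qq_skewed <- qqnorm(rstandard(fit_skewed), plot.it = FALSE)

# Correlation between theoretical and sample quantiles: near 1 when residuals look normal
r_normal <- cor(qq_normal$x, qq_normal$y)
r_skewed <- cor(qq_skewed$x, qq_skewed$y)

# Draw the plot for the skewed case; the upper tail bends away from the reference line
qqnorm(rstandard(fit_skewed), main = "Q-Q plot of standardized residuals (simulated)")
qqline(rstandard(fit_skewed), col = "red")
```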
summary(model)
##
## Call:
## lm(formula = salary_in_usd ~ experience_level + employment_type +
## job_title + remote_ratio + company_location + company_size,
## data = salaries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -142770 -28986 -4855 22917 374890
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 40922.53 50598.28 0.809
## experience_levelEX 85863.27 5826.97 14.736
## experience_levelMI 19836.33 3451.68 5.747
## experience_levelSE 45753.73 3256.03 14.052
## employment_typeFL -50903.92 26146.89 -1.947
## employment_typeFT -1858.64 17998.88 -0.103
## employment_typePT -19565.68 22052.18 -0.887
## job_titleAI Developer 113718.59 42033.93 2.705
## job_titleAI Programmer 2202.74 52821.35 0.042
## job_titleAI Scientist 42513.47 39179.25 1.085
## job_titleAnalytics Engineer 24183.06 38374.39 0.630
## job_titleApplied Data Scientist 56789.61 41354.10 1.373
## job_titleApplied Machine Learning Engineer 27507.25 50933.46 0.540
## job_titleApplied Machine Learning Scientist 26527.06 40579.43 0.654
## job_titleApplied Scientist 59883.91 38673.05 1.548
## job_titleAutonomous Vehicle Technician 39349.80 77759.88 0.506
## job_titleAzure Data Engineer 31834.64 62506.90 0.509
## job_titleBI Analyst 9323.83 41144.25 0.227
## job_titleBI Data Analyst -6371.31 40182.43 -0.159
## job_titleBI Data Engineer -29052.08 60920.66 -0.477
## job_titleBI Developer 1558.63 40318.41 0.039
## job_titleBig Data Architect 25798.45 50870.09 0.507
## job_titleBig Data Engineer 25251.64 41272.17 0.612
## job_titleBusiness Data Analyst 2807.35 39229.61 0.072
## job_titleBusiness Intelligence Engineer 37239.29 44846.83 0.830
## job_titleCloud Data Architect 115791.68 60904.06 1.901
## job_titleCloud Data Engineer 70412.91 53170.83 1.324
## job_titleCloud Database Engineer 20524.17 43626.95 0.470
## job_titleCompliance Data Analyst -59623.68 65531.96 -0.910
## job_titleComputer Vision Engineer 57393.35 39684.33 1.446
## job_titleComputer Vision Software Engineer 18275.26 43906.61 0.416
## job_titleData Analyst -6138.17 38104.38 -0.161
## job_titleData Analytics Consultant 20852.07 51858.36 0.402
## job_titleData Analytics Engineer 5522.40 43300.62 0.128
## job_titleData Analytics Lead 144045.70 52995.80 2.718
## job_titleData Analytics Manager 14738.93 39408.23 0.374
## job_titleData Analytics Specialist -41910.71 50716.20 -0.826
## job_titleData Architect 28723.88 38389.12 0.748
## job_titleData DevOps Engineer 15398.90 61711.63 0.250
## job_titleData Engineer 21075.10 38105.00 0.553
## job_titleData Infrastructure Engineer 56822.49 42704.81 1.331
## job_titleData Lead 75589.29 50716.20 1.490
## job_titleData Management Specialist 5049.79 67467.84 0.075
## job_titleData Manager -3282.80 39092.23 -0.084
## job_titleData Modeler -18010.71 50716.20 -0.355
## job_titleData Operations Analyst -45295.76 44853.63 -1.010
## job_titleData Operations Engineer -21191.35 40922.59 -0.518
## job_titleData Quality Analyst -44843.99 42210.69 -1.062
## job_titleData Science Consultant -106.74 39399.57 -0.003
## job_titleData Science Engineer -8079.04 43780.30 -0.185
## job_titleData Science Lead 44975.63 41628.62 1.080
## job_titleData Science Manager 62655.75 38591.15 1.624
## job_titleData Science Tech Lead 240791.68 60904.06 3.954
## job_titleData Scientist 26172.47 38112.87 0.687
## job_titleData Scientist Lead 52765.62 51830.54 1.018
## job_titleData Specialist -5221.25 40158.44 -0.130
## job_titleData Strategist -35683.03 50955.83 -0.700
## job_titleDeep Learning Engineer 22522.52 42752.12 0.527
## job_titleDeep Learning Researcher 37340.04 61269.80 0.609
## job_titleDirector of Data Science 62702.23 40996.16 1.529
## job_titleETL Developer 14346.77 40988.02 0.350
## job_titleETL Engineer 11284.45 50849.06 0.222
## job_titleFinance Data Analyst -23533.96 61039.04 -0.386
## job_titleFinancial Data Analyst -562.12 46984.48 -0.012
## job_titleHead of Data 69427.65 41409.76 1.677
## job_titleHead of Data Science 32948.17 41436.29 0.795
## job_titleHead of Machine Learning -6336.93 60951.27 -0.104
## job_titleInsight Analyst -15428.94 50820.90 -0.304
## job_titleLead Data Analyst -3815.25 43510.63 -0.088
## job_titleLead Data Engineer 33501.53 43850.68 0.764
## job_titleLead Data Scientist 40190.42 41450.07 0.970
## job_titleLead Machine Learning Engineer 30383.25 47055.36 0.646
## job_titleMachine Learning Developer 12926.39 43230.21 0.299
## job_titleMachine Learning Engineer 42041.42 38198.43 1.101
## job_titleMachine Learning Infrastructure Engineer 48987.64 40756.75 1.202
## job_titleMachine Learning Manager 27837.24 46949.44 0.593
## job_titleMachine Learning Research Engineer 13905.57 44927.68 0.310
## job_titleMachine Learning Researcher 28144.30 43033.98 0.654
## job_titleMachine Learning Scientist 60633.28 39349.46 1.541
## job_titleMachine Learning Software Engineer 85937.64 41276.16 2.082
## job_titleManager Data Management -8155.87 60917.64 -0.134
## job_titleMarketing Data Analyst 90305.29 53171.70 1.698
## job_titleMarketing Data Engineer 87194.55 68176.93 1.279
## job_titleML Engineer 46202.36 38928.25 1.187
## job_titleMLOps Engineer 20111.59 44866.09 0.448
## job_titleNLP Engineer 44065.81 42469.34 1.038
## job_titlePower BI Developer 8626.34 60819.12 0.142
## job_titlePrincipal Data Analyst 23741.43 50920.42 0.466
## job_titlePrincipal Data Architect -3329.95 60776.55 -0.055
## job_titlePrincipal Data Engineer 58519.16 50793.65 1.152
## job_titlePrincipal Data Scientist 94728.30 41806.98 2.266
## job_titlePrincipal Machine Learning Engineer 56844.13 60917.64 0.933
## job_titleProduct Data Analyst 8353.10 44701.64 0.187
## job_titleProduct Data Scientist -16476.74 68119.34 -0.242
## job_titleResearch Engineer 51821.92 38906.24 1.332
## job_titleResearch Scientist 56254.01 38463.74 1.463
## job_titleSoftware Data Engineer 38064.99 54763.64 0.695
## job_titleStaff Data Analyst -141792.57 61154.49 -2.319
## job_titleStaff Data Scientist -31664.45 63207.64 -0.501
## remote_ratio -21.05 16.82 -1.252
## company_locationAL -49465.71 66971.40 -0.739
## company_locationAM -26642.17 55034.34 -0.484
## company_locationAR -17667.72 39812.74 -0.444
## company_locationAS -15866.72 41906.24 -0.379
## company_locationAT -24228.88 34697.39 -0.698
## company_locationAU 13960.85 31612.73 0.442
## company_locationBA -7430.57 57955.60 -0.128
## company_locationBE -5875.03 36674.85 -0.160
## company_locationBO -78039.42 58277.70 -1.339
## company_locationBR -36837.72 30807.91 -1.196
## company_locationBS -4247.23 90001.77 -0.047
## company_locationCA 30215.47 28590.97 1.057
## company_locationCF -29615.50 43791.20 -0.676
## company_locationCH 8300.04 35303.73 0.235
## company_locationCL -42929.79 55148.61 -0.778
## company_locationCN 4682.10 55445.26 0.084
## company_locationCO -28363.63 37179.08 -0.763
## company_locationCR 37340.47 67074.00 0.557
## company_locationCZ -59335.83 40474.22 -1.466
## company_locationDE 1247.34 28780.76 0.043
## company_locationDK -34747.35 39231.59 -0.886
## company_locationDZ 51873.19 56727.54 0.914
## company_locationEE -58514.57 44380.25 -1.318
## company_locationEG -96278.54 56002.60 -1.719
## company_locationES -38479.82 28639.95 -1.344
## company_locationFI -70183.39 39952.77 -1.757
## company_locationFR -21210.60 29278.86 -0.724
## company_locationGB 1664.78 28313.33 0.059
## company_locationGH -47114.22 56583.13 -0.833
## company_locationGR -16344.16 30957.25 -0.528
## company_locationHK -20010.69 55154.27 -0.363
## company_locationHN -22953.85 60031.85 -0.382
## company_locationHR 6208.11 39124.96 0.159
## company_locationHU -47691.32 43791.52 -1.089
## company_locationID -23738.76 43804.62 -0.542
## company_locationIE 11011.87 32973.08 0.334
## company_locationIL 165472.09 44248.33 3.740
## company_locationIN -41228.78 28731.48 -1.435
## company_locationIQ 73361.64 58634.92 1.251
## company_locationIR 68394.93 58591.22 1.167
## company_locationIT -21574.41 39520.98 -0.546
## company_locationJP 28594.42 34282.33 0.834
## company_locationKE -3150.84 44934.70 -0.070
## company_locationLT 13186.74 43781.90 0.301
## company_locationLU -21220.47 39604.69 -0.536
## company_locationLV -62628.01 36742.73 -1.705
## company_locationMA -89649.62 57389.00 -1.562
## company_locationMD -41852.39 57356.78 -0.730
## company_locationMK -122179.01 57964.18 -2.108
## company_locationMT -50553.87 55137.88 -0.917
## company_locationMX -4810.68 31838.21 -0.151
## company_locationMY -23131.46 55216.38 -0.419
## company_locationNG 52664.69 37213.12 1.415
## company_locationNL -15599.82 31276.03 -0.499
## company_locationNZ 32032.76 58976.03 0.543
## company_locationPH -2275.09 55086.25 -0.041
## company_locationPK -46107.52 37883.63 -1.217
## company_locationPL -43009.43 35484.50 -1.212
## company_locationPR 51579.07 36739.75 1.404
## company_locationPT -55330.78 30780.42 -1.798
## company_locationRO -27557.80 44494.86 -0.619
## company_locationRU -39803.51 40136.68 -0.992
## company_locationSE 9354.43 45439.55 0.206
## company_locationSG -58235.99 41434.85 -1.405
## company_locationSI -40000.82 36964.75 -1.082
## company_locationSK -116218.17 66181.31 -1.756
## company_locationTH -50007.00 39971.50 -1.251
## company_locationTR -69637.22 35225.33 -1.977
## company_locationUA -67667.13 38508.31 -1.757
## company_locationUS 50443.14 28115.27 1.794
## company_locationVN -48138.98 55214.29 -0.872
## company_sizeM 1649.94 2809.86 0.587
## company_sizeS -24299.47 5249.44 -4.629
## Pr(>|t|)
## (Intercept) 0.418699
## experience_levelEX < 2e-16 ***
## experience_levelMI 9.85e-09 ***
## experience_levelSE < 2e-16 ***
## employment_typeFL 0.051631 .
## employment_typeFT 0.917759
## employment_typePT 0.375007
## job_titleAI Developer 0.006855 **
## job_titleAI Programmer 0.966739
## job_titleAI Scientist 0.277950
## job_titleAnalytics Engineer 0.528612
## job_titleApplied Data Scientist 0.169760
## job_titleApplied Machine Learning Engineer 0.589188
## job_titleApplied Machine Learning Scientist 0.513343
## job_titleApplied Scientist 0.121598
## job_titleAutonomous Vehicle Technician 0.612858
## job_titleAzure Data Engineer 0.610575
## job_titleBI Analyst 0.820738
## job_titleBI Data Analyst 0.874025
## job_titleBI Data Engineer 0.633474
## job_titleBI Developer 0.969165
## job_titleBig Data Architect 0.612085
## job_titleBig Data Engineer 0.540688
## job_titleBusiness Data Analyst 0.942954
## job_titleBusiness Intelligence Engineer 0.406387
## job_titleCloud Data Architect 0.057354 .
## job_titleCloud Data Engineer 0.185496
## job_titleCloud Database Engineer 0.638064
## job_titleCompliance Data Analyst 0.362967
## job_titleComputer Vision Engineer 0.148195
## job_titleComputer Vision Software Engineer 0.677267
## job_titleData Analyst 0.872033
## job_titleData Analytics Consultant 0.687637
## job_titleData Analytics Engineer 0.898523
## job_titleData Analytics Lead 0.006598 **
## job_titleData Analytics Manager 0.708422
## job_titleData Analytics Specialist 0.408645
## job_titleData Architect 0.454371
## job_titleData DevOps Engineer 0.802965
## job_titleData Engineer 0.580243
## job_titleData Infrastructure Engineer 0.183409
## job_titleData Lead 0.136197
## job_titleData Management Specialist 0.940340
## job_titleData Manager 0.933080
## job_titleData Modeler 0.722515
## job_titleData Operations Analyst 0.312632
## job_titleData Operations Engineer 0.604602
## job_titleData Quality Analyst 0.288133
## job_titleData Science Consultant 0.997838
## job_titleData Science Engineer 0.853603
## job_titleData Science Lead 0.280036
## job_titleData Science Manager 0.104554
## job_titleData Science Tech Lead 7.85e-05 ***
## job_titleData Scientist 0.492310
## job_titleData Scientist Lead 0.308727
## job_titleData Specialist 0.896561
## job_titleData Strategist 0.483802
## job_titleDeep Learning Engineer 0.598354
## job_titleDeep Learning Researcher 0.542274
## job_titleDirector of Data Science 0.126237
## job_titleETL Developer 0.726342
## job_titleETL Engineer 0.824388
## job_titleFinance Data Analyst 0.699849
## job_titleFinancial Data Analyst 0.990455
## job_titleHead of Data 0.093708 .
## job_titleHead of Data Science 0.426577
## job_titleHead of Machine Learning 0.917201
## job_titleInsight Analyst 0.761454
## job_titleLead Data Analyst 0.930132
## job_titleLead Data Engineer 0.444923
## job_titleLead Data Scientist 0.332306
## job_titleLead Machine Learning Engineer 0.518520
## job_titleMachine Learning Developer 0.764948
## job_titleMachine Learning Engineer 0.271142
## job_titleMachine Learning Infrastructure Engineer 0.229462
## job_titleMachine Learning Manager 0.553272
## job_titleMachine Learning Research Engineer 0.756952
## job_titleMachine Learning Researcher 0.513153
## job_titleMachine Learning Scientist 0.123431
## job_titleMachine Learning Software Engineer 0.037412 *
## job_titleManager Data Management 0.893502
## job_titleMarketing Data Analyst 0.089525 .
## job_titleMarketing Data Engineer 0.200999
## job_titleML Engineer 0.235362
## job_titleMLOps Engineer 0.653994
## job_titleNLP Engineer 0.299530
## job_titlePower BI Developer 0.887218
## job_titlePrincipal Data Analyst 0.641068
## job_titlePrincipal Data Architect 0.956309
## job_titlePrincipal Data Engineer 0.249359
## job_titlePrincipal Data Scientist 0.023520 *
## job_titlePrincipal Machine Learning Engineer 0.350815
## job_titleProduct Data Analyst 0.851778
## job_titleProduct Data Scientist 0.808887
## job_titleResearch Engineer 0.182955
## job_titleResearch Scientist 0.143686
## job_titleSoftware Data Engineer 0.487052
## job_titleStaff Data Analyst 0.020473 *
## job_titleStaff Data Scientist 0.616431
## remote_ratio 0.210786
## company_locationAL 0.460193
## company_locationAM 0.628344
## company_locationAR 0.657235
## company_locationAS 0.704989
## company_locationAT 0.485040
## company_locationAU 0.658790
## company_locationBA 0.897989
## company_locationBE 0.872739
## company_locationBO 0.180625
## company_locationBR 0.231884
## company_locationBS 0.962364
## company_locationCA 0.290666
## company_locationCF 0.498901
## company_locationCH 0.814142
## company_locationCL 0.436362
## company_locationCN 0.932707
## company_locationCO 0.445578
## company_locationCR 0.577763
## company_locationCZ 0.142732
## company_locationDE 0.965433
## company_locationDK 0.375840
## company_locationDZ 0.360554
## company_locationEE 0.187427
## company_locationEG 0.085668 .
## company_locationES 0.179172
## company_locationFI 0.079061 .
## company_locationFR 0.468847
## company_locationGB 0.953116
## company_locationGH 0.405095
## company_locationGR 0.597561
## company_locationHK 0.716766
## company_locationHN 0.702216
## company_locationHR 0.873935
## company_locationHU 0.276203
## company_locationID 0.587905
## company_locationIE 0.738425
## company_locationIL 0.000187 ***
## company_locationIN 0.151383
## company_locationIQ 0.210958
## company_locationIR 0.243157
## company_locationIT 0.585170
## company_locationJP 0.404288
## company_locationKE 0.944102
## company_locationLT 0.763286
## company_locationLU 0.592125
## company_locationLV 0.088374 .
## company_locationMA 0.118343
## company_locationMD 0.465630
## company_locationMK 0.035114 *
## company_locationMT 0.359276
## company_locationMX 0.879907
## company_locationMY 0.675297
## company_locationNG 0.157091
## company_locationNL 0.617966
## company_locationNZ 0.587061
## company_locationPH 0.967059
## company_locationPK 0.223653
## company_locationPL 0.225568
## company_locationPR 0.160434
## company_locationPT 0.072325 .
## company_locationRO 0.535727
## company_locationRU 0.321411
## company_locationSE 0.836908
## company_locationSG 0.159964
## company_locationSI 0.279266
## company_locationSK 0.079164 .
## company_locationTH 0.210992
## company_locationTR 0.048129 *
## company_locationUA 0.078968 .
## company_locationUS 0.072873 .
## company_locationVN 0.383345
## company_sizeM 0.557108
## company_sizeS 3.81e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 47370 on 3582 degrees of freedom
## Multiple R-squared: 0.4616, Adjusted R-squared: 0.4357
## F-statistic: 17.85 on 172 and 3582 DF, p-value: < 2.2e-16
The coefficients for experience_levelEX, experience_levelMI, and experience_levelSE are significant, with very low p-values. This suggests that experience level has a considerable impact on salary.
Among the job titles, job_titleAI Developer, job_titleData Analytics Lead, and job_titleData Science Tech Lead have significant positive coefficients, so those titles are associated with higher salaries relative to the baseline job title, holding the other variables constant.
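Each factor coefficient in `lm()` is a difference from that factor's baseline level, which by default is the first level in alphabetical order. The sketch below shows this on a toy data frame (the values in `toy` are invented for illustration) and how `relevel()` changes the baseline.

```r
# Toy data (hypothetical): three titles with different typical salaries
toy <- data.frame(
  job_title = rep(c("AI Developer", "Data Analyst", "Data Scientist"), each = 4),
  salary = c(150, 160, 155, 165, 90, 95, 100, 105, 120, 125, 130, 135)
)

# Default: the alphabetically first level ("AI Developer") is the baseline
fit_default <- lm(salary ~ job_title, data = toy)
coef(fit_default)

# Choose a different baseline so coefficients are differences from "Data Analyst"
toy$job_title <- relevel(factor(toy$job_title), ref = "Data Analyst")
fit_releveled <- lm(salary ~ job_title, data = toy)
coef(fit_releveled)
```

Changing the baseline does not change the fitted values, only which comparisons the coefficients and their p-values describe.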
new_data1 <- data.frame(experience_level = "SE", employment_type = "FT", job_title = "Data Scientist", remote_ratio = 100, company_location = "US", company_size = "L")
predicted_salary1 <- predict(model, newdata = new_data1)
predicted_salary1
## 1
## 159328.3
First prediction (predicted_salary1): for a profile with experience level “SE” (senior), employment type “FT” (full-time), job title “Data Scientist”, fully remote (100%), at a large (“L”) company in the US, the estimated salary is approximately 159,328.30 USD.
new_data2 <- data.frame(experience_level = "MI", employment_type = "PT", job_title = "Data Analyst", remote_ratio = 50, company_location = "CA", company_size = "S")
predicted_salary2 <- predict(model, newdata = new_data2)
Second prediction (predicted_salary2): for a less experienced profile, “MI” (mid-level), in part-time employment (“PT”), with job title “Data Analyst”, a remote ratio of 50%, at a small (“S”) company in Canada (“CA”), the estimated salary is approximately 39,918.56 USD.
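Both predictions can be reproduced by adding the relevant coefficients from the summary(model) output, which shows what predict() does for a linear model. The sketch below copies the rounded coefficients from that output; baseline levels (for example company size “L”) contribute no term. Small differences from the predict() values come from rounding the coefficients to two decimals.

```r
# Coefficients copied from the summary(model) output above (rounded to two decimals)
intercept <- 40922.53

# Profile 1: SE, FT, Data Scientist, remote_ratio = 100, US, size L (baseline, so no term)
pred1 <- intercept + 45753.73 + (-1858.64) + 26172.47 + 100 * (-21.05) + 50443.14

# Profile 2: MI, PT, Data Analyst, remote_ratio = 50, CA, size S
pred2 <- intercept + 19836.33 + (-19565.68) + (-6138.17) + 50 * (-21.05) +
  30215.47 + (-24299.47)

round(c(pred1 = pred1, pred2 = pred2), 2)
```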