Data Science is a field combining statistics, programming, and domain expertise to analyze and extract insights from data.
The data science field is rapidly growing, with salaries for these jobs showing
significant variation. This project aims to explore which factors, such as job
title, experience, and company size, most impact data scientists' salaries.
Salaries in data science can vary widely based on different factors. This project
identifies the factors with the greatest influence on salary levels, including
job title, experience, and company size.
1. Explore how job title, experience level, and company size affect salaries in data science.
2. Highlight the most important factors for securing higher salaries in data science.
library(readxl) # For reading Excel files
library(dplyr) # For data manipulation
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2) # For visualization
library(scales) # For formatting axes
library(tidyr) # For reshaping data
data <- read_excel("ds_salary.xlsx")
summary(data)
## work_year experience_level job_title salary
## Min. :2020 Length:3040 Length:3040 Min. : 5679
## 1st Qu.:2022 Class :character Class :character 1st Qu.:113900
## Median :2023 Mode :character Mode :character Median :145000
## Mean :2022 Mean :151822
## 3rd Qu.:2023 3rd Qu.:185000
## Max. :2023 Max. :450000
## company_size
## Length:3040
## Class :character
## Mode :character
##
##
##
Interpretation: The dataset holds 3,040 records from 2020 to 2023, with salaries
ranging from 5,679 to 450,000 and a median of 145,000. Senior-Level and Executive
positions tend to have higher median salaries, and salaries also vary across job
titles and company sizes.
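The claim about median salaries by experience level can be checked with a grouped median. The sketch below is only illustrative: it uses base R and a small hypothetical data frame (`toy`) that mimics the `experience_level` and `salary` columns, not the real ds_salary.xlsx file.

```r
# Hypothetical toy data standing in for the real dataset's columns.
toy <- data.frame(
  experience_level = c("EN", "EN", "MI", "MI", "SE", "SE", "EX", "EX"),
  salary = c(60000, 80000, 95000, 115000, 140000, 160000, 190000, 230000)
)

# Median salary per experience level, sorted from lowest to highest.
medians <- aggregate(salary ~ experience_level, data = toy, FUN = median)
medians <- medians[order(medians$salary), ]
print(medians)
```

On the real data, the same call with `data = data` would give the medians behind the interpretation above.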
# Keep the ten most frequent job titles (ties may add more)
top_titles <- data %>%
  count(job_title, sort = TRUE) %>%
  top_n(10, n) %>%
  pull(job_title)

# Within each title, drop salaries outside the 1.5 * IQR fences
data_top_titles <- data %>%
  filter(job_title %in% top_titles) %>%
  group_by(job_title) %>%
  filter(between(salary,
                 quantile(salary, 0.25) - 1.5 * IQR(salary),
                 quantile(salary, 0.75) + 1.5 * IQR(salary))) %>%
  ungroup()

# Order job titles by median salary
job_title_order <- data_top_titles %>%
  group_by(job_title) %>%
  summarize(median_salary = median(salary, na.rm = TRUE)) %>%
  arrange(median_salary) %>%
  pull(job_title)
data_top_titles$job_title <- factor(data_top_titles$job_title,
                                    levels = job_title_order)

# Plot (boxplots show medians and quartiles, not averages)
ggplot(data_top_titles, aes(x = job_title, y = salary)) +
  geom_boxplot(fill = "skyblue", outlier.shape = NA) +
  scale_y_continuous(labels = dollar_format(prefix = "$", suffix = "k", scale = 1e-3),
                     breaks = seq(0, 400000, by = 50000)) +
  labs(title = "Salary Distribution by Top Job Titles (Ascending Median)",
       x = "Job Title",
       y = "Salary (USD)") +
  theme_minimal()
Interpretation: Certain job titles (e.g., "Principal Data Scientist") show
consistently higher salaries, which suggests that specialized roles are
associated with greater compensation.
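The outlier filter used before plotting keeps only salaries within 1.5 times the interquartile range of the first and third quartiles. A minimal sketch of that rule follows, on a hypothetical salary vector and with base R comparisons in place of dplyr's `between()`.

```r
# Hypothetical salaries; the last value is an obvious outlier.
salary <- c(90000, 100000, 110000, 120000, 130000, 400000)

q1 <- quantile(salary, 0.25)
q3 <- quantile(salary, 0.75)
iqr <- IQR(salary)

# Fences of the 1.5 * IQR rule.
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Keep only the values inside the fences.
kept <- salary[salary >= lower & salary <= upper]
print(kept)
```

Because the filter in the report runs after `group_by()`, the fences are computed separately for each group rather than once for the whole dataset.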
# Within each experience level, drop salaries outside the 1.5 * IQR fences
data_experience <- data %>%
  group_by(experience_level) %>%
  filter(between(salary,
                 quantile(salary, 0.25) - 1.5 * IQR(salary),
                 quantile(salary, 0.75) + 1.5 * IQR(salary))) %>%
  ungroup()

# Order experience levels by median salary
experience_order <- data_experience %>%
  group_by(experience_level) %>%
  summarize(median_salary = median(salary, na.rm = TRUE)) %>%
  arrange(median_salary) %>%
  pull(experience_level)
data_experience$experience_level <- factor(data_experience$experience_level,
                                           levels = experience_order)

# Plot
ggplot(data_experience, aes(x = experience_level, y = salary)) +
  geom_boxplot(fill = "skyblue", outlier.shape = NA) +
  scale_y_continuous(labels = dollar_format(prefix = "$", suffix = "k", scale = 1e-3),
                     breaks = seq(0, 400000, by = 50000)) +
  labs(title = "Salary by Experience Level (Ascending Order)",
       x = "Experience Level",
       y = "Salary (USD)") +
  theme_minimal()
Interpretation: Salaries increase significantly with experience, with Executive
and Senior-Level roles earning the most.
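The plot orders experience levels by their median salary. An alternative is to fix the order explicitly, so that it does not depend on the data. The sketch below assumes the codes EN, MI, SE and EX stand for entry, mid, senior and executive level; that mapping is an assumption, and the sample vector is hypothetical.

```r
# Hypothetical sample of experience codes.
experience <- c("SE", "EN", "EX", "MI", "SE", "EN")

# Assumed career order: entry (EN), mid (MI), senior (SE), executive (EX).
career_order <- c("EN", "MI", "SE", "EX")
experience_f <- factor(experience, levels = career_order)

print(levels(experience_f))
print(table(experience_f))
```

With an unordered factor like this one, `lm()` treats the first level as the reference category by default, so an explicit order also makes the regression baseline explicit.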
df <- data
numeric_data <- df %>%
dplyr::select(where(is.numeric))
print(numeric_data)
## # A tibble: 3,040 × 2
## work_year salary
## <dbl> <dbl>
## 1 2023 30000
## 2 2023 25500
## 3 2023 222200
## 4 2023 136000
## 5 2023 147100
## 6 2023 90700
## 7 2023 130000
## 8 2023 100000
## 9 2023 213660
## 10 2023 130760
## # ℹ 3,030 more rows
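Interpretation: Only work_year and salary are numeric, so a correlation analysis is limited to that pair. Correlation is useful for understanding relationships but doesn't imply causation.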
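A correlation between the two numeric columns could be computed roughly as follows. The values are hypothetical and chosen only to make the example self-contained; they are not taken from the dataset.

```r
# Hypothetical values for the two numeric columns.
toy <- data.frame(
  work_year = c(2020, 2021, 2022, 2023, 2023),
  salary = c(100000, 110000, 120000, 130000, 150000)
)

# Pearson correlation between year and salary.
r <- cor(toy$work_year, toy$salary)
print(r)

# Full correlation matrix of all numeric columns.
print(cor(toy))
```

On the real data, `cor(numeric_data)` would return the same kind of matrix for `work_year` and `salary`.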
# Regression Model
model <- lm(salary ~ experience_level + job_title + company_size, data = df)
summary(model)
##
## Call:
## lm(formula = salary ~ experience_level + job_title + company_size,
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -145029 -31417 -5695 25341 291207
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 176857.7 48622.7 3.637
## experience_levelEX 88961.2 6507.8 13.670
## experience_levelMI 22198.1 4340.8 5.114
## experience_levelSE 46958.6 3942.2 11.912
## job_titleAI Scientist -74959.2 51746.8 -1.449
## job_titleAnalytics Engineer -66277.2 48647.6 -1.362
## job_titleApplied Data Scientist 22437.2 55948.0 0.401
## job_titleApplied Machine Learning Engineer -70000.0 68350.4 -1.024
## job_titleApplied Machine Learning Scientist -67263.1 51303.1 -1.311
## job_titleApplied Scientist -29149.9 48846.5 -0.597
## job_titleBI Analyst -105792.9 52283.4 -2.023
## job_titleBI Data Analyst -113836.7 52276.3 -2.178
## job_titleBI Data Engineer -117801.9 68488.1 -1.720
## job_titleBI Developer -87617.3 50218.6 -1.745
## job_titleBig Data Engineer -106857.7 68557.0 -1.559
## job_titleBusiness Data Analyst -91671.3 51768.8 -1.771
## job_titleBusiness Intelligence Engineer -50610.5 54094.1 -0.936
## job_titleCloud Data Architect 26183.7 68493.4 0.382
## job_titleCloud Data Engineer -24505.4 68782.2 -0.356
## job_titleCloud Database Engineer -69571.7 53010.4 -1.312
## job_titleComputer Vision Engineer -7941.6 50380.6 -0.158
## job_titleComputer Vision Software Engineer -85900.4 59323.4 -1.448
## job_titleData Analyst -97106.7 48414.6 -2.006
## job_titleData Analytics Consultant -86801.3 59352.5 -1.462
## job_titleData Analytics Engineer -76555.8 59294.9 -1.291
## job_titleData Analytics Lead 181183.7 68493.4 2.645
## job_titleData Analytics Manager -76936.4 49496.1 -1.554
## job_titleData Analytics Specialist -129760.5 59246.5 -2.190
## job_titleData Architect -60678.9 48636.6 -1.248
## job_titleData Engineer -68101.8 48405.7 -1.407
## job_titleData Infrastructure Engineer -33201.8 52210.2 -0.636
## job_titleData Lead -12260.5 59246.5 -0.207
## job_titleData Manager -89011.3 49362.8 -1.803
## job_titleData Modeler -105860.5 59246.5 -1.787
## job_titleData Operations Analyst -134198.0 54094.1 -2.481
## job_titleData Operations Engineer -111419.2 50700.0 -2.198
## job_titleData Quality Analyst -141897.2 52263.9 -2.715
## job_titleData Science Consultant -99407.7 50069.8 -1.985
## job_titleData Science Engineer -104760.5 59246.5 -1.768
## job_titleData Science Lead -27757.9 52261.3 -0.531
## job_titleData Science Manager -24693.6 48843.0 -0.506
## job_titleData Science Tech Lead 151183.7 68493.4 2.207
## job_titleData Scientist -61698.6 48418.5 -1.274
## job_titleData Scientist Lead -40816.3 68493.4 -0.596
## job_titleData Specialist -94419.3 50070.3 -1.886
## job_titleDeep Learning Engineer -58901.0 54079.3 -1.089
## job_titleDirector of Data Science 28084.0 54378.1 0.516
## job_titleETL Developer -73576.8 51324.3 -1.434
## job_titleFinancial Data Analyst -90304.5 55869.4 -1.616
## job_titleHead of Data -26422.0 52444.2 -0.504
## job_titleHead of Data Science -43004.4 53160.7 -0.809
## job_titleLead Data Analyst -98976.0 55925.6 -1.770
## job_titleLead Data Engineer -38773.9 55906.0 -0.694
## job_titleLead Data Scientist -31090.0 55984.1 -0.555
## job_titleMachine Learning Developer 3142.3 68557.0 0.046
## job_titleMachine Learning Engineer -43615.3 48491.4 -0.899
## job_titleMachine Learning Infrastructure Engineer -34137.5 51699.2 -0.660
## job_titleMachine Learning Manager -49760.5 59246.5 -0.840
## job_titleMachine Learning Researcher -80000.0 54035.8 -1.481
## job_titleMachine Learning Scientist -30515.0 49525.0 -0.616
## job_titleMachine Learning Software Engineer -7360.5 53003.6 -0.139
## job_titleManager Data Management -98816.3 68493.4 -1.443
## job_titleML Engineer -39608.4 49229.5 -0.805
## job_titleMLOps Engineer -71000.0 54035.8 -1.314
## job_titleNLP Engineer -35757.4 53004.0 -0.675
## job_titlePrincipal Data Analyst -54760.5 68396.6 -0.801
## job_titlePrincipal Data Engineer -31788.4 59277.0 -0.536
## job_titlePrincipal Data Scientist 37200.9 54190.8 0.686
## job_titlePrincipal Machine Learning Engineer -33816.3 68493.4 -0.494
## job_titleProduct Data Analyst -68901.0 59233.0 -1.163
## job_titleResearch Engineer -40492.0 49153.5 -0.824
## job_titleResearch Scientist -34874.5 48787.0 -0.715
## job_titleStaff Data Scientist -119760.5 68396.6 -1.751
## company_sizeM 944.2 3470.6 0.272
## company_sizeS -39310.9 7675.0 -5.122
## Pr(>|t|)
## (Intercept) 0.00028 ***
## experience_levelEX < 2e-16 ***
## experience_levelMI 3.36e-07 ***
## experience_levelSE < 2e-16 ***
## job_titleAI Scientist 0.14756
## job_titleAnalytics Engineer 0.17318
## job_titleApplied Data Scientist 0.68842
## job_titleApplied Machine Learning Engineer 0.30586
## job_titleApplied Machine Learning Scientist 0.18993
## job_titleApplied Scientist 0.55071
## job_titleBI Analyst 0.04312 *
## job_titleBI Data Analyst 0.02951 *
## job_titleBI Data Engineer 0.08553 .
## job_titleBI Developer 0.08114 .
## job_titleBig Data Engineer 0.11918
## job_titleBusiness Data Analyst 0.07670 .
## job_titleBusiness Intelligence Engineer 0.34955
## job_titleCloud Data Architect 0.70228
## job_titleCloud Data Engineer 0.72166
## job_titleCloud Database Engineer 0.18948
## job_titleComputer Vision Engineer 0.87476
## job_titleComputer Vision Software Engineer 0.14772
## job_titleData Analyst 0.04498 *
## job_titleData Analytics Consultant 0.14372
## job_titleData Analytics Engineer 0.19677
## job_titleData Analytics Lead 0.00821 **
## job_titleData Analytics Manager 0.12020
## job_titleData Analytics Specialist 0.02859 *
## job_titleData Architect 0.21228
## job_titleData Engineer 0.15956
## job_titleData Infrastructure Engineer 0.52487
## job_titleData Lead 0.83607
## job_titleData Manager 0.07146 .
## job_titleData Modeler 0.07407 .
## job_titleData Operations Analyst 0.01316 *
## job_titleData Operations Engineer 0.02805 *
## job_titleData Quality Analyst 0.00667 **
## job_titleData Science Consultant 0.04719 *
## job_titleData Science Engineer 0.07713 .
## job_titleData Science Lead 0.59536
## job_titleData Science Manager 0.61319
## job_titleData Science Tech Lead 0.02737 *
## job_titleData Scientist 0.20267
## job_titleData Scientist Lead 0.55128
## job_titleData Specialist 0.05943 .
## job_titleDeep Learning Engineer 0.27617
## job_titleDirector of Data Science 0.60557
## job_titleETL Developer 0.15180
## job_titleFinancial Data Analyst 0.10612
## job_titleHead of Data 0.61443
## job_titleHead of Data Science 0.41861
## job_titleLead Data Analyst 0.07687 .
## job_titleLead Data Engineer 0.48802
## job_titleLead Data Scientist 0.57871
## job_titleMachine Learning Developer 0.96345
## job_titleMachine Learning Engineer 0.36849
## job_titleMachine Learning Infrastructure Engineer 0.50911
## job_titleMachine Learning Manager 0.40104
## job_titleMachine Learning Researcher 0.13885
## job_titleMachine Learning Scientist 0.53784
## job_titleMachine Learning Software Engineer 0.88956
## job_titleManager Data Management 0.14921
## job_titleML Engineer 0.42113
## job_titleMLOps Engineer 0.18897
## job_titleNLP Engineer 0.49997
## job_titlePrincipal Data Analyst 0.42341
## job_titlePrincipal Data Engineer 0.59181
## job_titlePrincipal Data Scientist 0.49246
## job_titlePrincipal Machine Learning Engineer 0.62154
## job_titleProduct Data Analyst 0.24483
## job_titleResearch Engineer 0.41013
## job_titleResearch Scientist 0.47477
## job_titleStaff Data Scientist 0.08005 .
## company_sizeM 0.78560
## company_sizeS 3.22e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 48330 on 2965 degrees of freedom
## Multiple R-squared: 0.2738, Adjusted R-squared: 0.2556
## F-statistic: 15.1 on 74 and 2965 DF, p-value: < 2.2e-16
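Interpretation: Multiple regression estimates how each factor is associated with salary while holding the other factors constant. It helps identify which variables are most influential and quantify their effects on salary, but because the data are observational, the coefficients describe associations rather than causal effects.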
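With R's default dummy coding, each coefficient is an offset from a reference category that is absorbed into the intercept. As a worked example, the fitted salary for one profile is the intercept plus the matching offsets. The estimates below are copied from the regression output; the chosen profile (SE experience, Data Scientist, company size M) is just an illustration.

```r
# Estimates copied from the summary(model) output.
intercept         <- 176857.7
experience_se     <- 46958.6   # experience_levelSE
title_data_sci    <- -61698.6  # job_titleData Scientist
company_size_m    <- 944.2     # company_sizeM

# Fitted salary for an SE-level Data Scientist at a size-M company.
predicted <- intercept + experience_se + title_data_sci + company_size_m
print(predicted)  # about 163,062
```

With the fitted model available, `predict(model, newdata = ...)` should give the same figure up to rounding of the printed estimates.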
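The overall model is statistically significant (F = 15.1 on 74 and 2965 degrees of freedom, p-value < 2.2e-16), but it explains only about 27% of the variation in salary (multiple R-squared 0.2738). All three experience-level coefficients are significant at p < 0.001. Small companies are associated with salaries about 39,300 lower than the reference company size (p < 0.001), while medium-sized companies show no significant difference (p = 0.79). Among job titles, only a minority of individual coefficients are significant at the 0.05 level, for example Data Analytics Lead and Data Quality Analyst.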
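Individual coefficient p-values test one category against the reference, not a factor as a whole. One way to judge a whole factor is an F-test that compares the model with and without it. The sketch below uses simulated data, not the real dataset; the variable names mirror the report, and the effect sizes in `bump` are made up for illustration.

```r
set.seed(1)
n <- 200

# Simulated predictors with the same category codes as the report.
toy <- data.frame(
  experience_level = sample(c("EN", "MI", "SE", "EX"), n, replace = TRUE),
  company_size = sample(c("S", "M", "L"), n, replace = TRUE)
)

# Simulated salaries: experience shifts pay, company size does not.
bump <- c(EN = 0, MI = 20000, SE = 45000, EX = 90000)
toy$salary <- 80000 + unname(bump[toy$experience_level]) + rnorm(n, sd = 15000)

full <- lm(salary ~ experience_level + company_size, data = toy)
no_experience <- lm(salary ~ company_size, data = toy)

# F-test for the joint contribution of all experience_level terms.
comparison <- anova(no_experience, full)
print(comparison)
```

Applied to the report's model, the same comparison could be repeated for `job_title` and `company_size` to rank the three factors on a like-for-like basis.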
Experience level has the highest influence on data science salary.
Specialized and senior roles offer higher compensation.
Job title has a moderate influence on salaries.
- Prioritize Skill Building: Support learning in specialized fields like machine learning to enhance career growth and salary potential.
- Define Salary Ranges: Structure pay based on experience and role to attract skilled talent and stay competitive.