Data science jobs have become more common in recent years. Salaries in this field can vary depending on the country, experience level, and type of work. This project examines global salary patterns in data science jobs using real job data.
2026-02-10
In this project, we study how location and remote work are related to salary differences in data science jobs.
To explore this, we focus on the following questions:
The dataset used in this project contains job listings for data science roles, and each observation is a single job record. The variables cover the work year, job title and category, salary (in the original currency and converted to U.S. dollars), experience level, employment type, work setting (on-site, hybrid, or remote), employee residence, company location, and company size. The dataset was obtained from a publicly available online source.
jobs <- jobs_and_salaries
nrow(jobs) # Number of observations
## [1] 9355
ncol(jobs) # Number of variables
## [1] 12
The dataset contains 9,355 observations and 12 variables.
## [1] 0
##  [1] "work_year"          "job_title"          "job_category"
##  [4] "salary_currency"    "salary"             "salary_in_usd"
##  [7] "employee_residence" "experience_level"   "employment_type"
## [10] "work_setting"       "company_location"   "company_size"
Before analysis, we checked the data for missing values and reviewed the structure of the variables. We then removed observations with non-positive salaries, using the salary_in_usd column so that all values are in U.S. dollars. The resulting cleaned dataset, jobs_clean, is used in the rest of the analysis.
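The cleaning code itself is not shown. As a minimal base-R sketch of the two steps described, assuming the salary column is named salary_in_usd as in the variable list above, the hypothetical helper clean_salaries() counts missing cells and keeps only rows with a present, strictly positive USD salary. The toy data frame holds invented values and is not taken from the dataset.

```r
# Hypothetical helper (not the project's original code): report missing
# values, then keep rows whose USD salary is present and strictly positive.
clean_salaries <- function(df) {
  # Total number of missing cells across all columns
  message("Missing values: ", sum(is.na(df)))
  # NA salaries evaluate to FALSE here, so they are dropped as well
  keep <- !is.na(df$salary_in_usd) & df$salary_in_usd > 0
  df[keep, , drop = FALSE]
}

# Toy data with invented values, for illustration only
toy <- data.frame(
  job_title = c("Data Scientist", "Data Analyst", "ML Engineer",
                "Data Engineer", "Research Scientist"),
  salary_in_usd = c(120000, 0, 150000, -5, NA)
)

toy_clean <- clean_salaries(toy)
print(toy_clean)
```

On the real data, the equivalent call would be jobs_clean <- clean_salaries(jobs).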
The following code creates a histogram of salaries in the cleaned dataset. It uses the salary_in_usd variable, so all salaries are in the same currency. The histogram groups salaries into 50 equal-width bins and shows how many jobs fall into each bin.
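library(ggplot2)
# Histogram of cleaned salaries in USD, split into 50 bins
ggplot(jobs_clean, aes(x = salary_in_usd)) +
  geom_histogram(
    bins = 50,
    color = "navy",
    fill = "lightblue") +
  # Show axis labels with thousands separators (e.g. 100,000)
  scale_x_continuous(labels = scales::comma) +
  labs(
    x = "Salary (USD)",
    y = "Count",
    title = "Distribution of Salaries")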
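To inspect the counts behind the bars, the binning can be approximated in base R. This sketch cuts the salary range into 50 equal-width intervals with cut() and tabulates them. ggplot2 positions its bin edges differently by default, so these counts will not necessarily match the plotted bars one for one. The salaries vector holds invented values; on the real data it would be jobs_clean$salary_in_usd.

```r
# Invented salaries in USD, for illustration only
salaries <- c(45000, 60000, 72000, 95000, 110000, 150000, 210000, 300000)

# 51 break points give 50 equal-width bins spanning the observed range
breaks <- seq(min(salaries), max(salaries), length.out = 51)

# include.lowest = TRUE keeps the minimum salary in the first bin
bin_counts <- table(cut(salaries, breaks = breaks, include.lowest = TRUE))

# Each salary lands in exactly one bin, so the counts sum to the sample size
print(sum(bin_counts))

# Show only the bins that contain at least one job
print(bin_counts[bin_counts > 0])
```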
This plot shows how salaries differ by experience level. Higher experience levels generally have higher median salaries and a wider range of pay.
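The plotting code for this figure is not shown. A quick numeric check of the pattern it displays is to compute the median salary for each experience level in a fixed order. This base-R sketch assumes the level labels are Entry-level, Mid-level, Senior, and Executive; those labels and the toy values are assumptions, so check unique(jobs_clean$experience_level) before reusing it.

```r
# Invented records, for illustration only; the real input would be jobs_clean
toy <- data.frame(
  experience_level = c("Entry-level", "Entry-level", "Mid-level", "Mid-level",
                       "Senior", "Senior", "Executive", "Executive"),
  salary_in_usd = c(60000, 70000, 90000, 110000,
                    150000, 170000, 190000, 230000)
)

# Assumed ordering from least to most experienced (labels are an assumption)
level_order <- c("Entry-level", "Mid-level", "Senior", "Executive")
toy$experience_level <- factor(toy$experience_level, levels = level_order)

# Median salary per level, returned in the factor's level order
medians <- tapply(toy$salary_in_usd, toy$experience_level, median)
print(medians)

# TRUE when the median rises at every step up the experience ladder
print(all(diff(medians) > 0))
```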
This plot compares salaries across on-site, hybrid, and remote roles, showing differences in median pay and variability.
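Because this comparison concerns both typical pay and spread, a compact numeric companion to the plot is the median and interquartile range (IQR) per work setting. The base-R sketch below assumes the labels In-person, Hybrid, and Remote and uses invented values; on the real data, toy would be replaced by jobs_clean.

```r
# Invented records, for illustration only
toy <- data.frame(
  work_setting = c("In-person", "In-person", "In-person",
                   "Hybrid", "Hybrid", "Hybrid",
                   "Remote", "Remote", "Remote"),
  salary_in_usd = c(100000, 140000, 180000,
                    70000, 80000, 90000,
                    90000, 130000, 200000)
)

# Split salaries by work setting, then summarise each group:
# median = typical pay, iqr = spread of the middle 50%, n = group size
setting_stats <- t(sapply(
  split(toy$salary_in_usd, toy$work_setting),
  function(x) c(median = median(x), iqr = IQR(x), n = length(x))
))
print(setting_stats)
```

The IQR is used here rather than the full range because a single very high salary would otherwise dominate the measure of spread.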
library(dplyr)
library(plotly)
# Median USD salary for each company location
country_salary <- jobs_clean %>%
  group_by(company_location) %>%
  summarise(
    median_salary = median(salary_in_usd, na.rm = TRUE),
    .groups = "drop")
# Choropleth map: countries matched by name, shaded by median salary
plot_ly(
  data = country_salary,
  type = "choropleth",
  locations = ~company_location,
  locationmode = "country names",
  z = ~median_salary,
  text = ~paste0(
    company_location,
    "<br>Median salary: $",
    scales::comma(median_salary)),
  hoverinfo = "text",
  colorscale = "Blues",
  colorbar = list(title = "Median Salary (USD)")) %>%
  layout(
    geo = list(projection = list(type = "equirectangular")))
This interactive map shows how median salaries for data science jobs vary across countries.
A second interactive map shows where data science jobs are most concentrated around the world.
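The code for this second map is not shown. A minimal base-R sketch of the counting step such a map would rely on, using invented records: table() counts job records per company_location, and the share held by the top few countries gives a single number for how concentrated the listings are. The value of top_k is an arbitrary choice for illustration.

```r
# Invented records, for illustration only; the real input would be jobs_clean
toy <- data.frame(
  company_location = c("United States", "United States", "United States",
                       "United States", "United Kingdom", "United Kingdom",
                       "Canada", "Germany")
)

# Number of job records per country, largest first
country_counts <- sort(table(toy$company_location), decreasing = TRUE)
print(country_counts)

# Share of all records held by the top_k countries (top_k chosen arbitrarily)
top_k <- 2
top_share <- sum(country_counts[seq_len(top_k)]) / sum(country_counts)
print(top_share)
```

The counts could be converted with as.data.frame() and passed to the same plot_ly() choropleth call used for median salaries, with z mapped to the count column instead of the median.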
This analysis shows that salary differences in data science jobs are closely related to experience level, work setting, and location. Salaries increase consistently with experience level, with senior roles earning the highest pay. Hybrid roles tend to have lower salaries, while remote and in-person positions generally offer higher pay. Median salaries are similar across most countries, although a few countries show higher values. Job listings are concentrated in a small number of countries, indicating clear geographic patterns in availability. Overall, experience level, work setting, and location are all associated with salary differences in this dataset.
The complete dataset and more information about it can be found at the link below.