2026-02-10

Introduction

Data science jobs have become more common in recent years. Salaries in this field can vary depending on the country, experience level, and type of work. This project examines global salary patterns in data science jobs using real job data.

What is the problem?

In this project, we study how location and remote work are related to salary differences in data science jobs.
To address this problem, we focus on the following questions:

  • How do salaries differ across countries?
  • Are remote jobs paid differently than on-site jobs?
  • Are there geographic patterns in job availability?

Data Set Description

The dataset used in this project contains job listings related to data science roles. Each observation represents a single job record. The data include information about job title, salary, experience level, company location, and remote work status. The dataset was obtained from a publicly available online source.

jobs <- jobs_and_salaries
nrow(jobs) ## Number of observations
## [1] 9355
ncol(jobs) ## Number of variables
## [1] 12

The dataset contains 9,355 observations and 12 variables.

Data Verification and Cleaning

## [1] 0
##  [1] "work_year"          "job_title"          "job_category"      
##  [4] "salary_currency"    "salary"             "salary_in_usd"     
##  [7] "employee_residence" "experience_level"   "employment_type"   
## [10] "work_setting"       "company_location"   "company_size"

Before the analysis, we checked the data for missing values and reviewed the names and structure of the variables. We then removed observations with non-positive salaries, using the salary converted to U.S. dollars (salary_in_usd). These steps help ensure that the dataset is clean and ready for further analysis.
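The cleaning code itself is not shown in this report, so the following is only a minimal sketch of how these checks could be written. The helper name clean_jobs and the toy data frame jobs_toy are assumptions made for illustration; only the column name salary_in_usd comes from the dataset.

```r
library(dplyr)

# Hypothetical helper (not from the report): keeps only rows whose
# USD salary is present and strictly positive.
clean_jobs <- function(jobs) {
  jobs %>%
    filter(!is.na(salary_in_usd), salary_in_usd > 0)
}

# Toy stand-in for the real data; the values are invented for illustration.
jobs_toy <- data.frame(
  job_title     = c("Data Scientist", "Data Analyst", "ML Engineer", "Data Engineer"),
  salary_in_usd = c(120000, 0, 150000, -5000)
)

n_missing <- sum(is.na(jobs_toy))  # total missing values across all columns
var_names <- names(jobs_toy)       # variable names, to review the structure
str(jobs_toy)                      # column types and a preview of the values

# Remove the rows with zero or negative salaries.
jobs_clean_toy <- clean_jobs(jobs_toy)
```

Applied to the real data, the same helper would be called as clean_jobs(jobs), assuming jobs holds the raw listings.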

Exploring the Data

The following code creates a histogram to show the distribution of salaries in the dataset. It uses the cleaned data and the salary_in_usd variable, so all salaries are in the same currency. The histogram groups salaries into bins and shows how many jobs fall into each bin.

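library(ggplot2)

# Histogram of salaries in USD, split into 50 bins
ggplot(jobs_clean, aes(x = salary_in_usd)) +
  geom_histogram(
    bins = 50,
    color = "navy",
    fill = "lightblue") +
  # Show axis values with thousands separators instead of scientific notation
  scale_x_continuous(labels = scales::comma) +
  labs(
    x = "Salary (USD)",
    y = "Count",
    title = "Distribution of Salaries")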

Distribution of Salaries

Salaries by Experience Level


This plot shows how salaries differ by experience level. Higher experience levels generally have higher median salaries and a wider range of pay.
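The code behind this plot is not included in the report. One plausible way to draw it is a box plot of salary_in_usd grouped by experience_level, sketched below. The toy data frame, its level names, and its salary values are invented for illustration, and the styling simply mirrors the histogram above.

```r
library(ggplot2)

# Toy stand-in for jobs_clean; level names and salaries are invented.
jobs_toy <- data.frame(
  experience_level = factor(
    rep(c("Entry-level", "Mid-level", "Senior"), each = 3),
    levels = c("Entry-level", "Mid-level", "Senior")),
  salary_in_usd = c(60000, 70000, 85000,
                    95000, 110000, 130000,
                    140000, 165000, 200000)
)

# One box per experience level: the middle line is the median and the
# box height is the interquartile range.
p <- ggplot(jobs_toy, aes(x = experience_level, y = salary_in_usd)) +
  geom_boxplot(color = "navy", fill = "lightblue") +
  scale_y_continuous(labels = scales::comma) +
  labs(
    x = "Experience Level",
    y = "Salary (USD)",
    title = "Salaries by Experience Level")

p
```

Storing experience_level as an ordered factor keeps the boxes in career order rather than alphabetical order, which makes the upward trend easier to read.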

Salaries by Work Setting


This plot compares salaries across on-site, hybrid, and remote roles, showing differences in median pay and variability.
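The same comparison can also be expressed as numbers. The sketch below summarises median pay and spread (interquartile range) per work setting. The toy values and the setting labels are assumptions for illustration; with the real data, jobs_clean would take the place of jobs_toy.

```r
library(dplyr)

# Toy stand-in for jobs_clean; labels and salaries are invented.
jobs_toy <- data.frame(
  work_setting  = c("In-person", "In-person", "Remote", "Remote", "Hybrid", "Hybrid"),
  salary_in_usd = c(100000, 140000, 110000, 150000, 80000, 100000)
)

# Median shows typical pay; IQR shows how spread out the middle half is.
setting_summary <- jobs_toy %>%
  group_by(work_setting) %>%
  summarise(
    n_jobs        = n(),
    median_salary = median(salary_in_usd, na.rm = TRUE),
    iqr_salary    = IQR(salary_in_usd, na.rm = TRUE),
    .groups = "drop")

setting_summary
```

Reporting the group size next to each median helps show when a category, such as hybrid roles, rests on relatively few listings.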

Creating the Salary Map

library(dplyr)
library(plotly)

# Median salary (USD) for each company location
country_salary <- jobs_clean %>%
  group_by(company_location) %>%
  summarise(
    median_salary = median(salary_in_usd, na.rm = TRUE), .groups = "drop")

# Choropleth map: countries are matched by name and shaded by median salary
plot_ly(
  data = country_salary,
  type = "choropleth",
  locations = ~company_location,
  locationmode = "country names",
  z = ~median_salary,
  # Hover label: country name plus formatted median salary
  text = ~paste0(
    company_location,
    "<br>Median salary: $",
    scales::comma(median_salary)),
  hoverinfo = "text",
  colorscale = "Blues",
  colorbar = list(title = "Median Salary (USD)")) %>%
  layout(
    geo = list(projection = list(type = "equirectangular")))

Median Salary by Country

This interactive map shows how median salaries for data science jobs vary across countries.

Job Availability by Country


This interactive map shows where data science jobs are most concentrated globally.
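The code for this map is not shown. A minimal sketch, assuming it follows the same pattern as the salary map but shades countries by the number of listings, could look like this. The toy data frame and the column name n_jobs are assumptions for illustration.

```r
library(dplyr)
library(plotly)

# Toy stand-in for jobs_clean; the rows are invented for illustration.
jobs_toy <- data.frame(
  company_location = c("United States", "United States", "United States",
                       "Canada", "Germany")
)

# Number of job records per company location
country_jobs <- jobs_toy %>%
  count(company_location, name = "n_jobs")

# Choropleth map shaded by job count, mirroring the salary map settings
fig <- plot_ly(
  data = country_jobs,
  type = "choropleth",
  locations = ~company_location,
  locationmode = "country names",
  z = ~n_jobs,
  text = ~paste0(company_location, "<br>Jobs: ", n_jobs),
  hoverinfo = "text",
  colorscale = "Blues",
  colorbar = list(title = "Number of Jobs")) %>%
  layout(
    geo = list(projection = list(type = "equirectangular")))

fig
```

When one country holds most of the listings, shading by log10(n_jobs) instead of the raw count may make differences among the remaining countries easier to see.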

Key Findings

  • Most salaries in our dataset fall between $100,000 and $200,000 (USD).
  • Higher experience levels generally have higher median salaries.
  • Median pay for in-person and remote jobs is fairly similar.
  • Median salaries are similar across most countries, with a few countries showing higher values.
  • Job availability is concentrated in a small number of countries, while most countries show fewer jobs.
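
A finding such as the share of salaries in a given range can be checked directly rather than read off the histogram. The sketch below is one way to do so; the helper share_in_range and the toy salary vector are assumptions for illustration and do not reproduce the report's results.

```r
# Hypothetical helper: fraction of salaries within [lower, upper], inclusive.
share_in_range <- function(salary, lower, upper) {
  mean(salary >= lower & salary <= upper, na.rm = TRUE)
}

# Toy salaries, invented for illustration.
salaries_toy <- c(90000, 120000, 150000, 180000, 250000)

# Share of toy salaries between 100,000 and 200,000 USD
share_toy <- share_in_range(salaries_toy, 100000, 200000)
share_toy
```

With the real data, the call would be share_in_range(jobs_clean$salary_in_usd, 100000, 200000), and a value above 0.5 would support the word "most".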

Conclusion

This analysis suggests that salary differences in data science jobs are related to experience level, work setting, and location. Median salaries rise with experience level, and senior roles earn the highest pay. Hybrid roles tend to have lower salaries, while remote and in-person positions generally pay more and at similar levels. Median salaries are similar across most countries, although a few countries show higher values. Job opportunities are concentrated in a small number of countries, which indicates a clear geographic pattern in availability. Overall, experience and location both play a role in salary differences in this dataset.

Data Source