What Factors Best Predict Data Science Job Salaries?

Ali Yigit Ozdemir

Economic Question & Motivation

Economic Question

What factors best predict data science job salaries?

Motivation

  • Data science is a rapidly growing field
  • Salaries vary significantly across workers
  • Understanding salary determinants is important for workers and firms
  • Related to labor economics and wage determination

Dataset Description

Source: Kaggle Data Science Job Salaries

Outcome Variable: salary_in_usd

Key Predictors

  • Experience level
  • Employment type
  • Job title
  • Company size
  • Remote ratio
  • Company location

Dataset Size: 607 observations

Probability Analysis

Summary Statistics

  • Mean Salary: $112,298
  • Median Salary: $101,570

Observation

  • Mean > Median
  • Salary distribution is right-skewed

Interpretation

A small number of very high salaries pulls the mean above the median.
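A minimal sketch of this effect, using a small hypothetical salary list (illustrative values, not drawn from the Kaggle data): dropping the single largest salary moves the mean far more than the median.

```python
import statistics

# Hypothetical salaries in USD (illustrative only, not from the dataset)
salaries = [60_000, 75_000, 90_000, 100_000, 110_000, 130_000, 400_000]

mean_salary = statistics.fmean(salaries)
median_salary = statistics.median(salaries)

# Remove the single largest value to see how strongly it pulls each statistic
trimmed = sorted(salaries)[:-1]
trimmed_mean = statistics.fmean(trimmed)
trimmed_median = statistics.median(trimmed)

print(f"mean:   {mean_salary:,.0f} -> {trimmed_mean:,.0f}")
print(f"median: {median_salary:,.0f} -> {trimmed_median:,.0f}")
```

The median barely moves while the mean shifts sharply, which is the pattern a right-skewed distribution produces.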

Distribution Analysis

Original Distribution

  • Right-skewed
  • Several high-income outliers

Log Transformation

  • More symmetric distribution
  • Better approximation of a normal shape

Conclusion

Salary appears approximately log-normal.
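One way to check this numerically is to compare skewness before and after taking logs. The sketch below assumes a simple moment-based skewness estimate and a hypothetical salary list; the `skewness` helper is illustrative, not part of the original analysis.

```python
import math
import statistics

def skewness(values):
    """Moment-based sample skewness: mean of cubed standardized deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical salaries in USD (illustrative only, not from the dataset)
salaries = [60_000, 75_000, 90_000, 100_000, 110_000, 130_000, 400_000]
log_salaries = [math.log(s) for s in salaries]

raw_skew = skewness(salaries)
log_skew = skewness(log_salaries)

print(f"skewness of salary:      {raw_skew:.2f}")
print(f"skewness of log(salary): {log_skew:.2f}")
```

A skewness closer to zero after the transformation is consistent with, though not proof of, an approximately log-normal shape.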

Models

Model 1: Baseline Linear Regression

Predictors:

  • Experience level
  • Employment type
  • Company size
  • Remote ratio

Model 2: Full Linear Regression

Additional predictors:

  • Job title
  • Company location
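Because most of these predictors are categorical, a linear regression needs them dummy-coded first. The sketch below is one way to build the two design matrices; the `one_hot` helper, the column names, and the example rows are assumptions for illustration and not the code used in this project.

```python
def one_hot(rows, columns):
    """Dummy-code categorical columns, dropping the first (sorted) level
    of each column as the reference category."""
    levels = {c: sorted({r[c] for r in rows}) for c in columns}
    names = ["intercept"] + [
        f"{c}={lvl}" for c in columns for lvl in levels[c][1:]
    ]
    matrix = []
    for r in rows:
        features = [1.0]  # intercept term
        for c in columns:
            features += [1.0 if r[c] == lvl else 0.0 for lvl in levels[c][1:]]
        matrix.append(features)
    return names, matrix

# Hypothetical rows; column names and values are assumed for illustration
rows = [
    {"experience_level": "EN", "company_size": "S", "job_title": "Data Analyst"},
    {"experience_level": "SE", "company_size": "L", "job_title": "Data Scientist"},
    {"experience_level": "MI", "company_size": "M", "job_title": "Data Engineer"},
]

model1_cols = ["experience_level", "company_size"]
model2_cols = model1_cols + ["job_title"]  # Model 2 adds further predictors

names1, X1 = one_hot(rows, model1_cols)
names2, X2 = one_hot(rows, model2_cols)
print(len(names1), len(names2))
```

Each extra level of a predictor such as job title or company location adds a column, so the second model estimates many more coefficients from the same 607 observations.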

Model Comparison

Metric    Model 1    Model 2
RMSE      51,913     38,697
R²        0.255      0.568

Result: Model 2 performs better.

  • Lower prediction error (RMSE about 25% lower)
  • Higher explanatory power (R² rises from 0.255 to 0.568)
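For reference, both metrics can be computed directly from actual and predicted salaries. This is a minimal sketch using the standard definitions; the example values are hypothetical and unrelated to the reported results.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the outcome (USD)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(actual, predicted):
    """Share of outcome variance explained: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical salaries and predictions in USD (illustrative only)
actual = [80_000, 100_000, 120_000, 150_000]
predicted = [90_000, 95_000, 130_000, 140_000]

print(f"RMSE: {rmse(actual, predicted):,.0f}")
print(f"R²:   {r_squared(actual, predicted):.3f}")
```

RMSE is expressed in dollars, so it reads as a typical prediction miss, while R² is unitless and measures the share of variation explained.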

Cross-Validation Results

5-Fold Cross-Validation

  • RMSE: 55,540
  • R²: 0.445

Interpretation

  • Cross-validated performance is weaker than on the test set (higher RMSE, lower R²)
  • Possible mild overfitting
  • Model remains useful for prediction
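The splitting step behind 5-fold cross-validation can be sketched as follows. The `k_fold_indices` helper and the fixed seed are assumptions for illustration; the original analysis may have used a library routine instead.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs so that every observation
    appears in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# 607 matches the dataset size reported in this presentation
splits = list(k_fold_indices(607, k=5))
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)
```

The model is refit on each training part and scored on the held-out fold, and the five scores are then averaged. Because every observation is used for testing once, the result depends less on one particular train/test split.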

Economic Interpretation

Main Findings

  • Experience level matters
  • Job title matters
  • Company location matters
  • Company size matters

Example

  • Senior workers earn more than entry-level workers
  • Some job titles are associated with higher salaries
  • US-based positions tend to have higher salaries

Limitations & Future Research

Limitations

  • Only 607 observations
  • No education information
  • No exact years of experience variable
  • No purchasing power adjustment

Future Research

Does remote work reduce salary differences between countries?

Conclusion

Answer to the Research Question

Data science salaries are influenced by:

  • Experience level
  • Job title
  • Company location
  • Company size
  • Employment characteristics

Final Result

The full linear regression model provided the best predictive performance.

Thank you.