What Factors Best Predict Data Science Job Salaries?
Economic Question & Motivation
Economic Question
What factors best predict data science job salaries?
Motivation
- Data science is a rapidly growing field
- Salaries vary significantly across workers
- Understanding salary determinants is important for workers and firms
- Related to labor economics and wage determination
Dataset Description
Source: Kaggle Data Science Job Salaries
Outcome Variable: salary_in_usd
Key Predictors
- Experience level
- Employment type
- Job title
- Company size
- Remote ratio
- Company location
Dataset Size: 607 observations
Probability Analysis
Summary Statistics
- Mean Salary: $112,298
- Median Salary: $101,570
Observation
- Mean exceeds median by about $10,700
- Salary distribution is right-skewed
Interpretation
A small number of very high salaries pull the mean above the median.
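The mean-versus-median check can be sketched in a few lines. This is only an illustration: the salary values below are hypothetical and chosen to show right skew, they are not rows from the dataset, and the variable names are assumptions.

```python
import statistics

# Hypothetical salaries in USD, chosen only to illustrate right skew;
# these are not rows from the dataset.
salaries = [60_000, 75_000, 90_000, 100_000, 110_000, 130_000, 400_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# A mean above the median is a quick indicator of right skew.
print(f"mean={mean_salary:,.0f} median={median_salary:,.0f}")
print("mean above median:", mean_salary > median_salary)
```

A single very high value is enough to move the mean while leaving the median unchanged, which is the pattern described above.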
Distribution Analysis
Original Distribution
- Right-skewed
- Several high-income outliers
Log Transformation
- More symmetric distribution
- Better approximation of a normal shape
Conclusion
Salary appears approximately log-normal.
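One way to check the effect of the log transformation is to compare skewness before and after. The sketch below uses a moment-based skewness formula and hypothetical salary values (not the dataset); the helper name `sample_skewness` is an assumption made for illustration.

```python
import math
import statistics

def sample_skewness(values):
    """Moment-based skewness: mean cubed deviation over the cubed population std."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum((v - mu) ** 3 for v in values) / (len(values) * sigma ** 3)

# Hypothetical salaries in USD, not taken from the dataset.
salaries = [60_000, 75_000, 90_000, 100_000, 110_000, 130_000, 400_000]
log_salaries = [math.log(s) for s in salaries]

raw_skew = sample_skewness(salaries)
log_skew = sample_skewness(log_salaries)

# Positive skewness means a long right tail; the log compresses that tail.
print(f"skewness raw={raw_skew:.2f} log={log_skew:.2f}")
```

A skewness closer to zero after the transformation is consistent with, but does not by itself prove, an approximately log-normal distribution.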
Models
Model 1: Baseline Linear Regression
Predictors:
- Experience level
- Employment type
- Company size
- Remote ratio
Model 2: Full Linear Regression
Additional predictors:
- Job title
- Company location
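Most of these predictors are categorical, so they would typically enter a linear regression as dummy (0/1) variables with one reference level dropped per predictor. The slides do not say how the encoding was done, so the following is a minimal sketch under that assumption. The level labels and the helper names `dummy_encode` and `design_row` are hypothetical.

```python
# Hypothetical category labels; the slides do not list the dataset's actual codes.
EXPERIENCE_LEVELS = ["entry", "mid", "senior", "executive"]
EMPLOYMENT_TYPES = ["full_time", "part_time", "contract", "freelance"]
COMPANY_SIZES = ["small", "medium", "large"]

def dummy_encode(value, levels):
    """Return 0/1 indicators for every level except the first (the reference level)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1.0 if value == level else 0.0 for level in levels[1:]]

def design_row(experience, employment, company_size, remote_ratio):
    """Build one design-matrix row: intercept, dummies, then the numeric predictor."""
    return (
        [1.0]
        + dummy_encode(experience, EXPERIENCE_LEVELS)
        + dummy_encode(employment, EMPLOYMENT_TYPES)
        + dummy_encode(company_size, COMPANY_SIZES)
        + [float(remote_ratio)]
    )

row = design_row("senior", "full_time", "medium", 100)
print(row)
```

Model 2 would extend each row with dummies for job title and company location in the same way. With only 607 observations, those two predictors can add many sparse columns, which is one reason to check the model with cross-validation.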
Model Comparison
| Metric | Model 1 | Model 2 |
|---|---|---|
| RMSE (USD) | 51,913 | 38,697 |
| R² | 0.255 | 0.568 |
Result: Model 2 performs better.
- Lower prediction error
- Higher explanatory power
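For reference, the two comparison metrics can be computed as follows. This is a sketch with hypothetical actual and predicted salaries, not the models' real output; the function names are assumptions.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the outcome (USD here)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Share of outcome variance explained: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical values for illustration only.
actual = [80_000, 100_000, 120_000, 160_000]
predicted = [90_000, 95_000, 130_000, 150_000]

print(f"RMSE: {rmse(actual, predicted):,.0f}")
print(f"R^2:  {r_squared(actual, predicted):.3f}")
```

RMSE is read in dollars, so a lower value means predictions sit closer to observed salaries. R² is unitless and measures the share of salary variation the model accounts for.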
Cross-Validation Results
5-Fold Cross-Validation
Interpretation
- Cross-validated performance is weaker than performance on the test set
- Possible mild overfitting
- Model remains useful for prediction
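The 5-fold procedure can be sketched without any particular library. The slides do not state which software was used, so everything below is an assumption. The data are hypothetical, and the stand-in model predicts the mean salary per experience level, which matches the fitted values of a linear regression on that single categorical predictor.

```python
import math
import random
from collections import defaultdict

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_group_means(rows):
    """Stand-in model: mean salary per experience level, plus an overall fallback."""
    sums, counts = defaultdict(float), defaultdict(int)
    for level, salary in rows:
        sums[level] += salary
        counts[level] += 1
    overall = sum(sums.values()) / sum(counts.values())
    return {level: sums[level] / counts[level] for level in sums}, overall

def cross_validated_rmse(rows, k=5, seed=0):
    """Return one held-out RMSE per fold."""
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held_out_set = set(held_out)
        train = [row for i, row in enumerate(rows) if i not in held_out_set]
        means, overall = fit_group_means(train)
        # Levels unseen in training fall back to the overall training mean.
        squared_errors = [
            (rows[i][1] - means.get(rows[i][0], overall)) ** 2 for i in held_out
        ]
        scores.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return scores

# Hypothetical (experience level, salary in USD) pairs, not taken from the dataset.
rows = [
    ("entry", 60_000), ("entry", 70_000), ("entry", 65_000),
    ("mid", 90_000), ("mid", 95_000), ("mid", 85_000), ("mid", 100_000),
    ("senior", 130_000), ("senior", 140_000), ("senior", 150_000),
]
fold_rmse = cross_validated_rmse(rows, k=5)
print("per-fold RMSE:", [round(score) for score in fold_rmse])
print("mean RMSE:", round(sum(fold_rmse) / len(fold_rmse)))
```

Averaging over five held-out folds gives a less split-dependent estimate than a single test set, which is why a gap between the two can point to overfitting or to a favorable test split.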
Economic Interpretation
Main Findings
- Experience level matters
- Job title matters
- Company location matters
- Company size matters
Examples
- Senior workers earn more than entry-level workers
- Some job titles are associated with higher salaries
- US-based positions tend to have higher salaries
Limitations & Future Research
Limitations
- Only 607 observations
- No education information
- No exact years of experience variable
- No purchasing power adjustment
Future Research
Does remote work reduce salary differences between countries?
Conclusion
Answer to the Research Question
Data science salaries are influenced by:
- Experience level
- Job title
- Company location
- Company size
- Employment characteristics
Final Result
The full linear regression model provided the best predictive performance.
Thank you.