Exploring Data Science Salaries

Edward Harvey, Yecheng Cao, Xinyang Liu

2023-05-09


As we and fellow members of our class pursue careers related to data, we thought it would be interesting to explore the relationship between salary and various job characteristics. Significant findings could influence career decisions for ourselves and our peers, and hiring decisions for our future employers. In particular, we wanted to explore the relationship between salary and:

-Fully remote vs. not fully remote work

-Level of experience

-Full time work vs. part time or freelance/contract work

-Geographic location

In addition, we wanted to build an app that estimates a data science salary from a user's chosen job characteristics.

Data description

AI, ML, Data salaries: Salary trends in AI, ML, Data around the world from 2020-2023

The data are contributed by respondents worldwide and have been collected and updated continuously (usually weekly) from 2020 to the present. The dataset is published in the public domain, so anyone can access and download it easily.

Downloaded from Kaggle.com

The data

experience_level  employment_type  job_title                  salary_currency  salary_in_usd  employee_residence  company_location  company_size  remote_status
MI                FT               Other                      USD              258000         US                  US                L             Fully Remote
SE                FT               Data Scientist             USD              225000         US                  US                M             Not Fully Remote
SE                FT               Data Scientist             USD              156400         US                  US                M             Not Fully Remote
SE                FT               Data Engineer              USD              190000         US                  US                M             Fully Remote
SE                FT               Data Engineer              USD              150000         US                  US                M             Fully Remote
SE                FT               Data Scientist             USD              196000         US                  US                M             Not Fully Remote
SE                FT               Data Scientist             USD              121000         US                  US                M             Not Fully Remote
SE                FT               Data Scientist             USD              219000         US                  US                M             Not Fully Remote
SE                FT               Data Scientist             USD              141000         US                  US                M             Not Fully Remote
SE                FT               Data Engineer              USD              230000         US                  US                M             Not Fully Remote
SE                FT               Data Engineer              USD              206000         US                  US                M             Not Fully Remote
SE                FT               Other                      USD              192000         US                  US                M             Not Fully Remote
SE                FT               Other                      USD              164000         US                  US                M             Not Fully Remote
MI                FT               Machine Learning Engineer  USD              300000         Other               Other             M             Not Fully Remote
MI                FT               Machine Learning Engineer  USD              260000         Other               Other             M             Not Fully Remote
SE                FT               Data Analyst               USD              147000         US                  US                M             Fully Remote
SE                FT               Data Analyst               USD              92000          US                  US                M             Fully Remote
SE                FT               Machine Learning Engineer  USD              200000         US                  US                M             Not Fully Remote

Exploring the data

We took a first look at the data to check for outliers in salary.

Some salaries are surprisingly low, but they are not necessarily outliers in the sense of being obvious data errors.
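One common way to screen for outliers is Tukey's 1.5 × IQR rule. The sketch below is in Python and uses only the salaries from the data preview above. The rule, the function name `iqr_fences`, and the multiplier `k` are assumptions for illustration and not necessarily the check used in the original analysis.

```python
from statistics import quantiles

# salary_in_usd values copied from the data preview (a small sample, not the full dataset)
salaries = [258000, 225000, 156400, 190000, 150000, 196000, 121000, 219000,
            141000, 230000, 206000, 192000, 164000, 300000, 260000, 147000,
            92000, 200000]

def iqr_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = iqr_fences(salaries)
# Values outside the fences are candidates for inspection, not automatic errors
flagged = [s for s in salaries if s < lower or s > upper]
print(lower, upper, flagged)
```

A flagged value is only a prompt for a closer look. A low salary can be real if it reflects the local labour market or a part-time contract.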

Inspecting lower salary values

salary_in_usd  company_location  salary_currency  employment_type
7799           Other             BRL              FT
7500           Other             USD              CT
7000           Other             USD              FT
6359           Other             INR              FT
6304           Other             EUR              FT
6270           Other             BRL              FT
6072           Other             INR              FT
6072           Other             INR              FT
5882           Other             INR              FT
5723           Other             INR              FT
5707           Other             INR              FT
5679           US                INR              FT
5409           Other             INR              FT
5409           Other             INR              PT
5132           Other             CZK              FT

Filtering the data shows that many of these salaries come from countries with lower incomes, such as India, the Philippines, Brazil, and the Czech Republic, or from contract work in the US or Europe. Although a few incomes in the US and Europe are surprisingly low, we do not exclude them.
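The inspection above amounts to sorting by salary and looking at the bottom of the list. Here is a minimal Python sketch, using a handful of rows copied from the two tables above. The helper name `lowest_salaries` and the default cut-off `n=15` are illustrative assumptions.

```python
# (salary_in_usd, company_location, salary_currency, employment_type) tuples copied
# from the tables above; the real dataset has many more rows.
rows = [
    (258000, "US", "USD", "FT"),
    (92000, "US", "USD", "FT"),
    (7500, "Other", "USD", "CT"),
    (6304, "Other", "EUR", "FT"),
    (5679, "US", "INR", "FT"),
    (5132, "Other", "CZK", "FT"),
]

def lowest_salaries(records, n=15):
    """Return the n lowest-paid records, ordered from highest to lowest salary."""
    lowest = sorted(records, key=lambda r: r[0])[:n]
    return sorted(lowest, key=lambda r: r[0], reverse=True)

for salary, location, currency, employment in lowest_salaries(rows, n=4):
    print(salary, location, currency, employment)
```

Listing currency and employment type next to each salary makes it easier to tell a plausible low value (local currency, part-time or contract work) from a likely data error.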

A comparison of company size

Comparing Company Location

Comparing experience level

Violin Plot Takeaways

Heatmap in Shiny


Heatmap conclusions (part 1)

With experience_level held fixed on the X-axis, the following observations emerge:

  1. (Y-axis: experience_level) Executive-level (EX) and Senior-level (SE) professionals earn the highest average salaries, whereas Entry-level (EN) individuals earn the lowest.

  2. (Y-axis: company_location) In the US, GB, and other countries, EX-level employees have the highest earnings, whereas in Canada SE-level professionals earn more than EX-level workers.

  3. (Y-axis: company_size) Employees in medium-sized firms have the highest overall earnings. Large organizations pay entry-level (EN), mid-level (MI), and senior-level (SE) workers more than small companies do, but EX-level employees in small firms earn more than those in large organizations.

Heatmap conclusions (part 2)

  1. (Y-axis: remote_ratio) Data scientists with minimal remote work (less than 20%) have the highest overall salaries, followed by those working almost entirely remotely (over 80%). EX-level employees, however, earn the most in fully remote roles. Partially remote roles pay the least.

  2. (Y-axis: employment_type) EX and SE-level employees command the highest salaries in Contract (CT) roles, while EN and MI-level workers earn the most in Full-time (FT) positions. Part-time (PT) and Freelance (FL) employees generally receive significantly lower pay.

  3. (Y-axis: job_title) Among job titles, EX-level data engineers, data scientists, and professionals in the "Other" category are the highest earners. Data analysts typically receive the lowest compensation. Data engineers earn near-top salaries at every experience level, from EN to EX.
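The conclusions above compare average salaries, so each heatmap cell is presumably the mean of salary_in_usd for one combination of the Y-axis value and experience_level. The Python sketch below computes such a grid for company_size from a few rows of the data preview. The function name `mean_salary_grid` is hypothetical, and the actual Shiny app may aggregate differently.

```python
from collections import defaultdict

# (experience_level, company_size, salary_in_usd) triples copied from the data preview
records = [
    ("MI", "L", 258000),
    ("SE", "M", 225000),
    ("SE", "M", 156400),
    ("MI", "M", 300000),
    ("MI", "M", 260000),
]

def mean_salary_grid(rows):
    """Map (y_value, experience_level) -> mean salary for that heatmap cell."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [sum of salaries, row count]
    for level, y_value, salary in rows:
        cell = totals[(y_value, level)]
        cell[0] += salary
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

grid = mean_salary_grid(records)
for (y_value, level), mean in sorted(grid.items()):
    print(f"{y_value} x {level}: {mean:,.0f}")
```

Cells backed by only a few rows, such as EX-level staff in small companies, can shift a lot with a single observation, so it helps to check the row count behind each cell.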

Multi-way Analysis of Variance (ANOVA)

Analysis of Variance Table
                  Df        Sum Sq       Mean Sq  F value  Pr(>F)
company_location   3  3.824557e+12  1.274852e+12   507.67    0.00
company_size       2  8.471606e+10  4.235803e+10    16.87    0.00
employment_type    3  5.981473e+10  1.993824e+10     7.94    0.00
experience_level   3  1.464579e+12  4.881931e+11   194.41    0.00
job_title         12  9.444936e+11  7.870780e+10    31.34    0.00
remote_status      1  4.792904e+09  4.792904e+09     1.91    0.17
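In a standard ANOVA table, Mean Sq is Sum Sq divided by Df, and each F value is that term's Mean Sq divided by the residual mean square. The residual row is not shown above, so the Python sketch below back-calculates the residual mean square from each row as Mean Sq / F. That value is an inference from the rounded figures, not a number reported in the original output, and the helper name `implied_residual_ms` is hypothetical.

```python
# Rows copied from the ANOVA table above: term -> (Df, Sum Sq, Mean Sq, F value)
anova = {
    "company_location": (3, 3.824557e12, 1.274852e12, 507.67),
    "company_size": (2, 8.471606e10, 4.235803e10, 16.87),
    "employment_type": (3, 5.981473e10, 1.993824e10, 7.94),
    "experience_level": (3, 1.464579e12, 4.881931e11, 194.41),
    "job_title": (12, 9.444936e11, 7.870780e10, 31.34),
    "remote_status": (1, 4.792904e09, 4.792904e09, 1.91),
}

def implied_residual_ms(table):
    """Back out the residual mean square from each row as Mean Sq / F."""
    return {term: mean_sq / f for term, (_, _, mean_sq, f) in table.items()}

residual_ms = implied_residual_ms(anova)
for term, (df, sum_sq, mean_sq, f) in anova.items():
    # Relative gap between Sum Sq / Df and the printed Mean Sq (rounding only)
    gap = abs(sum_sq / df - mean_sq) / mean_sq
    print(f"{term:<17} Sum Sq/Df gap: {gap:.1e}  implied residual MS: {residual_ms[term]:.4e}")
```

Every row implies nearly the same residual mean square, so the table is internally consistent. Read this way, every factor except remote_status is significant at the 5% level. A Pr(>F) of 0.00 means the p-value rounds to zero at two decimals, and the 0.17 for remote_status gives no strong evidence of a salary difference between fully remote and other roles once the other factors are accounted for.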

Salary Prediction in Shiny App


Considerations for further analysis