Hello,
The “Data Science Salaries” dataset from Kaggle provides valuable insights into the compensation trends and variations in the field of data science from 2020 to 2023. This dataset encompasses a comprehensive collection of salary information from various industries, organizations, and geographic regions, enabling data professionals, researchers, and organizations to analyze and understand the prevailing salary landscape in the data science domain during this four-year period.
By examining this dataset, one can gain a deeper understanding of the factors influencing data science salaries, such as job roles, experience levels, educational backgrounds, and geographical locations. The dataset serves as a valuable resource for individuals seeking career guidance, companies aiming to benchmark their compensation strategies, and researchers investigating the evolving dynamics of the data science job market.
For any comments, please contact:
Duncan Kabiito Matovu,
Mobile +256787755590; Email:
duncanmatovu@gmail.com
There are indeed some extreme values, which we shall not remove at this point. Instead, let us dig further: which companies are hiring (different companies pay differently), what is the level of expertise, what is the job title, and so on.

Average yearly salary paid to data professionals, by country

| Location | Salary in USD (average) |
|---|---|
| Israel | 217,332 |
| Puerto Rico | 167,500 |
| United States | 158,462 |
| Saudi Arabia | 134,999 |
| Canada | 134,550 |
| New Zealand | 125,000 |
| Australia | 122,134 |
| Bosnia and Herzegovina | 120,000 |
| Russian Federation | 119,500 |
| Ireland | 115,188 |
| Japan | 110,822 |
| United Kingdom | 108,425 |
| Switzerland | 101,659 |
| Algeria | 100,000 |
| China | 100,000 |
| Iran, Islamic Republic of | 100,000 |
| Iraq | 100,000 |
| United Arab Emirates | 100,000 |
| Sweden | 98,791 |
| Mexico | 94,865 |
| Lithuania | 94,812 |
| Germany | 92,568 |
| Norway | 88,462 |
| Kenya | 80,000 |
| France | 78,390 |
| Belgium | 76,865 |
| Croatia | 76,726 |
| Netherlands | 75,470 |
| Ukraine | 72,667 |
| Austria | 71,355 |

Breakdown for Israel (the three figures below average to the 217,332 shown above):

| Job title | Level | Salary in USD (average) |
|---|---|---|
| AI Scientist | Expert | 417,937 |
| Data Scientist | Intermediate | 119,059 |
| Lead Machine Learning Engineer | Expert | 115,000 |

Breakdown for Puerto Rico (both figures below equal the 167,500 shown above):

| Job title | Level | Salary in USD (average) |
|---|---|---|
| Data Engineer | Expert | 167,500 |
| Machine Learning Engineer | Expert | 167,500 |
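
The country averages above are grouped means: collect the salaries per country, then divide each total by its count. Below is a minimal Python sketch of that step (the original analysis appears to have been done in R, so this is an illustration, not the author's code). It assumes that each row of the two small tables stands for a single record and that the first table belongs to Israel and the second to Puerto Rico, which is inferred from the matching averages. The function name `mean_salary_by_country` is hypothetical.

```python
from collections import defaultdict

# Assumption: one record per row of the two breakdown tables, with the
# country labels inferred from the matching country averages.
records = [
    ("Israel", 417937),
    ("Israel", 119059),
    ("Israel", 115000),
    ("Puerto Rico", 167500),
    ("Puerto Rico", 167500),
]


def mean_salary_by_country(rows):
    """Return {country: mean salary} for an iterable of (country, salary)."""
    totals = defaultdict(lambda: [0, 0])  # country -> [sum, count]
    for country, salary in rows:
        totals[country][0] += salary
        totals[country][1] += 1
    return {country: s / n for country, (s, n) in totals.items()}


means = mean_salary_by_country(records)

# Rank countries from highest to lowest average, as in the table.
ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
for country, average in ranked:
    print(f"{country}: {average:,.0f}")
```

With the full dataset, the same function would be fed one (country, salary) pair per respondent.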
I was quite struck by the depth of expertise and the level of involvement in data roles in the United States and Canada.
Both countries have a diverse pool of experts, but pay particular attention to the number of people in roles ranging from Data Engineer to Machine Learning Engineer.

| Job title | No. of experts |
|---|---|
| Data Engineer | 558 |
| Data Scientist | 470 |
| Data Analyst | 363 |
| Machine Learning Engineer | 217 |
| Analytics Engineer | 109 |
| Research Scientist | 81 |
| Data Architect | 76 |
| Data Science Manager | 53 |
| ML Engineer | 51 |
| Applied Scientist | 48 |

| Job title | No. of experts |
|---|---|
| Data Scientist | 27 |
| Data Analyst | 16 |
| Data Engineer | 12 |
| Machine Learning Engineer | 8 |
| Research Scientist | 5 |
| Machine Learning Software Engineer | 4 |
| Analytics Engineer | 2 |
| Business Intelligence Developer | 2 |
| Data Architect | 2 |
| Data Strategist | 2 |

Of course, this is only a sample dataset, but it does paint a picture.
##
## Call:
## lm(formula = salary_in_usd ~ factor(expertise_level) + factor(year) +
## factor(job_title), data = new_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -128944 -33973 -4646 31100 353898
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 122806 15180 8.090 1.53e-15 ***
## factor(expertise_level)Expert -18694 10752 -1.739 0.082363 .
## factor(expertise_level)Intermediate -63759 11017 -5.787 9.26e-09 ***
## factor(expertise_level)Junior -88202 11739 -7.514 1.17e-13 ***
## factor(year)2021 -4337 12054 -0.360 0.719054
## factor(year)2022 18022 10560 1.707 0.088178 .
## factor(year)2023 33188 10502 3.160 0.001620 **
## factor(job_title)Data Modeler -4813 23525 -0.205 0.837924
## factor(job_title)Data Modeller -9183 37132 -0.247 0.804722
## factor(job_title)Data Science Lead 60136 16740 3.592 0.000342 ***
## factor(job_title)Data Scientist 30512 3230 9.446 < 2e-16 ***
## factor(job_title)Data Specialist 4931 12244 0.403 0.687257
## factor(job_title)Head of Data Science 47809 18179 2.630 0.008659 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 52220 on 1128 degrees of freedom
## Multiple R-squared: 0.319, Adjusted R-squared: 0.3118
## F-statistic: 44.04 on 12 and 1128 DF, p-value: < 2.2e-16
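Three asterisks (***) mark coefficients that are significant at the 0.001 level; fewer asterisks indicate weaker significance, as listed in the "Signif. codes" line.
The adjusted R-squared shows how much of the variation in the dependent variable (salary) is explained by the independent variables, after penalising for the number of predictors. Here it is 0.3118, so this model explains about 31% of the variation in salaries.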
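
As a check on the first model summary, the adjusted R-squared can be recomputed from the multiple R-squared, the number of predictors, and the residual degrees of freedom. The sketch below uses the standard formula in Python; the function name is hypothetical, and the inputs are the rounded figures printed in the output (R-squared 0.319, 12 predictors, 1128 residual degrees of freedom), so the result only agrees with the reported value up to rounding.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Figures read from the first model summary.
residual_df = 1128
n_predictors = 12
# Residual df = n - p - 1, so the sample size can be recovered from it.
n_obs = residual_df + n_predictors + 1

adj_r2 = adjusted_r_squared(0.319, n_obs, n_predictors)
print(f"n = {n_obs}, adjusted R-squared = {adj_r2:.4f}")
```

The penalty is small here because the sample is large relative to the number of predictors.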
##
## Call:
## lm(formula = salary_in_usd ~ factor(expertise_level) + factor(job_title) +
## factor(company_size), data = new_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -126437 -34682 -4508 29097 344208
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 132879.0 12090.6 10.990 < 2e-16 ***
## factor(expertise_level)Expert -19796.8 10767.1 -1.839 0.066232 .
## factor(expertise_level)Intermediate -66768.3 11009.3 -6.065 1.80e-09 ***
## factor(expertise_level)Junior -87624.2 11800.0 -7.426 2.20e-13 ***
## factor(job_title)Data Modeler -481.2 23537.3 -0.020 0.983694
## factor(job_title)Data Modeller -3707.0 37169.5 -0.100 0.920574
## factor(job_title)Data Science Lead 63646.8 16763.6 3.797 0.000154 ***
## factor(job_title)Data Scientist 30530.3 3237.4 9.430 < 2e-16 ***
## factor(job_title)Data Specialist 5200.7 12276.2 0.424 0.671911
## factor(job_title)Head of Data Science 46989.3 18184.9 2.584 0.009892 **
## factor(company_size)Medium 20648.4 4961.0 4.162 3.39e-05 ***
## factor(company_size)Small -21923.5 9018.6 -2.431 0.015216 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 52310 on 1129 degrees of freedom
## Multiple R-squared: 0.3161, Adjusted R-squared: 0.3094
## F-statistic: 47.44 on 11 and 1129 DF, p-value: < 2.2e-16
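Three asterisks (***) mark coefficients that are significant at the 0.001 level; fewer asterisks indicate weaker significance, as listed in the "Signif. codes" line.
The adjusted R-squared shows how much of the variation in the dependent variable (salary) is explained by the independent variables, after penalising for the number of predictors. Here it is 0.3094, so this model also explains about 31% of the variation in salaries.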
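
To read the coefficients of the second model, note that every predictor is a factor, so a predicted salary is the intercept plus the coefficient of each level that applies. Levels that do not appear in the output are the baselines, already absorbed into the intercept. The Python sketch below copies a few estimates from the summary; the dictionary keys and the `predict_salary` helper are labels invented for this illustration, not part of the original analysis.

```python
# Estimates copied from the second model summary (salary_in_usd in USD).
INTERCEPT = 132879.0
COEFFICIENTS = {
    "expertise_level=Expert": -19796.8,
    "expertise_level=Intermediate": -66768.3,
    "expertise_level=Junior": -87624.2,
    "job_title=Data Scientist": 30530.3,
    "job_title=Data Science Lead": 63646.8,
    "company_size=Medium": 20648.4,
    "company_size=Small": -21923.5,
}


def predict_salary(active_levels):
    """Intercept plus the coefficient of every non-baseline level given."""
    return INTERCEPT + sum(COEFFICIENTS[level] for level in active_levels)


# Example profile: a junior Data Scientist at a medium-sized company.
profile = [
    "expertise_level=Junior",
    "job_title=Data Scientist",
    "company_size=Medium",
]
predicted = predict_salary(profile)
print(f"Predicted salary: {predicted:,.1f} USD")
```

With a residual standard error of about 52,310, any single prediction like this carries a wide margin of error.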
## Likelihood ratio test
##
## Model 1: salary_in_usd ~ factor(expertise_level) + factor(year) + factor(job_title)
## Model 2: salary_in_usd ~ factor(expertise_level) + factor(job_title) +
## factor(company_size)
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 14 -14008
## 2 13 -14010 -1 4.9047 0.02678 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
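With a p-value of 0.02678, which is below 0.05, we reject at the 5% significance level the null hypothesis that the two models fit the data equally well. Note the direction, however: Model 1 has the higher log-likelihood (-14008 against -14010) and the higher adjusted R-squared (0.3118 against 0.3094), so the evidence favours Model 1 rather than Model 2. Because the two models are not nested (one includes year, the other company size), the likelihood ratio test should be read as indicative only.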
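
The p-value in the test output can be reproduced from the chi-square statistic, and the same statistic gives the AIC gap between the two models. The Python sketch below is an illustration under two assumptions: the statistic equals twice the log-likelihood difference (Model 1 minus Model 2), and AIC is defined as 2k minus twice the log-likelihood, with k taken from the #Df column. The helper name `chi2_upper_tail_1df` is hypothetical.

```python
import math


def chi2_upper_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom.

    With 1 df, X is the square of a standard normal, so the upper tail
    equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Values read from the likelihood ratio test output.
chisq = 4.9047   # assumed to be 2 * (logLik of Model 1 - logLik of Model 2)
k_model1 = 14    # parameter count of Model 1 (#Df column)
k_model2 = 13    # parameter count of Model 2 (#Df column)

p_value = chi2_upper_tail_1df(chisq)

# AIC = 2k - 2 * logLik, so AIC(Model 2) - AIC(Model 1) simplifies to:
delta_aic = 2 * (k_model2 - k_model1) + chisq

print(f"p-value = {p_value:.5f}")
print(f"AIC(Model 2) - AIC(Model 1) = {delta_aic:.4f}")
```

A positive AIC difference means Model 1 has the lower AIC, which points the same way as its higher log-likelihood.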