Introduction
This challenging project for DATA607 focuses on the team building, collaboration and leadership required to succeed in the field of data science and analytics. We were instructed to work closely in groups toward answering a specific question: “Which are the most valued data science skills?” Our team quickly established a rapport and a communications channel on Slack, then set to the task with gusto and shared responsibility. We chose as our leader Duubar Villalobos Jimenez, and under his coordination we assigned a variety of tasks and deadlines to accomplish our goal. Of course, data is required. We chose to collect Data Science salary data from the website Paysa, which lists a wide variety of tech postings, the skills associated with each posting and several salary components. Details follow, but our analysis shows that Machine Learning is the most-requested skill in our sample of 390 job listings, while the highest-valued skill, measured by mean compensation, was Strategy - reflecting the higher pay for management, leadership and vision in this rapidly evolving profession.
Team members
Name | Team | Email |
---|---|---|
Pavan Akula | Team 3 | akulapavan@hotmail.com |
Ambra Baboni Alexander | Team 3 | ambra8due@hotmail.com |
Thomas Detzel | Team 3 | tomdetz@gmail.com |
Dilip Ganesan | Team 3 | dilipgan@gmail.com |
Kyle Gilde | Team 3 | kylegilde@gmail.com |
Raghunathan Rammnath | Team 3 | raghu74us@gmail.com |
Duubar Villalobos Jimenez | Team 3 | mydvtech@gmail.com |
Our Process
Workspace preparation
Create a vector with all needed libraries.
load_packages <- c(
  "knitr",
  "RMySQL",
  "tidyverse",
  "tidyr",
  "dplyr",
  "stringr",
  "plotly",
  "htmlTable",
  "prettydoc",
  "shinythemes",
  "treemap",
  "data.tree",
  "janitor",
  "ggplot2",
  "ggthemes",
  "stats"
)
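The vector above only names the packages. As a minimal sketch of how they might then be installed and attached, assuming a default CRAN mirror is configured, something like the following could be used. The helper name `attach_packages` is hypothetical and is not part of the original project code.

```r
# Hypothetical helper (not from the original project): install anything that
# is missing, then attach every package named in `pkgs`.
attach_packages <- function(pkgs) {
  pkgs <- unique(pkgs)  # drop any duplicate names
  to_install <- setdiff(pkgs, rownames(installed.packages()))
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  # require() needs character.only = TRUE when the name is given as a string;
  # it returns TRUE or FALSE depending on whether the package could be attached
  loaded <- vapply(pkgs, function(p) {
    suppressPackageStartupMessages(require(p, character.only = TRUE))
  }, logical(1))
  invisible(loaded)
}

# Usage with the vector defined above:
# attach_packages(load_packages)
```

The returned logical vector makes it easy to spot a package that failed to load before the rest of the analysis runs.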
Organization and Communication
As a team, we had a brainstorm meetup in which some roles and lines of work were defined. Following are the most important agreements.
Github
We agreed to create a GitHub repository, D607-Group-Project. All team members had access and were able to post and read from a single repository location.
Slack
We agreed to use Slack as our Team Collaboration platform. From Slack we were able to perform live meetups with “join.me”, which allows screen sharing and presentations to update and explore specific topics and problems. For example, we were able to look at code and discuss problems and refinements.
Google Docs
We created a spreadsheet in Google Docs to list deadlines and responsibilities.
Data
After some preliminary research and analysis, we decided to collect data from current data science job postings. We identified several web sites that offered raw information on position, location, company, salary and skills. In the end, we selected Paysa (https://www.paysa.com) because it offered the most comprehensive set of variables.
Limitations
Please note that this data was collected from a single source and reflects job postings that existed on March 14, 2017. No inference should be drawn from it about past or future data science skills, and our conclusions apply only to this sample. A different sample from the same source would likely produce different results.
Collection
We were unable to find data in a table, csv file or other structured format. In addition, the Paysa proprietors declined our request to provide sample data for the project. We attempted to scrape the data, but Paysa has designed its web pages to prevent scraping. Because we did not require a huge sample of data for this process, we cut and pasted Paysa data into a text file that we then cleaned, organized and refined. We imported that raw text data into a SQL database for permanent storage, then exported it to R to conduct our analysis. The import-export process was also conducted via R. In total, we collected 390 job postings that listed 95 overall skills. Because some of those skills were the same but named differently, we collapsed the list into 35 unique skills and a miscellaneous group of skills that appeared fewer than 10 times in the data.
We named our raw text file paysa.txt and uploaded it to GitHub.
Data Preparation
Once we had our paysa.txt file with our desired information, we proceeded to extract valuable information from it and created a data frame. The code was shared among us to continue further cleaning and tidying.
Import
This code reads our raw text file into R from GitHub.
library(readr)  # provides read_file() and default_locale()
url <- "https://raw.githubusercontent.com/dilipganesan/D607-Group-Project/patch-1/scripts/paysa.txt"
mystring <- read_file(url, locale = default_locale())
Sample of the input file.
## Data Scientist Job Results, showing 6K recent job postings
## Update Saved SearchNotification Settings
## 0%
## MATCH
## Head of SBG Data Science Engineering Logo
## Head of SBG Data Science Engineering
## Intuit
## EXPECTED
## $338K
## MARKET SALARY
## Base Salary$253K
## Annual Bonus$86K
## Signing Bonus$24K
## Annual Equity$0
## APPLY NOW
## Head of SBG Data Science Engineering at Intuit in Mountain View, CA
## How you match this job:
## Add more skills to see how you match this job
## You can learn valuable new skills like: Distributed Systems, Big Data, Algorithms, Data Science, Strategy, Databases and more.
## Jobs at Intuit Jobs in Mountain View, CA
## 0%
## MATCH
## Principal Lead Data Scientist Logo
## Principal Lead Data Scientist
## Akamai Technologies
## EXPECTED
## $317K
## MARKET SALARY
## Base Salary$204K
## Annual Bonus$27K
## Annual Equity$86K
## Signing B
From text file to data frame
This code uses string handling to separate the text into an initial set of columns.
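As a rough sketch of this kind of string handling, and not the team's actual code, the fields could be pulled out with base R regular expressions. The sample text is the first posting from the input file shown above; the helper names `extract_first` and `parse_posting` and every pattern are assumptions made for illustration.

```r
# First posting from the sample input above
raw <- "0%
MATCH
Head of SBG Data Science Engineering Logo
Head of SBG Data Science Engineering
Intuit
EXPECTED
$338K
MARKET SALARY
Base Salary$253K
Annual Bonus$86K
Signing Bonus$24K
Annual Equity$0
APPLY NOW
Head of SBG Data Science Engineering at Intuit in Mountain View, CA
How you match this job:
Add more skills to see how you match this job
You can learn valuable new skills like: Distributed Systems, Big Data, Algorithms, Data Science, Strategy, Databases and more.
Jobs at Intuit Jobs in Mountain View, CA"

# Hypothetical helper: first match of `pattern` in `x`, or NA when none
extract_first <- function(x, pattern) {
  m <- regmatches(x, regexpr(pattern, x, perl = TRUE))
  if (length(m) == 0) NA_character_ else m
}

# Hypothetical helper: one posting (a single string) -> one-row data frame
parse_posting <- function(block) {
  data.frame(
    applyposition = extract_first(block, "APPLY NOW .*? in [^,]+, [A-Z]{2}"),
    Base     = extract_first(block, "Base Salary\\$[0-9]+K?"),
    Annual   = extract_first(block, "Annual Bonus\\$[0-9]+K?"),
    signing  = extract_first(block, "Signing Bonus\\$[0-9]+K?"),
    expected = extract_first(block, "EXPECTED \\$[0-9]+K?"),
    skillset = extract_first(block, "You can learn valuable new skills like: .*? and more\\."),
    location = extract_first(block, "Jobs in [^,]+, [A-Z]{2}"),
    stringsAsFactors = FALSE
  )
}

# Collapse line breaks to spaces, then split the text into one block per posting
one_line <- gsub("[[:space:]]*\n[[:space:]]*", " ", raw)
blocks <- strsplit(one_line, "[0-9]+% MATCH ")[[1]]
blocks <- blocks[nzchar(trimws(blocks))]

postings <- do.call(rbind, lapply(blocks, parse_posting))
postings$ID <- seq_len(nrow(postings))
```

Fields that are absent from a posting come back as `NA` rather than stopping the run, which matters for hand-pasted text where some postings omit a salary component.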
Sample of the initial data frame.
ID | applyposition | Base | Annual | signing | expected | skillset | location |
---|---|---|---|---|---|---|---|
1 | APPLY NOW Head of SBG Data Science Engineering at Intuit in Mountain View, CA | Base Salary$253K | Annual Bonus$86K | Signing Bonus$24K | EXPECTED $338K | You can learn valuable new skills like: Distributed Systems, Big Data, Algorithms, Data Science, Strategy, Databases and more. | Jobs in Mountain View, CA |
2 | APPLY NOW Principal Lead Data Scientist at Akamai Technologies in Santa Clara, CA | Base Salary$204K | Annual Bonus$27K | Signing Bonus$18K | EXPECTED $317K | You can learn valuable new skills like: Hadoop, Data Mining, Machine Learning, Python, Matlab, Ruby and more. | Jobs in Santa Clara, CA |
3 | APPLY NOW Principal Lead Data Scientist at Akamai Technologies in Santa Clara, CA | Base Salary$204K | Annual Bonus$27K | Signing Bonus$18K | EXPECTED $317K | You can learn valuable new skills like: Hadoop, Data Mining, Big Data, Algorithms, Machine Learning, Python and more. | Jobs in Santa Clara, CA |
4 | APPLY NOW Director, Data Scientist at Dropbox in San Francisco, CA | Base Salary$183K | Annual Bonus$0 | Signing Bonus$30K | EXPECTED $289K | You can learn valuable new skills like: Data Mining, Algorithms, Machine Learning, Data Science, SQL, Analytics and more. | Jobs in San Francisco, CA |
5 | APPLY NOW Principal Data Scientist at Microsoft in Bellevue, WA | Base Salary$200K | Annual Bonus$49K | Signing Bonus$19K | EXPECTED $289K | You can learn valuable new skills like: Hadoop, Data Mining, Optimization, Algorithms, MapReduce, C++ and more. | Jobs in Bellevue, WA |
6 | APPLY NOW Principal Data Scientist at Microsoft in Redmond, WA | Base Salary$198K | Annual Bonus$53K | Signing Bonus$19K | EXPECTED $287K | You can learn valuable new skills like: Algorithms, Machine Learning, C++, Python, Deep Learning, Data Science and more. | Jobs in Redmond, WA |
MySQL
Next, we stored this initial data set on a remote MySQL server. We provided clear instructions for all team members about how to read tables into R.
The remote server is hosted at mydvtech.com. We chose a commercial site because in our day-to-day work we will be reading and storing company data in company portals. This was good practice for all. Step-by-step instructions for connecting to the MySQL server through cPanel were compiled in a manual distributed to the team.
http://rpubs.com/dvillalobos/confMySQLcPanel
Writing data into MySQL
This code connects to our remote MySQL server and writes a data frame into a table.
writeMySQLTable <- function(my.data = NULL, myLocalTableName = NULL){
  # myLocalUser, myLocalPassword, myLocalHost and myLocalMySQLSchema are
  # expected to be defined in the calling environment.
  # Create the schema if it doesn't exist, using RMySQL
  mydbconnection <- dbConnect(MySQL(),
                              user = myLocalUser,
                              password = myLocalPassword,
                              host = myLocalHost)
  MySQLcode <- paste0("CREATE SCHEMA IF NOT EXISTS ", myLocalMySQLSchema, ";")
  dbExecute(mydbconnection, MySQLcode)
  # Close this connection before reconnecting to the schema itself
  dbDisconnect(mydbconnection)
  # Write our data frame into MySQL
  mydbconnection <- dbConnect(MySQL(),
                              user = myLocalUser,
                              password = myLocalPassword,
                              host = myLocalHost,
                              dbname = myLocalMySQLSchema)
  myLocalTableName <- tolower(myLocalTableName)
  MySQLcode <- paste0("DROP TABLE IF EXISTS ", myLocalTableName, ";")
  dbExecute(mydbconnection, MySQLcode)
  dbWriteTable(mydbconnection, name = myLocalTableName, value = my.data)
  # Close the connection to the schema
  dbDisconnect(mydbconnection)
  # Close all other open connections we might have
  lapply(dbListConnections(dbDriver(drv = "MySQL")), dbDisconnect)
}
Reading data from MySQL
For analysis and testing, we were encouraged to read from our MySQL server instead of GitHub (our backup plan if we had problems with MySQL). One advantage of using a remote MySQL server is speed in reading and transferring data; we noticed that reading from our GitHub repository used considerably more resources than reading from MySQL.
The following code connects to the MySQL server and reads the data stored in a table into an R data frame.
readMySQLTable <- function(myLocalTableName = NULL){
  # Connect to the schema using RMySQL
  mydbconnection <- dbConnect(MySQL(),
                              user = myLocalUser,
                              password = myLocalPassword,
                              host = myLocalHost,
                              dbname = myLocalMySQLSchema)
  # Read the table if it exists; otherwise NULL is returned
  myLocalTableName <- tolower(myLocalTableName)
  slookup <- NULL
  if (dbExistsTable(mydbconnection, name = myLocalTableName)){
    slookup <- dbReadTable(mydbconnection, name = myLocalTableName)
  }
  # Close the connection to the schema
  dbDisconnect(mydbconnection)
  # Close all other open connections
  lapply(dbListConnections(dbDriver(drv = "MySQL")), dbDisconnect)
  return(slookup)
}
Tidying and transformation
We divided our tidying and transformation into several tasks, as follows.
New cleaned table results.
ID | applyposition | Base | Annual | signing | expected | Skill1 | Skill2 | Skill3 | Skill4 | Skill5 | Skill6 | location |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Head of SBG Data Science Engineering at Intuit in Mountain View, CA | 253K | 86K | 24K | 338K | Distributed Systems | Big Data | Algorithms | Data Science | Strategy | Databases | Mountain View, CA |
2 | Principal Lead Data Scientist at Akamai Technologies in Santa Clara, CA | 204K | 27K | 18K | 317K | Hadoop | Data Mining | Machine Learning | Python | Matlab | Ruby | Santa Clara, CA |
3 | Principal Lead Data Scientist at Akamai Technologies in Santa Clara, CA | 204K | 27K | 18K | 317K | Hadoop | Data Mining | Big Data | Algorithms | Machine Learning | Python | Santa Clara, CA |
4 | Director, Data Scientist at Dropbox in San Francisco, CA | 183K | | 30K | 289K | Data Mining | Algorithms | Machine Learning | Data Science | SQL | Analytics | San Francisco, CA |
5 | Principal Data Scientist at Microsoft in Bellevue, WA | 200K | 49K | 19K | 289K | Hadoop | Data Mining | Optimization | Algorithms | MapReduce | C++ | Bellevue, WA |
6 | Principal Data Scientist at Microsoft in Redmond, WA | 198K | 53K | 19K | 287K | Algorithms | Machine Learning | C++ | Python | Deep Learning | Data Science | Redmond, WA |
Continuing with our clean up, we created new variable names, fixed dollar values and separated city and state.
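A minimal sketch of these clean-up steps in base R is shown below. The function names `parse_dollars` and `split_posting` are hypothetical, and the pattern assumes every posting reads "Position at Company in City, ST" as in the sample rows above.

```r
# Hypothetical helper: convert strings such as "253K", "$86K" or "0" to dollars.
# Empty strings become NA.
parse_dollars <- function(x) {
  x <- trimws(x)
  thousands <- grepl("K$", x)
  value <- suppressWarnings(as.numeric(gsub("[^0-9.]", "", x)))
  ifelse(thousands, value * 1000, value)
}

# Hypothetical helper: split "Position at Company in City, ST" into four columns
split_posting <- function(x) {
  pattern <- "^(.*) at (.*) in ([^,]+), ([A-Z]{2})$"
  data.frame(
    Position = sub(pattern, "\\1", x),
    Company  = sub(pattern, "\\2", x),
    City     = sub(pattern, "\\3", x),
    State    = sub(pattern, "\\4", x),
    stringsAsFactors = FALSE
  )
}

salaries <- parse_dollars(c("253K", "86K", "24K", "0"))
parts <- split_posting(c(
  "Head of SBG Data Science Engineering at Intuit in Mountain View, CA",
  "Director, Data Scientist at Dropbox in San Francisco, CA"
))
```

Because the first capture group is greedy, a job title that itself contains " at " still splits on the last occurrence before the company name. Postings that do not follow the assumed layout are returned unchanged in every column and would need manual review.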
Tidy table
Below is our tidy table, with a total of 2217 skill entries. (In a subsequent step, we collapse repetitive skills, e.g., “C” and “C++”.)
ID | Position | Company | City | State | Skills | Type | Salary |
---|---|---|---|---|---|---|---|
1 | Head of SBG Data Science Engineering | Intuit | Mountain View | CA | Distributed Systems | Base Salary | 253000 |
2 | Principal Lead Data Scientist | Akamai Technologies | Santa Clara | CA | Hadoop | Base Salary | 204000 |
3 | Principal Lead Data Scientist | Akamai Technologies | Santa Clara | CA | Hadoop | Base Salary | 204000 |
4 | Director, Data Scientist | Dropbox | San Francisco | CA | Data Mining | Base Salary | 183000 |
5 | Principal Data Scientist | Microsoft | Bellevue | WA | Hadoop | Base Salary | 200000 |
6 | Principal Data Scientist | Microsoft | Redmond | WA | Algorithms | Base Salary | 198000 |
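The collapsing step could be sketched as follows: map differently named skills onto one label, then lump anything that appears fewer than 10 times into a miscellaneous group, as described in the Collection section. The synonym map here is illustrative only, and `collapse_skills` is a hypothetical name; the mapping the team actually used is not shown in this report.

```r
# Illustrative synonym map (an assumption): raw skill name -> canonical name
synonyms <- c("C" = "C++", "Architectures" = "Architecture")

# Hypothetical helper: apply the synonym map, then lump rare skills into "Misc"
collapse_skills <- function(skills, synonyms, min_count = 10) {
  mapped <- ifelse(skills %in% names(synonyms), synonyms[skills], skills)
  counts <- table(mapped)
  rare <- names(counts)[counts < min_count]
  mapped[mapped %in% rare] <- "Misc"
  unname(mapped)
}

# Toy input: "C" is merged into "C++", and "Go" falls below the threshold
toy_skills <- c(rep("C++", 9), "C", rep("Go", 3))
collapsed <- collapse_skills(toy_skills, synonyms)
table(collapsed)
```

Applying the synonym map before counting matters: a skill can clear the threshold only after its variants have been merged.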
Analysis
Skills
From our gathered data we have a total of 72 skills.
Skills | Count | Percentage | Rank |
---|---|---|---|
Machine Learning | 260 | 11.73 % | 1 |
Data Science | 207 | 9.34 % | 2 |
Algorithms | 183 | 8.25 % | 3 |
Hadoop | 182 | 8.21 % | 4 |
Big Data | 177 | 7.98 % | 5 |
Python | 123 | 5.55 % | 6 |
Data Mining | 119 | 5.37 % | 7 |
Optimization | 97 | 4.38 % | 8 |
Analytics | 85 | 3.83 % | 9 |
C++ | 71 | 3.2 % | 10 |
Management | 61 | 2.75 % | 11 |
SQL | 56 | 2.53 % | 12 |
Statistics | 53 | 2.39 % | 13 |
Matlab | 40 | 1.8 % | 14 |
Scala | 38 | 1.71 % | 15 |
Product Management | 33 | 1.49 % | 16 |
MapReduce | 31 | 1.4 % | 17 |
Strategy | 27 | 1.22 % | 18 |
Architectures | 22 | 0.99 % | 19 |
Technical Leadership | 19 | 0.86 % | 20 |
Deep Learning | 18 | 0.81 % | 22 |
Distributed Systems | 18 | 0.81 % | 22 |
Information Retrieval | 18 | 0.81 % | 22 |
AWS | 17 | 0.77 % | 24 |
ETL | 16 | 0.72 % | 25 |
Relational Databases | 15 | 0.68 % | 26 |
User Experience | 14 | 0.63 % | 28 |
Windows | 14 | 0.63 % | 28 |
Java | 13 | 0.59 % | 30 |
Ruby | 13 | 0.59 % | 30 |
Scalability | 13 | 0.59 % | 30 |
REST | 11 | 0.5 % | 32 |
Computer Vision | 10 | 0.45 % | 34 |
Leadership | 10 | 0.45 % | 34 |
Software Design | 10 | 0.45 % | 34 |
Apache Spark | 8 | 0.36 % | 36 |
Databases | 8 | 0.36 % | 36 |
C | 7 | 0.32 % | 39 |
Search | 7 | 0.32 % | 39 |
Time Series Analysis | 7 | 0.32 % | 39 |
Architecture | 6 | 0.27 % | 42 |
Automation | 6 | 0.27 % | 42 |
Natural Language Processing | 6 | 0.27 % | 42 |
PHP | 6 | 0.27 % | 42 |
Android | 5 | 0.23 % | 46 |
Game Development | 5 | 0.23 % | 46 |
OS X | 5 | 0.23 % | 46 |
Mathematics | 4 | 0.18 % | 48 |
Scripting | 4 | 0.18 % | 48 |
Business Intelligence | 3 | 0.14 % | 52 |
Cassandra | 3 | 0.14 % | 52 |
Functional Programming | 3 | 0.14 % | 52 |
Go | 3 | 0.14 % | 52 |
MySQL | 3 | 0.14 % | 52 |
Enterprise Software | 2 | 0.09 % | 58 |
Image Processing | 2 | 0.09 % | 58 |
LAMP | 2 | 0.09 % | 58 |
Recommender Systems | 2 | 0.09 % | 58 |
Signal Processing | 2 | 0.09 % | 58 |
Tomcat | 2 | 0.09 % | 58 |
Algorithm Design | 1 | 0.05 % | 66 |
Data Science Scripting | 1 | 0.05 % | 66 |
EMPTY | 1 | 0.05 % | 66 |
Engineering Management | 1 | 0.05 % | 66 |
Firewalls | 1 | 0.05 % | 66 |
HTTP | 1 | 0.05 % | 66 |
Mathematical Modeling | 1 | 0.05 % | 66 |
Network Architecture | 1 | 0.05 % | 66 |
Optimization Data Science | 1 | 0.05 % | 66 |
Product Design Data Science | 1 | 0.05 % | 66 |
Test Driven Development | 1 | 0.05 % | 66 |
Web Services | 1 | 0.05 % | 66 |
Top 10 most desired skills by employers, by count of raw skill names.
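A table of this shape could be built from the Skills column of the tidy table with a few lines of base R. This is a sketch under assumptions: `skill_summary` is a hypothetical name, and tied skills here share the smallest rank, which may differ from the tie handling used in the table above.

```r
# Hypothetical helper: count, percentage and rank for a vector of skill names
skill_summary <- function(skills) {
  counts <- sort(table(skills), decreasing = TRUE)
  n <- as.integer(counts)
  data.frame(
    Skills = names(counts),
    Count = n,
    Percentage = round(100 * n / length(skills), 2),
    Rank = rank(-n, ties.method = "min"),  # ties share the smallest rank
    stringsAsFactors = FALSE
  )
}

# Toy input for illustration only
toy <- c("Machine Learning", "Machine Learning", "Machine Learning", "Python", "SQL")
summary_table <- skill_summary(toy)
head(summary_table, 10)  # the top 10 rows feed the chart
```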
Compensation (Exploratory Analysis)
The Paysa data lists Base Salary, Annual Bonus, Signing Bonus and a total called Expected Salary, providing another way to measure value beyond raw counts.
Tree Analysis
Another way of analyzing the above data is by performing a tree analysis. This tree shows Total Salary for jobs associated with Machine Learning skills in Washington and California.
Following are the highest-paid jobs in the data.
Total Salary
Table
Position | Company | City | State | Type | Salary | Rank |
---|---|---|---|---|---|---|
Head of SBG Data Science Engineering | Intuit | Mountain View | CA | Expected Salary | 338000 | 1 |
Principal Data Science Manager | Microsoft | Redmond | WA | Expected Salary | 322000 | 2 |
Principal Lead Data Scientist | Akamai Technologies | Santa Clara | CA | Expected Salary | 317000 | 3 |
Data Science and Analytics Lead, Global Revenue Acceleration | | Mountain View | CA | Expected Salary | 305000 | 4 |
Director, Data Scientist | Dropbox | San Francisco | CA | Expected Salary | 289000 | 5 |
Principal Data Scientist | Microsoft | Bellevue | WA | Expected Salary | 289000 | 6 |
Principal Data Scientist | Microsoft | Redmond | WA | Expected Salary | 287000 | 7 |
Director | Unknown | Vienna | VA | Expected Salary | 285000 | 8 |
Sr. Principal Data Scientist (A/B Platform) | Coupang | Palo Alto | CA | Expected Salary | 281000 | 9 |
Manager of Data Science | Yelp | San Francisco | CA | Expected Salary | 278000 | 10 |
Chart
Top paid skills
The following combination of skills is listed for the posting with the top Total Salary of $338,000.
Top Paid Skills | Listed in top most desired skills by employers |
---|---|
Distributed Systems | FALSE |
Big Data | TRUE |
Algorithms | TRUE |
Data Science | TRUE |
Strategy | FALSE |
Databases | FALSE |
The tree below presents the two highest total salaries and their respective skills.
Signing Bonus
Table
Position | Company | City | State | Type | Salary | Rank |
---|---|---|---|---|---|---|
Principal Data Science Engineer | OpenTable | San Francisco | CA | Signing Salary | 43000 | 1 |
Data Scientist, Population and Survey Sciences | | Menlo Park | CA | Signing Salary | 42000 | 2 |
Data Scientist, Infrastructure | | Menlo Park | CA | Signing Salary | 42000 | 3 |
Data Scientist- Consumer Insights | | Menlo Park | CA | Signing Salary | 42000 | 4 |
Data Visualization Scientist | | Menlo Park | CA | Signing Salary | 42000 | 5 |
Data Scientist | | Menlo Park | CA | Signing Salary | 42000 | 6 |
Data Scientist, Auction & Delivery | | Menlo Park | CA | Signing Salary | 42000 | 7 |
Software Development Mgr - Advertising Analytics and Data Science | Amazon | Seattle | WA | Signing Salary | 40000 | 8 |
Infrastructure Data Scientist & Strategy Analyst | | Menlo Park | CA | Signing Salary | 40000 | 9 |
Senior Data Scientist | Amazon | Seattle | WA | Signing Salary | 39000 | 10 |
Chart
Top paid skills
The following combination of skills is listed for the posting with the top Signing Bonus of $43,000.
Top Paid Skills | Listed in top most desired skills by employers |
---|---|
Product Management | FALSE |
Hadoop | TRUE |
Software Design | FALSE |
Information Retrieval | FALSE |
Machine Learning | TRUE |
ETL | FALSE |
The tree below presents the two highest signing bonuses and their respective skills.
Annual Bonus
Table
Position | Company | City | State | Type | Salary | Rank |
---|---|---|---|---|---|---|
Head of SBG Data Science Engineering | Intuit | Mountain View | CA | Annual Salary | 86000 | 1 |
Principal Data Science Manager | Microsoft | Redmond | WA | Annual Salary | 59000 | 2 |
Principal Data Scientist | Microsoft | Redmond | WA | Annual Salary | 53000 | 3 |
Principal Data Scientist | Microsoft | Bellevue | WA | Annual Salary | 49000 | 4 |
Principal Data Scientist Architect | SAP | Palo Alto | CA | Annual Salary | 49000 | 5 |
Data Science and Analytics Lead, Global Revenue Acceleration | | Mountain View | CA | Annual Salary | 48000 | 6 |
Principal Data Science Engineer | OpenTable | San Francisco | CA | Annual Salary | 46000 | 7 |
Legal- Firmwide Initiatives Team OLO Data Scientist- VP | JPMorgan Chase | New York | NY | Annual Salary | 46000 | 8 |
CIB-Rapid Prototyping Data Scientist | Unknown | New York | NY | Annual Salary | 46000 | 9 |
Director | Unknown | Vienna | VA | Annual Salary | 45000 | 10 |
Chart
Top paid skills
The following combination of skills is listed for the posting with the top Annual Bonus of $86,000.
Top Paid Skills | Listed in top most desired skills by employers |
---|---|
Distributed Systems | FALSE |
Big Data | TRUE |
Algorithms | TRUE |
Data Science | TRUE |
Strategy | FALSE |
Databases | FALSE |
The tree below presents the two highest annual bonuses and their respective skills.
Base Salary
Table
Position | Company | City | State | Type | Salary | Rank |
---|---|---|---|---|---|---|
Head of Data Science and Engineering | Amazon | Seattle | WA | Base Salary | 265000 | 1 |
Head of SBG Data Science Engineering | Intuit | Mountain View | CA | Base Salary | 253000 | 2 |
Director | Unknown | Vienna | VA | Base Salary | 240000 | 3 |
Principal Lead Data Scientist | Akamai Technologies | Santa Clara | CA | Base Salary | 204000 | 4 |
Head of Data Science, Liquidity | First Republic Bank | San Francisco | CA | Base Salary | 202000 | 5 |
Principal Data Scientist | Microsoft | Bellevue | WA | Base Salary | 200000 | 6 |
Chief Data Scientist, Brilliant Manufacturing Job | GE | Seattle | WA | Base Salary | 199000 | 7 |
Principal Data Scientist | Microsoft | Redmond | WA | Base Salary | 198000 | 8 |
Corporate - Firmwide Forecasting & Analysis - Data Scientist/Engineer, Vice President | JPMorgan Chase | New York | NY | Base Salary | 196000 | 9 |
Data Scientist, State Street Global Exchange, Vice President | State Street Corporation | New York | NY | Base Salary | 195000 | 10 |
Chart
Top paid skills
The following combination of skills is listed for the posting with the top Base Salary of $265,000.
Top Paid Skills | Listed in top most desired skills by employers |
---|---|
Product Management | FALSE |
Machine Learning | TRUE |
Data Science | TRUE |
Analytics | FALSE |
Statistics | FALSE |
The tree below presents the two highest base salaries and their respective skills.
Combined Table of Top paid Skills
The table below displays which skills associate with the highest-paid compensation categories. For example, Data Science, Algorithms and Big Data are associated with the highest Expected (total) Salary. Skills in Machine Learning and Hadoop are associated with the highest Signing Bonuses.
Rank | Skills | Total Salary | Signing Bonus | Annual Bonus | Base Salary |
---|---|---|---|---|---|
1 | Machine Learning | | TRUE | | TRUE |
2 | Data Science | TRUE | | TRUE | TRUE |
3 | Algorithms | TRUE | | TRUE | |
4 | Hadoop | | TRUE | | |
5 | Big Data | TRUE | | TRUE | |
6 | Python | | | | |
7 | Data Mining | | | | |
8 | Optimization | | | | |
9 | Analytics | | | | |
10 | C++ | | | | |
Maps
Open Positions by State
Open Positions by City
Highest-Valued Skills Measured by Mean Compensation
Some of the highest-valued skills are not the most common skills. They include Strategy, Leadership, Management and Data Science, a catch-all. ETL – for Extract, Transform and Load – is a critical area of data warehousing.
This part of the analysis looks at the value of skills based on what employers pay rather than the frequency of skills in a job posting. To do this, we compute a mean value for each skill across the database.
For example, the job ‘Principal Lead Data Scientist’ at Akamai is associated with six skills: Hadoop, Data Mining, Machine Learning, Python, Matlab and Ruby. The total compensation for this job is $317,000. To value each skill in the job, we divide total compensation by 6 to get $52,833. We do a similar computation for the skills in every job, then calculate the overall mean of those values across all jobs for each skill. We then plot those values to rank skills in descending order.
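The two steps of this computation can be sketched in base R. The rows below are the first three postings from the sample tables earlier in this report; the column names `Expected` and `SkillValue` are assumptions chosen for the sketch.

```r
# One row per job-skill pair, taken from postings 1 to 3 in the tables above
jobs <- data.frame(
  ID = rep(1:3, each = 6),
  Skills = c(
    "Distributed Systems", "Big Data", "Algorithms", "Data Science", "Strategy", "Databases",
    "Hadoop", "Data Mining", "Machine Learning", "Python", "Matlab", "Ruby",
    "Hadoop", "Data Mining", "Big Data", "Algorithms", "Machine Learning", "Python"
  ),
  Expected = rep(c(338000, 317000, 317000), each = 6),
  stringsAsFactors = FALSE
)

# Step 1: split each job's total compensation evenly across its skills
n_skills <- ave(jobs$Expected, jobs$ID, FUN = length)
jobs$SkillValue <- jobs$Expected / n_skills   # 317000 / 6 = 52833.33 for job 2

# Step 2: average those shares for each skill across all jobs, highest first
skill_means <- aggregate(SkillValue ~ Skills, data = jobs, FUN = mean)
skill_means <- skill_means[order(-skill_means$SkillValue), ]
```

With only three postings the means are not meaningful in themselves; the point is the mechanics. A skill that appears in both a 338K job and a 317K job, such as Big Data, ends up with the average of the two per-skill shares.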
Relative Value of Skills
Using ANOVA, we can compute how much a particular skill adds to or subtracts from the mean per-skill value for Algorithms, the reference level, all other things equal.
For example, the reference mean compensation for Algorithms is $30,323. Having ETL skills adds $9,283 to that mean; Matlab skills are worth $1,282 less.
The table below summarizes the skill values relative to the Algorithms baseline.
Skill | Adjustment |
---|---|
Algorithms (reference) | 30323 |
Analytics | 748 |
Architecture | -2114 |
AWS | -1342 |
Big Data | -1182 |
C++ | -64 |
Computer Vision | -2706 |
Data Mining | 220 |
Data Science | 3037 |
Deep Learning | -2054 |
Distributed Systems | 2631 |
ETL | 9283 |
Hadoop | -1953 |
Information Retrieval | 1972 |
Java | -1051 |
Machine Learning | -401 |
Management | 418 |
MapReduce | -1317 |
Matlab | -1282 |
Misc | 7587 |
Optimization | -224 |
Product Management | -555 |
Python | 1492 |
Relational Databases | -1556 |
REST | -3414 |
Ruby | -3079 |
Scala | -3665 |
Scalability | -2002 |
Software Design | -839 |
SQL | 7387 |
Statistics | 1038 |
Strategy | 13775 |
Technical Leadership | 2340 |
User Experience | 3296 |
Windows | -454 |
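One way adjustments like those in the table above could be estimated is a one-way linear model with Algorithms set as the reference level, which is the same model a one-way `aov()` fits. The model-fitting code is not shown in this report, so this is a sketch under that assumption. The data are the per-skill shares for three skills from postings 1 to 3, and `fit_skill_model` is a hypothetical name.

```r
# Per-skill shares for three skills drawn from postings 1 to 3 above
skill_values <- data.frame(
  Skills = c("Algorithms", "Algorithms", "Big Data", "Big Data", "Hadoop", "Hadoop"),
  SkillValue = c(338000, 317000, 338000, 317000, 317000, 317000) / 6,
  stringsAsFactors = FALSE
)

# Hypothetical helper: fit SkillValue ~ Skills with a chosen reference level
fit_skill_model <- function(data, reference = "Algorithms") {
  data$Skills <- relevel(factor(data$Skills), ref = reference)
  lm(SkillValue ~ Skills, data = data)
}

fit <- fit_skill_model(skill_values)
adjustments <- coef(fit)
# "(Intercept)" is the mean for the reference skill; every other coefficient
# is that skill's mean minus the reference mean.
round(adjustments)
```

With default treatment contrasts the intercept plays the role of the "Algorithms (reference)" row, and each remaining coefficient plays the role of an Adjustment.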
The chart below displays the compensation composition for each skill desired by employers.
Conclusion
We were asked to work as a team to answer the question: Which are the most valued data science skills? Our examination of compensation data for Data Science jobs on the Paysa website first looked at “value” as a function of the frequency of skills advertised and determined that skills such as Machine Learning, Big Data and Algorithms ranked in the top 10. In terms of mean compensation, top skills included expertise in Strategy, ETL, SQL and User Experience. There are many possible ways to value skills; our study suggests that more data and additional refinement of skills into appropriate categories would provide a more confident assessment of the most-valued skills.
Our experience also shows the benefits of working in a collaborative environment, where team members can readily learn from each other and contribute ideas, creating synergy and improving the results. Teams also benefit from strong leadership that guides while giving team members the opportunity to succeed and sometimes fail, but always reach for improvement and professional growth.