Introduction

This challenging project for DATA607 focuses on the team building, collaboration and leadership required to succeed in the field of data science and analytics. We were instructed to work closely in groups to answer a specific question: “Which are the most valued data science skills?” Our team quickly established a rapport and a communications channel on Slack, then set to the task with gusto and shared responsibility. We chose Duubar Villalobos Jimenez as our leader, and under his coordination we assigned a variety of tasks and deadlines to accomplish our goal. Of course, data is required. We chose to collect Data Science salary data from the website Paysa, which lists a wide variety of tech postings, the skills associated with each posting and several salary components. Details follow, but our analysis shows that Machine Learning is the most-requested skill in our sample of 390 job listings, while the highest-valued skill, measured by mean compensation, was Strategy, reflecting the higher pay for management, leadership and vision in this rapidly evolving profession.

Team members

Name	Team	Email
Pavan Akula	Team 3	akulapavan@hotmail.com
Ambra Baboni Alexander	Team 3	ambra8due@hotmail.com
Thomas Detzel	Team 3	tomdetz@gmail.com
Dilip Ganesan	Team 3	dilipgan@gmail.com
Kyle Gilde	Team 3	kylegilde@gmail.com
Raghunathan Rammnath	Team 3	raghu74us@gmail.com
Duubar Villalobos Jimenez	Team 3	mydvtech@gmail.com

Our Process

Workspace preparation

Create a vector with all needed libraries.

 load_packages <- c(
                    "knitr",
                    "RMySQL",
                    "tidyverse",
                    "tidyr",
                    "dplyr",
                    "stringr",
                    "plotly",
                    "htmlTable",
                    "prettydoc",
                    "shinythemes",
                    "treemap",
                    "data.tree",
                    "janitor",
                    "ggplot2",
                    "ggthemes",
                    "stats"
                  )
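One way to use such a vector is to install anything that is missing and then attach each package. The sketch below is only an illustration, not the team's own loading code: the helper name `attach_packages` is hypothetical, and the demo call uses packages that ship with R so that it runs without a network connection.

```r
# Hypothetical helper (not from the original project): install missing
# packages, then attach every package named in the vector.
attach_packages <- function(pkgs) {
  pkgs <- unique(pkgs)  # drop repeated names
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (any(!installed)) {
    install.packages(pkgs[!installed])
  }
  # library() with character.only = TRUE accepts package names as strings
  for (p in pkgs) {
    library(p, character.only = TRUE)
  }
  invisible(pkgs)
}

# Demo with base packages; in the project the argument would be load_packages.
attached <- attach_packages(c("stats", "utils", "stats"))
```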

Organization and Communication

As a team, we held a brainstorming meetup in which we defined roles and lines of work. The most important agreements follow.

Github

We agreed to create a GitHub repository, D607-Group-Project. All team members had access and were able to post and read from a single repository location.

https://github.com/kylegilde/D607-Group-Project

Slack

We agreed to use Slack as our Team Collaboration platform. From Slack we were able to perform live meetups with “join.me”, which allows screen sharing and presentations to update and explore specific topics and problems. For example, we were able to look at code and discuss problems and refinements.

https://cuny-data607.slack.com

Google Docs

We created a spreadsheet in Google Docs to list deadlines and responsibilities.

https://docs.google.com/spreadsheets/d/1QNhmk6ebuFKYiyqrWJewhzT-PnrYgntZ09MC3yqpcx8/edit#gid=1903046315

Data

After some preliminary research and analysis, we decided to collect data from current data science job postings. We identified several web sites that offered raw information on position, location, company, salary and skills. In the end we selected Paysa (https://www.paysa.com) because it offered the most comprehensive set of variables.

Limitations

Please note that this data was collected from a single source and relates to job postings extant on March 14, 2017. No inference should be drawn about past or future data science skills, and the conclusions of this project apply only to this sample. A different sample from the same source will likely produce different results.

Collection

We were unable to find data in a table, csv file or other structured format. In addition, the Paysa proprietors declined our request to provide sample data for the project. We attempted to scrape the data, but Paysa has designed its web pages to prevent scraping. Because we did not require a huge sample of data for this process, we cut and pasted Paysa data into a text file that we then cleaned, organized and refined. We imported that raw text data into a SQL database for permanent storage, then exported it to R to conduct our analysis. The import-export process was also conducted via R. In total, we collected 390 job postings that listed 95 overall skills. Because some of those skills were the same but named differently, we collapsed the list into 35 unique skills and a miscellaneous group of skills that appeared fewer than 10 times in the data.
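The collapsing step described above can be sketched as a two-stage recode: map alternative names onto one label, then lump anything that appears fewer than 10 times into a miscellaneous group. This is a minimal sketch under assumptions: the helper `collapse_skills` and the synonym table are hypothetical, and the only pairing taken from this report is “C” with “C++”.

```r
# Hypothetical helper: recode synonyms, then group rare skills as "Misc".
collapse_skills <- function(skills, synonyms, min_count = 10) {
  # Stage 1: replace any skill that has an entry in the synonym table
  hit <- skills %in% names(synonyms)
  skills[hit] <- unname(synonyms[skills[hit]])
  # Stage 2: skills seen fewer than min_count times become "Misc"
  counts <- table(skills)
  rare <- names(counts)[counts < min_count]
  skills[skills %in% rare] <- "Misc"
  skills
}

# Toy input (illustrative counts only, not the Paysa data)
toy_skills <- c(rep("C++", 9), "C", rep("Go", 3))
collapsed <- collapse_skills(toy_skills, synonyms = c("C" = "C++"))
table(collapsed)
```

Recoding before counting matters: a skill split across two spellings could otherwise fall below the threshold and be lumped into the miscellaneous group.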

We named our raw text file paysa.txt and uploaded it to GitHub.

Data Preparation

Once we had our paysa.txt file with our desired information, we proceeded to extract valuable information from it and created a data frame. The code was shared among us to continue further cleaning and tidying.

Import

This code reads our raw text file into R from GitHub.

url <- "https://raw.githubusercontent.com/dilipganesan/D607-Group-Project/patch-1/scripts/paysa.txt"

mystring = read_file(url, locale=default_locale())

Sample of the input file.

## Data Scientist Job Results, showing 6K recent job postings
## Update Saved SearchNotification Settings
## 0%
## MATCH
## Head of SBG Data Science Engineering Logo
## Head of SBG Data Science Engineering
## Intuit
## EXPECTED
## $338K
## MARKET SALARY
## Base Salary$253K
## Annual Bonus$86K
## Signing Bonus$24K
## Annual Equity$0
## APPLY NOW
## Head of SBG Data Science Engineering at Intuit in Mountain View, CA
## How you match this job:
## Add more skills  to see how you match this job
## You can learn valuable new skills like: Distributed Systems, Big Data, Algorithms, Data Science, Strategy, Databases and more.
##   Jobs at Intuit      Jobs in Mountain View, CA
## 0%
## MATCH
## Principal Lead Data Scientist Logo
## Principal Lead Data Scientist
## Akamai Technologies
## EXPECTED
## $317K
## MARKET SALARY
## Base Salary$204K
## Annual Bonus$27K
## Annual Equity$86K
## Signing B

From text file to data frame

This code uses string handling to separate the text into an initial set of columns.

Sample of the initial data frame.

ID	applyposition	Base	Annual	signing	expected	skillset	location
1	APPLY NOW Head of SBG Data Science Engineering at Intuit in Mountain View, CA	Base Salary$253K	Annual Bonus$86K	Signing Bonus$24K	EXPECTED $338K	You can learn valuable new skills like: Distributed Systems, Big Data, Algorithms, Data Science, Strategy, Databases and more.	Jobs in Mountain View, CA
2	APPLY NOW Principal Lead Data Scientist at Akamai Technologies in Santa Clara, CA	Base Salary$204K	Annual Bonus$27K	Signing Bonus$18K	EXPECTED $317K	You can learn valuable new skills like: Hadoop, Data Mining, Machine Learning, Python, Matlab, Ruby and more.	Jobs in Santa Clara, CA
3	APPLY NOW Principal Lead Data Scientist at Akamai Technologies in Santa Clara, CA	Base Salary$204K	Annual Bonus$27K	Signing Bonus$18K	EXPECTED $317K	You can learn valuable new skills like: Hadoop, Data Mining, Big Data, Algorithms, Machine Learning, Python and more.	Jobs in Santa Clara, CA
4	APPLY NOW Director, Data Scientist at Dropbox in San Francisco, CA	Base Salary$183K	Annual Bonus$0	Signing Bonus$30K	EXPECTED $289K	You can learn valuable new skills like: Data Mining, Algorithms, Machine Learning, Data Science, SQL, Analytics and more.	Jobs in San Francisco, CA
5	APPLY NOW Principal Data Scientist at Microsoft in Bellevue, WA	Base Salary$200K	Annual Bonus$49K	Signing Bonus$19K	EXPECTED $289K	You can learn valuable new skills like: Hadoop, Data Mining, Optimization, Algorithms, MapReduce, C++ and more.	Jobs in Bellevue, WA
6	APPLY NOW Principal Data Scientist at Microsoft in Redmond, WA	Base Salary$198K	Annual Bonus$53K	Signing Bonus$19K	EXPECTED $287K	You can learn valuable new skills like: Algorithms, Machine Learning, C++, Python, Deep Learning, Data Science and more.	Jobs in Redmond, WA
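The string handling that produces a table like the one above is not reproduced in this report. As a rough sketch of one possible approach, the base R code below splits the text at each "MATCH" marker and pulls fields out with regular expressions. The helper `grab` and the patterns are assumptions for illustration; the input is the first posting from the sample of paysa.txt shown earlier.

```r
# One posting copied from the sample of the input file shown earlier.
raw_lines <- c(
  "0%",
  "MATCH",
  "Head of SBG Data Science Engineering Logo",
  "Head of SBG Data Science Engineering",
  "Intuit",
  "EXPECTED",
  "$338K",
  "MARKET SALARY",
  "Base Salary$253K",
  "Annual Bonus$86K",
  "Signing Bonus$24K",
  "Annual Equity$0",
  "APPLY NOW",
  "Head of SBG Data Science Engineering at Intuit in Mountain View, CA",
  "How you match this job:",
  "Add more skills  to see how you match this job",
  "You can learn valuable new skills like: Distributed Systems, Big Data, Algorithms, Data Science, Strategy, Databases and more.",
  "  Jobs at Intuit      Jobs in Mountain View, CA"
)
mystring <- paste(raw_lines, collapse = "\n")

# Each posting follows a "MATCH" marker; drop the text before the first one.
postings <- strsplit(mystring, "MATCH\n", fixed = TRUE)[[1]][-1]

# Hypothetical helper: first capture group of `pattern`, or NA if no match.
grab <- function(x, pattern) {
  hits <- regmatches(x, regexec(pattern, x))
  vapply(hits, function(h) if (length(h) >= 2) h[2] else NA_character_,
         character(1))
}

initial <- data.frame(
  ID            = seq_along(postings),
  applyposition = grab(postings, "(APPLY NOW\n[^\n]+)"),
  Base          = grab(postings, "(Base Salary\\$[0-9]+K?)"),
  Annual        = grab(postings, "(Annual Bonus\\$[0-9]+K?)"),
  signing       = grab(postings, "(Signing Bonus\\$[0-9]+K?)"),
  expected      = grab(postings, "(EXPECTED\n\\$[0-9]+K?)"),
  skillset      = grab(postings, "(You can learn valuable new skills like:[^\n]+)"),
  location      = grab(postings, "(Jobs in [^\n]+)"),
  stringsAsFactors = FALSE
)

# Newlines captured inside a field become spaces, as in the table above.
initial[] <- lapply(initial, function(col) {
  if (is.character(col)) gsub("\n", " ", col) else col
})
```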

MySQL

Next, we stored this initial data set on a remote MySQL server. We provided clear instructions for all team members about how to read tables into R.

The remote server setup is at MySQL URL mydvtech.com. We chose a commercial site because in our day-to-day work we will be reading and storing company data in company portals. This was good practice for all. Step-by-step instructions for connecting using MySQL server and cPanel were created in a manual distributed to the team.

http://rpubs.com/dvillalobos/confMySQLcPanel

Writing data into MySQL

This code connects to our remote MySQL server and writes a data frame into a table.

writeMySQLTable <- function(my.data = NULL, myLocalTableName = NULL){

  # myLocalUser, myLocalPassword, myLocalHost and myLocalMySQLSchema must be
  # defined before this function is called

  # Create the schema if it doesn't exist, using RMySQL
  mydbconnection <- dbConnect(MySQL(),
                  user = myLocalUser,
                  password = myLocalPassword,
                  host = myLocalHost)

  MySQLcode <- paste0("CREATE SCHEMA IF NOT EXISTS ", myLocalMySQLSchema, ";")
  dbClearResult(dbSendQuery(mydbconnection, MySQLcode))

  # Close the schema-less connection before reconnecting to the schema
  dbDisconnect(mydbconnection)

  # Write our data frame into MySQL
  mydbconnection <- dbConnect(MySQL(),
                  user = myLocalUser,
                  password = myLocalPassword,
                  host = myLocalHost,
                  dbname = myLocalMySQLSchema)

  # Replace any existing table of the same (lower-case) name
  myLocalTableName <- tolower(myLocalTableName)
  MySQLcode <- paste0("DROP TABLE IF EXISTS ", myLocalTableName, ";")

  dbClearResult(dbSendQuery(mydbconnection, MySQLcode))
  dbWriteTable(mydbconnection, name = myLocalTableName, value = my.data)

  # Close the connection to the schema
  dbDisconnect(mydbconnection)

  # Close any other open connections we might have
  invisible(lapply(dbListConnections(dbDriver("MySQL")), dbDisconnect))
}

Reading data from MySQL

For analysis and testing, we were encouraged to read from our MySQL server instead of GitHub (GitHub was our backup plan if we had problems with MySQL). One advantage of using a remote MySQL server is speed in reading and transferring data; we noticed that reading from our GitHub repository consumed far more resources than reading from MySQL.

The following code connects to the MySQL server and reads the data stored in a table into an R data frame.

readMySQLTable <- function(myLocalTableName = NULL){

  # Connect to the schema using RMySQL
  mydbconnection <- dbConnect(MySQL(),
                  user = myLocalUser,
                  password = myLocalPassword,
                  host = myLocalHost,
                  dbname = myLocalMySQLSchema)

  # Read the table if it exists; otherwise return NULL
  slookup <- NULL
  myLocalTableName <- tolower(myLocalTableName)
  if (dbExistsTable(mydbconnection, name = myLocalTableName)){
    slookup <- dbReadTable(mydbconnection, name = myLocalTableName)
  }

  # Close the connection to the schema
  dbDisconnect(mydbconnection)

  # Close any other open connections we might have
  invisible(lapply(dbListConnections(dbDriver("MySQL")), dbDisconnect))

  return(slookup)
}

Tidying and transformation

We divided our tidying and transformation into several tasks.

New cleaned table results.

ID	applyposition	Base	Annual	signing	expected	Skill1	Skill2	Skill3	Skill4	Skill5	Skill6	location
1	Head of SBG Data Science Engineering at Intuit in Mountain View, CA	253K	86K	24K	338K	Distributed Systems	Big Data	Algorithms	Data Science	Strategy	Databases	Mountain View, CA
2	Principal Lead Data Scientist at Akamai Technologies in Santa Clara, CA	204K	27K	18K	317K	Hadoop	Data Mining	Machine Learning	Python	Matlab	Ruby	Santa Clara, CA
3	Principal Lead Data Scientist at Akamai Technologies in Santa Clara, CA	204K	27K	18K	317K	Hadoop	Data Mining	Big Data	Algorithms	Machine Learning	Python	Santa Clara, CA
4	Director, Data Scientist at Dropbox in San Francisco, CA	183K		30K	289K	Data Mining	Algorithms	Machine Learning	Data Science	SQL	Analytics	San Francisco, CA
5	Principal Data Scientist at Microsoft in Bellevue, WA	200K	49K	19K	289K	Hadoop	Data Mining	Optimization	Algorithms	MapReduce	C++	Bellevue, WA
6	Principal Data Scientist at Microsoft in Redmond, WA	198K	53K	19K	287K	Algorithms	Machine Learning	C++	Python	Deep Learning	Data Science	Redmond, WA

Continuing with our clean up, we created new variable names, fixed dollar values and separated city and state.
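Two of these clean-up steps are easy to illustrate in base R: turning strings such as "253K" into dollar amounts, and splitting "City, ST" into separate columns. The helper name `to_dollars` and the regular expressions are assumptions made for this sketch, not the team's code; the input values come from the cleaned table above.

```r
# Hypothetical helper: convert strings such as "253K" or "$0" to dollars.
to_dollars <- function(x) {
  x <- trimws(x)
  has_k <- grepl("K$", x)                      # a trailing K means thousands
  num <- as.numeric(gsub("[$K]", "", x))       # an empty string becomes NA
  ifelse(has_k, num * 1000, num)
}

salary <- to_dollars(c("253K", "86K", "$0", ""))

# Split "City, ST" into its two parts.
location <- c("Mountain View, CA", "Santa Clara, CA", "Bellevue, WA")
city  <- sub(", *[A-Z]{2}$", "", location)     # drop the trailing state code
state <- sub("^.*, *", "", location)           # keep only what follows the comma
```

Blank amounts, such as the empty Annual value in row 4, come through as `NA` rather than zero, which keeps them out of later means.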

Tidy table

Below is our tidy table with a total of 2217 skills. (In a subsequent step, we collapse repetitive skills such as “C” and “C++”.)

ID	Position	Company	City	State	Skills	Type	Salary
1	Head of SBG Data Science Engineering	Intuit	Mountain View	CA	Distributed Systems	Base Salary	253000
2	Principal Lead Data Scientist	Akamai Technologies	Santa Clara	CA	Hadoop	Base Salary	204000
3	Principal Lead Data Scientist	Akamai Technologies	Santa Clara	CA	Hadoop	Base Salary	204000
4	Director, Data Scientist	Dropbox	San Francisco	CA	Data Mining	Base Salary	183000
5	Principal Data Scientist	Microsoft	Bellevue	WA	Hadoop	Base Salary	200000
6	Principal Data Scientist	Microsoft	Redmond	WA	Algorithms	Base Salary	198000
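Moving from one column per skill (Skill1 to Skill6) to one row per skill is a wide-to-long reshape. The sketch below shows the idea in base R for the skill columns only; it is an illustration under assumptions, not the team's code, and it uses the first two postings from the cleaned table.

```r
# Two postings from the cleaned table, skill columns only.
wide <- data.frame(
  ID     = c(1, 2),
  Skill1 = c("Distributed Systems", "Hadoop"),
  Skill2 = c("Big Data", "Data Mining"),
  Skill3 = c("Algorithms", "Machine Learning"),
  Skill4 = c("Data Science", "Python"),
  Skill5 = c("Strategy", "Matlab"),
  Skill6 = c("Databases", "Ruby"),
  stringsAsFactors = FALSE
)

skill_cols <- grep("^Skill[0-9]+$", names(wide), value = TRUE)

# unlist() stacks the columns one after another, so the IDs are repeated
# once per skill column to stay aligned.
long <- data.frame(
  ID     = rep(wide$ID, times = length(skill_cols)),
  Skills = unlist(wide[skill_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop empty slots and keep each posting's skills together.
long <- long[!is.na(long$Skills) & long$Skills != "", ]
long <- long[order(long$ID), ]
```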

Analysis

Skills

From our gathered data we have a total of 72 skills.

Table: Data science skills ranked by frequency
Skills	Count	Percentage	Rank
Machine Learning	260	11.73 %	1
Data Science	207	9.34 %	2
Algorithms	183	8.25 %	3
Hadoop	182	8.21 %	4
Big Data	177	7.98 %	5
Python	123	5.55 %	6
Data Mining	119	5.37 %	7
Optimization	97	4.38 %	8
Analytics	85	3.83 %	9
C++	71	3.2 %	10
Management	61	2.75 %	11
SQL	56	2.53 %	12
Statistics	53	2.39 %	13
Matlab	40	1.8 %	14
Scala	38	1.71 %	15
Product Management	33	1.49 %	16
MapReduce	31	1.4 %	17
Strategy	27	1.22 %	18
Architectures	22	0.99 %	19
Technical Leadership	19	0.86 %	20
Deep Learning	18	0.81 %	22
Distributed Systems	18	0.81 %	22
Information Retrieval	18	0.81 %	22
AWS	17	0.77 %	24
ETL	16	0.72 %	25
Relational Databases	15	0.68 %	26
User Experience	14	0.63 %	28
Windows	14	0.63 %	28
Java	13	0.59 %	30
Ruby	13	0.59 %	30
Scalability	13	0.59 %	30
REST	11	0.5 %	32
Computer Vision	10	0.45 %	34
Leadership	10	0.45 %	34
Software Design	10	0.45 %	34
Apache Spark	8	0.36 %	36
Databases	8	0.36 %	36
C	7	0.32 %	39
Search	7	0.32 %	39
Time Series Analysis	7	0.32 %	39
Architecture	6	0.27 %	42
Automation	6	0.27 %	42
Natural Language Processing	6	0.27 %	42
PHP	6	0.27 %	42
Android	5	0.23 %	46
Game Development	5	0.23 %	46
OS X	5	0.23 %	46
Mathematics	4	0.18 %	48
Scripting	4	0.18 %	48
Business Intelligence	3	0.14 %	52
Cassandra	3	0.14 %	52
Functional Programming	3	0.14 %	52
Go	3	0.14 %	52
MySQL	3	0.14 %	52
Enterprise Software	2	0.09 %	58
Image Processing	2	0.09 %	58
LAMP	2	0.09 %	58
Recommender Systems	2	0.09 %	58
Signal Processing	2	0.09 %	58
Tomcat	2	0.09 %	58
Algorithm Design	1	0.05 %	66
Data Science Scripting	1	0.05 %	66
EMPTY	1	0.05 %	66
Engineering Management	1	0.05 %	66
Firewalls	1	0.05 %	66
HTTP	1	0.05 %	66
Mathematical Modeling	1	0.05 %	66
Network Architecture	1	0.05 %	66
Optimization Data Science	1	0.05 %	66
Product Design Data Science	1	0.05 %	66
Test Driven Development	1	0.05 %	66
Web Services	1	0.05 %	66
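A frequency table of this shape can be built from a long vector of skills with `table()`. In the sketch below the input is a toy vector, not the Paysa data. The tied ranks in the table above (for example, three skills sharing rank 22) are consistent with averaging tied ranks and rounding, so the sketch does the same; that is an inference from the output, not a statement of the team's method.

```r
# Toy skill vector (illustrative only).
skills <- c("Machine Learning", "Data Science", "Machine Learning",
            "Hadoop", "Data Science", "Machine Learning")

freq <- as.data.frame(table(Skills = skills), stringsAsFactors = FALSE)
names(freq)[2] <- "Count"

# Share of all skill mentions, in percent.
freq$Percentage <- round(100 * freq$Count / sum(freq$Count), 2)

# Rank by descending count; ties get their average rank, then rounded.
freq$Rank <- round(rank(-freq$Count, ties.method = "average"))

freq <- freq[order(-freq$Count, freq$Skills), ]
```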

Top 10 most desired skills by employers, by count of raw skill names.

Compensation (Exploratory Analysis)

The Paysa data lists Base Salary, Annual Bonus, Signing Bonus and a total called Expected Salary, providing another way to measure value beyond raw counts.

Tree Analysis

Another way of analyzing the above data is with a tree analysis. This tree shows Total Salary for jobs associated with Machine Learning skills in Washington and California.

Following are the highest-paid jobs in the data.

Total Salary

Table

Table: Companies and job offerings with the top 10 Total Salaries
Position	Company	City	State	Type	Salary	Rank
Head of SBG Data Science Engineering	Intuit	Mountain View	CA	Expected Salary	338000	1
Principal Data Science Manager	Microsoft	Redmond	WA	Expected Salary	322000	2
Principal Lead Data Scientist	Akamai Technologies	Santa Clara	CA	Expected Salary	317000	3
Data Science and Analytics Lead, Global Revenue Acceleration	Google	Mountain View	CA	Expected Salary	305000	4
Director, Data Scientist	Dropbox	San Francisco	CA	Expected Salary	289000	5
Principal Data Scientist	Microsoft	Bellevue	WA	Expected Salary	289000	6
Principal Data Scientist	Microsoft	Redmond	WA	Expected Salary	287000	7
Director	Unknown	Vienna	VA	Expected Salary	285000	8
Sr. Principal Data Scientist (A/B Platform)	Coupang	Palo Alto	CA	Expected Salary	281000	9
Manager of Data Science	Yelp	San Francisco	CA	Expected Salary	278000	10
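A top-N table like the one above amounts to filtering on a compensation type, sorting in descending order and numbering the rows. A minimal sketch follows; the helper name `top_n_by_type` is hypothetical, and the rows are copied from tables in this report.

```r
# A few rows in the tidy layout (values copied from tables in this report).
jobs <- data.frame(
  Company = c("Intuit", "Intuit", "Akamai Technologies",
              "Akamai Technologies", "Dropbox", "Dropbox"),
  Type    = rep(c("Expected Salary", "Base Salary"), times = 3),
  Salary  = c(338000, 253000, 317000, 204000, 289000, 183000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n largest values of one compensation type.
top_n_by_type <- function(df, type, n = 10) {
  sub <- df[df$Type == type, ]
  sub <- sub[order(-sub$Salary), ]   # ties keep their original order
  sub <- head(sub, n)
  sub$Rank <- seq_len(nrow(sub))     # sequential ranks, as in the table above
  sub
}

top_expected <- top_n_by_type(jobs, "Expected Salary", n = 2)
```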

Chart

Top paid skills

The following combination of skills desired by employers is associated with the top Total Salary of $338,000.

Table: Top paid skills by top paid Total Salaries
Top Paid Skills	Listed in top most desired skills by employers
Distributed Systems	FALSE
Big Data	TRUE
Algorithms	TRUE
Data Science	TRUE
Strategy	FALSE
Databases	FALSE

The tree below presents the two highest total salaries and their respective skills.

Signing Bonus

Table

Table: Companies and job offerings with the top 10 Signing Bonuses
Position	Company	City	State	Type	Salary	Rank
Principal Data Science Engineer	OpenTable	San Francisco	CA	Signing Salary	43000	1
Data Scientist, Population and Survey Sciences	Facebook	Menlo Park	CA	Signing Salary	42000	2
Data Scientist, Infrastructure	Facebook	Menlo Park	CA	Signing Salary	42000	3
Data Scientist- Consumer Insights	Facebook	Menlo Park	CA	Signing Salary	42000	4
Data Visualization Scientist	Facebook	Menlo Park	CA	Signing Salary	42000	5
Data Scientist	Facebook	Menlo Park	CA	Signing Salary	42000	6
Data Scientist, Auction & Delivery	Facebook	Menlo Park	CA	Signing Salary	42000	7
Software Development Mgr - Advertising Analytics and Data Science	Amazon	Seattle	WA	Signing Salary	40000	8
Infrastructure Data Scientist & Strategy Analyst	Facebook	Menlo Park	CA	Signing Salary	40000	9
Senior Data Scientist	Amazon	Seattle	WA	Signing Salary	39000	10

Chart

Top paid skills

The following combination of skills desired by employers is associated with the top Signing Bonus of $43,000.

Table: Top paid skills by top paid Signing Bonus
Top Paid Skills	Listed in top most desired skills by employers
Product Management	FALSE
Hadoop	TRUE
Software Design	FALSE
Information Retrieval	FALSE
Machine Learning	TRUE
ETL	FALSE

The tree below presents the two highest signing bonuses and their respective skills.

Annual Bonus

Table

Table: Companies and job offerings with the top 10 Annual Bonuses
Position	Company	City	State	Type	Salary	Rank
Head of SBG Data Science Engineering	Intuit	Mountain View	CA	Annual Salary	86000	1
Principal Data Science Manager	Microsoft	Redmond	WA	Annual Salary	59000	2
Principal Data Scientist	Microsoft	Redmond	WA	Annual Salary	53000	3
Principal Data Scientist	Microsoft	Bellevue	WA	Annual Salary	49000	4
Principal Data Scientist Architect	SAP	Palo Alto	CA	Annual Salary	49000	5
Data Science and Analytics Lead, Global Revenue Acceleration	Google	Mountain View	CA	Annual Salary	48000	6
Principal Data Science Engineer	OpenTable	San Francisco	CA	Annual Salary	46000	7
Legal- Firmwide Initiatives Team OLO Data Scientist- VP	JPMorgan Chase	New York	NY	Annual Salary	46000	8
CIB-Rapid Prototyping Data Scientist	Unknown	New York	NY	Annual Salary	46000	9
Director	Unknown	Vienna	VA	Annual Salary	45000	10

Chart

Top paid skills

The following combination of skills desired by employers is associated with the top Annual Bonus of $86,000.

Table: Top paid skills by top paid Annual Bonus
Top Paid Skills	Listed in top most desired skills by employers
Distributed Systems	FALSE
Big Data	TRUE
Algorithms	TRUE
Data Science	TRUE
Strategy	FALSE
Databases	FALSE

The tree below presents the two highest annual bonuses and their respective skills.

Base Salary

Table

Table: Companies and job offerings with the top 10 Base Salaries
Position	Company	City	State	Type	Salary	Rank
Head of Data Science and Engineering	Amazon	Seattle	WA	Base Salary	265000	1
Head of SBG Data Science Engineering	Intuit	Mountain View	CA	Base Salary	253000	2
Director	Unknown	Vienna	VA	Base Salary	240000	3
Principal Lead Data Scientist	Akamai Technologies	Santa Clara	CA	Base Salary	204000	4
Head of Data Science, Liquidity	First Republic Bank	San Francisco	CA	Base Salary	202000	5
Principal Data Scientist	Microsoft	Bellevue	WA	Base Salary	200000	6
Chief Data Scientist, Brilliant Manufacturing Job	GE	Seattle	WA	Base Salary	199000	7
Principal Data Scientist	Microsoft	Redmond	WA	Base Salary	198000	8
Corporate - Firmwide Forecasting & Analysis - Data Scientist/Engineer, Vice President	JPMorgan Chase	New York	NY	Base Salary	196000	9
Data Scientist, State Street Global Exchange, Vice President	State Street Corporation	New York	NY	Base Salary	195000	10

Chart

Top paid skills

The following combination of skills desired by employers is associated with the top Base Salary of $265,000.

Table: Top paid skills by top paid Base Salaries
Top Paid Skills	Listed in top most desired skills by employers
Product Management	FALSE
Machine Learning	TRUE
Data Science	TRUE
Analytics	FALSE
Statistics	FALSE

The tree below presents the two highest base salaries and their respective skills.

Combined Table of Top paid Skills

The table below displays which skills associate with the highest-paid compensation categories. For example, Data Science, Algorithms and Big Data are associated with the highest Expected (total) Salary. Skills in Machine Learning and Hadoop are associated with the highest Signing Bonuses.

Table: Top-paid skills among the most in-demand skills
Rank	Skills	Total Salary	Signing Bonus	Annual Bonus	Base Salary
1	Machine Learning		TRUE		TRUE
2	Data Science	TRUE		TRUE	TRUE
3	Algorithms	TRUE		TRUE
4	Hadoop		TRUE
5	Big Data	TRUE		TRUE
6	Python
7	Data Mining
8	Optimization
9	Analytics
10	C++

Maps

Open Positions by State

Open Positions by City

Highest-Valued Skills Measured by Mean Compensation

Some of the highest-valued skills are not the most common skills. They include Strategy, Leadership, Management and Data Science, a catch-all. ETL (Extract, Transform and Load) is a critical area of data warehousing.

This part of the analysis looks at the value of skills based on what employers pay rather than the frequency of skills in a job posting. To do this, we compute a mean value for each skill across the database.

For example, the job ‘Principal Lead Data Scientist’ at Akamai is associated with six skills: Hadoop, Data Mining, Machine Learning, Python, Matlab and Ruby. The total compensation for this job is $317,000. To value each skill in the job, we divide total compensation by 6 to get $52,833. We do a similar computation for the skills in each job, then calculate the overall mean of those values across all jobs for each skill. We then plot those values to rank skills in descending order.
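The computation just described can be sketched in a few lines of base R: split each job's total compensation equally across its skills, then average those shares by skill. The two postings and their skills are taken from the tables in this report; the column names and the use of `ave()` and `aggregate()` are choices made for this illustration rather than the team's code.

```r
# Two postings in long form: one row per (job, skill) pair.
long <- data.frame(
  ID       = rep(c(2, 4), each = 6),
  Expected = rep(c(317000, 289000), each = 6),
  Skill    = c("Hadoop", "Data Mining", "Machine Learning",
               "Python", "Matlab", "Ruby",
               "Data Mining", "Algorithms", "Machine Learning",
               "Data Science", "SQL", "Analytics"),
  stringsAsFactors = FALSE
)

# Number of skills listed for each job, repeated on every row of that job.
n_skills <- ave(long$Expected, long$ID, FUN = length)

# Equal share of total compensation per skill: 317000 / 6 is about 52833.
long$Share <- long$Expected / n_skills

# Mean share per skill across all jobs, highest first.
skill_value <- aggregate(Share ~ Skill, data = long, FUN = mean)
skill_value <- skill_value[order(-skill_value$Share), ]
```

With only these two postings, Data Mining and Machine Learning each average $50,500, the midpoint of the Akamai and Dropbox shares.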

Relative Value of Skills

Using ANOVA, we can compute how much a particular skill adds to or subtracts from the mean Expected Salary for Algorithms, the reference level, all other things being equal.

For example, the reference mean compensation for Algorithms is $30,323. Having ETL skills adds $9,283 to that mean; Matlab skills are worth $1,282 less.

The chart summarizes the skill values relative to the Algorithms baseline.

Table: Skill adjustments to Expected Salary, relative to Algorithms
Skill	Adjustment
Algorithms (reference)	30323
Analytics	748
Architecture	-2114
AWS	-1342
Big Data	-1182
C++	-64
Computer Vision	-2706
Data Mining	220
Data Science	3037
Deep Learning	-2054
Distributed Systems	2631
ETL	9283
Hadoop	-1953
Information Retrieval	1972
Java	-1051
Machine Learning	-401
Management	418
MapReduce	-1317
Matlab	-1282
Misc	7587
Optimization	-224
Product Management	-555
Python	1492
Relational Databases	-1556
REST	-3414
Ruby	-3079
Scala	-3665
Scalability	-2002
Software Design	-839
SQL	7387
Statistics	1038
Strategy	13775
Technical Leadership	2340
User Experience	3296
Windows	-454

The chart below displays the salary composition for each skill desired by employers.

Conclusion

We were asked to work as a team to answer the question: What are the most valued data science skills? Our examination of compensation data for Data Science jobs on the Paysa website first looked at “value” as a function of the frequency of skills advertised and determined that skills such as Machine Learning, Big Data and Algorithms ranked in the top 10. In terms of mean compensation, top skills included expertise in Strategy, ETL, SQL and User Experience. There are many possible ways to value skills; our study suggests that more data and additional refinement of skills into appropriate categories would provide a more confident assessment of the most-valued skills.

Our experience also shows the benefits of working in a collaborative environment, where team members can readily learn from each other and contribute ideas, creating synergy and improving the results. Teams also benefit from strong leadership that guides while giving team members the opportunity to succeed and sometimes fail, but always reach for improvement and professional growth.