Skills of a Data Scientist

Motivation

We want to understand which skills are most useful for a data scientist to have so that we can plan which courses to take in our Master’s program.

Approach

To answer this question, we scraped data scientist job listings on dice.com and extracted the skills listed in the postings.

  • 453 “Data Scientist” Job Postings
  • Scraped 15 October 2018

Data Sources

Data Acquisition and Processing

  • Data was scraped in two stages. First, search result URLs were scraped using Selenium and stored in a MySQL table. Next, the job pages were scraped asynchronously and stored in the MySQL table.
  • Skills were extracted from JSON objects found in the JavaScript.
  • We used the cSplit function to split the skill strings on single-character separators (: , ; , -).
  • We combined this with the gather function to bring the resulting columns back together after splitting on these characters.
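
The extraction and splitting steps above can be sketched roughly as follows. This is an illustration in Python rather than the R functions (cSplit, gather) named above, and it is not the code used in the study. The page fragment, the "skills" field name, the function names, and the exact separator set are all assumptions made for the example; the real dice.com markup is not shown here.

```python
import json
import re

# Hypothetical page fragment; the actual structure of a job page is an assumption.
SAMPLE_HTML = """
<script type="application/ld+json">
{"title": "Data Scientist", "skills": "Python, R; SQL - Machine Learning: Spark"}
</script>
"""

# Assumed separator set, based on the characters listed in the bullets above.
SEPARATORS = ":,;-"


def extract_job_json(html):
    """Return the first JSON object found inside a <script> tag, or None."""
    match = re.search(r"<script[^>]*>\s*(\{.*?\})\s*</script>", html, re.DOTALL)
    return json.loads(match.group(1)) if match else None


def split_skills(raw):
    """Split a raw skills string on any single separator character."""
    pattern = "[" + re.escape(SEPARATORS) + "]"
    # Strip whitespace and drop empty pieces left by adjacent separators.
    return [part.strip() for part in re.split(pattern, raw) if part.strip()]


def to_long_rows(job_id, raw):
    """Produce one (job_id, skill) row per skill, i.e. a long-format table."""
    return [(job_id, skill) for skill in split_skills(raw)]


if __name__ == "__main__":
    job = extract_job_json(SAMPLE_HTML)
    for row in to_long_rows(1, job["skills"]):
        print(row)
```

One consequence of splitting on "-" is that hyphenated names such as scikit-learn are broken into two terms, so some manual review of the resulting list is still needed.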

Findings

What Are The Key Skills?

Top 20 Data Science Skills

Limitations & Workarounds

  • The data covers only “Data Scientist” positions. We may have excluded data scientist positions listed under a different title.
  • We only scraped the skills portion of the job posting. Additional skills could be found in the full job description section.
  • While we were able to clean the data and separate out all of the skills, we still needed to manually comb through the terms and categorize them.

Things to Consider

  • Some of the listed skills were vague, which makes us wonder how accurately data scientist job postings are written. Examples include “Algorithm”, “Development”, and even “Programming”. These give little insight into which tools are being used or what the future data scientist’s responsibilities will be.

Corey Arnouts, Mike Silva, Michael Yampol, Elizabeth Drikman

October 21, 2018