10/20/2019

Impossible to answer precisely
Skills broadly fall into categories
Data Scientists need Domain Expertise to answer questions
SELECT Id, TagName, Count FROM Tags ORDER BY Count DESC;
##    TagName              Count
##  Length:502         Min.   :   1.0
##  Class :character   1st Qu.:   7.0
##  Mode  :character   Median :  22.0
##                     Mean   : 113.2
##                     3rd Qu.:  67.0
##                     Max.   :6296.0
Common skills: those we will see again in the Dice.com analysis. Unique skills: those found only in the DataScience Exchange tags.
We used a Jupyter Python notebook to scrape web data:
This data was significantly less tidy than the DataScience Exchange tags. After basic cleanup including removing ampersand-hex codes, punctuation, and obviously non-skill words such as “and” and “or”, the following observations can be made.
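The cleanup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the notebook's actual code: the function names `clean_skill` and `count_skills` and the `NON_SKILL_WORDS` set are hypothetical, and the stop list holds only the two words the text names as examples.

```python
import html
import re
from collections import Counter

# Hypothetical stop list: the text names only "and" and "or" as examples.
NON_SKILL_WORDS = {"and", "or"}


def clean_skill(raw):
    """Normalize one scraped skill string; return None if nothing is left."""
    text = html.unescape(raw)                # decode entities such as &#x26; or &amp;
    text = re.sub(r"&#?\w+;", " ", text)     # drop any entity that did not decode
    text = re.sub(r"[^\w\s+#]", " ", text)   # strip punctuation, keep + and # (C++, C#)
    words = [w for w in text.lower().split() if w not in NON_SKILL_WORDS]
    return " ".join(words) or None


def count_skills(raw_skills):
    """Tally cleaned skill strings, skipping entries that clean to nothing."""
    cleaned = (clean_skill(s) for s in raw_skills)
    return Counter(s for s in cleaned if s)
```

Keeping `+` and `#` is a design choice so that names such as C++ and C# survive the punctuation pass; a real cleanup would likely need a longer stop list than the one assumed here.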
There are 766 unique tags which range in frequency from 1 to 159. Of these, 59 appear more than 5 times and 1 more than 100 times.
##     skills                N
##  Length:766         Min.   :  1.000
##  Class :character   1st Qu.:  1.000
##  Mode  :character   Median :  1.000
##                     Mean   :  2.602
##                     3rd Qu.:  2.000
##                     Max.   :159.000
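For readers working in the Python notebook rather than R, the same kind of frequency summary could be computed roughly as follows. This is a sketch with hypothetical function names (`summarize`, `count_above`). It assumes Python 3.8 or later and uses the `inclusive` quantile method, which should correspond to R's default quantile definition.

```python
import statistics


def summarize(counts):
    """Return the six numbers R's summary() reports for a numeric vector."""
    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "min": min(counts),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(counts),
        "q3": q3,
        "max": max(counts),
    }


def count_above(counts, threshold):
    """Count how many frequencies are strictly greater than threshold."""
    return sum(1 for c in counts if c > threshold)
```

Called on the list of skill frequencies, `count_above(counts, 5)` and `count_above(counts, 100)` would give the "more than 5 times" and "more than 100 times" figures quoted above.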
Common skills: those we also saw in the DataScience Exchange tags. Unique skills: those found only in the Dice.com job skills.
The following skills are represented in both the top 10 DataScience Exchange tags and the Dice.com job skills:
The following are unique to the top 10 DataScience Exchange tags:
Conversely, these are unique to the top 10 Dice.com job skills:
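The three comparisons above amount to one set intersection and two set differences. A minimal sketch, assuming each top-10 list is available as a list of normalized skill names; the function name `compare_top_skills` and the result keys are hypothetical:

```python
def compare_top_skills(tags_top, jobs_top):
    """Split two top-N skill lists into shared and source-specific skills."""
    tags, jobs = set(tags_top), set(jobs_top)
    return {
        "both": sorted(tags & jobs),        # in both top lists
        "tags_only": sorted(tags - jobs),   # only in the DataScience Exchange tags
        "jobs_only": sorted(jobs - tags),   # only in the Dice.com job skills
    }
```

The comparison is only meaningful if both lists use the same spelling and casing for a skill, so any normalization should be applied to both sources first.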
In November 2018, Jeff Hale posted an entry on the KDnuggets blog describing a job-listing analysis he ran against LinkedIn, Indeed, SimplyHired, and AngelList on October 10, 2018.
His findings, shown on the next slide, reinforce that the most requested skills are the analytical ones, for example computer science, analysis, statistics, and machine learning. However, a number of “softer” skills, such as communication and visualization, are also requested.
Note that these softer skills would not necessarily appear as questions on DataScience Exchange.
In April of 2018, Michael Li, VP of Data at Coinbase, posted a piece on LinkedIn where he listed his main desired skills for new data science hires.
Data wrangling / Munging / Manipulation
Experiment Design and A/B testing
Statistical Modeling / Machine learning
Soft Skills
Case studies and problem-solving
What is important about Li’s piece is its focus on the “softer” skills of the data scientist. Of the five headings, only two would be considered classic “hard” data science: Data wrangling and Statistical modeling.
One could argue that experimental design is equally rigorous, in the tradition of Fisher, Neyman, and Pearson, yet it receives too little mention in many discussions.
Li makes it clear that he views developing good case studies, visualizations, after-action summaries, communications, and persuasion skills as key to becoming a good data scientist. He concludes his section on soft skills with:
Ultimately, the goal is to take the insights generated from the analysis and effectively influence critical decision-making, which drives business impact. The “hard skills” and “soft skills” need to work together for the success of a data scientist.