What are the most valued data science skills?
The purpose of the project was to collaborate effectively on acquiring appropriate datasets, then tidying, transforming, analyzing, and visualizing them in an effort to answer this question.
Data Sources:
“Data Scientist Jobs” from Kaggle (url: https://www.kaggle.com/andrewmvd/data-scientist-jobs), which contains the job title, salary, and description of each posting; the descriptions indicate which skills are highly desired.
“The most in Demand Skills for Data Scientists” from Kaggle (url: https://www.kaggle.com/discdiver/the-most-in-demand-skills-for-data-scientists/data), which contains the general data science skills that employers look for.
1. Acquiring Data: We found appropriate datasets and uploaded them to GitHub as CSV files so that they could be read into RStudio for tidying and transforming.
2. Tidy and Transform: Using R functions and packages such as dplyr and tidytext, we extracted data frames containing the relevant information for analysis.
3. Visualize and Analyze: We generated plots of general technical skills and desired programming languages using the ggplot2 package.
4. Conclusion: We drew general insights and conclusions from the data.
Collaborators on this project are as follows:
Hazal Gunduz
Chunjie Nan
Jiho Kim
We used the following technologies to collaborate:
Slack
Google Docs
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidytext)
library(ggplot2)
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
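With these packages loaded, the tidy-and-transform and visualization steps can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the `jobs` data frame, its column names, and the `languages` vector are hypothetical stand-ins for the Kaggle job postings, and the three descriptions are invented for the example.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

# Hypothetical stand-in for the job postings (invented rows, not the Kaggle data)
jobs <- data.frame(
  job_id = 1:3,
  description = c(
    "Experience with Python, SQL and statistics required.",
    "Strong R and Python skills; machine learning a plus.",
    "SQL, Python, and machine learning in production."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical list of programming languages to look for
languages <- c("python", "sql", "r")

language_counts <- jobs %>%
  unnest_tokens(word, description) %>%  # one lowercased word per row
  filter(word %in% languages) %>%       # keep only the languages of interest
  distinct(job_id, word) %>%            # count each posting once per language
  count(word, sort = TRUE, name = "postings")

# Horizontal bar chart of postings per language (built here, not printed)
skill_plot <- ggplot(language_counts,
                     aes(x = reorder(word, postings), y = postings)) +
  geom_col() +
  coord_flip() +
  labs(x = "Language", y = "Number of postings")
```

Counting distinct postings rather than raw word occurrences keeps a single description that repeats a language name from inflating the totals. Matching on single words only works for one-word skills; a phrase such as "machine learning" would need bigram tokens or pattern matching instead.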
Employers look for many skills and skill sets when hiring a data scientist, and it helps to know the 9-10 most in-demand programming languages. SQL, Python, and R were the most desired programming languages for data scientists, and shiny and plotly were the most popular R libraries among data scientists. In the plot of the most desired general skills, the top skills are analysis, machine learning, and statistics, followed by computer science, indicating that the most valuable skills for a data scientist are strong technical skills.