During early research, we found this study:
https://365datascience.com/research-into-1001-data-scientist-profiles/
# include_graphics() comes from knitr
library(knitr)
imgage <- "journal.png"
include_graphics(imgage)
10/22/2019
As a team, we saw that the study's findings fit the criteria we needed for this project.
However, we still needed a pool of data from which to calculate parameters and answer our research question(s).
With some further research, we found a similar project that set out to answer a similar question:
https://www.kaggle.com/discdiver/the-most-in-demand-skills-for-data-scientists
imgage <- "kaggle.png"
include_graphics(imgage)
imgage <- "kaggle_data.png"
include_graphics(imgage)
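# read_csv() comes from readr; the dplyr verbs used later are also part of the tidyverse
library(tidyverse)
general_skills <- read_csv(file = "https://raw.githubusercontent.com/josephsimone/DATA_607_Project_3/master/kaggle_ds_general_skills_revised.csv")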
## Parsed with column specification:
## cols(
##   Keyword = col_character(),
##   LinkedIn = col_number(),
##   Indeed = col_number(),
##   SimplyHired = col_number(),
##   Monster = col_number()
## )
head(general_skills)
## # A tibble: 6 x 5
##   Keyword          LinkedIn Indeed SimplyHired Monster
##   <chr>               <dbl>  <dbl>       <dbl>   <dbl>
## 1 machine learning     5701   3439        2561    2340
## 2 analysis             5168   3500        2668    3306
## 3 statistics           4893   2992        2308    2399
## 4 computer science     4517   2739        2093    1900
## 5 communication        3404   2344        1791    2053
## 6 mathematics          2605   1961        1497    1815
software_skills <- read_csv(file = "https://raw.githubusercontent.com/josephsimone/DATA_607_Project_3/master/kaggle_ds_job_listing_software.csv")
## Parsed with column specification:
## cols(
##   Keyword = col_character(),
##   LinkedIn = col_number(),
##   Indeed = col_number(),
##   SimplyHired = col_number(),
##   Monster = col_number(),
##   `LinkedIn %` = col_character(),
##   `Indeed %` = col_character(),
##   `SimplyHired %` = col_character(),
##   `Monster %` = col_character(),
##   `Avg %` = col_character(),
##   `GlassDoor Self Reported % 2017` = col_character(),
##   Difference = col_character()
## )
head(software_skills)
## # A tibble: 6 x 12
##   Keyword LinkedIn Indeed SimplyHired Monster `LinkedIn %` `Indeed %`
##   <chr>      <dbl>  <dbl>       <dbl>   <dbl> <chr>        <chr>
## 1 Python      6347   3818        2888    2544 74%          74%
## 2 R           4553   3106        2393    2365 53%          60%
## 3 SQL         3879   2628        2056    1841 45%          51%
## 4 Spark       2169   1551        1167    1062 25%          30%
## 5 Hadoop      2142   1578        1164    1200 25%          31%
## 6 Java        1944   1377        1059    1002 23%          27%
## # ... with 5 more variables: `SimplyHired %` <chr>, `Monster %` <chr>,
## #   `Avg %` <chr>, `GlassDoor Self Reported % 2017` <chr>,
## #   Difference <chr>
What are the most important technology-based skill and soft skill, respectively, when hiring a Data Scientist?
Overall, what is the most sought-after skill for a Data Scientist?
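# Tag each data set with its skill category before combining
dt1 <- as_tibble(general_skills)
dt1$Skill_Set <- "Soft"
dt2 <- as_tibble(software_skills)
dt2$Skill_Set <- "Tech"
# Drop the percentage and comparison columns (6 through 12) so both tables share the same columns
dt2 <- select(dt2, -c(6:12))
# Stack the two tables into one
result <- rbind(dt1, dt2)
# Rename the key column if it was read in under a mangled name; no effect when it is already called Keyword
names(result)[names(result) == "ï..Keyword"] <- "Keyword"
head(result)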
## # A tibble: 6 x 6
##   Keyword          LinkedIn Indeed SimplyHired Monster Skill_Set
##   <chr>               <dbl>  <dbl>       <dbl>   <dbl> <chr>
## 1 machine learning     5701   3439        2561    2340 Soft
## 2 analysis             5168   3500        2668    3306 Soft
## 3 statistics           4893   2992        2308    2399 Soft
## 4 computer science     4517   2739        2093    1900 Soft
## 5 communication        3404   2344        1791    2053 Soft
## 6 mathematics          2605   1961        1497    1815 Soft
ds_skills <- as.data.frame(result)
# Sum the four job-site counts (columns 2 through 5) into a Total column
ds_skills$Total <- rowSums(sapply(ds_skills[, 2:5], as.numeric))
# Reorder the columns so that Skill_Set follows Keyword
ds_skills <- ds_skills[, c(1, 6, 2:5, 7)]
colnames(ds_skills)
## [1] "Keyword"     "Skill_Set"   "LinkedIn"    "Indeed"      "SimplyHired"
## [6] "Monster"     "Total"
head(ds_skills)
##            Keyword Skill_Set LinkedIn Indeed SimplyHired Monster Total
## 1 machine learning      Soft     5701   3439        2561    2340 14041
## 2         analysis      Soft     5168   3500        2668    3306 14642
## 3       statistics      Soft     4893   2992        2308    2399 12592
## 4 computer science      Soft     4517   2739        2093    1900 11249
## 5    communication      Soft     3404   2344        1791    2053  9592
## 6      mathematics      Soft     2605   1961        1497    1815  7878
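The values in these two combined data sets are the number of job listings that mention each keyword in a general search on well-known job recruitment websites. As a team, we thought it would be more effective to express these counts as a percentage of each site's total listings, which gives a better understanding of the data set.
# Total number of data science job listings on each site (the denominators)
LinkedIn <- 8610
Indeed <- 5138
SimplyHired <- 3829
Monster <- 3746
Total <- LinkedIn + Indeed + SimplyHired + Monster
# Convert each site's keyword counts to a percentage of that site's listings
ds_skills$LinkedIn <- (ds_skills$LinkedIn / LinkedIn) * 100
ds_skills$Indeed <- (ds_skills$Indeed / Indeed) * 100
ds_skills$SimplyHired <- (ds_skills$SimplyHired / SimplyHired) * 100
ds_skills$Monster <- (ds_skills$Monster / Monster) * 100
# Note: the factor of 100 in the denominator cancels the final * 100,
# so Total is stored as a fraction (0 to 1) rather than a percentage (0 to 100)
ds_skills$Total <- (ds_skills$Total / (Total * 100)) * 100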
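As a quick check of this normalization, the following minimal sketch recomputes the per-site percentages and the overall share for "machine learning" from the raw counts printed earlier. The helper name `pct_of_listings` and the vectors `site_totals` and `ml_counts` are hypothetical and introduced only for illustration; the listing totals are the ones used in the normalization step.

```r
# Hypothetical helper (not part of the original analysis):
# share of a site's listings that mention a keyword, as a percentage
pct_of_listings <- function(keyword_count, site_listings) {
  (keyword_count / site_listings) * 100
}

# Listing totals per site, as used in the normalization step
site_totals <- c(LinkedIn = 8610, Indeed = 5138, SimplyHired = 3829, Monster = 3746)

# Raw "machine learning" counts from head(general_skills)
ml_counts <- c(LinkedIn = 5701, Indeed = 3439, SimplyHired = 2561, Monster = 2340)

# Per-site percentages (0 to 100)
ml_pct <- pct_of_listings(ml_counts, site_totals)

# Overall share across all four sites, as a fraction (0 to 1)
ml_share <- sum(ml_counts) / sum(site_totals)

print(round(ml_pct, 2))
print(round(ml_share, 4))
```

Multiplying `ml_share` by 100 would put it on the same 0 to 100 scale as the per-site columns.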
Set up for Bar Graphs
ds_skills_soft <- filter(ds_skills, Skill_Set == "Soft")
dim(ds_skills_soft)
## [1] 15 7
head(ds_skills_soft)
##            Keyword Skill_Set LinkedIn   Indeed SimplyHired  Monster     Total
## 1 machine learning      Soft 66.21370 66.93266    66.88430 62.46663 0.6584908
## 2         analysis      Soft 60.02323 68.11989    69.67877 88.25414 0.6866764
## 3       statistics      Soft 56.82927 58.23278    60.27683 64.04164 0.5905360
## 4 computer science      Soft 52.46225 53.30868    54.66179 50.72077 0.5275524
## 5    communication      Soft 39.53542 45.62086    46.77461 54.80513 0.4498429
## 6      mathematics      Soft 30.25552 38.16660    39.09637 48.45168 0.3694602
# Sort from largest to smallest Total, then keep the top 10 rows without the Skill_Set column
ds_skills_soft <- arrange(ds_skills_soft, desc(Total))
top_10_soft <- ds_skills_soft[1:10, c(1, 3:7)]
top_10_soft
##             Keyword LinkedIn   Indeed SimplyHired  Monster     Total
## 1          analysis 60.02323 68.11989    69.67877 88.25414 0.6866764
## 2  machine learning 66.21370 66.93266    66.88430 62.46663 0.6584908
## 3        statistics 56.82927 58.23278    60.27683 64.04164 0.5905360
## 4  computer science 52.46225 53.30868    54.66179 50.72077 0.5275524
## 5     communication 39.53542 45.62086    46.77461 54.80513 0.4498429
## 6       mathematics 30.25552 38.16660    39.09637 48.45168 0.3694602
## 7     visualization 21.82346 27.50097    30.11230 32.22104 0.2650659
## 8      AI composite 18.21138 21.89568    21.18046 18.33956 0.1965483
## 9     deep learning 15.21487 19.05411    17.62862 16.17726 0.1674248
## 10    NLP composite 14.07666 17.71117    17.23688 15.53657 0.1577639
Top Ten Non-Technical skills as a percentage of total Data Science jobs
p
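The code that builds `p` is not shown in this report. One way such a chart could be produced is sketched below, assuming the ggplot2 package. The data frame `top_soft_example` holds only the first three rows of `top_10_soft`, copied from the printed output; the object name `p_sketch`, the axis labels, and the styling are illustrative assumptions rather than the original plot code.

```r
library(ggplot2)

# First three rows of top_10_soft, copied from the printed output
top_soft_example <- data.frame(
  Keyword = c("analysis", "machine learning", "statistics"),
  Total   = c(0.6866764, 0.6584908, 0.5905360)
)

# Horizontal bar chart with skills ordered by their overall share.
# Total is stored as a fraction, so it is scaled to a percentage for display.
p_sketch <- ggplot(top_soft_example,
                   aes(x = reorder(Keyword, Total), y = Total * 100)) +
  geom_col() +
  coord_flip() +
  labs(x = "Skill", y = "Percent of data science job listings")

print(p_sketch)
```

Replacing `top_soft_example` with the full `top_10_soft` data frame would chart all ten skills in the same way.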
Top 10 Technical Skills for Data Scientists
Data transformations for visualization
ds_skills_tech <- filter(ds_skills, Skill_Set == "Tech")
# Arrange data from largest to smallest by the Total column
ds_skills_tech <- arrange(ds_skills_tech, desc(Total))
# Keep only the top 10 technical skills, without the Skill_Set column
top_10_tech <- ds_skills_tech[1:10, c(1, 3:7)]
Top Ten Technical skills as a percentage of total Data Science jobs
x
Top 10 Tech Skills as an average percentage of total data science jobs by job site
top_10_tech_visual <- data.frame("Keyword" = top_10_tech, top_10_tech)
data <- top_10_tech_visual[, c('Keyword', 'LinkedIn', 'Indeed', 'SimplyHired', 'Monster', 'Total')]
s
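A per-site comparison like this one usually needs the data in long form, with one row per keyword and job site. The reshaping code is not shown in this report, so the following is only a minimal sketch, assuming the tidyr package and using the values of the top two technical skills (Python and R) as they appear in this report's output. The names `tech_example`, `tech_long`, `Site`, and `Percent` are illustrative and not taken from the original code.

```r
library(tidyr)

# Per-site percentages for the top two technical skills, copied from the report's output
tech_example <- data.frame(
  Keyword     = c("Python", "R"),
  LinkedIn    = c(73.71661, 52.88037),
  Indeed      = c(74.30907, 60.45154),
  SimplyHired = c(75.42439, 62.49674),
  Monster     = c(67.91244, 63.13401)
)

# Reshape from one column per site to one row per keyword and site
tech_long <- pivot_longer(
  tech_example,
  cols = c(LinkedIn, Indeed, SimplyHired, Monster),
  names_to = "Site",
  values_to = "Percent"
)

print(tech_long)
```

In this shape, `Site` can be mapped to a fill colour or a facet so that the four job sites appear side by side for each skill.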
Top 10 Non-Tech Skills as an average percentage of total data science jobs by job site
top_10_soft_visual <- data.frame("Keyword"=top_10_soft, top_10_soft)
data2 <- top_10_soft_visual[,c('Keyword', 'LinkedIn', 'Indeed', 'SimplyHired', 'Monster', 'Total')]
t
head(ds_skills_tech, 2)
##   Keyword Skill_Set LinkedIn   Indeed SimplyHired  Monster     Total
## 1  Python      Tech 73.71661 74.30907    75.42439 67.91244 0.7314637
## 2       R      Tech 52.88037 60.45154    62.49674 63.13401 0.5823289
head(ds_skills_soft, 2)
##            Keyword Skill_Set LinkedIn   Indeed SimplyHired  Monster     Total
## 1         analysis      Soft 60.02323 68.11989    69.67877 88.25414 0.6866764
## 2 machine learning      Soft 66.21370 66.93266    66.88430 62.46663 0.6584908
There are many skills and skill sets that employers look for when hiring a data scientist. Given that data science is a multidisciplinary field, it is only fitting that companies look for both technology-based skills and soft skills in a data scientist.
The top technology skill that employers look for when hiring a data scientist, according to this dataset, is Python.
The top soft skill that employers look for when hiring a data scientist, according to this dataset, is analysis.
The top overall skill that employers look for when hiring a data scientist, according to this dataset, is Python.
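These rankings can be checked directly against the printed tables. A minimal base R sketch, assuming only the top two rows of each category shown in this report; the names `top_rows`, `top_by_set`, and `top_overall` are illustrative and not part of the original analysis:

```r
# Top two rows of each category, copied from the printed output
top_rows <- data.frame(
  Keyword   = c("Python", "R", "analysis", "machine learning"),
  Skill_Set = c("Tech", "Tech", "Soft", "Soft"),
  Total     = c(0.7314637, 0.5823289, 0.6866764, 0.6584908),
  stringsAsFactors = FALSE
)

# Highest-ranked skill within each category
top_by_set <- do.call(
  rbind,
  lapply(split(top_rows, top_rows$Skill_Set),
         function(d) d[which.max(d$Total), ])
)

# Highest-ranked skill overall
top_overall <- top_rows[which.max(top_rows$Total), ]

print(top_by_set)
print(top_overall)
```

Running the same two steps on the full `ds_skills` data frame would apply the comparison to every keyword instead of this four-row excerpt.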