This is a project for your entire class section to work on together, since being able to work effectively on a virtual team is a key “soft skill” for data scientists. Please note especially the requirement about making a presentation during our first meetup after the project is due.
W. Edwards Deming said, “In God we trust, all others must bring data.” Please use data to answer the question, “Which are the most valued data science skills?” Consider your work as an exploration; there is not necessarily a “right answer.”
While researching this project, we found a study, https://365datascience.com/research-into-1001-data-scientist-profiles/, whose findings fit the criteria needed to complete this project. However, as a team, we still needed a pool of data from which to calculate parameters and answer our research question(s).
library(knitr)  # provides include_graphics()
image_path <- "C:/Users/jpsim/Documents/DATA Acquisition and Management/journal.png"
include_graphics(image_path)
With some further research, we found a similar project that set out to answer a similar question:
https://www.kaggle.com/discdiver/the-most-in-demand-skills-for-data-scientists
image_path <- "C:/Users/jpsim/Documents/DATA Acquisition and Management/kaggle.png"
include_graphics(image_path)
For this project, we used the two main datasets from the Kaggle page above to answer our research question(s):
kaggle_ds_job_listing_software.csv and kaggle_ds_general_skills_revised.csv
image_path <- "C:/Users/jpsim/Documents/DATA Acquisition and Management/kaggle_data.png"
include_graphics(image_path)
library(readr)  # provides read_csv()
general_skills <- read_csv(file = "https://raw.githubusercontent.com/josephsimone/DATA_607_Project_3/master/kaggle_ds_general_skills_revised.csv")
## Parsed with column specification:
## cols(
## Keyword = col_character(),
## LinkedIn = col_number(),
## Indeed = col_number(),
## SimplyHired = col_number(),
## Monster = col_number()
## )
head(general_skills)
## # A tibble: 6 x 5
## Keyword LinkedIn Indeed SimplyHired Monster
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 machine learning 5701 3439 2561 2340
## 2 analysis 5168 3500 2668 3306
## 3 statistics 4893 2992 2308 2399
## 4 computer science 4517 2739 2093 1900
## 5 communication 3404 2344 1791 2053
## 6 mathematics 2605 1961 1497 1815
software_skills <- read_csv(file = "https://raw.githubusercontent.com/josephsimone/DATA_607_Project_3/master/kaggle_ds_job_listing_software.csv")
## Parsed with column specification:
## cols(
## Keyword = col_character(),
## LinkedIn = col_number(),
## Indeed = col_number(),
## SimplyHired = col_number(),
## Monster = col_number(),
## `LinkedIn %` = col_character(),
## `Indeed %` = col_character(),
## `SimplyHired %` = col_character(),
## `Monster %` = col_character(),
## `Avg %` = col_character(),
## `GlassDoor Self Reported % 2017` = col_character(),
## Difference = col_character()
## )
head(software_skills)
## # A tibble: 6 x 12
## Keyword LinkedIn Indeed SimplyHired Monster `LinkedIn %` `Indeed %`
## <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 Python 6347 3818 2888 2544 74% 74%
## 2 R 4553 3106 2393 2365 53% 60%
## 3 SQL 3879 2628 2056 1841 45% 51%
## 4 Spark 2169 1551 1167 1062 25% 30%
## 5 Hadoop 2142 1578 1164 1200 25% 31%
## 6 Java 1944 1377 1059 1002 23% 27%
## # ... with 5 more variables: `SimplyHired %` <chr>, `Monster %` <chr>,
## # `Avg %` <chr>, `GlassDoor Self Reported % 2017` <chr>,
## # Difference <chr>
As a team, we decided it would be best to combine these two datasets, first cleaning the respective data frames into a common format so that they could be stacked.
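library(dplyr)  # provides as_tibble(), select(), filter(), arrange()
# Tag each data frame with its skill category before stacking them
dt1 <- as_tibble(general_skills)
dt1$Skill_Set <- "Soft"
dt2 <- as_tibble(software_skills)
dt2$Skill_Set <- "Tech"
# Drop the percentage and comparison columns (positions 6-12) so both frames share the same six columns
dt2 <- select(dt2, -c(6:12))
# Stack the two frames row-wise
result <- rbind(dt1, dt2)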
head(result)
## # A tibble: 6 x 6
## Keyword LinkedIn Indeed SimplyHired Monster Skill_Set
## <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 machine learning 5701 3439 2561 2340 Soft
## 2 analysis 5168 3500 2668 3306 Soft
## 3 statistics 4893 2992 2308 2399 Soft
## 4 computer science 4517 2739 2093 1900 Soft
## 5 communication 3404 2344 1791 2053 Soft
## 6 mathematics 2605 1961 1497 1815 Soft
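# Guard against a byte-order-mark artifact in the first column name; a no-op when the column is already "Keyword"
names(result)[names(result) == "ï..Keyword"] <- "Keyword"
write.csv(result, file = "joined_df2.csv", row.names = FALSE)
# ds_skills <- read_csv("joined_df2.csv")  # local alternative to the GitHub copy read below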
ds_skills <- read_csv(file = "https://raw.githubusercontent.com/josephsimone/DATA_607_Project_3/master/joined_df2.csv")
## Parsed with column specification:
## cols(
## Keyword = col_character(),
## LinkedIn = col_number(),
## Indeed = col_number(),
## SimplyHired = col_number(),
## Monster = col_number(),
## Skill_Set = col_character()
## )
head(ds_skills)
## # A tibble: 6 x 6
## Keyword LinkedIn Indeed SimplyHired Monster Skill_Set
## <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 AI composite 1568 1125 811 687 Soft
## 2 analysis 5168 3500 2668 3306 Soft
## 3 communication 3404 2344 1791 2053 Soft
## 4 computer science 4517 2739 2093 1900 Soft
## 5 data engineering 514 339 276 200 Soft
## 6 deep learning 1310 979 675 606 Soft
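# Sum the four job-site counts into a Total column
ds_skills$Total <- rowSums(sapply(ds_skills[, 2:5], as.numeric))
# Reorder columns: Keyword, Skill_Set, the four sites, then Total
ds_skills <- ds_skills[, c(1, 6, 2:5, 7)]
ds_skills <- data.frame(ds_skills)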
colnames(ds_skills)
## [1] "Keyword" "Skill_Set" "LinkedIn" "Indeed" "SimplyHired"
## [6] "Monster" "Total"
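The combining step can also be written with columns chosen by name rather than by position, which may be easier to maintain if the layout of the Kaggle files changes. The following is a minimal sketch, not the code used for this report: it works on two-row stand-ins for the data frames (values copied from the head() output above), and the names site_cols and combined are introduced here only for illustration.

```r
library(dplyr)

# Toy stand-ins for the two Kaggle tables (values copied from the head() output above)
general_skills <- tibble::tibble(
  Keyword = c("machine learning", "analysis"),
  LinkedIn = c(5701, 5168),
  Indeed = c(3439, 3500),
  SimplyHired = c(2561, 2668),
  Monster = c(2340, 3306)
)
software_skills <- tibble::tibble(
  Keyword = c("Python", "R"),
  LinkedIn = c(6347, 4553),
  Indeed = c(3818, 3106),
  SimplyHired = c(2888, 2393),
  Monster = c(2544, 2365),
  `LinkedIn %` = c("74%", "53%")  # one of the extra columns that gets dropped
)

# Hypothetical helper vector: the four count columns shared by both tables
site_cols <- c("LinkedIn", "Indeed", "SimplyHired", "Monster")

# Stack the tables; the argument names become the values of the Skill_Set column
combined <- bind_rows(
  Soft = select(general_skills, Keyword, all_of(site_cols)),
  Tech = select(software_skills, Keyword, all_of(site_cols)),
  .id = "Skill_Set"
)

# Row-wise total across the four job sites
combined$Total <- rowSums(combined[site_cols])
print(combined)
```

Selecting by name means the extra percentage columns in the software table never need to be counted by position.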
The values in the combined dataset are the numbers of listings that matched each keyword in a general search on well-known job recruitment websites. As a team, we thought it would be more informative to convert these counts into percentages of each site's total number of data science listings, along with an overall percentage pooled across the four sites, to get a better understanding of the dataset.
# Per-site totals used as denominators
LinkedIn <- 8610
Indeed <- 5138
SimplyHired <- 3829
Monster <- 3746
Total <- LinkedIn + Indeed + SimplyHired + Monster
# Convert raw keyword counts to percentages of each site's total
ds_skills$LinkedIn <- ds_skills$LinkedIn / LinkedIn * 100
ds_skills$Indeed <- ds_skills$Indeed / Indeed * 100
ds_skills$SimplyHired <- ds_skills$SimplyHired / SimplyHired * 100
ds_skills$Monster <- ds_skills$Monster / Monster * 100
ds_skills$Total <- ds_skills$Total / Total * 100
dim(ds_skills)
## [1] 52 7
head(ds_skills, 52)
## Keyword Skill_Set LinkedIn Indeed SimplyHired
## 1 AI composite Soft 18.211382 21.895679 21.180465
## 2 analysis Soft 60.023229 68.119891 69.678767
## 3 communication Soft 39.535424 45.620864 46.774615
## 4 computer science Soft 52.462253 53.308680 54.661792
## 5 data engineering Soft 5.969803 6.597898 7.208148
## 6 deep learning Soft 15.214866 19.054107 17.628624
## 7 machine learning Soft 66.213705 66.932659 66.884304
## 8 mathematics Soft 30.255517 38.166602 39.096370
## 9 neural networks Soft 7.793264 9.439471 10.995038
## 10 NLP composite Soft 14.076655 17.711172 17.236876
## 11 project management Soft 5.528455 7.726742 8.618438
## 12 software development Soft 8.501742 12.203192 12.562027
## 13 software engineering Soft 4.796748 5.741534 6.529120
## 14 statistics Soft 56.829268 58.232775 60.276835
## 15 visualization Soft 21.823461 27.500973 30.112301
## 16 AWS Tech 10.998839 15.395095 15.852703
## 17 Azure Tech 6.713124 8.096536 7.443197
## 18 C Tech 9.233449 9.575710 10.028728
## 19 C# Tech 3.763066 4.768392 4.753199
## 20 C++ Tech 11.893148 14.889062 15.147558
## 21 Caffe Tech 2.392567 2.899961 2.951162
## 22 Cassandra Tech 2.740999 3.386532 3.813006
## 23 D3 Tech 4.099884 2.899961 2.951162
## 24 Docker Tech 3.368177 4.671078 3.865239
## 25 Excel Tech 8.141696 11.074348 11.439018
## 26 Git Tech 3.275261 5.079798 4.857665
## 27 Hadoop Tech 24.878049 30.712339 30.399582
## 28 Hbase Tech 3.507549 4.262359 4.361452
## 29 Hive Tech 13.728223 16.154146 16.636197
## 30 Java Tech 22.578397 26.800311 27.657352
## 31 Javascript Tech 3.809524 4.768392 5.588927
## 32 Keras Tech 3.821138 4.924095 5.353878
## 33 Linux Tech 6.980256 10.062281 9.506399
## 34 Matlab Tech 9.361208 13.176333 14.207365
## 35 MongoDB Tech 2.915215 3.814714 4.309219
## 36 MySQL Tech 3.228804 4.534838 4.883782
## 37 NoSQL Tech 6.945412 8.485792 10.107078
## 38 Numpy Tech 4.494774 5.001946 6.059023
## 39 Pandas Tech 4.889663 6.422733 7.364847
## 40 Perl Tech 3.588850 5.021409 5.275529
## 41 Pig Tech 4.262485 5.760996 6.032907
## 42 Python Tech 73.716609 74.309070 75.424393
## 43 PyTorch Tech 2.485482 2.783184 3.421259
## 44 R Tech 52.880372 60.451538 62.496735
## 45 SAS Tech 19.895470 22.070845 23.765996
## 46 Scala Tech 12.078978 14.383028 15.382606
## 47 Scikit-learn Tech 5.505226 7.824056 7.678245
## 48 Spark Tech 25.191638 30.186843 30.477932
## 49 SPSS Tech 5.249710 6.422733 7.129799
## 50 SQL Tech 45.052265 51.148307 53.695482
## 51 Tableau Tech 14.123113 19.696380 20.370854
## 52 TensorFlow Tech 9.802555 12.864928 13.084356
## Monster Total
## 1 18.339562 19.654833
## 2 88.254138 68.667636
## 3 54.805125 44.984289
## 4 50.720769 52.755241
## 5 5.339028 6.232706
## 6 16.177256 16.742485
## 7 62.466631 65.849083
## 8 48.451682 36.946021
## 9 8.142018 8.826150
## 10 15.536572 15.776392
## 11 9.289909 7.273836
## 12 20.928991 12.305961
## 13 13.667912 6.893964
## 14 64.041644 59.053604
## 15 32.221036 26.506589
## 16 12.466631 13.187638
## 17 7.261078 7.273836
## 18 13.961559 10.289359
## 19 5.846236 4.549078
## 20 11.719167 13.168879
## 21 2.562734 2.645031
## 22 3.630539 3.245322
## 23 2.536038 3.329738
## 24 5.178857 4.089481
## 25 10.597971 9.871969
## 26 3.870796 4.098860
## 27 32.034170 28.532570
## 28 3.683930 3.873751
## 29 16.524293 15.326174
## 30 26.748532 25.240351
## 31 5.979712 4.741359
## 32 3.497064 4.305210
## 33 8.088628 8.371242
## 34 11.185264 11.471181
## 35 3.096636 3.414154
## 36 3.230112 3.840923
## 37 9.663641 8.361863
## 38 4.057662 4.821085
## 39 4.671650 5.665244
## 40 5.285638 4.535009
## 41 6.833956 5.393237
## 42 67.912440 73.146368
## 43 2.616124 2.748206
## 44 63.134010 58.232894
## 45 26.107848 22.206069
## 46 13.881474 13.544060
## 47 5.659370 6.481264
## 48 28.350240 27.899451
## 49 5.392419 5.895043
## 50 49.145755 48.792384
## 51 19.861185 17.596023
## 52 10.277629 11.213244
ds_skills_soft <- filter(ds_skills, Skill_Set == "Soft")
dim(ds_skills_soft)
## [1] 15 7
head(ds_skills_soft, 20)
## Keyword Skill_Set LinkedIn Indeed SimplyHired
## 1 AI composite Soft 18.211382 21.895679 21.180465
## 2 analysis Soft 60.023229 68.119891 69.678767
## 3 communication Soft 39.535424 45.620864 46.774615
## 4 computer science Soft 52.462253 53.308680 54.661792
## 5 data engineering Soft 5.969803 6.597898 7.208148
## 6 deep learning Soft 15.214866 19.054107 17.628624
## 7 machine learning Soft 66.213705 66.932659 66.884304
## 8 mathematics Soft 30.255517 38.166602 39.096370
## 9 neural networks Soft 7.793264 9.439471 10.995038
## 10 NLP composite Soft 14.076655 17.711172 17.236876
## 11 project management Soft 5.528455 7.726742 8.618438
## 12 software development Soft 8.501742 12.203192 12.562027
## 13 software engineering Soft 4.796748 5.741534 6.529120
## 14 statistics Soft 56.829268 58.232775 60.276835
## 15 visualization Soft 21.823461 27.500973 30.112301
## Monster Total
## 1 18.339562 19.654833
## 2 88.254138 68.667636
## 3 54.805125 44.984289
## 4 50.720769 52.755241
## 5 5.339028 6.232706
## 6 16.177256 16.742485
## 7 62.466631 65.849083
## 8 48.451682 36.946021
## 9 8.142018 8.826150
## 10 15.536572 15.776392
## 11 9.289909 7.273836
## 12 20.928991 12.305961
## 13 13.667912 6.893964
## 14 64.041644 59.053604
## 15 32.221036 26.506589
ds_skills_soft <- arrange(ds_skills_soft, desc(Total))
head(ds_skills_soft, 2)
## Keyword Skill_Set LinkedIn Indeed SimplyHired Monster
## 1 analysis Soft 60.02323 68.11989 69.67877 88.25414
## 2 machine learning Soft 66.21370 66.93266 66.88430 62.46663
## Total
## 1 68.66764
## 2 65.84908
# Top 10 soft skills
ds_skills_soft <- ds_skills_soft[1:10, c(1, 3:7)]
# Bar graph: top ten soft skills as a percentage of total data science jobs
library(plotly)  # provides plot_ly()
p <- plot_ly(ds_skills_soft, x = ~Keyword, y = ~Total, type = 'bar', name = ~Keyword)
p
top_10_soft_visual <- data.frame("Keyword"=ds_skills_soft, ds_skills_soft)
head(top_10_soft_visual, 2)
## Keyword.Keyword Keyword.LinkedIn Keyword.Indeed Keyword.SimplyHired
## 1 analysis 60.02323 68.11989 69.67877
## 2 machine learning 66.21370 66.93266 66.88430
## Keyword.Monster Keyword.Total Keyword LinkedIn Indeed
## 1 88.25414 68.66764 analysis 60.02323 68.11989
## 2 62.46663 65.84908 machine learning 66.21370 66.93266
## SimplyHired Monster Total
## 1 69.67877 88.25414 68.66764
## 2 66.88430 62.46663 65.84908
data_soft <- top_10_soft_visual[,c('Keyword', 'LinkedIn', 'Indeed', 'SimplyHired', 'Monster', 'Total')]
ds_skills_soft <- filter(ds_skills, Skill_Set == "Soft")
dim(ds_skills_soft)
## [1] 15 7
# View first 20 non-technical skills
head(ds_skills_soft, 20)
## Keyword Skill_Set LinkedIn Indeed SimplyHired
## 1 AI composite Soft 18.211382 21.895679 21.180465
## 2 analysis Soft 60.023229 68.119891 69.678767
## 3 communication Soft 39.535424 45.620864 46.774615
## 4 computer science Soft 52.462253 53.308680 54.661792
## 5 data engineering Soft 5.969803 6.597898 7.208148
## 6 deep learning Soft 15.214866 19.054107 17.628624
## 7 machine learning Soft 66.213705 66.932659 66.884304
## 8 mathematics Soft 30.255517 38.166602 39.096370
## 9 neural networks Soft 7.793264 9.439471 10.995038
## 10 NLP composite Soft 14.076655 17.711172 17.236876
## 11 project management Soft 5.528455 7.726742 8.618438
## 12 software development Soft 8.501742 12.203192 12.562027
## 13 software engineering Soft 4.796748 5.741534 6.529120
## 14 statistics Soft 56.829268 58.232775 60.276835
## 15 visualization Soft 21.823461 27.500973 30.112301
## Monster Total
## 1 18.339562 19.654833
## 2 88.254138 68.667636
## 3 54.805125 44.984289
## 4 50.720769 52.755241
## 5 5.339028 6.232706
## 6 16.177256 16.742485
## 7 62.466631 65.849083
## 8 48.451682 36.946021
## 9 8.142018 8.826150
## 10 15.536572 15.776392
## 11 9.289909 7.273836
## 12 20.928991 12.305961
## 13 13.667912 6.893964
## 14 64.041644 59.053604
## 15 32.221036 26.506589
#Arrange data from largest to smallest by the Totals column
ds_skills_soft <- arrange(ds_skills_soft, desc(Total))
#Filter out only the Top 10 non-technical skills
top_10_soft <- ds_skills_soft[1:10,c(1,3:7)]
top_10_soft
## Keyword LinkedIn Indeed SimplyHired Monster Total
## 1 analysis 60.02323 68.11989 69.67877 88.25414 68.66764
## 2 machine learning 66.21370 66.93266 66.88430 62.46663 65.84908
## 3 statistics 56.82927 58.23278 60.27683 64.04164 59.05360
## 4 computer science 52.46225 53.30868 54.66179 50.72077 52.75524
## 5 communication 39.53542 45.62086 46.77461 54.80513 44.98429
## 6 mathematics 30.25552 38.16660 39.09637 48.45168 36.94602
## 7 visualization 21.82346 27.50097 30.11230 32.22104 26.50659
## 8 AI composite 18.21138 21.89568 21.18046 18.33956 19.65483
## 9 deep learning 15.21487 19.05411 17.62862 16.17726 16.74248
## 10 NLP composite 14.07666 17.71117 17.23688 15.53657 15.77639
ds_skills_tech <- filter(ds_skills, Skill_Set == "Tech")
dim(ds_skills_tech)
## [1] 37 7
# View all 37 technical skills
head(ds_skills_tech, 37)
## Keyword Skill_Set LinkedIn Indeed SimplyHired Monster
## 1 AWS Tech 10.998839 15.395095 15.852703 12.466631
## 2 Azure Tech 6.713124 8.096536 7.443197 7.261078
## 3 C Tech 9.233449 9.575710 10.028728 13.961559
## 4 C# Tech 3.763066 4.768392 4.753199 5.846236
## 5 C++ Tech 11.893148 14.889062 15.147558 11.719167
## 6 Caffe Tech 2.392567 2.899961 2.951162 2.562734
## 7 Cassandra Tech 2.740999 3.386532 3.813006 3.630539
## 8 D3 Tech 4.099884 2.899961 2.951162 2.536038
## 9 Docker Tech 3.368177 4.671078 3.865239 5.178857
## 10 Excel Tech 8.141696 11.074348 11.439018 10.597971
## 11 Git Tech 3.275261 5.079798 4.857665 3.870796
## 12 Hadoop Tech 24.878049 30.712339 30.399582 32.034170
## 13 Hbase Tech 3.507549 4.262359 4.361452 3.683930
## 14 Hive Tech 13.728223 16.154146 16.636197 16.524293
## 15 Java Tech 22.578397 26.800311 27.657352 26.748532
## 16 Javascript Tech 3.809524 4.768392 5.588927 5.979712
## 17 Keras Tech 3.821138 4.924095 5.353878 3.497064
## 18 Linux Tech 6.980256 10.062281 9.506399 8.088628
## 19 Matlab Tech 9.361208 13.176333 14.207365 11.185264
## 20 MongoDB Tech 2.915215 3.814714 4.309219 3.096636
## 21 MySQL Tech 3.228804 4.534838 4.883782 3.230112
## 22 NoSQL Tech 6.945412 8.485792 10.107078 9.663641
## 23 Numpy Tech 4.494774 5.001946 6.059023 4.057662
## 24 Pandas Tech 4.889663 6.422733 7.364847 4.671650
## 25 Perl Tech 3.588850 5.021409 5.275529 5.285638
## 26 Pig Tech 4.262485 5.760996 6.032907 6.833956
## 27 Python Tech 73.716609 74.309070 75.424393 67.912440
## 28 PyTorch Tech 2.485482 2.783184 3.421259 2.616124
## 29 R Tech 52.880372 60.451538 62.496735 63.134010
## 30 SAS Tech 19.895470 22.070845 23.765996 26.107848
## 31 Scala Tech 12.078978 14.383028 15.382606 13.881474
## 32 Scikit-learn Tech 5.505226 7.824056 7.678245 5.659370
## 33 Spark Tech 25.191638 30.186843 30.477932 28.350240
## 34 SPSS Tech 5.249710 6.422733 7.129799 5.392419
## 35 SQL Tech 45.052265 51.148307 53.695482 49.145755
## 36 Tableau Tech 14.123113 19.696380 20.370854 19.861185
## 37 TensorFlow Tech 9.802555 12.864928 13.084356 10.277629
## Total
## 1 13.187638
## 2 7.273836
## 3 10.289359
## 4 4.549078
## 5 13.168879
## 6 2.645031
## 7 3.245322
## 8 3.329738
## 9 4.089481
## 10 9.871969
## 11 4.098860
## 12 28.532570
## 13 3.873751
## 14 15.326174
## 15 25.240351
## 16 4.741359
## 17 4.305210
## 18 8.371242
## 19 11.471181
## 20 3.414154
## 21 3.840923
## 22 8.361863
## 23 4.821085
## 24 5.665244
## 25 4.535009
## 26 5.393237
## 27 73.146368
## 28 2.748206
## 29 58.232894
## 30 22.206069
## 31 13.544060
## 32 6.481264
## 33 27.899451
## 34 5.895043
## 35 48.792384
## 36 17.596023
## 37 11.213244
#Arrange data from largest to smallest by the Totals column
ds_skills_tech <- arrange(ds_skills_tech, desc(Total))
#Filter out only the Top 10 technical skills
top_10_tech <- ds_skills_tech[1:10,c(1,3:7)]
top_10_tech
## Keyword LinkedIn Indeed SimplyHired Monster Total
## 1 Python 73.71661 74.30907 75.42439 67.91244 73.14637
## 2 R 52.88037 60.45154 62.49674 63.13401 58.23289
## 3 SQL 45.05226 51.14831 53.69548 49.14576 48.79238
## 4 Hadoop 24.87805 30.71234 30.39958 32.03417 28.53257
## 5 Spark 25.19164 30.18684 30.47793 28.35024 27.89945
## 6 Java 22.57840 26.80031 27.65735 26.74853 25.24035
## 7 SAS 19.89547 22.07084 23.76600 26.10785 22.20607
## 8 Tableau 14.12311 19.69638 20.37085 19.86119 17.59602
## 9 Hive 13.72822 16.15415 16.63620 16.52429 15.32617
## 10 Scala 12.07898 14.38303 15.38261 13.88147 13.54406
# Bar chart of the top 10 technical skills by overall percentage
x <- plot_ly(top_10_tech, x = ~Keyword, y = ~Total, type = 'bar', name = ~Keyword)
x
# Bar chart of the top 10 non-technical skills by overall percentage
y <- plot_ly(top_10_soft, x = ~Keyword, y = ~Total, type = 'bar', name = ~Keyword)
y
top_10_tech_visual <- data.frame("Keyword"=top_10_tech, top_10_tech)
head(top_10_tech_visual, 2)
## Keyword.Keyword Keyword.LinkedIn Keyword.Indeed Keyword.SimplyHired
## 1 Python 73.71661 74.30907 75.42439
## 2 R 52.88037 60.45154 62.49674
## Keyword.Monster Keyword.Total Keyword LinkedIn Indeed SimplyHired
## 1 67.91244 73.14637 Python 73.71661 74.30907 75.42439
## 2 63.13401 58.23289 R 52.88037 60.45154 62.49674
## Monster Total
## 1 67.91244 73.14637
## 2 63.13401 58.23289
data <- top_10_tech_visual[,c('Keyword', 'LinkedIn', 'Indeed', 'SimplyHired', 'Monster', 'Total')]
head(data, 2)
## Keyword LinkedIn Indeed SimplyHired Monster Total
## 1 Python 73.71661 74.30907 75.42439 67.91244 73.14637
## 2 R 52.88037 60.45154 62.49674 63.13401 58.23289
# Grouped bar chart: one bar per job site (plus the overall figure) for each technical skill
s <- plot_ly(data, x = ~Keyword, y = ~LinkedIn, type = 'bar', name = "LinkedIn") %>%
  add_trace(y = ~Indeed, name = "Indeed") %>%
  add_trace(y = ~SimplyHired, name = "SimplyHired") %>%
  add_trace(y = ~Monster, name = "Monster") %>%
  add_trace(y = ~Total, name = "Average") %>%
  layout(title = 'Technical Skills by Job Sites',
         yaxis = list(title = '% of Data Science Jobs'),
         xaxis = list(title = 'Tech Skills'),
         barmode = 'group')
s
top_10_soft_visual <- data.frame("Keyword"=top_10_soft, top_10_soft)
head(top_10_soft_visual, 2)
## Keyword.Keyword Keyword.LinkedIn Keyword.Indeed Keyword.SimplyHired
## 1 analysis 60.02323 68.11989 69.67877
## 2 machine learning 66.21370 66.93266 66.88430
## Keyword.Monster Keyword.Total Keyword LinkedIn Indeed
## 1 88.25414 68.66764 analysis 60.02323 68.11989
## 2 62.46663 65.84908 machine learning 66.21370 66.93266
## SimplyHired Monster Total
## 1 69.67877 88.25414 68.66764
## 2 66.88430 62.46663 65.84908
data2 <- top_10_soft_visual[,c('Keyword', 'LinkedIn', 'Indeed', 'SimplyHired', 'Monster', 'Total')]
head(data2, 2)
## Keyword LinkedIn Indeed SimplyHired Monster Total
## 1 analysis 60.02323 68.11989 69.67877 88.25414 68.66764
## 2 machine learning 66.21370 66.93266 66.88430 62.46663 65.84908
# Grouped bar chart: one bar per job site (plus the overall figure) for each non-technical skill
t <- plot_ly(data2, x = ~Keyword, y = ~LinkedIn, type = 'bar', name = "LinkedIn") %>%
  add_trace(y = ~Indeed, name = "Indeed") %>%
  add_trace(y = ~SimplyHired, name = "SimplyHired") %>%
  add_trace(y = ~Monster, name = "Monster") %>%
  add_trace(y = ~Total, name = "Average") %>%
  layout(title = 'Non-Technical Skills by Job Sites',
         yaxis = list(title = '% of Data Science Jobs'),
         xaxis = list(title = 'Non-Tech Skills'),
         barmode = 'group')
t
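An alternative to adding one trace per job site is to reshape the table into long format, with one row per skill and site, and let a single plotting call split the bars by site. The sketch below is an assumption-laden illustration rather than part of the original analysis: it uses two rows copied from the top-10 technical table, and the column names Site and Percent, like the objects top_tech, long, and fig, are hypothetical. The plotting step only runs if the plotly package is installed.

```r
library(tidyr)

# Two rows copied from the top-10 technical skills table above
top_tech <- data.frame(
  Keyword = c("Python", "R"),
  LinkedIn = c(73.71661, 52.88037),
  Indeed = c(74.30907, 60.45154),
  SimplyHired = c(75.42439, 62.49674),
  Monster = c(67.91244, 63.13401)
)

# Reshape to one row per (skill, site) pair
long <- pivot_longer(
  top_tech,
  cols = c(LinkedIn, Indeed, SimplyHired, Monster),
  names_to = "Site",
  values_to = "Percent"
)
print(long)

# Optional: build a grouped bar chart, with one trace per site via the color mapping
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(long, x = ~Keyword, y = ~Percent, color = ~Site, type = "bar")
  fig <- plotly::layout(fig, barmode = "group")
}
```

With the data in this shape, adding or removing a job site changes only the input table, not the plotting code.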
head(ds_skills_tech, 2)
## Keyword Skill_Set LinkedIn Indeed SimplyHired Monster Total
## 1 Python Tech 73.71661 74.30907 75.42439 67.91244 73.14637
## 2 R Tech 52.88037 60.45154 62.49674 63.13401 58.23289
head(ds_skills_soft, 2)
## Keyword Skill_Set LinkedIn Indeed SimplyHired Monster
## 1 analysis Soft 60.02323 68.11989 69.67877 88.25414
## 2 machine learning Soft 66.21370 66.93266 66.88430 62.46663
## Total
## 1 68.66764
## 2 65.84908
Employers look for many skills and skill sets when hiring a data scientist. Because data science is a multidisciplinary field, it is fitting that companies look for both technology-based skills and soft skills in a data scientist.
The top technology skill that employers are looking for when hiring a data scientist, according to this dataset, is Python, which appears in about 73% of the listings pooled across the four sites.
The top soft skill that employers are looking for when hiring a data scientist, according to this dataset, is analysis, at about 69% of listings.
The top overall skill that employers are looking for when hiring a data scientist, according to this dataset, is Python.
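These three conclusions can be read directly off the overall percentages. As a rough check, the sketch below takes the first three rows of each top-10 table above and picks the highest value per category and overall with dplyr's slice_max(). The object names skills, top_by_group, and top_overall are introduced here for illustration and are not part of the original analysis.

```r
library(dplyr)

# Overall percentages copied from the top-10 tables above (first three rows of each)
skills <- data.frame(
  Keyword = c("analysis", "machine learning", "statistics", "Python", "R", "SQL"),
  Skill_Set = c("Soft", "Soft", "Soft", "Tech", "Tech", "Tech"),
  Total = c(68.66764, 65.84908, 59.05360, 73.14637, 58.23289, 48.79238)
)

# Highest-ranked skill within each category
top_by_group <- skills %>%
  group_by(Skill_Set) %>%
  slice_max(Total, n = 1) %>%
  ungroup()

# Highest-ranked skill across both categories
top_overall <- slice_max(skills, Total, n = 1)

print(top_by_group)
print(top_overall)
```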