Use data from the Open Skills website to find job titles that require data science skills.
The dataset used from the site is cleaned_title_count.
A tidy workflow produces the analysis:
library(RCurl)
library(dplyr)
library(tidyr)
library(rvest)
library(stringr)
library(kableExtra)
# List the bucket contents and keep only the quarterly CSV file names under cleaned_title_count/
csvFiles<-read_html("https://open-skills-datasets.s3-us-west-2.amazonaws.com/") %>%
  html_nodes("key") %>%
  html_text() %>%
  str_extract("cleaned_title_count/[:print:]+") %>%
  str_extract("[0-9]+Q[1-4]\\.csv")
csvFiles<-csvFiles[!is.na(csvFiles)]
csvFiles %>% kable(align = 'c') %>% kable_styling(position = "center") %>% scroll_box(width = "300px",height="400px")
| x |
|---|
| 2011Q1.csv |
| 2011Q2.csv |
| 2011Q3.csv |
| 2011Q4.csv |
| 2012Q1.csv |
| 2012Q2.csv |
| 2012Q3.csv |
| 2012Q4.csv |
| 2013Q1.csv |
| 2013Q2.csv |
| 2013Q3.csv |
| 2013Q4.csv |
| 2014Q1.csv |
| 2014Q2.csv |
| 2015Q1.csv |
| 2016Q1.csv |
| 2016Q2.csv |
| 2016Q3.csv |
| 2016Q4.csv |
| 2017Q1.csv |
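
The file-name filter can be checked offline before pointing it at the live bucket. The following is a minimal sketch in base R, assuming a hand-made vector of object keys; the `keys` values, including `other_folder/`, are hypothetical and stand in for the text of the listing's key nodes. It anchors the pattern to the `cleaned_title_count/` prefix and escapes the dot so that only names ending in a literal `.csv` pass.

```r
# Hypothetical sample of object keys; the real ones come from the bucket listing.
keys <- c(
  "cleaned_title_count/",            # the folder entry itself
  "cleaned_title_count/2016Q4.csv",
  "cleaned_title_count/2017Q1.csv",
  "cleaned_title_count/2017Q1xcsv",  # would match if the dot were left unescaped
  "other_folder/2017Q1.csv"          # right file name, wrong folder
)

# Four-digit year, quarter 1 to 4, literal ".csv", all under the expected prefix.
pattern <- "^cleaned_title_count/[0-9]{4}Q[1-4]\\.csv$"

# Keep matching keys and strip the folder part.
csvFiles <- basename(keys[grepl(pattern, keys)])
print(csvFiles)
```

Filtering with `grepl` drops non-matching keys directly, so no separate `is.na` clean-up step is needed in this variant.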
# Download the four most recent quarterly files, newest first, and stack them row-wise
out<-vector()
for (i in 0:3) {
  url<-paste("https://open-skills-datasets.s3-us-west-2.amazonaws.com/cleaned_title_count/",csvFiles[length(csvFiles)-i],sep="")
  x<-getURL(url)
  out<-rbind(out,read.csv(textConnection(x)))
}
dataTable<-as_tibble(out)
head(dataTable,n=20) %>% kable() %>% kable_styling() %>% scroll_box(width = "910px",height="400px")
| title | counts_total | skills_1 | skills_2 | skills_3 | skills_4 | skills_5 | skills_6 | skills_7 | skills_8 | skills_9 | skills_10 | soc_code_common_1 | soc_code_common_2 | soc_code_common_total | soc_code_top_1 | soc_code_top_2 | soc_code_top_total | soc_code_given_1 | soc_code_given_2 | soc_code_given_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 1 |  |  |  |  |  |  |  |  |  |  | 39-9032.00 |  | 1 | 13-1071.00 |  | 1 | 35-2014.00 | NA | 1 |
| rsaf language training elt instructors | 2 | c | reduce | english language |  |  |  |  |  |  |  | 25-1123.00 |  | 1 | 27-3031.00 | 21-1093.00 | 2 |  | NA | 1 |
| integrator | 2 | troubleshooting | skill | red hat enterprise linux | linux | design |  |  |  |  |  | 15-1142.00 |  | 1 | 15-1122.00 | 15-1143.00 | 2 |  | NA | 1 |
| correctional officer | 1 | transportation | brakes |  |  |  |  |  |  |  |  | 33-1012.00 |  | 1 | 29-2012.00 |  | 1 | 33-3012.00 | NA | 1 |
| store all positions | 2 | time management |  |  |  |  |  |  |  |  |  | 13-1199.00 | 51-9151.00 | 2 | 43-4161.00 | 43-4051.03 | 2 |  | NA | 1 |
| head of office | 2 | writing | forth |  |  |  |  |  |  |  |  | 11-2022.00 | 11-3021.00 | 2 | 13-1151.00 | 11-3021.00 | 2 |  | NA | 1 |
| security positions | 1 | rules | closedcircuit tv cameras | security cameras | bulletproof vests | surveillance cameras | alarms |  |  |  |  | 33-3051.01 |  | 1 | 33-3051.01 |  | 1 |  | NA | 1 |
| sr manager event communications | 1 | speaking | writing | forth |  |  |  |  |  |  |  | 11-2031.00 |  | 1 | 11-2011.00 |  | 1 | 11-2021.00 | NA | 1 |
| assistant director of engineering and construction | 1 | ada | mechanical | levels | impact |  |  |  |  |  |  | 11-9041.00 |  | 1 | 17-2141.00 |  | 1 |  | NA | 1 |
| scheduling coordinator | 3 | microsoft office |  |  |  |  |  |  |  |  |  | 31-1014.00 | 29-2061.00 | 2 | 11-9111.00 | 41-4011.00 | 3 | 43-5061.00 | NA | 1 |
| systems engineer fair | 1 | skill |  |  |  |  |  |  |  |  |  | 17-2071.00 |  | 1 | 17-2141.00 |  | 1 |  | NA | 1 |
| school bus driver | 2 | skill | transportation | route |  |  |  |  |  |  |  | 25-2012.00 |  | 1 | 25-2052.00 | 25-9041.00 | 2 | 53-3022.00 | NA | 1 |
| systems analyst | 1 | skill | c | simulation software | mathematics | mechanical | physics | science | design |  |  | 13-1111.00 |  | 1 | 15-1121.00 |  | 1 | 15-1121.00 | NA | 1 |
| warehouser | 1 | trucks | pallet jacks | forklifts | sweepers |  |  |  |  |  |  | 33-3051.01 |  | 1 | 49-9071.00 |  | 1 | 49-9071.00 | NA | 1 |
| cloud solution architect | 2 | redmine | c | programming | python | visualization | javascript | puppet | apache tomcat | stencils | oracle | 15-1132.00 | 15-1121.00 | 2 | 15-1132.00 | 15-1143.00 | 2 | 15-1199.02 | NA | 1 |
| insemination crew worker | 4 | speaking | self | personal protective equipment |  |  |  |  |  |  |  | 43-5061.00 | 27-2011.00 | 4 | 35-3021.00 | 37-2011.00 | 4 |  | NA | 1 |
| product support sales representative | 2 | stamina |  |  |  |  |  |  |  |  |  | 41-4012.00 |  | 1 | 41-4011.00 | 41-2031.00 | 2 |  | NA | 1 |
| direct support professional team leader | 1 |  |  |  |  |  |  |  |  |  |  | 29-1181.00 |  | 1 | 15-1122.00 |  | 1 |  | NA | 1 |
| data scientist | 1 | skill | science | git | python | mathematics | r | derive |  |  |  | 15-1134.00 |  | 1 | 17-2112.00 |  | 1 | 15-1111.00 | NA | 1 |
| part bus driver commercial drivers license | 1 |  |  |  |  |  |  |  |  |  |  | 33-3051.01 |  | 1 | 53-3022.00 |  | 1 |  | NA | 1 |
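
The download loop above grows `out` one `rbind` at a time. A possible alternative, sketched here in base R, collects the tables in a list and binds them once. `loadQuarters` and its `reader` argument are hypothetical names introduced for illustration; passing the reader in lets the logic be tried without network access. The default reader assumes `read.csv` can open an https URL directly, which depends on how the local R installation was built.

```r
baseUrl <- "https://open-skills-datasets.s3-us-west-2.amazonaws.com/cleaned_title_count/"

# Hypothetical helper: read the n most recent files, newest first, and stack them.
# `reader` takes a URL and returns a data frame.
loadQuarters <- function(files, n = 4,
                         reader = function(u) read.csv(u, stringsAsFactors = FALSE)) {
  latest <- rev(tail(files, n))   # same newest-first order as the loop
  urls <- paste0(baseUrl, latest)
  tables <- lapply(urls, reader)  # one data frame per quarterly file
  do.call(rbind, tables)          # bind all tables in a single step
}

# Offline demonstration with a stand-in reader that records which file was requested.
fakeReader <- function(u) data.frame(title = basename(u), counts_total = 1)
sampleFiles <- c("2016Q1.csv", "2016Q2.csv", "2016Q3.csv", "2016Q4.csv", "2017Q1.csv")
demo <- loadQuarters(sampleFiles, reader = fakeReader)
print(demo)

# Against the live bucket the call would be:
# out <- loadQuarters(csvFiles)
```

Like the loop, this relies on every quarterly file having the same columns, since `rbind` on data frames matches columns by name.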
#tidyDataTable<-gather(dataTable,)
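
The commented-out `gather()` call is unfinished. One way it might be completed is sketched below, assuming the goal is one row per title and skill, followed by a filter on a keyword list. The toy `dataTable` copies three rows from the preview and keeps only four skill columns; `skill_slot`, `skill`, `dsSkills`, and `ds_skill_count` are hypothetical names, and the keyword list is an assumption about what counts as a data science skill.

```r
library(dplyr)
library(tidyr)

# Toy rows copied from the preview table (four skill columns only).
dataTable <- tibble(
  title        = c("data scientist", "cloud solution architect", "school bus driver"),
  counts_total = c(1, 2, 2),
  skills_1     = c("skill", "redmine", "skill"),
  skills_2     = c("science", "c", "transportation"),
  skills_3     = c("git", "programming", "route"),
  skills_4     = c("python", "python", "")
)

# Assumed keyword list; adjust to the skills of interest.
dsSkills <- c("python", "r", "mathematics", "visualization")

# Wide to long: one row per (title, skill), dropping empty skill slots.
tidyDataTable <- dataTable %>%
  gather(key = "skill_slot", value = "skill", starts_with("skills_")) %>%
  filter(skill != "")

# Titles that list at least one of the chosen skills, with a count per title.
dsTitles <- tidyDataTable %>%
  filter(skill %in% dsSkills) %>%
  group_by(title) %>%
  summarise(ds_skill_count = n_distinct(skill)) %>%
  arrange(desc(ds_skill_count), title)

print(dsTitles)
```

On the full data the same pipeline would run on the `dataTable` built earlier; `pivot_longer()` is the newer tidyr equivalent of `gather()` and could be swapped in.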