Open Skills data science skills analysis

Use data from the Open Skills website to find job titles that list data science skills.
The dataset used from the site is cleaned_title_count.

Use a tidy workflow to produce the analysis:

Libraries

library(RCurl)
library(dplyr)
library(tidyr)
library(rvest)
library(stringr)
library(kableExtra)

Import

Get the list of CSV files to import from the Open Skills website

# List the bucket contents and keep the file names under cleaned_title_count/
csvFiles <- read_html("https://open-skills-datasets.s3-us-west-2.amazonaws.com/") %>%
  html_nodes("key") %>%
  html_text() %>%
  str_extract("cleaned_title_count/[:print:]+") %>%
  str_extract("[0-9]+Q[1-4]\\.csv")  # escape the dot so it matches a literal "."
# Drop keys that did not match the pattern
csvFiles <- csvFiles[!is.na(csvFiles)]
csvFiles %>%
  kable(align = 'c') %>%
  kable_styling(position = "center") %>%
  scroll_box(width = "300px", height = "400px")
x
2011Q1.csv
2011Q2.csv
2011Q3.csv
2011Q4.csv
2012Q1.csv
2012Q2.csv
2012Q3.csv
2012Q4.csv
2013Q1.csv
2013Q2.csv
2013Q3.csv
2013Q4.csv
2014Q1.csv
2014Q2.csv
2015Q1.csv
2016Q1.csv
2016Q2.csv
2016Q3.csv
2016Q4.csv
2017Q1.csv
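
The two str_extract() steps above can be checked offline on a handful of keys. This is a minimal sketch: the keys vector is made up for illustration and only imitates the shape of the keys in the bucket listing.

```r
library(stringr)

# Hypothetical keys, shaped like the entries in the S3 bucket listing
keys <- c("cleaned_title_count/2016Q4.csv",
          "cleaned_title_count/2017Q1.csv",
          "cleaned_title_count/",
          "some_other_dataset/2017Q1.csv")

# Step 1: keep only keys under cleaned_title_count/ (others become NA)
under_prefix <- str_extract(keys, "cleaned_title_count/[:print:]+")

# Step 2: pull out the quarterly file name, e.g. 2017Q1.csv
files <- str_extract(under_prefix, "[0-9]+Q[1-4]\\.csv")

# Drop the keys that did not match
files <- files[!is.na(files)]
files
```

Keys outside the cleaned_title_count/ prefix turn into NA in the first step, which is why the NA filter is needed before the file names are used to build URLs.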

Load the last 4 CSV files, which cover the most recent year of data

# Download the four most recent quarterly files and stack them row-wise
out <- NULL
for (i in 0:3) {
  url <- paste0("https://open-skills-datasets.s3-us-west-2.amazonaws.com/cleaned_title_count/",
                csvFiles[length(csvFiles) - i])
  x <- getURL(url)
  out <- rbind(out, read.csv(text = x))
}
# tbl_df() is deprecated in dplyr; as_tibble() is the current equivalent
dataTable <- as_tibble(out)
head(dataTable, n = 20) %>%
  kable() %>%
  kable_styling() %>%
  scroll_box(width = "910px", height = "400px")
title counts_total skills_1 skills_2 skills_3 skills_4 skills_5 skills_6 skills_7 skills_8 skills_9 skills_10 soc_code_common_1 soc_code_common_2 soc_code_common_total soc_code_top_1 soc_code_top_2 soc_code_top_total soc_code_given_1 soc_code_given_2 soc_code_given_total
1 39-9032.00 1 13-1071.00 1 35-2014.00 NA 1
rsaf language training elt instructors 2 c reduce english language 25-1123.00 1 27-3031.00 21-1093.00 2 NA 1
integrator 2 troubleshooting skill red hat enterprise linux linux design 15-1142.00 1 15-1122.00 15-1143.00 2 NA 1
correctional officer 1 transportation brakes 33-1012.00 1 29-2012.00 1 33-3012.00 NA 1
store all positions 2 time management 13-1199.00 51-9151.00 2 43-4161.00 43-4051.03 2 NA 1
head of office 2 writing forth 11-2022.00 11-3021.00 2 13-1151.00 11-3021.00 2 NA 1
security positions 1 rules closedcircuit tv cameras security cameras bulletproof vests surveillance cameras alarms 33-3051.01 1 33-3051.01 1 NA 1
sr manager event communications 1 speaking writing forth 11-2031.00 1 11-2011.00 1 11-2021.00 NA 1
assistant director of engineering and construction 1 ada mechanical levels impact 11-9041.00 1 17-2141.00 1 NA 1
scheduling coordinator 3 microsoft office 31-1014.00 29-2061.00 2 11-9111.00 41-4011.00 3 43-5061.00 NA 1
systems engineer fair 1 skill 17-2071.00 1 17-2141.00 1 NA 1
school bus driver 2 skill transportation route 25-2012.00 1 25-2052.00 25-9041.00 2 53-3022.00 NA 1
systems analyst 1 skill c simulation software mathematics mechanical physics science design 13-1111.00 1 15-1121.00 1 15-1121.00 NA 1
warehouser 1 trucks pallet jacks forklifts sweepers 33-3051.01 1 49-9071.00 1 49-9071.00 NA 1
cloud solution architect 2 redmine c programming python visualization javascript puppet apache tomcat stencils oracle 15-1132.00 15-1121.00 2 15-1132.00 15-1143.00 2 15-1199.02 NA 1
insemination crew worker 4 speaking self personal protective equipment 43-5061.00 27-2011.00 4 35-3021.00 37-2011.00 4 NA 1
product support sales representative 2 stamina 41-4012.00 1 41-4011.00 41-2031.00 2 NA 1
direct support professional team leader 1 29-1181.00 1 15-1122.00 1 NA 1
data scientist 1 skill science git python mathematics r derive 15-1134.00 1 17-2112.00 1 15-1111.00 NA 1
part bus driver commercial drivers license 1 33-3051.01 1 53-3022.00 1 NA 1
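
Once the four quarters are stacked, it is no longer visible which file a row came from. One possible variation, sketched here under assumptions, tags each row with its source file before combining. The helper read_quarter() and the column source_file are hypothetical names, and the inline CSV strings stand in for the text that getURL() would return.

```r
library(dplyr)

# Hypothetical helper: parse one CSV given as text and record its source file
read_quarter <- function(csv_text, file_name) {
  df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
  df$source_file <- file_name
  df
}

# Stand-in CSV text for two quarters (the real text would come from getURL())
quarters <- list(
  "2016Q4.csv" = "title,counts_total\ndata scientist,1\nsystems analyst,1",
  "2017Q1.csv" = "title,counts_total\nintegrator,2"
)

# Read each quarter, then combine all of them in one step
parts <- lapply(names(quarters), function(f) read_quarter(quarters[[f]], f))
combined <- bind_rows(parts)
combined
```

Combining once with bind_rows() also avoids growing the data frame inside the loop, which matters little for four files but scales better if more quarters are loaded.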

Tidy

#tidyDataTable<-gather(dataTable,)
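
The gather() call above is left unfinished. One way it might be completed, shown as a sketch rather than the intended implementation, is to collect the skills_1 to skills_10 columns into a single skill column so that each row holds one title and one skill. The tibble sample_titles is a small made-up stand-in for dataTable, and the ds_skills vector is an assumption about which skills count as data science skills.

```r
library(dplyr)
library(tidyr)

# Small stand-in that uses the same column naming as dataTable
sample_titles <- tibble(
  title        = c("data scientist", "school bus driver"),
  counts_total = c(1L, 2L),
  skills_1     = c("python", "transportation"),
  skills_2     = c("r", "route"),
  skills_3     = c("mathematics", NA)
)

# One row per title and skill; missing and empty skills are dropped
tidy_skills <- sample_titles %>%
  gather(key = "skill_rank", value = "skill",
         starts_with("skills_"), na.rm = TRUE) %>%
  filter(skill != "") %>%
  select(title, counts_total, skill_rank, skill)

# Assumed list of data science skills, for illustration only
ds_skills <- c("python", "r", "mathematics")

# Titles that mention at least one of those skills, with a count of matches
ds_titles <- tidy_skills %>%
  filter(skill %in% ds_skills) %>%
  count(title, name = "n_ds_skills")
ds_titles
```

With the long format in place, finding job titles with data science skills reduces to a filter on the skill column followed by a count per title.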