For this project, we research data scientists' career outlook, necessary skills and salaries. Data were collected from the Indeed, O*NET and Bureau of Labor Statistics websites by web scraping in R.
Hui (Gracie) Han and Jun Pan focused on analyzing 31 data-science-related jobs and the hard skills they require.
First, the technology-skill tables for 31 data-science-related jobs were downloaded from the O*NET website and saved in a GitHub repository. We load the CSV files for these occupations from GitHub:
#try(setwd("tech_skills/old"))
# O*NET-SOC codes of the 31 occupations whose technology-skill tables are stored in the repository
soc_codes <- c("11-3111-00", "13-1141-00", "13-1161-00", "13-2011-02", "13-2041-00",
               "13-2051-00", "13-2053-00", "13-2099-02", "15-1111-00", "15-1121-00",
               "15-1131-00", "15-1133-00", "15-1134-00", "15-1141-00", "15-2021-00",
               "15-2031-00", "15-2041-00", "15-2041-01", "15-2041-02", "19-2099-01",
               "19-3011-00", "19-3022-00", "19-4061-00", "25-1021-00", "27-4011-00",
               "27-4012-00", "43-9011-00", "49-2011-00", "25-9011-00", "15-1151-00",
               "15-2011-00")
base_url <- "https://raw.githubusercontent.com/simplymathematics/data-skills/master/tech_skills/old/"
# Read one CSV per occupation into a list of data frames, in the same order as the file list
skill_tables <- lapply(paste0(base_url, "technology_skills_", soc_codes, ".csv"), read.csv)
Set Working Environment
library(dplyr)    # bind_rows(), filter(), %>%
library(ggplot2)  # ggplot(), geom_bar()
Combine the skill tables of all 31 occupations into one data frame.
df <- bind_rows(skill_tables)
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
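These warnings appear to come from `read.csv()` turning text columns into factors (the default before R 4.0.0), which `bind_rows()` then has to coerce back to character. A minimal sketch of a more defensive reader, assuming local or remote CSV paths with the same columns as the O*NET tables; `read_skill_table()` is a hypothetical helper, not part of the original analysis, and it also skips files that fail to load instead of stopping the whole run:

```r
# Hypothetical helper, not part of the original analysis.
read_skill_table <- function(path) {
  tryCatch(
    # stringsAsFactors = FALSE keeps text columns as character, so there are no
    # factor levels to reconcile when binding (already the default in R >= 4.0.0).
    # suppressWarnings() also hides benign warnings, so use it with care.
    suppressWarnings(read.csv(path, stringsAsFactors = FALSE)),
    error = function(e) {
      message("Skipping ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Demo with one real temporary file and one path that does not exist
good_path <- tempfile(fileext = ".csv")
write.csv(data.frame(Section = "Technology",
                     Category = "Spreadsheet software",
                     Example = "Microsoft Excel"),
          good_path, row.names = FALSE)
bad_path <- file.path(tempdir(), "no_such_occupation.csv")

tables <- lapply(c(good_path, bad_path), read_skill_table)
tables <- Filter(Negate(is.null), tables)  # drop files that failed to load
combined <- do.call(rbind, tables)
str(combined)
```

With the real data, `lapply(urls, read_skill_table)` would take the place of `lapply(urls, read.csv)`.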
head (df)
## Section Category
## 1 Technology Accounting software
## 2 Technology Accounting software
## 3 Technology Analytical or scientific software
## 4 Technology Analytical or scientific software
## 5 Technology Data base reporting software
## 6 Technology Data base user interface and query software
## Example
## 1 Deltek Costpoint
## 2 Intuit QuickBooks
## 3 Business analysis software
## 4 Relex Weibull
## 5 AdRelevance
## 6 Microsoft Access
tail(df)
## Section Category
## 1879 Technology Object or component oriented development software
## 1880 Technology Object oriented data base management software
## 1881 Technology Office suite software
## 1882 Technology Presentation software
## 1883 Technology Spreadsheet software
## 1884 Technology Word processing software
## Example
## 1879 R
## 1880 Microsoft Visual FoxPro
## 1881 Microsoft Office
## 1882 Microsoft PowerPoint
## 1883 Microsoft Excel
## 1884 Microsoft Word
dim(df)
## [1] 1884 3
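The combined data frame no longer records which occupation each row came from, so a bar of height 19 could mean 19 occupations or fewer occupations listing a tool more than once. A minimal sketch, assuming the per-occupation tables are kept in a list named by SOC code: `bind_rows()` accepts such a list and its `.id` argument stores each element's name in a new column. The two tiny tables below reuse values from the `head()`/`tail()` output, but which occupation each row belongs to is made up for illustration:

```r
library(dplyr)

# Hypothetical stand-ins for two per-occupation tables; the row-to-occupation
# pairing is invented, only the column names and values mirror the output above
tables <- list(
  "11-3111-00" = data.frame(Section = "Technology",
                            Category = "Accounting software",
                            Example = "Intuit QuickBooks",
                            stringsAsFactors = FALSE),
  "15-2041-00" = data.frame(Section = "Technology",
                            Category = c("Object or component oriented development software",
                                         "Spreadsheet software"),
                            Example = c("R", "Microsoft Excel"),
                            stringsAsFactors = FALSE)
)

# .id adds a column holding each list element's name, so every row keeps its SOC code
combined <- bind_rows(tables, .id = "soc_code")
print(combined)

# Number of distinct occupations that list each tool
occupations_per_tool <- combined %>%
  distinct(soc_code, Example) %>%
  count(Example)
print(occupations_per_tool)
```

In the main analysis this would mean naming the list first, for example `names(skill_tables) <- soc_codes`, before binding.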
Our partner Ravi compiled a list of key skills for data scientists. Below, we check which of them appear in the combined data frame of 31 occupations.
skills_Ravi<- c("AWS", "Python","AI", "SQL", "R", "SAS", "Tableau", "AZURE", "SparkML", "Spark","Hadoop", "Machine Learning", "Shiny","Statistics","Probability")
After reviewing Ravi's key skills and the O*NET website, Gracie and Jun compiled a second set of skills as a backup for this study.
skills_Jun <- c("C", "C#", "Cassandra", "Django", "Hadoop", "Hive", "HTML", "Java", "MongoDB", "Matlab", "Python", "Pig", "SAS", "R", "Ruby", "SQL", "Statistics", "Tableau", "Teradata")
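Because both lists feed the same filter, how much they overlap largely decides how similar the two results can be. A small base-R sketch comparing them with `intersect()` and `setdiff()`; the vectors are reproduced here so the snippet runs on its own:

```r
# Skill vectors as defined above, reproduced so this snippet is self-contained
skills_Ravi <- c("AWS", "Python", "AI", "SQL", "R", "SAS", "Tableau", "AZURE",
                 "SparkML", "Spark", "Hadoop", "Machine Learning", "Shiny",
                 "Statistics", "Probability")
skills_Jun <- c("C", "C#", "Cassandra", "Django", "Hadoop", "Hive", "HTML", "Java",
                "MongoDB", "Matlab", "Python", "Pig", "SAS", "R", "Ruby", "SQL",
                "Statistics", "Tableau", "Teradata")
skills_Jun <- unique(skills_Jun)  # guard against entries typed twice

shared    <- intersect(skills_Ravi, skills_Jun)  # in both lists
jun_only  <- setdiff(skills_Jun, skills_Ravi)    # only in the backup list
ravi_only <- setdiff(skills_Ravi, skills_Jun)    # only in Ravi's list

print(shared)
print(jun_only)
print(ravi_only)
```

Seven skills (Python, SQL, R, SAS, Tableau, Hadoop, Statistics) are shared, which may help explain why the two lists lead to similar top skills.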
We use the filter() function from the dplyr package to keep only the rows of the combined data frame whose Example matches one of Ravi's key skills.
df_Ravi <- df %>% filter (Example %in% skills_Ravi)
## Warning: package 'bindrcpp' was built under R version 3.3.3
print(df_Ravi)
## Section Category Example
## 1 Technology Analytical or scientific software SAS
## 2 Technology Object or component oriented development software R
## 3 Technology Analytical or scientific software SAS
## 4 Technology Business intelligence and data analysis software Tableau
## 5 Technology Object or component oriented development software R
## 6 Technology Analytical or scientific software SAS
## 7 Technology Analytical or scientific software SAS
## 8 Technology Business intelligence and data analysis software Tableau
## 9 Technology Object or component oriented development software R
## 10 Technology Analytical or scientific software SAS
## 11 Technology Business intelligence and data analysis software Tableau
## 12 Technology Object or component oriented development software R
## 13 Technology Analytical or scientific software SAS
## 14 Technology Business intelligence and data analysis software Tableau
## 15 Technology Object or component oriented development software Python
## 16 Technology Object or component oriented development software R
## 17 Technology Analytical or scientific software SAS
## 18 Technology Object or component oriented development software Python
## 19 Technology Analytical or scientific software SAS
## 20 Technology Business intelligence and data analysis software Tableau
## 21 Technology Object or component oriented development software Python
## 22 Technology Analytical or scientific software SAS
## 23 Technology Object or component oriented development software Python
## 24 Technology Business intelligence and data analysis software Tableau
## 25 Technology Object or component oriented development software Python
## 26 Technology Analytical or scientific software SAS
## 27 Technology Business intelligence and data analysis software Tableau
## 28 Technology Object or component oriented development software Python
## 29 Technology Object or component oriented development software R
## 30 Technology Analytical or scientific software SAS
## 31 Technology Object or component oriented development software Python
## 32 Technology Object or component oriented development software R
## 33 Technology Analytical or scientific software SAS
## 34 Technology Business intelligence and data analysis software Tableau
## 35 Technology Object or component oriented development software Python
## 36 Technology Object or component oriented development software R
## 37 Technology Analytical or scientific software SAS
## 38 Technology Business intelligence and data analysis software Tableau
## 39 Technology Object or component oriented development software Python
## 40 Technology Object or component oriented development software R
## 41 Technology Analytical or scientific software SAS
## 42 Technology Object or component oriented development software Python
## 43 Technology Analytical or scientific software SAS
## 44 Technology Analytical or scientific software SAS
## 45 Technology Object or component oriented development software Python
## 46 Technology Analytical or scientific software SAS
## 47 Technology Business intelligence and data analysis software Tableau
## 48 Technology Object or component oriented development software Python
## 49 Technology Analytical or scientific software SAS
## 50 Technology Object or component oriented development software Python
## 51 Technology Object or component oriented development software Python
## 52 Technology Business intelligence and data analysis software Tableau
## 53 Technology Object or component oriented development software Python
## 54 Technology Analytical or scientific software SAS
## 55 Technology Object or component oriented development software R
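`%in%` only keeps rows whose `Example` equals a skill exactly, including case, so a skill such as "AZURE" or "Hadoop" would be missed if O*NET listed it inside a longer product name. A sketch of looser, case-insensitive whole-token matching with `grepl(perl = TRUE)`; the `examples` values are hypothetical toy data, not actual O*NET rows, and `matches_any_skill()` is an illustrative helper:

```r
# Hypothetical toy values standing in for df$Example; not actual O*NET rows
examples <- c("SAS", "Microsoft Azure", "Apache Hadoop", "Microsoft Word",
              "R", "Microsoft Visual C#")
skills <- c("AZURE", "Hadoop", "R", "C#")

# TRUE where an example contains any skill as a whole token, ignoring case.
# \Q...\E makes PCRE read the skill literally, so "C#" needs no escaping;
# the look-arounds stop "R" from matching the "r" inside "Microsoft Word".
matches_any_skill <- function(examples, skills) {
  hits <- lapply(skills, function(skill) {
    pattern <- paste0("(?<![A-Za-z0-9])\\Q", skill, "\\E(?![A-Za-z0-9])")
    grepl(pattern, examples, ignore.case = TRUE, perl = TRUE)
  })
  # Combine per-skill results with OR, starting from all FALSE
  Reduce(`|`, hits, logical(length(examples)))
}

exact <- examples[examples %in% skills]              # exact matching, as above
loose <- examples[matches_any_skill(examples, skills)]
print(exact)
print(loose)
```

Looser matching trades misses for false positives (a one-letter skill like "C" would also hit "C#" entries), so its matches would need a manual check.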
Data visualization with ggplot2. Only four of Ravi's skills occur in the O*NET data at all: SAS (19 rows), Python (15), Tableau (11) and R (10). According to Ravi's list, these are therefore the top four skills for data scientists.
pl <- ggplot(df_Ravi, aes(x = Example, color = Example, fill = Example)) + geom_bar()
print(pl)
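The bar chart can also be backed by an explicit frequency table. A base-R sketch that rebuilds the `Example` column of `df_Ravi` from the counts in the printed output above, tabulates it in decreasing order, and reorders the factor levels so a discrete ggplot2 axis would draw the bars from most to least frequent:

```r
# Example column of df_Ravi, rebuilt from the counts in the printed output above
df_Ravi_example <- data.frame(
  Example = rep(c("SAS", "Python", "Tableau", "R"), times = c(19, 15, 11, 10)),
  stringsAsFactors = FALSE
)

# Rows per skill, largest first
skill_counts <- sort(table(df_Ravi_example$Example), decreasing = TRUE)
print(skill_counts)

# Share of the matched rows that each skill accounts for
print(round(prop.table(skill_counts), 2))

# Discrete ggplot2 axes follow factor level order, so these levels
# would put the most frequent skill first in geom_bar()
df_Ravi_example$Example <- factor(df_Ravi_example$Example,
                                  levels = names(skill_counts))
```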
Similar findings were observed with Gracie and Jun's keywords: the most frequent skills for data scientists are SAS, Python, Tableau, R, C, Ruby and Django.
df_Jun <- df %>% filter (Example %in% skills_Jun)
pl_Jun <- ggplot(df_Jun, aes(x = Example, color = Example, fill = Example)) + geom_bar()
print(pl_Jun)
Techskills.graphic <- pl_Jun
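`geom_bar()` only draws values that are present, so skills from a list that never matched simply disappear from the chart. A base-R sketch, on hypothetical toy data, that uses the skill list as factor levels so zero counts stay visible in the table:

```r
# Hypothetical toy data standing in for df$Example and a skill list
example <- c("SAS", "Python", "Microsoft Excel", "R", "SAS", "Apache Hive")
skills  <- c("SAS", "Python", "R", "Ruby", "Hive")

# Keep exact matches, then tabulate against the full skill list;
# levels with no matches are reported as 0 instead of being dropped
matched <- example[example %in% skills]
skill_counts <- table(factor(matched, levels = unique(skills)))
print(skill_counts)
```

For the plot, mapping `x` to such a factor and adding `scale_x_discrete(drop = FALSE)` should keep the zero-count skills on the axis.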
These are very preliminary results from our analysis.
Techskill.frame <- df_Ravi
Techskills.graphic
# setwd('..')  # only needed if the commented-out setwd("tech_skills/old") call at the top is re-enabled