Data607 Project 3

Project 3 - Most Wanted Skills in Finance and Insurance?

library(tidyr)
library(dplyr)
library(ggplot2)

# Load the cleaned Project 3 job-skill data (all industries)
JobData <- as_tibble(read.csv("Project3_CleanData.csv", stringsAsFactors = FALSE, check.names = FALSE))
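
The `check.names = FALSE` argument matters because the pipelines in this report refer to columns as `` `Skill Set` `` and `` `Skill Type` ``. A minimal sketch, using a made-up two-line CSV passed through the `text` argument instead of the real file, shows what base R would otherwise do to a header containing a space:

```r
# Hypothetical inline CSV with a space in one column name
csv_text <- "Industry,Skill Set\nFinance,Soft Skill\nFinance,Business"

# Default check.names = TRUE runs make.names() on the header,
# so "Skill Set" becomes "Skill.Set"
default_names <- read.csv(text = csv_text, stringsAsFactors = FALSE)
names(default_names)

# check.names = FALSE keeps the header exactly as written
raw_names <- read.csv(text = csv_text, stringsAsFactors = FALSE, check.names = FALSE)
names(raw_names)
```

With the default, backtick references such as `` `Skill Set` `` in the `filter()` and `count()` calls would fail, because the column would be called `Skill.Set`.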

Which skill sets appear most frequently among the keywords?

FinanceAndInsurData_SkillSetCount <- JobData %>%
  filter(Industry == "Finance") %>%
  count(`Skill Set`, sort = TRUE)


# count() is a short-hand for group_by() + tally()

(FinanceAndInsurData_SkillSetCount)

## # A tibble: 5 x 2
##             `Skill Set`     n
##                   <chr> <int>
## 1 Programming/Technical    87
## 2            Soft Skill    17
## 3              Business    11
## 4     Analysis/Research     5
## 5           Mathematics     3
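
The comment in the chunk above notes that `count()` is shorthand for `group_by()` + `tally()`. A minimal sketch on a small hypothetical tibble (the rows are invented; only the `Skill Set` column name comes from the project data) builds the same table both ways:

```r
library(dplyr)

# Hypothetical toy rows: 3 Programming/Technical, 2 Soft Skill, 1 Business
toy <- tibble(`Skill Set` = rep(c("Programming/Technical", "Soft Skill", "Business"),
                                times = c(3, 2, 1)))

# Shorthand: count rows per skill set, most frequent first
via_count <- toy %>% count(`Skill Set`, sort = TRUE)

# Long form: group, tally rows per group, then drop the grouping
via_tally <- toy %>%
  group_by(`Skill Set`) %>%
  tally(sort = TRUE) %>%
  ungroup()

via_count
via_tally
```

Both versions give the same `Skill Set` and `n` columns in the same order, so `count()` is just the more compact spelling.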

As the counts above show, Programming/Technical keywords are by far the most frequent (87 of 123 mentions), followed by soft skills (17), while business-related keywords are much less frequent (11). This may imply that, for data scientist jobs in finance and insurance, business knowledge is not the major focus.
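
To put numbers on "much less frequent", a short sketch can turn the counts into shares. It re-enters the five counts printed above by hand rather than re-reading the CSV, so `skill_set_counts` is a hypothetical stand-in for `FinanceAndInsurData_SkillSetCount`:

```r
library(dplyr)

# Counts re-entered from the Finance skill-set table printed above
skill_set_counts <- tibble(
  `Skill Set` = c("Programming/Technical", "Soft Skill", "Business",
                  "Analysis/Research", "Mathematics"),
  n = c(87L, 17L, 11L, 5L, 3L)
)

# Each skill set's share of all keyword mentions, rounded to 3 decimals
skill_set_shares <- skill_set_counts %>%
  mutate(share = round(n / sum(n), 3))

skill_set_shares
```

With 123 mentions in total, Programming/Technical accounts for about 71% of them, Soft Skill about 14% and Business about 9%.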

So, which skill types are most frequent within the Programming/Technical and Soft Skill sets?

FinanceAndInsurData_Programming_tech <- JobData %>%
  filter(Industry == "Finance" & `Skill Set` == "Programming/Technical") %>%
  count(`Skill Type`, sort = TRUE)

(FinanceAndInsurData_Programming_tech)

## # A tibble: 54 x 2
##                     `Skill Type`     n
##                            <chr> <int>
##  1                        Python     7
##  2                          Java     5
##  3                             R     5
##  4                        Hadoop     4
##  5                         Scala     4
##  6                           C++     3
##  7                         Spark     3
##  8                           SQL     3
##  9 Build Machine Learning Models     2
## 10                  Data Mining      2
## # ... with 44 more rows

For programming and technical skills

Python (7 mentions), Java (5), R (5) and Hadoop (4, tied with Scala) are the most frequently mentioned and look like must-haves. Other skills such as machine learning and cloud technology are important too, but they may be platform-specific (e.g. Google Machine Learning).
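
Where to draw the "must have" line is a judgement call. One way to make it explicit, sketched below with the first eight rows of the Programming/Technical table above re-entered by hand, is a minimum-mentions cut-off. The value `min_mentions = 4` is an arbitrary assumption, not something taken from the data:

```r
library(dplyr)

# First eight rows of the Programming/Technical table printed above
prog_counts <- tibble(
  `Skill Type` = c("Python", "Java", "R", "Hadoop", "Scala", "C++", "Spark", "SQL"),
  n = c(7L, 5L, 5L, 4L, 4L, 3L, 3L, 3L)
)

# Hypothetical cut-off: a skill counts as "must have" if mentioned at least 4 times
min_mentions <- 4

must_have <- prog_counts %>%
  filter(n >= min_mentions) %>%
  pull(`Skill Type`)

must_have
```

With this cut-off Scala qualifies as well, since it ties Hadoop at 4 mentions; raising the cut-off to 5 would keep only Python, Java and R.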

FinanceAndInsurData_SoftSkills <- JobData %>%
  filter(Industry == "Finance" & `Skill Set` == "Soft Skill") %>%
  count(`Skill Type`, sort = TRUE)

(FinanceAndInsurData_SoftSkills)

## # A tibble: 12 x 2
##           `Skill Type`     n
##                  <chr> <int>
##  1  Good Communication     5
##  2       Collaboration     2
##  3               Agile     1
##  4 Attention To Detail     1
##  5           Curiosity     1
##  6 Deadline Management     1
##  7            Friendly     1
##  8     Good Work Ethic     1
##  9            Positive     1
## 10        Quantitative     1
## 11        Self-Starter     1
## 12  Work Independantly     1

For soft skills

Good communication (5 mentions) and collaboration (2) are the most frequently requested soft skills, while personal traits such as being friendly, positive, curious and attentive to detail are also mentioned (once each).
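
The Programming/Technical and Soft Skill tables come from the same pipeline with a different `Skill Set` filter. As a sketch (not the approach used in this report), both can be built in one pass by counting on both columns and keeping the top rows per skill set. The `toy_jobs` rows are hypothetical, and `slice_max()` needs dplyr 1.0.0 or later; with older dplyr versions, `top_n(2, n)` plays the same role:

```r
library(dplyr)

# Hypothetical toy rows standing in for the Finance subset of JobData
toy_jobs <- tibble(
  Industry = "Finance",
  `Skill Set` = rep(c("Programming/Technical", "Soft Skill"), times = c(6, 4)),
  `Skill Type` = c(rep("Python", 3), rep("Java", 2), "SQL",
                   rep("Good Communication", 3), "Collaboration")
)

top_by_set <- toy_jobs %>%
  filter(Industry == "Finance",
         `Skill Set` %in% c("Programming/Technical", "Soft Skill")) %>%
  # One row per (skill set, skill type) pair with its frequency
  count(`Skill Set`, `Skill Type`) %>%
  # Keep the 2 most frequent skill types within each skill set (ties kept)
  group_by(`Skill Set`) %>%
  slice_max(n, n = 2, with_ties = TRUE) %>%
  ungroup()

top_by_set
```

This avoids repeating the filter-and-count block once per skill set, at the cost of a slightly less obvious pipeline.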

# Bar chart of soft-skill keyword counts; columns are referenced by name inside aes()
# and bars are ordered from most to least frequent
ggplot(FinanceAndInsurData_SoftSkills, aes(x = reorder(`Skill Type`, -n), y = n, fill = n)) +
  geom_col() +
  xlab("Skills") +
  ylab("Freq.") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 65, hjust = 1)) +
  ggtitle("Soft Skills of Data Scientists in Finance/Insurance")

Conclusion

So for the finance and insurance industry, hard skills such as Python, Java, R and Hadoop are must-haves, while the abilities to communicate and to work together are the most important soft skills.


Yuen Chun Wong

October 22, 2017
