Introduction

This data was provided through a collaboration between LinkedIn and the World Bank Group. The “Skill Penetration” metric measures how many skills from each of LinkedIn’s skill groups appear among the top 30 skills for each occupation in an industry. The penetration rates are then averaged across occupations to derive the reported industry averages. The study covers more than 100 countries and spans 2015-2019.
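
As a rough illustration of the metric, the sketch below computes a penetration rate for one hypothetical industry. It assumes that an occupation's rate is the share of its top 30 skills that fall in the skill group, which is one reading of the description above rather than the published methodology. The occupation names and counts are made up for the example.

```r
# Hypothetical counts: how many of each occupation's top 30 skills
# belong to the "Data Science" skill group (illustrative values only).
ds_skills_in_top30 <- c(engineer = 3, analyst = 6, technician = 0)

# Assumed definition: share of the top 30 skills that fall in the group.
occupation_rate <- ds_skills_in_top30 / 30

# Industry figure: average of the occupation-level rates.
industry_rate <- mean(occupation_rate)
industry_rate
```

Under this reading, an occupation with no data science skills in its top 30 still counts toward the average, which pulls the industry figure down.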

This report focuses on the “Data Science” skill group, which belongs to the “Disruptive Tech Skills” category. A further analysis examines whether the penetration of this skill in the Aviation & Aerospace industry has increased over time as the data science discipline has become more widespread.

(Data Source)

Data Preparation

Loading the Dataset

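library(tidyverse)  # readr (read_csv), dplyr (%>%, select, filter), ggplot2
library(ggthemes)   # theme_tufte()

skill_data <- read_csv("linkedin_skill_penetration.csv")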
## Rows: 30740 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): skill_group_category, skill_group_name, isic_section_index, isic_se...
## dbl (2): year, skill_group_penetration_rate
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(skill_data)
## # A tibble: 6 × 7
##    year skill_group_category skill_group_name isic_section_index
##   <dbl> <chr>                <chr>            <chr>             
## 1  2015 Business Skills      Accounts Payable M                 
## 2  2015 Business Skills      Accounts Payable M                 
## 3  2015 Business Skills      Accounts Payable M                 
## 4  2015 Business Skills      Accounts Payable C                 
## 5  2015 Business Skills      Accounts Payable B                 
## 6  2015 Business Skills      Accounts Payable C                 
## # ℹ 3 more variables: isic_section_name <chr>, industry_name <chr>,
## #   skill_group_penetration_rate <dbl>

Subsetting the data and renaming columns

#subsetting data
subset <- skill_data %>%
  select(year, skill_group_category, skill_group_name, isic_section_name, industry_name, skill_group_penetration_rate)
names(subset)
## [1] "year"                         "skill_group_category"        
## [3] "skill_group_name"             "isic_section_name"           
## [5] "industry_name"                "skill_group_penetration_rate"
#column renaming
subset <- rename(
  subset,
  skill_category = skill_group_category,
  skill_group = skill_group_name,
  sector = isic_section_name,
  industry = industry_name,
  skill_penetration_rate = skill_group_penetration_rate
)
names(subset)
## [1] "year"                   "skill_category"         "skill_group"           
## [4] "sector"                 "industry"               "skill_penetration_rate"

Removing Missing Values

#remove NA values
subset <- subset %>%
  filter(complete.cases(subset))
head(subset)
## # A tibble: 6 × 6
##    year skill_category  skill_group      sector  industry skill_penetration_rate
##   <dbl> <chr>           <chr>            <chr>   <chr>                     <dbl>
## 1  2015 Business Skills Accounts Payable Profes… Account…                0.00719
## 2  2015 Business Skills Accounts Payable Profes… Law Pra…                0.00244
## 3  2015 Business Skills Accounts Payable Profes… Executi…                0.00222
## 4  2015 Business Skills Accounts Payable Manufa… Packagi…                0.00132
## 5  2015 Business Skills Accounts Payable Mining… Oil & E…                0.00132
## 6  2015 Business Skills Accounts Payable Manufa… Printing                0.00128
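
To make the filtering step concrete, here is a minimal base R sketch of what `complete.cases()` does on a small made-up data frame. The `toy`, `keep`, and `toy_clean` names and the values are illustrative only and are not taken from the dataset.

```r
# Toy data frame with one missing rate (values are illustrative only).
toy <- data.frame(
  year = c(2015, 2016, 2017),
  skill_penetration_rate = c(0.001, NA, 0.003)
)

# complete.cases() returns TRUE for rows with no NA in any column.
keep <- complete.cases(toy)
toy_clean <- toy[keep, ]

# Number of rows dropped because of missing values.
nrow(toy) - nrow(toy_clean)
```

Comparing row counts before and after is a quick way to see how much data the filter removes.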

Data Exploration

Data Science Skill Penetration by Sector

On average over 2015-2019, the penetration rate for data science skills was highest in the Financial and insurance activities sector (0.00958) and lowest in the Arts, entertainment and recreation sector (0.00210).

subset %>% 
  filter(skill_group == "Data Science") %>%
  group_by(sector) %>%
  # na.rm belongs inside mean(); outside it, summarize() creates a column named na.rm
  summarize(average_penetration_rate = mean(skill_penetration_rate, na.rm = TRUE))
## # A tibble: 6 × 2
##   sector                                           average_penetration_rate
##   <chr>                                                               <dbl>
## 1 Arts, entertainment and recreation                                0.00210
## 2 Financial and insurance activities                                0.00958
## 3 Information and communication                                     0.00644
## 4 Manufacturing                                                     0.00262
## 5 Mining and quarrying                                              0.00325
## 6 Professional scientific and technical activities                  0.00638

Data Science Skill Penetration within Manufacturing sector

On average over 2015-2019, within the Manufacturing sector the penetration rate of data science skills was highest in the Pharmaceuticals industry (0.00728) and lowest in the Printing (0.000476) and Plastics (0.00075) industries.

data_science <- subset %>%
  filter(skill_group == "Data Science") %>%
  filter(sector == "Manufacturing")

data_science %>%
  group_by(industry) %>%
  # na.rm belongs inside mean(); outside it, summarize() creates a column named na.rm
  summarize(average_penetration_rate = mean(skill_penetration_rate, na.rm = TRUE))
## # A tibble: 14 × 2
##    industry                              average_penetration_rate
##    <chr>                                                    <dbl>
##  1 Automotive                                            0.00392 
##  2 Aviation & Aerospace                                  0.00358 
##  3 Chemicals                                             0.00294 
##  4 Electrical & Electronic Manufacturing                 0.00343 
##  5 Food Production                                       0.0023  
##  6 Industrial Automation                                 0.00139 
##  7 Machinery                                             0.00151 
##  8 Packaging & Containers                                0.00139 
##  9 Paper & Forest Products                               0.00133 
## 10 Pharmaceuticals                                       0.00728 
## 11 Plastics                                              0.00075 
## 12 Printing                                              0.000476
## 13 Renewables & Environment                              0.00313 
## 14 Textiles                                              0.00332 
# Boxplots show the spread of yearly rates per industry, so the x axis is the rate itself
ds_by_industry <- ggplot(data_science, aes(y = industry, x = skill_penetration_rate)) +
  geom_boxplot(color = "purple") +
  labs(title = "Data Science Skill Penetration Rate in \n Manufacturing Sector",
       x = "Penetration Rate", y = "Industries",
       caption = "Fatima Choudhury | Data Source (World Bank | LinkedIn)")

ds_by_industry + theme_tufte()
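
Sorting the industry averages makes the extremes easier to read off than scanning the alphabetical table. The following is a minimal base R sketch, with the values retyped from the summary output in this section. The `avg_rate` and `ranked` names are introduced here for illustration.

```r
# Industry averages retyped from the Manufacturing summary table.
avg_rate <- c(
  "Automotive" = 0.00392,
  "Aviation & Aerospace" = 0.00358,
  "Chemicals" = 0.00294,
  "Electrical & Electronic Manufacturing" = 0.00343,
  "Food Production" = 0.0023,
  "Industrial Automation" = 0.00139,
  "Machinery" = 0.00151,
  "Packaging & Containers" = 0.00139,
  "Paper & Forest Products" = 0.00133,
  "Pharmaceuticals" = 0.00728,
  "Plastics" = 0.00075,
  "Printing" = 0.000476,
  "Renewables & Environment" = 0.00313,
  "Textiles" = 0.00332
)

# Rank industries from highest to lowest average penetration rate.
ranked <- sort(avg_rate, decreasing = TRUE)

head(ranked, 3)  # three highest industries
tail(ranked, 3)  # three lowest industries
```

In the full pipeline, the same ordering could be obtained by piping the summarized tibble into `arrange(desc(average_penetration_rate))`.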

Data Science Skill Penetration in Aerospace and Aviation: 2015-2019

The Data Science penetration rate in the Aviation & Aerospace industry rose every year, from 0.00107 in 2015 to 0.00728 in 2019, a nearly sevenfold increase. The largest absolute jump came between 2018 (0.00419) and 2019 (0.00728), a rise of about 74%.

ds_aerospace <- data_science %>%
  filter(industry == "Aviation & Aerospace") %>%
  select(year, skill_penetration_rate)

ds_aerospace
## # A tibble: 5 × 2
##    year skill_penetration_rate
##   <dbl>                  <dbl>
## 1  2015                0.00107
## 2  2016                0.00153
## 3  2017                0.00383
## 4  2018                0.00419
## 5  2019                0.00728
ds_aerospace_plot <- ggplot(ds_aerospace, aes(year, skill_penetration_rate)) +
  geom_point(color = "purple") +
  labs(title = "Data Science Skill Penetration Rate in \n Aerospace and Aviation: 2015-2019",
       x = "Year", y = "Average Penetration Rate",
       caption = "Fatima Choudhury | Data Source (World Bank | LinkedIn)")

ds_aerospace_plot + theme_tufte()
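
The trend can also be quantified directly. The sketch below rebuilds the five yearly values from the table in this section so that it runs on its own, then computes year-over-year relative changes and a simple least-squares slope. With only five points the fitted slope is a descriptive summary, not a statistical test. The `rate`, `yoy_change`, and `fit` names are introduced here for illustration.

```r
# Yearly rates retyped from the ds_aerospace table in this section.
ds_aerospace <- data.frame(
  year = 2015:2019,
  skill_penetration_rate = c(0.00107, 0.00153, 0.00383, 0.00419, 0.00728)
)

rate <- ds_aerospace$skill_penetration_rate

# Relative change from each year to the next, labeled by the later year.
yoy_change <- diff(rate) / head(rate, -1)
names(yoy_change) <- ds_aerospace$year[-1]
round(100 * yoy_change, 1)  # percent change per year

# Least-squares slope: average change in the rate per year.
fit <- lm(skill_penetration_rate ~ year, data = ds_aerospace)
coef(fit)["year"]
```

A positive slope together with positive changes in every year supports the description of a steady upward trend over 2015-2019.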