This data comes from a collaboration between LinkedIn and the World Bank Group. The “Skill Penetration” metric measures how many skills from each of LinkedIn’s skill groups appear among the top 30 skills for each occupation in an industry. The penetration rates are then averaged across occupations to derive the reported industry averages. The study covers more than 100 countries and spans 2015 to 2019.
This report focuses on the “Data Science” skill group, which belongs to the “Disruptive Tech Skills” skill group category. A further analysis examines whether the penetration of this skill group in the Aviation & Aerospace industry has increased over time as the data science discipline becomes more widespread.
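The penetration metric described above can be illustrated with a small sketch. The occupations and skill assignments below are invented toy data, and the two-step calculation (share of top-30 skills per occupation, then a plain average across occupations) is an assumption based on the description in this report, not LinkedIn's published implementation.

```r
library(dplyr)

# Hypothetical toy data: one row per (occupation, top-30 skill slot).
# Occupation names and skill-group assignments are invented for illustration.
top_skills <- data.frame(
  occupation  = rep(c("Engineer", "Analyst"), each = 30),
  skill_group = c(
    rep("Data Science", 3), rep("Other", 27),  # Engineer: 3 of 30 skills
    rep("Data Science", 6), rep("Other", 24)   # Analyst: 6 of 30 skills
  )
)

# Step 1: share of each occupation's top-30 skills that belong to the group
per_occupation <- top_skills %>%
  group_by(occupation) %>%
  summarize(penetration = mean(skill_group == "Data Science"))

# Step 2: average the occupation-level shares to get the industry figure
industry_penetration <- mean(per_occupation$penetration)
industry_penetration
```

In this toy case the two occupations have shares of 3/30 and 6/30, so the industry-level figure is their simple average.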
library(tidyverse) # read_csv(), dplyr verbs, ggplot2
library(ggthemes) # theme_tufte()
skill_data <- read_csv("linkedin_skill_penetration.csv")
## Rows: 30740 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): skill_group_category, skill_group_name, isic_section_index, isic_se...
## dbl (2): year, skill_group_penetration_rate
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(skill_data)
## # A tibble: 6 × 7
## year skill_group_category skill_group_name isic_section_index
## <dbl> <chr> <chr> <chr>
## 1 2015 Business Skills Accounts Payable M
## 2 2015 Business Skills Accounts Payable M
## 3 2015 Business Skills Accounts Payable M
## 4 2015 Business Skills Accounts Payable C
## 5 2015 Business Skills Accounts Payable B
## 6 2015 Business Skills Accounts Payable C
## # ℹ 3 more variables: isic_section_name <chr>, industry_name <chr>,
## # skill_group_penetration_rate <dbl>
# Subset to the columns used in the analysis
subset <- skill_data %>%
  select(year, skill_group_category, skill_group_name, isic_section_name,
         industry_name, skill_group_penetration_rate)
names(subset)
## [1] "year" "skill_group_category"
## [3] "skill_group_name" "isic_section_name"
## [5] "industry_name" "skill_group_penetration_rate"
# Rename columns to shorter names (new = old)
subset <- rename(subset,
  "skill_category" = "skill_group_category",
  "skill_group" = "skill_group_name",
  "sector" = "isic_section_name",
  "industry" = "industry_name",
  "skill_penetration_rate" = "skill_group_penetration_rate")
names(subset)
## [1] "year" "skill_category" "skill_group"
## [4] "sector" "industry" "skill_penetration_rate"
# Remove rows containing NA values
subset <- subset %>%
  filter(complete.cases(subset))
head(subset)
## # A tibble: 6 × 6
## year skill_category skill_group sector industry skill_penetration_rate
## <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 2015 Business Skills Accounts Payable Profes… Account… 0.00719
## 2 2015 Business Skills Accounts Payable Profes… Law Pra… 0.00244
## 3 2015 Business Skills Accounts Payable Profes… Executi… 0.00222
## 4 2015 Business Skills Accounts Payable Manufa… Packagi… 0.00132
## 5 2015 Business Skills Accounts Payable Mining… Oil & E… 0.00132
## 6 2015 Business Skills Accounts Payable Manufa… Printing 0.00128
On average from 2015 to 2019, the penetration rate for data science skills was highest in the Financial and insurance activities sector and lowest in the Arts, entertainment and recreation sector.
subset %>%
  filter(skill_group == "Data Science") %>%
  group_by(sector) %>%
  # na.rm belongs inside mean(); outside it, summarize() creates a column named na.rm
  summarize(average_penetration_rate = mean(skill_penetration_rate, na.rm = TRUE))
## # A tibble: 6 × 2
## sector average_penetration_rate
## <chr> <dbl>
## 1 Arts, entertainment and recreation 0.00210
## 2 Financial and insurance activities 0.00958
## 3 Information and communication 0.00644
## 4 Manufacturing 0.00262
## 5 Mining and quarrying 0.00325
## 6 Professional scientific and technical activities 0.00638
On average from 2015 to 2019, within the Manufacturing sector, the penetration rate of data science skills was highest in the Pharmaceuticals industry and lowest in the Printing industry, followed by Plastics.
data_science <- subset %>%
  filter(skill_group == "Data Science", sector == "Manufacturing")
data_science %>%
  group_by(industry) %>%
  # na.rm belongs inside mean(); outside it, summarize() creates a column named na.rm
  summarize(average_penetration_rate = mean(skill_penetration_rate, na.rm = TRUE))
## # A tibble: 14 × 2
## industry average_penetration_rate
## <chr> <dbl>
## 1 Automotive 0.00392
## 2 Aviation & Aerospace 0.00358
## 3 Chemicals 0.00294
## 4 Electrical & Electronic Manufacturing 0.00343
## 5 Food Production 0.0023
## 6 Industrial Automation 0.00139
## 7 Machinery 0.00151
## 8 Packaging & Containers 0.00139
## 9 Paper & Forest Products 0.00133
## 10 Pharmaceuticals 0.00728
## 11 Plastics 0.00075
## 12 Printing 0.000476
## 13 Renewables & Environment 0.00313
## 14 Textiles 0.00332
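Because the summary is ordered alphabetically, the extremes are easy to misread. A minimal sketch of ranking the industries by their average rate follows. The values are copied by hand from the table above, and `industry_avg` and `ranked` are illustrative names that do not appear in the original analysis.

```r
library(dplyr)

# Industry averages copied from the summary table above
industry_avg <- data.frame(
  industry = c("Automotive", "Aviation & Aerospace", "Chemicals",
               "Electrical & Electronic Manufacturing", "Food Production",
               "Industrial Automation", "Machinery", "Packaging & Containers",
               "Paper & Forest Products", "Pharmaceuticals", "Plastics",
               "Printing", "Renewables & Environment", "Textiles"),
  average_penetration_rate = c(0.00392, 0.00358, 0.00294, 0.00343, 0.0023,
                               0.00139, 0.00151, 0.00139, 0.00133, 0.00728,
                               0.00075, 0.000476, 0.00313, 0.00332)
)

# Sort from highest to lowest average penetration rate
ranked <- industry_avg %>%
  arrange(desc(average_penetration_rate))

head(ranked, 3)  # three highest industries
tail(ranked, 3)  # three lowest industries
```

In the full pipeline, the same effect could be had by appending `arrange(desc(average_penetration_rate))` to the `summarize()` step rather than retyping the values.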
ds_by_industry <- ggplot(data_science, aes(y = industry, x = skill_penetration_rate)) +
  geom_boxplot(color = "purple") +
  labs(title = "Data Science Skill Penetration Rate in \n Manufacturing Sector",
       x = "Average Penetration Rate",
       y = "Industries",
       caption = "Fatima Choudhury | Data Source (World Bank | LinkedIn)")
ds_by_industry + theme_tufte()
The data science penetration rate in the Aviation & Aerospace industry rose every year from 2015 to 2019. The largest absolute increase came between 2018 (0.00419) and 2019 (0.00728), a rise of roughly 74%, and the 2019 rate was nearly seven times the 2015 rate (0.00107).
ds_aerospace <- data_science %>%
  filter(industry == "Aviation & Aerospace") %>%
  select(year, skill_penetration_rate)
ds_aerospace
## # A tibble: 5 × 2
## year skill_penetration_rate
## <dbl> <dbl>
## 1 2015 0.00107
## 2 2016 0.00153
## 3 2017 0.00383
## 4 2018 0.00419
## 5 2019 0.00728
ds_aerospace_plot <- ggplot(ds_aerospace, aes(year, skill_penetration_rate)) +
  geom_point(color = "purple") +
  labs(title = "Data Science Skill Penetration Rate in \n Aviation & Aerospace: 2015-2019",
       x = "Year",
       y = "Average Penetration Rate",
       caption = "Fatima Choudhury | Data Source (World Bank | LinkedIn)")
ds_aerospace_plot + theme_tufte()
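The trend in the Aviation & Aerospace figures can also be put into numbers. The sketch below uses base R only and retypes the five yearly values from the table above. The names `yoy_change`, `growth_factor`, and `slope_per_year` are illustrative and not part of the original analysis. With only five observations, the linear fit is a descriptive summary and should not be read as a statistical test.

```r
# Yearly values copied from the ds_aerospace table above
ds_aerospace <- data.frame(
  year = 2015:2019,
  skill_penetration_rate = c(0.00107, 0.00153, 0.00383, 0.00419, 0.00728)
)

rates <- ds_aerospace$skill_penetration_rate

# Year-over-year percentage change (the first year has no predecessor)
yoy_change <- c(NA, diff(rates) / head(rates, -1)) * 100
data.frame(year = ds_aerospace$year, yoy_change = round(yoy_change, 1))

# Overall growth factor from the first year to the last
growth_factor <- rates[length(rates)] / rates[1]
growth_factor

# Slope of a simple linear fit: average change in the rate per year
trend <- lm(skill_penetration_rate ~ year, data = ds_aerospace)
slope_per_year <- unname(coef(trend)["year"])
slope_per_year
```

If a fitted line is wanted on the scatter plot itself, adding `geom_smooth(method = "lm", se = FALSE)` to the ggplot call is one option.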