Data Source: “Talent Migration Data” by World Bank Group & LinkedIn Corporation

Questions:

For the purpose of this report I am displaying the setup code below. In my setup code I am:

knitr::opts_chunk$set(echo = TRUE)
library(readr)
library(dplyr)
library(tidyverse)
industry <- read_csv("industry_migration_public.csv")

Step 1: I am going to format my data to make it usable.

First I am going to select only a subset of variables that I am interested in including in my analysis.

To do this I am going to start off by looking at all the variables in the industries data set:

print(names(industry))
##  [1] "country_code"       "country_name"       "wb_income"         
##  [4] "wb_region"          "isic_section_index" "isic_section_name" 
##  [7] "industry_id"        "industry_name"      "net_per_10K_2015"  
## [10] "net_per_10K_2016"   "net_per_10K_2017"   "net_per_10K_2018"  
## [13] "net_per_10K_2019"   "...14"              "...15"             
## [16] "...16"              "...17"              "...18"             
## [19] "...19"              "...20"              "...21"             
## [22] "...22"              "...23"              "...24"             
## [25] "...25"              "...26"              "...27"             
## [28] "...28"              "...29"              "...30"

I am going to create a new data frame ‘industry_analysis’ in which I only select the variables that I want to include in my analysis

industry_analysis <- data.frame(industry$country_name, industry$wb_region,industry$isic_section_name,industry$industry_name, industry$net_per_10K_2015, industry$net_per_10K_2016, industry$net_per_10K_2017, industry$net_per_10K_2018, industry$net_per_10K_2019)

Next I am going to rename the variables.

names(industry_analysis) <- c("country_name","region","sector_name","industry_name","net2015","net2016","net2017","net2018","net2019")

Finally I am going to add a variable to my new data frame in which I calculate the total net migration for periods 2015 to 2019. I will call this variable ‘net_total’

industry_analysis$net_total <- industry_analysis$net2015 + industry_analysis$net2016 + industry_analysis$net2017 + industry_analysis$net2018 + industry_analysis$net2019

Now I am going to look at the structure of my data set by displaying the first 10 rows of data in a table.

knitr::kable(
  head(industry_analysis, n=10),
  label = "First 10 rows of data in my industries analysis data frame"
)
country_name region sector_name industry_name net2015 net2016 net2017 net2018 net2019 net_total
United Arab Emirates Middle East & North Africa Manufacturing Defense & Space 378.74 127.94 8.20 68.51 49.55 632.94
United Arab Emirates Middle East & North Africa Information and communication Computer Hardware 100.97 358.14 112.98 149.57 182.22 903.88
United Arab Emirates Middle East & North Africa Information and communication Computer Software 1079.36 848.15 596.48 409.18 407.41 3340.58
United Arab Emirates Middle East & North Africa Information and communication Computer Networking 401.46 447.39 163.99 236.69 188.07 1437.60
United Arab Emirates Middle East & North Africa Information and communication Internet 1840.33 1368.42 877.71 852.39 519.40 5458.25
United Arab Emirates Middle East & North Africa Information and communication Telecommunications 676.91 676.11 365.96 283.59 281.91 2284.48
United Arab Emirates Middle East & North Africa Professional scientific and technical activities Law Practice 694.50 529.50 362.83 356.56 147.83 2091.22
United Arab Emirates Middle East & North Africa Professional scientific and technical activities Legal Services 661.10 665.65 462.30 475.62 255.21 2519.88
United Arab Emirates Middle East & North Africa Professional scientific and technical activities Management Consulting 982.85 694.25 520.28 574.60 406.03 3178.01
United Arab Emirates Middle East & North Africa Professional scientific and technical activities Biotechnology 1230.17 846.28 500.77 205.78 513.23 3296.23

Step 2: Data Analysis

Now that I have the data in a format that is usable I can start with my data analysis to answer my questions.

Which industries experienced the biggest net migration in talent for the periods from 2015 to 2019?

industry_rank <- industry_analysis %>% group_by(industry_name) %>% summarise(sum_net_total = sum(net_total,na.rm = TRUE), .groups = "drop")

industry_rank_ordered <- industry_rank[with(industry_rank, order(-industry_rank$sum_net_total)),]

knitr::kable(
  head(industry_rank_ordered,n=5),
  label = "Top 5 industries with highest net migration from 2015 - 2019"
)
industry_name sum_net_total
Internet 46852.93
International Trade & Development 30233.38
Renewables & Environment 29982.72
Outsourcing/Offshoring 29527.62
E-Learning 29067.23

Which region experienced the biggest net increase in talent migration in the Information and communication sector for the periods from 2015 to 2019?

industry_region_df <- subset(industry_analysis,sector_name == "Information and communication")

region_it_rank <- industry_region_df %>% group_by(region, sector_name) %>% summarise (sum_net_total = sum(net_total, na.rm = TRUE), .groups = "drop")

region_it_rank_ordered <- region_it_rank[with(region_it_rank, order(-region_it_rank$sum_net_total)),]

knitr::kable(
  region_it_rank_ordered,
  label = "Regions with most to least talent migration in Information & Communications Sector"
)
region sector_name sum_net_total
Europe & Central Asia Information and communication 105249.34
East Asia & Pacific Information and communication 56354.44
Middle East & North Africa Information and communication 27855.85
North America Information and communication 17922.41
Sub-Saharan Africa Information and communication 3454.66
South Asia Information and communication -20810.75
Latin America & Caribbean Information and communication -28691.58

As an additional step I also want to know how many countries are included in each region and I want to add that count as a variable to my results.

region_country <- as.data.frame(unique(industry_region_df[,1:2]))
region_country_count <- region_country %>% group_by(region) %>% count(region)
result_df <- merge(region_it_rank_ordered,region_country_count,by="region")
result_df <- result_df[with(result_df,order(-result_df$sum_net_total)),]
names(result_df)[4] <- "num_countries"
knitr::kable(
  result_df,
  label = "Regions (including country count) with most to least talent migration in Information & Communications"
)
region sector_name sum_net_total num_countries
2 Europe & Central Asia Information and communication 105249.34 42
1 East Asia & Pacific Information and communication 56354.44 15
4 Middle East & North Africa Information and communication 27855.85 18
5 North America Information and communication 17922.41 2
7 Sub-Saharan Africa Information and communication 3454.66 18
6 South Asia Information and communication -20810.75 6
3 Latin America & Caribbean Information and communication -28691.58 18

Step 3: Charting

Lastly I want to chart the top 20 and bottom 20 industries to visually display the results:

t20_industries <- head(industry_rank_ordered,n=20)
t20_industries %>% mutate(industry_name = fct_reorder(industry_name,sum_net_total)) %>%
ggplot( aes(industry_name,sum_net_total)) + geom_bar(stat="identity", fill="#44AD50") + coord_flip() + xlab("Industry") + ylab("Net Movement (per 50k) 2015 - 2019")

b20_industries <- tail(industry_rank_ordered,n=20)
b20_industries %>% mutate(industry_name = fct_reorder(industry_name,sum_net_total)) %>%
ggplot( aes(industry_name,sum_net_total)) + geom_bar(stat="identity", fill="#446EAD") + coord_flip() + xlab("Industry") + ylab("Net Movement (per 50k) 2015 - 2019")