Week 3 Challenge

For my data, I chose to focus on the inflows and outflows of talent within countries’ labor forces. An inflow refers to people immigrating into a specified country to work, while an outflow refers to people leaving that country to work elsewhere. These movements were measured using LinkedIn’s location-based statistics.
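To make the ratio easier to read, here is a minimal sketch of how such a figure could be interpreted. It assumes the ratio is simply inflows divided by outflows, as the column name inflow_outflow_ratio suggests; the counts below are hypothetical and do not come from the dataset.

```r
# Hypothetical counts for illustration only; not taken from the dataset.
inflow  <- 250   # workers who moved into the country
outflow <- 100   # workers who moved out of the country

# Assumed definition: ratio = inflow / outflow
ratio <- inflow / outflow
print(ratio)

# Above 1: more arrivals than departures. Below 1: more departures than arrivals.
print(ratio > 1)
```

Under this reading, a ratio of 0.5 would mean one arrival for every two departures.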

Question: Which country had the highest inflow/outflow ratio? Which country had the lowest?

First, we will import the dataset and look at its structure.

library(readr)
inoutflow <- read_csv("inoutflow.csv")
## Rows: 284 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): country_region
## dbl (3): year, inflow_outflow_ratio, yoy_change
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(inoutflow)
## # A tibble: 6 × 4
##    year country_region inflow_outflow_ratio yoy_change
##   <dbl> <chr>                         <dbl>      <dbl>
## 1  2019 Algeria                        0.18    NA     
## 2  2020 Algeria                        0.34     0.929 
## 3  2021 Algeria                        0.36     0.0420
## 4  2022 Algeria                        0.33    -0.0678
## 5  2019 Argentina                      0.63    NA     
## 6  2020 Argentina                      0.57    -0.104

Next, we will start working with this dataset. We first load the dplyr package (along with knitr for tables), then look at the columns that matter for this question.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(knitr)
kable(head(inoutflow %>% select(year, country_region, inflow_outflow_ratio)))
year  country_region  inflow_outflow_ratio
2019  Algeria         0.18
2020  Algeria         0.34
2021  Algeria         0.36
2022  Algeria         0.33
2019  Argentina       0.63
2020  Argentina       0.57

Now, let’s find the highest inflow/outflow ratio each country recorded across its four years of data. The table shows only the first six countries in alphabetical order.

kable(head(inoutflow %>%
  group_by(country_region) %>%
  summarise(ratiomax = max(inflow_outflow_ratio))))
country_region  ratiomax
Algeria         0.36
Argentina       0.63
Australia       2.23
Austria         1.48
Bahrain         1.08
Bangladesh      0.60

Now, let’s do the same for each country’s lowest ratio (the year with the fewest inflows relative to outflows), sorted from smallest to largest.

kable(head(inoutflow %>%
  group_by(country_region) %>%
  summarise(ratiomin = min(inflow_outflow_ratio)) %>%
  arrange(ratiomin)))
country_region  ratiomin
Venezuela       0.09
Tunisia         0.11
Morocco         0.15
Algeria         0.18
Lebanon         0.22
Sri Lanka       0.22
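The two summaries above can also be computed in one pass, together with the year in which each extreme occurred. The sketch below is an illustration rather than part of the original analysis: it uses only the six rows printed by head() earlier, and the names sample_rows, extremes, year_max, and year_min are made up for this example.

```r
library(dplyr)

# Six rows copied from the head() output above; the full dataset has 284 rows.
sample_rows <- tibble(
  year = c(2019, 2020, 2021, 2022, 2019, 2020),
  country_region = c(rep("Algeria", 4), rep("Argentina", 2)),
  inflow_outflow_ratio = c(0.18, 0.34, 0.36, 0.33, 0.63, 0.57)
)

# One row per country: highest and lowest ratio, plus the year of each.
extremes <- sample_rows %>%
  group_by(country_region) %>%
  summarise(
    ratiomax = max(inflow_outflow_ratio, na.rm = TRUE),
    year_max = year[which.max(inflow_outflow_ratio)],
    ratiomin = min(inflow_outflow_ratio, na.rm = TRUE),
    year_min = year[which.min(inflow_outflow_ratio)]
  )

print(extremes)
```

Because the sample holds only two of Argentina’s four years, its row here reflects the sample and not the full data.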

Before making a graph, let’s make the dataset smaller by keeping only the 10 countries with the highest maximum ratios.

top10 <- inoutflow %>%
  group_by(country_region) %>%
  summarise(max_ratio = max(inflow_outflow_ratio, na.rm = TRUE)) %>%
  arrange(desc(max_ratio)) %>%
  slice_head(n = 10)
kable(top10)
country_region  max_ratio
Canada          2.50
Australia       2.23
Cyprus          2.04
Luxembourg      1.97
Estonia         1.83
Germany         1.83
Portugal        1.83
Switzerland     1.80
Spain           1.78
Qatar           1.65
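Estonia, Germany, and Portugal are tied at 1.83, which raises the question of how a cutoff treats ties. The sketch below rebuilds the table above by hand (top_values is a name made up for this example) and compares slice_head() with slice_max() at a cutoff of five, where the tie actually matters. It is an illustration, not a change to the analysis.

```r
library(dplyr)

# Values copied from the top-10 table above.
top_values <- tibble(
  country_region = c("Canada", "Australia", "Cyprus", "Luxembourg", "Estonia",
                     "Germany", "Portugal", "Switzerland", "Spain", "Qatar"),
  max_ratio = c(2.50, 2.23, 2.04, 1.97, 1.83, 1.83, 1.83, 1.80, 1.78, 1.65)
)

# slice_head() keeps exactly n rows, so only one of the tied countries survives.
top5_head <- top_values %>%
  arrange(desc(max_ratio)) %>%
  slice_head(n = 5)

# slice_max() with with_ties = TRUE keeps every row tied at the cutoff value.
top5_ties <- top_values %>%
  slice_max(max_ratio, n = 5, with_ties = TRUE)

print(nrow(top5_head))
print(nrow(top5_ties))
```

With a cutoff of five, slice_head() returns five rows while slice_max() returns seven, because all three countries at 1.83 are kept.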

We now have all of the data that we need to answer the question. Here is a graph showing the maximum inflow/outflow ratio recorded by each of these ten countries between 2019 and 2022.

library(ggplot2)

ggplot(top10, aes(x = country_region, y = max_ratio)) +
  geom_col(fill = "steelblue") +
  labs(
    title = "Top 10 Countries by Maximum Inflow/Outflow Ratio (2019–2022)",
    x = "Country",
    y = "Maximum Ratio",
    caption = "Daniel Gellis | Data Source: World Bank Group") +
  theme(
    text = element_text(family = "Times New Roman", size = 12)
  )
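By default, ggplot2 orders a character x-axis alphabetically, so the bars above do not appear in ranked order. One possible variation, shown here as a sketch and not as the plot used in this report, is to sort the bars with reorder() and flip the axes so the country names are easier to read. The data frame top_values is rebuilt by hand from the table above.

```r
library(ggplot2)

# Values copied from the top-10 table above.
top_values <- data.frame(
  country_region = c("Canada", "Australia", "Cyprus", "Luxembourg", "Estonia",
                     "Germany", "Portugal", "Switzerland", "Spain", "Qatar"),
  max_ratio = c(2.50, 2.23, 2.04, 1.97, 1.83, 1.83, 1.83, 1.80, 1.78, 1.65)
)

# reorder() turns the country names into a factor sorted by max_ratio.
ranked_country <- reorder(top_values$country_region, top_values$max_ratio)

p <- ggplot(top_values, aes(x = reorder(country_region, max_ratio), y = max_ratio)) +
  geom_col(fill = "steelblue") +
  coord_flip() +  # horizontal bars, highest ratio at the top
  labs(x = "Country", y = "Maximum Ratio")

print(p)
```

The font family in the original theme() call can be added back if Times New Roman is installed on the machine rendering the plot.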

In conclusion, Canada had the highest inflow/outflow ratio recorded between 2019 and 2022, at 2.50, and Venezuela had the lowest, at 0.09.