Week 3 Challenge
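For this challenge, I chose to focus on the inflows and outflows of talent in countries’ labor forces. An inflow refers to people moving into a given country to work, while an outflow refers to people leaving that country to work elsewhere. These movements were measured using LinkedIn’s location-based statistics. The dataset summarizes them as an inflow/outflow ratio. As the column name suggests, this ratio compares inflows to outflows, so a value above 1 means more workers arrived than left.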
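Here is a short sketch of how to read the ratio. It assumes that `inflow_outflow_ratio` is simply inflows divided by outflows, which the column name suggests but the data does not state outright. The counts below are made up for illustration and do not come from the dataset.

```r
# Hypothetical counts for an imaginary country (not from the dataset)
inflows  <- 250   # people who moved in to work
outflows <- 100   # people who left to work elsewhere

# Assumed definition: ratio = inflows / outflows
ratio <- inflows / outflows
ratio  # 2.5

# Above 1: more workers arrived than left; below 1: more left than arrived
net_effect <- if (ratio > 1) "net gain" else if (ratio < 1) "net loss" else "balanced"
net_effect
```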
Question: Which country had the highest inflow/outflow ratio between 2019 and 2022? Which country had the lowest?
First, we will import the dataset and look at what it contains.
library(readr)
inoutflow <- read_csv("inoutflow.csv")
## Rows: 284 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): country_region
## dbl (3): year, inflow_outflow_ratio, yoy_change
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(inoutflow)
## # A tibble: 6 × 4
## year country_region inflow_outflow_ratio yoy_change
## <dbl> <chr> <dbl> <dbl>
## 1 2019 Algeria 0.18 NA
## 2 2020 Algeria 0.34 0.929
## 3 2021 Algeria 0.36 0.0420
## 4 2022 Algeria 0.33 -0.0678
## 5 2019 Argentina 0.63 NA
## 6 2020 Argentina 0.57 -0.104
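The column specification above reports 284 rows. If every country has all four years (2019 to 2022), that works out to 284 / 4 = 71 countries. The sketch below shows one way to check this with dplyr’s `count()`. It uses a stand-in tibble built only from the six rows printed by `head()`. With the real data, you would run the same pipeline on `inoutflow`.

```r
library(dplyr)

# Stand-in data: only the six rows printed by head(inoutflow) above
sample_rows <- tibble(
  year = c(2019, 2020, 2021, 2022, 2019, 2020),
  country_region = c("Algeria", "Algeria", "Algeria", "Algeria", "Argentina", "Argentina"),
  inflow_outflow_ratio = c(0.18, 0.34, 0.36, 0.33, 0.63, 0.57)
)

# Count how many years of data each country has
years_per_country <- sample_rows %>% count(country_region, name = "n_years")
years_per_country

# Countries with fewer than four years would need a closer look.
# Here Argentina shows only 2 rows because head() cut it off.
incomplete <- years_per_country %>% filter(n_years < 4)
incomplete
```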
Next, we will load the dplyr package to work with the dataset and select only the columns that matter for this question.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
# Keep only the columns needed for this question and print them as a table
kable(head(inoutflow %>% select(year, country_region, inflow_outflow_ratio)))
year | country_region | inflow_outflow_ratio |
---|---|---|
2019 | Algeria | 0.18 |
2020 | Algeria | 0.34 |
2021 | Algeria | 0.36 |
2022 | Algeria | 0.33 |
2019 | Argentina | 0.63 |
2020 | Argentina | 0.57 |
Now, let’s find the highest ratio each country reached across its four years of data. Note that `head()` only prints the first six countries in alphabetical order.
# Highest ratio each country reached in any single year
kable(head(inoutflow %>%
  group_by(country_region) %>%
  summarise(ratiomax = max(inflow_outflow_ratio, na.rm = TRUE))))
country_region | ratiomax |
---|---|
Algeria | 0.36 |
Argentina | 0.63 |
Australia | 2.23 |
Austria | 1.48 |
Bahrain | 1.08 |
Bangladesh | 0.60 |
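The summary above keeps each country’s peak value but drops the year in which it happened. The sketch below uses dplyr’s `slice_max()` on grouped data to keep the whole row, so the year stays attached. The stand-in tibble again contains only the rows printed by `head()` earlier. Swap in `inoutflow` to run it on the full dataset.

```r
library(dplyr)

# Stand-in data: only the six rows printed by head(inoutflow) earlier
sample_rows <- tibble(
  year = c(2019, 2020, 2021, 2022, 2019, 2020),
  country_region = c("Algeria", "Algeria", "Algeria", "Algeria", "Argentina", "Argentina"),
  inflow_outflow_ratio = c(0.18, 0.34, 0.36, 0.33, 0.63, 0.57)
)

# For each country, keep the single row with the largest ratio,
# so the year of the peak is kept alongside the value
peak_year <- sample_rows %>%
  group_by(country_region) %>%
  slice_max(inflow_outflow_ratio, n = 1, with_ties = FALSE) %>%
  ungroup()
peak_year
```

Setting `with_ties = FALSE` guarantees exactly one row per country, even if two years share the same peak value.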
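Now, let’s do the same for each country’s lowest ratio, where outflows outweighed inflows the most. Sorting in ascending order puts the countries with the smallest ratios first.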
# Lowest ratio each country reached, sorted from smallest to largest
kable(head(inoutflow %>%
  group_by(country_region) %>%
  summarise(ratiomin = min(inflow_outflow_ratio, na.rm = TRUE)) %>%
  arrange(ratiomin)))
country_region | ratiomin |
---|---|
Venezuela | 0.09 |
Tunisia | 0.11 |
Morocco | 0.15 |
Algeria | 0.18 |
Lebanon | 0.22 |
Sri Lanka | 0.22 |
Before making a graph, let’s narrow the data down to the 10 countries with the highest maximum ratios.
top10 <- inoutflow %>%
  group_by(country_region) %>%
  summarise(max_ratio = max(inflow_outflow_ratio, na.rm = TRUE)) %>%
  arrange(desc(max_ratio)) %>%
  slice_head(n = 10)
kable(top10)
country_region | max_ratio |
---|---|
Canada | 2.50 |
Australia | 2.23 |
Cyprus | 2.04 |
Luxembourg | 1.97 |
Estonia | 1.83 |
Germany | 1.83 |
Portugal | 1.83 |
Switzerland | 1.80 |
Spain | 1.78 |
Qatar | 1.65 |
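`slice_head(n = 10)` keeps exactly the first ten rows after sorting. If two countries tied at the tenth position, one would be dropped without warning. By contrast, dplyr’s `slice_max()` keeps tied rows by default. The sketch below compares the two on a small hypothetical table, not on the real data.

```r
library(dplyr)

# Hypothetical per-country maxima (not from the dataset) with a tie at the cutoff
country_max <- tibble(
  country_region = c("A", "B", "C", "D"),
  max_ratio = c(2.0, 1.5, 1.2, 1.2)
)

# Sorting then slice_head() keeps exactly n rows, dropping one tied country
top3_head <- country_max %>% arrange(desc(max_ratio)) %>% slice_head(n = 3)

# slice_max() keeps every row tied at the cutoff (with_ties = TRUE is the default)
top3_max <- country_max %>% slice_max(max_ratio, n = 3)

nrow(top3_head)  # 3
nrow(top3_max)   # 4
```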
We now have all of the data we need to answer the question. The graph below shows the maximum inflow/outflow ratio each of these ten countries reached between 2019 and 2022.
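library(ggplot2)
# Order the bars from highest to lowest ratio instead of alphabetically
ggplot(top10, aes(x = reorder(country_region, -max_ratio), y = max_ratio)) +
  geom_col(fill = "steelblue") +
  labs(
    title = "Top 10 Countries by Maximum Inflow/Outflow Ratio (2019–2022)",
    x = "Country",
    y = "Maximum Ratio",
    caption = "Daniel Gellis | Data Source: World Bank Group"
  ) +
  theme(
    # Times New Roman must be installed on the system to render
    text = element_text(family = "Times New Roman", size = 12),
    # Tilt country names so they do not overlap
    axis.text.x = element_text(angle = 45, hjust = 1)
  )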
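The tables above find the extremes per country. You can also pull the answer directly from the long data, keeping the country and year together, with `slice_max()` and `slice_min()`. The sketch below runs on a stand-in tibble built only from the rows printed by `head()`. Replacing `sample_rows` with `inoutflow` would give the actual answer.

```r
library(dplyr)

# Stand-in data: only the six rows printed by head(inoutflow) earlier
sample_rows <- tibble(
  year = c(2019, 2020, 2021, 2022, 2019, 2020),
  country_region = c("Algeria", "Algeria", "Algeria", "Algeria", "Argentina", "Argentina"),
  inflow_outflow_ratio = c(0.18, 0.34, 0.36, 0.33, 0.63, 0.57)
)

# Single highest and lowest country-year ratios across the whole table
highest <- sample_rows %>% slice_max(inflow_outflow_ratio, n = 1, with_ties = FALSE)
lowest  <- sample_rows %>% slice_min(inflow_outflow_ratio, n = 1, with_ties = FALSE)

# Stack them, labelling each row by which extreme it is
bind_rows(highest = highest, lowest = lowest, .id = "extreme")
```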
In conclusion, Canada had the highest single-year inflow/outflow ratio between 2019 and 2022, at 2.50, and Venezuela had the lowest, at 0.09.