I. Introduction
I.1 Context
Since the dawn of humanity, people have moved from one place to another, voluntarily or not, in search of better living conditions, fertile land, hunting opportunities, or climates favorable to agriculture, or to escape conflicts. Today, migration remains highly relevant, driven by factors such as globalization, fewer border restrictions, technological advances in transportation, wars, family planning, population aging, and corporate relocation.
“In 2005, there were approximately 191 million international migrants worldwide. Projections estimate this number will rise to 290 million by 2025, an increase of nearly 52% over two decades.”
United Nations, Department of Economic and Social Affairs, Population Division (2009). International Migration Report 2009: A Global Assessment
People migrate for various reasons, including economic factors (business opportunities, employment, company relocation, taxation, education), tourism (discovery, love, vacations, sports, entertainment), scholarships (universities), or as refugees and asylum seekers. Knowing the destination well is essential for better integration and a successful stay.
I.2 Problem
With the rise of Big Data, the volume and variety (images, maps, audio, video…) of information available through databases and social media make migration decisions harder, because they have to be based on highly varied and unstructured data.
I.3 Our solution
To help simplify this process, I developed a Migration Recommender System using R and Shiny. For each country or destination, it presents all the relevant information a potential migrant needs in a single interactive table row (reactable), enriched with an AI assistant designed to make self-service deep dives easy.
II. Data Preparation
We used many data preparation methods in this work (package data, web scraping, AI data, YouTube, missing value imputation, etc.). We do not present them here, as they are not the purpose of this contest; we only present and describe the final datasets used.
All the required data is in the folder “www/DATA”. Let’s begin with the .csv data files. Together, these datasets provide a rich multi-dimensional profile of countries, combining socio-demographic indicators (Literacy_Rate), geographic and multimedia resources (Capital), macroeconomic data (Macro1), and economic trends over time (growth_inflation). They are particularly suitable for applications such as country profiling.
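
As a minimal sketch of how the .csv files in such a folder could be loaded, the snippet below reads every .csv file in a directory into a named list. The helper `load_csv_folder()` and the demo folder are hypothetical and not taken from the project; the demo file only mimics a few documented column names with made-up values so that the example runs without the real data.

```r
# Hypothetical helper (not part of the project): read every .csv file
# found in `dir` into a list named after the files.
load_csv_folder <- function(dir) {
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  # check.names = FALSE keeps headers such as "Literacy Rate" unchanged
  data <- lapply(paths, read.csv, stringsAsFactors = FALSE, check.names = FALSE)
  names(data) <- tools::file_path_sans_ext(basename(paths))
  data
}

# Demonstration on a temporary folder containing one made-up file.
demo_dir <- file.path(tempdir(), "DATA_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(
  data.frame(country = "Exampleland", iso2c = "XX", year = 2024),
  file.path(demo_dir, "Macro1.csv"),
  row.names = FALSE
)

datasets <- load_csv_folder(demo_dir)
names(datasets)
```

With the real data, the call would presumably be `load_csv_folder("www/DATA")`. One point to watch: with the defaults of `read.csv()`, the ISO code "NA" (Namibia) is parsed as a missing value, so the `iso2c` column may need a check after loading.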
II.1 Excel/csv data files
II.1.1 Literacy_Rate dataset
This dataset provides an overview of literacy and urbanization indicators by country.
| Column Name | Type | Description |
|---|---|---|
| Country | Character | Full name of the country or region. |
| iso2c | Character | Two-letter ISO code of the country. |
| urban | Numeric | Percentage of the population living in urban areas. |
| rural | Numeric | Percentage of the population living in rural areas. |
| labor_force | Numeric | Labor force participation rate (in % of total population). |
| Literacy Rate | Numeric | Proportion of the literate population. |

II.1.2 Capital dataset
This dataset contains geographical and media information about the capital cities of each country.
| Column Name | Type | Description |
|---|---|---|
| country | Character | Full country name. |
| iso2c | Character | Two-letter ISO code. |
| iso3c | Character | Three-letter ISO code. |
| latitude | Numeric | Latitude coordinate of the capital city. |
| longitude | Numeric | Longitude coordinate of the capital city. |
| Places | Character | Link (e.g., YouTube) showcasing the capital city. |

II.1.3 Macro1 dataset
This dataset contains macroeconomic indicators for each country for the year 2024.
| Column Name | Type | Description |
|---|---|---|
| iso2c | Character | Two-letter ISO code. |
| country | Character | Country name. |
| year | Numeric | Year of observation (2024). |
| GDP | Numeric | Gross Domestic Product (current US dollars). |
| GDP_per_capita_PPP | Numeric | GDP per capita, adjusted for Purchasing Power Parity. |
| Unemployment | Numeric | Unemployment rate (%). |
| Exchange_rate | Numeric | Exchange rate (local currency per USD). |

II.1.4 growth_inflation dataset
This dataset records historical macroeconomic trends (growth and inflation) for multiple years per country.
| Column Name | Type | Description |
|---|---|---|
| country | Character | Country name. |
| iso2c | Character | Two-letter ISO code. |
| iso3c | Character | Three-letter ISO code. |
| year | Numeric | Year of observation. |
| GDP_GR | Numeric | Annual GDP growth rate (%). |
| Inflation | Numeric | Inflation rate (%). |

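
Because growth_inflation holds several years per country while Macro1 is a 2024 snapshot, the two can be combined on `iso2c` and `year`. The sketch below illustrates this with base R; the rows are made up and only reuse the documented column names, so it is an illustration under those assumptions rather than the project's actual preparation code.

```r
# Made-up rows that only mimic the documented column names.
macro1 <- data.frame(
  iso2c = c("AA", "BB"),
  country = c("Country A", "Country B"),
  year = c(2024, 2024),
  Unemployment = c(5.0, 7.5)
)
growth_inflation <- data.frame(
  iso2c = c("AA", "AA", "BB", "BB"),
  year = c(2023, 2024, 2023, 2024),
  GDP_GR = c(2.0, 2.5, 1.0, 1.5),
  Inflation = c(3.0, 2.8, 6.0, 5.5)
)

# Keep the 2024 rows of the trend data, then attach them to the snapshot.
trend_2024 <- growth_inflation[growth_inflation$year == 2024, ]
profile <- merge(macro1, trend_2024, by = c("iso2c", "year"))

# Summarise the historical trend: average growth rate per country.
avg_growth <- aggregate(GDP_GR ~ iso2c, data = growth_inflation, FUN = mean)
```

Joining on the ISO code rather than on the country name avoids mismatches caused by spelling variants of country names across sources.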
II.2 .rds data files
II.2.1 result dataset
A large dataset (204 columns) containing a wide range of international migration indicators for each country or region.
| Column Name | Type | Description |
|---|---|---|
| Child migrants (<19) | Numeric | Percentage or number of child migrants under the age of 19. |
| International students (destination count) | Numeric | Number of international students hosted by the country. |
| Older migrants (>65) | Numeric | Percentage or number of migrants over the age of 65. |
| Share of female immigrants | Numeric | Percentage of female migrants. |
| Working age migrants (20-64) | Numeric | Percentage or number of working-age migrants. |
| Permanent migration inflows | Numeric | Total number of permanent immigration inflows. |
| Asylum seekers (host country/region) | Numeric | Number of asylum seekers hosted in the country. |
| Refugees (host country/region) | Numeric | Number of refugees hosted in the country. |
| `Migration_to_<Country>` | Numeric | Bilateral migration inflows to specific countries (e.g., Migration_to_Australia). |

The dataset contains demographic, social, and migration-related indicators. It allows analysis of migration composition by age, gender, and migrant type (e.g., students, refugees, asylum seekers), and it supports bilateral migration flow analysis between countries.
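
The bilateral columns share the `Migration_to_` prefix, so they can be selected by pattern instead of being listed one by one. The sketch below shows one possible approach on a made-up two-row table. The `country` column, the values, and the reading of each row as an origin country are assumptions for illustration; the real table would be loaded with `readRDS()` from the project's .rds file.

```r
# Made-up stand-in for the result table; check.names = FALSE keeps
# column names that contain spaces and brackets unchanged.
result <- data.frame(
  country = c("Country A", "Country B"),
  `Refugees (host country/region)` = c(100, 250),
  Migration_to_Australia = c(10, 40),
  Migration_to_Canada = c(30, 5),
  check.names = FALSE
)

# Select every bilateral column by its prefix and recover the destination names.
bilateral_cols <- grep("^Migration_to_", names(result), value = TRUE)
destinations <- sub("^Migration_to_", "", bilateral_cols)

# For each row, find the destination with the largest flow
# (assumption: each row describes one origin country).
flows <- as.matrix(result[bilateral_cols])
top_destination <- destinations[max.col(flows, ties.method = "first")]
```

Selecting by prefix keeps the code independent of the exact number of destination columns, which is convenient in a table with 204 columns.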
II.2.2 pays_continent dataset
A simple reference dataset that maps each country to its corresponding continent along with its ISO 2-letter code.
| Column Name | Type | Description |
|---|---|---|
| continent | String | Name of the continent to which the country belongs (e.g., Africa, Asia). |
| country | String | Official country name. |
| iso2c | String | Two-letter ISO 3166-1 alpha-2 country code. |

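
A reference table like this is typically used as a lookup: the continent is attached to another dataset through `iso2c`, and codes without a match are flagged. The following sketch shows the idea with made-up rows; the `literacy` table and its values are hypothetical.

```r
# Made-up lookup rows using the documented column names.
pays_continent <- data.frame(
  continent = c("Africa", "Asia"),
  country = c("Country A", "Country B"),
  iso2c = c("AA", "BB")
)
# Hypothetical indicator table; "CC" has no entry in the lookup.
literacy <- data.frame(iso2c = c("AA", "BB", "CC"), urban = c(40, 60, 80))

# Left join: keep every indicator row, add the continent where available.
with_continent <- merge(
  literacy,
  pays_continent[c("iso2c", "continent")],
  by = "iso2c",
  all.x = TRUE
)

# Codes that could not be mapped to a continent.
unmatched <- with_continent$iso2c[is.na(with_continent$continent)]
```

Using `all.x = TRUE` keeps unmatched countries visible instead of silently dropping them, which makes gaps in the lookup easy to spot.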
II.2.3 pyramide_plot dataset
This dataset describes population pyramids by age and sex for different countries, including variables needed for visual plotting (population pyramid chart).
| Variable | Type | Description |
|---|---|---|
| name | chr | Country name (e.g., Andorra). |
| code | chr | ISO 2-letter country code (e.g., AD). |
| AGE | fct | Age group categories in 5-year intervals (e.g., 0_4, 5_9, 10_14). |
| SEX | fct | Sex of the population (Male or Female). |
| population | dbl | Absolute number of people in this age/sex group. |
| age_num | dbl | Numeric representation of the age group (1 for 0–4, 2 for 5–9, etc.). Useful for ordering or plotting. |
| pop_abs | dbl | Same as population (redundant or prepared for further calculations). |
| xmin, xmax | dbl | Minimum and maximum x-coordinates for the bar segment representing the population group (used for plotting the pyramid). |
| ymin, ymax | dbl | Minimum and maximum y-coordinates for the bar segment. This suggests a rectangle-based plotting approach in ggplot or similar. |
| flag_colors | list | Vector of colors derived from the country’s flag (e.g., 3 dominant colors). Used to color the pyramid bars. |
| n_colors | int | Number of colors extracted from the flag. |
| gradient | list | Gradient colors computed for the age/sex group (likely for smoother visualization). |
| color | chr | Final color assigned to the segment (e.g., "#ff0000"). |

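
To illustrate how rectangle coordinates of this kind can be derived, the sketch below computes `xmin`, `xmax`, `ymin`, and `ymax` from the population counts, placing males on the left of zero and females on the right. The layout rule, the bar height, and the sample values are assumptions made for illustration; the actual dataset may have been built differently.

```r
# Made-up counts for two age groups, using the documented column names.
pyramid <- data.frame(
  AGE = factor(c("0_4", "0_4", "5_9", "5_9"), levels = c("0_4", "5_9")),
  SEX = factor(c("Male", "Female", "Male", "Female")),
  population = c(120, 110, 100, 105)
)

# Numeric position of each age group (1 for 0_4, 2 for 5_9).
pyramid$age_num <- as.integer(pyramid$AGE)

# Assumed layout: male bars extend left of zero, female bars right of zero.
is_male <- pyramid$SEX == "Male"
pyramid$xmin <- ifelse(is_male, -pyramid$population, 0)
pyramid$xmax <- ifelse(is_male, 0, pyramid$population)

# Assumed bar height: 0.9 units, centred on the age group position.
bar_half_height <- 0.45
pyramid$ymin <- pyramid$age_num - bar_half_height
pyramid$ymax <- pyramid$age_num + bar_half_height

# Optional: build the plot if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(pyramid) +
    ggplot2::geom_rect(
      ggplot2::aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = SEX)
    )
}
```

Precomputing the rectangle corners in the data, rather than at plot time, leaves room to assign each segment its own color, which fits the `color` column described in the table.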