Required packages:
library("colorspace")
library("sf")
library("spData")
library("spdep")
library("spatialreg")
library("tidyverse")
library(dplyr)
library(ggplot2)
library(gridExtra)
library("mapsf")
library(corrplot)
Function to plot Moran scatterplot:
source("https://raw.githubusercontent.com/tibo31/spatial_eco/refs/heads/master/moran_scatter_plot.R")
The analysis is based on the migration flow data provided by Abel and Cohen (2019), who use six different algorithms to estimate bilateral flows from changes in the migrant stocks in all countries. The different methods of estimation are compared in Berlemann et al. (2021) and the data can be loaded as follows :
load("mig_data.RData")
The data contains the aggregate number of estimated migration by
origin (variable origin) and destination (variable
dest) countries in the time intervals 1990-1995, 1995-2000,
2005-2010, 2010-2015, 2015-2020, 2020-2024 (variable
period). The migration estimates variables are :
drop_neg : Drop Negative
reverse_neg: Reverse Negative
mig_rate: Migration Rates
min_open: Minimization open
min_close: Minimization closed
pb: Pseudo-Bayesian closed
In this project, we will consider the period “2020-2024” as a reference period. However, at the end of part 2, we will see if we obtain similar results by changing the period of reference.
Reverse Negative Method: Abel and Cohen (2019)
The Reverse Negative method addresses the issue that bilateral migration flows estimated from changes in migrant stocks can sometimes be negative, a result that is not plausible when interpreting flows as actual migrant movements. In this approach, any negative change in the migrant stock between two periods is “reversed” and re‐allocated as a positive flow in the opposite direction.
Principle of the Method:
Assume that for a given pair of countries \((i, j)\) we have bilateral migrant stock data measured at time \(t\) and \(t+1\), denoted by \(s_{ij}^t\) and \(s_{ij}^{t+1}\) respectively. The change in the stock is defined as:
\[ \Delta s_{ij} = s_{ij}^{t+1} - s_{ij}^{t} \]
If \(\Delta s_{ij} \ge 0\), then the estimated migration flow from origin \(i\) to destination \(j\) is given by:
\[y_{ij} = \Delta s_{ij} = s_{ij}^{t+1} - s_{ij}^{t}\]
If \(\Delta s_{ij} < 0\), instead of interpreting the negative difference as a loss of migrants from \(i\) to \(j\), the method treats it as a positive flow from \(j\) to \(i\). In this case, we set:
\[y_{ij} = 0\]
\[y_{ji} = -\Delta s_{ij} = s_{ij}^{t} -
s_{ij}^{t+1}\]
Thus, the bilateral migration flows are defined as:
\[ y_{ij} = \begin{cases} s_{ij}^{t+1} - s_{ij}^{t}, & \text{if } s_{ij}^{t+1} - s_{ij}^{t} > 0 \text{ and } s_{ji}^{t+1} - s_{ji}^{t} \ge 0 \text{ and } i \neq j,\\ \bigl[s_{ij}^{t+1} - s_{ij}^{t}\bigr] + \bigl[s_{ji}^{t+1} - s_{ji}^{t}\bigr], & \text{if } s_{ij}^{t+1} - s_{ij}^{t} > 0 \text{ and } s_{ji}^{t+1} - s_{ji}^{t} < 0 \text{ and } i \neq j,\\ \bigl[s_{ij}^{t+1} - s_{ij}^{t}\bigr] - \bigl[s_{ji}^{t+1} - s_{ji}^{t}\bigr], & \text{if } s_{ij}^{t+1} - s_{ij}^{t} \le 0 \text{ and } s_{ji}^{t+1} - s_{ji}^{t} < 0 \text{ and } i \neq j,\\ 0, & \text{if } s_{ij}^{t+1} - s_{ij}^{t} \le 0 \text{ and } s_{ji}^{t+1} - s_{ji}^{t} \ge 0 \text{ or } i = j, \end{cases} \]
In other words, if the stock difference \(s_{ij}^{t+1} - s_{ij}^{t}\) is negative (indicating a loss from \(i\) to \(j\)), it is interpreted as a flow in the opposite direction. This approach reinterprets negative changes in the migrant stock as reverse flows, preserving the information contained in the stock differences while ensuring that all estimated flows remain non-negative.
The image below displays the largest 0.2% of flows between countries for the reverse negative method. In the flow map, the left bar for each country indicates the volume of outgoing flows, while the right bar shows the volume of incoming flows.
The map (using the Reverse Negative approach) reflects a distinctive reallocation of migration flows compared to methods that drop or otherwise adjust negative differences in migrant stocks. Under this method, any negative stock difference (i.e., an apparent decrease in the number of migrants) is reversed, thereby converting what would otherwise be a loss of migrants into an inflow. This procedure offers a more balanced perspective on migration by preventing negative values from simply being discarded.
Outgoing Flows (Left Bar)
Under the Reverse Negative method, the left bars (outgoing flows) are typically lower than or at least somewhat reduced relative to methods that drop negative values. Because negative differences are treated as inflows in the opposite direction, the total outflow from a given country may appear smaller than in other approaches that simply ignore or zero out these negative values.
Incoming Flows (Right Bar)
Conversely, the right bars (incoming flows) tend to be higher. Reassigning negative differences as inflows boosts the total immigration count for some countries, resulting in a more symmetrical representation of migration corridors. In many cases, countries that would have appeared as major senders under other methods exhibit increased incoming flows here.
Comparison with Other Estimation Methods
When examining the Reverse Negative map alongside the other five approaches (Drop Negative, Migration Rates, Minimization Open, Minimization Closed, and Pseudo-Bayesian Closed), several important similarities and differences emerge:
Overall Flow Patterns Despite the differences in how negative values are handled, the major corridors of global migration remain consistent. Large flows tend to connect the same origin and destination countries across all methods.
Treatment of Negative Differences
Correlation between Estimators A correlation analysis reveals moderate to high correlations among all methods for most origin-destination pairs, given that major migration patterns remain the same. However, the Reverse Negative approach systematically reallocates negative differences, often resulting in notably higher inflows for certain countries, especially those with volatile or uncertain stock data; than methods that drop negatives.
Extraction of Largest Flows Focusing on the top 0.2% of flows highlights how:
In summary, the Reverse Negative method provides a more symmetrical allocation of migration flows, offering an alternative perspective on the distribution of global migration. While it shares many overall patterns with other approaches, it stands out by reallocating what would otherwise be discarded negative differences into positive inflows. This nuanced approach is especially useful for investigating how methodological choices can alter our understanding of bilateral migration flows.
# Import the world contour boundaries from the GeoJSON file
world_contours <- st_read("CCMN_GeoDATA.geojson/CCMN_GeoDATA.geojson")
## Reading layer `CCMN_GeoDATA' from data source
## `C:\Users\coral\Dropbox\projet_spatial\CCMN_GeoDATA.geojson\CCMN_GeoDATA.geojson'
## using driver `GeoJSON'
## Simple feature collection with 262 features and 36 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.62372
## Geodetic CRS: WGS 84
# Transform the coordinate reference system to 'ESRI:54030'
world_contours_transformed <- st_transform(world_contours, crs = "ESRI:54030")
Most explanatory variables at country level (object
my_covariates) are available in the file
covariates.RData. This data is taken from Laurent et
al. (2022) who study international remittances flows. The file combines
information from several data sets that were originally disseminated by
different government agencies or international organizations (United
Nations, World Bank, CIA, etc.).
load("covariates.RData")
Among these variables, we will consider:
deflactor: GDP deflator (measure of inflation, World
Bank)
lifeexp: life expectancy (World Bank)
natural_disaster: total number of people affected by
a climate disaster (https://public.emdat.be/data)
population: size of the population (World
Bank)
politicalstability: measure of political stability
(World Bank)
conflictpercapita: number of conflict events per
capita (ACLED)
vulnerability: Measures a country’s exposure,
sensitivity and capacity to adapt to the negative effects of climate
change (https://gain-new.crc.nd.edu/about/download)
t2m_diff: average annual difference temperature (in
Celsius)
prec_5days: average heavy precipitation for 5
consecutive days (in mm) during one year
hwf_upp: average number of heat wave days during one
year
dry: average number of dry days during one
year
ghwr_35: average number of days above 35 degrees
during one year
Propose at least one additional variable that could explain the emigration/immigration rate by country. If possible, give a link to the data source or a reference to an article that uses this information in the context of migration. Import this variable and merge it to the existing explanatory variables. In the set of explanatory variables presented below, we selected the reference year 1992 for the period “1990-1995,” the year 1997 for the period “1995-2000,” and so on.
We can extend the analysis by incorporating an additional variable: the unemployment rate which could help explain migration rates by country. For instance, high unemployment in a country can increase the propensity to emigrate. A reliable data source is the World Bank (World Bank Unemployment Data).
Below is the code to import the unemployment data from the CSV file (using the World Bank file) and extract the unemployment rate for the reference year. In many World Bank CSV files, the first few rows are metadata, so we skip those rows.
# Import the CSV file containing unemployment data.
unemp_raw <- read_csv("word_bank/unemployment_rate.csv", skip = 4) %>%
select(-`Indicator Name`, -`Indicator Code`, -`...69`)
## New names:
## Rows: 266 Columns: 69
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (4): Country Name, Country Code, Indicator Name, Indicator Code dbl (33): 1991,
## 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, ... lgl (32): 1960,
## 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...69`
head(unemp_raw)
For now we have one column per year with the corresponding unemployment rate for each country. We would like to have instead a column period and a column giving the associated employment rate. This will help reducing the number of columns. We keep the same logic that was applied for the other variables, selecting for instance 1992 as the reference year for the 1990-1995 period.
# Selecting the reference years
reference_years <- c(1992, 1997, 2002, 2007, 2012, 2017, 2022)
periods <- c("1990-1995", "1995-2000", "2000-2005", "2005-2010", "2010-2015", "2015-2020", "2020-2025")
# Filtering only the necessary columns
unemp_selected <- unemp_raw %>%
select(`Country Name`, `Country Code`, all_of(as.character(reference_years))) %>%
pivot_longer(cols = all_of(as.character(reference_years)),
names_to = "Year", values_to = "unemployment_rate") %>%
mutate(Year = as.integer(Year),
period = case_when(
Year == 1992 ~ "1990-1995",
Year == 1997 ~ "1995-2000",
Year == 2002 ~ "2000-2005",
Year == 2007 ~ "2005-2010",
Year == 2012 ~ "2010-2015",
Year == 2017 ~ "2015-2020",
Year == 2022 ~ "2020-2024",
TRUE ~ NA_character_
)) %>%
select(`Country Code`, period, `unemployment_rate`)
# Checking the result
head(unemp_selected)
We can now add this variable to our previous dataset.
#First we rename the columns used for the join to ensure consistency:
my_covariates_renamed <- my_covariates %>%
rename(`Country Code` = ISO3)
final_covariates <- left_join(my_covariates_renamed, unemp_selected, by = c("Country Code", "period"))
#We verify the join
head(final_covariates)
We can that we have some missing values. We can impute them using the
missForest package:
require("missForest")
## Le chargement a nécessité le package : missForest
nom_vars <- c("deflactor", "lifeexp", "population", "politicalstability", "conflictpercapita", "vulnerability", "natural_disaster", "t2m_diff", "dry", "prec_5days", "ghwr_35", "hwf_upp", "unemployment_rate")
final_covariates[, nom_vars] <- missForest(final_covariates[, nom_vars])$ximp
We filter the explanatory variables for the last period:
my_covariates_period <- final_covariates[final_covariates$period == "2020-2024", ]
head(my_covariates_period)
In this part you should explain the emigration (out-migration) and immigration (in-migration) rates by country. The immigration rate measures the overall number of migrants arriving in a country relative to the size of the local population. Similarly, the emigration rate measure the number of migrants the leaving a country relative to the population.
You should first derive the total in and out flows by country. This is easily achieved by aggregating the migration flow data first by origin, respectively by destination country.
First, we need to create the emigration and immigration aggreagation measure. As decided in part 1, we use the reverse negative measure.
#Select the 2020-2024 period for the mig_data df
migration <- mig_data[mig_data$period == "2020-2024", ]
# 1. Aggregating emigration (outflows) by origin country
emigration_data <- migration %>%
group_by(origin) %>%
summarise(total_out = sum(reverse_neg))
# 2. Aggregating immigration (inflows) by destination country
immigration_data <- migration %>%
group_by(dest) %>%
summarise(total_in = sum(reverse_neg))
# 3. Merge emigration and immigration data on country code
migration_summary <- full_join(emigration_data, immigration_data,
by = c("origin" = "dest")) %>%
rename("Country Code" = origin) # Rename origin to Country
# View the final summary
print(migration_summary)
## # A tibble: 232 × 3
## `Country Code` total_out total_in
## <chr> <dbl> <dbl>
## 1 ABW 15174 20032
## 2 AFG 1850994 5163
## 3 AGO 87223 9029
## 4 AIA 225 274
## 5 ALB 142879 1201
## 6 AND 259 4199
## 7 ARE 14779 1012585
## 8 ARG 236885 136570
## 9 ARM 32703 140952
## 10 ASM 997 578
## # ℹ 222 more rows
Merge the total in and out flows to the country-level explanatory variables and also combine it with the contours data. Represent the Top 10 countries sending the most migrants and the Top 10 countries receiving the most migrants. The common key is the ISO3 variable. You may obtain something like that (in this example, migration data is based on the reverse negative method):
First, we join the data containing information about the migration with the one used above containing the covariates.
# Merging emigration and immigration data with country-level explanatory variables (my_covariates_period)
final_df <- left_join(my_covariates_period, migration_summary, by = "Country Code")
# Select relevant columns from world_contours_transformed
world_contours_select <- world_contours_transformed %>%
select(ISO3, VISUALIZATION_NAME, geometry)
# Join final_df with world_contours_select by Country Code and ISO3
world <- left_join(final_df, world_contours_select, by = c("Country Code" = "ISO3"))
# Sorting the data to get the top 10 countries by immigration rate
top_immigration <- world %>%
arrange(desc(total_in)) %>%
select(VISUALIZATION_NAME, total_in) %>%
head(10)
top_emigration <- world %>%
arrange(desc(total_out)) %>%
select(VISUALIZATION_NAME, total_out) %>%
head(10)
head(world)
We can now visualize the results.
# View the Top 10 countries for immigration
print("Top 10 Countries Receiving Most Migrants (Immigration):")
## [1] "Top 10 Countries Receiving Most Migrants (Immigration):"
print(top_immigration)
## VISUALIZATION_NAME total_in
## 1 United States of America 5005300
## 2 Germany 2971601
## 3 Spain 1753212
## 4 United Kingdom of Great Britain and Northern Ireland 1396195
## 5 Russian Federation 1225108
## 6 Poland 1111392
## 7 Colombia 1055203
## 8 Iran 1037054
## 9 United Arab Emirates 1012585
## 10 Saudi Arabia 1010146
# View the Top 10 countries for emigration
print("Top 10 Countries Sending Most Migrants (Emigration):")
## [1] "Top 10 Countries Sending Most Migrants (Emigration):"
print(top_emigration)
## VISUALIZATION_NAME total_out
## 1 Ukraine 5640289
## 2 Venezuela 2542393
## 3 India 2429451
## 4 Afghanistan 1850994
## 5 Sudan 1536173
## 6 Russian Federation 1411302
## 7 China 1059601
## 8 Türkiye 978519
## 9 Bangladesh 961309
## 10 Pakistan 842749
# Scatter plot for top 10 countries with highest emigration rates
ggplot(top_emigration, aes(x = reorder(VISUALIZATION_NAME, total_out), y = total_out)) +
geom_point(shape = 1) +
coord_flip() +
labs(title = "Top Outflow",
x = "Country",
y = "Emigration") +
theme_minimal()
# Scatter plot for top 10 countries with highest immigration rates
ggplot(top_immigration, aes(x = reorder(VISUALIZATION_NAME, total_in), y = total_in)) +
geom_point(shape = 1) +
coord_flip() +
labs(title = "Top Inflow",
x = "Country",
y = "Immigration") +
theme_minimal()
Then, we can calculate immigration and emigration rates relatively to the population.
The formula are:
\[ \text{Emigration Rate} = \frac{\text{Total Emigrants}}{\text{Population}} \] and,
\[ \text{Immigration Rate} = \frac{\text{Total Immigrants}}{\text{Population}} \]
We also create the net variable which accounts for the
difference between immigrates and
emigrates.
# Calculate immigration, emigration rates and net migration
world_final <- world %>%
mutate(
emigrates = total_out / population, # Outflows per capita
immigrates = total_in / population, # Inflows per capita
net = immigrates - emigrates # Net migration per capita
)
# Sorting the data to get the top 10 countries by immigration rate
top_immigration_rate <- world_final %>%
arrange(desc(immigrates)) %>%
select(VISUALIZATION_NAME, immigrates) %>%
head(10)
# Sorting the data to get the top 10 countries by emigration rate
top_emigration_rate <- world_final %>%
arrange(desc(emigrates)) %>%
select(VISUALIZATION_NAME, emigrates) %>%
head(10)
# Sorting the data to get the top 10 countries by net migration rate
top_net_migration <- world_final %>%
arrange(desc(net)) %>%
select(VISUALIZATION_NAME, net) %>%
head(10)
We can visualize the results:
# Scatter plot for top 10 countries with highest emigration rates
ggplot(top_emigration_rate, aes(x = reorder(VISUALIZATION_NAME, emigrates), y = emigrates)) +
geom_point(shape = 1) +
coord_flip() +
labs(title = "Top Outflow",
x = "Country",
y = "Emigration Rate") +
theme_minimal()
# Scatter plot for top 10 countries with highest immigration rates
ggplot(top_immigration_rate, aes(x = reorder(VISUALIZATION_NAME, immigrates), y = immigrates)) +
geom_point(shape = 1) +
coord_flip() +
labs(title = "Top Inflow",
x = "Country",
y = "Immigration Rate") +
theme_minimal()
ggplot(top_net_migration, aes(x = reorder(VISUALIZATION_NAME, net), y = net)) +
geom_point(shape = 1) +
coord_flip() +
labs(title = "Top 10 Countries by Net Migration Rate",
x = "Country",
y = "Net Migration Rate (Immigration - Emigration)") +
theme_minimal()
Begin the exploratory data analysis by investigating which explanatory variables are the most correlated with the dependent variables and comment the results you obtain. Do you see any differences of correlation sign between emigrates and immigrates?
# First, select the numeric columns relevant for the analysis.
numeric_data <- world_final %>%
select_if(is.numeric)
# Compute the correlation matrix using complete observations
cor_matrix <- cor(numeric_data, use = "complete.obs")
# Visualize the correlation matrix focusing on the dependent variables
corrplot(cor_matrix, method = "color", tl.cex = 0.8)
We display the most correlated variables:
# First, select only the numeric explanatory variables (exclude dependent vars and the vars used to build them)
explanatory_vars <- world_final %>%
select_if(is.numeric) %>%
select(-emigrates,-total_out, -total_in, -immigrates, -net)
# Calculate correlations with emigrates and immigrates
cor_with_emigrates <- sapply(explanatory_vars, function(x) cor(x, world_final$emigrates, use = "complete.obs"))
cor_with_immigrates <- sapply(explanatory_vars, function(x) cor(x, world_final$immigrates, use = "complete.obs"))
# Identify the variable with the highest absolute correlation for emigrates
max_corr_emigrates <- names(which.max(abs(cor_with_emigrates)))
max_corr_value_emigrates <- cor_with_emigrates[max_corr_emigrates]
# Identify the variable with the highest absolute correlation for immigrates
max_corr_immigrates <- names(which.max(abs(cor_with_immigrates)))
max_corr_value_immigrates <- cor_with_immigrates[max_corr_immigrates]
# Display the results
cat("The variable most correlated with emigrates is:", max_corr_emigrates,
"with a correlation of", round(max_corr_value_emigrates, 3), "\n")
## The variable most correlated with emigrates is: conflictpercapita with a correlation of 0.192
cat("The variable most correlated with immigrates is:", max_corr_immigrates,
"with a correlation of", round(max_corr_value_immigrates, 3), "\n")
## The variable most correlated with immigrates is: lifeexp with a correlation of 0.373
The results show that conflictpercapita is the variable most correlated with emigration (\(r \approx 0.183\)), suggesting that higher conflict levels are associated with increased outmigration. For immigration, lifeexp is most correlated (\(r \approx 0.373\)), indicating that countries with higher life expectancy tend to attract more migrants. While the correlation for emigration is relatively weak, the stronger correlation for immigration implies that quality-of-life factors may be a key pull factor for migrants.
# Ensure world_contours_transformed is in the correct format for ggplot (sf object)
world_final_transformed <- st_as_sf(world_final)
# Create a choropleth map for immigrates (immigration rates per capita)
immigration_map <- ggplot(world_final_transformed) +
geom_sf(aes(fill = immigrates), color = NA) +
scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
labs(title = "Immigration Rates (per capita)",
fill = "Immigrates") +
theme_minimal()
immigration_map
# Create a choropleth map for emigrates (emigration rates per capita)
emigration_map <- ggplot(world_final_transformed) +
geom_sf(aes(fill = emigrates), color = NA) +
scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
labs(title = "Emigration Rates (per capita)",
fill = "Emigrates") +
theme_minimal()
emigration_map
# Create a choropleth map for net migration (net rates per capita)
net_map <- ggplot(world_final_transformed) +
geom_sf(aes(fill = net), color = NA) +
scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
labs(title = "Net Rates (per capita)",
fill = "Net") +
theme_minimal()
net_map
We decide to create a map to visualize the climatic variable
t2m_diff (average annual difference temperature).
# Select the climatic variable to visualize
climatic_var <- "t2m_diff"
# Create the choropleth map
ggplot(world_final_transformed) +
geom_sf(aes(fill = !!sym(climatic_var)), color = NA) +
scale_fill_viridis_c(option = "magma", na.value = "grey90") +
labs(title = paste("Global Distribution of temperature difference"),
fill = climatic_var) +
theme_minimal()
We decide to combine contiguity-based neighbors (countries sharing borders) with 3-nearest neighbors to capture both geographic and proximity relationships. Indeed, contiguity-based neighbors reflect real-world spatial relationships, 3-nearest neighbors ensure that even non-contiguous countries have neighbors, avoiding isolated cases. Combining both improves spatial analysis by accounting for both direct borders and close geographic proximity.
# Ensure world_final_transformed has a valid CRS
world_final_transformed_crs <- st_transform(world_final_transformed, crs = "ESRI:54030")
# Create a contiguity-based neighbors list
contiguity_nb <- poly2nb(world_final_transformed_crs)
# Extract the centroid coordinates of each country
coords <- st_centroid(world_final_transformed_crs, of_largest_polygon = TRUE) |>
st_coordinates()
# Ensure the coordinates are in matrix format
coords <- as.matrix(coords)
# Define the 4-nearest neighbors
knn_nb <- knn2nb(knearneigh(coords, k = 3))
# Combine contiguity-based and k-nearest neighbors
combined_nb <- union.nb(contiguity_nb, knn_nb)
# Convert neighborhood links into spatial lines for visualization
nb_lines <- nb2lines(combined_nb, coords = coords)
# Convert nb_lines into an sf object and assign the same CRS
nb_lines_sf <- st_as_sf(nb_lines)
st_crs(nb_lines_sf) <- st_crs(world_final_transformed_crs) # Ensure CRS consistency
# Plot the spatial neighborhood links
ggplot() +
geom_sf(data = world_final_transformed_crs, fill = "white", color = "grey90") +
geom_sf(data = nb_lines_sf, color = "black", size = 0.5) +
labs(title = "Spatial Neighborhood Links (Contiguity + 3-Nearest Neighbors)") +
theme_minimal()
There are two packages to draw the scatter plot.
First, we draw the scatter plot for the
emigrates dependent variable:
# Create the spatial weights matrix
lw <- nb2listw(combined_nb, style = "W")
# Generate the Moran scatter plot using spdep's moran.plot
moran.plot(world_final_transformed_crs$emigrates, lw,
labels = world_final_transformed_crs$`Country Code`,
pch = 19)
# Generate the Moran scatter plot using customed function moran_scatter_plot
moran_scatter_plot(world_final_transformed_crs$emigrates,
my_listw = lw,
geo = st_geometry(world_final_transformed_crs))
From the Moran scatter plots of the emigrates dependent
variable and the corresponding map, the highest local spatial
autocorrelation (i.e., clusters of either consistently high or
consistently low values) appears primarily in:
Eastern Europe (e.g., Moldova, Romania, Ukraine)
Caribbean/Latin America (e.g., Aruba, Curaçao, St. Vincent & the Grenadines)
Parts of Africa (e.g., Comoros)
These countries are labeled as outliers in the Moran scatter plot (often in the high-high quadrant) and also stand out on the map with larger symbols, indicating statistically significant local Moran’s I values.
Then, we draw the scatter plot for the
immigrates dependent variable:
# Generate the Moran scatter plot using spdep's moran.plot
moran.plot(world_final_transformed_crs$immigrates, lw,
labels = world_final_transformed_crs$`Country Code`,
pch = 19)
# Generate the Moran scatter plot using customed function moran_scatter_plot
moran_scatter_plot(world_final_transformed_crs$immigrates,
my_listw = lw,
geo = st_geometry(world_final_transformed_crs))
Based on the Moran scatter plots of the the immigrates
dependent variable, the countries showing the strongest local spatial
autocorrelation (often in the high–high quadrant or as notable outliers)
are predominantly:
These locations stand out due to high values of the variable in question coupled with similarly high values in neighboring countries (or distinct clustering patterns), indicating significant local spatial autocorrelation.
First, we perform the statistical test for the
emigrates dependent variable:
# Perform Moran's I test on the dependent variable 'emigrates'
moran_test <- moran.test(world_final_transformed_crs$emigrates, lw)
# Print the test results
print(moran_test)
##
## Moran I test under randomisation
##
## data: world_final_transformed_crs$emigrates
## weights: lw
##
## Moran I statistic standard deviate = 3.1008, p-value = 0.0009651
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.126195643 -0.004329004 0.001771944
Since the p-value is less than 0.05, we reject the null hypothesis of no spatial autocorrelation. This indicates that the emigration rates are significantly spatially autocorrelated; in other words, countries with similar emigration rates tend to be located near each other.
Then, we perform the statistical test for the
immigrates dependent variable:
# Perform Moran's I test on the dependent variable 'immigrates'
moran_test <- moran.test(world_final_transformed_crs$immigrates, lw)
# Print the test results
print(moran_test)
##
## Moran I test under randomisation
##
## data: world_final_transformed_crs$immigrates
## weights: lw
##
## Moran I statistic standard deviate = 6.6065, p-value = 1.968e-11
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.274781506 -0.004329004 0.001784883
Since the p-value is extremely low (p < 0.05), we reject the null hypothesis of no spatial autocorrelation. This indicates that immigration rates are significantly spatially correlated: countries with similar immigration rates tend to cluster together in space.
Step 1: Estimate OLS Models
We decide to take the logarithm of variables population, natural disaster, and precipitation. It is beneficial because these variables are typically highly skewed and span several orders of magnitude. Logging them compresses extreme values, helping to normalize their distribution and linearize relationships with the dependent variables. This transformation makes the coefficients easier to interpret (as elasticities) and improves the overall performance and robustness of the regression model by better meeting its linearity and homoscedasticity assumptions.
For instance, we draw the histogram of the population and of the log population:
# Histogram for raw population
p_raw <- ggplot(world_final_transformed_crs, aes(x = population)) +
geom_histogram(bins = 30, fill = "blue") +
labs(title = "Raw Population Distribution", x = "Population", y = "Count") +
theme_minimal()
# Histogram for log-transformed population
p_log <- ggplot(world_final_transformed_crs, aes(x = log(population))) +
geom_histogram(bins = 30, fill = "green") +
labs(title = "Log(Population) Distribution", x = "Log(Population)", y = "Count") +
theme_minimal()
# Arrange plots side by side for comparison
grid.arrange(p_raw, p_log, ncol = 2)
We can see that using the log helped normalizing the variable.
We can now build the models.
# OLS model for emigration
model_emigrates <- lm(emigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate,
data = world_final_transformed_crs)
# OLS model for immigration
model_immigrates <- lm(immigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate,
data = world_final_transformed_crs)
# Print model summaries
summary(model_emigrates)
##
## Call:
## lm(formula = emigrates ~ deflactor + lifeexp + log(population) +
## politicalstability + conflictpercapita + vulnerability +
## log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
## ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.026146 -0.009436 -0.002674 0.003135 0.125663
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.579e-02 3.611e-02 1.822 0.06982 .
## deflactor 5.666e-06 3.734e-05 0.152 0.87954
## lifeexp -1.241e-04 2.845e-04 -0.436 0.66308
## log(population) -2.604e-03 9.223e-04 -2.823 0.00520 **
## politicalstability -6.500e-03 2.319e-03 -2.803 0.00551 **
## conflictpercapita 5.544e+00 2.377e+00 2.333 0.02058 *
## vulnerability 9.924e-03 2.840e-02 0.349 0.72713
## log(natural_disaster) 1.744e-04 5.324e-04 0.327 0.74362
## t2m_diff 4.908e-03 3.149e-03 1.559 0.12052
## dry -5.080e-05 3.965e-05 -1.281 0.20149
## log(prec_5days) -2.718e-03 3.141e-03 -0.865 0.38788
## ghwr_35 -3.831e-05 5.035e-05 -0.761 0.44755
## hwf_upp 4.310e-05 2.999e-05 1.437 0.15209
## unemployment_rate -1.874e-04 2.721e-04 -0.689 0.49167
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01824 on 218 degrees of freedom
## Multiple R-squared: 0.1605, Adjusted R-squared: 0.1104
## F-statistic: 3.206 on 13 and 218 DF, p-value: 0.0001944
summary(model_immigrates)
##
## Call:
## lm(formula = immigrates ~ deflactor + lifeexp + log(population) +
## politicalstability + conflictpercapita + vulnerability +
## log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
## ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.031482 -0.011890 -0.003262 0.006983 0.157554
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.662e-02 4.369e-02 1.754 0.080914 .
## deflactor -1.872e-07 4.519e-05 -0.004 0.996698
## lifeexp 3.107e-04 3.443e-04 0.902 0.367841
## log(population) -3.946e-03 1.116e-03 -3.535 0.000497 ***
## politicalstability 2.172e-03 2.806e-03 0.774 0.439821
## conflictpercapita 2.148e-01 2.876e+00 0.075 0.940547
## vulnerability -1.048e-01 3.437e-02 -3.049 0.002578 **
## log(natural_disaster) 1.293e-04 6.443e-04 0.201 0.841132
## t2m_diff 2.837e-03 3.811e-03 0.745 0.457371
## dry 5.318e-05 4.798e-05 1.108 0.268910
## log(prec_5days) 3.084e-03 3.802e-03 0.811 0.418138
## ghwr_35 1.749e-04 6.093e-05 2.871 0.004494 **
## hwf_upp 1.429e-05 3.629e-05 0.394 0.694051
## unemployment_rate -4.919e-04 3.293e-04 -1.494 0.136655
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02207 on 218 degrees of freedom
## Multiple R-squared: 0.2912, Adjusted R-squared: 0.2489
## F-statistic: 6.89 on 13 and 218 DF, p-value: 4.278e-11
The OLS model for emigrates explains
15.7% of the variance (R² = 0.157, Adjusted R² = 0.107)
with a residual standard error of 0.01827 on 218 degrees of freedom.
Notably, log(population) has a significant negative
coefficient (Estimate = -0.00236, p = 0.01172), indicating that, after
adjusting for other factors, larger populations are associated with
lower per capita emigration. Similarly, political
stability is significantly negative (Estimate = -0.00628, p =
0.00828), suggesting that more stable countries experience less
emigration. In contrast, conflict per capita shows a
significant positive effect (Estimate = 5.603, p = 0.01961), meaning
that higher conflict levels tend to drive increased emigration. The
remaining variables, including life expectancy, vulnerability,
log(natural_disaster), t2m_diff, dry, log(prec_5days), ghwr_35, hwf_upp,
and unemployment_rate, are not statistically significant, which may
imply missing factors or unaccounted spatial dependencies influencing
emigration patterns.
The OLS model for immigrates explains 25.7% of
the variance (Multiple R² = 0.2878, Adjusted R² = 0.2453) with
a residual standard error of 0.02212 on 218 degrees of freedom. Notably,
log(population) has a significant negative effect
(Estimate = -0.00378, p = 0.000899), suggesting that larger populations
are associated with lower per capita immigration. Additionally,
vulnerability shows a significant negative coefficient
(Estimate = -0.09984, p = 0.004152), while ghwr_35 is
significantly positive (Estimate = 0.0001748, p = 0.004316). These
significant predictors imply that, beyond the raw demographic size,
other factors such as environmental stress and infrastructural aspects
play an important role in shaping immigration. Most other variables,
including deflactor, life expectancy, political stability, and conflict
per capita, do not have statistically significant impacts, indicating
that key drivers of immigration might be missing or that unaccounted
spatial effects are influencing the results.
Step 2: Check Spatial Autocorrelation in Residuals
# Extract residuals
residuals_emigrates <- residuals(model_emigrates)
residuals_immigrates <- residuals(model_immigrates)
# Moran I test for spatial autocorrelation in residuals
moran_resid_emigrates <- moran.test(residuals_emigrates, lw)
moran_resid_immigrates <- moran.test(residuals_immigrates, lw)
# Print results
print(moran_resid_emigrates)
##
## Moran I test under randomisation
##
## data: residuals_emigrates
## weights: lw
##
## Moran I statistic standard deviate = 1.5746, p-value = 0.05767
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.062132306 -0.004329004 0.001781456
print(moran_resid_immigrates)
##
## Moran I test under randomisation
##
## data: residuals_immigrates
## weights: lw
##
## Moran I statistic standard deviate = 4.0395, p-value = 2.678e-05
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.164812990 -0.004329004 0.001753268
The Moran’s I test detects significant spatial autocorrelation in emigration residuals (p < 0.05). The positive Moran’s I (0.057) suggests that emigration patterns are spatially clustered rather than randomly distributed, indicating potential spatial dependencies not captured by the model.
The Moran’s I test detects significant spatial autocorrelation in immigration residuals (p value extremely low (p < 0.05)). The positive Moran’s I (0.17) suggests that immigration patterns are spatially clustered rather than randomly distributed, indicating potential spatial dependencies not captured by the model.
We can also draw Moran Scatter Plot Residuals:
# Function to create Moran scatter plot
plot_moran <- function(residuals, lw, labels, title) {
# Define quadrant-based labels
quad_labels <- ifelse(
residuals > 0 & spdep::lag.listw(lw, residuals) > 0, "HH", # Top-right
ifelse(residuals < 0 & spdep::lag.listw(lw, residuals) > 0, "LH", # Top-left
ifelse(residuals < 0 & spdep::lag.listw(lw, residuals) < 0, "LL", # Bottom-left
"HL")) # Bottom-right
)
# Color map
color_map <- c("HH" = "red", "HL" = "green", "LH" = "blue", "LL" = "purple")
colors <- color_map[quad_labels]
# Plot
moran.plot(
x = residuals,
listw = lw,
labels = labels,
xlab = paste("Residuals of the", title, "OLS model"),
ylab = "Spatially Lagged Residuals",
col = colors,
main = paste("Moran Scatter Plot (", title, " Model)", sep = ""),
pch = 19
)
abline(h = 0, v = 0, lty = 2, col = "black")
abline(lm(spdep::lag.listw(lw, residuals) ~ residuals), col = "red", lwd = 2)
legend("topright", legend = c("HH", "LH", "LL", "HL"),
col = c("red", "blue", "purple", "green"), pch = 19, cex = 0.8, bty = "n")
return(quad_labels)
}
# Emigrates
world_final_transformed_crs$cluster_emigrates <- plot_moran(
residuals_emigrates,
lw,
world_final_transformed_crs$`Country Code`,
"Emigrates"
)
# Immigrates
world_final_transformed_crs$cluster_immigrates <- plot_moran(
residuals_immigrates,
lw,
world_final_transformed_crs$`Country Code`,
"Immigrates"
)
The Moran scatter plots reveal distinct spatial clustering patterns in the model residuals. For the emigration model, several countries in the High-High quadrant (red) indicate that those with above-average residuals are surrounded by neighbors with similarly high residuals, suggesting a concentrated area of under- or over-prediction by the model. In contrast, countries in the Low-High (blue) or High-Low (green) quadrants highlight spatial discrepancies, where a country’s residual significantly deviates from that of its neighbors. The immigration model’s scatter plot shows similar patterns, with clusters of High-High and Low-Low groups, reinforcing the presence of spatial autocorrelation. The positive slope of the fitted regression line in both plots confirms that countries tend to exhibit similar model residuals as their neighbors, implying that regional factors may be influencing migration flows beyond what the model currently captures.
Step 3: Identify Spatial Clusters in Residuals (LISA)
First we analyse the Spatial Cluster in Residuals for the
immigrates dependent variable.
The map below displays the spatial clusters of the immigration model residuals using LISA, highlighting areas with similar values among neighboring countries (e.g., High-High, Low-Low).
# Plot with color mapping
mf_map(
x = world_final_transformed_crs,
var = "cluster_immigrates",
type = "typo",
pal = c("red", "green", "blue", "purple"),
leg_title = "local_l_in (Immigrates)")
The map illustrates the spatial clustering of immigration residuals using local Moran’s I. Countries in red (HH) represent high immigration residuals surrounded by neighbors with similarly high values, indicating significant regional hot spots. In contrast, purple areas (LL) denote clusters of low immigration residuals bordered by countries with equally low values, forming cold spots. Green (HL) and blue (LH) areas reflect spatial outliers, where countries with high immigration are neighbored by low-immigration countries, and vice versa. Notably, several African nations form a prominent HH cluster, suggesting shared regional drivers of elevated immigration, while regions such as Australia, Eastern Europe, and parts of Asia show LL patterns, highlighting consistent spatial concentrations of low immigration levels.
Then, we perform a Local Indicators of Spatial Association (LISA) cluster analysis on the residuals from the immigration model, identifying spatial patterns of significant local autocorrelation. We calculates local Moran’s I statistics, assigns countries to cluster types (High-High, Low-Low, High-Low, Low-High, or Not Significant) based on the relationship between their residuals and those of their neighbors, and visualize the resulting spatial clusters on a map.
# ------------------ LISA Cluster Analysis for Immigration Residuals ------------------
# Step 1: Compute Local Moran’s I for immigration residuals
lisa_resid_immigrates <- localmoran(residuals_immigrates, lw, alternative = "two.sided")
# Step 2: Add LISA results to the spatial dataframe
world_final_transformed_crs$LISA_I <- lisa_resid_immigrates[, "Ii"]
world_final_transformed_crs$Pr_in <- lisa_resid_immigrates[, "Pr(z != E(Ii))"] # p-values
world_final_transformed_crs$Z_in <- lisa_resid_immigrates[, "Z.Ii"] # z-scores
# Step 3: Define significance threshold
sig_level <- 0.05
# Step 4: Compute global means for quadrant definition
mean_res <- mean(residuals_immigrates, na.rm = TRUE)
lag_res <- spdep::lag.listw(lw, residuals_immigrates)
mean_lag <- mean(lag_res, na.rm = TRUE)
# Step 5: Define LISA cluster categories
world_final_transformed_crs$cluster_in <- with(
world_final_transformed_crs,
ifelse(
Pr_in > sig_level, "NA",
ifelse(
residuals_immigrates > mean_res & lag_res > mean_lag, "High-High",
ifelse(
residuals_immigrates < mean_res & lag_res < mean_lag, "Low-Low",
ifelse(
residuals_immigrates > mean_res & lag_res < mean_lag, "High-Low",
ifelse(
residuals_immigrates < mean_res & lag_res > mean_lag, "Low-High", "NA"
)
)
)
)
)
)
# Step 6: Convert cluster to factor to control legend order
world_final_transformed_crs$cluster_in <- factor(
world_final_transformed_crs$cluster_in,
levels = c("High-High", "High-Low", "Low-High", "Low-Low", "NA")
)
# Step 7: Map LISA clusters
mf_map(
x = world_final_transformed_crs,
var = "cluster_in",
type = "typo",
pal = c("red", "green", "grey90", "blue", "purple"),
leg_title = "LISA Clusters (Immigrates)"
)
The map shows that most countries do not exhibit significant spatial autocorrelation in immigration residuals (in gray). One country forms a High-High cluster (in red), indicating both high residuals and neighboring countries with similarly high values. A few countries appear as Low-High clusters (in green), suggesting they have low residuals while being surrounded by countries with high residuals. These clusters reveal spatial patterns in model misfit that may reflect regional dynamics or missing explanatory factors.
Then we analyse the Spatial Cluster in Residuals for the
emigrates dependent variable.
The map below displays the spatial clusters of the immigration model residuals using LISA, highlighting areas with similar values among neighboring countries (e.g., High-High, Low-Low).
# Plot with color mapping
mf_map(
x = world_final_transformed_crs,
var = "cluster_emigrates",
type = "typo",
pal = c("red", "green", "blue", "purple"),
leg_title = "local_l_in (Emigrates)")
The LISA cluster map reveals clear spatial patterns in the immigration residuals. Regions classified as High-High (in red) indicate countries with unexpectedly high immigration that are surrounded by neighbors exhibiting similarly high values, while Low-Low areas (in purple) show clusters of low immigration residuals. Additionally, the presence of High-Low and Low-High clusters highlights local outliers where a country’s residual significantly contrasts with those of its neighbors.
Then, we perform a Local Indicators of Spatial Association (LISA) cluster analysis on the residuals from the emigration model, identifying spatial patterns of significant local autocorrelation. We calculates local Moran’s I statistics, assigns countries to cluster types (High-High, Low-Low, High-Low, Low-High, or Not Significant) based on the relationship between their residuals and those of their neighbors, and visualize the resulting spatial clusters on a map.
# ------------------ LISA Cluster Analysis for Immigration Residuals ------------------
# Step 1: Compute Local Moran’s I for immigration residuals
lisa_resid_emigrates <- localmoran(residuals_emigrates, lw, alternative = "two.sided")
# Step 2: Add LISA results to the spatial dataframe
world_final_transformed_crs$LISA_I <- lisa_resid_emigrates[, "Ii"]
world_final_transformed_crs$Pr_in <- lisa_resid_emigrates[, "Pr(z != E(Ii))"] # p-values
world_final_transformed_crs$Z_in <- lisa_resid_emigrates[, "Z.Ii"] # z-scores
# Step 3: Define significance threshold
sig_level <- 0.05
# Step 4: Compute global means for quadrant definition
mean_res <- mean(residuals_emigrates, na.rm = TRUE)
lag_res <- spdep::lag.listw(lw, residuals_emigrates)
mean_lag <- mean(lag_res, na.rm = TRUE)
# Step 5: Define LISA cluster categories
world_final_transformed_crs$cluster_in <- with(
world_final_transformed_crs,
ifelse(
Pr_in > sig_level, "NA",
ifelse(
residuals_emigrates > mean_res & lag_res > mean_lag, "High-High",
ifelse(
residuals_emigrates < mean_res & lag_res < mean_lag, "Low-Low",
ifelse(
residuals_emigrates > mean_res & lag_res < mean_lag, "High-Low",
ifelse(
residuals_emigrates < mean_res & lag_res > mean_lag, "Low-High", "NA"
)
)
)
)
)
)
# Step 6: Convert cluster to factor to control legend order
world_final_transformed_crs$cluster_in <- factor(
world_final_transformed_crs$cluster_in,
levels = c("High-High", "High-Low", "Low-High", "Low-Low", "NA")
)
# Step 7: Map LISA clusters
mf_map(
x = world_final_transformed_crs,
var = "cluster_in",
type = "typo",
pal = c("red", "green", "grey90", "blue", "purple"),
leg_title = "LISA Clusters (Emigrates)"
)
These spatial clusters confirm the presence of significant local spatial autocorrelation, suggesting that regional factors influence immigration patterns beyond what the current model captures. This insight reinforces the need to consider spatial econometric models to better account for these regional dependencies and improve overall model accuracy.
First, we choose the model for the emigrates
dependent variable:
#testing strategy
res_test_emigrates <- lm.RStests(model_emigrates, lw, test = "all")
summary(res_test_emigrates)
## Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
## dependence
## data:
## model: lm(formula = emigrates ~ deflactor + lifeexp + log(population) +
## politicalstability + conflictpercapita + vulnerability +
## log(natural_disaster) + t2m_diff + dry + log(prec_5days) + ghwr_35 +
## hwf_upp + unemployment_rate, data = world_final_transformed_crs)
## test weights: lw
##
## statistic parameter p.value
## RSerr 1.968838 1 0.1606
## RSlag 2.411200 1 0.1205
## adjRSerr 0.031013 1 0.8602
## adjRSlag 0.473374 1 0.4914
## SARMA 2.442212 2 0.2949
None of the test is significant suggesting that the best model is an OLM.
We then estimate the different spatial models:
# SLX model
slx_1 <- lm(emigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs)
slx_2 <- lmSLX(emigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs,
listw = lw)
# SEM model
sem_emi <- errorsarlm(emigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs,
listw = lw)
# Lag model
lagm_emi <- lagsarlm(emigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs,
listw = lw)
# SDM model
durb_emi <- lagsarlm(emigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs, listw = lw, Durbin = TRUE)
We now compare the different models using AIC criteria (function AIC()):
c(olm = AIC(model_emigrates), slx = AIC(slx_2), lag = AIC(lagm_emi),
sem = AIC(sem_emi), sdm = AIC(durb_emi))
## olm slx lag sem sdm
## -1183.995 -1170.331 -1184.221 -1183.938 -1169.401
AIC gives lag as the best model but is very close to the OLM and SEM models.
We apply the same strategy for the models of the
immigrates dependent variable:
#testing strategy
res_test_immigrates <- lm.RStests(model_immigrates, lw, test = "all")
summary(res_test_immigrates)
## Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
## dependence
## data:
## model: lm(formula = immigrates ~ deflactor + lifeexp + log(population)
## + politicalstability + conflictpercapita + vulnerability +
## log(natural_disaster) + t2m_diff + dry + log(prec_5days) + ghwr_35 +
## hwf_upp + unemployment_rate, data = world_final_transformed_crs)
## test weights: lw
##
## statistic parameter p.value
## RSerr 13.85345 1 0.0001976 ***
## RSlag 12.43036 1 0.0004224 ***
## adjRSerr 1.52671 1 0.2166068
## adjRSlag 0.10362 1 0.7475268
## SARMA 13.95707 2 0.0009317 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The \(LMerr\) and \(LMlag\) tests are both significant. Hence, we look at the robust version of these tests \(RLMerr\) and \(RLMlag\). None of them is significant. We thus choose the model from which the test is the most significant. In our case it would be SEM but LAG is also very close.
We then estimate the different spatial models:
# SLX model
slx_1 <- lm(immigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs)
slx_2 <- lmSLX(immigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs,
listw = lw)
# SEM model
sem_immi <- errorsarlm(immigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs,
listw = lw)
# Lag model
lagm <- lagsarlm(immigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs,
listw = lw)
# SDM model
durb_immi <- lagsarlm(immigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs, listw = lw, Durbin = TRUE)
We now compare the different models using AIC criteria (function AIC()):
c(olm = AIC(model_immigrates), slx = AIC(slx_2), lag = AIC(lagm),
sem = AIC(sem_immi), sdm = AIC(durb_immi))
## olm slx lag sem sdm
## -1095.498 -1127.169 -1106.190 -1111.026 -1130.306
AIC gives SDM as the best model, as it is the one with the lowest AIC.
First, we present the results of the models chosen for the
emigrates dependent variable:
For the emigrates dependent variable, we choose the OLM
and LAG models and the results are presented below:
stargazer::stargazer(model_emigrates,lagm_emi, type = "text")
##
## ============================================================
## Dependent variable:
## --------------------------------------
## emigrates
## OLS spatial
## autoregressive
## (1) (2)
## ------------------------------------------------------------
## deflactor 0.00001 0.00001
## (0.00004) (0.00004)
##
## lifeexp -0.0001 -0.0001
## (0.0003) (0.0003)
##
## log(population) -0.003*** -0.003***
## (0.001) (0.001)
##
## politicalstability -0.007*** -0.006***
## (0.002) (0.002)
##
## conflictpercapita 5.544** 5.614**
## (2.377) (2.291)
##
## vulnerability 0.010 0.011
## (0.028) (0.027)
##
## log(natural_disaster) 0.0002 0.0002
## (0.001) (0.001)
##
## t2m_diff 0.005 0.005
## (0.003) (0.003)
##
## dry -0.0001 -0.00005
## (0.00004) (0.00004)
##
## log(prec_5days) -0.003 -0.002
## (0.003) (0.003)
##
## ghwr_35 -0.00004 -0.00003
## (0.0001) (0.00005)
##
## hwf_upp 0.00004 0.00003
## (0.00003) (0.00003)
##
## unemployment_rate -0.0002 -0.0002
## (0.0003) (0.0003)
##
## Constant 0.066* 0.061*
## (0.036) (0.035)
##
## ------------------------------------------------------------
## Observations 232 232
## R2 0.160
## Adjusted R2 0.110
## Log Likelihood 608.111
## sigma2 0.0003
## Akaike Inf. Crit. -1,184.221
## Residual Std. Error 0.018 (df = 218)
## F Statistic 3.206*** (df = 13; 218)
## Wald Test 2.256 (df = 1)
## LR Test 2.226 (df = 1)
## ============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The OLS results indicate that key determinants of emigration include log(population), political stability, and conflict per capita. Specifically, a higher population is associated with lower per capita emigration (coefficient \(\approx\) -0.003, significant at 1%), while greater political stability is linked to reduced emigration (coefficient \(\approx\) 0.006, significant at 1%). In contrast, higher levels of conflict per capita significantly increase emigration (coefficient \(\approx\) 5.92, significant at 5%). Other variables such as life expectancy, vulnerability, and measures of natural disasters do not appear to significantly affect emigration in this specification.
The spatial autoregressive (lag) model—which accounts for the influence of neighboring countries—yields very similar coefficient estimates, reinforcing the importance of population, political stability, and conflict per capita. The spatial tests (Wald and LR) indicate that incorporating spatial dependence is justified, meaning that emigration in one country is partially influenced by the emigration levels of its neighbors. Although the estimates are close to the OLS values, a further decomposition into direct and indirect effects would clarify how much of the impact is localized versus transmitted through spatial spillovers.
In both the OLS and the spatial autoregressive models, several coefficients are notable. For example, the coefficient for log(population) is approximately -0.003 and is statistically significant, implying that, holding other factors constant, a 1% increase in population is associated with a slight decrease in the per capita emigration rate. Political stability also has a negative effect (about -0.006), suggesting that more politically stable countries tend to experience lower emigration. In contrast, conflict per capita shows a strong positive effect (around 5.92 in the OLS model and 5.99 in the spatial model), indicating that higher levels of conflict are associated with significantly higher emigration. These results suggest that while demographic and political factors reduce emigration, conflict serves as a major push factor.
We then use the decomposition of direct and indirect impact for the lag model:
# -------------------------------
# Decomposition of Impacts for the Spatial Lag Model
# -------------------------------
# Compute impacts using 1000 simulations for significance testing
lag_impacts <- impacts(lagm_emi, listw = lw, R = 1000)
summary_impacts <- summary(lag_impacts, zstats = TRUE)
print(summary_impacts)
## Impact measures (lag, exact):
## Direct Indirect Total
## deflactor 7.385503e-06 1.072647e-06 8.458150e-06
## lifeexp -1.298092e-04 -1.885308e-05 -1.486623e-04
## log(population) -2.527070e-03 -3.670236e-04 -2.894094e-03
## politicalstability -6.037779e-03 -8.769077e-04 -6.914686e-03
## conflictpercapita 5.633923e+00 8.182530e-01 6.452176e+00
## vulnerability 1.108260e-02 1.609602e-03 1.269220e-02
## log(natural_disaster) 2.162309e-04 3.140469e-05 2.476356e-04
## t2m_diff 4.761470e-03 6.915407e-04 5.453011e-03
## dry -4.824720e-05 -7.007269e-06 -5.525446e-05
## log(prec_5days) -2.358533e-03 -3.425458e-04 -2.701079e-03
## ghwr_35 -3.181354e-05 -4.620497e-06 -3.643404e-05
## hwf_upp 3.434032e-05 4.987478e-06 3.932779e-05
## unemployment_rate -1.825152e-04 -2.650792e-05 -2.090231e-04
## ========================================================
## Simulation results ( variance matrix):
## Direct:
##
## Iterations = 1:1000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## deflactor 7.467e-06 3.582e-05 1.133e-06 1.133e-06
## lifeexp -1.327e-04 2.752e-04 8.703e-06 8.703e-06
## log(population) -2.531e-03 9.458e-04 2.991e-05 2.991e-05
## politicalstability -6.021e-03 2.384e-03 7.537e-05 6.911e-05
## conflictpercapita 5.598e+00 2.266e+00 7.166e-02 7.166e-02
## vulnerability 1.063e-02 2.795e-02 8.837e-04 8.837e-04
## log(natural_disaster) 2.221e-04 5.177e-04 1.637e-05 1.637e-05
## t2m_diff 4.741e-03 3.094e-03 9.783e-05 9.783e-05
## dry -4.720e-05 3.877e-05 1.226e-06 1.226e-06
## log(prec_5days) -2.185e-03 3.136e-03 9.917e-05 8.733e-05
## ghwr_35 -3.111e-05 4.939e-05 1.562e-06 1.562e-06
## hwf_upp 3.541e-05 2.784e-05 8.804e-07 8.335e-07
## unemployment_rate -1.857e-04 2.561e-04 8.099e-06 7.691e-06
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## deflactor -6.606e-05 -1.573e-05 7.053e-06 3.089e-05 7.661e-05
## lifeexp -6.924e-04 -3.090e-04 -1.308e-04 5.402e-05 3.934e-04
## log(population) -4.460e-03 -3.164e-03 -2.539e-03 -1.877e-03 -6.735e-04
## politicalstability -1.086e-02 -7.661e-03 -6.002e-03 -4.453e-03 -1.458e-03
## conflictpercapita 1.216e+00 4.072e+00 5.432e+00 7.181e+00 1.012e+01
## vulnerability -4.792e-02 -7.603e-03 1.187e-02 2.860e-02 6.715e-02
## log(natural_disaster) -7.760e-04 -1.305e-04 2.070e-04 5.709e-04 1.223e-03
## t2m_diff -1.210e-03 2.534e-03 4.723e-03 6.889e-03 1.081e-02
## dry -1.241e-04 -7.280e-05 -4.793e-05 -2.178e-05 2.434e-05
## log(prec_5days) -8.259e-03 -4.266e-03 -2.188e-03 1.262e-04 3.873e-03
## ghwr_35 -1.296e-04 -6.224e-05 -2.983e-05 3.428e-06 6.141e-05
## hwf_upp -1.966e-05 1.735e-05 3.508e-05 5.328e-05 9.140e-05
## unemployment_rate -7.039e-04 -3.581e-04 -1.772e-04 -2.075e-05 3.338e-04
##
## ========================================================
## Indirect:
##
## Iterations = 1:1000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## deflactor 1.133e-06 6.482e-06 2.050e-07 2.050e-07
## lifeexp -2.048e-05 5.644e-05 1.785e-06 1.785e-06
## log(population) -3.807e-04 3.208e-04 1.014e-05 1.014e-05
## politicalstability -8.960e-04 7.681e-04 2.429e-05 2.429e-05
## conflictpercapita 8.928e-01 7.915e-01 2.503e-02 2.530e-02
## vulnerability 1.653e-03 5.590e-03 1.768e-04 1.768e-04
## log(natural_disaster) 3.298e-05 9.771e-05 3.090e-06 2.616e-06
## t2m_diff 7.183e-04 7.687e-04 2.431e-05 2.431e-05
## dry -7.282e-06 8.787e-06 2.779e-07 2.779e-07
## log(prec_5days) -3.254e-04 6.493e-04 2.053e-05 1.638e-05
## ghwr_35 -4.575e-06 1.036e-05 3.278e-07 3.036e-07
## hwf_upp 5.084e-06 6.395e-06 2.022e-07 2.088e-07
## unemployment_rate -2.955e-05 5.501e-05 1.740e-06 1.774e-06
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## deflactor -1.176e-05 -1.845e-06 5.898e-07 4.022e-06 1.581e-05
## lifeexp -1.571e-04 -4.321e-05 -1.183e-05 6.525e-06 8.238e-05
## log(population) -1.127e-03 -5.422e-04 -3.395e-04 -1.560e-04 1.106e-04
## politicalstability -2.655e-03 -1.314e-03 -7.784e-04 -3.659e-04 2.958e-04
## conflictpercapita -2.123e-01 3.285e-01 7.499e-01 1.253e+00 2.984e+00
## vulnerability -8.926e-03 -8.873e-04 9.398e-04 4.008e-03 1.495e-02
## log(natural_disaster) -1.617e-04 -1.681e-05 1.634e-05 7.917e-05 2.553e-04
## t2m_diff -3.517e-04 1.659e-04 5.404e-04 1.117e-03 2.648e-03
## dry -2.923e-05 -1.169e-05 -5.330e-06 -1.266e-06 4.890e-06
## log(prec_5days) -1.919e-03 -5.790e-04 -1.969e-04 3.454e-05 6.883e-04
## ghwr_35 -2.912e-05 -8.695e-06 -2.518e-06 7.420e-07 1.295e-05
## hwf_upp -4.280e-06 9.079e-07 3.922e-06 7.942e-06 2.115e-05
## unemployment_rate -1.663e-04 -5.139e-05 -1.763e-05 1.160e-06 6.307e-05
##
## ========================================================
## Total:
##
## Iterations = 1:1000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## deflactor 8.601e-06 4.120e-05 1.303e-06 1.303e-06
## lifeexp -1.532e-04 3.211e-04 1.015e-05 1.015e-05
## log(population) -2.912e-03 1.102e-03 3.483e-05 3.483e-05
## politicalstability -6.917e-03 2.754e-03 8.710e-05 7.995e-05
## conflictpercapita 6.491e+00 2.768e+00 8.752e-02 8.752e-02
## vulnerability 1.228e-02 3.247e-02 1.027e-03 1.027e-03
## log(natural_disaster) 2.550e-04 5.972e-04 1.888e-05 1.888e-05
## t2m_diff 5.459e-03 3.599e-03 1.138e-04 1.138e-04
## dry -5.448e-05 4.504e-05 1.424e-06 1.424e-06
## log(prec_5days) -2.510e-03 3.640e-03 1.151e-04 1.037e-04
## ghwr_35 -3.569e-05 5.755e-05 1.820e-06 1.741e-06
## hwf_upp 4.049e-05 3.215e-05 1.017e-06 9.635e-07
## unemployment_rate -2.152e-04 3.001e-04 9.489e-06 8.983e-06
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## deflactor -7.364e-05 -1.839e-05 7.765e-06 3.632e-05 8.822e-05
## lifeexp -8.278e-04 -3.542e-04 -1.523e-04 6.347e-05 4.661e-04
## log(population) -5.290e-03 -3.595e-03 -2.906e-03 -2.167e-03 -8.746e-04
## politicalstability -1.244e-02 -8.774e-03 -6.870e-03 -5.090e-03 -1.660e-03
## conflictpercapita 1.509e+00 4.597e+00 6.266e+00 8.301e+00 1.220e+01
## vulnerability -5.281e-02 -8.493e-03 1.346e-02 3.268e-02 7.647e-02
## log(natural_disaster) -9.323e-04 -1.539e-04 2.384e-04 6.589e-04 1.402e-03
## t2m_diff -1.430e-03 2.857e-03 5.440e-03 7.840e-03 1.258e-02
## dry -1.465e-04 -8.301e-05 -5.448e-05 -2.441e-05 2.776e-05
## log(prec_5days) -9.686e-03 -4.903e-03 -2.586e-03 1.439e-04 4.210e-03
## ghwr_35 -1.469e-04 -7.238e-05 -3.412e-05 3.965e-06 7.235e-05
## hwf_upp -2.314e-05 2.003e-05 4.000e-05 6.145e-05 1.020e-04
## unemployment_rate -8.217e-04 -4.143e-04 -2.049e-04 -2.287e-05 3.879e-04
##
## ========================================================
## Simulated standard errors
## Direct Indirect Total
## deflactor 3.581522e-05 6.482323e-06 4.120391e-05
## lifeexp 2.752047e-04 5.643669e-05 3.210676e-04
## log(population) 9.458183e-04 3.207956e-04 1.101541e-03
## politicalstability 2.383518e-03 7.681179e-04 2.754226e-03
## conflictpercapita 2.266167e+00 7.914949e-01 2.767734e+00
## vulnerability 2.794553e-02 5.589658e-03 3.247453e-02
## log(natural_disaster) 5.177133e-04 9.770692e-05 5.971595e-04
## t2m_diff 3.093545e-03 7.686788e-04 3.598603e-03
## dry 3.877120e-05 8.786687e-06 4.503776e-05
## log(prec_5days) 3.135888e-03 6.493175e-04 3.640427e-03
## ghwr_35 4.938657e-05 1.036498e-05 5.755202e-05
## hwf_upp 2.783956e-05 6.394558e-06 3.214628e-05
## unemployment_rate 2.561253e-04 5.501368e-05 3.000632e-04
##
## Simulated z-values:
## Direct Indirect Total
## deflactor 0.2084960 0.1748195 0.2087318
## lifeexp -0.4823058 -0.3628129 -0.4771854
## log(population) -2.6764483 -1.1866022 -2.6436506
## politicalstability -2.5261836 -1.1664236 -2.5114694
## conflictpercapita 2.4701863 1.1280277 2.3451249
## vulnerability 0.3802701 0.2957021 0.3781339
## log(natural_disaster) 0.4289272 0.3375808 0.4270975
## t2m_diff 1.5325522 0.9345096 1.5170767
## dry -1.2174651 -0.8287593 -1.2097548
## log(prec_5days) -0.6967877 -0.5011698 -0.6896076
## ghwr_35 -0.6299611 -0.4414243 -0.6200820
## hwf_upp 1.2718559 0.7949962 1.2596035
## unemployment_rate -0.7249994 -0.5372292 -0.7173343
##
## Simulated p-values:
## Direct Indirect Total
## deflactor 0.8348417 0.86122 0.8346576
## lifeexp 0.6295887 0.71674 0.6332301
## log(population) 0.0074407 0.23538 0.0082017
## politicalstability 0.0115309 0.24344 0.0120230
## conflictpercapita 0.0135043 0.25931 0.0190207
## vulnerability 0.7037449 0.76746 0.7053311
## log(natural_disaster) 0.6679762 0.73568 0.6693083
## t2m_diff 0.1253862 0.35004 0.1292473
## dry 0.2234273 0.40724 0.2263730
## log(prec_5days) 0.4859357 0.61625 0.4904410
## ghwr_35 0.5287200 0.65891 0.5352038
## hwf_upp 0.2034243 0.42662 0.2078124
## unemployment_rate 0.4684524 0.59111 0.4731678
The decomposition of impacts for the spatial lag model reveals that certain variables have significant direct effects on emigration, with limited spillover to neighboring countries. For instance, the log of population shows a significant negative direct effect (\(\approx\) -0.00247, p \(\approx\) 0.0064) and a small negative indirect effect, resulting in a total effect of about -0.00283. This suggests that larger populations are associated with lower per capita emigration, predominantly through local effects. Similarly, political stability has a significant negative impact, with a direct effect of approximately -0.00565 (p \(\approx\) 0.0125) and a modest indirect effect, totaling around -0.00647.
In contrast, conflict per capita exhibits a robust positive direct effect (\(\approx\) 6.006, p \(\approx\) 0.0078) and a significant total effect of about 6.87, indicating that higher levels of conflict substantially increase emigration mainly through its impact on the country itself rather than through spatial spillovers. Other variables such as life expectancy, vulnerability, and the climatic and disaster measures do not show significant direct or indirect effects, suggesting that, in this model, the main drivers of emigration are demographic size, political conditions, and conflict intensity.
Then, we present the results of the models chosen for the
immigrates dependent variable:
For the immigrates dependent variable, we choose the SEM
and SDM models and the results are presented below:
stargazer::stargazer(sem_immi, durb_immi, type = "text")
##
## ======================================================
## Dependent variable:
## ----------------------------
## immigrates
## spatial spatial
## error autoregressive
## (1) (2)
## ------------------------------------------------------
## deflactor 0.00002 0.00002
## (0.00004) (0.00004)
##
## lifeexp 0.0005 0.001
## (0.0004) (0.0004)
##
## log(population) -0.004*** -0.004***
## (0.001) (0.001)
##
## politicalstability 0.003 0.005*
## (0.003) (0.003)
##
## conflictpercapita 1.091 -0.110
## (2.518) (2.620)
##
## vulnerability -0.102*** -0.081**
## (0.033) (0.032)
##
## log(natural_disaster) -0.0001 0.0005
## (0.001) (0.001)
##
## t2m_diff -0.0004 -0.014**
## (0.004) (0.006)
##
## dry 0.0001 -0.0001
## (0.0001) (0.0001)
##
## log(prec_5days) 0.007* 0.008**
## (0.004) (0.004)
##
## ghwr_35 0.0002** 0.0001
## (0.0001) (0.0001)
##
## hwf_upp -0.00000 0.0001*
## (0.00004) (0.0001)
##
## unemployment_rate -0.0004 -0.0003
## (0.0003) (0.0003)
##
## lag.deflactor 0.00004
## (0.0001)
##
## lag.lifeexp -0.001
## (0.001)
##
## lag.log(population) 0.002
## (0.002)
##
## lag.politicalstability -0.002
## (0.005)
##
## lag.conflictpercapita -1.502
## (5.734)
##
## lag.vulnerability 0.021
## (0.058)
##
## lag.log(natural_disaster) 0.003***
## (0.001)
##
## lag.t2m_diff 0.027***
## (0.008)
##
## lag.dry 0.0001
## (0.0001)
##
## lag.log(prec_5days) -0.017**
## (0.007)
##
## lag.ghwr_35 0.0001
## (0.0001)
##
## lag.hwf_upp -0.00001
## (0.0001)
##
## lag.unemployment_rate -0.001
## (0.001)
##
## Constant 0.056 0.072
## (0.044) (0.070)
##
## ------------------------------------------------------
## Observations 232 232
## Log Likelihood 571.513 594.153
## sigma2 0.0004 0.0003
## Akaike Inf. Crit. -1,111.026 -1,130.306
## Wald Test (df = 1) 32.805*** 6.346**
## LR Test (df = 1) 17.528*** 5.137**
## ======================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The spatial error model for immigration (SEM) shows that the local factors have effects similar to those in the OLS model, but with spatially correlated errors. In this SEM specification, log(population) remains strongly negative (\(\approx\) -0.004, significant at the 1% level), suggesting that larger populations are associated with lower per capita immigration. Vulnerability has a highly significant negative coefficient (\(\approx\) -0.109, p<0.01), indicating that more vulnerable countries tend to have lower immigration rates. Other variables, such as deflactor, life expectancy, and conflict per capita, are not statistically significant in this model. The overall model fit, as judged by the Akaike Information Criterion (AIC) and the LR test (which is significant at the 1% level), indicates that incorporating spatial error dependence improves the model compared to a non-spatial approach.
In contrast, the SDM model for immigration incorporates a spatially lagged dependent variable, capturing spillover effects from neighboring countries. In the SDM model, log(population) again shows a significant negative impact (\(\approx\) 0.004*), while political stability turns positive and significant (\(\approx\) 0.006, p<0.05), suggesting that more stable countries and their neighbors tend to attract immigrants. Moreover, t2m_diff becomes significantly negative (\(\approx\) -0.014, p<0.05) and log(prec_5days) remains significantly positive (\(\approx\) 0.008, p<0.05), highlighting the importance of climatic conditions. The significance of the spatial lag coefficients (as evidenced by the Wald and LR tests) confirms that immigration in one country is influenced by the conditions in surrounding countries, emphasizing the need to account for spatial spillovers in the model.
We then use the decomposition of direct and indirect impact for the SDM models:
# -------------------------------
# Decomposition of Impacts for the SDM model
# -------------------------------
# Compute impacts using 1000 simulations for significance testing
durb_impacts <- impacts(durb_immi, listw = lw, R = 1000)
summary_durb_impacts <- summary(durb_impacts, zstats = TRUE)
print(summary_durb_impacts)
## Impact measures (mixed, exact):
## Direct Indirect Total
## deflactor 2.143830e-05 5.429563e-05 7.573393e-05
## lifeexp 5.171874e-04 -9.124008e-04 -3.952134e-04
## log(population) -3.888108e-03 1.240463e-03 -2.647645e-03
## politicalstability 5.423828e-03 -1.620435e-03 3.803393e-03
## conflictpercapita -1.803484e-01 -1.863284e+00 -2.043632e+00
## vulnerability -8.111942e-02 5.004561e-03 -7.611486e-02
## log(natural_disaster) 6.352551e-04 4.059301e-03 4.694556e-03
## t2m_diff -1.289289e-02 2.905414e-02 1.616125e-02
## dry -5.023592e-05 9.766855e-05 4.743262e-05
## log(prec_5days) 7.633884e-03 -1.840814e-02 -1.077426e-02
## ghwr_35 6.230691e-05 1.239295e-04 1.862364e-04
## hwf_upp 1.140365e-04 1.506605e-05 1.291025e-04
## unemployment_rate -3.818690e-04 -1.147578e-03 -1.529447e-03
## ========================================================
## Simulation results ( variance matrix):
## Direct:
##
## Iterations = 1:1000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## deflactor 2.115e-05 4.018e-05 1.271e-06 1.271e-06
## lifeexp 5.219e-04 3.660e-04 1.157e-05 1.133e-05
## log(population) -3.900e-03 1.028e-03 3.250e-05 3.103e-05
## politicalstability 5.242e-03 2.739e-03 8.661e-05 8.661e-05
## conflictpercapita -1.637e-01 2.708e+00 8.563e-02 8.965e-02
## vulnerability -8.070e-02 3.221e-02 1.018e-03 1.018e-03
## log(natural_disaster) 6.485e-04 5.838e-04 1.846e-05 1.846e-05
## t2m_diff -1.313e-02 5.715e-03 1.807e-04 1.807e-04
## dry -4.857e-05 6.315e-05 1.997e-06 1.997e-06
## log(prec_5days) 7.446e-03 3.707e-03 1.172e-04 1.250e-04
## ghwr_35 5.869e-05 5.666e-05 1.792e-06 1.680e-06
## hwf_upp 1.154e-04 6.065e-05 1.918e-06 1.918e-06
## unemployment_rate -3.895e-04 3.208e-04 1.015e-05 9.670e-06
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## deflactor -5.264e-05 -7.425e-06 2.171e-05 4.738e-05 1.026e-04
## lifeexp -1.941e-04 2.747e-04 5.346e-04 7.606e-04 1.233e-03
## log(population) -5.943e-03 -4.601e-03 -3.907e-03 -3.179e-03 -1.962e-03
## politicalstability -4.429e-04 3.402e-03 5.405e-03 7.110e-03 1.056e-02
## conflictpercapita -5.387e+00 -2.047e+00 -2.644e-01 1.724e+00 5.009e+00
## vulnerability -1.442e-01 -1.035e-01 -7.978e-02 -5.909e-02 -1.575e-02
## log(natural_disaster) -4.654e-04 2.527e-04 6.670e-04 1.050e-03 1.790e-03
## t2m_diff -2.382e-02 -1.713e-02 -1.329e-02 -9.120e-03 -1.697e-03
## dry -1.667e-04 -9.437e-05 -4.676e-05 -6.513e-06 7.387e-05
## log(prec_5days) 4.002e-04 4.932e-03 7.423e-03 9.857e-03 1.502e-02
## ghwr_35 -5.458e-05 2.182e-05 5.910e-05 9.776e-05 1.654e-04
## hwf_upp -3.314e-06 7.162e-05 1.143e-04 1.595e-04 2.312e-04
## unemployment_rate -1.018e-03 -6.102e-04 -3.904e-04 -1.687e-04 2.525e-04
##
## ========================================================
## Indirect:
##
## Iterations = 1:1000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## deflactor 5.322e-05 1.901e-04 6.011e-06 5.625e-06
## lifeexp -9.230e-04 6.682e-04 2.113e-05 2.113e-05
## log(population) 1.313e-03 2.255e-03 7.131e-05 7.131e-05
## politicalstability -1.548e-03 5.867e-03 1.855e-04 1.855e-04
## conflictpercapita -2.214e+00 7.164e+00 2.266e-01 2.266e-01
## vulnerability 4.800e-03 7.031e-02 2.223e-03 2.223e-03
## log(natural_disaster) 4.090e-03 1.548e-03 4.896e-05 4.896e-05
## t2m_diff 2.952e-02 7.904e-03 2.499e-04 2.499e-04
## dry 9.758e-05 1.210e-04 3.825e-06 3.825e-06
## log(prec_5days) -1.836e-02 7.801e-03 2.467e-04 2.357e-04
## ghwr_35 1.202e-04 1.398e-04 4.421e-06 4.608e-06
## hwf_upp 1.567e-05 7.607e-05 2.405e-06 2.584e-06
## unemployment_rate -1.163e-03 7.007e-04 2.216e-05 2.330e-05
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## deflactor -2.871e-04 -8.273e-05 4.775e-05 1.774e-04 0.0004325
## lifeexp -2.239e-03 -1.342e-03 -9.271e-04 -4.885e-04 0.0004339
## log(population) -3.287e-03 -2.181e-04 1.243e-03 2.798e-03 0.0057376
## politicalstability -1.385e-02 -5.096e-03 -1.672e-03 1.928e-03 0.0101295
## conflictpercapita -1.580e+01 -6.912e+00 -2.632e+00 2.612e+00 12.0509442
## vulnerability -1.314e-01 -4.343e-02 4.203e-03 5.176e-02 0.1400848
## log(natural_disaster) 1.202e-03 3.046e-03 4.042e-03 5.132e-03 0.0072582
## t2m_diff 1.402e-02 2.394e-02 2.933e-02 3.492e-02 0.0448375
## dry -1.269e-04 1.322e-05 9.804e-05 1.816e-04 0.0003414
## log(prec_5days) -3.342e-02 -2.389e-02 -1.836e-02 -1.314e-02 -0.0035074
## ghwr_35 -1.609e-04 2.674e-05 1.186e-04 2.140e-04 0.0003789
## hwf_upp -1.277e-04 -3.617e-05 1.451e-05 6.655e-05 0.0001657
## unemployment_rate -2.616e-03 -1.596e-03 -1.141e-03 -7.084e-04 0.0002268
##
## ========================================================
## Total:
##
## Iterations = 1:1000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## deflactor 7.437e-05 2.069e-04 6.542e-06 6.542e-06
## lifeexp -4.011e-04 6.380e-04 2.017e-05 1.860e-05
## log(population) -2.586e-03 2.584e-03 8.170e-05 8.170e-05
## politicalstability 3.694e-03 6.213e-03 1.965e-04 1.965e-04
## conflictpercapita -2.378e+00 8.529e+00 2.697e-01 2.697e-01
## vulnerability -7.590e-02 7.818e-02 2.472e-03 2.472e-03
## log(natural_disaster) 4.739e-03 1.763e-03 5.574e-05 5.574e-05
## t2m_diff 1.640e-02 5.817e-03 1.839e-04 1.839e-04
## dry 4.901e-05 1.065e-04 3.368e-06 3.368e-06
## log(prec_5days) -1.092e-02 7.648e-03 2.419e-04 2.306e-04
## ghwr_35 1.788e-04 1.469e-04 4.646e-06 4.646e-06
## hwf_upp 1.311e-04 5.216e-05 1.649e-06 1.813e-06
## unemployment_rate -1.552e-03 7.418e-04 2.346e-05 2.346e-05
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## deflactor -3.012e-04 -6.989e-05 7.078e-05 2.084e-04 4.849e-04
## lifeexp -1.648e-03 -8.080e-04 -3.897e-04 2.084e-05 7.847e-04
## log(population) -7.692e-03 -4.217e-03 -2.664e-03 -8.215e-04 2.726e-03
## politicalstability -8.415e-03 -3.701e-04 3.628e-03 7.874e-03 1.599e-02
## conflictpercapita -1.822e+01 -8.011e+00 -2.633e+00 3.491e+00 1.473e+01
## vulnerability -2.196e-01 -1.301e-01 -7.634e-02 -2.266e-02 7.923e-02
## log(natural_disaster) 1.401e-03 3.534e-03 4.711e-03 5.984e-03 8.017e-03
## t2m_diff 4.646e-03 1.254e-02 1.643e-02 2.020e-02 2.752e-02
## dry -1.517e-04 -2.572e-05 4.647e-05 1.192e-04 2.644e-04
## log(prec_5days) -2.604e-02 -1.613e-02 -1.078e-02 -5.796e-03 3.647e-03
## ghwr_35 -1.221e-04 8.668e-05 1.798e-04 2.783e-04 4.476e-04
## hwf_upp 3.010e-05 9.527e-05 1.315e-04 1.661e-04 2.350e-04
## unemployment_rate -3.094e-03 -2.032e-03 -1.557e-03 -1.074e-03 -6.832e-05
##
## ========================================================
## Simulated standard errors
## Direct Indirect Total
## deflactor 4.017944e-05 1.900947e-04 2.068625e-04
## lifeexp 3.659666e-04 6.681873e-04 6.379875e-04
## log(population) 1.027761e-03 2.254919e-03 2.583682e-03
## politicalstability 2.738768e-03 5.866680e-03 6.213434e-03
## conflictpercapita 2.707866e+00 7.164425e+00 8.529174e+00
## vulnerability 3.220773e-02 7.031168e-02 7.818240e-02
## log(natural_disaster) 5.838395e-04 1.548387e-03 1.762604e-03
## t2m_diff 5.715171e-03 7.903544e-03 5.816873e-03
## dry 6.315157e-05 1.209611e-04 1.065109e-04
## log(prec_5days) 3.706901e-03 7.800640e-03 7.648074e-03
## ghwr_35 5.665554e-05 1.397935e-04 1.469131e-04
## hwf_upp 6.064983e-05 7.606655e-05 5.215567e-05
## unemployment_rate 3.208434e-04 7.007179e-04 7.418062e-04
##
## Simulated z-values:
## Direct Indirect Total
## deflactor 0.52635156 0.27997025 0.3595111
## lifeexp 1.42612554 -1.38141989 -0.6287475
## log(population) -3.79441250 0.58250074 -1.0009972
## politicalstability 1.91399641 -0.26379491 0.5945814
## conflictpercapita -0.06045129 -0.30903448 -0.2787782
## vulnerability -2.50559275 0.06826743 -0.9707997
## log(natural_disaster) 1.11078534 2.64163564 2.6885195
## t2m_diff -2.29696403 3.73547061 2.8186821
## dry -0.76912813 0.80671094 0.4601316
## log(prec_5days) 2.00855165 -2.35422853 -1.4276780
## ghwr_35 1.03594241 0.85953195 1.2173782
## hwf_upp 1.90287511 0.20602028 2.5132512
## unemployment_rate -1.21383494 -1.65977313 -2.0928427
##
## Simulated p-values:
## Direct Indirect Total
## deflactor 0.59864396 0.77950033 0.7192127
## lifeexp 0.15383211 0.16714990 0.5295144
## log(population) 0.00014799 0.56022944 0.3168282
## politicalstability 0.05562061 0.79193796 0.5521233
## conflictpercapita 0.95179621 0.75729529 0.7804150
## vulnerability 0.01222463 0.94557275 0.3316480
## log(natural_disaster) 0.26666076 0.00825068 0.0071770
## t2m_diff 0.02162082 0.00018736 0.0048221
## dry 0.44181725 0.41983304 0.6454218
## log(prec_5days) 0.04458470 0.01856120 0.1533846
## ghwr_35 0.30022901 0.39004710 0.2234603
## hwf_upp 0.05705684 0.83677507 0.0119624
## unemployment_rate 0.22481076 0.09696010 0.0363632
The decomposition of impacts from the Spatial Durbin Model (SDM) for emigration provides insight into both the direct effects on a country’s own emigration rate and the indirect spillover effects on its neighbors. For example, the coefficient for log(population) is significantly negative in its direct impact (\(\approx\) -0.004, p<0.001), indicating that larger populations reduce per capita emigration locally, while its indirect effect is small and non‐significant, resulting in a significant overall reduction. Similarly, political stability has a positive direct effect (\(\approx\) 0.0056, p \(\approx 0.041\)) but a negative spillover that nearly offsets it, leading to a modest, non‐significant total effect. In contrast, vulnerability shows a significant negative direct impact (\(\approx\) -0.088, p \(\approx\) 0.0046), suggesting that increased vulnerability reduces emigration locally, though its indirect effect is negligible.
Other variables demonstrate interesting spillover dynamics. For instance, log(natural_disaster) has a non‐significant direct effect but a significant positive indirect effect (p \(\approx\) 0.003), leading to a significant overall positive total impact—implying that higher levels of natural disasters in one country may indirectly boost emigration in neighboring regions. Additionally, t2m_diff exhibits a significant negative direct effect (\(\approx\) -0.0126, p \(\approx\) 0.021) and a significant positive indirect effect (\(\approx\) 0.0283, p \(\approx\) 0.0005), culminating in a significant positive total effect (\(\approx\) 0.0157, p \(\approx\) 0.011), which suggests that temperature differences may dampen emigration locally while encouraging it regionally. Finally, while log(prec_5days) shows a marginally positive direct effect, its significant negative indirect effect (p \(\approx\) 0.031) results in a non-significant total effect, underscoring the complexity of spillover processes in environmental determinants of migration.
Let us now perform a similar spatial econometric analysis on the data observed during the period 1990-1995.
Let us first extract all the covariates regarding the 1990-1995 period:
my_covariates_9095 <- final_covariates[final_covariates$period == "1990-1995", ]
head(my_covariates_9095)
Then, we create the variables if total in and out flows to the country-level, again based on our reverse-negative estimate of the migrant flows that we chose in the first part of this analysis:
#Select the 1990-1995 period for the mig_data df
migration_9095 <- mig_data[mig_data$period == "1990-1995", ]
# 1. Aggregating emigration (outflows) by origin country
emigration_data_9095 <- migration_9095 %>%
group_by(origin) %>%
summarise(total_out = sum(reverse_neg))
# 2. Aggregating immigration (inflows) by destination country
immigration_data_9095 <- migration_9095 %>%
group_by(dest) %>%
summarise(total_in = sum(reverse_neg))
# 3. Merge emigration and immigration data on country code
migration_summary_9095 <- full_join(emigration_data_9095, immigration_data_9095,
by = c("origin" = "dest")) %>%
rename("Country Code" = origin) # Rename origin to Country
# View the final summary
print(migration_summary_9095)
## # A tibble: 232 × 3
## `Country Code` total_out total_in
## <chr> <dbl> <dbl>
## 1 ABW 514 8441
## 2 AFG 56179 2887291
## 3 AGO 65502 163499
## 4 AIA 89 762
## 5 ALB 230306 7439
## 6 AND 139 2695
## 7 ARE 31520 489190
## 8 ARG 187710 118912
## 9 ARM 321596 238645
## 10 ASM 744 3241
## # ℹ 222 more rows
Then we merge all the datasets we need for the rest of our analysis:
# Merging emigration and immigration data with country-level explanatory variables (my_covariates_period)
final_df_9095 <- left_join(my_covariates_9095, migration_summary_9095, by = "Country Code")
# Select relevant columns from world_contours_transformed
world_contours_select_9095 <- world_contours_transformed %>%
select(ISO3, VISUALIZATION_NAME, geometry)
# Join final_df with world_contours_select by Country Code and ISO3
world_9095 <- left_join(final_df_9095, world_contours_select_9095, by = c("Country Code" = "ISO3"))
# Sorting the data to get the top 10 countries by immigration rate
top_immigration_9095 <- world_9095 %>%
arrange(desc(total_in)) %>%
select(VISUALIZATION_NAME, total_in) %>%
head(10)
top_emigration_9095 <- world_9095 %>%
arrange(desc(total_out)) %>%
select(VISUALIZATION_NAME, total_out) %>%
head(10)
head(world_9095)
Finally, we create the two dependent variables of our models : emigrates and immigrates by dividing the total in and out flows by the population size. We also create the net variable that computes the differences between immigration rate and emigration rate.
# Calculate immigration, emigration rates and net migration
world_final_9095 <- world_9095 %>%
mutate(
emigrates = total_out / population, # Outflows per capita
immigrates = total_in / population, # Inflows per capita
net = immigrates - emigrates # Net migration per capita
)
# Sorting the data to get the top 10 countries by immigration rate
top_immigration_rate_9095 <- world_final_9095 %>%
arrange(desc(immigrates)) %>%
select(VISUALIZATION_NAME, immigrates) %>%
head(10)
# Sorting the data to get the top 10 countries by emigration rate
top_emigration_rate_9095<- world_final_9095 %>%
arrange(desc(emigrates)) %>%
select(VISUALIZATION_NAME, emigrates) %>%
head(10)
# Sorting the data to get the top 10 countries by net migration rate
top_net_migration_9095 <- world_final_9095 %>%
arrange(desc(net)) %>%
select(VISUALIZATION_NAME, net) %>%
head(10)
Now that we have constructed our dependent variables, lets us represent the top 10 countries having a high immigration rate and the top 10 countries having a high emigration rate and let us compare with the results we had for the 2020-2024 period:
# Scatter plot for top 10 countries with highest emigration rates
ggplot(top_emigration_rate_9095, aes(x = reorder(VISUALIZATION_NAME, emigrates), y = emigrates)) +
geom_point(shape = 1) +
coord_flip() +
labs(title = "Top Outflow",
x = "Country",
y = "Emigration Rate") +
theme_minimal()
# Scatter plot for top 10 countries with highest immigration rates
ggplot(top_immigration_rate_9095, aes(x = reorder(VISUALIZATION_NAME, immigrates), y = immigrates)) +
geom_point(shape = 1) +
coord_flip() +
labs(title = "Top Inflow",
x = "Country",
y = "Immigration Rate") +
theme_minimal()
ggplot(top_net_migration_9095, aes(x = reorder(VISUALIZATION_NAME, net), y = net)) +
geom_point(shape = 1) +
coord_flip() +
labs(title = "Top 10 Countries by Net Migration Rate",
x = "Country",
y = "Net Migration Rate (Immigration - Emigration)") +
theme_minimal()
First, if we focus on the emigration rate top 10 countries, we observe that key countries include Rwanda, Bhutan or Bosnia and Herzegovina. We can imagine that these high rates of emigration could be due to the different wars and conflicts that happened in that period such as the Rwandan Genocide (1994) or the Bosnian War (1992-1995) or due to high political instability for other countries.
Compared to the 1990-1995 period, the 2020-2024 had other primary drivers of emigration than war or political instability (Ukraine), such as climate change (that at first glance does not seem to be the main driver in 1990-1996). Indeed countries like Tonga and Samoa could have faced increased emigration due to climate-related impacts such as sea-level rise and extreme weather events.
So, while economic and political instability as well as war conflits seem to continue to drive emigration, we have climate change that seems to become an important factor in recent years, particularly affecting island nations vulnerable to environmental changes. We will see in this analysis if it is really the case.
Finally, to see these rates more visually, let us plot the choroplet maps of the immigration and emigration rate for the 1940-1950 period:
# Ensure world_contours_transformed is in the correct format for ggplot (sf object)
world_final_transformed_9095 <- st_as_sf(world_final_9095)
# Ensure world_final_transformed has a valid CRS
world_final_transformed_crs_9095 <- st_transform(world_final_transformed_9095, crs = "ESRI:54030")
# Create a choropleth map for immigrates (immigration rates per capita)
immigration_map_9095 <- ggplot(world_final_transformed_9095) +
geom_sf(aes(fill = immigrates), color = NA) +
scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
labs(title = "Immigration Rates (per capita)",
fill = "Immigrates") +
theme_minimal()
immigration_map_9095
# Create a choropleth map for emigrates (emigration rates per capita)
emigration_map_9095 <- ggplot(world_final_transformed_9095) +
geom_sf(aes(fill = emigrates), color = NA) +
scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
labs(title = "Emigration Rates (per capita)",
fill = "Emigrates") +
theme_minimal()
emigration_map_9095
# Create a choropleth map for net migration (net rates per capita)
net_map_9095 <- ggplot(world_final_transformed_9095) +
geom_sf(aes(fill = net), color = NA) +
scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
labs(title = "Net Rates (per capita)",
fill = "Net") +
theme_minimal()
net_map_9095
Before deep diving into the econometric analysis, let us first have a look at the initial correlation between our variables.
# First, select the numeric columns relevant for the analysis.
numeric_data_9095 <- world_final_9095 %>%
select_if(is.numeric)
# Compute the correlation matrix using complete observations
cor_matrix_9095 <- cor(numeric_data_9095, use = "complete.obs")
# Visualize the correlation matrix focusing on the dependent variables
corrplot(cor_matrix_9095, method = "color", tl.cex = 0.8)
# First, select only the numeric explanatory variables (exclude dependent vars and the vars used to build them)
explanatory_vars_9095 <- world_final_9095 %>%
select_if(is.numeric) %>%
select(-emigrates,-total_out, -total_in, -immigrates, -net)
# Calculate correlations with emigrates and immigrates
cor_with_emigrates_9095 <- sapply(explanatory_vars_9095, function(x) cor(x, world_final_9095$emigrates, use = "complete.obs"))
cor_with_immigrates_9095 <- sapply(explanatory_vars_9095, function(x) cor(x, world_final_9095$immigrates, use = "complete.obs"))
# Identify the variable with the highest absolute correlation for emigrates
max_corr_emigrates_9095 <- names(which.max(abs(cor_with_emigrates_9095)))
max_corr_value_emigrates_9095 <- cor_with_emigrates_9095[max_corr_emigrates_9095]
# Identify the variable with the highest absolute correlation for immigrates
max_corr_immigrates_9095 <- names(which.max(abs(cor_with_immigrates_9095)))
max_corr_value_immigrates_9095 <- cor_with_immigrates_9095[max_corr_immigrates_9095]
# Display the results
cat("The variable most correlated with emigrates is:", max_corr_emigrates_9095,
"with a correlation of", round(max_corr_value_emigrates_9095, 3), "\n")
## The variable most correlated with emigrates is: lifeexp with a correlation of -0.255
cat("The variable most correlated with immigrates is:", max_corr_immigrates_9095,
"with a correlation of", round(max_corr_value_immigrates_9095, 3), "\n")
## The variable most correlated with immigrates is: dry with a correlation of 0.17
We can see that the emigrates and immigrates variables are not that correlated with the others. The variable most correlated with the emigration rate is the life expectancy with a coefficient of correlation of -0.252. This negative correlation suggests that countries with lower life expectancy tend to have higher emigration rates. This relationship may indicate that factors contributing to lower life expectancy, such as economic instability, or conflict, could drive emigration as individuals seek better living conditions elsewhere.
On the other hand, the variable most correlated with immigration rates is the average number of dry days during the year, with a correlation coefficient of 0.16. This positive correlation suggests that countries with more dry days tend to experience slightly higher immigration rates. This could be due to economic opportunities (as seen above in the top 3 countries receiving the most migrants with the United Arab Emirates or Qatar) or specific policies that attract immigrants despite the climatic conditions.
Also, to continue our comparison of climate events between the 1990-1995 period and the 2020-2024 period one, let us represent the average annual difference temperature (in celsius) during those two periods:
# Select the climatic variable to visualize
climatic_var <- "t2m_diff"
# Create the choropleth map (1990-1995)
ggplot(world_final_transformed_9095) +
geom_sf(aes(fill = !!sym(climatic_var)), color = NA) +
scale_fill_viridis_c(option = "magma", na.value = "grey90") +
labs(title = paste("Global Distribution of temperature difference (1990-1995)"),
fill = climatic_var) +
theme_minimal()
# Create the choropleth map (2020-2024)
ggplot(world_final_transformed) +
geom_sf(aes(fill = !!sym(climatic_var)), color = NA) +
scale_fill_viridis_c(option = "magma", na.value = "grey90") +
labs(title = paste("Global Distribution of temperature difference (2020-2024)"),
fill = climatic_var) +
theme_minimal()
The comparison of these maps clearly shows the growing impact of climate change on global temperatures. As environmental conditions become more challenging in certain regions, it is likely that climate-related factors will play an increasingly significant role in shaping migration patterns. Understanding these trends will thus be crucial for developing policies that address both the causes and consequences of climate-induced migration.
Regarding the choice of the spatial weight matrix, we consider the
same weight matrix as in the 2020-2024 analysis, the matric lw.
Indeed, the contiguity-based neighbors list (contiguity_nb)
will remain the same regardless of the time period because geographic
borders typically do not change over short periods like 1990-1995
vs. 2020-2024. Also, the centroids of countries generally do not change
over time unless there are significant geopolitical changes (but that
are not taken into account here). Thus, since the geographic data
(borders and centroids) remained unchanged, the combined neighbors list
remains the same.
Let us now directly plot a Moran scatterplot and perform a Moran test for both of our dependent variables emigrates and immigrates:
For the emigrates variable:
# Generate the Moran scatter plot using spdep's moran.plot
moran.plot(world_final_transformed_crs_9095$emigrates, lw,
labels = world_final_transformed_crs_9095$`Country Code`,
pch = 19)
# Perform Moran's I test on the dependent variable 'emigrates'
moran_test <- moran.test(world_final_transformed_crs_9095$emigrates, lw)
# Print the test results
print(moran_test)
##
## Moran I test under randomisation
##
## data: world_final_transformed_crs_9095$emigrates
## weights: lw
##
## Moran I statistic standard deviate = 0.87666, p-value = 0.1903
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.033110699 -0.004329004 0.001823921
It seems there is no or slight positive spatial correlation for the variable emigrates when looking at the Moran scatterplot. However, the Moran test’s result confirms that, since we do not reject the null hypothesis (p-value > 0.05) we do not have spatial correlation and the OLM model might be well adapted.
For the immigrates variable:
# Generate the Moran scatter plot using spdep's moran.plot
moran.plot(world_final_transformed_crs_9095$immigrates, lw,
labels = world_final_transformed_crs_9095$`Country Code`,
pch = 19)
# Perform Moran's I test on the dependent variable 'emigrates'
moran_test <- moran.test(world_final_transformed_crs_9095$immigrates, lw)
# Print the test results
print(moran_test)
##
## Moran I test under randomisation
##
## data: world_final_transformed_crs_9095$immigrates
## weights: lw
##
## Moran I statistic standard deviate = 3.659, p-value = 0.0001266
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.151335996 -0.004329004 0.001809931
Looking at the Moran scatterplot, there seems to have positive spatial correlation for the immigrates data.
The p-value of the Moran test being smaller than 0.05, we thus have that immigration rates are significantly spatially correlated.
Let us start by fitting two OLM models (one with the emigrates variable as dependent variable and one with the immigrates variable as dependent one). As we have done and explained for the 2020-2024 period, we also take the logarithm of the variables population, natural disaster, and precipitation.
# OLS model for emigration
model_emigrates_9095 <- lm(emigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate,
data = world_final_transformed_crs_9095)
# OLS model for immigration
model_immigrates_9095 <- lm(immigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate,
data = world_final_transformed_crs_9095)
# Print model summaries
summary(model_emigrates_9095)
##
## Call:
## lm(formula = emigrates ~ deflactor + lifeexp + log(population) +
## politicalstability + conflictpercapita + vulnerability +
## log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
## ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.039672 -0.016802 -0.005574 0.005327 0.160641
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.965e-01 5.437e-02 3.614 0.000374 ***
## deflactor 4.654e-06 5.039e-06 0.924 0.356726
## lifeexp -1.298e-03 3.417e-04 -3.799 0.000188 ***
## log(population) -2.085e-03 1.602e-03 -1.302 0.194427
## politicalstability 6.689e-03 7.567e-03 0.884 0.377701
## conflictpercapita 6.037e+00 1.294e+01 0.467 0.641223
## vulnerability -6.325e-03 4.687e-02 -0.135 0.892775
## log(natural_disaster) -8.496e-04 7.894e-04 -1.076 0.283004
## t2m_diff -8.191e-03 9.870e-03 -0.830 0.407514
## dry -4.298e-05 5.858e-05 -0.734 0.463965
## log(prec_5days) -1.142e-02 5.091e-03 -2.244 0.025827 *
## ghwr_35 -1.361e-04 7.499e-05 -1.815 0.070850 .
## hwf_upp 8.206e-04 4.578e-04 1.793 0.074405 .
## unemployment_rate -4.166e-05 3.837e-04 -0.109 0.913625
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02898 on 218 degrees of freedom
## Multiple R-squared: 0.1659, Adjusted R-squared: 0.1161
## F-statistic: 3.334 on 13 and 218 DF, p-value: 0.0001143
summary(model_immigrates_9095)
##
## Call:
## lm(formula = immigrates ~ deflactor + lifeexp + log(population) +
## politicalstability + conflictpercapita + vulnerability +
## log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
## ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.065551 -0.017981 -0.006006 0.007278 0.199524
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.577e-01 6.135e-02 2.570 0.01084 *
## deflactor 6.762e-07 5.686e-06 0.119 0.90545
## lifeexp -1.930e-04 3.855e-04 -0.501 0.61716
## log(population) -8.125e-03 1.807e-03 -4.496 1.12e-05 ***
## politicalstability -3.897e-03 8.538e-03 -0.456 0.64856
## conflictpercapita 2.113e+01 1.460e+01 1.447 0.14925
## vulnerability -1.120e-01 5.289e-02 -2.118 0.03533 *
## log(natural_disaster) 1.244e-03 8.907e-04 1.397 0.16380
## t2m_diff 1.402e-03 1.114e-02 0.126 0.89994
## dry 1.919e-04 6.610e-05 2.903 0.00408 **
## log(prec_5days) 7.175e-03 5.744e-03 1.249 0.21295
## ghwr_35 6.624e-06 8.462e-05 0.078 0.93767
## hwf_upp -4.285e-04 5.165e-04 -0.830 0.40769
## unemployment_rate -1.211e-03 4.329e-04 -2.796 0.00563 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0327 on 218 degrees of freedom
## Multiple R-squared: 0.2177, Adjusted R-squared: 0.1711
## F-statistic: 4.667 on 13 and 218 DF, p-value: 4.307e-07
The OLS model for emigration explains approximately 11.26% of the variance (Adjusted R² = 0.1126).
The analysis of the OLS regression results for the 1990-1995 period highlights the importance of life expectancy (emigration model), population (both models) and vulnerability (immigration model) in shaping migration patterns. While climate variables such as temperature difference and the number of dry days do not show strong statistical significance in the emigration model, the number of dry days is significantly associated with higher immigration rates. This suggests that while climate factors may not have been primary drivers of emigration in the early 1990s, they could have influenced immigration patterns, possibly due to economic opportunities in regions with specific climatic conditions.
Let us now check for spatial autocorrelation of the residuals :
# Extract residuals
residuals_emigrates_9095 <- residuals(model_emigrates_9095)
residuals_immigrates_9095 <- residuals(model_immigrates_9095)
# Moran I test for spatial autocorrelation in residuals
moran_resid_emigrates_9095 <- moran.test(residuals_emigrates_9095, lw)
moran_resid_immigrates_9095 <- moran.test(residuals_immigrates_9095, lw)
# Print results
print(moran_resid_emigrates_9095)
##
## Moran I test under randomisation
##
## data: residuals_emigrates_9095
## weights: lw
##
## Moran I statistic standard deviate = 0.31776, p-value = 0.3753
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.009331343 -0.004329004 0.001848122
print(moran_resid_immigrates_9095)
##
## Moran I test under randomisation
##
## data: residuals_immigrates_9095
## weights: lw
##
## Moran I statistic standard deviate = 1.0598, p-value = 0.1446
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.041017034 -0.004329004 0.001830805
Based on the Moran’s I test results and the OLS regression outcomes, it appears that the OLS model is reasonably adapted for the emigration data, as there is no significant spatial autocorrelation in the residuals. This suggests that the OLS model captures the underlying patterns in the emigration data well, without significant spatial dependencies affecting the results.
However, for the immigration data, the Moran’s I test indicates weak evidence of positive spatial autocorrelation (at the 10% level) in the residuals. This suggests that there might be some spatial clustering in the immigration data that the OLS model does not fully capture.
We check in the following part which model would be the most adapted for modeling emigrates and immigrates dependent variables:
For the emigrates variable:
res_test_emigrates_9095 <- lm.RStests(model_emigrates_9095, lw, test = "all")
summary(res_test_emigrates_9095)
## Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
## dependence
## data:
## model: lm(formula = emigrates ~ deflactor + lifeexp + log(population) +
## politicalstability + conflictpercapita + vulnerability +
## log(natural_disaster) + t2m_diff + dry + log(prec_5days) + ghwr_35 +
## hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095)
## test weights: lw
##
## statistic parameter p.value
## RSerr 0.044408 1 0.8331
## RSlag 0.022206 1 0.8815
## adjRSerr 1.077875 1 0.2992
## adjRSlag 1.055673 1 0.3042
## SARMA 1.100081 2 0.5769
Since none of the models testes are significant, as anticipated in the last steps, we decide to keep an OLM to model the emigrates variable during the 1990-1995 period.
Let us check with the AIC criteria if we got the same result:
# SLX model
slx_1_9095 <- lm(emigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095)
slx_2_9095 <- lmSLX(emigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095,
listw = lw)
# SEM model
sem_emi_9095 <- errorsarlm(emigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095,
listw = lw)
# Lag model
lagm_emi_9095 <- lagsarlm(emigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095,
listw = lw)
# SDM model
durb_emi_9095 <- lagsarlm(emigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095, listw = lw, Durbin = TRUE)
c(olm = AIC(model_emigrates_9095), slx = AIC(slx_2_9095), lag = AIC(lagm_emi_9095),
sem = AIC(sem_emi_9095), sdm = AIC(durb_emi_9095))
## olm slx lag sem sdm
## -969.1355 -953.1330 -967.1628 -967.1895 -951.1473
Again, the fact of choosing the OLM to model our emigrates variable is confirmed since it is the model that minimizes our AIC criterion.
For the immigrates variable:
res_test_immigrates_9095 <- lm.RStests(model_immigrates_9095, lw, test = "all")
summary(res_test_immigrates_9095)
## Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
## dependence
## data:
## model: lm(formula = immigrates ~ deflactor + lifeexp + log(population)
## + politicalstability + conflictpercapita + vulnerability +
## log(natural_disaster) + t2m_diff + dry + log(prec_5days) + ghwr_35 +
## hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095)
## test weights: lw
##
## statistic parameter p.value
## RSerr 0.85803 1 0.3543
## RSlag 1.81467 1 0.1779
## adjRSerr 0.73575 1 0.3910
## adjRSlag 1.69238 1 0.1933
## SARMA 2.55042 2 0.2794
Again, none of the tests are significant, suggesting that the OLM or an SLX model would be the most adapted to model immigration.
Let us check the AIC criterion:
# SLX model
slx_1_9095 <- lm(immigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095)
slx_2_9095 <- lmSLX(immigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095,
listw = lw)
# SEM model
sem_immi_9095 <- errorsarlm(immigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095,
listw = lw)
# Lag model
lagm_immi_9095 <- lagsarlm(immigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095,
listw = lw)
# SDM model
durb_immi_9095 <- lagsarlm(immigrates ~ deflactor + lifeexp + log(population) +
politicalstability + conflictpercapita + vulnerability +
log(natural_disaster) + t2m_diff + dry + log(prec_5days) +
ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095, listw = lw, Durbin = TRUE)
c(olm = AIC(model_immigrates_9095), slx = AIC(slx_2_9095), lag = AIC(lagm_immi_9095),
sem = AIC(sem_immi_9095), sdm = AIC(durb_immi_9095))
## olm slx lag sem sdm
## -913.1087 -905.6168 -912.8047 -912.1174 -903.7170
The OLS model has a lower AIC than the SLX and SDM models, suggesting it is a better fit compared to these alternatives. However, the Spatial Lag (LAG) model has the lowest AIC, indicating it might be the best fit among the models considered.
Given the weak evidence of spatial lag dependence from the Rao’s score diagnostics (RSlag p-value = 0.1234), the Spatial Lag model could be a reasonable choice, though we have more evidence that the OLM model would be the most appropriate.
Finally, to conclude we have:
For the emigrates variable:
stargazer::stargazer(model_emigrates_9095,type = "text")
##
## =================================================
## Dependent variable:
## ---------------------------
## emigrates
## -------------------------------------------------
## deflactor 0.00000
## (0.00001)
##
## lifeexp -0.001***
## (0.0003)
##
## log(population) -0.002
## (0.002)
##
## politicalstability 0.007
## (0.008)
##
## conflictpercapita 6.037
## (12.937)
##
## vulnerability -0.006
## (0.047)
##
## log(natural_disaster) -0.001
## (0.001)
##
## t2m_diff -0.008
## (0.010)
##
## dry -0.00004
## (0.0001)
##
## log(prec_5days) -0.011**
## (0.005)
##
## ghwr_35 -0.0001*
## (0.0001)
##
## hwf_upp 0.001*
## (0.0005)
##
## unemployment_rate -0.00004
## (0.0004)
##
## Constant 0.197***
## (0.054)
##
## -------------------------------------------------
## Observations 232
## R2 0.166
## Adjusted R2 0.116
## Residual Std. Error 0.029 (df = 218)
## F Statistic 3.334*** (df = 13; 218)
## =================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Since we kept only the OLM model, we get the same results as already explained earlier above. For the immigrates variable:
stargazer::stargazer(model_immigrates_9095,lagm_immi_9095, type = "text")
##
## ============================================================
## Dependent variable:
## --------------------------------------
## immigrates
## OLS spatial
## autoregressive
## (1) (2)
## ------------------------------------------------------------
## deflactor 0.00000 0.00000
## (0.00001) (0.00001)
##
## lifeexp -0.0002 -0.0002
## (0.0004) (0.0004)
##
## log(population) -0.008*** -0.008***
## (0.002) (0.002)
##
## politicalstability -0.004 -0.005
## (0.009) (0.008)
##
## conflictpercapita 21.126 18.174
## (14.597) (14.094)
##
## vulnerability -0.112** -0.115**
## (0.053) (0.051)
##
## log(natural_disaster) 0.001 0.001
## (0.001) (0.001)
##
## t2m_diff 0.001 0.002
## (0.011) (0.011)
##
## dry 0.0002*** 0.0002***
## (0.0001) (0.0001)
##
## log(prec_5days) 0.007 0.007
## (0.006) (0.006)
##
## ghwr_35 0.00001 0.00001
## (0.0001) (0.0001)
##
## hwf_upp -0.0004 -0.0004
## (0.001) (0.0005)
##
## unemployment_rate -0.001*** -0.001***
## (0.0004) (0.0004)
##
## Constant 0.158** 0.158***
## (0.061) (0.059)
##
## ------------------------------------------------------------
## Observations 232 232
## R2 0.218
## Adjusted R2 0.171
## Log Likelihood 472.402
## sigma2 0.001
## Akaike Inf. Crit. -912.805
## Residual Std. Error 0.033 (df = 218)
## F Statistic 4.667*** (df = 13; 218)
## Wald Test 1.707 (df = 1)
## LR Test 1.696 (df = 1)
## ============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The coefficients, standard errors, and significance levels are almost identical between the OLS and LAG model, suggesting minimal spatial dependence in the data for this period (as expected). Thus we will focus on interpret in details the coefficients of the OLM model.
As for the precise interpretation for the 1990-1995 period for the OLM model we have: - log(population): Negative and significant, indicating that larger populations are associated with lower per capita immigration. - vulnerability: Negative and significant, suggesting that higher vulnerability is associated with lower immigration. - dry: Positive and significant, indicating that a higher number of dry days is associated with increased immigration. - unemployment_rate: Negative and significant, suggesting that higher unemployment rates are associated with lower immigration.
The variables t2m_diff, log(prec_5days) and ghwr_35 are not significant in the 1990-1995 period while they were in the 2020-2024 one, so that temperature differences are more influential in recent years, precipitation patterns are increasingly important and heat waves impact immigration in the recent years.
So to conclude on the impact of climatic variables on migratory flows, we have that while dry days were significant in the 1990-1995 period, temperature differences (t2m_diff) and precipitation patterns (log(prec_5days)) have become more influential in the 2020-2024 period. This shift suggests that climate change impacts are increasingly affecting migration patterns.
The aim of this part is to explain the bilateral migration flows using a gravity model. This model can be formalized as
\[ y = \iota N\alpha + X_O \beta_o + X_d \beta_d + G\omega + \epsilon \]
where \(\beta_o\), \(\beta_d\), and \(\omega\) are the vectors of parameters whose dimensions correspond to the number of variables in \(X_o\), \(X_d\), and \(G\), respectively.
\(X_o\) represents the characteristics of the origin countries
\(X_d\) represents the characteristics of the destination countries
\(G\) represents the measures related to a pair of countries, like distances between origin and destination, or indicators for a common spoken language
Let us now, from the migration data, merge the migration flow estimate we have chosen at the beginning of this analysis with explanatory variables.
Note that we will only keep the observations whose origin and
destination are present in the world database.
# Selecting only from the migration data (already filtered on 2020-2024 period) the variables we are interested in
mig_flow <- subset(migration, select = c(dest, origin, reverse_neg))
colnames(mig_flow)[3] <- "y"
# Merging the databases to get the covariates associated to each origin point
# Rename 'Country Code' to 'iso3' for clarity
covariates <- my_covariates_period
names(covariates)[names(covariates) == "Country Code"] <- "iso3"
# Prepare world list
world_codes <- world$'Country Code' # extract valid codes
# Create origin covariates with O_ prefix
cov_origin <- covariates %>%
rename_with(~ paste0("O_", .), .cols = -iso3)
# Create destination covariates with D_ prefix
cov_dest <- covariates %>%
rename_with(~ paste0("D_", .), .cols = -iso3)
# Merge origin covariates
mig_with_origin <- mig_flow %>%
left_join(cov_origin, by = c("origin" = "iso3"))
# Merge destination covariates
mig_full <- mig_with_origin %>%
left_join(cov_dest, by = c("dest" = "iso3"))
# Filter to include only countries in 'world'
data_od <- mig_full %>%
filter(origin %in% world_codes & dest %in% world_codes)
head(data_od)
#Get country centroids
centroids <- world_contours_transformed %>%
mutate(centroid = st_centroid(geometry)) %>%
select(ISO3, centroid)
# Join origin and destination centroids to your data_od
data_od <- data_od %>%
left_join(centroids, by = c("origin" = "ISO3")) %>%
rename(geometry_O = centroid) %>%
left_join(centroids, by = c("dest" = "ISO3")) %>%
rename(geometry_D = centroid)
# Compute pairwise distances (in meters, since Robinson is in meters)
data_od <- data_od %>%
mutate(
dist_m = st_distance(geometry_O, geometry_D, by_element = TRUE),
dist_km = as.numeric(dist_m) / 1000 # convert to kilometers
)
data_od <- data_od %>%
select(-geometry_O, -geometry_D)
Here, we prefer to consider the following model :
\[ log(y+1) = \iota N\alpha + X_O \beta_o + X_d \beta_d + G\omega + \epsilon \]
We prefer to use a log transformation of y in order to first deal
with zeros. Indeed we happen to have 0 migration flows between some
pairs. Log(0) being undefined, we prefer to use log(y + 1)
to handle it. Moreover this transformation allows to reduce the
right-skewness that we have in our migration data where a few country
pairs have large flows but most have small or zeros flows.
Also, this makes the coefficients interpretable as semi-elasticities, for exemple: “A 1-unit increase in destination life expectancy is associated with a X% increase in bilateral migration flows, ceteris paribus.”
data_od$y = log(data_od$y + 1)
df_gravity <- select(data_od, -dest, -origin, -O_period, -D_period, -geometry.x, -geometry.y, -dist_m)
gravity_model <- lm(y ~ ., data = df_gravity)
summary(gravity_model)
##
## Call:
## lm(formula = y ~ ., data = df_gravity)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.6979 -0.9096 -0.4408 0.1324 13.3283
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.611e+00 2.630e-01 6.125 9.13e-10 ***
## O_deflactor -7.864e-04 2.349e-04 -3.348 0.000816 ***
## O_lifeexp 1.347e-02 1.865e-03 7.223 5.17e-13 ***
## O_population 1.474e-09 8.336e-11 17.683 < 2e-16 ***
## O_politicalstability -2.610e-01 1.256e-02 -20.775 < 2e-16 ***
## O_conflictpercapita 3.169e+01 1.548e+01 2.047 0.040678 *
## O_vulnerability -2.180e+00 1.632e-01 -13.356 < 2e-16 ***
## O_natural_disaster 1.154e-08 1.208e-09 9.551 < 2e-16 ***
## O_t2m_diff 3.302e-02 2.048e-02 1.612 0.106912
## O_dry -1.239e-03 2.563e-04 -4.834 1.34e-06 ***
## O_prec_5days 5.090e-04 1.486e-04 3.425 0.000615 ***
## O_ghwr_35 -6.036e-04 3.277e-04 -1.842 0.065528 .
## O_hwf_upp -7.651e-04 1.889e-04 -4.051 5.11e-05 ***
## O_unemployment_rate -8.852e-03 1.704e-03 -5.194 2.06e-07 ***
## D_deflactor -1.434e-03 2.349e-04 -6.102 1.05e-09 ***
## D_lifeexp 1.553e-02 1.865e-03 8.326 < 2e-16 ***
## D_population 2.858e-10 8.336e-11 3.429 0.000607 ***
## D_politicalstability 2.053e-02 1.256e-02 1.635 0.102150
## D_conflictpercapita 2.429e+01 1.548e+01 1.569 0.116702
## D_vulnerability -3.297e+00 1.632e-01 -20.202 < 2e-16 ***
## D_natural_disaster 6.901e-09 1.208e-09 5.713 1.12e-08 ***
## D_t2m_diff 7.559e-02 2.048e-02 3.691 0.000223 ***
## D_dry -2.540e-04 2.563e-04 -0.991 0.321760
## D_prec_5days 4.506e-04 1.486e-04 3.032 0.002432 **
## D_ghwr_35 1.832e-04 3.277e-04 0.559 0.576075
## D_hwf_upp -2.882e-03 1.889e-04 -15.259 < 2e-16 ***
## D_unemployment_rate -1.407e-02 1.704e-03 -8.258 < 2e-16 ***
## dist_km -6.068e-05 1.549e-06 -39.176 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.814 on 53796 degrees of freedom
## Multiple R-squared: 0.1466, Adjusted R-squared: 0.1462
## F-statistic: 342.2 on 27 and 53796 DF, p-value: < 2.2e-16
The gravity model explains our bilateral migration flows not reasonably well with an adjusted R² of 0.15.
Most of the the covariates of the origin and destination country are statistically significant. The distance variables (dist_km) is also statistically significant at the 5% significant level.
As expected, distance between countries has a strong negative effect on migration, confirming that distance acts as a ‘friction’ to migration (probably due to costs, linguistic differences, etc.).
Origin country characteristics such as low political stability, high population, and higher natural disaster exposure are positively associated with emigration. Conversely, vulnerability and unemployment in the origin reduce migration, possibly due to immobility traps.
On the destination side, higher life expectancy, population, and political stability are strong pull factors, while higher vulnerability and unemployment discourage inflows.
# we add the residuals of our model to our dataframe
data_od$residuals <- residuals(gravity_model)
Let us now check for spatial correlation in the residuals using two Moran tests:
# Origin
residuals_origin <- data_od %>%
group_by(origin) %>%
summarise(mean_residual = mean(residuals))
test_origin <- moran.test(residuals_origin$mean_residual, lw)
print(test_origin)
##
## Moran I test under randomisation
##
## data: residuals_origin$mean_residual
## weights: lw
##
## Moran I statistic standard deviate = 1.3808, p-value = 0.08368
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.055957459 -0.004329004 0.001906360
The results of the Moran’s I test for the origin-aggregated residuals suggest some evidence of spatial autocorrelation, though it is quite weak. Specifically, the Moran’s I statistic is 0.0667, with a p-value of 0.05667. This p-value is slightly above the 5% significance threshold, indicating that the spatial autocorrelation is not statistically significant at the conventional 5% level, but it is close to being significant.
Given the Moran’s I statistic is positive, it suggests that there might be some spatial clustering in the residuals, meaning that the residuals at certain origins may be more similar to those at nearby origins than would be expected by chance.
The test result implies that while there may be some spatial structure in the residuals, there is no strong evidence to do implement a spatial regression approach for the origin side of the model.
residuals_dest <- data_od %>%
group_by(dest) %>%
summarise(mean_residual = mean(residuals))
test_dest <- moran.test(residuals_dest$mean_residual, lw)
print(test_dest)
##
## Moran I test under randomisation
##
## data: residuals_dest$mean_residual
## weights: lw
##
## Moran I statistic standard deviate = 3.5386, p-value = 0.0002011
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.149854206 -0.004329004 0.001898533
The Moran’s I test on destination-aggregated residuals indicates significant positive spatial autocorrelation at the 5% significance level (Moran’s I = 0.0795, p-value = 0.031), suggesting that residuals are not randomly distributed in space. This points to potential unobserved spatial factors influencing destination attractiveness, and may need the inclusion a spatial regression approach.
#Get the country-level neighborhood
#combined_nb <- union.nb(contiguity_nb, knn_nb)
# Convert to a row-standardized sparse matrix W (country-level)
W <- as(Matrix::Matrix(as.matrix(nb2mat(combined_nb, style = "W")), sparse = TRUE), "dgCMatrix")
#Build flow-level neighborhood matrices
n <- nrow(W) # number of countries
In <- diag(n)
# Kronecker products for flow-level spatial matrices
Wo <- kronecker(W, In) # origin-based
Wd <- kronecker(In, W) # destination-based
Ww <- kronecker(W, W) # origin-to-destination
Y <- matrix(data_od$residuals, nrow = n, ncol = n, byrow = TRUE)
y_vec <- as.vector(Y) # This is vec(Y), stacking columns
#Compute spatial lags
lag_Wo <- as.vector(Wo %*% y_vec)
lag_Wd <- as.vector(Wd %*% y_vec)
lag_Ww <- as.vector(Ww %*% y_vec)
Finally, we plot all three Moran scatterplots:
#Moran scatter plots
par(mfrow = c(1, 3))
# Wo plot
plot(y_vec, lag_Wo, pch = 19, main = "Wo-based Moran Plot", xlab = "residuals", ylab = "Wo x residuals")
abline(lm(lag_Wo ~ y_vec), col = "red")
# Wd plot
plot(y_vec, lag_Wd, pch = 19, main = "Wd-based Moran Plot", xlab = "residuals", ylab = "Wd x residuals")
abline(lm(lag_Wd ~ y_vec), col = "red")
# Ww plot
plot(y_vec, lag_Ww, pch = 19, main = "Ww-based Moran Plot", xlab = "residuals", ylab = "Ww x residuals")
abline(lm(lag_Ww ~ y_vec), col = "red")
What have we done here ? Well, to evaluate the presence of spatial autocorrelation in the residuals of the estimated model, Moran scatterplots were constructed using three different flow-based spatial weight matrices: one based on the neighborhoods of origin countries (Wo), one on destination countries (Wd), and one capturing the interaction of both origin and destination neighborhoods (Ww).
The Moran plot based on the Wo matrix (left) reveals a positive slope, indicating the potential presence of positive spatial autocorrelation among residuals when flows are grouped by their origin countries’ neighborhoods. This suggests that countries that are geographically proximate as senders of flows tend to exhibit similar residual patterns so that the model may be omitting spatially structured variables relevant to origin-based characteristics.
The Moran plot for the Wd matrix (middle), shows a slightly positive trend line, implying minimal or no spatial autocorrelation in the residuals based on destination-based proximity.
The Ww matrix (right), constructed as the Kronecker product of the origin and destination matrices, captures potential spatial autocorrelation driven by complex origin-to-destination interactions. The corresponding Moran plot shows no clear trend, with the regression line being nearly horizontal. This suggests that there is no systematic interaction-based spatial dependence between origin and destination pairs that is left unexplained by the model.
Abel, G.J., & Cohen, J.E. (2019). Bilateral international migration flow estimates for 200 countries. Scientific Data, 6, 82. https://doi.org/10.1038/s41597-019-0089-3
Berlemann, M., Haustein, E., & Steinhardt, M.F. (2021). From Stocks to Flows – Evidence for the Climate-Migration-Nexus. IZA Discussion Paper No. 14450. https://www.iza.org/publications/dp/14450/from-stocks-to-flows-evidence-for-the-climate-migration-nexus
Laurent, T., Margaretic, P., & Thomas-Agnan, C. (2022). Neighbouring countries and bilateral remittances: a global study. Spatial Economic Analysis, 17(4), 557-584. https://doi.org/10.1080/17421772.2022.2070656
World Bank. (n.d.). Unemployment, total (% of total labor force) (modeled ILO estimate). World Bank Open Data. https://data.worldbank.org/indicator/SL.UEM.TOTL.ZS