Required packages:

library("colorspace")
library("sf")
library("spData")
library("spdep")
library("spatialreg")
library("tidyverse")
library(dplyr)
library(ggplot2)
library(gridExtra)
library("mapsf")
library(corrplot)

Function to plot Moran scatterplot:

source("https://raw.githubusercontent.com/tibo31/spatial_eco/refs/heads/master/moran_scatter_plot.R")

1. Data

1.1 Migration flow data

The analysis is based on the migration flow data provided by Abel and Cohen (2019), who use six different algorithms to estimate bilateral flows from changes in the migrant stocks in all countries. The different methods of estimation are compared in Berlemann et al. (2021) and the data can be loaded as follows :

load("mig_data.RData")

The data contains the aggregate number of estimated migration by origin (variable origin) and destination (variable dest) countries in the time intervals 1990-1995, 1995-2000, 2005-2010, 2010-2015, 2015-2020, 2020-2024 (variable period). The migration estimates variables are :

drop_neg : Drop Negative
reverse_neg: Reverse Negative
mig_rate: Migration Rates
min_open: Minimization open
min_close: Minimization closed
pb: Pseudo-Bayesian closed

In this project, we will consider the period “2020-2024” as a reference period. However, at the end of part 2, we will see if we obtain similar results by changing the period of reference.

Q1. Method Chosen Description

Reverse Negative Method: Abel and Cohen (2019)

The Reverse Negative method addresses the issue that bilateral migration flows estimated from changes in migrant stocks can sometimes be negative, a result that is not plausible when interpreting flows as actual migrant movements. In this approach, any negative change in the migrant stock between two periods is “reversed” and re‐allocated as a positive flow in the opposite direction.

Principle of the Method:

Assume that for a given pair of countries \((i, j)\) we have bilateral migrant stock data measured at time \(t\) and \(t+1\), denoted by \(s_{ij}^t\) and \(s_{ij}^{t+1}\) respectively. The change in the stock is defined as:

\[ \Delta s_{ij} = s_{ij}^{t+1} - s_{ij}^{t} \]

If \(\Delta s_{ij} \ge 0\), then the estimated migration flow from origin \(i\) to destination \(j\) is given by:

\[y_{ij} = \Delta s_{ij} = s_{ij}^{t+1} - s_{ij}^{t}\]
If \(\Delta s_{ij} < 0\), instead of interpreting the negative difference as a loss of migrants from \(i\) to \(j\), the method treats it as a positive flow from \(j\) to \(i\). In this case, we set:

\[y_{ij} = 0\]
\[y_{ji} = -\Delta s_{ij} = s_{ij}^{t} - s_{ij}^{t+1}\]

Thus, the bilateral migration flows are defined as:

\[ y_{ij} = \begin{cases} s_{ij}^{t+1} - s_{ij}^{t}, & \text{if } s_{ij}^{t+1} - s_{ij}^{t} > 0 \text{ and } s_{ji}^{t+1} - s_{ji}^{t} \ge 0 \text{ and } i \neq j,\\ \bigl[s_{ij}^{t+1} - s_{ij}^{t}\bigr] + \bigl[s_{ji}^{t+1} - s_{ji}^{t}\bigr], & \text{if } s_{ij}^{t+1} - s_{ij}^{t} > 0 \text{ and } s_{ji}^{t+1} - s_{ji}^{t} < 0 \text{ and } i \neq j,\\ \bigl[s_{ij}^{t+1} - s_{ij}^{t}\bigr] - \bigl[s_{ji}^{t+1} - s_{ji}^{t}\bigr], & \text{if } s_{ij}^{t+1} - s_{ij}^{t} \le 0 \text{ and } s_{ji}^{t+1} - s_{ji}^{t} < 0 \text{ and } i \neq j,\\ 0, & \text{if } s_{ij}^{t+1} - s_{ij}^{t} \le 0 \text{ and } s_{ji}^{t+1} - s_{ji}^{t} \ge 0 \text{ or } i = j, \end{cases} \]

In other words, if the stock difference \(s_{ij}^{t+1} - s_{ij}^{t}\) is negative (indicating a loss from \(i\) to \(j\)), it is interpreted as a flow in the opposite direction. This approach reinterprets negative changes in the migrant stock as reverse flows, preserving the information contained in the stock differences while ensuring that all estimated flows remain non-negative.

Q2. Analysis of Flow Maps for the Reverse Negative Method

The image below displays the largest 0.2% of flows between countries for the reverse negative method. In the flow map, the left bar for each country indicates the volume of outgoing flows, while the right bar shows the volume of incoming flows.

The map (using the Reverse Negative approach) reflects a distinctive reallocation of migration flows compared to methods that drop or otherwise adjust negative differences in migrant stocks. Under this method, any negative stock difference (i.e., an apparent decrease in the number of migrants) is reversed, thereby converting what would otherwise be a loss of migrants into an inflow. This procedure offers a more balanced perspective on migration by preventing negative values from simply being discarded.

Outgoing Flows (Left Bar)

Under the Reverse Negative method, the left bars (outgoing flows) are typically lower than or at least somewhat reduced relative to methods that drop negative values. Because negative differences are treated as inflows in the opposite direction, the total outflow from a given country may appear smaller than in other approaches that simply ignore or zero out these negative values.

Incoming Flows (Right Bar)

Conversely, the right bars (incoming flows) tend to be higher. Reassigning negative differences as inflows boosts the total immigration count for some countries, resulting in a more symmetrical representation of migration corridors. In many cases, countries that would have appeared as major senders under other methods exhibit increased incoming flows here.

Comparison with Other Estimation Methods

When examining the Reverse Negative map alongside the other five approaches (Drop Negative, Migration Rates, Minimization Open, Minimization Closed, and Pseudo-Bayesian Closed), several important similarities and differences emerge:

Overall Flow Patterns Despite the differences in how negative values are handled, the major corridors of global migration remain consistent. Large flows tend to connect the same origin and destination countries across all methods.
Treatment of Negative Differences
- Drop Negative: Negative flows are discarded, potentially leading to an underestimation of incoming flows.
- Reverse Negative (Current Method): Negative values are reversed and counted as inflows, producing a more balanced or even inflated picture of inbound migration for some countries.
- Other Methods (e.g., Migration Rates, Minimization, Pseudo-Bayesian): Each has its own way of reconciling inconsistencies in stock data, yielding variations in flow magnitudes.
Correlation between Estimators A correlation analysis reveals moderate to high correlations among all methods for most origin-destination pairs, given that major migration patterns remain the same. However, the Reverse Negative approach systematically reallocates negative differences, often resulting in notably higher inflows for certain countries, especially those with volatile or uncertain stock data; than methods that drop negatives.
Extraction of Largest Flows Focusing on the top 0.2% of flows highlights how:
- Large bidirectional corridors can appear more prominently under Reverse Negative, as both directions may be amplified.
- Certain corridors that seem substantial in other methods (as large outflows) may be balanced or reduced here, reflecting a redistribution of negative stock differences into inflows.

In summary, the Reverse Negative method provides a more symmetrical allocation of migration flows, offering an alternative perspective on the distribution of global migration. While it shares many overall patterns with other approaches, it stands out by reallocating what would otherwise be discarded negative differences into positive inflows. This nuanced approach is especially useful for investigating how methodological choices can alter our understanding of bilateral migration flows.

1.2 Contours data

Q3. World contour boundaries import

# Import the world contour boundaries from the GeoJSON file
world_contours <- st_read("CCMN_GeoDATA.geojson/CCMN_GeoDATA.geojson")

## Reading layer `CCMN_GeoDATA' from data source 
##   `C:\Users\coral\Dropbox\projet_spatial\CCMN_GeoDATA.geojson\CCMN_GeoDATA.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 262 features and 36 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -90 xmax: 180 ymax: 83.62372
## Geodetic CRS:  WGS 84

# Transform the coordinate reference system to 'ESRI:54030'
world_contours_transformed <- st_transform(world_contours, crs = "ESRI:54030")

1.3 Explanatory variables at country level

Most explanatory variables at country level (object my_covariates) are available in the file covariates.RData. This data is taken from Laurent et al. (2022) who study international remittances flows. The file combines information from several data sets that were originally disseminated by different government agencies or international organizations (United Nations, World Bank, CIA, etc.).

load("covariates.RData")

Among these variables, we will consider:

deflactor: GDP deflator (measure of inflation, World Bank)
lifeexp: life expectancy (World Bank)
natural_disaster: total number of people affected by a climate disaster (https://public.emdat.be/data)
population: size of the population (World Bank)
politicalstability: measure of political stability (World Bank)
conflictpercapita: number of conflict events per capita (ACLED)
vulnerability: Measures a country’s exposure, sensitivity and capacity to adapt to the negative effects of climate change (https://gain-new.crc.nd.edu/about/download)
t2m_diff: average annual difference temperature (in Celsius)
prec_5days: average heavy precipitation for 5 consecutive days (in mm) during one year
hwf_upp: average number of heat wave days during one year
dry: average number of dry days during one year
ghwr_35: average number of days above 35 degrees during one year

Q4. New Variable

Propose at least one additional variable that could explain the emigration/immigration rate by country. If possible, give a link to the data source or a reference to an article that uses this information in the context of migration. Import this variable and merge it to the existing explanatory variables. In the set of explanatory variables presented below, we selected the reference year 1992 for the period “1990-1995,” the year 1997 for the period “1995-2000,” and so on.

We can extend the analysis by incorporating an additional variable: the unemployment rate which could help explain migration rates by country. For instance, high unemployment in a country can increase the propensity to emigrate. A reliable data source is the World Bank (World Bank Unemployment Data).

Below is the code to import the unemployment data from the CSV file (using the World Bank file) and extract the unemployment rate for the reference year. In many World Bank CSV files, the first few rows are metadata, so we skip those rows.

# Import the CSV file containing unemployment data.
unemp_raw <- read_csv("word_bank/unemployment_rate.csv", skip = 4) %>%
  select(-`Indicator Name`, -`Indicator Code`, -`...69`)

## New names:
## Rows: 266 Columns: 69
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (4): Country Name, Country Code, Indicator Name, Indicator Code dbl (33): 1991,
## 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, ... lgl (32): 1960,
## 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...69`

head(unemp_raw)

For now we have one column per year with the corresponding unemployment rate for each country. We would like to have instead a column period and a column giving the associated employment rate. This will help reducing the number of columns. We keep the same logic that was applied for the other variables, selecting for instance 1992 as the reference year for the 1990-1995 period.

# Selecting the reference years
reference_years <- c(1992, 1997, 2002, 2007, 2012, 2017, 2022)
periods <- c("1990-1995", "1995-2000", "2000-2005", "2005-2010", "2010-2015", "2015-2020", "2020-2025")

# Filtering only the necessary columns
unemp_selected <- unemp_raw %>%
  select(`Country Name`, `Country Code`, all_of(as.character(reference_years))) %>%
  pivot_longer(cols = all_of(as.character(reference_years)), 
               names_to = "Year", values_to = "unemployment_rate") %>%
  mutate(Year = as.integer(Year),
         period = case_when(
           Year == 1992 ~ "1990-1995",
           Year == 1997 ~ "1995-2000",
           Year == 2002 ~ "2000-2005",
           Year == 2007 ~ "2005-2010",
           Year == 2012 ~ "2010-2015",
           Year == 2017 ~ "2015-2020",
           Year == 2022 ~ "2020-2024",
           TRUE ~ NA_character_
         )) %>%
  select(`Country Code`, period, `unemployment_rate`)

# Checking the result
head(unemp_selected)

We can now add this variable to our previous dataset.

#First we rename the columns used for the join to ensure consistency:
my_covariates_renamed <- my_covariates %>%
  rename(`Country Code` = ISO3) 

final_covariates <- left_join(my_covariates_renamed, unemp_selected, by = c("Country Code", "period"))

#We verify the join
head(final_covariates)

We can that we have some missing values. We can impute them using the missForest package:

require("missForest")

## Le chargement a nécessité le package : missForest

nom_vars <- c("deflactor", "lifeexp", "population", "politicalstability", "conflictpercapita", "vulnerability", "natural_disaster", "t2m_diff", "dry", "prec_5days", "ghwr_35", "hwf_upp", "unemployment_rate")
final_covariates[, nom_vars] <- missForest(final_covariates[, nom_vars])$ximp

We filter the explanatory variables for the last period:

my_covariates_period <- final_covariates[final_covariates$period == "2020-2024", ]

head(my_covariates_period)

2 Emigration and immigration rates by country

In this part you should explain the emigration (out-migration) and immigration (in-migration) rates by country. The immigration rate measures the overall number of migrants arriving in a country relative to the size of the local population. Similarly, the emigration rate measure the number of migrants the leaving a country relative to the population.

Q5. Total in and out flows by country using reverse negative measure

You should first derive the total in and out flows by country. This is easily achieved by aggregating the migration flow data first by origin, respectively by destination country.

First, we need to create the emigration and immigration aggreagation measure. As decided in part 1, we use the reverse negative measure.

#Select the 2020-2024 period for the mig_data df
migration <- mig_data[mig_data$period == "2020-2024", ]

# 1. Aggregating emigration (outflows) by origin country
emigration_data <- migration %>%
  group_by(origin) %>%
  summarise(total_out = sum(reverse_neg))

# 2. Aggregating immigration (inflows) by destination country
immigration_data <- migration %>%
  group_by(dest) %>%
  summarise(total_in = sum(reverse_neg))

# 3. Merge emigration and immigration data on country code 
migration_summary <- full_join(emigration_data, immigration_data, 
                               by = c("origin" = "dest")) %>%
  rename("Country Code" = origin)  # Rename origin to Country

# View the final summary
print(migration_summary)

## # A tibble: 232 × 3
##    `Country Code` total_out total_in
##    <chr>              <dbl>    <dbl>
##  1 ABW                15174    20032
##  2 AFG              1850994     5163
##  3 AGO                87223     9029
##  4 AIA                  225      274
##  5 ALB               142879     1201
##  6 AND                  259     4199
##  7 ARE                14779  1012585
##  8 ARG               236885   136570
##  9 ARM                32703   140952
## 10 ASM                  997      578
## # ℹ 222 more rows

Q6. Merging the databases

Merge the total in and out flows to the country-level explanatory variables and also combine it with the contours data. Represent the Top 10 countries sending the most migrants and the Top 10 countries receiving the most migrants. The common key is the ISO3 variable. You may obtain something like that (in this example, migration data is based on the reverse negative method):

First, we join the data containing information about the migration with the one used above containing the covariates.

# Merging emigration and immigration data with country-level explanatory variables (my_covariates_period)
final_df <- left_join(my_covariates_period, migration_summary, by = "Country Code")

# Select relevant columns from world_contours_transformed
world_contours_select <- world_contours_transformed %>% 
  select(ISO3, VISUALIZATION_NAME, geometry)

# Join final_df with world_contours_select by Country Code and ISO3
world <- left_join(final_df, world_contours_select, by = c("Country Code" = "ISO3"))

# Sorting the data to get the top 10 countries by immigration rate
top_immigration <- world %>%
  arrange(desc(total_in)) %>%
  select(VISUALIZATION_NAME, total_in) %>%
  head(10)

top_emigration <- world %>%
  arrange(desc(total_out)) %>%
  select(VISUALIZATION_NAME, total_out) %>%
  head(10)


head(world)

We can now visualize the results.

# View the Top 10 countries for immigration
print("Top 10 Countries Receiving Most Migrants (Immigration):")

## [1] "Top 10 Countries Receiving Most Migrants (Immigration):"

print(top_immigration)

##                                      VISUALIZATION_NAME total_in
## 1                              United States of America  5005300
## 2                                               Germany  2971601
## 3                                                 Spain  1753212
## 4  United Kingdom of Great Britain and Northern Ireland  1396195
## 5                                    Russian Federation  1225108
## 6                                                Poland  1111392
## 7                                              Colombia  1055203
## 8                                                  Iran  1037054
## 9                                  United Arab Emirates  1012585
## 10                                         Saudi Arabia  1010146

# View the Top 10 countries for emigration
print("Top 10 Countries Sending Most Migrants (Emigration):")

## [1] "Top 10 Countries Sending Most Migrants (Emigration):"

print(top_emigration)

##    VISUALIZATION_NAME total_out
## 1             Ukraine   5640289
## 2           Venezuela   2542393
## 3               India   2429451
## 4         Afghanistan   1850994
## 5               Sudan   1536173
## 6  Russian Federation   1411302
## 7               China   1059601
## 8             Türkiye    978519
## 9          Bangladesh    961309
## 10           Pakistan    842749

# Scatter plot for top 10 countries with highest emigration rates
ggplot(top_emigration, aes(x = reorder(VISUALIZATION_NAME, total_out), y = total_out)) +
  geom_point(shape = 1) +  
  coord_flip() + 
  labs(title = "Top Outflow",
       x = "Country", 
       y = "Emigration") +
  theme_minimal()

# Scatter plot for top 10 countries with highest immigration rates
ggplot(top_immigration, aes(x = reorder(VISUALIZATION_NAME, total_in), y = total_in)) +
  geom_point(shape = 1) +  
  coord_flip() + 
  labs(title = "Top Inflow",
       x = "Country", 
       y = "Immigration") +
  theme_minimal()

Q7. Immigration and emigration rates

Then, we can calculate immigration and emigration rates relatively to the population.

The formula are:

\[ \text{Emigration Rate} = \frac{\text{Total Emigrants}}{\text{Population}} \] and,

\[ \text{Immigration Rate} = \frac{\text{Total Immigrants}}{\text{Population}} \]

We also create the net variable which accounts for the difference between immigrates and emigrates.

# Calculate immigration, emigration rates and net migration 
world_final <- world %>%
  mutate(
    emigrates = total_out / population,       # Outflows per capita
    immigrates = total_in / population,       # Inflows per capita
    net = immigrates - emigrates              # Net migration per capita
  )


# Sorting the data to get the top 10 countries by immigration rate
top_immigration_rate <- world_final %>%
  arrange(desc(immigrates)) %>%
  select(VISUALIZATION_NAME, immigrates) %>%
  head(10)

# Sorting the data to get the top 10 countries by emigration rate
top_emigration_rate <- world_final %>%
  arrange(desc(emigrates)) %>%
  select(VISUALIZATION_NAME, emigrates) %>%
  head(10)

# Sorting the data to get the top 10 countries by net migration rate
top_net_migration <- world_final %>%
  arrange(desc(net)) %>%
  select(VISUALIZATION_NAME, net) %>%
  head(10)

We can visualize the results:

# Scatter plot for top 10 countries with highest emigration rates
ggplot(top_emigration_rate, aes(x = reorder(VISUALIZATION_NAME, emigrates), y = emigrates)) +
  geom_point(shape = 1) +  
  coord_flip() + 
  labs(title = "Top Outflow",
       x = "Country", 
       y = "Emigration Rate") +
  theme_minimal()

# Scatter plot for top 10 countries with highest immigration rates
ggplot(top_immigration_rate, aes(x = reorder(VISUALIZATION_NAME, immigrates), y = immigrates)) +
  geom_point(shape = 1) +  
  coord_flip() + 
  labs(title = "Top Inflow",
       x = "Country", 
       y = "Immigration Rate") +
  theme_minimal()

ggplot(top_net_migration, aes(x = reorder(VISUALIZATION_NAME, net), y = net)) +
  geom_point(shape = 1) + 
  coord_flip() +
  labs(title = "Top 10 Countries by Net Migration Rate",
       x = "Country", 
       y = "Net Migration Rate (Immigration - Emigration)") +
  theme_minimal()

Q8. Exploratory data analysis

Begin the exploratory data analysis by investigating which explanatory variables are the most correlated with the dependent variables and comment the results you obtain. Do you see any differences of correlation sign between emigrates and immigrates?

# First, select the numeric columns relevant for the analysis.
numeric_data <- world_final %>%
  select_if(is.numeric)

# Compute the correlation matrix using complete observations
cor_matrix <- cor(numeric_data, use = "complete.obs")

# Visualize the correlation matrix focusing on the dependent variables
corrplot(cor_matrix, method = "color", tl.cex = 0.8)

We display the most correlated variables:

# First, select only the numeric explanatory variables (exclude dependent vars and the vars used to build them)
explanatory_vars <- world_final %>% 
  select_if(is.numeric) %>% 
  select(-emigrates,-total_out, -total_in, -immigrates, -net)

# Calculate correlations with emigrates and immigrates
cor_with_emigrates <- sapply(explanatory_vars, function(x) cor(x, world_final$emigrates, use = "complete.obs"))
cor_with_immigrates <- sapply(explanatory_vars, function(x) cor(x, world_final$immigrates, use = "complete.obs"))

# Identify the variable with the highest absolute correlation for emigrates
max_corr_emigrates <- names(which.max(abs(cor_with_emigrates)))
max_corr_value_emigrates <- cor_with_emigrates[max_corr_emigrates]

# Identify the variable with the highest absolute correlation for immigrates
max_corr_immigrates <- names(which.max(abs(cor_with_immigrates)))
max_corr_value_immigrates <- cor_with_immigrates[max_corr_immigrates]

# Display the results
cat("The variable most correlated with emigrates is:", max_corr_emigrates, 
    "with a correlation of", round(max_corr_value_emigrates, 3), "\n")

## The variable most correlated with emigrates is: conflictpercapita with a correlation of 0.192

cat("The variable most correlated with immigrates is:", max_corr_immigrates, 
    "with a correlation of", round(max_corr_value_immigrates, 3), "\n")

## The variable most correlated with immigrates is: lifeexp with a correlation of 0.373

The results show that conflictpercapita is the variable most correlated with emigration (\(r \approx 0.183\)), suggesting that higher conflict levels are associated with increased outmigration. For immigration, lifeexp is most correlated (\(r \approx 0.373\)), indicating that countries with higher life expectancy tend to attract more migrants. While the correlation for emigration is relatively weak, the stronger correlation for immigration implies that quality-of-life factors may be a key pull factor for migrants.

Q9. Choropleth map

# Ensure world_contours_transformed is in the correct format for ggplot (sf object)
world_final_transformed <- st_as_sf(world_final)

# Create a choropleth map for immigrates (immigration rates per capita)
immigration_map <- ggplot(world_final_transformed) +
  geom_sf(aes(fill = immigrates), color = NA) +
  scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
  labs(title = "Immigration Rates (per capita)",
       fill = "Immigrates") +
  theme_minimal()
immigration_map

# Create a choropleth map for emigrates (emigration rates per capita)
emigration_map <- ggplot(world_final_transformed) +
  geom_sf(aes(fill = emigrates), color = NA) +
  scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
  labs(title = "Emigration Rates (per capita)",
       fill = "Emigrates") +
  theme_minimal()
emigration_map

# Create a choropleth map for net migration (net rates per capita)
net_map <- ggplot(world_final_transformed) +
  geom_sf(aes(fill = net), color = NA) +
  scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
  labs(title = "Net Rates (per capita)",
       fill = "Net") +
  theme_minimal()
net_map

Q10. Map of one climatic variable.

We decide to create a map to visualize the climatic variable t2m_diff (average annual difference temperature).

# Select the climatic variable to visualize
climatic_var <- "t2m_diff" 

# Create the choropleth map
ggplot(world_final_transformed) +
  geom_sf(aes(fill = !!sym(climatic_var)), color = NA) +
  scale_fill_viridis_c(option = "magma", na.value = "grey90") +
  labs(title = paste("Global Distribution of temperature difference"),
       fill = climatic_var) +
  theme_minimal()

Q11. Spatial weight matrix definition

We decide to combine contiguity-based neighbors (countries sharing borders) with 3-nearest neighbors to capture both geographic and proximity relationships. Indeed, contiguity-based neighbors reflect real-world spatial relationships, 3-nearest neighbors ensure that even non-contiguous countries have neighbors, avoiding isolated cases. Combining both improves spatial analysis by accounting for both direct borders and close geographic proximity.

# Ensure world_final_transformed has a valid CRS
world_final_transformed_crs <- st_transform(world_final_transformed, crs = "ESRI:54030")

# Create a contiguity-based neighbors list
contiguity_nb <- poly2nb(world_final_transformed_crs) 


# Extract the centroid coordinates of each country
coords <- st_centroid(world_final_transformed_crs, of_largest_polygon = TRUE) |> 
  st_coordinates()

# Ensure the coordinates are in matrix format
coords <- as.matrix(coords)

# Define the 4-nearest neighbors
knn_nb <- knn2nb(knearneigh(coords, k = 3))

# Combine contiguity-based and k-nearest neighbors
combined_nb <- union.nb(contiguity_nb, knn_nb)

# Convert neighborhood links into spatial lines for visualization
nb_lines <- nb2lines(combined_nb, coords = coords)

# Convert nb_lines into an sf object and assign the same CRS
nb_lines_sf <- st_as_sf(nb_lines)
st_crs(nb_lines_sf) <- st_crs(world_final_transformed_crs)  # Ensure CRS consistency

# Plot the spatial neighborhood links
ggplot() +
  geom_sf(data = world_final_transformed_crs, fill = "white", color = "grey90") +
  geom_sf(data = nb_lines_sf, color = "black", size = 0.5) +
  labs(title = "Spatial Neighborhood Links (Contiguity + 3-Nearest Neighbors)") +
  theme_minimal()

Q12. Moral Scatter plot

There are two packages to draw the scatter plot.

First, we draw the scatter plot for the emigrates dependent variable:

# Create the spatial weights matrix
lw <- nb2listw(combined_nb, style = "W")

# Generate the Moran scatter plot using spdep's moran.plot
moran.plot(world_final_transformed_crs$emigrates, lw,
           labels = world_final_transformed_crs$`Country Code`,
           pch = 19)

# Generate the Moran scatter plot using customed function moran_scatter_plot
moran_scatter_plot(world_final_transformed_crs$emigrates,
                   my_listw = lw,
                   geo = st_geometry(world_final_transformed_crs))

From the Moran scatter plots of the emigrates dependent variable and the corresponding map, the highest local spatial autocorrelation (i.e., clusters of either consistently high or consistently low values) appears primarily in:

Eastern Europe (e.g., Moldova, Romania, Ukraine)
Caribbean/Latin America (e.g., Aruba, Curaçao, St. Vincent & the Grenadines)
Parts of Africa (e.g., Comoros)

These countries are labeled as outliers in the Moran scatter plot (often in the high-high quadrant) and also stand out on the map with larger symbols, indicating statistically significant local Moran’s I values.

Then, we draw the scatter plot for the immigrates dependent variable:

# Generate the Moran scatter plot using spdep's moran.plot
moran.plot(world_final_transformed_crs$immigrates, lw,
           labels = world_final_transformed_crs$`Country Code`,
           pch = 19)

# Generate the Moran scatter plot using customed function moran_scatter_plot
moran_scatter_plot(world_final_transformed_crs$immigrates,
                   my_listw = lw,
                   geo = st_geometry(world_final_transformed_crs))

Based on the Moran scatter plots of the the immigrates dependent variable, the countries showing the strongest local spatial autocorrelation (often in the high–high quadrant or as notable outliers) are predominantly:

Persian Gulf States (e.g., Bahrain, Qatar, United Arab Emirates, Oman, Saudi Arabia, Kuwait)
Small High-Income States (e.g., Luxembourg, Iceland, Malta, San Marino)
Caribbean Islands (e.g., Aruba, Curaçao)

These locations stand out due to high values of the variable in question coupled with similarly high values in neighboring countries (or distinct clustering patterns), indicating significant local spatial autocorrelation.

Q13. Statistical test for the spatial correlation in dependent variables

First, we perform the statistical test for the emigrates dependent variable:

# Perform Moran's I test on the dependent variable 'emigrates'
moran_test <- moran.test(world_final_transformed_crs$emigrates, lw)

# Print the test results
print(moran_test)

## 
##  Moran I test under randomisation
## 
## data:  world_final_transformed_crs$emigrates  
## weights: lw    
## 
## Moran I statistic standard deviate = 3.1008, p-value = 0.0009651
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.126195643      -0.004329004       0.001771944

Since the p-value is less than 0.05, we reject the null hypothesis of no spatial autocorrelation. This indicates that the emigration rates are significantly spatially autocorrelated; in other words, countries with similar emigration rates tend to be located near each other.

Then, we perform the statistical test for the immigrates dependent variable:

# Perform Moran's I test on the dependent variable 'immigrates'
moran_test <- moran.test(world_final_transformed_crs$immigrates, lw)

# Print the test results
print(moran_test)

## 
##  Moran I test under randomisation
## 
## data:  world_final_transformed_crs$immigrates  
## weights: lw    
## 
## Moran I statistic standard deviate = 6.6065, p-value = 1.968e-11
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.274781506      -0.004329004       0.001784883

Since the p-value is extremely low (p < 0.05), we reject the null hypothesis of no spatial autocorrelation. This indicates that immigration rates are significantly spatially correlated: countries with similar immigration rates tend to cluster together in space.

Q14. OLM estimates

Step 1: Estimate OLS Models

We decide to take the logarithm of variables population, natural disaster, and precipitation. It is beneficial because these variables are typically highly skewed and span several orders of magnitude. Logging them compresses extreme values, helping to normalize their distribution and linearize relationships with the dependent variables. This transformation makes the coefficients easier to interpret (as elasticities) and improves the overall performance and robustness of the regression model by better meeting its linearity and homoscedasticity assumptions.

For instance, we draw the histogram of the population and of the log population:

# Histogram for raw population
p_raw <- ggplot(world_final_transformed_crs, aes(x = population)) +
  geom_histogram(bins = 30, fill = "blue") +
  labs(title = "Raw Population Distribution", x = "Population", y = "Count") +
  theme_minimal()

# Histogram for log-transformed population
p_log <- ggplot(world_final_transformed_crs, aes(x = log(population))) +
  geom_histogram(bins = 30, fill = "green") +
  labs(title = "Log(Population) Distribution", x = "Log(Population)", y = "Count") +
  theme_minimal()

# Arrange plots side by side for comparison
grid.arrange(p_raw, p_log, ncol = 2)

We can see that using the log helped normalizing the variable.

We can now build the models.

# OLS model for emigration
model_emigrates <- lm(emigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, 
                      data = world_final_transformed_crs)

# OLS model for immigration
model_immigrates <- lm(immigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, 
                        data = world_final_transformed_crs)

# Print model summaries
summary(model_emigrates)

## 
## Call:
## lm(formula = emigrates ~ deflactor + lifeexp + log(population) + 
##     politicalstability + conflictpercapita + vulnerability + 
##     log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
##     ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.026146 -0.009436 -0.002674  0.003135  0.125663 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)   
## (Intercept)            6.579e-02  3.611e-02   1.822  0.06982 . 
## deflactor              5.666e-06  3.734e-05   0.152  0.87954   
## lifeexp               -1.241e-04  2.845e-04  -0.436  0.66308   
## log(population)       -2.604e-03  9.223e-04  -2.823  0.00520 **
## politicalstability    -6.500e-03  2.319e-03  -2.803  0.00551 **
## conflictpercapita      5.544e+00  2.377e+00   2.333  0.02058 * 
## vulnerability          9.924e-03  2.840e-02   0.349  0.72713   
## log(natural_disaster)  1.744e-04  5.324e-04   0.327  0.74362   
## t2m_diff               4.908e-03  3.149e-03   1.559  0.12052   
## dry                   -5.080e-05  3.965e-05  -1.281  0.20149   
## log(prec_5days)       -2.718e-03  3.141e-03  -0.865  0.38788   
## ghwr_35               -3.831e-05  5.035e-05  -0.761  0.44755   
## hwf_upp                4.310e-05  2.999e-05   1.437  0.15209   
## unemployment_rate     -1.874e-04  2.721e-04  -0.689  0.49167   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01824 on 218 degrees of freedom
## Multiple R-squared:  0.1605, Adjusted R-squared:  0.1104 
## F-statistic: 3.206 on 13 and 218 DF,  p-value: 0.0001944

summary(model_immigrates)

## 
## Call:
## lm(formula = immigrates ~ deflactor + lifeexp + log(population) + 
##     politicalstability + conflictpercapita + vulnerability + 
##     log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
##     ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.031482 -0.011890 -0.003262  0.006983  0.157554 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            7.662e-02  4.369e-02   1.754 0.080914 .  
## deflactor             -1.872e-07  4.519e-05  -0.004 0.996698    
## lifeexp                3.107e-04  3.443e-04   0.902 0.367841    
## log(population)       -3.946e-03  1.116e-03  -3.535 0.000497 ***
## politicalstability     2.172e-03  2.806e-03   0.774 0.439821    
## conflictpercapita      2.148e-01  2.876e+00   0.075 0.940547    
## vulnerability         -1.048e-01  3.437e-02  -3.049 0.002578 ** 
## log(natural_disaster)  1.293e-04  6.443e-04   0.201 0.841132    
## t2m_diff               2.837e-03  3.811e-03   0.745 0.457371    
## dry                    5.318e-05  4.798e-05   1.108 0.268910    
## log(prec_5days)        3.084e-03  3.802e-03   0.811 0.418138    
## ghwr_35                1.749e-04  6.093e-05   2.871 0.004494 ** 
## hwf_upp                1.429e-05  3.629e-05   0.394 0.694051    
## unemployment_rate     -4.919e-04  3.293e-04  -1.494 0.136655    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02207 on 218 degrees of freedom
## Multiple R-squared:  0.2912, Adjusted R-squared:  0.2489 
## F-statistic:  6.89 on 13 and 218 DF,  p-value: 4.278e-11

The OLS model for emigrates explains 15.7% of the variance (R² = 0.157, Adjusted R² = 0.107) with a residual standard error of 0.01827 on 218 degrees of freedom. Notably, log(population) has a significant negative coefficient (Estimate = -0.00236, p = 0.01172), indicating that, after adjusting for other factors, larger populations are associated with lower per capita emigration. Similarly, political stability is significantly negative (Estimate = -0.00628, p = 0.00828), suggesting that more stable countries experience less emigration. In contrast, conflict per capita shows a significant positive effect (Estimate = 5.603, p = 0.01961), meaning that higher conflict levels tend to drive increased emigration. The remaining variables, including life expectancy, vulnerability, log(natural_disaster), t2m_diff, dry, log(prec_5days), ghwr_35, hwf_upp, and unemployment_rate, are not statistically significant, which may imply missing factors or unaccounted spatial dependencies influencing emigration patterns.

The OLS model for immigrates explains 25.7% of the variance (Multiple R² = 0.2878, Adjusted R² = 0.2453) with a residual standard error of 0.02212 on 218 degrees of freedom. Notably, log(population) has a significant negative effect (Estimate = -0.00378, p = 0.000899), suggesting that larger populations are associated with lower per capita immigration. Additionally, vulnerability shows a significant negative coefficient (Estimate = -0.09984, p = 0.004152), while ghwr_35 is significantly positive (Estimate = 0.0001748, p = 0.004316). These significant predictors imply that, beyond the raw demographic size, other factors such as environmental stress and infrastructural aspects play an important role in shaping immigration. Most other variables, including deflactor, life expectancy, political stability, and conflict per capita, do not have statistically significant impacts, indicating that key drivers of immigration might be missing or that unaccounted spatial effects are influencing the results.

Step 2: Check Spatial Autocorrelation in Residuals

# Extract residuals
residuals_emigrates <- residuals(model_emigrates)
residuals_immigrates <- residuals(model_immigrates)

# Moran I test for spatial autocorrelation in residuals
moran_resid_emigrates <- moran.test(residuals_emigrates, lw)
moran_resid_immigrates <- moran.test(residuals_immigrates, lw)

# Print results
print(moran_resid_emigrates)

## 
##  Moran I test under randomisation
## 
## data:  residuals_emigrates  
## weights: lw    
## 
## Moran I statistic standard deviate = 1.5746, p-value = 0.05767
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.062132306      -0.004329004       0.001781456

print(moran_resid_immigrates)

## 
##  Moran I test under randomisation
## 
## data:  residuals_immigrates  
## weights: lw    
## 
## Moran I statistic standard deviate = 4.0395, p-value = 2.678e-05
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.164812990      -0.004329004       0.001753268

The Moran’s I test detects significant spatial autocorrelation in emigration residuals (p < 0.05). The positive Moran’s I (0.057) suggests that emigration patterns are spatially clustered rather than randomly distributed, indicating potential spatial dependencies not captured by the model.

The Moran’s I test detects significant spatial autocorrelation in immigration residuals (p value extremely low (p < 0.05)). The positive Moran’s I (0.17) suggests that immigration patterns are spatially clustered rather than randomly distributed, indicating potential spatial dependencies not captured by the model.

We can also draw Moran Scatter Plot Residuals:

# Function to create Moran scatter plot
plot_moran <- function(residuals, lw, labels, title) {
  # Define quadrant-based labels
  quad_labels <- ifelse(
    residuals > 0 & spdep::lag.listw(lw, residuals) > 0, "HH",    # Top-right
    ifelse(residuals < 0 & spdep::lag.listw(lw, residuals) > 0, "LH",   # Top-left
    ifelse(residuals < 0 & spdep::lag.listw(lw, residuals) < 0, "LL",   # Bottom-left
           "HL"))                                                      # Bottom-right
  )

  # Color map
  color_map <- c("HH" = "red", "HL" = "green", "LH" = "blue", "LL" = "purple")
  colors <- color_map[quad_labels]

  # Plot
  moran.plot(
    x = residuals,
    listw = lw,
    labels = labels,
    xlab = paste("Residuals of the", title, "OLS model"),
    ylab = "Spatially Lagged Residuals",
    col = colors,
    main = paste("Moran Scatter Plot (", title, " Model)", sep = ""),
    pch = 19
  )

  abline(h = 0, v = 0, lty = 2, col = "black")
  abline(lm(spdep::lag.listw(lw, residuals) ~ residuals), col = "red", lwd = 2)

  legend("topright", legend = c("HH", "LH", "LL", "HL"),
         col = c("red", "blue", "purple", "green"), pch = 19, cex = 0.8, bty = "n")

  return(quad_labels)
}

# Emigrates
world_final_transformed_crs$cluster_emigrates <- plot_moran(
  residuals_emigrates,
  lw,
  world_final_transformed_crs$`Country Code`,
  "Emigrates"
)

# Immigrates
world_final_transformed_crs$cluster_immigrates <- plot_moran(
  residuals_immigrates,
  lw,
  world_final_transformed_crs$`Country Code`,
  "Immigrates"
)

The Moran scatter plots reveal distinct spatial clustering patterns in the model residuals. For the emigration model, several countries in the High-High quadrant (red) indicate that those with above-average residuals are surrounded by neighbors with similarly high residuals, suggesting a concentrated area of under- or over-prediction by the model. In contrast, countries in the Low-High (blue) or High-Low (green) quadrants highlight spatial discrepancies, where a country’s residual significantly deviates from that of its neighbors. The immigration model’s scatter plot shows similar patterns, with clusters of High-High and Low-Low groups, reinforcing the presence of spatial autocorrelation. The positive slope of the fitted regression line in both plots confirms that countries tend to exhibit similar model residuals as their neighbors, implying that regional factors may be influencing migration flows beyond what the model currently captures.

Step 3: Identify Spatial Clusters in Residuals (LISA)

First we analyse the Spatial Cluster in Residuals for the immigrates dependent variable.

The map below displays the spatial clusters of the immigration model residuals using LISA, highlighting areas with similar values among neighboring countries (e.g., High-High, Low-Low).

# Plot with color mapping
mf_map(
  x = world_final_transformed_crs,
  var = "cluster_immigrates",
  type = "typo",
  pal = c("red", "green", "blue", "purple"),
  leg_title = "local_l_in (Immigrates)")

The map illustrates the spatial clustering of immigration residuals using local Moran’s I. Countries in red (HH) represent high immigration residuals surrounded by neighbors with similarly high values, indicating significant regional hot spots. In contrast, purple areas (LL) denote clusters of low immigration residuals bordered by countries with equally low values, forming cold spots. Green (HL) and blue (LH) areas reflect spatial outliers, where countries with high immigration are neighbored by low-immigration countries, and vice versa. Notably, several African nations form a prominent HH cluster, suggesting shared regional drivers of elevated immigration, while regions such as Australia, Eastern Europe, and parts of Asia show LL patterns, highlighting consistent spatial concentrations of low immigration levels.

Then, we perform a Local Indicators of Spatial Association (LISA) cluster analysis on the residuals from the immigration model, identifying spatial patterns of significant local autocorrelation. We calculates local Moran’s I statistics, assigns countries to cluster types (High-High, Low-Low, High-Low, Low-High, or Not Significant) based on the relationship between their residuals and those of their neighbors, and visualize the resulting spatial clusters on a map.

# ------------------ LISA Cluster Analysis for Immigration Residuals ------------------

# Step 1: Compute Local Moran’s I for immigration residuals
lisa_resid_immigrates <- localmoran(residuals_immigrates, lw, alternative = "two.sided")

# Step 2: Add LISA results to the spatial dataframe
world_final_transformed_crs$LISA_I   <- lisa_resid_immigrates[, "Ii"]
world_final_transformed_crs$Pr_in    <- lisa_resid_immigrates[, "Pr(z != E(Ii))"]  # p-values
world_final_transformed_crs$Z_in     <- lisa_resid_immigrates[, "Z.Ii"]            # z-scores

# Step 3: Define significance threshold
sig_level <- 0.05

# Step 4: Compute global means for quadrant definition
mean_res  <- mean(residuals_immigrates, na.rm = TRUE)
lag_res   <- spdep::lag.listw(lw, residuals_immigrates)
mean_lag  <- mean(lag_res, na.rm = TRUE)

# Step 5: Define LISA cluster categories
world_final_transformed_crs$cluster_in <- with(
  world_final_transformed_crs,
  ifelse(
    Pr_in > sig_level, "NA",
    ifelse(
      residuals_immigrates > mean_res & lag_res > mean_lag, "High-High",
      ifelse(
        residuals_immigrates < mean_res & lag_res < mean_lag, "Low-Low",
        ifelse(
          residuals_immigrates > mean_res & lag_res < mean_lag, "High-Low",
          ifelse(
            residuals_immigrates < mean_res & lag_res > mean_lag, "Low-High", "NA"
          )
        )
      )
    )
  )
)

# Step 6: Convert cluster to factor to control legend order
world_final_transformed_crs$cluster_in <- factor(
  world_final_transformed_crs$cluster_in,
  levels = c("High-High", "High-Low", "Low-High", "Low-Low", "NA")
)

# Step 7: Map LISA clusters
mf_map(
  x = world_final_transformed_crs,
  var = "cluster_in",
  type = "typo",
  pal = c("red", "green", "grey90", "blue", "purple"),
  leg_title = "LISA Clusters (Immigrates)"
)

The map shows that most countries do not exhibit significant spatial autocorrelation in immigration residuals (in gray). One country forms a High-High cluster (in red), indicating both high residuals and neighboring countries with similarly high values. A few countries appear as Low-High clusters (in green), suggesting they have low residuals while being surrounded by countries with high residuals. These clusters reveal spatial patterns in model misfit that may reflect regional dynamics or missing explanatory factors.

Then we analyse the Spatial Cluster in Residuals for the emigrates dependent variable.

The map below displays the spatial clusters of the immigration model residuals using LISA, highlighting areas with similar values among neighboring countries (e.g., High-High, Low-Low).

# Plot with color mapping
mf_map(
  x = world_final_transformed_crs,
  var = "cluster_emigrates",
  type = "typo",
  pal = c("red", "green", "blue", "purple"),
  leg_title = "local_l_in (Emigrates)")

The LISA cluster map reveals clear spatial patterns in the immigration residuals. Regions classified as High-High (in red) indicate countries with unexpectedly high immigration that are surrounded by neighbors exhibiting similarly high values, while Low-Low areas (in purple) show clusters of low immigration residuals. Additionally, the presence of High-Low and Low-High clusters highlights local outliers where a country’s residual significantly contrasts with those of its neighbors.

Then, we perform a Local Indicators of Spatial Association (LISA) cluster analysis on the residuals from the emigration model, identifying spatial patterns of significant local autocorrelation. We calculates local Moran’s I statistics, assigns countries to cluster types (High-High, Low-Low, High-Low, Low-High, or Not Significant) based on the relationship between their residuals and those of their neighbors, and visualize the resulting spatial clusters on a map.

# ------------------ LISA Cluster Analysis for Immigration Residuals ------------------

# Step 1: Compute Local Moran’s I for immigration residuals
lisa_resid_emigrates <- localmoran(residuals_emigrates, lw, alternative = "two.sided")

# Step 2: Add LISA results to the spatial dataframe
world_final_transformed_crs$LISA_I   <- lisa_resid_emigrates[, "Ii"]
world_final_transformed_crs$Pr_in    <- lisa_resid_emigrates[, "Pr(z != E(Ii))"]  # p-values
world_final_transformed_crs$Z_in     <- lisa_resid_emigrates[, "Z.Ii"]            # z-scores

# Step 3: Define significance threshold
sig_level <- 0.05

# Step 4: Compute global means for quadrant definition
mean_res  <- mean(residuals_emigrates, na.rm = TRUE)
lag_res   <- spdep::lag.listw(lw, residuals_emigrates)
mean_lag  <- mean(lag_res, na.rm = TRUE)

# Step 5: Define LISA cluster categories
world_final_transformed_crs$cluster_in <- with(
  world_final_transformed_crs,
  ifelse(
    Pr_in > sig_level, "NA",
    ifelse(
      residuals_emigrates > mean_res & lag_res > mean_lag, "High-High",
      ifelse(
        residuals_emigrates < mean_res & lag_res < mean_lag, "Low-Low",
        ifelse(
          residuals_emigrates > mean_res & lag_res < mean_lag, "High-Low",
          ifelse(
            residuals_emigrates < mean_res & lag_res > mean_lag, "Low-High", "NA"
          )
        )
      )
    )
  )
)

# Step 6: Convert cluster to factor to control legend order
world_final_transformed_crs$cluster_in <- factor(
  world_final_transformed_crs$cluster_in,
  levels = c("High-High", "High-Low", "Low-High", "Low-Low", "NA")
)

# Step 7: Map LISA clusters
mf_map(
  x = world_final_transformed_crs,
  var = "cluster_in",
  type = "typo",
  pal = c("red", "green", "grey90", "blue", "purple"),
  leg_title = "LISA Clusters (Emigrates)"
)

These spatial clusters confirm the presence of significant local spatial autocorrelation, suggesting that regional factors influence immigration patterns beyond what the current model captures. This insight reinforces the need to consider spatial econometric models to better account for these regional dependencies and improve overall model accuracy.

Q15. Model choice

First, we choose the model for the emigrates dependent variable:

#testing strategy
res_test_emigrates <- lm.RStests(model_emigrates, lw, test = "all")
summary(res_test_emigrates)

##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## data:  
## model: lm(formula = emigrates ~ deflactor + lifeexp + log(population) +
## politicalstability + conflictpercapita + vulnerability +
## log(natural_disaster) + t2m_diff + dry + log(prec_5days) + ghwr_35 +
## hwf_upp + unemployment_rate, data = world_final_transformed_crs)
## test weights: lw
##  
##          statistic parameter p.value
## RSerr     1.968838         1  0.1606
## RSlag     2.411200         1  0.1205
## adjRSerr  0.031013         1  0.8602
## adjRSlag  0.473374         1  0.4914
## SARMA     2.442212         2  0.2949

None of the test is significant suggesting that the best model is an OLM.

We then estimate the different spatial models:

# SLX model
slx_1 <- lm(emigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs)

slx_2 <- lmSLX(emigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs, 
           listw = lw)

# SEM model
sem_emi <- errorsarlm(emigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs, 
           listw = lw)

# Lag model
lagm_emi <- lagsarlm(emigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs, 
           listw = lw)

# SDM model
durb_emi <- lagsarlm(emigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs, listw = lw, Durbin = TRUE)

We now compare the different models using AIC criteria (function AIC()):

c(olm = AIC(model_emigrates), slx = AIC(slx_2), lag = AIC(lagm_emi),
  sem = AIC(sem_emi), sdm = AIC(durb_emi))

##       olm       slx       lag       sem       sdm 
## -1183.995 -1170.331 -1184.221 -1183.938 -1169.401

AIC gives lag as the best model but is very close to the OLM and SEM models.

We apply the same strategy for the models of the immigrates dependent variable:

#testing strategy
res_test_immigrates <- lm.RStests(model_immigrates, lw, test = "all")
summary(res_test_immigrates)

##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## data:  
## model: lm(formula = immigrates ~ deflactor + lifeexp + log(population)
## + politicalstability + conflictpercapita + vulnerability +
## log(natural_disaster) + t2m_diff + dry + log(prec_5days) + ghwr_35 +
## hwf_upp + unemployment_rate, data = world_final_transformed_crs)
## test weights: lw
##  
##          statistic parameter   p.value    
## RSerr     13.85345         1 0.0001976 ***
## RSlag     12.43036         1 0.0004224 ***
## adjRSerr   1.52671         1 0.2166068    
## adjRSlag   0.10362         1 0.7475268    
## SARMA     13.95707         2 0.0009317 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The \(LMerr\) and \(LMlag\) tests are both significant. Hence, we look at the robust version of these tests \(RLMerr\) and \(RLMlag\). None of them is significant. We thus choose the model from which the test is the most significant. In our case it would be SEM but LAG is also very close.

We then estimate the different spatial models:

# SLX model
slx_1 <- lm(immigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs)

slx_2 <- lmSLX(immigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs, 
           listw = lw)

# SEM model
sem_immi <- errorsarlm(immigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs, 
           listw = lw)

# Lag model
lagm <- lagsarlm(immigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs, 
           listw = lw)

# SDM model
durb_immi <- lagsarlm(immigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs, listw = lw, Durbin = TRUE)

We now compare the different models using AIC criteria (function AIC()):

c(olm = AIC(model_immigrates), slx = AIC(slx_2), lag = AIC(lagm),
  sem = AIC(sem_immi), sdm = AIC(durb_immi))

##       olm       slx       lag       sem       sdm 
## -1095.498 -1127.169 -1106.190 -1111.026 -1130.306

AIC gives SDM as the best model, as it is the one with the lowest AIC.

Q16. Results presentation

First, we present the results of the models chosen for the emigrates dependent variable:

For the emigrates dependent variable, we choose the OLM and LAG models and the results are presented below:

stargazer::stargazer(model_emigrates,lagm_emi, type = "text")

## 
## ============================================================
##                                Dependent variable:          
##                       --------------------------------------
##                                     emigrates               
##                                 OLS              spatial    
##                                               autoregressive
##                                 (1)                (2)      
## ------------------------------------------------------------
## deflactor                     0.00001            0.00001    
##                              (0.00004)          (0.00004)   
##                                                             
## lifeexp                       -0.0001            -0.0001    
##                              (0.0003)            (0.0003)   
##                                                             
## log(population)              -0.003***          -0.003***   
##                               (0.001)            (0.001)    
##                                                             
## politicalstability           -0.007***          -0.006***   
##                               (0.002)            (0.002)    
##                                                             
## conflictpercapita             5.544**            5.614**    
##                               (2.377)            (2.291)    
##                                                             
## vulnerability                  0.010              0.011     
##                               (0.028)            (0.027)    
##                                                             
## log(natural_disaster)         0.0002              0.0002    
##                               (0.001)            (0.001)    
##                                                             
## t2m_diff                       0.005              0.005     
##                               (0.003)            (0.003)    
##                                                             
## dry                           -0.0001            -0.00005   
##                              (0.00004)          (0.00004)   
##                                                             
## log(prec_5days)               -0.003              -0.002    
##                               (0.003)            (0.003)    
##                                                             
## ghwr_35                      -0.00004            -0.00003   
##                              (0.0001)           (0.00005)   
##                                                             
## hwf_upp                       0.00004            0.00003    
##                              (0.00003)          (0.00003)   
##                                                             
## unemployment_rate             -0.0002            -0.0002    
##                              (0.0003)            (0.0003)   
##                                                             
## Constant                      0.066*              0.061*    
##                               (0.036)            (0.035)    
##                                                             
## ------------------------------------------------------------
## Observations                    232                232      
## R2                             0.160                        
## Adjusted R2                    0.110                        
## Log Likelihood                                   608.111    
## sigma2                                            0.0003    
## Akaike Inf. Crit.                               -1,184.221  
## Residual Std. Error      0.018 (df = 218)                   
## F Statistic           3.206*** (df = 13; 218)               
## Wald Test                                     2.256 (df = 1)
## LR Test                                       2.226 (df = 1)
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01

The OLS results indicate that key determinants of emigration include log(population), political stability, and conflict per capita. Specifically, a higher population is associated with lower per capita emigration (coefficient \(\approx\) -0.003, significant at 1%), while greater political stability is linked to reduced emigration (coefficient \(\approx\) 0.006, significant at 1%). In contrast, higher levels of conflict per capita significantly increase emigration (coefficient \(\approx\) 5.92, significant at 5%). Other variables such as life expectancy, vulnerability, and measures of natural disasters do not appear to significantly affect emigration in this specification.

The spatial autoregressive (lag) model—which accounts for the influence of neighboring countries—yields very similar coefficient estimates, reinforcing the importance of population, political stability, and conflict per capita. The spatial tests (Wald and LR) indicate that incorporating spatial dependence is justified, meaning that emigration in one country is partially influenced by the emigration levels of its neighbors. Although the estimates are close to the OLS values, a further decomposition into direct and indirect effects would clarify how much of the impact is localized versus transmitted through spatial spillovers.

In both the OLS and the spatial autoregressive models, several coefficients are notable. For example, the coefficient for log(population) is approximately -0.003 and is statistically significant, implying that, holding other factors constant, a 1% increase in population is associated with a slight decrease in the per capita emigration rate. Political stability also has a negative effect (about -0.006), suggesting that more politically stable countries tend to experience lower emigration. In contrast, conflict per capita shows a strong positive effect (around 5.92 in the OLS model and 5.99 in the spatial model), indicating that higher levels of conflict are associated with significantly higher emigration. These results suggest that while demographic and political factors reduce emigration, conflict serves as a major push factor.

We then use the decomposition of direct and indirect impact for the lag model:

# -------------------------------
# Decomposition of Impacts for the Spatial Lag Model
# -------------------------------
# Compute impacts using 1000 simulations for significance testing
lag_impacts <- impacts(lagm_emi, listw = lw, R = 1000)
summary_impacts <- summary(lag_impacts, zstats = TRUE)
print(summary_impacts)

## Impact measures (lag, exact):
##                              Direct      Indirect         Total
## deflactor              7.385503e-06  1.072647e-06  8.458150e-06
## lifeexp               -1.298092e-04 -1.885308e-05 -1.486623e-04
## log(population)       -2.527070e-03 -3.670236e-04 -2.894094e-03
## politicalstability    -6.037779e-03 -8.769077e-04 -6.914686e-03
## conflictpercapita      5.633923e+00  8.182530e-01  6.452176e+00
## vulnerability          1.108260e-02  1.609602e-03  1.269220e-02
## log(natural_disaster)  2.162309e-04  3.140469e-05  2.476356e-04
## t2m_diff               4.761470e-03  6.915407e-04  5.453011e-03
## dry                   -4.824720e-05 -7.007269e-06 -5.525446e-05
## log(prec_5days)       -2.358533e-03 -3.425458e-04 -2.701079e-03
## ghwr_35               -3.181354e-05 -4.620497e-06 -3.643404e-05
## hwf_upp                3.434032e-05  4.987478e-06  3.932779e-05
## unemployment_rate     -1.825152e-04 -2.650792e-05 -2.090231e-04
## ========================================================
## Simulation results ( variance matrix):
## Direct:
## 
## Iterations = 1:1000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                             Mean        SD  Naive SE Time-series SE
## deflactor              7.467e-06 3.582e-05 1.133e-06      1.133e-06
## lifeexp               -1.327e-04 2.752e-04 8.703e-06      8.703e-06
## log(population)       -2.531e-03 9.458e-04 2.991e-05      2.991e-05
## politicalstability    -6.021e-03 2.384e-03 7.537e-05      6.911e-05
## conflictpercapita      5.598e+00 2.266e+00 7.166e-02      7.166e-02
## vulnerability          1.063e-02 2.795e-02 8.837e-04      8.837e-04
## log(natural_disaster)  2.221e-04 5.177e-04 1.637e-05      1.637e-05
## t2m_diff               4.741e-03 3.094e-03 9.783e-05      9.783e-05
## dry                   -4.720e-05 3.877e-05 1.226e-06      1.226e-06
## log(prec_5days)       -2.185e-03 3.136e-03 9.917e-05      8.733e-05
## ghwr_35               -3.111e-05 4.939e-05 1.562e-06      1.562e-06
## hwf_upp                3.541e-05 2.784e-05 8.804e-07      8.335e-07
## unemployment_rate     -1.857e-04 2.561e-04 8.099e-06      7.691e-06
## 
## 2. Quantiles for each variable:
## 
##                             2.5%        25%        50%        75%      97.5%
## deflactor             -6.606e-05 -1.573e-05  7.053e-06  3.089e-05  7.661e-05
## lifeexp               -6.924e-04 -3.090e-04 -1.308e-04  5.402e-05  3.934e-04
## log(population)       -4.460e-03 -3.164e-03 -2.539e-03 -1.877e-03 -6.735e-04
## politicalstability    -1.086e-02 -7.661e-03 -6.002e-03 -4.453e-03 -1.458e-03
## conflictpercapita      1.216e+00  4.072e+00  5.432e+00  7.181e+00  1.012e+01
## vulnerability         -4.792e-02 -7.603e-03  1.187e-02  2.860e-02  6.715e-02
## log(natural_disaster) -7.760e-04 -1.305e-04  2.070e-04  5.709e-04  1.223e-03
## t2m_diff              -1.210e-03  2.534e-03  4.723e-03  6.889e-03  1.081e-02
## dry                   -1.241e-04 -7.280e-05 -4.793e-05 -2.178e-05  2.434e-05
## log(prec_5days)       -8.259e-03 -4.266e-03 -2.188e-03  1.262e-04  3.873e-03
## ghwr_35               -1.296e-04 -6.224e-05 -2.983e-05  3.428e-06  6.141e-05
## hwf_upp               -1.966e-05  1.735e-05  3.508e-05  5.328e-05  9.140e-05
## unemployment_rate     -7.039e-04 -3.581e-04 -1.772e-04 -2.075e-05  3.338e-04
## 
## ========================================================
## Indirect:
## 
## Iterations = 1:1000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                             Mean        SD  Naive SE Time-series SE
## deflactor              1.133e-06 6.482e-06 2.050e-07      2.050e-07
## lifeexp               -2.048e-05 5.644e-05 1.785e-06      1.785e-06
## log(population)       -3.807e-04 3.208e-04 1.014e-05      1.014e-05
## politicalstability    -8.960e-04 7.681e-04 2.429e-05      2.429e-05
## conflictpercapita      8.928e-01 7.915e-01 2.503e-02      2.530e-02
## vulnerability          1.653e-03 5.590e-03 1.768e-04      1.768e-04
## log(natural_disaster)  3.298e-05 9.771e-05 3.090e-06      2.616e-06
## t2m_diff               7.183e-04 7.687e-04 2.431e-05      2.431e-05
## dry                   -7.282e-06 8.787e-06 2.779e-07      2.779e-07
## log(prec_5days)       -3.254e-04 6.493e-04 2.053e-05      1.638e-05
## ghwr_35               -4.575e-06 1.036e-05 3.278e-07      3.036e-07
## hwf_upp                5.084e-06 6.395e-06 2.022e-07      2.088e-07
## unemployment_rate     -2.955e-05 5.501e-05 1.740e-06      1.774e-06
## 
## 2. Quantiles for each variable:
## 
##                             2.5%        25%        50%        75%     97.5%
## deflactor             -1.176e-05 -1.845e-06  5.898e-07  4.022e-06 1.581e-05
## lifeexp               -1.571e-04 -4.321e-05 -1.183e-05  6.525e-06 8.238e-05
## log(population)       -1.127e-03 -5.422e-04 -3.395e-04 -1.560e-04 1.106e-04
## politicalstability    -2.655e-03 -1.314e-03 -7.784e-04 -3.659e-04 2.958e-04
## conflictpercapita     -2.123e-01  3.285e-01  7.499e-01  1.253e+00 2.984e+00
## vulnerability         -8.926e-03 -8.873e-04  9.398e-04  4.008e-03 1.495e-02
## log(natural_disaster) -1.617e-04 -1.681e-05  1.634e-05  7.917e-05 2.553e-04
## t2m_diff              -3.517e-04  1.659e-04  5.404e-04  1.117e-03 2.648e-03
## dry                   -2.923e-05 -1.169e-05 -5.330e-06 -1.266e-06 4.890e-06
## log(prec_5days)       -1.919e-03 -5.790e-04 -1.969e-04  3.454e-05 6.883e-04
## ghwr_35               -2.912e-05 -8.695e-06 -2.518e-06  7.420e-07 1.295e-05
## hwf_upp               -4.280e-06  9.079e-07  3.922e-06  7.942e-06 2.115e-05
## unemployment_rate     -1.663e-04 -5.139e-05 -1.763e-05  1.160e-06 6.307e-05
## 
## ========================================================
## Total:
## 
## Iterations = 1:1000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                             Mean        SD  Naive SE Time-series SE
## deflactor              8.601e-06 4.120e-05 1.303e-06      1.303e-06
## lifeexp               -1.532e-04 3.211e-04 1.015e-05      1.015e-05
## log(population)       -2.912e-03 1.102e-03 3.483e-05      3.483e-05
## politicalstability    -6.917e-03 2.754e-03 8.710e-05      7.995e-05
## conflictpercapita      6.491e+00 2.768e+00 8.752e-02      8.752e-02
## vulnerability          1.228e-02 3.247e-02 1.027e-03      1.027e-03
## log(natural_disaster)  2.550e-04 5.972e-04 1.888e-05      1.888e-05
## t2m_diff               5.459e-03 3.599e-03 1.138e-04      1.138e-04
## dry                   -5.448e-05 4.504e-05 1.424e-06      1.424e-06
## log(prec_5days)       -2.510e-03 3.640e-03 1.151e-04      1.037e-04
## ghwr_35               -3.569e-05 5.755e-05 1.820e-06      1.741e-06
## hwf_upp                4.049e-05 3.215e-05 1.017e-06      9.635e-07
## unemployment_rate     -2.152e-04 3.001e-04 9.489e-06      8.983e-06
## 
## 2. Quantiles for each variable:
## 
##                             2.5%        25%        50%        75%      97.5%
## deflactor             -7.364e-05 -1.839e-05  7.765e-06  3.632e-05  8.822e-05
## lifeexp               -8.278e-04 -3.542e-04 -1.523e-04  6.347e-05  4.661e-04
## log(population)       -5.290e-03 -3.595e-03 -2.906e-03 -2.167e-03 -8.746e-04
## politicalstability    -1.244e-02 -8.774e-03 -6.870e-03 -5.090e-03 -1.660e-03
## conflictpercapita      1.509e+00  4.597e+00  6.266e+00  8.301e+00  1.220e+01
## vulnerability         -5.281e-02 -8.493e-03  1.346e-02  3.268e-02  7.647e-02
## log(natural_disaster) -9.323e-04 -1.539e-04  2.384e-04  6.589e-04  1.402e-03
## t2m_diff              -1.430e-03  2.857e-03  5.440e-03  7.840e-03  1.258e-02
## dry                   -1.465e-04 -8.301e-05 -5.448e-05 -2.441e-05  2.776e-05
## log(prec_5days)       -9.686e-03 -4.903e-03 -2.586e-03  1.439e-04  4.210e-03
## ghwr_35               -1.469e-04 -7.238e-05 -3.412e-05  3.965e-06  7.235e-05
## hwf_upp               -2.314e-05  2.003e-05  4.000e-05  6.145e-05  1.020e-04
## unemployment_rate     -8.217e-04 -4.143e-04 -2.049e-04 -2.287e-05  3.879e-04
## 
## ========================================================
## Simulated standard errors
##                             Direct     Indirect        Total
## deflactor             3.581522e-05 6.482323e-06 4.120391e-05
## lifeexp               2.752047e-04 5.643669e-05 3.210676e-04
## log(population)       9.458183e-04 3.207956e-04 1.101541e-03
## politicalstability    2.383518e-03 7.681179e-04 2.754226e-03
## conflictpercapita     2.266167e+00 7.914949e-01 2.767734e+00
## vulnerability         2.794553e-02 5.589658e-03 3.247453e-02
## log(natural_disaster) 5.177133e-04 9.770692e-05 5.971595e-04
## t2m_diff              3.093545e-03 7.686788e-04 3.598603e-03
## dry                   3.877120e-05 8.786687e-06 4.503776e-05
## log(prec_5days)       3.135888e-03 6.493175e-04 3.640427e-03
## ghwr_35               4.938657e-05 1.036498e-05 5.755202e-05
## hwf_upp               2.783956e-05 6.394558e-06 3.214628e-05
## unemployment_rate     2.561253e-04 5.501368e-05 3.000632e-04
## 
## Simulated z-values:
##                           Direct   Indirect      Total
## deflactor              0.2084960  0.1748195  0.2087318
## lifeexp               -0.4823058 -0.3628129 -0.4771854
## log(population)       -2.6764483 -1.1866022 -2.6436506
## politicalstability    -2.5261836 -1.1664236 -2.5114694
## conflictpercapita      2.4701863  1.1280277  2.3451249
## vulnerability          0.3802701  0.2957021  0.3781339
## log(natural_disaster)  0.4289272  0.3375808  0.4270975
## t2m_diff               1.5325522  0.9345096  1.5170767
## dry                   -1.2174651 -0.8287593 -1.2097548
## log(prec_5days)       -0.6967877 -0.5011698 -0.6896076
## ghwr_35               -0.6299611 -0.4414243 -0.6200820
## hwf_upp                1.2718559  0.7949962  1.2596035
## unemployment_rate     -0.7249994 -0.5372292 -0.7173343
## 
## Simulated p-values:
##                       Direct    Indirect Total    
## deflactor             0.8348417 0.86122  0.8346576
## lifeexp               0.6295887 0.71674  0.6332301
## log(population)       0.0074407 0.23538  0.0082017
## politicalstability    0.0115309 0.24344  0.0120230
## conflictpercapita     0.0135043 0.25931  0.0190207
## vulnerability         0.7037449 0.76746  0.7053311
## log(natural_disaster) 0.6679762 0.73568  0.6693083
## t2m_diff              0.1253862 0.35004  0.1292473
## dry                   0.2234273 0.40724  0.2263730
## log(prec_5days)       0.4859357 0.61625  0.4904410
## ghwr_35               0.5287200 0.65891  0.5352038
## hwf_upp               0.2034243 0.42662  0.2078124
## unemployment_rate     0.4684524 0.59111  0.4731678

The decomposition of impacts for the spatial lag model reveals that certain variables have significant direct effects on emigration, with limited spillover to neighboring countries. For instance, the log of population shows a significant negative direct effect (\(\approx\) -0.00247, p \(\approx\) 0.0064) and a small negative indirect effect, resulting in a total effect of about -0.00283. This suggests that larger populations are associated with lower per capita emigration, predominantly through local effects. Similarly, political stability has a significant negative impact, with a direct effect of approximately -0.00565 (p \(\approx\) 0.0125) and a modest indirect effect, totaling around -0.00647.

In contrast, conflict per capita exhibits a robust positive direct effect (\(\approx\) 6.006, p \(\approx\) 0.0078) and a significant total effect of about 6.87, indicating that higher levels of conflict substantially increase emigration mainly through its impact on the country itself rather than through spatial spillovers. Other variables such as life expectancy, vulnerability, and the climatic and disaster measures do not show significant direct or indirect effects, suggesting that, in this model, the main drivers of emigration are demographic size, political conditions, and conflict intensity.

Then, we present the results of the models chosen for the immigrates dependent variable:

For the immigrates dependent variable, we choose the SEM and SDM models and the results are presented below:

stargazer::stargazer(sem_immi, durb_immi, type = "text")

## 
## ======================================================
##                               Dependent variable:     
##                           ----------------------------
##                                    immigrates         
##                              spatial       spatial    
##                               error     autoregressive
##                                (1)           (2)      
## ------------------------------------------------------
## deflactor                    0.00002       0.00002    
##                             (0.00004)     (0.00004)   
##                                                       
## lifeexp                      0.0005         0.001     
##                             (0.0004)       (0.0004)   
##                                                       
## log(population)             -0.004***     -0.004***   
##                              (0.001)       (0.001)    
##                                                       
## politicalstability            0.003         0.005*    
##                              (0.003)       (0.003)    
##                                                       
## conflictpercapita             1.091         -0.110    
##                              (2.518)       (2.620)    
##                                                       
## vulnerability               -0.102***      -0.081**   
##                              (0.033)       (0.032)    
##                                                       
## log(natural_disaster)        -0.0001        0.0005    
##                              (0.001)       (0.001)    
##                                                       
## t2m_diff                     -0.0004       -0.014**   
##                              (0.004)       (0.006)    
##                                                       
## dry                          0.0001        -0.0001    
##                             (0.0001)       (0.0001)   
##                                                       
## log(prec_5days)              0.007*        0.008**    
##                              (0.004)       (0.004)    
##                                                       
## ghwr_35                     0.0002**        0.0001    
##                             (0.0001)       (0.0001)   
##                                                       
## hwf_upp                     -0.00000       0.0001*    
##                             (0.00004)      (0.0001)   
##                                                       
## unemployment_rate            -0.0004       -0.0003    
##                             (0.0003)       (0.0003)   
##                                                       
## lag.deflactor                              0.00004    
##                                            (0.0001)   
##                                                       
## lag.lifeexp                                 -0.001    
##                                            (0.001)    
##                                                       
## lag.log(population)                         0.002     
##                                            (0.002)    
##                                                       
## lag.politicalstability                      -0.002    
##                                            (0.005)    
##                                                       
## lag.conflictpercapita                       -1.502    
##                                            (5.734)    
##                                                       
## lag.vulnerability                           0.021     
##                                            (0.058)    
##                                                       
## lag.log(natural_disaster)                  0.003***   
##                                            (0.001)    
##                                                       
## lag.t2m_diff                               0.027***   
##                                            (0.008)    
##                                                       
## lag.dry                                     0.0001    
##                                            (0.0001)   
##                                                       
## lag.log(prec_5days)                        -0.017**   
##                                            (0.007)    
##                                                       
## lag.ghwr_35                                 0.0001    
##                                            (0.0001)   
##                                                       
## lag.hwf_upp                                -0.00001   
##                                            (0.0001)   
##                                                       
## lag.unemployment_rate                       -0.001    
##                                            (0.001)    
##                                                       
## Constant                      0.056         0.072     
##                              (0.044)       (0.070)    
##                                                       
## ------------------------------------------------------
## Observations                   232           232      
## Log Likelihood               571.513       594.153    
## sigma2                       0.0004         0.0003    
## Akaike Inf. Crit.          -1,111.026     -1,130.306  
## Wald Test (df = 1)          32.805***      6.346**    
## LR Test (df = 1)            17.528***      5.137**    
## ======================================================
## Note:                      *p<0.1; **p<0.05; ***p<0.01

The spatial error model for immigration (SEM) shows that the local factors have effects similar to those in the OLS model, but with spatially correlated errors. In this SEM specification, log(population) remains strongly negative (\(\approx\) -0.004, significant at the 1% level), suggesting that larger populations are associated with lower per capita immigration. Vulnerability has a highly significant negative coefficient (\(\approx\) -0.109, p<0.01), indicating that more vulnerable countries tend to have lower immigration rates. Other variables, such as deflactor, life expectancy, and conflict per capita, are not statistically significant in this model. The overall model fit, as judged by the Akaike Information Criterion (AIC) and the LR test (which is significant at the 1% level), indicates that incorporating spatial error dependence improves the model compared to a non-spatial approach.

In contrast, the SDM model for immigration incorporates a spatially lagged dependent variable, capturing spillover effects from neighboring countries. In the SDM model, log(population) again shows a significant negative impact (\(\approx\) 0.004*), while political stability turns positive and significant (\(\approx\) 0.006, p<0.05), suggesting that more stable countries and their neighbors tend to attract immigrants. Moreover, t2m_diff becomes significantly negative (\(\approx\) -0.014, p<0.05) and log(prec_5days) remains significantly positive (\(\approx\) 0.008, p<0.05), highlighting the importance of climatic conditions. The significance of the spatial lag coefficients (as evidenced by the Wald and LR tests) confirms that immigration in one country is influenced by the conditions in surrounding countries, emphasizing the need to account for spatial spillovers in the model.

We then use the decomposition of direct and indirect impact for the SDM models:

# -------------------------------
# Decomposition of Impacts for the SDM model
# -------------------------------
# Compute impacts using 1000 simulations for significance testing
durb_impacts <- impacts(durb_immi, listw = lw, R = 1000)
summary_durb_impacts <- summary(durb_impacts, zstats = TRUE)
print(summary_durb_impacts)

## Impact measures (mixed, exact):
##                              Direct      Indirect         Total
## deflactor              2.143830e-05  5.429563e-05  7.573393e-05
## lifeexp                5.171874e-04 -9.124008e-04 -3.952134e-04
## log(population)       -3.888108e-03  1.240463e-03 -2.647645e-03
## politicalstability     5.423828e-03 -1.620435e-03  3.803393e-03
## conflictpercapita     -1.803484e-01 -1.863284e+00 -2.043632e+00
## vulnerability         -8.111942e-02  5.004561e-03 -7.611486e-02
## log(natural_disaster)  6.352551e-04  4.059301e-03  4.694556e-03
## t2m_diff              -1.289289e-02  2.905414e-02  1.616125e-02
## dry                   -5.023592e-05  9.766855e-05  4.743262e-05
## log(prec_5days)        7.633884e-03 -1.840814e-02 -1.077426e-02
## ghwr_35                6.230691e-05  1.239295e-04  1.862364e-04
## hwf_upp                1.140365e-04  1.506605e-05  1.291025e-04
## unemployment_rate     -3.818690e-04 -1.147578e-03 -1.529447e-03
## ========================================================
## Simulation results ( variance matrix):
## Direct:
## 
## Iterations = 1:1000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                             Mean        SD  Naive SE Time-series SE
## deflactor              2.115e-05 4.018e-05 1.271e-06      1.271e-06
## lifeexp                5.219e-04 3.660e-04 1.157e-05      1.133e-05
## log(population)       -3.900e-03 1.028e-03 3.250e-05      3.103e-05
## politicalstability     5.242e-03 2.739e-03 8.661e-05      8.661e-05
## conflictpercapita     -1.637e-01 2.708e+00 8.563e-02      8.965e-02
## vulnerability         -8.070e-02 3.221e-02 1.018e-03      1.018e-03
## log(natural_disaster)  6.485e-04 5.838e-04 1.846e-05      1.846e-05
## t2m_diff              -1.313e-02 5.715e-03 1.807e-04      1.807e-04
## dry                   -4.857e-05 6.315e-05 1.997e-06      1.997e-06
## log(prec_5days)        7.446e-03 3.707e-03 1.172e-04      1.250e-04
## ghwr_35                5.869e-05 5.666e-05 1.792e-06      1.680e-06
## hwf_upp                1.154e-04 6.065e-05 1.918e-06      1.918e-06
## unemployment_rate     -3.895e-04 3.208e-04 1.015e-05      9.670e-06
## 
## 2. Quantiles for each variable:
## 
##                             2.5%        25%        50%        75%      97.5%
## deflactor             -5.264e-05 -7.425e-06  2.171e-05  4.738e-05  1.026e-04
## lifeexp               -1.941e-04  2.747e-04  5.346e-04  7.606e-04  1.233e-03
## log(population)       -5.943e-03 -4.601e-03 -3.907e-03 -3.179e-03 -1.962e-03
## politicalstability    -4.429e-04  3.402e-03  5.405e-03  7.110e-03  1.056e-02
## conflictpercapita     -5.387e+00 -2.047e+00 -2.644e-01  1.724e+00  5.009e+00
## vulnerability         -1.442e-01 -1.035e-01 -7.978e-02 -5.909e-02 -1.575e-02
## log(natural_disaster) -4.654e-04  2.527e-04  6.670e-04  1.050e-03  1.790e-03
## t2m_diff              -2.382e-02 -1.713e-02 -1.329e-02 -9.120e-03 -1.697e-03
## dry                   -1.667e-04 -9.437e-05 -4.676e-05 -6.513e-06  7.387e-05
## log(prec_5days)        4.002e-04  4.932e-03  7.423e-03  9.857e-03  1.502e-02
## ghwr_35               -5.458e-05  2.182e-05  5.910e-05  9.776e-05  1.654e-04
## hwf_upp               -3.314e-06  7.162e-05  1.143e-04  1.595e-04  2.312e-04
## unemployment_rate     -1.018e-03 -6.102e-04 -3.904e-04 -1.687e-04  2.525e-04
## 
## ========================================================
## Indirect:
## 
## Iterations = 1:1000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                             Mean        SD  Naive SE Time-series SE
## deflactor              5.322e-05 1.901e-04 6.011e-06      5.625e-06
## lifeexp               -9.230e-04 6.682e-04 2.113e-05      2.113e-05
## log(population)        1.313e-03 2.255e-03 7.131e-05      7.131e-05
## politicalstability    -1.548e-03 5.867e-03 1.855e-04      1.855e-04
## conflictpercapita     -2.214e+00 7.164e+00 2.266e-01      2.266e-01
## vulnerability          4.800e-03 7.031e-02 2.223e-03      2.223e-03
## log(natural_disaster)  4.090e-03 1.548e-03 4.896e-05      4.896e-05
## t2m_diff               2.952e-02 7.904e-03 2.499e-04      2.499e-04
## dry                    9.758e-05 1.210e-04 3.825e-06      3.825e-06
## log(prec_5days)       -1.836e-02 7.801e-03 2.467e-04      2.357e-04
## ghwr_35                1.202e-04 1.398e-04 4.421e-06      4.608e-06
## hwf_upp                1.567e-05 7.607e-05 2.405e-06      2.584e-06
## unemployment_rate     -1.163e-03 7.007e-04 2.216e-05      2.330e-05
## 
## 2. Quantiles for each variable:
## 
##                             2.5%        25%        50%        75%      97.5%
## deflactor             -2.871e-04 -8.273e-05  4.775e-05  1.774e-04  0.0004325
## lifeexp               -2.239e-03 -1.342e-03 -9.271e-04 -4.885e-04  0.0004339
## log(population)       -3.287e-03 -2.181e-04  1.243e-03  2.798e-03  0.0057376
## politicalstability    -1.385e-02 -5.096e-03 -1.672e-03  1.928e-03  0.0101295
## conflictpercapita     -1.580e+01 -6.912e+00 -2.632e+00  2.612e+00 12.0509442
## vulnerability         -1.314e-01 -4.343e-02  4.203e-03  5.176e-02  0.1400848
## log(natural_disaster)  1.202e-03  3.046e-03  4.042e-03  5.132e-03  0.0072582
## t2m_diff               1.402e-02  2.394e-02  2.933e-02  3.492e-02  0.0448375
## dry                   -1.269e-04  1.322e-05  9.804e-05  1.816e-04  0.0003414
## log(prec_5days)       -3.342e-02 -2.389e-02 -1.836e-02 -1.314e-02 -0.0035074
## ghwr_35               -1.609e-04  2.674e-05  1.186e-04  2.140e-04  0.0003789
## hwf_upp               -1.277e-04 -3.617e-05  1.451e-05  6.655e-05  0.0001657
## unemployment_rate     -2.616e-03 -1.596e-03 -1.141e-03 -7.084e-04  0.0002268
## 
## ========================================================
## Total:
## 
## Iterations = 1:1000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                             Mean        SD  Naive SE Time-series SE
## deflactor              7.437e-05 2.069e-04 6.542e-06      6.542e-06
## lifeexp               -4.011e-04 6.380e-04 2.017e-05      1.860e-05
## log(population)       -2.586e-03 2.584e-03 8.170e-05      8.170e-05
## politicalstability     3.694e-03 6.213e-03 1.965e-04      1.965e-04
## conflictpercapita     -2.378e+00 8.529e+00 2.697e-01      2.697e-01
## vulnerability         -7.590e-02 7.818e-02 2.472e-03      2.472e-03
## log(natural_disaster)  4.739e-03 1.763e-03 5.574e-05      5.574e-05
## t2m_diff               1.640e-02 5.817e-03 1.839e-04      1.839e-04
## dry                    4.901e-05 1.065e-04 3.368e-06      3.368e-06
## log(prec_5days)       -1.092e-02 7.648e-03 2.419e-04      2.306e-04
## ghwr_35                1.788e-04 1.469e-04 4.646e-06      4.646e-06
## hwf_upp                1.311e-04 5.216e-05 1.649e-06      1.813e-06
## unemployment_rate     -1.552e-03 7.418e-04 2.346e-05      2.346e-05
## 
## 2. Quantiles for each variable:
## 
##                             2.5%        25%        50%        75%      97.5%
## deflactor             -3.012e-04 -6.989e-05  7.078e-05  2.084e-04  4.849e-04
## lifeexp               -1.648e-03 -8.080e-04 -3.897e-04  2.084e-05  7.847e-04
## log(population)       -7.692e-03 -4.217e-03 -2.664e-03 -8.215e-04  2.726e-03
## politicalstability    -8.415e-03 -3.701e-04  3.628e-03  7.874e-03  1.599e-02
## conflictpercapita     -1.822e+01 -8.011e+00 -2.633e+00  3.491e+00  1.473e+01
## vulnerability         -2.196e-01 -1.301e-01 -7.634e-02 -2.266e-02  7.923e-02
## log(natural_disaster)  1.401e-03  3.534e-03  4.711e-03  5.984e-03  8.017e-03
## t2m_diff               4.646e-03  1.254e-02  1.643e-02  2.020e-02  2.752e-02
## dry                   -1.517e-04 -2.572e-05  4.647e-05  1.192e-04  2.644e-04
## log(prec_5days)       -2.604e-02 -1.613e-02 -1.078e-02 -5.796e-03  3.647e-03
## ghwr_35               -1.221e-04  8.668e-05  1.798e-04  2.783e-04  4.476e-04
## hwf_upp                3.010e-05  9.527e-05  1.315e-04  1.661e-04  2.350e-04
## unemployment_rate     -3.094e-03 -2.032e-03 -1.557e-03 -1.074e-03 -6.832e-05
## 
## ========================================================
## Simulated standard errors
##                             Direct     Indirect        Total
## deflactor             4.017944e-05 1.900947e-04 2.068625e-04
## lifeexp               3.659666e-04 6.681873e-04 6.379875e-04
## log(population)       1.027761e-03 2.254919e-03 2.583682e-03
## politicalstability    2.738768e-03 5.866680e-03 6.213434e-03
## conflictpercapita     2.707866e+00 7.164425e+00 8.529174e+00
## vulnerability         3.220773e-02 7.031168e-02 7.818240e-02
## log(natural_disaster) 5.838395e-04 1.548387e-03 1.762604e-03
## t2m_diff              5.715171e-03 7.903544e-03 5.816873e-03
## dry                   6.315157e-05 1.209611e-04 1.065109e-04
## log(prec_5days)       3.706901e-03 7.800640e-03 7.648074e-03
## ghwr_35               5.665554e-05 1.397935e-04 1.469131e-04
## hwf_upp               6.064983e-05 7.606655e-05 5.215567e-05
## unemployment_rate     3.208434e-04 7.007179e-04 7.418062e-04
## 
## Simulated z-values:
##                            Direct    Indirect      Total
## deflactor              0.52635156  0.27997025  0.3595111
## lifeexp                1.42612554 -1.38141989 -0.6287475
## log(population)       -3.79441250  0.58250074 -1.0009972
## politicalstability     1.91399641 -0.26379491  0.5945814
## conflictpercapita     -0.06045129 -0.30903448 -0.2787782
## vulnerability         -2.50559275  0.06826743 -0.9707997
## log(natural_disaster)  1.11078534  2.64163564  2.6885195
## t2m_diff              -2.29696403  3.73547061  2.8186821
## dry                   -0.76912813  0.80671094  0.4601316
## log(prec_5days)        2.00855165 -2.35422853 -1.4276780
## ghwr_35                1.03594241  0.85953195  1.2173782
## hwf_upp                1.90287511  0.20602028  2.5132512
## unemployment_rate     -1.21383494 -1.65977313 -2.0928427
## 
## Simulated p-values:
##                       Direct     Indirect   Total    
## deflactor             0.59864396 0.77950033 0.7192127
## lifeexp               0.15383211 0.16714990 0.5295144
## log(population)       0.00014799 0.56022944 0.3168282
## politicalstability    0.05562061 0.79193796 0.5521233
## conflictpercapita     0.95179621 0.75729529 0.7804150
## vulnerability         0.01222463 0.94557275 0.3316480
## log(natural_disaster) 0.26666076 0.00825068 0.0071770
## t2m_diff              0.02162082 0.00018736 0.0048221
## dry                   0.44181725 0.41983304 0.6454218
## log(prec_5days)       0.04458470 0.01856120 0.1533846
## ghwr_35               0.30022901 0.39004710 0.2234603
## hwf_upp               0.05705684 0.83677507 0.0119624
## unemployment_rate     0.22481076 0.09696010 0.0363632

The decomposition of impacts from the Spatial Durbin Model (SDM) for emigration provides insight into both the direct effects on a country’s own emigration rate and the indirect spillover effects on its neighbors. For example, the coefficient for log(population) is significantly negative in its direct impact (\(\approx\) -0.004, p<0.001), indicating that larger populations reduce per capita emigration locally, while its indirect effect is small and non‐significant, resulting in a significant overall reduction. Similarly, political stability has a positive direct effect (\(\approx\) 0.0056, p \(\approx 0.041\)) but a negative spillover that nearly offsets it, leading to a modest, non‐significant total effect. In contrast, vulnerability shows a significant negative direct impact (\(\approx\) -0.088, p \(\approx\) 0.0046), suggesting that increased vulnerability reduces emigration locally, though its indirect effect is negligible.

Other variables demonstrate interesting spillover dynamics. For instance, log(natural_disaster) has a non‐significant direct effect but a significant positive indirect effect (p \(\approx\) 0.003), leading to a significant overall positive total impact—implying that higher levels of natural disasters in one country may indirectly boost emigration in neighboring regions. Additionally, t2m_diff exhibits a significant negative direct effect (\(\approx\) -0.0126, p \(\approx\) 0.021) and a significant positive indirect effect (\(\approx\) 0.0283, p \(\approx\) 0.0005), culminating in a significant positive total effect (\(\approx\) 0.0157, p \(\approx\) 0.011), which suggests that temperature differences may dampen emigration locally while encouraging it regionally. Finally, while log(prec_5days) shows a marginally positive direct effect, its significant negative indirect effect (p \(\approx\) 0.031) results in a non-significant total effect, underscoring the complexity of spillover processes in environmental determinants of migration.

Q17. Spatial econometric analysis on the 1990-1995 period

Let us now perform a similar spatial econometric analysis on the data observed during the period 1990-1995.

1. Preprocessing : construction of our dataset for the 1990-1995 period

Let us first extract all the covariates regarding the 1990-1995 period:

my_covariates_9095 <- final_covariates[final_covariates$period == "1990-1995", ]

head(my_covariates_9095)

Then, we create the variables if total in and out flows to the country-level, again based on our reverse-negative estimate of the migrant flows that we chose in the first part of this analysis:

#Select the 1990-1995 period for the mig_data df
migration_9095 <- mig_data[mig_data$period == "1990-1995", ]

# 1. Aggregating emigration (outflows) by origin country
emigration_data_9095 <- migration_9095 %>%
  group_by(origin) %>%
  summarise(total_out = sum(reverse_neg))

# 2. Aggregating immigration (inflows) by destination country
immigration_data_9095 <- migration_9095 %>%
  group_by(dest) %>%
  summarise(total_in = sum(reverse_neg))

# 3. Merge emigration and immigration data on country code 
migration_summary_9095 <- full_join(emigration_data_9095, immigration_data_9095, 
                               by = c("origin" = "dest")) %>%
  rename("Country Code" = origin)  # Rename origin to Country

# View the final summary
print(migration_summary_9095)

## # A tibble: 232 × 3
##    `Country Code` total_out total_in
##    <chr>              <dbl>    <dbl>
##  1 ABW                  514     8441
##  2 AFG                56179  2887291
##  3 AGO                65502   163499
##  4 AIA                   89      762
##  5 ALB               230306     7439
##  6 AND                  139     2695
##  7 ARE                31520   489190
##  8 ARG               187710   118912
##  9 ARM               321596   238645
## 10 ASM                  744     3241
## # ℹ 222 more rows

Then we merge all the datasets we need for the rest of our analysis:

# Merging emigration and immigration data with country-level explanatory variables (my_covariates_period)
final_df_9095 <- left_join(my_covariates_9095, migration_summary_9095, by = "Country Code")

# Select relevant columns from world_contours_transformed
world_contours_select_9095 <- world_contours_transformed %>% 
  select(ISO3, VISUALIZATION_NAME, geometry)

# Join final_df with world_contours_select by Country Code and ISO3
world_9095 <- left_join(final_df_9095, world_contours_select_9095, by = c("Country Code" = "ISO3"))

# Sorting the data to get the top 10 countries by immigration rate
top_immigration_9095 <- world_9095 %>%
  arrange(desc(total_in)) %>%
  select(VISUALIZATION_NAME, total_in) %>%
  head(10)

top_emigration_9095 <- world_9095 %>%
  arrange(desc(total_out)) %>%
  select(VISUALIZATION_NAME, total_out) %>%
  head(10)

head(world_9095)

Finally, we create the two dependent variables of our models : emigrates and immigrates by dividing the total in and out flows by the population size. We also create the net variable that computes the differences between immigration rate and emigration rate.

# Calculate immigration, emigration rates and net migration 
world_final_9095 <- world_9095 %>%
  mutate(
    emigrates = total_out / population,       # Outflows per capita
    immigrates = total_in / population,       # Inflows per capita
    net = immigrates - emigrates              # Net migration per capita
  )


# Sorting the data to get the top 10 countries by immigration rate
top_immigration_rate_9095 <- world_final_9095 %>%
  arrange(desc(immigrates)) %>%
  select(VISUALIZATION_NAME, immigrates) %>%
  head(10)

# Sorting the data to get the top 10 countries by emigration rate
top_emigration_rate_9095<- world_final_9095 %>%
  arrange(desc(emigrates)) %>%
  select(VISUALIZATION_NAME, emigrates) %>%
  head(10)

# Sorting the data to get the top 10 countries by net migration rate
top_net_migration_9095 <- world_final_9095 %>%
  arrange(desc(net)) %>%
  select(VISUALIZATION_NAME, net) %>%
  head(10)

Now that we have constructed our dependent variables, lets us represent the top 10 countries having a high immigration rate and the top 10 countries having a high emigration rate and let us compare with the results we had for the 2020-2024 period:

# Scatter plot for top 10 countries with highest emigration rates
ggplot(top_emigration_rate_9095, aes(x = reorder(VISUALIZATION_NAME, emigrates), y = emigrates)) +
  geom_point(shape = 1) +  
  coord_flip() + 
  labs(title = "Top Outflow",
       x = "Country", 
       y = "Emigration Rate") +
  theme_minimal()

# Scatter plot for top 10 countries with highest immigration rates
ggplot(top_immigration_rate_9095, aes(x = reorder(VISUALIZATION_NAME, immigrates), y = immigrates)) +
  geom_point(shape = 1) +  
  coord_flip() + 
  labs(title = "Top Inflow",
       x = "Country", 
       y = "Immigration Rate") +
  theme_minimal()

ggplot(top_net_migration_9095, aes(x = reorder(VISUALIZATION_NAME, net), y = net)) +
  geom_point(shape = 1) + 
  coord_flip() +
  labs(title = "Top 10 Countries by Net Migration Rate",
       x = "Country", 
       y = "Net Migration Rate (Immigration - Emigration)") +
  theme_minimal()

First, if we focus on the emigration rate top 10 countries, we observe that key countries include Rwanda, Bhutan or Bosnia and Herzegovina. We can imagine that these high rates of emigration could be due to the different wars and conflicts that happened in that period such as the Rwandan Genocide (1994) or the Bosnian War (1992-1995) or due to high political instability for other countries.

Compared to the 1990-1995 period, the 2020-2024 had other primary drivers of emigration than war or political instability (Ukraine), such as climate change (that at first glance does not seem to be the main driver in 1990-1996). Indeed countries like Tonga and Samoa could have faced increased emigration due to climate-related impacts such as sea-level rise and extreme weather events.

So, while economic and political instability as well as war conflits seem to continue to drive emigration, we have climate change that seems to become an important factor in recent years, particularly affecting island nations vulnerable to environmental changes. We will see in this analysis if it is really the case.

Finally, to see these rates more visually, let us plot the choroplet maps of the immigration and emigration rate for the 1940-1950 period:

# Ensure world_contours_transformed is in the correct format for ggplot (sf object)
world_final_transformed_9095 <- st_as_sf(world_final_9095)

# Ensure world_final_transformed has a valid CRS
world_final_transformed_crs_9095 <- st_transform(world_final_transformed_9095, crs = "ESRI:54030")

# Create a choropleth map for immigrates (immigration rates per capita)
immigration_map_9095 <- ggplot(world_final_transformed_9095) +
  geom_sf(aes(fill = immigrates), color = NA) +
  scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
  labs(title = "Immigration Rates (per capita)",
       fill = "Immigrates") +
  theme_minimal()
immigration_map_9095

# Create a choropleth map for emigrates (emigration rates per capita)
emigration_map_9095 <- ggplot(world_final_transformed_9095) +
  geom_sf(aes(fill = emigrates), color = NA) +
  scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
  labs(title = "Emigration Rates (per capita)",
       fill = "Emigrates") +
  theme_minimal()
emigration_map_9095

# Create a choropleth map for net migration (net rates per capita)
net_map_9095 <- ggplot(world_final_transformed_9095) +
  geom_sf(aes(fill = net), color = NA) +
  scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
  labs(title = "Net Rates (per capita)",
       fill = "Net") +
  theme_minimal()
net_map_9095

2. Exploratory analysis

Before deep diving into the econometric analysis, let us first have a look at the initial correlation between our variables.

# First, select the numeric columns relevant for the analysis.
numeric_data_9095 <- world_final_9095 %>%
  select_if(is.numeric)

# Compute the correlation matrix using complete observations
cor_matrix_9095 <- cor(numeric_data_9095, use = "complete.obs")

# Visualize the correlation matrix focusing on the dependent variables
corrplot(cor_matrix_9095, method = "color", tl.cex = 0.8)

# First, select only the numeric explanatory variables (exclude dependent vars and the vars used to build them)
explanatory_vars_9095 <- world_final_9095 %>% 
  select_if(is.numeric) %>% 
  select(-emigrates,-total_out, -total_in, -immigrates, -net)

# Calculate correlations with emigrates and immigrates
cor_with_emigrates_9095 <- sapply(explanatory_vars_9095, function(x) cor(x, world_final_9095$emigrates, use = "complete.obs"))
cor_with_immigrates_9095 <- sapply(explanatory_vars_9095, function(x) cor(x, world_final_9095$immigrates, use = "complete.obs"))

# Identify the variable with the highest absolute correlation for emigrates
max_corr_emigrates_9095 <- names(which.max(abs(cor_with_emigrates_9095)))
max_corr_value_emigrates_9095 <- cor_with_emigrates_9095[max_corr_emigrates_9095]

# Identify the variable with the highest absolute correlation for immigrates
max_corr_immigrates_9095 <- names(which.max(abs(cor_with_immigrates_9095)))
max_corr_value_immigrates_9095 <- cor_with_immigrates_9095[max_corr_immigrates_9095]

# Display the results
cat("The variable most correlated with emigrates is:", max_corr_emigrates_9095, 
    "with a correlation of", round(max_corr_value_emigrates_9095, 3), "\n")

## The variable most correlated with emigrates is: lifeexp with a correlation of -0.255

cat("The variable most correlated with immigrates is:", max_corr_immigrates_9095, 
    "with a correlation of", round(max_corr_value_immigrates_9095, 3), "\n")

## The variable most correlated with immigrates is: dry with a correlation of 0.17

We can see that the emigrates and immigrates variables are not that correlated with the others. The variable most correlated with the emigration rate is the life expectancy with a coefficient of correlation of -0.252. This negative correlation suggests that countries with lower life expectancy tend to have higher emigration rates. This relationship may indicate that factors contributing to lower life expectancy, such as economic instability, or conflict, could drive emigration as individuals seek better living conditions elsewhere.

On the other hand, the variable most correlated with immigration rates is the average number of dry days during the year, with a correlation coefficient of 0.16. This positive correlation suggests that countries with more dry days tend to experience slightly higher immigration rates. This could be due to economic opportunities (as seen above in the top 3 countries receiving the most migrants with the United Arab Emirates or Qatar) or specific policies that attract immigrants despite the climatic conditions.

Also, to continue our comparison of climate events between the 1990-1995 period and the 2020-2024 period one, let us represent the average annual difference temperature (in celsius) during those two periods:

# Select the climatic variable to visualize
climatic_var <- "t2m_diff" 

# Create the choropleth map (1990-1995)
ggplot(world_final_transformed_9095) +
  geom_sf(aes(fill = !!sym(climatic_var)), color = NA) +
  scale_fill_viridis_c(option = "magma", na.value = "grey90") +
  labs(title = paste("Global Distribution of temperature difference (1990-1995)"),
       fill = climatic_var) +
  theme_minimal()

# Create the choropleth map (2020-2024)
ggplot(world_final_transformed) +
  geom_sf(aes(fill = !!sym(climatic_var)), color = NA) +
  scale_fill_viridis_c(option = "magma", na.value = "grey90") +
  labs(title = paste("Global Distribution of temperature difference (2020-2024)"),
       fill = climatic_var) +
  theme_minimal()

The comparison of these maps clearly shows the growing impact of climate change on global temperatures. As environmental conditions become more challenging in certain regions, it is likely that climate-related factors will play an increasingly significant role in shaping migration patterns. Understanding these trends will thus be crucial for developing policies that address both the causes and consequences of climate-induced migration.

3. Spatial weight matrix and spatial autocorrelation in the dependent variables

Regarding the choice of the spatial weight matrix, we consider the same weight matrix as in the 2020-2024 analysis, the matric lw. Indeed, the contiguity-based neighbors list (contiguity_nb) will remain the same regardless of the time period because geographic borders typically do not change over short periods like 1990-1995 vs. 2020-2024. Also, the centroids of countries generally do not change over time unless there are significant geopolitical changes (but that are not taken into account here). Thus, since the geographic data (borders and centroids) remained unchanged, the combined neighbors list remains the same.

Let us now directly plot a Moran scatterplot and perform a Moran test for both of our dependent variables emigrates and immigrates:

For the emigrates variable:

# Generate the Moran scatter plot using spdep's moran.plot
moran.plot(world_final_transformed_crs_9095$emigrates, lw,
           labels = world_final_transformed_crs_9095$`Country Code`,
           pch = 19)

# Perform Moran's I test on the dependent variable 'emigrates'
moran_test <- moran.test(world_final_transformed_crs_9095$emigrates, lw)

# Print the test results
print(moran_test)

## 
##  Moran I test under randomisation
## 
## data:  world_final_transformed_crs_9095$emigrates  
## weights: lw    
## 
## Moran I statistic standard deviate = 0.87666, p-value = 0.1903
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.033110699      -0.004329004       0.001823921

It seems there is no or slight positive spatial correlation for the variable emigrates when looking at the Moran scatterplot. However, the Moran test’s result confirms that, since we do not reject the null hypothesis (p-value > 0.05) we do not have spatial correlation and the OLM model might be well adapted.

For the immigrates variable:

# Generate the Moran scatter plot using spdep's moran.plot
moran.plot(world_final_transformed_crs_9095$immigrates, lw,
           labels = world_final_transformed_crs_9095$`Country Code`,
           pch = 19)

# Perform Moran's I test on the dependent variable 'emigrates'
moran_test <- moran.test(world_final_transformed_crs_9095$immigrates, lw)

# Print the test results
print(moran_test)

## 
##  Moran I test under randomisation
## 
## data:  world_final_transformed_crs_9095$immigrates  
## weights: lw    
## 
## Moran I statistic standard deviate = 3.659, p-value = 0.0001266
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.151335996      -0.004329004       0.001809931

Looking at the Moran scatterplot, there seems to have positive spatial correlation for the immigrates data.

The p-value of the Moran test being smaller than 0.05, we thus have that immigration rates are significantly spatially correlated.

4. Model selection

Let us start by fitting two OLM models (one with the emigrates variable as dependent variable and one with the immigrates variable as dependent one). As we have done and explained for the 2020-2024 period, we also take the logarithm of the variables population, natural disaster, and precipitation.

# OLS model for emigration
model_emigrates_9095 <- lm(emigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, 
                      data = world_final_transformed_crs_9095)

# OLS model for immigration
model_immigrates_9095 <- lm(immigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, 
                        data = world_final_transformed_crs_9095)

# Print model summaries
summary(model_emigrates_9095)

## 
## Call:
## lm(formula = emigrates ~ deflactor + lifeexp + log(population) + 
##     politicalstability + conflictpercapita + vulnerability + 
##     log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
##     ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.039672 -0.016802 -0.005574  0.005327  0.160641 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            1.965e-01  5.437e-02   3.614 0.000374 ***
## deflactor              4.654e-06  5.039e-06   0.924 0.356726    
## lifeexp               -1.298e-03  3.417e-04  -3.799 0.000188 ***
## log(population)       -2.085e-03  1.602e-03  -1.302 0.194427    
## politicalstability     6.689e-03  7.567e-03   0.884 0.377701    
## conflictpercapita      6.037e+00  1.294e+01   0.467 0.641223    
## vulnerability         -6.325e-03  4.687e-02  -0.135 0.892775    
## log(natural_disaster) -8.496e-04  7.894e-04  -1.076 0.283004    
## t2m_diff              -8.191e-03  9.870e-03  -0.830 0.407514    
## dry                   -4.298e-05  5.858e-05  -0.734 0.463965    
## log(prec_5days)       -1.142e-02  5.091e-03  -2.244 0.025827 *  
## ghwr_35               -1.361e-04  7.499e-05  -1.815 0.070850 .  
## hwf_upp                8.206e-04  4.578e-04   1.793 0.074405 .  
## unemployment_rate     -4.166e-05  3.837e-04  -0.109 0.913625    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02898 on 218 degrees of freedom
## Multiple R-squared:  0.1659, Adjusted R-squared:  0.1161 
## F-statistic: 3.334 on 13 and 218 DF,  p-value: 0.0001143

summary(model_immigrates_9095)

## 
## Call:
## lm(formula = immigrates ~ deflactor + lifeexp + log(population) + 
##     politicalstability + conflictpercapita + vulnerability + 
##     log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
##     ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.065551 -0.017981 -0.006006  0.007278  0.199524 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            1.577e-01  6.135e-02   2.570  0.01084 *  
## deflactor              6.762e-07  5.686e-06   0.119  0.90545    
## lifeexp               -1.930e-04  3.855e-04  -0.501  0.61716    
## log(population)       -8.125e-03  1.807e-03  -4.496 1.12e-05 ***
## politicalstability    -3.897e-03  8.538e-03  -0.456  0.64856    
## conflictpercapita      2.113e+01  1.460e+01   1.447  0.14925    
## vulnerability         -1.120e-01  5.289e-02  -2.118  0.03533 *  
## log(natural_disaster)  1.244e-03  8.907e-04   1.397  0.16380    
## t2m_diff               1.402e-03  1.114e-02   0.126  0.89994    
## dry                    1.919e-04  6.610e-05   2.903  0.00408 ** 
## log(prec_5days)        7.175e-03  5.744e-03   1.249  0.21295    
## ghwr_35                6.624e-06  8.462e-05   0.078  0.93767    
## hwf_upp               -4.285e-04  5.165e-04  -0.830  0.40769    
## unemployment_rate     -1.211e-03  4.329e-04  -2.796  0.00563 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0327 on 218 degrees of freedom
## Multiple R-squared:  0.2177, Adjusted R-squared:  0.1711 
## F-statistic: 4.667 on 13 and 218 DF,  p-value: 4.307e-07

The OLS model for emigration explains approximately 11.26% of the variance (Adjusted R² = 0.1126).

The analysis of the OLS regression results for the 1990-1995 period highlights the importance of life expectancy (emigration model), population (both models) and vulnerability (immigration model) in shaping migration patterns. While climate variables such as temperature difference and the number of dry days do not show strong statistical significance in the emigration model, the number of dry days is significantly associated with higher immigration rates. This suggests that while climate factors may not have been primary drivers of emigration in the early 1990s, they could have influenced immigration patterns, possibly due to economic opportunities in regions with specific climatic conditions.

Let us now check for spatial autocorrelation of the residuals :

# Extract residuals
residuals_emigrates_9095 <- residuals(model_emigrates_9095)
residuals_immigrates_9095 <- residuals(model_immigrates_9095)

# Moran I test for spatial autocorrelation in residuals
moran_resid_emigrates_9095 <- moran.test(residuals_emigrates_9095, lw)
moran_resid_immigrates_9095 <- moran.test(residuals_immigrates_9095, lw)

# Print results
print(moran_resid_emigrates_9095)

## 
##  Moran I test under randomisation
## 
## data:  residuals_emigrates_9095  
## weights: lw    
## 
## Moran I statistic standard deviate = 0.31776, p-value = 0.3753
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.009331343      -0.004329004       0.001848122

print(moran_resid_immigrates_9095)

## 
##  Moran I test under randomisation
## 
## data:  residuals_immigrates_9095  
## weights: lw    
## 
## Moran I statistic standard deviate = 1.0598, p-value = 0.1446
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.041017034      -0.004329004       0.001830805

Based on the Moran’s I test results and the OLS regression outcomes, it appears that the OLS model is reasonably adapted for the emigration data, as there is no significant spatial autocorrelation in the residuals. This suggests that the OLS model captures the underlying patterns in the emigration data well, without significant spatial dependencies affecting the results.

However, for the immigration data, the Moran’s I test indicates weak evidence of positive spatial autocorrelation (at the 10% level) in the residuals. This suggests that there might be some spatial clustering in the immigration data that the OLS model does not fully capture.

We check in the following part which model would be the most adapted for modeling emigrates and immigrates dependent variables:

For the emigrates variable:

res_test_emigrates_9095 <- lm.RStests(model_emigrates_9095, lw, test = "all")
summary(res_test_emigrates_9095)

##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## data:  
## model: lm(formula = emigrates ~ deflactor + lifeexp + log(population) +
## politicalstability + conflictpercapita + vulnerability +
## log(natural_disaster) + t2m_diff + dry + log(prec_5days) + ghwr_35 +
## hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095)
## test weights: lw
##  
##          statistic parameter p.value
## RSerr     0.044408         1  0.8331
## RSlag     0.022206         1  0.8815
## adjRSerr  1.077875         1  0.2992
## adjRSlag  1.055673         1  0.3042
## SARMA     1.100081         2  0.5769

Since none of the models testes are significant, as anticipated in the last steps, we decide to keep an OLM to model the emigrates variable during the 1990-1995 period.

Let us check with the AIC criteria if we got the same result:

# SLX model
slx_1_9095 <- lm(emigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095)

slx_2_9095 <- lmSLX(emigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095, 
           listw = lw)

# SEM model
sem_emi_9095 <- errorsarlm(emigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095, 
           listw = lw)

# Lag model
lagm_emi_9095 <- lagsarlm(emigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095, 
           listw = lw)

# SDM model
durb_emi_9095 <- lagsarlm(emigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095, listw = lw, Durbin = TRUE)

c(olm = AIC(model_emigrates_9095), slx = AIC(slx_2_9095), lag = AIC(lagm_emi_9095),
  sem = AIC(sem_emi_9095), sdm = AIC(durb_emi_9095))

##       olm       slx       lag       sem       sdm 
## -969.1355 -953.1330 -967.1628 -967.1895 -951.1473

Again, the fact of choosing the OLM to model our emigrates variable is confirmed since it is the model that minimizes our AIC criterion.

For the immigrates variable:

res_test_immigrates_9095 <- lm.RStests(model_immigrates_9095, lw, test = "all")
summary(res_test_immigrates_9095)

##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## data:  
## model: lm(formula = immigrates ~ deflactor + lifeexp + log(population)
## + politicalstability + conflictpercapita + vulnerability +
## log(natural_disaster) + t2m_diff + dry + log(prec_5days) + ghwr_35 +
## hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095)
## test weights: lw
##  
##          statistic parameter p.value
## RSerr      0.85803         1  0.3543
## RSlag      1.81467         1  0.1779
## adjRSerr   0.73575         1  0.3910
## adjRSlag   1.69238         1  0.1933
## SARMA      2.55042         2  0.2794

Again, none of the tests are significant, suggesting that the OLM or an SLX model would be the most adapted to model immigration.

Let us check the AIC criterion:

# SLX model
slx_1_9095 <- lm(immigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095)

slx_2_9095 <- lmSLX(immigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095, 
           listw = lw)

# SEM model
sem_immi_9095 <- errorsarlm(immigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095, 
           listw = lw)

# Lag model
lagm_immi_9095 <- lagsarlm(immigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095, 
           listw = lw)

# SDM model
durb_immi_9095 <- lagsarlm(immigrates ~ deflactor + lifeexp + log(population) + 
                      politicalstability + conflictpercapita + vulnerability + 
                      log(natural_disaster) + t2m_diff + dry + log(prec_5days) + 
                      ghwr_35 + hwf_upp + unemployment_rate, data = world_final_transformed_crs_9095, listw = lw, Durbin = TRUE)

c(olm = AIC(model_immigrates_9095), slx = AIC(slx_2_9095), lag = AIC(lagm_immi_9095),
  sem = AIC(sem_immi_9095), sdm = AIC(durb_immi_9095))

##       olm       slx       lag       sem       sdm 
## -913.1087 -905.6168 -912.8047 -912.1174 -903.7170

The OLS model has a lower AIC than the SLX and SDM models, suggesting it is a better fit compared to these alternatives. However, the Spatial Lag (LAG) model has the lowest AIC, indicating it might be the best fit among the models considered.

Given the weak evidence of spatial lag dependence from the Rao’s score diagnostics (RSlag p-value = 0.1234), the Spatial Lag model could be a reasonable choice, though we have more evidence that the OLM model would be the most appropriate.

Finally, to conclude we have:

For the emigrates variable:

stargazer::stargazer(model_emigrates_9095,type = "text")

## 
## =================================================
##                           Dependent variable:    
##                       ---------------------------
##                                emigrates         
## -------------------------------------------------
## deflactor                       0.00000          
##                                (0.00001)         
##                                                  
## lifeexp                        -0.001***         
##                                (0.0003)          
##                                                  
## log(population)                 -0.002           
##                                 (0.002)          
##                                                  
## politicalstability               0.007           
##                                 (0.008)          
##                                                  
## conflictpercapita                6.037           
##                                (12.937)          
##                                                  
## vulnerability                   -0.006           
##                                 (0.047)          
##                                                  
## log(natural_disaster)           -0.001           
##                                 (0.001)          
##                                                  
## t2m_diff                        -0.008           
##                                 (0.010)          
##                                                  
## dry                            -0.00004          
##                                (0.0001)          
##                                                  
## log(prec_5days)                -0.011**          
##                                 (0.005)          
##                                                  
## ghwr_35                        -0.0001*          
##                                (0.0001)          
##                                                  
## hwf_upp                         0.001*           
##                                (0.0005)          
##                                                  
## unemployment_rate              -0.00004          
##                                (0.0004)          
##                                                  
## Constant                       0.197***          
##                                 (0.054)          
##                                                  
## -------------------------------------------------
## Observations                      232            
## R2                               0.166           
## Adjusted R2                      0.116           
## Residual Std. Error        0.029 (df = 218)      
## F Statistic             3.334*** (df = 13; 218)  
## =================================================
## Note:                 *p<0.1; **p<0.05; ***p<0.01

Since we kept only the OLM model, we get the same results as already explained earlier above. For the immigrates variable:

stargazer::stargazer(model_immigrates_9095,lagm_immi_9095, type = "text")

## 
## ============================================================
##                                Dependent variable:          
##                       --------------------------------------
##                                     immigrates              
##                                 OLS              spatial    
##                                               autoregressive
##                                 (1)                (2)      
## ------------------------------------------------------------
## deflactor                     0.00000            0.00000    
##                              (0.00001)          (0.00001)   
##                                                             
## lifeexp                       -0.0002            -0.0002    
##                              (0.0004)            (0.0004)   
##                                                             
## log(population)              -0.008***          -0.008***   
##                               (0.002)            (0.002)    
##                                                             
## politicalstability            -0.004              -0.005    
##                               (0.009)            (0.008)    
##                                                             
## conflictpercapita             21.126              18.174    
##                              (14.597)            (14.094)   
##                                                             
## vulnerability                -0.112**            -0.115**   
##                               (0.053)            (0.051)    
##                                                             
## log(natural_disaster)          0.001              0.001     
##                               (0.001)            (0.001)    
##                                                             
## t2m_diff                       0.001              0.002     
##                               (0.011)            (0.011)    
##                                                             
## dry                          0.0002***          0.0002***   
##                              (0.0001)            (0.0001)   
##                                                             
## log(prec_5days)                0.007              0.007     
##                               (0.006)            (0.006)    
##                                                             
## ghwr_35                       0.00001            0.00001    
##                              (0.0001)            (0.0001)   
##                                                             
## hwf_upp                       -0.0004            -0.0004    
##                               (0.001)            (0.0005)   
##                                                             
## unemployment_rate            -0.001***          -0.001***   
##                              (0.0004)            (0.0004)   
##                                                             
## Constant                      0.158**            0.158***   
##                               (0.061)            (0.059)    
##                                                             
## ------------------------------------------------------------
## Observations                    232                232      
## R2                             0.218                        
## Adjusted R2                    0.171                        
## Log Likelihood                                   472.402    
## sigma2                                            0.001     
## Akaike Inf. Crit.                                -912.805   
## Residual Std. Error      0.033 (df = 218)                   
## F Statistic           4.667*** (df = 13; 218)               
## Wald Test                                     1.707 (df = 1)
## LR Test                                       1.696 (df = 1)
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01

The coefficients, standard errors, and significance levels are almost identical between the OLS and LAG model, suggesting minimal spatial dependence in the data for this period (as expected). Thus we will focus on interpret in details the coefficients of the OLM model.

As for the precise interpretation for the 1990-1995 period for the OLM model we have: - log(population): Negative and significant, indicating that larger populations are associated with lower per capita immigration. - vulnerability: Negative and significant, suggesting that higher vulnerability is associated with lower immigration. - dry: Positive and significant, indicating that a higher number of dry days is associated with increased immigration. - unemployment_rate: Negative and significant, suggesting that higher unemployment rates are associated with lower immigration.

The variables t2m_diff, log(prec_5days) and ghwr_35 are not significant in the 1990-1995 period while they were in the 2020-2024 one, so that temperature differences are more influential in recent years, precipitation patterns are increasingly important and heat waves impact immigration in the recent years.

So to conclude on the impact of climatic variables on migratory flows, we have that while dry days were significant in the 1990-1995 period, temperature differences (t2m_diff) and precipitation patterns (log(prec_5days)) have become more influential in the 2020-2024 period. This shift suggests that climate change impacts are increasingly affecting migration patterns.

3. Bilateral migration flows

The aim of this part is to explain the bilateral migration flows using a gravity model. This model can be formalized as

\[ y = \iota N\alpha + X_O \beta_o + X_d \beta_d + G\omega + \epsilon \]

where \(\beta_o\), \(\beta_d\), and \(\omega\) are the vectors of parameters whose dimensions correspond to the number of variables in \(X_o\), \(X_d\), and \(G\), respectively.

\(X_o\) represents the characteristics of the origin countries
\(X_d\) represents the characteristics of the destination countries
\(G\) represents the measures related to a pair of countries, like distances between origin and destination, or indicators for a common spoken language

Q18. Merging migration flow estimate with explanatory variables

Let us now, from the migration data, merge the migration flow estimate we have chosen at the beginning of this analysis with explanatory variables.

Note that we will only keep the observations whose origin and destination are present in the world database.

# Selecting only from the migration data (already filtered on 2020-2024 period) the variables we are interested in 
mig_flow <- subset(migration, select = c(dest, origin, reverse_neg))
colnames(mig_flow)[3] <- "y"
 
# Merging the databases to get the covariates associated to each origin point

# Rename 'Country Code' to 'iso3' for clarity
covariates <- my_covariates_period
names(covariates)[names(covariates) == "Country Code"] <- "iso3"

# Prepare world list
world_codes <- world$'Country Code'  # extract valid codes

# Create origin covariates with O_ prefix
cov_origin <- covariates %>%
  rename_with(~ paste0("O_", .), .cols = -iso3)

# Create destination covariates with D_ prefix
cov_dest <- covariates %>%
  rename_with(~ paste0("D_", .), .cols = -iso3)

# Merge origin covariates
mig_with_origin <- mig_flow %>%
  left_join(cov_origin, by = c("origin" = "iso3"))

# Merge destination covariates
mig_full <- mig_with_origin %>%
  left_join(cov_dest, by = c("dest" = "iso3"))

# Filter to include only countries in 'world'
data_od <- mig_full %>%
  filter(origin %in% world_codes & dest %in% world_codes)

head(data_od)

Q19. Computing the distances between each pairs O-D

#Get country centroids
centroids <- world_contours_transformed %>%
  mutate(centroid = st_centroid(geometry)) %>%
  select(ISO3, centroid)

# Join origin and destination centroids to your data_od
data_od <- data_od %>%
  left_join(centroids, by = c("origin" = "ISO3")) %>%
  rename(geometry_O = centroid) %>%
  left_join(centroids, by = c("dest" = "ISO3")) %>%
  rename(geometry_D = centroid)

# Compute pairwise distances (in meters, since Robinson is in meters)
data_od <- data_od %>%
  mutate(
    dist_m = st_distance(geometry_O, geometry_D, by_element = TRUE),
    dist_km = as.numeric(dist_m) / 1000  # convert to kilometers
  )

data_od <- data_od %>%
  select(-geometry_O, -geometry_D)

Q20. Estimating a gravity model

Here, we prefer to consider the following model :

\[ log(y+1) = \iota N\alpha + X_O \beta_o + X_d \beta_d + G\omega + \epsilon \]

We prefer to use a log transformation of y in order to first deal with zeros. Indeed we happen to have 0 migration flows between some pairs. Log(0) being undefined, we prefer to use log(y + 1) to handle it. Moreover this transformation allows to reduce the right-skewness that we have in our migration data where a few country pairs have large flows but most have small or zeros flows.

Also, this makes the coefficients interpretable as semi-elasticities, for exemple: “A 1-unit increase in destination life expectancy is associated with a X% increase in bilateral migration flows, ceteris paribus.”

data_od$y = log(data_od$y + 1)

df_gravity <- select(data_od, -dest, -origin, -O_period, -D_period, -geometry.x, -geometry.y, -dist_m)

gravity_model <- lm(y ~ ., data = df_gravity)

summary(gravity_model)

## 
## Call:
## lm(formula = y ~ ., data = df_gravity)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.6979 -0.9096 -0.4408  0.1324 13.3283 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           1.611e+00  2.630e-01   6.125 9.13e-10 ***
## O_deflactor          -7.864e-04  2.349e-04  -3.348 0.000816 ***
## O_lifeexp             1.347e-02  1.865e-03   7.223 5.17e-13 ***
## O_population          1.474e-09  8.336e-11  17.683  < 2e-16 ***
## O_politicalstability -2.610e-01  1.256e-02 -20.775  < 2e-16 ***
## O_conflictpercapita   3.169e+01  1.548e+01   2.047 0.040678 *  
## O_vulnerability      -2.180e+00  1.632e-01 -13.356  < 2e-16 ***
## O_natural_disaster    1.154e-08  1.208e-09   9.551  < 2e-16 ***
## O_t2m_diff            3.302e-02  2.048e-02   1.612 0.106912    
## O_dry                -1.239e-03  2.563e-04  -4.834 1.34e-06 ***
## O_prec_5days          5.090e-04  1.486e-04   3.425 0.000615 ***
## O_ghwr_35            -6.036e-04  3.277e-04  -1.842 0.065528 .  
## O_hwf_upp            -7.651e-04  1.889e-04  -4.051 5.11e-05 ***
## O_unemployment_rate  -8.852e-03  1.704e-03  -5.194 2.06e-07 ***
## D_deflactor          -1.434e-03  2.349e-04  -6.102 1.05e-09 ***
## D_lifeexp             1.553e-02  1.865e-03   8.326  < 2e-16 ***
## D_population          2.858e-10  8.336e-11   3.429 0.000607 ***
## D_politicalstability  2.053e-02  1.256e-02   1.635 0.102150    
## D_conflictpercapita   2.429e+01  1.548e+01   1.569 0.116702    
## D_vulnerability      -3.297e+00  1.632e-01 -20.202  < 2e-16 ***
## D_natural_disaster    6.901e-09  1.208e-09   5.713 1.12e-08 ***
## D_t2m_diff            7.559e-02  2.048e-02   3.691 0.000223 ***
## D_dry                -2.540e-04  2.563e-04  -0.991 0.321760    
## D_prec_5days          4.506e-04  1.486e-04   3.032 0.002432 ** 
## D_ghwr_35             1.832e-04  3.277e-04   0.559 0.576075    
## D_hwf_upp            -2.882e-03  1.889e-04 -15.259  < 2e-16 ***
## D_unemployment_rate  -1.407e-02  1.704e-03  -8.258  < 2e-16 ***
## dist_km              -6.068e-05  1.549e-06 -39.176  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.814 on 53796 degrees of freedom
## Multiple R-squared:  0.1466, Adjusted R-squared:  0.1462 
## F-statistic: 342.2 on 27 and 53796 DF,  p-value: < 2.2e-16

The gravity model explains our bilateral migration flows not reasonably well with an adjusted R² of 0.15.

Most of the the covariates of the origin and destination country are statistically significant. The distance variables (dist_km) is also statistically significant at the 5% significant level.

As expected, distance between countries has a strong negative effect on migration, confirming that distance acts as a ‘friction’ to migration (probably due to costs, linguistic differences, etc.).

Origin country characteristics such as low political stability, high population, and higher natural disaster exposure are positively associated with emigration. Conversely, vulnerability and unemployment in the origin reduce migration, possibly due to immobility traps.

On the destination side, higher life expectancy, population, and political stability are strong pull factors, while higher vulnerability and unemployment discourage inflows.

Q21. Investigating spatial correlation in the residuals

# we add the residuals of our model to our dataframe
data_od$residuals <- residuals(gravity_model)

Let us now check for spatial correlation in the residuals using two Moran tests:

for the residuals aggregated at origin level:

# Origin
residuals_origin <- data_od %>%
group_by(origin) %>%
summarise(mean_residual = mean(residuals))

test_origin <- moran.test(residuals_origin$mean_residual, lw)
print(test_origin)

## 
##  Moran I test under randomisation
## 
## data:  residuals_origin$mean_residual  
## weights: lw    
## 
## Moran I statistic standard deviate = 1.3808, p-value = 0.08368
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.055957459      -0.004329004       0.001906360

The results of the Moran’s I test for the origin-aggregated residuals suggest some evidence of spatial autocorrelation, though it is quite weak. Specifically, the Moran’s I statistic is 0.0667, with a p-value of 0.05667. This p-value is slightly above the 5% significance threshold, indicating that the spatial autocorrelation is not statistically significant at the conventional 5% level, but it is close to being significant.

Given the Moran’s I statistic is positive, it suggests that there might be some spatial clustering in the residuals, meaning that the residuals at certain origins may be more similar to those at nearby origins than would be expected by chance.

The test result implies that while there may be some spatial structure in the residuals, there is no strong evidence to do implement a spatial regression approach for the origin side of the model.

for the residuals aggregated at destination level:

residuals_dest <- data_od %>%
group_by(dest) %>%
summarise(mean_residual = mean(residuals))

test_dest <- moran.test(residuals_dest$mean_residual, lw)
print(test_dest)

## 
##  Moran I test under randomisation
## 
## data:  residuals_dest$mean_residual  
## weights: lw    
## 
## Moran I statistic standard deviate = 3.5386, p-value = 0.0002011
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.149854206      -0.004329004       0.001898533

The Moran’s I test on destination-aggregated residuals indicates significant positive spatial autocorrelation at the 5% significance level (Moran’s I = 0.0795, p-value = 0.031), suggesting that residuals are not randomly distributed in space. This points to potential unobserved spatial factors influencing destination attractiveness, and may need the inclusion a spatial regression approach.

Q22. Spatial lags of the residuals and Moran scatterplots

#Get the country-level neighborhood
#combined_nb <- union.nb(contiguity_nb, knn_nb)

# Convert to a row-standardized sparse matrix W (country-level)
W <- as(Matrix::Matrix(as.matrix(nb2mat(combined_nb, style = "W")), sparse = TRUE), "dgCMatrix")

#Build flow-level neighborhood matrices
n <- nrow(W)  # number of countries

In <- diag(n)

# Kronecker products for flow-level spatial matrices
Wo <- kronecker(W, In)  # origin-based
Wd <- kronecker(In, W)  # destination-based
Ww <- kronecker(W, W)   # origin-to-destination

Y <- matrix(data_od$residuals, nrow = n, ncol = n, byrow = TRUE)
y_vec <- as.vector(Y)  # This is vec(Y), stacking columns

#Compute spatial lags
lag_Wo <- as.vector(Wo %*% y_vec)
lag_Wd <- as.vector(Wd %*% y_vec)
lag_Ww <- as.vector(Ww %*% y_vec)

Finally, we plot all three Moran scatterplots:

#Moran scatter plots
par(mfrow = c(1, 3))

# Wo plot
plot(y_vec, lag_Wo, pch = 19, main = "Wo-based Moran Plot", xlab = "residuals", ylab = "Wo x residuals")
abline(lm(lag_Wo ~ y_vec), col = "red")

# Wd plot
plot(y_vec, lag_Wd, pch = 19, main = "Wd-based Moran Plot", xlab = "residuals", ylab = "Wd x residuals")
abline(lm(lag_Wd ~ y_vec), col = "red")

# Ww plot
plot(y_vec, lag_Ww, pch = 19, main = "Ww-based Moran Plot", xlab = "residuals", ylab = "Ww x residuals")
abline(lm(lag_Ww ~ y_vec), col = "red")

What have we done here ? Well, to evaluate the presence of spatial autocorrelation in the residuals of the estimated model, Moran scatterplots were constructed using three different flow-based spatial weight matrices: one based on the neighborhoods of origin countries (Wo), one on destination countries (Wd), and one capturing the interaction of both origin and destination neighborhoods (Ww).

The Moran plot based on the Wo matrix (left) reveals a positive slope, indicating the potential presence of positive spatial autocorrelation among residuals when flows are grouped by their origin countries’ neighborhoods. This suggests that countries that are geographically proximate as senders of flows tend to exhibit similar residual patterns so that the model may be omitting spatially structured variables relevant to origin-based characteristics.

The Moran plot for the Wd matrix (middle), shows a slightly positive trend line, implying minimal or no spatial autocorrelation in the residuals based on destination-based proximity.

The Ww matrix (right), constructed as the Kronecker product of the origin and destination matrices, captures potential spatial autocorrelation driven by complex origin-to-destination interactions. The corresponding Moran plot shows no clear trend, with the regression line being nearly horizontal. This suggests that there is no systematic interaction-based spatial dependence between origin and destination pairs that is left unexplained by the model.

Bibliography

Abel, G.J., & Cohen, J.E. (2019). Bilateral international migration flow estimates for 200 countries. Scientific Data, 6, 82. https://doi.org/10.1038/s41597-019-0089-3
Berlemann, M., Haustein, E., & Steinhardt, M.F. (2021). From Stocks to Flows – Evidence for the Climate-Migration-Nexus. IZA Discussion Paper No. 14450. https://www.iza.org/publications/dp/14450/from-stocks-to-flows-evidence-for-the-climate-migration-nexus
Laurent, T., Margaretic, P., & Thomas-Agnan, C. (2022). Neighbouring countries and bilateral remittances: a global study. Spatial Economic Analysis, 17(4), 557-584. https://doi.org/10.1080/17421772.2022.2070656
World Bank. (n.d.). Unemployment, total (% of total labor force) (modeled ILO estimate). World Bank Open Data. https://data.worldbank.org/indicator/SL.UEM.TOTL.ZS

Project in Spatial Econometrics

Emma Bastien, Coralie Sorbet

1. Data

1.1 Migration flow data

Q1. Method Chosen Description

Q2. Analysis of Flow Maps for the Reverse Negative Method

1.2 Contours data

Q3. World contour boundaries import

1.3 Explanatory variables at country level

Q4. New Variable

2 Emigration and immigration rates by country

Q5. Total in and out flows by country using reverse negative measure

Q6. Merging the databases

Q7. Immigration and emigration rates

Q8. Exploratory data analysis

Q9. Choropleth map

Q10. Map of one climatic variable.

Q11. Spatial weight matrix definition

Q12. Moral Scatter plot

Q13. Statistical test for the spatial correlation in dependent variables

Q14. OLM estimates

Q15. Model choice

Q16. Results presentation

Q17. Spatial econometric analysis on the 1990-1995 period

1. Preprocessing : construction of our dataset for the 1990-1995 period

2. Exploratory analysis

3. Spatial weight matrix and spatial autocorrelation in the dependent variables

4. Model selection

3. Bilateral migration flows

Q18. Merging migration flow estimate with explanatory variables

Q19. Computing the distances between each pairs O-D

Q20. Estimating a gravity model

Q21. Investigating spatial correlation in the residuals

Q22. Spatial lags of the residuals and Moran scatterplots

Bibliography