Introduction

Back in the innocent days of 2015, the “migration crisis” in Europe was dominating headlines all over the continent. At the time I was interested in diving into the actual migration figures for European countries, to see if there was any signal in the data behind the narrative (there is, by the way). As life goes, I never got around to it, but recently I was able to block out some time to dive into the data, sourced from the EU itself.

This is a simple descriptive analysis, including the explorations, warts and all. I hope the final visualisations are able to capture the essence of the when, where and scale of the crisis.

Also, please remember that the data used here is total immigration and emigration for each country, and so includes migration within the region as well as migration to / from the region.

Import and Clean

First we import the data from Eurostat and clean it as best we can. Overall the data quality is excellent.

library(tidyverse) # attaches dplyr, tidyr, ggplot2, readr and stringr

emmigration_raw <- readr::read_tsv("./data/tps00177_emmigration.tsv")
immigration_raw <- readr::read_tsv("./data/tps00176_immigration.tsv")
population_raw <- readr::read_tsv("./data/tps00001_population.tsv")
gdp_raw <- readr::read_tsv("./data/tec00001_GDP.tsv")
# Country codes are ISO standard, except for Greece 
# (GR -> EL) and United Kingdom (GB -> UK).
# http://www.nationsonline.org/oneworld/country_code_list.htm
country_codes <- readr::read_csv("./data/world_country_codes.csv")
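
The cleaning step that follows pulls the first run of digits out of every yearly cell. As a minimal sketch of why that is useful: Eurostat TSV exports commonly attach a flag letter to a value and use ":" for missing data. The cell values below are made up for illustration, and `extract_digits` is a hypothetical base R stand-in for `stringr::str_extract(x, "[0-9]+")` so the snippet runs without extra packages.

```r
# Hypothetical cell values in the style of a Eurostat TSV export:
# a plain number, numbers with a trailing flag, and ":" for missing.
cells <- c("98535", "661855 b", ": ", "1234 p")

# Return the first run of digits in each string, or NA when there is none.
extract_digits <- function(x) {
  m <- regexpr("[0-9]+", x)
  hit <- as.integer(m) != -1L
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)
  out
}

values <- as.numeric(extract_digits(cells))
print(values)
```

Note that this pattern ignores decimal points and minus signs, so it is only safe for non-negative whole-number columns such as the counts used here.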

I had to filter out certain years and countries in order to get a complete dataset and have a consistent set of countries to compare. Randomly adding/subtracting countries in certain years could give false impressions.
On the chopping block were Bulgaria, Belgium, Liechtenstein and Romania. You’ll have to ask their respective national statistics offices why they didn’t report their data for certain years! The year 2005 was quite incomplete so was also dropped.
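
One way to find which countries and years break the panel is to count missing values per group before deciding what to drop. The following is only a sketch on made-up data: the `toy` data frame, its country labels and its values are hypothetical, and it uses base R rather than the dplyr pipeline used in the rest of the post.

```r
# Hypothetical panel: three countries, three years, some gaps.
toy <- data.frame(
  country    = rep(c("A", "B", "C"), each = 3),
  year       = rep(2005:2007, times = 3),
  immigrants = c(10, 12, 11, NA, 5, 6, NA, 9, NA)
)

# Count missing values per country and per year.
missing_by_country <- sapply(split(is.na(toy$immigrants), toy$country), sum)
missing_by_year    <- sapply(split(is.na(toy$immigrants), toy$year), sum)

# Dropping the patchiest year (2005) rescues country B; C stays incomplete.
toy_trim <- toy[toy$year != 2005, ]
still_missing <- sapply(split(is.na(toy_trim$immigrants), toy_trim$country), sum)
incomplete <- names(still_missing)[still_missing > 0]

# Keep only countries with a full run of years.
balanced <- toy_trim[!(toy_trim$country %in% incomplete), ]
print(incomplete)
print(table(balanced$country))
```

Dropping a bad year first and then the remaining incomplete countries keeps more of the panel than dropping every country with any gap.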

imm_clean <-
  immigration_raw %>% mutate_at(.vars = vars(starts_with("2")),
                                .funs = stringr::str_extract, 
                                 pattern = "[0-9]+") %>% 
  dplyr::rename(variable = "citizen,agedef,age,unit,sex,geo\\time") %>% 
  separate(variable, into = c("a", "b", "c", "d", "e", "country_code"),
           sep = ",") %>%
  filter(b == "COMPLET") %>% 
  select(-a, -b, -c, -d, -e) %>% 
  gather(key = year, value = immigrants, `2005`:`2016`) %>% 
  mutate(immigrants = as.numeric(immigrants))

emm_clean <-
  emmigration_raw %>% mutate_at(.vars = vars(starts_with("2")),
                                .funs = stringr::str_extract, 
                                 pattern = "[0-9]+") %>% 
  dplyr::rename(variable = "citizen,agedef,age,unit,sex,geo\\time") %>% 
  separate(variable, into = c("a", "b", "c", "d", "e", "country_code"),
           sep = ",") %>% 
  filter(b == "COMPLET") %>% 
  select(-a, -b, -c, -d, -e) %>% 
  gather(key = year, value = emmigrants, `2005`:`2016`) %>% 
  mutate(emmigrants = as.numeric(emmigrants))

pop_clean <-
  population_raw %>% mutate_at(.vars = vars(starts_with("2")),
                                .funs = stringr::str_extract, 
                                 pattern = "[0-9]+") %>% 
  dplyr::rename(variable = "indic_de,geo\\time") %>% 
  separate(variable, into = c("a", "country_code"),
           sep = ",") %>% 
  select(-a) %>% 
  gather(key = year, value = population, `2006`:`2017`) %>% 
  mutate(population = as.numeric(population))
  

gdp_clean <- 
  gdp_raw %>% 
  mutate_at(.vars = vars(starts_with("2")),
                                .funs = stringr::str_extract, 
                                 pattern = "[0-9]+") %>% 
  dplyr::rename(variable = "na_item,unit,geo\\time") %>% 
  separate(variable, into = c("a", "gdp_metric", "country_code"),
           sep = ",") %>% 
  filter(gdp_metric %in% c("CP_EUR_HAB", "CP_MPPS")) %>% 
  select(-a) %>% 
  gather(key = year, value = gdp, `2006`:`2017`) %>% 
  mutate(gdp = as.numeric(gdp))
  

migration_clean <- 
  imm_clean %>% 
  left_join(emm_clean, by = c("country_code", "year")) %>% 
  left_join(pop_clean, by = c("country_code", "year")) %>%
  left_join(filter(gdp_clean, gdp_metric == "CP_EUR_HAB"),
            by = c("country_code", "year")) %>%
  left_join(filter(gdp_clean, gdp_metric == "CP_MPPS"),
            by = c("country_code", "year")) %>%
  left_join(country_codes, by = c("country_code" = "a2_code")) %>% 
  mutate(net_migration = immigrants - emmigrants,
         net_migration_per_cap = net_migration / population) %>% 
  rename(gdp_per_capita = gdp.x, gdp_pps = gdp.y) %>% 
  select(-a3_code, -num_code, -gdp_metric.x, -gdp_metric.y) %>% 
  filter(!(country %in% c("Bulgaria", "Belgium",
                          "Liechtenstein", "Romania"))) %>% 
  filter(year != 2005) %>% 
  mutate(gdp_pps_per_net_migrant = gdp_pps / net_migration)

rm(imm_clean, emm_clean, pop_clean, gdp_clean)
rm(immigration_raw, emmigration_raw, population_raw, gdp_raw)

is.na(migration_clean) %>% sum() # No NA's for this subset!
## [1] 0
glimpse(migration_clean)
## Observations: 308
## Variables: 11
## $ country_code            <chr> "AT", "CH", "CY", "CZ", "DE", "DK", "E...
## $ year                    <chr> "2006", "2006", "2006", "2006", "2006"...
## $ immigrants              <dbl> 98535, 127586, 13077, 68183, 661855, 5...
## $ emmigrants              <dbl> 74432, 88218, 2778, 33463, 639064, 467...
## $ population              <dbl> 8254298, 7459128, 744013, 10223577, 82...
## $ gdp_per_capita          <dbl> 32400, 45600, 21700, 12100, 29500, 415...
## $ gdp_pps                 <dbl> 257511, 279431, 18638, 201515, 2334900...
## $ country                 <chr> "Austria", "Switzerland", "Cyprus", "C...
## $ net_migration           <dbl> 24103, 39368, 10299, 34720, 22791, 996...
## $ net_migration_per_cap   <dbl> 0.0029200545, 0.0052778287, 0.01384250...
## $ gdp_pps_per_net_migrant <dbl> 10.683774, 7.097922, 1.809690, 5.80400...

Initial Exploration

List of countries under analysis:

migration_clean$country %>% unique()
##  [1] "Austria"        "Switzerland"    "Cyprus"         "Czech Republic"
##  [5] "Germany"        "Denmark"        "Estonia"        "Greece"        
##  [9] "Spain"          "Finland"        "France"         "Croatia"       
## [13] "Hungary"        "Ireland"        "Iceland"        "Italy"         
## [17] "Lithuania"      "Luxembourg"     "Latvia"         "Malta"         
## [21] "Netherlands"    "Norway"         "Poland"         "Portugal"      
## [25] "Sweden"         "Slovenia"       "Slovakia"       "United Kingdom"
migration_clean$country %>% unique() %>% length()
## [1] 28

Bulgaria, Belgium, Liechtenstein and Romania are not included due to missing data.

Overall there seems to have been net migration into these European countries in each year of the series. When broken out by country, we can see Germany in 2015 was a huge outlier.

migration_year_summary <-
  migration_clean %>% 
  group_by(year) %>% 
  summarise(sum_immigrants = sum(immigrants),
            sum_emmigrants = sum(emmigrants),
            total_population = sum(population),
            total_net_migration = sum(net_migration),
            mean_net_migration_per_capita = mean(net_migration_per_cap))

migration_country_summary <-
  migration_clean %>% 
  group_by(country) %>% 
  summarise(sum_immigrants = sum(immigrants),
            sum_emmigrants = sum(emmigrants),
            mean_population = mean(population),
            total_net_migration = sum(net_migration),
            mean_net_migration_per_capita = mean(net_migration_per_cap))

# Europe-wide immigration vs emigration
migration_year_summary %>% ungroup() %>% 
  select(year, sum_immigrants, sum_emmigrants) %>% 
  gather(key = "movement_type", value = "count",
         sum_immigrants:sum_emmigrants) %>% 
  ggplot(aes(year, count, colour = movement_type, group = movement_type)) + 
  geom_line(size = 1.5) +
  blorg::blog_theme +
  scale_colour_manual(values = blorg::blog_palette)

# Country number immigrants
migration_clean %>% ungroup() %>% 
  select(country, year, immigrants) %>% 
  ggplot(aes(year, immigrants, colour = country, group = country)) + 
  geom_line(size = 1.5) +
  blorg::blog_theme 

# Net migration per capita
migration_clean %>% 
  select(country, year, net_migration_per_cap) %>% 
  ggplot(aes(year, net_migration_per_cap * 1000, 
             colour = country, group = country)) + 
  geom_line(size = 1.5) +
  blorg::blog_theme 

Net Migration

If we compare the 3 countries with the largest amount of positive net migration over the period (Germany, Italy, UK) a couple of things stand out.

  1. The rest of the region had large positive net migration before the financial crisis, and it started to recover after 2013.
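  2. Germany in 2015 was a ridiculous outlier, comparable to the rest of the region pre-2008.

# Pick the 3 countries with the largest total net migration.
# wt is named explicitly rather than relying on top_n()'s last-column default.
most_net_migrants <-
  migration_country_summary %>% ungroup() %>% 
  arrange(total_net_migration) %>% 
  select(country, total_net_migration) %>% 
  top_n(3, wt = total_net_migration)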

migration_clean$countries_most_net_migrants <-
  if_else(migration_clean$country %in% most_net_migrants$country, 
          migration_clean$country, "Other")
  
migration_clean %>% 
  select(countries_most_net_migrants, year, net_migration) %>% 
  group_by(countries_most_net_migrants, year) %>% 
  summarise(total_net_migration = sum(net_migration)) %>% 
  ggplot(aes(year, total_net_migration, 
             colour = countries_most_net_migrants, 
             group = countries_most_net_migrants)) + 
  geom_path(size = 1.5) +
  blorg::blog_theme +
  scale_colour_manual(values = blorg::blog_palette)

This is just a different way of visualising the above using bars. Note the huge dip in the 2009-2013 period for the overall region which gets a bit lost in the previous line plot.

migration_clean %>% 
  select(countries_most_net_migrants, year, net_migration) %>% 
  group_by(countries_most_net_migrants, year) %>% 
  summarise(total_net_migration = sum(net_migration)) %>% 
  ggplot(aes(year, total_net_migration, 
             fill = countries_most_net_migrants, 
             group = countries_most_net_migrants)) + 
  geom_bar(position = "stack", stat = "identity") +
  blorg::blog_theme +
  scale_fill_manual(values = blorg::blog_palette)

Net Migration Per Capita

The problem with net migration on its own is that it can be skewed by the larger countries (Germany etc.) with regard to the “impact” on the local government and population. I.e., if 1,000,000 people land in Germany in 1 year (net), it’s disruptive but not catastrophic; if the same number landed in Iceland (population ~ 300,000) it would be a humanitarian disaster.
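
To put rough numbers on that comparison, here is a small worked sketch of the per-1000 calculation. The 82 million figure for Germany is an assumed round number for illustration, and `per_1000` is a hypothetical helper, not something from the analysis code.

```r
# Net migration expressed per 1000 residents.
per_1000 <- function(net_migration, population) {
  net_migration / population * 1000
}

net_arrivals <- 1e6   # the one million people from the example above
pop_germany  <- 82e6  # assumed round figure for Germany's population
pop_iceland  <- 3e5   # ~300,000, as quoted above

germany_rate <- per_1000(net_arrivals, pop_germany)
iceland_rate <- per_1000(net_arrivals, pop_iceland)

print(round(c(Germany = germany_rate, Iceland = iceland_rate), 1))
```

Under these assumptions the same inflow is about 12 per 1000 residents for Germany but more than 3,300 per 1000 for Iceland, i.e. over three times the existing population.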

There are a couple of ways I thought to look at this: one is per capita, the other is against some economic metric like GDP. Below is a sequence of bar charts of the net migration per 1000 for each country, sorted by population (Germany largest, Iceland smallest). It was the best way I could think of plotting it while keeping all the data on the page.

country_order_by_pop <-
  (migration_clean %>% 
  filter(year == 2016) %>% 
  arrange(desc(population)))$country
  

migration_clean %>% 
  select(country, year, net_migration_per_cap) %>%
  mutate(country = factor(country, levels = country_order_by_pop)) %>% 
  ggplot(aes(year, net_migration_per_cap * 1000, 
             fill = country, group = country)) + 
  geom_bar(stat = "identity") +
  blorg::blog_theme +
  facet_wrap(~country, ncol = 7) +
  theme(legend.position = "none") +
  labs(title = "European Net Migration per 1000 - Ordered By 2016 Population", x = "Year", y = "Net Migration per 1000 Population") +
  scale_x_discrete(breaks = scales::pretty_breaks(n = 5))

What I like about this plot is that each panel tells a different story for each country, but comparisons between panels are still quite easy. What really jumps out to me is how some countries have stayed relatively stable (e.g. Finland, United Kingdom) over a long period of time which includes some titanic economic and political events, while others have swung wildly (e.g. Germany, Ireland, Cyprus). The 2008 financial crisis was obviously a huge contributor to the overall picture.

Mapping Migration

Mapping out the gross and per capita net migration, we see there are quite strong differences. For example, Germany really stands out on the gross figure, but per capita it seems to be above average but not exceptional.

First the net migration:

# http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts#nuts13
euro_map <- 
  maptools::readShapePoly(fn="./data/NUTS/NUTS_RG_60M_2013_4326.shp")

euro_limits <- data.frame(lon = c(60.64878, 24.08464, -31.26192, 56.00000),
                          lat = c(80.58823, 34.83469, 39.45479, 74.00000))

euro_map_df <- euro_map %>% fortify(region='NUTS_ID')
euro_map_df <-
  euro_map_df %>% 
  left_join(country_codes, by = c("id" = "a2_code")) %>% 
  left_join(migration_country_summary, by = "country") %>% 
  filter(long > min(euro_limits$lon) & long < max(euro_limits$lon) &
           lat  > min(euro_limits$lat) & lat  < max(euro_limits$lat),
         !is.na(country)) # remove lower NUTS regions

m0 <- ggplot(data=euro_map_df)

m0 + geom_polygon(aes(x=long, y=lat, group=group, 
                            fill=total_net_migration)) +
  geom_path(aes(x=long, y=lat, group=group), color='black') +
  coord_equal() +
  theme_void() + 
  scale_fill_gradientn(colours=topo.colors(3),
                       labels = scales::comma,
                        name = "Total Net Migration\n 2006-2016" )

Then the net migration per capita:

m0 + geom_polygon(aes(x=long, y=lat, group=group, 
                            fill=(mean_net_migration_per_capita * 1000))) +
  geom_path(aes(x=long, y=lat, group=group), color='black') +
  coord_equal() +
  theme_void() + 
  scale_fill_gradientn(colours=topo.colors(3),
                       labels = scales::comma,
                        name = "Mean Net Migration\nper 1000 Population\n2006-2016")

Luxembourg is a pretty big outlier per 1000, but it only has a population of approximately 600,000; that’s only about 9,000 net migrants in 10 years.

Now let’s look more closely at 2015 itself, when Germany accepted so many immigrants.

euro_map_2015_df <-
  euro_map %>% fortify(region='NUTS_ID') %>% 
  left_join(country_codes, by = c("id" = "a2_code")) %>% 
  left_join((migration_clean %>% filter(year == 2015)),
            by = "country") %>% 
  filter(long > min(euro_limits$lon) & long < max(euro_limits$lon) &
           lat  > min(euro_limits$lat) & lat  < max(euro_limits$lat),
         !is.na(country)) # remove lower NUTS regions

m_2015 <- ggplot(data=euro_map_2015_df)

m_2015 + geom_polygon(aes(x=long, y=lat, group=group, 
                            fill=net_migration)) +
  geom_path(aes(x=long, y=lat, group=group), color='black') +
  coord_equal() +
  theme_void() + 
  scale_fill_gradientn(colours=topo.colors(3),
                       labels = scales::comma,
                        name = "Net Migration\n 2015" )

m_2015 + geom_polygon(aes(x=long, y=lat, group=group, 
                            fill=(net_migration_per_cap * 1000))) +
  geom_path(aes(x=long, y=lat, group=group), color='black') +
  coord_equal() +
  theme_void() + 
  scale_fill_gradientn(colours=topo.colors(3),
                       labels = scales::comma,
                        name = "Net Migration\nper 1000 Population\n2015")

Summary

The biggest takeaway I have from this exercise is that each country has gone through its own very particular story around net migration: Ireland and the dramatic decline from 2008 onwards, Germany and its huge peak in 2015, the outsized impact on the Mediterranean states that are closer to Africa and the Middle East, Cyprus plummeting into negative territory after the 2010 European banking crisis, etc.

On a larger scale, the overall drop in net migration to the whole region from 2009-2013 is also quite remarkable when compared to the current situation and to before 2008. It raises many political questions around future planning and the current “sharing of the load” across the region. Luckily I can leave that to the political analysts; numbers are far easier to understand than people.