Intro

As a German at the European Geosciences Union General Assembly (EGU GA) 2018, you sometimes feel like there are just a lot of people from Germany at the conference. I sometimes feel bad for everybody else as we happily chat away in German, not always noticing that we are excluding others from the conversation. But is that impression really true? There are a lot of Germans in Europe, and many research institutions host scientists from all over the world. So let’s load some packages and take a look at the data!

library("eurostat")
library("lubridate")
library("dplyr")
library("rvest")
library("stringr")
library("knitr")

Get population data

Download population data from Eurostat using the eurostat package.

(We could probably do better here and only take into account working population…)

code_pop <- eurostat::search_eurostat("Population on 1 January", type = "table") %>%
  head(n = 1) %>% .$code
data_pop <- get_eurostat(code_pop, lang = "en", cache = FALSE)

data_pop_2017 <- data_pop %>%
  filter(year(time) == 2017) %>%
  label_eurostat(lang = "en") %>%
  select(geo, values)

# adjust country names to match EGU website
data_pop_2017 <- data_pop_2017 %>%
  mutate(country = case_when(
      grepl(x = geo, pattern = "Macedonia") ~ "Macedonia, The Former Yugoslav Republic Of",
      grepl(x = geo, pattern = "Bosnia") ~ "Bosnia And Herzegovina",
      grepl(x = geo, pattern = "Germany") ~ "Germany",
      TRUE ~ as.character(geo)
    )
  ) %>%
  select(country, inhabitants = values)
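
The renaming step above works by matching patterns against Eurostat's long country labels. Here is a minimal base R sketch of the same idea that runs without any packages. The example labels are assumptions about what `label_eurostat()` returned at the time, and `harmonise_country()` is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: map long Eurostat-style labels to the names used on
# the EGU website. Patterns are tried in order; the first match wins.
harmonise_country <- function(geo) {
  rules <- c(
    "Macedonia" = "Macedonia, The Former Yugoslav Republic Of",
    "Bosnia"    = "Bosnia And Herzegovina",
    "Germany"   = "Germany"
  )
  labels <- as.character(geo)
  out <- labels                      # unmatched labels are kept unchanged
  matched <- rep(FALSE, length(labels))
  for (pattern in names(rules)) {
    hit <- !matched & grepl(pattern, labels)
    out[hit] <- rules[[pattern]]
    matched <- matched | hit
  }
  out
}

# Assumed example labels (not taken from the original data)
labels <- c(
  "Germany (until 1990 former territory of the FRG)",
  "Former Yugoslav Republic of Macedonia, the",
  "Bosnia and Herzegovina",
  "Austria"
)
print(harmonise_country(labels))
```

Any label that matches none of the patterns passes through untouched, which mirrors the `TRUE ~ as.character(geo)` fallback in the `case_when()` call.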

Download participant data

webpage <- read_html("https://egu2018.eu/")

participant_stats <- html_table(webpage)[[1]]
names(participant_stats) <- c("country", "participants")
participant_stats <- participant_stats %>%
  mutate(participants = str_replace_all(participants, "[.,]", "")) %>%
  mutate(participants = as.integer(participants))

kable(head(participant_stats))
| country        | participants |
|:---------------|-------------:|
| Germany        |         2451 |
| United Kingdom |         1385 |
| France         |         1097 |
| Italy          |         1095 |
| United States  |          957 |
| China          |          756 |

EGU’s participant data supports the personal impression: Germany is the country with the most participants at the GA, beating the United Kingdom by a comfortable margin of 1066 people.
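
The separator clean-up in the download step can be tried offline. The sketch below uses base R's `gsub()` instead of `str_replace_all()` and made-up input strings, so it only illustrates the idea of stripping thousands separators before converting to integers.

```r
# Made-up raw counts as they might appear in a scraped HTML table,
# using either "," or "." as thousands separator.
raw <- c("2,451", "1.385", "35")

# Remove every "." and "," and convert the remaining digits to integer.
clean <- as.integer(gsub("[.,]", "", raw))
print(clean)
```

This is only safe for whole-number counts: a genuine decimal such as "1.5" would be turned into 15.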

Analyse

data <- left_join(participant_stats, data_pop_2017, by = "country") %>%
  filter(!is.na(inhabitants)) %>%
  mutate(`participants/100 000 inhabitants` = participants/inhabitants * 100000)
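
Note that the `filter(!is.na(inhabitants))` step silently drops every country without Eurostat population data, which is why the United States and China do not appear in the result table further down. A small base R sketch with a four-country subset (values taken from the tables in this post) shows one way to list the dropped countries; the object names are made up for illustration.

```r
# Subset of the participant table
participants <- data.frame(
  country = c("Germany", "United States", "China", "Austria"),
  participants = c(2451L, 957L, 756L, 734L),
  stringsAsFactors = FALSE
)

# Subset of the population table (Eurostat covers European countries)
population <- data.frame(
  country = c("Germany", "Austria"),
  inhabitants = c(82521653, 8772865),
  stringsAsFactors = FALSE
)

# Countries that would end up with NA inhabitants after a left join
unmatched <- setdiff(participants$country, population$country)
print(unmatched)

# Base R equivalent of left_join() followed by filter(!is.na(inhabitants))
joined <- merge(participants, population, by = "country", all.x = TRUE)
kept <- joined[!is.na(joined$inhabitants), ]
print(kept)
```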

options(scipen = 999)
options(knitr.kable.NA = '')

kable(summary(data), caption = "Summary statistics of EGU participant data")
Summary statistics of EGU participant data

| country          | participants    | inhabitants      | participants/100 000 inhabitants |
|:-----------------|:----------------|:-----------------|:---------------------------------|
| Length:43        | Min. : 1.0      | Min. : 37550     | Min. : 0.02829                   |
| Class :character | 1st Qu.: 12.0   | 1st Qu.: 2862248 | 1st Qu.: 0.38010                 |
| Mode :character  | Median : 35.0   | Median : 7040272 | Median : 1.36769                 |
|                  | Mean : 253.4    | Mean :16032527   | Mean : 1.93544                   |
|                  | 3rd Qu.: 223.5  | 3rd Qu.:11059960 | 3rd Qu.: 2.43802                 |
|                  | Max. :2451.0    | Max. :82521653   | Max. :10.34435                   |

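Averaged over the 43 countries with Eurostat population data, the number of participants per 100 000 inhabitants is about 1.94. This is an unweighted mean of the per-country rates, so a small country counts as much as a large one.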
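
The mean of per-country rates is not the same as the rate over the pooled population. The following base R sketch illustrates the difference with three rows from the data (Iceland, Germany, Ukraine); it is an illustration on a subset, not a recomputation of the full analysis.

```r
# Three countries with very different sizes, values from the result table
rates_demo <- data.frame(
  country = c("Iceland", "Germany", "Ukraine"),
  participants = c(35, 2451, 12),
  inhabitants = c(338349, 82521653, 42414905),
  stringsAsFactors = FALSE
)
rates_demo$rate <- rates_demo$participants / rates_demo$inhabitants * 100000

# Unweighted mean: every country contributes equally
unweighted_mean <- mean(rates_demo$rate)

# Pooled rate: total participants over total inhabitants
pooled_rate <- sum(rates_demo$participants) / sum(rates_demo$inhabitants) * 100000

print(c(unweighted = unweighted_mean, pooled = pooled_rate))
```

In this subset the unweighted mean is pulled up by tiny Iceland, while the pooled rate is dominated by the two large countries.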

Result table

data_arranged <- data %>% arrange(desc(`participants/100 000 inhabitants`))
kable(data_arranged, row.names = TRUE)
|    | country                                    | participants | inhabitants | participants/100 000 inhabitants |
|:---|:-------------------------------------------|-------------:|------------:|---------------------------------:|
| 1  | Iceland                                    |           35 |      338349 |                       10.3443486 |
| 2  | Austria                                    |          734 |     8772865 |                        8.3667080 |
| 3  | Switzerland                                |          655 |     8419550 |                        7.7795132 |
| 4  | Norway                                     |          333 |     5258317 |                        6.3328247 |
| 5  | Andorra                                    |            3 |       73105 |                        4.1036865 |
| 6  | Finland                                    |          178 |     5503297 |                        3.2344247 |
| 7  | Luxembourg                                 |           18 |      590667 |                        3.0474023 |
| 8  | Germany                                    |         2451 |    82521653 |                        2.9701295 |
| 9  | Netherlands                                |          474 |    17081507 |                        2.7749308 |
| 10 | Monaco                                     |            1 |       37550 |                        2.6631158 |
| 11 | Denmark                                    |          144 |     5748769 |                        2.5048841 |
| 12 | Sweden                                     |          237 |     9995153 |                        2.3711493 |
| 13 | Belgium                                    |          264 |    11351727 |                        2.3256373 |
| 14 | Estonia                                    |           30 |     1315635 |                        2.2802677 |
| 15 | United Kingdom                             |         1385 |    65808573 |                        2.1045890 |
| 16 | Czech Republic                             |          210 |    10578820 |                        1.9850985 |
| 17 | Ireland                                    |           93 |     4784383 |                        1.9438243 |
| 18 | Italy                                      |         1095 |    60589445 |                        1.8072455 |
| 19 | France                                     |         1097 |    66989083 |                        1.6375803 |
| 20 | Slovenia                                   |           33 |     2065895 |                        1.5973706 |
| 21 | Cyprus                                     |           12 |      854802 |                        1.4038339 |
| 22 | Hungary                                    |          134 |     9797561 |                        1.3676873 |
| 23 | Portugal                                   |          109 |    10309573 |                        1.0572698 |
| 24 | Spain                                      |          479 |    46528024 |                        1.0294871 |
| 25 | Greece                                     |          110 |    10768193 |                        1.0215270 |
| 26 | Croatia                                    |           28 |     4154213 |                        0.6740145 |
| 27 | Georgia                                    |           23 |     3718200 |                        0.6185789 |
| 28 | Latvia                                     |           12 |     1950116 |                        0.6153480 |
| 29 | Slovakia                                   |           32 |     5435343 |                        0.5887393 |
| 30 | Poland                                     |          187 |    37972964 |                        0.4924556 |
| 31 | Malta                                      |            2 |      460297 |                        0.4345021 |
| 32 | Serbia                                     |           27 |     7040272 |                        0.3835079 |
| 33 | Romania                                    |           74 |    19644350 |                        0.3766986 |
| 34 | Bulgaria                                   |           23 |     7101859 |                        0.3238589 |
| 35 | Turkey                                     |          142 |    79814871 |                        0.1779117 |
| 36 | Lithuania                                  |            3 |     2847904 |                        0.1053406 |
| 37 | Macedonia, The Former Yugoslav Republic Of |            2 |     2073702 |                        0.0964459 |
| 38 | Azerbaijan                                 |            9 |     9809981 |                        0.0917433 |
| 39 | Armenia                                    |            2 |     2986151 |                        0.0669758 |
| 40 | Albania                                    |            1 |     2876591 |                        0.0347634 |
| 41 | Belarus                                    |            3 |     9504704 |                        0.0315633 |
| 42 | Bosnia And Herzegovina                     |            1 |     3509728 |                        0.0284922 |
| 43 | Ukraine                                    |           12 |    42414905 |                        0.0282919 |

Turns out Germany is above average, with about 2.97 participants per 100 000 inhabitants against a mean of about 1.94, but it only ranks 8th. It is less “overrepresented” than Iceland, Austria, Switzerland, Norway, Andorra, Finland and Luxembourg.
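
To read a country's position off the data instead of counting rows, one option is to sort by rate and look up the row index. This base R sketch uses only the eight leading countries from the result table, deliberately given in shuffled order; the object names are made up for illustration.

```r
# The eight leading countries, in arbitrary input order
top <- data.frame(
  country = c("Germany", "Finland", "Iceland", "Luxembourg",
              "Austria", "Norway", "Andorra", "Switzerland"),
  participants = c(2451, 178, 35, 18, 734, 333, 3, 655),
  inhabitants = c(82521653, 5503297, 338349, 590667,
                  8772865, 5258317, 73105, 8419550),
  stringsAsFactors = FALSE
)
top$rate <- top$participants / top$inhabitants * 100000

# Sort by descending rate, as arrange(desc(...)) does in the post
top <- top[order(-top$rate), ]

# Position of Germany in the sorted table
germany_rank <- which(top$country == "Germany")
print(germany_rank)
```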

What did we learn? ¯\_(ツ)_/¯ Nothing exciting, really, just that I will be more relaxed about meeting “German” colleagues at future EGUs, because while it seems we’re crushing the place in numbers, we are not.

The number that really matters for EGU as an international conference remains 106 - the number of countries represented by EGU attendees.

Metadata

Participant data courtesy of EGU/Copernicus (https://egu2018.eu).

Population data courtesy of European Union via Eurostat.

sessionInfo()
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2.2  knitr_1.20      stringr_1.3.0   rvest_0.3.2    
## [5] xml2_1.2.0      dplyr_0.7.4     lubridate_1.7.3 eurostat_3.1.5 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.16       highr_0.6          compiler_3.4.4    
##  [4] pillar_1.2.1       RColorBrewer_1.1-2 bindr_0.1.1       
##  [7] class_7.3-14       tools_3.4.4        digest_0.6.15     
## [10] jsonlite_1.5       evaluate_0.10.1    tibble_1.4.2      
## [13] lattice_0.20-35    pkgconfig_2.0.1    rlang_0.2.0       
## [16] curl_3.2           yaml_2.1.18        e1071_1.6-8       
## [19] httr_1.3.1         hms_0.4.2          tidyselect_0.2.4  
## [22] classInt_0.1-24    rprojroot_1.3-2    grid_3.4.4        
## [25] glue_1.2.0         R6_2.2.2           rmarkdown_1.9     
## [28] sp_1.2-7           selectr_0.4-1      readr_1.1.1       
## [31] tidyr_0.8.0        purrr_0.2.4        magrittr_1.5      
## [34] backports_1.1.2    htmltools_0.3.6    assertthat_0.2.0  
## [37] stringi_1.1.7