As a German at the European Geosciences Union General Assembly (EGU GA) 2018, you sometimes feel like there are just a lot of people from Germany at the conference. I sometimes feel bad for everybody else as we happily chat away in German, not always noticing that we are excluding others from the conversation. But is that impression really true? There are a lot of Germans in Europe, and many research institutions with scientists from all over the world. So let’s load some packages and take a look at the data!
library("eurostat")
library("lubridate")
library("dplyr")
library("rvest")
library("stringr")
library("knitr")
Download population data from Eurostat using the eurostat package.
(We could probably do better here and only take the working population into account…)
code_pop <- eurostat::search_eurostat("Population on 1 January", type = "table") %>%
head(n = 1) %>% .$code
data_pop <- get_eurostat(code_pop, lang = "en", cache = FALSE)
data_pop_2017 <- data_pop %>%
filter(year(time) == 2017) %>%
label_eurostat(lang = "en") %>%
select(geo, values)
# adjust country names to match EGU website
data_pop_2017 <- data_pop_2017 %>%
mutate(country = case_when(
grepl("Macedonia", geo) ~ "Macedonia, The Former Yugoslav Republic Of",
grepl("Bosnia", geo) ~ "Bosnia And Herzegovina",
grepl("Germany", geo) ~ "Germany",
TRUE ~ as.character(geo)
)
) %>%
select(country, inhabitants = values)
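The renaming above only helps if the resulting names match the EGU table exactly, because any country without a match gets no population figure when the two tables are combined. A minimal sketch of how one might list the non-matching names, using base R's `setdiff()`; the two vectors are short illustrative stand-ins (an assumption), and in the real analysis they would be `participant_stats$country` and `data_pop_2017$country` once both tables exist:

```r
# Illustrative stand-ins for the two country columns (assumed values).
egu_countries <- c("Germany", "United Kingdom", "United States", "China")
eurostat_countries <- c("Germany", "United Kingdom", "France")

# Countries on the EGU list that have no population figure to join against.
unmatched <- setdiff(egu_countries, eurostat_countries)
print(unmatched)
```

Names that show up here are either non-European countries, which Eurostat does not cover, or spelling differences that need another `case_when()` branch.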
webpage <- read_html("https://egu2018.eu/")
participant_stats <- html_table(webpage)[[1]]
names(participant_stats) <- c("country", "participants")
participant_stats <- participant_stats %>%
mutate(participants = str_replace_all(participants, "[.,]", "")) %>%
mutate(participants = as.integer(participants))
kable(head(participant_stats))
| country | participants |
|---|---|
| Germany | 2451 |
| United Kingdom | 1385 |
| France | 1097 |
| Italy | 1095 |
| United States | 957 |
| China | 756 |
EGU’s participant data supports the personal impression: Germany is the country with the most participants at the GA, beating the United Kingdom by a comfortable margin of 1066 people.
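The cleaning step for the participant counts strips dots and commas before converting to integers, since the website may format thousands with either separator. A small sketch of the same idea, using base R's `gsub()` in place of `str_replace_all()`; the input strings are made up for illustration and are not copied from the EGU page:

```r
# Made-up examples of counts with different thousands separators.
raw_counts <- c("2.451", "1,385", "957")

# Remove every dot and comma, then convert to integer.
clean_counts <- as.integer(gsub("[.,]", "", raw_counts))
print(clean_counts)
```

This is only safe for whole numbers: a value with a decimal part would silently be turned into a much larger integer.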
Participant count / 100 000 inhabitants
data <- left_join(participant_stats, data_pop_2017, by = "country") %>%
filter(!is.na(inhabitants)) %>%
mutate(`participants/100 000 inhabitants` = participants/inhabitants * 100000)
options(scipen = 999)
options(knitr.kable.NA = '')
kable(summary(data), caption = "Summary statistics of EGU participant data")
| country | participants | inhabitants | participants/100 000 inhabitants |
|---|---|---|---|
| Length:43 | Min. : 1.0 | Min. : 37550 | Min. : 0.02829 |
| Class :character | 1st Qu.: 12.0 | 1st Qu.: 2862248 | 1st Qu.: 0.38010 |
| Mode :character | Median : 35.0 | Median : 7040272 | Median : 1.36769 |
| | Mean : 253.4 | Mean :16032527 | Mean : 1.93544 |
| | 3rd Qu.: 223.5 | 3rd Qu.:11059960 | 3rd Qu.: 2.43802 |
| | Max. :2451.0 | Max. :82521653 | Max. :10.34435 |
Averaged over the 43 countries (unweighted, so every country counts equally), the number of participants per 100 000 inhabitants is about 1.94.
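The mean reported by `summary()` is the average of the per-country rates, so Iceland weighs as much as Germany. A pooled rate (all participants divided by all inhabitants) answers a slightly different question. The sketch below contrasts the two on three rows taken from the results table; the three-row subset and the variable names are chosen for illustration only, so the numbers it prints do not reproduce the full 43-country figures:

```r
# Three rows from the results table, used as an illustrative subset.
sample_data <- data.frame(
  country = c("Germany", "Iceland", "Ukraine"),
  participants = c(2451, 35, 12),
  inhabitants = c(82521653, 338349, 42414905)
)

# Per-country rate, as computed in the post.
rate <- sample_data$participants / sample_data$inhabitants * 1e5

# Unweighted mean: every country counts equally.
unweighted_mean <- mean(rate)

# Pooled rate: large countries dominate.
pooled_rate <- sum(sample_data$participants) / sum(sample_data$inhabitants) * 1e5

print(c(unweighted = unweighted_mean, pooled = pooled_rate))
```

In this subset the unweighted mean is pulled up by Iceland's small population, while the pooled rate stays close to the values of the two large countries.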
data_arranged <- data %>% arrange(desc(`participants/100 000 inhabitants`))
kable(data_arranged, row.names = TRUE)
| | country | participants | inhabitants | participants/100 000 inhabitants |
|---|---|---|---|---|
| 1 | Iceland | 35 | 338349 | 10.3443486 |
| 2 | Austria | 734 | 8772865 | 8.3667080 |
| 3 | Switzerland | 655 | 8419550 | 7.7795132 |
| 4 | Norway | 333 | 5258317 | 6.3328247 |
| 5 | Andorra | 3 | 73105 | 4.1036865 |
| 6 | Finland | 178 | 5503297 | 3.2344247 |
| 7 | Luxembourg | 18 | 590667 | 3.0474023 |
| 8 | Germany | 2451 | 82521653 | 2.9701295 |
| 9 | Netherlands | 474 | 17081507 | 2.7749308 |
| 10 | Monaco | 1 | 37550 | 2.6631158 |
| 11 | Denmark | 144 | 5748769 | 2.5048841 |
| 12 | Sweden | 237 | 9995153 | 2.3711493 |
| 13 | Belgium | 264 | 11351727 | 2.3256373 |
| 14 | Estonia | 30 | 1315635 | 2.2802677 |
| 15 | United Kingdom | 1385 | 65808573 | 2.1045890 |
| 16 | Czech Republic | 210 | 10578820 | 1.9850985 |
| 17 | Ireland | 93 | 4784383 | 1.9438243 |
| 18 | Italy | 1095 | 60589445 | 1.8072455 |
| 19 | France | 1097 | 66989083 | 1.6375803 |
| 20 | Slovenia | 33 | 2065895 | 1.5973706 |
| 21 | Cyprus | 12 | 854802 | 1.4038339 |
| 22 | Hungary | 134 | 9797561 | 1.3676873 |
| 23 | Portugal | 109 | 10309573 | 1.0572698 |
| 24 | Spain | 479 | 46528024 | 1.0294871 |
| 25 | Greece | 110 | 10768193 | 1.0215270 |
| 26 | Croatia | 28 | 4154213 | 0.6740145 |
| 27 | Georgia | 23 | 3718200 | 0.6185789 |
| 28 | Latvia | 12 | 1950116 | 0.6153480 |
| 29 | Slovakia | 32 | 5435343 | 0.5887393 |
| 30 | Poland | 187 | 37972964 | 0.4924556 |
| 31 | Malta | 2 | 460297 | 0.4345021 |
| 32 | Serbia | 27 | 7040272 | 0.3835079 |
| 33 | Romania | 74 | 19644350 | 0.3766986 |
| 34 | Bulgaria | 23 | 7101859 | 0.3238589 |
| 35 | Turkey | 142 | 79814871 | 0.1779117 |
| 36 | Lithuania | 3 | 2847904 | 0.1053406 |
| 37 | Macedonia, The Former Yugoslav Republic Of | 2 | 2073702 | 0.0964459 |
| 38 | Azerbaijan | 9 | 9809981 | 0.0917433 |
| 39 | Armenia | 2 | 2986151 | 0.0669758 |
| 40 | Albania | 1 | 2876591 | 0.0347634 |
| 41 | Belarus | 3 | 9504704 | 0.0315633 |
| 42 | Bosnia And Herzegovina | 1 | 3509728 | 0.0284922 |
| 43 | Ukraine | 12 | 42414905 | 0.0282919 |
Turns out Germany is above average at about 2.97 participants per 100 000 inhabitants, but it only ranks eighth and is not “overrepresented” as much as other countries such as Iceland, Austria, Switzerland, Norway, Andorra, and Finland.
What did we learn? ¯\_(ツ)_/¯ Nothing exciting, really, just that I will be more relaxed about meeting “German” colleagues at future EGUs, because while it seems we’re crushing the place in numbers, we are not.
The number that really matters for EGU as an international conference remains 106 - the number of countries represented by EGU attendees.
Participant data courtesy of EGU/Copernicus (https://egu2018.eu).
Population data courtesy of European Union via Eurostat.
sessionInfo()
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] bindrcpp_0.2.2 knitr_1.20 stringr_1.3.0 rvest_0.3.2
## [5] xml2_1.2.0 dplyr_0.7.4 lubridate_1.7.3 eurostat_3.1.5
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.16 highr_0.6 compiler_3.4.4
## [4] pillar_1.2.1 RColorBrewer_1.1-2 bindr_0.1.1
## [7] class_7.3-14 tools_3.4.4 digest_0.6.15
## [10] jsonlite_1.5 evaluate_0.10.1 tibble_1.4.2
## [13] lattice_0.20-35 pkgconfig_2.0.1 rlang_0.2.0
## [16] curl_3.2 yaml_2.1.18 e1071_1.6-8
## [19] httr_1.3.1 hms_0.4.2 tidyselect_0.2.4
## [22] classInt_0.1-24 rprojroot_1.3-2 grid_3.4.4
## [25] glue_1.2.0 R6_2.2.2 rmarkdown_1.9
## [28] sp_1.2-7 selectr_0.4-1 readr_1.1.1
## [31] tidyr_0.8.0 purrr_0.2.4 magrittr_1.5
## [34] backports_1.1.2 htmltools_0.3.6 assertthat_0.2.0
## [37] stringi_1.1.7