We are interested in the geographic breakdown of “elite” sports clubs in France (along with Corsica, excluding French overseas territories).
Only team sports are considered here (football, handball, rugby, basketball, volleyball, ice hockey).
By “elite” club, I mean clubs that are evolving in first or second division of each relevant sport (even if some sports don’t really have a second division) ; also, “elite” does not necessarly mean professional, as some of these clubs are still amateurs (especially true for women championships).
library(tidyverse)
library(ggmap)
library(sf)
library(tmap)
library(formattable)
I created a quick csv
file, listing the cities where an “elite” club is present for different sports.
# Import data
clubs <- read_csv("clubs_pro_france.csv")
# quick glance at the data
head(sample_n(tbl = clubs, size = nrow(clubs)))
## # A tibble: 6 x 4
## Ville Sport Division Categorie
## <chr> <chr> <int> <chr>
## 1 Strasbourg Hockey 2 H
## 2 Toulouse Handball 1 H
## 3 Reims Basket 1 H
## 4 Toulon Handball 1 F
## 5 Villard de Lans Hockey 2 H
## 6 Chalon en Champagne Hockey 2 H
We have 4 variables:
Ville
: french city where the “elite” club is locatedSport
: sport (Football, Handball, Basketball, Rugby, Volleyball, Ice Hockey)Division
: league (1st division or 2nd division)Categorie
: gender (men (H
) or women (F
))We will need to transform the previous data, in order to be able to map them correctly.
Geocoding of the cities from clubs
dataframe.
We are adding ‘France’ after the cities, to be sure we are geocoding the correct city (with the exception of ‘Monaco’, which is not a French city, but has teams in some championships).
# find longitude and latitude for all cities listed in 'clubs'
clubs_coords <- sapply(X = clubs[, "Ville"],
FUN = function(x) geocode(location = paste(x, "France"),
source = "dsk", messaging = FALSE))
# find longitude and latitude for "Monaco"
monaco_coords <- geocode(location = "Monaco, Monaco", source = "dsk", messaging = FALSE)
Add coordinates to the dataframe.
# add coordinates to the dataframe
clubs_ok <- clubs %>%
mutate(long = clubs_coords[[1]], lat = clubs_coords[[2]]) %>%
# specific case for "Monaco
mutate(long = if_else(Ville == "Monaco", monaco_coords[1] %>% pull(), long),
lat = if_else(Ville == "Monaco", monaco_coords[2] %>% pull(), lat))
In order to map the data, we have to transform it into a sf
object.
# transform into 'sf' object, using a CRS equal to 4326
clubs_sf <- st_as_sf(x = clubs_ok,
coords = c("long","lat"), crs = 4326)
head(clubs_sf) # geometry column has been created
## Simple feature collection with 6 features and 4 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -1.55336 ymin: 43.29695 xmax: 7.41903 ymax: 47.46667
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 6 x 5
## Ville Sport Division Categorie geometry
## <chr> <chr> <int> <chr> <POINT [°]>
## 1 Marseille Football 1 H (5.38107 43.29695)
## 2 Toulouse Football 1 H (1.44367 43.60426)
## 3 Nantes Football 1 H (-1.55336 47.21725)
## 4 Monaco Football 1 H (7.41903 43.73141)
## 5 Angers Football 1 H (-0.55 47.46667)
## 6 Nimes Football 1 H (4.35 43.83333)
# quick plot of the new 'sf' object
plot(clubs_sf["Sport"])
# same plot using ggplot
ggplot(clubs_sf) +
geom_sf()
In order to visualize the data from clubs
into a France map, we have to create this 2nd map.
These data were extracted from the GADM database.
# import background map
fr_map <- readRDS("gadm36_FRA_1_sf.rds")
head(fr_map)
## Simple feature collection with 6 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -5.143751 ymin: 41.33375 xmax: 9.560416 ymax: 50.16764
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## GID_0 NAME_0 GID_1 NAME_1 VARNAME_1 NL_NAME_1 TYPE_1
## 1 FRA France FRA.1_1 Auvergne-Rhône-Alpes <NA> <NA> Région
## 6 FRA France FRA.2_1 Bourgogne-Franche-Comté <NA> <NA> Région
## 7 FRA France FRA.3_1 Bretagne <NA> <NA> Région
## 8 FRA France FRA.4_1 Centre-Val de Loire <NA> <NA> Région
## 9 FRA France FRA.5_1 Corse Corsica <NA> Région
## 10 FRA France FRA.6_1 Grand Est <NA> <NA> Région
## ENGTYPE_1 CC_1 HASC_1 geometry
## 1 Region <NA> FR.AR MULTIPOLYGON (((2.22057 44....
## 6 Region <NA> FR.BF MULTIPOLYGON (((5.894627 46...
## 7 Region <NA> FR.BT MULTIPOLYGON (((-2.83514 47...
## 8 Region <NA> FR.CN MULTIPOLYGON (((1.12372 46....
## 9 Region <NA> FR.CE MULTIPOLYGON (((9.25764 41....
## 10 Region <NA> FR.AO MULTIPOLYGON (((4.299289 47...
# plot the map
tm_shape(fr_map) +
tm_polygons(col = "tan") +
tm_layout(bg.color = "skyblue",
main.title = "Metropolitan France (including Corsica)",
main.title.size = 1.2,
main.title.position = "center",
fontfamily = "Bookman"
)
Before plotting the map, we will transform the CRS into an unprojected CRS for both data/map.
# actual CRS: => projected CRS
st_crs(clubs_sf)
## Coordinate Reference System:
## EPSG: 4326
## proj4string: "+proj=longlat +datum=WGS84 +no_defs"
st_crs(fr_map)
## Coordinate Reference System:
## EPSG: 4326
## proj4string: "+proj=longlat +datum=WGS84 +no_defs"
# transform CRS to unprojected CRS ; use CRS code = 2154 (projected coordinate system for "France" : https://epsg.io/?q=France%20kind%3APROJCRS)
clubs_sf_proj <- st_transform(x = clubs_sf, crs = 2154)
fr_map_proj <- st_transform(x = fr_map, crs = 2154)
We can now plot our data.
For example, plot all cities where there is (at least) one “elite” club.
map1 <- tm_shape(fr_map_proj) +
tm_borders() +
tm_shape(clubs_sf_proj) +
tm_dots(col = "Sport", size = 0.3, jitter = 0.1) +
tm_layout(main.title = "Which sport in which city?",
main.title.position = "center",
main.title.size = 1.2,
fontfamily = "Bookman"
)
map1
This is not really readable, we can plot the number of clubs per city.
# using number of clubs per city to size the dots
clubs_size <- clubs_sf_proj %>%
count(Ville)
map2 <- tm_shape(fr_map_proj) +
tm_borders() +
tm_shape(clubs_size) +
tm_dots(col = "n", size = "n", scale = 2, alpha = 0.7, style = "cat",
title = "",
palette = tmaptools::get_brewer_pal("Dark2", n = 8, plot = FALSE),
legend.size.show = FALSE
) +
tm_layout(main.title = 'Number of "elite" clubs per city',
main.title.position = "center",
main.title.size = 1.2,
fontfamily = "Bookman"
)
map2
# arranging both maps side by side
tmap_arrange(map1, map2, nrow = 1)
Plot “elite” cities by gender.
map3 <- tm_shape(fr_map_proj) +
tm_borders() +
tm_shape(clubs_sf_proj) +
tm_dots(col = "Categorie", size = 0.2, jitter = 0.1, palette = c("hotpink1", "dodgerblue1")) +
tm_facets(by = "Categorie", as.layers = TRUE) +
tm_layout(legend.show = FALSE,
panel.labels = c("Women", "Men")
)
map3
Join both spatial data, to compute statistics by region.
# join both data/map
all_data <- st_intersection(clubs_sf_proj, fr_map_proj)
head(all_data)
## Simple feature collection with 6 features and 14 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 706704.2 ymin: 6455601 xmax: 913392.9 ymax: 6569740
## epsg (SRID): 2154
## proj4string: +proj=lcc +lat_1=49 +lat_2=44 +lat_0=46.5 +lon_0=3 +x_0=700000 +y_0=6600000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## # A tibble: 6 x 15
## Ville Sport Division Categorie GID_0 NAME_0 GID_1 NAME_1 VARNAME_1
## <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Sain… Foot… 1 H FRA France FRA.… Auver… <NA>
## 2 Lyon Foot… 1 H FRA France FRA.… Auver… <NA>
## 3 Gren… Foot… 2 H FRA France FRA.… Auver… <NA>
## 4 Cler… Foot… 2 H FRA France FRA.… Auver… <NA>
## 5 Bour… Bask… 1 H FRA France FRA.… Auver… <NA>
## 6 Vill… Bask… 1 H FRA France FRA.… Auver… <NA>
## # ... with 6 more variables: NL_NAME_1 <chr>, TYPE_1 <chr>,
## # ENGTYPE_1 <chr>, CC_1 <chr>, HASC_1 <chr>, geometry <POINT [m]>
The data from France map has now been added to data from clubs
.
# Remove geometry and compute how many clubs per region
clubs_region <- all_data %>%
# Remove geometry from dataframe
st_set_geometry(NULL) %>%
# create factor in order to use 'complete'
mutate(NAME_1 = as.factor(NAME_1),
Categorie = as.factor(Categorie)) %>%
count(NAME_1, Categorie) %>%
# add and keep value '0' where we have no count
complete(NAME_1, Categorie, fill = list(n = 0)) %>%
# add total counts per region, for display in future plot
add_count(NAME_1, wt = n) %>%
rename(Total = nn)
clubs_region
## # A tibble: 26 x 4
## NAME_1 Categorie n Total
## <fct> <fct> <dbl> <dbl>
## 1 Auvergne-Rhône-Alpes F 4 33
## 2 Auvergne-Rhône-Alpes H 29 33
## 3 Bourgogne-Franche-Comté F 4 11
## 4 Bourgogne-Franche-Comté H 7 11
## 5 Bretagne F 4 14
## 6 Bretagne H 10 14
## 7 Centre-Val de Loire F 3 12
## 8 Centre-Val de Loire H 9 12
## 9 Corse F 0 3
## 10 Corse H 3 3
## # ... with 16 more rows
How many “elite” clubs per region?
ggplot(clubs_region, aes(x = reorder(NAME_1, n), y = n, fill = Categorie)) +
geom_col(width = 0.7, position = "dodge") +
scale_y_continuous(breaks = seq(0, 35, 5)) +
# change legend title and texts
scale_fill_manual(labels = c("Women", "Men"), values = c("hotpink1", "dodgerblue1")) +
coord_flip(expand = 0, ylim = c(-3, 30)) +
# add total counts per region
geom_text(aes(label = paste0("(", Total, ")")),
y = -1.6,
family = "Bookman",
fontface = "italic",
col = "sienna4",
alpha = 0.8) +
labs(title = 'More "elite" clubs in Southern France',
x = "", y = 'Number of "elite" clubs',
fill = "Gender") +
theme(plot.title = element_text(family = "Bookman", face = "bold.italic", size = 18,
hjust = 0.9, vjust = 5), plot.margin = margin(10, 2, 5, 0, "mm"),
axis.ticks.y = element_blank(),
axis.text.y = element_text(family = "Bookman", face = "italic", size = 11, colour = "sienna4"),
axis.text.x = element_text(family = "Bookman", face = "italic", size = 8, colour = "sienna4"),
axis.title.x = element_text(family = "Bookman", face = "italic", size = 9, colour = "sienna4"),
panel.background = element_blank(),
panel.grid.major.x = element_line(linetype = "dashed", colour = "grey90", size = 0.5),
legend.title = element_text(family = "Bookman", face = "italic", colour = "sienna4"),
legend.text = element_text(family = "Bookman", face = "italic", colour = "sienna4")
)
Ratio between number of “elite” clubs and total population per region.
(Population data is from INSEE web site)
# Import data for population per region
fr_pop <- readxl::read_xls("estim-pop-nreg-sexe-aq-1975-2018.xls", sheet = "2018", range = "A5:V18")
head(fr_pop)
## # A tibble: 6 x 22
## X__1 `0 à 4 ans` `5 à 9 ans` `10 à 14 ans` `15 à 19 ans` `20 à 24 ans`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Auve… 460097 506417 505548 497434 457304
## 2 Bour… 143511 164296 170767 168188 141824
## 3 Bret… 170095 200975 209705 209957 174775
## 4 Cent… 137424 158048 161842 156152 127016
## 5 Corse 15501 18379 17550 17735 15893
## 6 Gran… 293508 328163 331942 335903 313173
## # ... with 16 more variables: `25 à 29 ans` <dbl>, `30 à 34 ans` <dbl>,
## # `35 à 39 ans` <dbl>, `40 à 44 ans` <dbl>, `45 à 49 ans` <dbl>, `50 à
## # 54 ans` <dbl>, `55 à 59 ans` <dbl>, `60 à 64 ans` <dbl>, `65 à 69
## # ans` <dbl>, `70 à 74 ans` <dbl>, `75 à 79 ans` <dbl>, `80 à 84
## # ans` <dbl>, `85 à 89 ans` <dbl>, `90 à 94 ans` <dbl>, `95 ans et
## # plus` <dbl>, Total <dbl>
# only keep variables of interest
fr_pop_ok <- fr_pop %>%
select(X__1, Total)
# change the values of region names, in order to match those from 'all_data'
fr_pop_ok[, 1] <- all_data %>% st_set_geometry(NULL) %>% distinct(NAME_1)
fr_pop_ok
## # A tibble: 13 x 2
## X__1 Total
## <chr> <dbl>
## 1 Auvergne-Rhône-Alpes 8037059
## 2 Bourgogne-Franche-Comté 2813289
## 3 Bretagne 3336643
## 4 Centre-Val de Loire 2582522
## 5 Corse 337796
## 6 Grand Est 5548090
## 7 Hauts-de-France 6023336
## 8 Île-de-France 12246234
## 9 Normandie 3342467
## 10 Nouvelle-Aquitaine 5994336
## 11 Occitanie 5903190
## 12 Pays de la Loire 3787411
## 13 Provence-Alpes-Côte d'Azur 5065723
Join the data.
clubs_pop <- clubs_region %>%
count(NAME_1, wt = n) %>%
left_join(fr_pop_ok, by = c("NAME_1" = "X__1")) %>%
# compute number of clubs per 100000 inhbitants
mutate(per100000 = round(nn / Total * 100000, 2)) %>%
# change the format of 'Population' for nice display in the table
mutate(Total = format(Total, big.mark = " ", trim = TRUE)) %>%
arrange(desc(per100000)) %>%
# Rename for table display
select(Region = NAME_1, Population = Total, `# of clubs` = nn, `Per 100 000 capita` = per100000)
clubs_pop
## # A tibble: 13 x 4
## Region Population `# of clubs` `Per 100 000 capita`
## <chr> <chr> <dbl> <dbl>
## 1 Corse 337 796 3 0.89
## 2 Centre-Val de Loire 2 582 522 12 0.46
## 3 Occitanie 5 903 190 27 0.46
## 4 Provence-Alpes-Côte d'Azur 5 065 723 22 0.43
## 5 Bretagne 3 336 643 14 0.42
## 6 Auvergne-Rhône-Alpes 8 037 059 33 0.41
## 7 Grand Est 5 548 090 22 0.4
## 8 Pays de la Loire 3 787 411 15 0.4
## 9 Bourgogne-Franche-Comté 2 813 289 11 0.39
## 10 Nouvelle-Aquitaine 5 994 336 21 0.35
## 11 Normandie 3 342 467 11 0.33
## 12 Hauts-de-France 6 023 336 18 0.3
## 13 Île-de-France 12 246 234 26 0.21
Pretty table (using formattable
package).
format_table(x = clubs_pop,
format = "html",
align = c("l", "c", "c", "l"),
formatters = list(
`Per 100 000 capita` = color_bar("pink")
)) %>%
kableExtra::kable_styling(bootstrap_options = "striped", full_width = F, position = "left") %>%
kableExtra::column_spec(column = 1, border_right = TRUE) %>%
kableExtra::column_spec(column = 4, italic = TRUE)
Region | Population | # of clubs | Per 100 000 capita |
---|---|---|---|
Corse | 337 796 | 3 | 0.89 |
Centre-Val de Loire | 2 582 522 | 12 | 0.46 |
Occitanie | 5 903 190 | 27 | 0.46 |
Provence-Alpes-Côte d’Azur | 5 065 723 | 22 | 0.43 |
Bretagne | 3 336 643 | 14 | 0.42 |
Auvergne-Rhône-Alpes | 8 037 059 | 33 | 0.41 |
Grand Est | 5 548 090 | 22 | 0.40 |
Pays de la Loire | 3 787 411 | 15 | 0.40 |
Bourgogne-Franche-Comté | 2 813 289 | 11 | 0.39 |
Nouvelle-Aquitaine | 5 994 336 | 21 | 0.35 |
Normandie | 3 342 467 | 11 | 0.33 |
Hauts-de-France | 6 023 336 | 18 | 0.30 |
Île-de-France | 12 246 234 | 26 | 0.21 |