In the interests of full disclosure: I have been, and continue to be, personally affected by the cladding and building safety crisis.
Last week I put together an infographic (shown above) which I hoped would help with the campaigns for a comprehensive Government intervention to resolve the cladding and building safety crisis. The map within the infographic was picked up by BBC News, and was shown very briefly in the background of a feature on the News at Ten (Thursday 29th April).
This media coverage was a success in a very narrow personal sense, and obviously took place in the context of huge disappointment for the cladding/building safety campaign, with the Fire Safety Bill passing parliament without financial protection for leaseholders. Even so, I did want to document this small success. In particular, I wanted to document what my objectives were with the map visualisation, and also outline the limitations of the data plotted on the map. Primarily, this is for my own portfolio, but also just in case it is of interest to anyone else working on mapping the building safety crisis.
The primary intended audience of the infographic was people involved in the cladding and building safety campaign. I hoped it would help people to see the national scale of the issues. The secondary audience was policy-makers and politicians involved in addressing the crisis, who might see the infographic if it was used widely by campaigners. More specifically, my objectives were:
To produce a visually engaging and impactful infographic, which could be understood by the audiences identified above;
To give a sense of the scale and geographic spread of the cladding and building safety crisis based on the coarse-grained data which has been made available by Government;
To highlight that many of the applications to the Government’s Building Safety Fund are from large cities (London, Manchester, Birmingham etc.).
Moving on to the limitations of the data used and the map itself. There is relatively little data openly available from Government on the scale and geographic spread of the building safety crisis, an issue that some campaigners have been addressing recently by crowdsourcing data on affected buildings. That data is provided directly by the leaseholders involved, who can face bills ranging from £10,000 to £100,000 plus to rectify fire safety issues in their buildings.
Some data is available from Government, including the number of applications that have been made to its Building Safety Fund (BSF). In the map above, I used the ‘Building Safety Fund: registration by local authority’ data; more specifically, the version of this data associated with a release made on 5th March 2021. It looks like the version of the data I used has since been replaced with an updated version. Unfortunately, I can’t find the exact version of the data I used on the MHCLG website anymore, so I can’t provide a link here.
The data itself is very simple: just a list of local authorities and a count of the number of applications made to the Building Safety Fund from each local authority.
| Local Authority | Count |
|---|---|
| Barking and Dagenham | 24 |
| Barnet | 37 |
| … | … |
| Worcestershire | 8 |
Given the data I was working with, the map itself has the following main limitations.
The map doesn’t attempt to plot exact locations of buildings that have applied. This is because the data from the Government just lists the number of applications to the BSF per Local Authority area (as discussed above). I think there are some security concerns within Government around releasing more geographically specific data.
As a result, within each Local Authority on the map I produced, the appropriate number of applications (i.e. dots) are shown at randomly selected locations. So, one can’t look for a specific building that has applied to the Building Safety Fund on the map.
The exact location where each point is plotted within each Local Authority varies each time the code is run. This is because the locations allocated to each application within each Local Authority are sampled at random.
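To make the random allocation described above concrete, here is a minimal sketch using `sf::st_sample()` on two made-up square "local authorities". The squares, the names `areas` and `make_square`, the counts and the seed value are all assumptions for illustration, not the real boundaries or data. It also shows that calling `set.seed()` immediately before sampling should make the dot positions repeatable between runs, which the map code further down does not do.

```r
library(sf)

# helper (hypothetical): a unit square whose lower-left corner is at (x0, 0)
make_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# two made-up, non-overlapping areas with made-up application counts
areas <- st_sf(
  name = c("A", "B"),
  Count = c(3, 5),
  geometry = st_sfc(make_square(0), make_square(2))
)

# one random point per application, sampled within each area's boundary
set.seed(2021)
dots <- st_sample(areas, size = areas$Count, type = "random")

# count how many dots landed in each area
dots_per_area <- lengths(st_intersects(areas, dots))
print(dots_per_area)

# re-using the same seed gives the same dot positions
set.seed(2021)
dots_again <- st_sample(areas, size = areas$Count, type = "random")
print(identical(st_coordinates(dots), st_coordinates(dots_again)))
```

Fixing the seed only makes a particular random layout repeatable; the dots still say nothing about where individual buildings actually are.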
I was pleased that the final map figure met the objectives I had started with: giving a sense of the scale and scope of the issue, and highlighting that many of the applications come from large cities.
I had tried other approaches to visualizing the data. For example, I tried a choropleth map approach but found this made the building safety issues look rather small for two reasons. First, applications for BSF disproportionately come from smaller, urban local authorities. So, when viewed from a national scale the affected areas appear small relative to large rural local authorities (where no buildings have applied to the BSF). Secondly, the heavily right-skewed distribution of local authority application counts made colour scaling challenging.
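The colour scaling problem can be illustrated with a small base R sketch. The counts below are invented for illustration (they are not the MHCLG figures), chosen only to mimic a right-skewed distribution in which most values are small and one or two are very large. Equal-width bins, as a naive linear colour scale would use, put almost everything in the palest class, whereas quantile-based bins spread the values more evenly.

```r
# hypothetical application counts (NOT the real MHCLG figures)
counts <- c(5, 6, 8, 9, 12, 20, 22, 30, 36, 40, 110, 293)

# five equal-width bins, as a naive linear colour scale would use
equal_width <- cut(counts, breaks = 5)
print(table(equal_width))   # 10 of the 12 values fall in the lowest bin

# five quantile-based bins, so each colour class holds a similar number of areas
quantile_breaks <- quantile(counts, probs = seq(0, 1, by = 0.2))
quantile_bins <- cut(counts, breaks = quantile_breaks, include.lowest = TRUE)
print(table(quantile_bins)) # no bin holds more than 3 values
```

Quantile bins (or a log-transformed scale) are one possible workaround, but they do nothing about the first problem: small urban authorities still occupy very little area on a national map.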
I also tried plotting a single dot for each local authority, with the size and colour of each dot scaled by the number of applications made to the BSF. See plot below. However, there were problems with overplotting and the overall map lacked visual impact.
I was also pleased with the way in which I combined the map with the nested area chart.
In retrospect I could have spent more time on refining/simplifying the colour palette used in the infographic and the approach to labeling the cities with the most BSF applications.
Below is the (R) code I wrote to produce the map (just in case anyone is interested) …
# import packages used in this notebook
suppressPackageStartupMessages({
suppressWarnings({
# for data wrangling
library(tidyverse)
# for working with geospatial data
library(sf)
# for background layers for the map plot
library(rnaturalearth)
library(rnaturalearthdata)
# for styling ggplot map output
library(ggthemes)
})})
#******************************************************************************
# READ IN AND PROCESS DATA REQUIRED TO CREATE THE PLOT
#******************************************************************************
# get background layer for the map from a package
england <- ne_states(geounit = "england", returnclass = "sf")
# read in MHCLG data (counts of BSF applications)
# cross check with .csv - there should be 81 rows (excluding column names)
la_data <- read_csv("MHCLG_BSF_and_ACM_BSF_by_Local_Authority.csv", skip = 1)
# read in shapefiles for LA boundaries
boundaries_lat_long <- st_read("boundaries_2019/Counties_and_Unitary_Authorities_(December_2019)_Boundaries_UK_BUC.shp") %>%
st_transform(4326) # set crs to match the background map
## Reading layer `Counties_and_Unitary_Authorities_(December_2019)_Boundaries_UK_BUC' from data source `C:\Users\chris\Desktop\Data Analysis Projects\bsf_viz\boundaries_2019\Counties_and_Unitary_Authorities_(December_2019)_Boundaries_UK_BUC.shp' using driver `ESRI Shapefile'
## Simple feature collection with 216 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -116.1928 ymin: 7054.1 xmax: 655653.8 ymax: 1218618
## Projected CRS: OSGB 1936 / British National Grid
# merge LA boundaries and BSF application counts
bsd_geo <- merge(boundaries_lat_long,
# rename variable so there is a shared key between the two
# datasets
la_data %>%
rename(ctyua19nm = `Local Authority`)
) %>%
arrange(Count) %>%
# include to check that for a single Local Authority points are approx.
# within boundaries of the Local Authority
# filter(ctyua19nm == "Brighton and Hove") %>%
{.}
# confirm number of local authorities in merged dataset
# there should be 80 features / 80 unique Local Authority names
# (rather than 81 as the total in la_data won't match a geometry in the LA boundaries dataset)
bsd_geo %>%
select(ctyua19nm) %>%
as_tibble() %>%
distinct(ctyua19nm)
## # A tibble: 80 x 1
## ctyua19nm
## <chr>
## 1 Bath and North East Somerset
## 2 Enfield
## 3 Milton Keynes
## 4 Sunderland
## 5 Wiltshire
## 6 Wokingham
## 7 Calderdale
## 8 Devon
## 9 Gateshead
## 10 Kirklees
## # ... with 70 more rows
# confirm that data for key variables is complete
bsd_geo %>%
skimr::skim(ctyua19nm, Count)
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| ctyua19nm | 0 | 1 | 4 | 35 | 0 | 80 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Count | 0 | 1 | 34.5 | 43.8 | 5 | 8.75 | 21.5 | 36.25 | 293 | ▇▁▁▁▁ |
# confirm no NAs in geometry columns
# (as not sure if skimr works with geometries - see warning message)
print(glue::glue("Number of LAs with no geometry: {sum(is.na(bsd_geo$geometry))}"))
## Number of LAs with no geometry: 0
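The checks above rely on knowing in advance that 80 of the 81 rows should survive the merge. As a complementary sketch, the rows that `merge()` silently drops can be listed by name, which makes it easier to tell an expected mismatch (such as a totals row) from a spelling difference between the two sources. The toy data frames and the names `counts`, `boundaries` and `unmatched` below are made up for illustration.

```r
# hypothetical toy versions of the two inputs (NOT the real data)
counts <- data.frame(
  `Local Authority` = c("Barnet", "Leeds", "Total"),
  Count = c(37, 70, 107),
  check.names = FALSE
)
boundaries <- data.frame(ctyua19nm = c("Barnet", "Leeds", "Devon"))

# merge() keeps only rows whose key appears in both inputs by default
merged <- merge(boundaries, counts,
                by.x = "ctyua19nm", by.y = "Local Authority")

# names present in the counts but missing after the merge
unmatched <- setdiff(counts$`Local Authority`, merged$ctyua19nm)
print(unmatched)   # "Total"
```

If anything other than the totals row appeared in `unmatched`, that would suggest a naming difference between the two sources that needs resolving before plotting.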
When randomly selecting locations within each Local Authority the following message was printed to the console: “although coordinates are longitude/latitude, st_intersects assumes that they are planar”. Having read up on this, it does not appear to be any cause for concern here. The message is a reminder that the longitude/latitude coordinates are being treated as if they lay on a flat plane, which relates to the age-old challenge of representing 3-D geographic features in a 2-D diagram. Given that the geographic features being sampled here are relatively small (local authorities), and the purpose of the plot is illustrative, the choice of CRS/projection was not particularly important and I settled on a widely used standard (WGS84).
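For anyone who would rather avoid the planar assumption altogether, one possible alternative (not what was done for this map) is to sample in a projected CRS and only convert to longitude/latitude for plotting. The sketch below assumes British National Grid (EPSG:27700), the CRS the boundary shapefile is supplied in, and uses a made-up rectangle rather than a real Local Authority boundary.

```r
library(sf)

# hypothetical rectangle over central England, defined in WGS84 lon/lat
poly_wgs84 <- st_sfc(
  st_polygon(list(rbind(
    c(-2, 52), c(-1, 52), c(-1, 53), c(-2, 53), c(-2, 52)
  ))),
  crs = 4326
)

# move to a projected CRS (British National Grid) before sampling
poly_bng <- st_transform(poly_wgs84, 27700)

set.seed(1)
pts_bng <- st_sample(poly_bng, size = 10, type = "random")

# convert the sampled points back to WGS84 to match the background map
pts_wgs84 <- st_transform(pts_bng, 4326)
print(st_crs(pts_wgs84)$input)
```

For an illustrative dot map at national scale the visual difference is likely to be negligible, so this is more about keeping the console free of messages than about changing the result.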
#******************************************************************************
# SIMULATE LOCATIONS OF BSF APPLICATIONS (WITHIN EACH LOCAL AUTHORITY)
#******************************************************************************
# create a vector of the application counts for each Local Authority
num_dots <- as.data.frame(bsd_geo) %>%
select(Count)
# confirm the total number of applications is correct
print(glue::glue("Number of dots to be plotted: {sum(num_dots)}")) # should be 2760
## Number of dots to be plotted: 2760
# for each LA generate an appropriate number of random points within its boundaries
# one point per application
buildings <- st_sample(bsd_geo, size = num_dots$Count, type = "random")
#******************************************************************************
# CONSTRUCT THE PLOT
#******************************************************************************
# print details of the layers ahead of plotting to confirm:
# the correct number of locations for applications have been simulated
# and all layers use the same CRS (WGS 84)
buildings # there should be 2760 features in the output
## Geometry set for 2760 features
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -4.364205 ymin: 50.3463 xmax: 1.625964 ymax: 55.06523
## Geodetic CRS: WGS 84
## First 5 geometries:
england
## Simple feature collection with 152 features and 83 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -6.348948 ymin: 49.90961 xmax: 1.771169 ymax: 55.80549
## CRS: +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## First 10 features:
## featurecla scalerank adm1_code diss_me iso_3166_2 wikipedia iso_a2
## 1695 Admin-1 scale rank 8 GBR-5707 5707 GB-CHW <NA> GB
## 1697 Admin-1 scale rank 8 GBR-2779 2779 GB-SHR <NA> GB
## 1699 Admin-1 scale rank 8 GBR-2775 2775 GB-HEF <NA> GB
## 1701 Admin-1 scale rank 8 GBR-2039 2039 GB-GLS <NA> GB
## 1703 Admin-1 scale rank 8 GBR-2034 2034 GB-NBL <NA> GB
## 1704 Admin-1 scale rank 8 GBR-2139 2139 GB-CMA <NA> GB
## 2282 Admin-1 scale rank 8 GBR-5662 5662 GB-NTY <NA> GB
## 2283 Admin-1 scale rank 8 GBR-5663 5663 GB-STY <NA> GB
## 2284 Admin-1 scale rank 8 GBR-5664 5664 GB-SND <NA> GB
## 2285 Admin-1 scale rank 8 GBR-2029 2029 GB-DUR <NA> GB
## adm0_sr name name_alt name_local
## 1695 1 Cheshire West and Chester <NA> <NA>
## 1697 1 Shropshire Salop <NA>
## 1699 1 Herefordshire <NA> <NA>
## 1701 1 Gloucestershire <NA> <NA>
## 1703 1 Northumberland <NA> <NA>
## 1704 1 Cumbria <NA> <NA>
## 2282 1 North Tyneside <NA> <NA>
## 2283 1 South Tyneside <NA> <NA>
## 2284 1 Sunderland <NA> <NA>
## 2285 1 Durham County Durham <NA>
## type type_en code_local code_hasc
## 1695 Unitary Authority Unitary Authority <NA> GB.CZ
## 1697 Unitary Single-Tier County Unitary Single-Tier County <NA> GB.SP
## 1699 Unitary Authority Unitary Authority <NA> GB.HE
## 1701 Administrative County Administrative County <NA> GB.GC
## 1703 Unitary Single-Tier County Unitary Single-Tier County <NA> GB.NB
## 1704 Administrative County Administrative County <NA> GB.CU
## 2282 Metropolitan Borough Metropolitan Borough <NA> GB.NI
## 2283 Metropolitan Borough Metropolitan Borough <NA> GB.SX
## 2284 Metropolitan Borough Metropolitan Borough <NA> GB.SD
## 2285 Unitary Single-Tier County Unitary Single-Tier County <NA> GB.DH
## note hasc_maybe region region_cod provnum_ne gadm_level check_me
## 1695 <NA> <NA> North West <NA> 20094 2 20
## 1697 <NA> <NA> West Midlands <NA> 20028 2 20
## 1699 <NA> <NA> West Midlands <NA> 20025 2 20
## 1701 <NA> <NA> South West <NA> 20084 2 20
## 1703 <NA> <NA> North East <NA> 20097 2 20
## 1704 <NA> <NA> North West <NA> 20095 2 20
## 2282 <NA> <NA> North East <NA> 3 2 20
## 2283 <NA> <NA> North East <NA> 3 2 20
## 2284 <NA> <NA> North East <NA> 3 2 20
## 2285 <NA> <NA> North East <NA> 3 2 20
## datarank abbrev postal area_sqkm sameascity labelrank name_len mapcolor9
## 1695 5 Ches CH 0 NA 9 25 6
## 1697 5 Shropshr SP 0 NA 9 10 6
## 1699 5 <NA> HE 0 NA 9 13 6
## 1701 5 <NA> GC 0 NA 9 15 6
## 1703 5 <NA> NB 0 NA 9 14 6
## 1704 5 Cumbria CU 0 NA 9 7 6
## 2282 5 <NA> TW 0 NA 9 14 6
## 2283 5 <NA> TW 0 NA 9 14 6
## 2284 5 <NA> TW 0 NA 9 10 6
## 2285 5 Durham DH 0 NA 9 6 6
## mapcolor13 fips fips_alt woe_id woe_label woe_name latitude
## 1695 3 UKZ8 UKC5 -12602157 <NA> Cheshire 53.1671
## 1697 3 UKL6 UKL6 12602188 <NA> Shropshire 52.6167
## 1699 3 UKF7 UKF7 12602187 <NA> Herefordshire 52.0807
## 1701 3 UKE6 UKE6 12602184 <NA> Gloucestershire 51.8153
## 1703 3 UKJ6 UKJ6 12602153 <NA> Northumberland 55.3152
## 1704 3 UKC9 UKC9 12602148 <NA> Cumbria 54.6368
## 2282 3 UKJ5 <NA> -12602156 <NA> Tyne and Wear 55.0184
## 2283 3 UKM7 <NA> -12602156 <NA> Tyne and Wear 54.9526
## 2284 3 UKN6 <NA> -12602156 <NA> Tyne and Wear 54.8617
## 2285 3 UKD8 UKD8 12602150 <NA> County Durham 54.6955
## longitude sov_a3 adm0_a3 adm0_label admin geonunit gu_a3 gn_id
## 1695 -2.76774 GB1 GBR 7 United Kingdom England ENG 7290537
## 1697 -2.72133 GB1 GBR 7 United Kingdom England ENG 2638655
## 1699 -2.73691 GB1 GBR 7 United Kingdom England ENG 2647071
## 1701 -2.14862 GB1 GBR 7 United Kingdom England ENG 2648402
## 1703 -2.07057 GB1 GBR 7 United Kingdom England ENG 2641235
## 1704 -2.89222 GB1 GBR 7 United Kingdom England ENG 2651712
## 2282 -1.48264 GB1 GBR 7 United Kingdom England ENG 2641238
## 2283 -1.44209 GB1 GBR 7 United Kingdom England ENG 3333199
## 2284 -1.46175 GB1 GBR 7 United Kingdom England ENG 3333205
## 2285 -1.79965 GB1 GBR 7 United Kingdom England ENG 2650629
## gn_name gns_id gns_name
## 1695 Cheshire West and Chester 10876240 Cheshire West and Chester
## 1697 Shropshire -2606991 Shropshire
## 1699 Herefordshire -2598551 Herefordshire, County of
## 1701 Gloucestershire -2597217 Gloucestershire, County of
## 1703 Northumberland -2604404 Northumberland, County of
## 1704 Cumbria -2593896 Cumbria, County of
## 2282 Borough of North Tyneside -2604401 North Tyneside, Borough of
## 2283 Borough of South Tyneside 6077641 South Tyneside, Borough of
## 2284 City and Borough of Sunderland 6077647 Sunderland, City and Borough of
## 2285 County Durham -2594982 Durham, County
## gn_level gn_region gn_a1_code region_sub sub_code gns_level gns_lang
## 1695 2 <NA> GB.Z8 Cheshire <NA> 1 eng
## 1697 2 <NA> GB.L6 Shropshire <NA> 1 cym
## 1699 2 <NA> GB.F7 Herefordshire <NA> 1 eng
## 1701 2 <NA> GB.E6 Gloucestershire <NA> 1 eng
## 1703 2 <NA> GB.J6 Northumberland <NA> 1 eng
## 1704 2 <NA> GB.C9 Cumbria <NA> 1 cym
## 2282 2 <NA> GB.J5 Tyne and Wear <NA> 1 eng
## 2283 2 <NA> GB.M7 Tyne and Wear <NA> 1 ind
## 2284 2 <NA> GB.N6 Tyne and Wear <NA> 1 ind
## 2285 2 <NA> GB.D8 Durham <NA> 1 eng
## gns_adm1 gns_region min_label max_label min_zoom wikidataid name_ar
## 1695 <NA> UK25 10 11 10 Q1070591 <NA>
## 1697 <NA> UK01 10 11 10 Q23103 <NA>
## 1699 <NA> UK01 10 11 10 Q23129 <NA>
## 1701 <NA> UK01 10 11 10 Q23165 <NA>
## 1703 <NA> UK01 10 11 10 Q23079 <NA>
## 1704 <NA> UK01 10 11 10 Q23066 <NA>
## 2282 <NA> UK01 10 11 10 Q1120443 <NA>
## 2283 <NA> UK35 10 11 10 Q1541228 <NA>
## 2284 <NA> UK35 10 11 10 Q1280897 <NA>
## 2285 <NA> UK01 10 11 10 Q23082 <NA>
## name_bn name_de name_en
## 1695 <NA> Cheshire West and Chester Cheshire West and Chester
## 1697 <NA> Shropshire Shropshire
## 1699 <NA> Herefordshire Herefordshire
## 1701 <NA> Gloucestershire Gloucestershire
## 1703 <NA> Northumberland Northumberland
## 1704 <NA> Cumbria Cumbria
## 2282 <NA> North Tyneside North Tyneside
## 2283 <NA> South Tyneside South Tyneside
## 2284 <NA> City of Sunderland City of Sunderland
## 2285 <NA> Durham County Durham
## name_es name_fr name_el name_hi
## 1695 Cheshire West and Chester Cheshire West and Chester <NA> <NA>
## 1697 Shropshire Shropshire <NA> <NA>
## 1699 Herefordshire Herefordshire <NA> <NA>
## 1701 Gloucestershire Gloucestershire <NA> <NA>
## 1703 Northumberland Northumberland <NA> <NA>
## 1704 Cumbria Cumbria <NA> <NA>
## 2282 North Tyneside North Tyneside <NA> <NA>
## 2283 <NA> South Tyneside <NA> <NA>
## 2284 <NA> cité de Sunderland <NA> <NA>
## 2285 Durham Durham <NA> <NA>
## name_hu name_id name_it name_ja name_ko
## 1695 <NA> <NA> Cheshire West and Chester <NA> <NA>
## 1697 Shropshire Shropshire Shropshire <NA> <NA>
## 1699 Herefordshire Herefordshire Herefordshire <NA> <NA>
## 1701 Gloucestershire Gloucestershire Gloucestershire <NA> <NA>
## 1703 Northumberland Northumberland Northumberland <NA> <NA>
## 1704 Cumbria Cumbria Cumbria <NA> <NA>
## 2282 North Tyneside <NA> North Tyneside <NA> <NA>
## 2283 South Tyneside <NA> South Tyneside <NA> <NA>
## 2284 <NA> <NA> City of Sunderland <NA> <NA>
## 2285 Durham County Durham Durham <NA> <NA>
## name_nl name_pl
## 1695 Cheshire West and Chester Cheshire West and Chester
## 1697 Shropshire Shropshire
## 1699 Herefordshire Herefordshire
## 1701 Gloucestershire Gloucestershire
## 1703 Northumberland Northumberland
## 1704 Cumbria Kumbria
## 2282 North Tyneside North Tyneside
## 2283 South Tyneside South Tyneside
## 2284 City of Sunderland City of Sunderland
## 2285 Durham Durham
## name_pt name_ru name_sv
## 1695 Cheshire West and Chester <NA> Cheshire West and Chester
## 1697 Shropshire <NA> Shropshire
## 1699 Herefordshire <NA> Herefordshire
## 1701 Gloucestershire <NA> Gloucestershire
## 1703 Northumberland <NA> Northumberland
## 1704 Cúmbria <NA> Cumbria
## 2282 North Tyneside <NA> Borough of North Tyneside
## 2283 <NA> <NA> South Tyneside
## 2284 <NA> <NA> Sunderland
## 2285 Durham <NA> Durham
## name_tr name_vi name_zh ne_id
## 1695 <NA> <NA> <NA> 1159317557
## 1697 Shropshire Shropshire <NA> 1159314805
## 1699 Herefordshire Herefordshire <NA> 1159314797
## 1701 Gloucestershire Gloucestershire <NA> 1159312871
## 1703 Northumberland Northumberland <NA> 1159312859
## 1704 Cumbria Cumbria <NA> 1159312957
## 2282 <NA> <NA> <NA> 1159317473
## 2283 <NA> <NA> <NA> 1159317475
## 2284 <NA> <NA> <NA> 1159317479
## 2285 Durham Kontlugu <NA> <NA> 1159312847
## geometry
## 1695 MULTIPOLYGON (((-2.956797 5...
## 1697 MULTIPOLYGON (((-3.14043 52...
## 1699 MULTIPOLYGON (((-3.05125 51...
## 1701 MULTIPOLYGON (((-2.665354 5...
## 1703 MULTIPOLYGON (((-2.685007 5...
## 1704 MULTIPOLYGON (((-2.850288 5...
## 2282 MULTIPOLYGON (((-1.455585 5...
## 2283 MULTIPOLYGON (((-1.412913 5...
## 2284 MULTIPOLYGON (((-1.360341 5...
## 2285 MULTIPOLYGON (((-1.34082 54...
boundaries_lat_long
## Simple feature collection with 216 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -8.650007 ymin: 49.88235 xmax: 1.76368 ymax: 60.84567
## Geodetic CRS: WGS 84
## First 10 features:
## objectid ctyua19cd ctyua19nm ctyua19nmw bng_e bng_n
## 1 1 E06000001 Hartlepool <NA> 447160 531474
## 2 2 E06000002 Middlesbrough <NA> 451141 516887
## 3 3 E06000003 Redcar and Cleveland <NA> 464361 519597
## 4 4 E06000004 Stockton-on-Tees <NA> 444940 518183
## 5 5 E06000005 Darlington <NA> 428029 515648
## 6 6 E06000006 Halton <NA> 354246 382146
## 7 7 E06000007 Warrington <NA> 362744 388456
## 8 8 E06000008 Blackburn with Darwen <NA> 369490 422806
## 9 9 E06000009 Blackpool <NA> 332819 436635
## 10 10 E06000010 Kingston upon Hull, City of <NA> 511894 431650
## long lat st_areasha st_lengths geometry
## 1 -1.27018 54.67614 96845510 50305.33 MULTIPOLYGON (((-1.24098 54...
## 2 -1.21099 54.54467 52908459 34964.41 MULTIPOLYGON (((-1.200884 5...
## 3 -1.00608 54.56752 248679106 83939.75 MULTIPOLYGON (((-1.197502 5...
## 4 -1.30664 54.55691 207159134 87075.86 MULTIPOLYGON (((-1.197502 5...
## 5 -1.56835 54.53534 198812771 91926.84 MULTIPOLYGON (((-1.696917 5...
## 6 -2.68853 53.33424 80285488 66618.91 MULTIPOLYGON (((-2.675188 5...
## 7 -2.56167 53.39163 178628104 71840.67 MULTIPOLYGON (((-2.576743 5...
## 8 -2.46360 53.70080 139386373 55099.15 MULTIPOLYGON (((-2.5513 53....
## 9 -3.02199 53.82164 33695602 30031.41 MULTIPOLYGON (((-3.047741 5...
## 10 -0.30382 53.76920 71407321 41191.84 MULTIPOLYGON (((-0.3312091 ...
ggplot() +
# plot background map
geom_sf(data = england, colour = "grey90", fill = "grey90") +
# option to superimpose LA boundaries to visually check that the points created fall
# within boundaries
#geom_sf(data = boundaries_lat_long) +
# plot simulated locations of the applications
geom_sf(data = buildings, size = 1, alpha = 0.2) +
# use minimal map theme to remove visual clutter
theme_map() +
# set coordinate system
coord_sf()
#******************************************************************************
# COLOURING OF THIS PLOT AND ASSEMBLY OF THE INFOGRAPHIC
# COMPLETED IN AFFINITY DESIGNER
#******************************************************************************
ggsave("out.svg")
Finally, I wanted to identify the cities with the most applications to the BSF for labeling on the map. This required a little bit of coding (see below) as London is broken down into multiple local authorities.
# read in ons data set which provides a lookup between local authorities
# and regions
# in the language of UK administrative geography London is considered a region
la_region <- read_csv("https://opendata.arcgis.com/datasets/6a41affae7e345a7b2b86602408ea8a2_0.csv")
# join the lookup with the BSF data
la_data_regions <- la_data %>%
left_join(la_region, by = c("Local Authority" = "LAD21NM")) %>%
filter(!is.na(`Local Authority`))
# identify total number of applications from London LAs
num_london_apps <- la_data_regions %>%
filter(RGN21NM == "London") %>%
summarise(london_total = sum(Count)) %>%
pull()
# create a new dataframe without London LAs
la_data_regions_simp <- la_data_regions %>%
filter(RGN21NM != "London" | is.na(RGN21NM)) %>%
select(`Local Authority`, Count) %>%
# add a row for London (as a single entity)
add_row(`Local Authority` = "London", Count = num_london_apps)
# confirm after aggregating counts for London Local Authorities the
# total number of applications is as expected
sum(la_data_regions_simp$Count) # should be 2760
## [1] 2760
# output the top 6 cities by number of applications to the BSF
top_cites <- la_data_regions_simp %>%
slice_max(Count, n = 6)
top_cites
## # A tibble: 6 x 2
## `Local Authority` Count
## <chr> <dbl>
## 1 London 1630
## 2 Manchester 144
## 3 Birmingham 110
## 4 Leeds 70
## 5 Salford 56
## 6 Liverpool 53
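The London aggregation above can also be written as a single pipeline by relabelling every London authority as "London" before summing. The sketch below is an alternative formulation under assumptions, not the code used for the infographic: the toy data and the column names `la`, `region` and `city` are made up, and the counts are not the MHCLG figures.

```r
library(dplyr)

# hypothetical toy data (NOT the real MHCLG figures)
toy <- tibble(
  la = c("Camden", "Hackney", "Manchester", "Leeds"),
  region = c("London", "London", "North West", NA),
  Count = c(10, 20, 15, 5)
)

by_city <- toy %>%
  # treat every London authority as one entity; keep other names as they are
  # (rows with a missing region are kept under their own name)
  mutate(city = if_else(!is.na(region) & region == "London", "London", la)) %>%
  group_by(city) %>%
  summarise(Count = sum(Count), .groups = "drop") %>%
  # keep the two largest totals, largest first
  slice_max(Count, n = 2)

print(by_city)
```

The explicit `!is.na(region)` guard plays the same role as the `is.na(RGN21NM)` condition in the original filter: authorities that fail to match the lookup are kept rather than silently dropped.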