InfraDensity

Summary

We recently received a data source called InfrDensity from the European Commission. This data was generated by the Joint Research Centre in a Freshwater Ecosystem Assessment. The full report is available here. This metric was generated using a combination of Corine Land Cover and the density of aquatic infrastructure based on the OpenStreetMap datasource. Publicly, these data are available at the NUTS Classification, however that is not useful for us. As such, we received this datasource at a finer scale.

This document details the datasource we received and potential ways we can use it for a European-wide tool in the NAVIDIV project. As an outline I’ll show the datasource, and walk through how I think we should use it.

# Packages
library(tidyverse)
library(sf)
library(mapview)

Lets fist take a look at the datasource they provided

infra <- st_read("../InfrDensity/infrDensity.gdb/a00000009.gdbtable")

## Reading layer `InfrDensity' from data source 
##   `C:\Users\Aaron Sexton\Desktop\InfrDensity\infrDensity.gdb\a00000009.gdbtable' 
##   using driver `OpenFileGDB'
## Simple feature collection with 311369 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 2647600 ymin: 1462200 xmax: 6787300 ymax: 5380300
## Projected CRS: ETRS89-extended / LAEA Europe

ggplot() + geom_sf(data = infra) + aes()

Pan-European Cover. Can’t make an interactive map with this because the file is too large So can show you still images of the map zoomed in a bit so that we can see the coverage.

Very fine scale polygons.

When we zoom in even more and add a 1km buffer around each study site, we can see an issue. Within each 1km buffer are multiple polygons with multiple values.

In QGIS I clipped this layer at 1km surrounding each study site in the Seine. below is a look into that csv file

infraden <- read.csv("../InfrDensity/infrdensity_seine_1k.csv")
head(infraden)

##   OBJECTID HydroID CatchmentID AreaSqKm InfrDensity Shape_Length Shape_Area fid
## 1    65098 8088365      337714     0.50        1.07         4200     500000  20
## 2    65098 8088365      337714     0.50        1.07         4200     500000  21
## 3    65098 8088365      337714     0.50        1.07         4200     500000 146
## 4    65104 8088432      337867     3.20        0.97        11400    3200000  21
## 5    65137 8088687      338435     0.78       12.62         5000     780000  46
## 6    65137 8088687      338435     0.78       12.62         5000     780000 178
##              study station       size
## 1    Seine_Fish_AJ 3080003  392285.62
## 2    Seine_Fish_AJ 3080036  135374.35
## 3 Seine_Inverts_ET 3149176  193254.56
## 4    Seine_Fish_AJ 3080036 1743760.23
## 5    Seine_Fish_AJ 3134000   37270.52
## 6 Seine_Inverts_ET 3134000   37195.69

Generating One Value per Station

This gives us a table of all the polygons within the 1km buffer of each site Since there may be multiple values per polygon there are multiple rows of varying sizes and values for each site.
This is an issue, because we only want one value per site. So, lets do that.

Briefly, I took the proportion each polygon makes up of the total coverage within the 1km buffer, and multiplied the InfrDensity score by the proportion. This way the score is weighted by how present it is.

# Remove unnecessary columns
infraden2 <- dplyr::select(infraden, station, InfrDensity, size)
str(infraden2)

## 'data.frame':    503 obs. of  3 variables:
##  $ station    : int  3080003 3080036 3149176 3080036 3134000 3134000 3154000 3135000 3173000 3137000 ...
##  $ InfrDensity: num  1.07 1.07 1.07 0.97 12.62 ...
##  $ size       : num  392286 135374 193255 1743760 37271 ...

# Now lets create a column that just tells us the total size of riparian areas
# within the 1km buffer for each site. Because we may have meanders, or thick or
# skinny stretches.

infraden3 <- aggregate(infraden2$size, by = list(station = infraden2$station), FUN = sum)
head(infraden3)

##   station       x
## 1  909090  935949
## 2 3004095 1430207
## 3 3004349 4341862
## 4 3005200 2348466
## 5 3005999 2213552
## 6 3006000 3868853

infraden3 <- dplyr::left_join(infraden2, infraden3, by =c("station" = "station"))
infraden3 <- dplyr::rename(infraden3, totalsize = x)
head(infraden3)

##   station InfrDensity       size totalsize
## 1 3080003        1.07  392285.62   2214292
## 2 3080036        1.07  135374.35   2047613
## 3 3149176        1.07  193254.56   2094815
## 4 3080036        0.97 1743760.23   2047613
## 5 3134000       12.62   37270.52   3269968
## 6 3134000       12.62   37195.69   3269968

# Now lets calculate the proportion each little polygon makes up for the total
# size coverage of each 1km polygon
infraden3 <- dplyr::mutate(infraden3, proportion = size / totalsize)
head(infraden3)

##   station InfrDensity       size totalsize proportion
## 1 3080003        1.07  392285.62   2214292 0.17716072
## 2 3080036        1.07  135374.35   2047613 0.06611324
## 3 3149176        1.07  193254.56   2094815 0.09225374
## 4 3080036        0.97 1743760.23   2047613 0.85160627
## 5 3134000       12.62   37270.52   3269968 0.01139783
## 6 3134000       12.62   37195.69   3269968 0.01137494

# Now lets multiply the InfrDensity score of each polygon by the proportion cover
infraden3 <- dplyr::mutate(infraden3, weighted = InfrDensity * proportion)

# Then sum all the values for each site
infraden4 <- aggregate(infraden3$weighted, by = list(station = infraden3$station), FUN = sum)
infraden4 <- dplyr::rename(infraden4, cum_wtd_infrden = x)
head(infraden4)

##   station cum_wtd_infrden
## 1  909090        3.819643
## 2 3004095        4.553032
## 3 3004349        2.590000
## 4 3005200        3.443493
## 5 3005999        1.319771
## 6 3006000        1.320000

hist(sqrt(infraden4$cum_wtd_infrden))

So this now gives us one value of infrastructure density. It seems to make sense when I check it against the map, but that’s just spot checking, its not reproducible. So how do we ensure it is the correct way to do this?

May be better to standardize the process at the 25km stretch? With Cybill’s 25km buffer? Might make more sense at the European scale.

Let’s plot them out to visualize that

# Upload the sites from biodiversity collection with no data in them
sites_seine <- st_read("../InfrDensity/seine_pts.gpkg")

## Reading layer `seine_pts' from data source 
##   `C:\Users\Aaron Sexton\Desktop\InfrDensity\seine_pts.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 206 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 3651566 ymin: 2739610 xmax: 3963050 ymax: 2976726
## Projected CRS: ETRS89-extended / LAEA Europe

# Attach the infrdensity value we calculated
infra5 <- left_join(sites_seine, infraden4, by = c("station" = "station"))

# sqrt convert it to normality
infra5$sqrtd <- sqrt(infra5$cum_wtd_infrden)

# Plot it out 
mapview(infra5, zcol = "sqrtd")

Now lets calculate it based off watersheds

Calculating the weighted average of infrastructure density within the watershed makes sense both biologically and statistically.

We can use the HydroBASINS layer created by the HydroSHEDS group, which can be found here.

What is particularly great about this data source is that we can select the scale that makes the most sense - there are 12 levels of spatial fineness. Lets take a look at these different layers spatial size.

largebasin <- st_read("../InfrDensity/hybas_eu_lev02_v1c.shp")

## Reading layer `hybas_eu_lev02_v1c' from data source 
##   `C:\Users\Aaron Sexton\Desktop\InfrDensity\hybas_eu_lev02_v1c.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 9 features and 13 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -24.54232 ymin: 12.59131 xmax: 69.55452 ymax: 81.85898
## Geodetic CRS:  WGS 84

ggplot() + geom_sf(data = largebasin) + aes()

Obviously too large a scale for us.

finebasin <- st_read("../InfrDensity/hybas_eu_lev12_v1c.shp")

## Reading layer `hybas_eu_lev12_v1c' from data source 
##   `C:\Users\Aaron Sexton\Desktop\InfrDensity\hybas_eu_lev12_v1c.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 137658 features and 13 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -24.54232 ymin: 12.59131 xmax: 69.55452 ymax: 81.85898
## Geodetic CRS:  WGS 84

ggplot() + geom_sf(data = finebasin) + aes()+ xlim(-2, 9) + ylim(42, 51)

Probably too fine a scale for us.

Lets find the goldilocks layer.

goldbasin <- st_read("../InfrDensity/hybas_eu_lev07_v1c.shp")

## Reading layer `hybas_eu_lev07_v1c' from data source 
##   `C:\Users\Aaron Sexton\Desktop\InfrDensity\hybas_eu_lev07_v1c.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 7820 features and 13 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -24.54232 ymin: 12.59131 xmax: 69.55452 ymax: 81.85898
## Geodetic CRS:  WGS 84

ggplot() + geom_sf(data = goldbasin) + aes() + xlim(-2, 9) + ylim(42, 51)

This seems great.

Now in QGIS I combined the HydroSHEDS layer with the InfrDensity layer so that it tells us which InfrDensity polygons are within each HydroSHEDS polygons so that we can run that same math as above.

infr_inhydro <- read.csv("../InfrDensity/infraden_in_hydrosheds.csv")
head(infr_inhydro)

##   OBJECTID HydroID CatchmentID AreaSqKm InfrDensity Shape_Length Shape_Area
## 1        1 8000004       18403     0.01       14.88          400      10000
## 2        2 8000017       18446     0.14        2.98         4200     140000
## 3        3 8000033       18469     1.84        1.25         9400    1840000
## 4        4 8000036       18476     0.28        1.32         3000     280000
## 5        5 8000051       18503     0.79        2.62         5000     790000
## 6        6 8000054       18512     0.10        0.00         3600     100000
##     HYBAS_ID NEXT_DOWN  NEXT_SINK   MAIN_BAS DIST_SINK DIST_MAIN SUB_AREA
## 1 2070027050         0 2070027050 2070027050         0         0   2221.3
## 2 2070033080         0 2070033080 2070033080         0         0   2588.5
## 3 2070033080         0 2070033080 2070033080         0         0   2588.5
## 4 2070032600         0 2070032600 2070032600         0         0   3429.9
## 5 2070032600         0 2070032600 2070032600         0         0   3429.9
## 6 2070032600         0 2070032600 2070032600         0         0   3429.9
##   UP_AREA PFAF_ID ENDO COAST ORDER SORT    size
## 1  2221.3 2427050    0     1     0 2375   10000
## 2  2588.5 2447902    0     0     1 2886  140000
## 3  2588.5 2447902    0     0     1 2886 1840000
## 4  3429.9 2447506    0     0     1 2878  280000
## 5  3429.9 2447506    0     0     1 2878  790000
## 6  3429.9 2447506    0     0     1 2878  100000

# Lets get ride of some unnecessary columns
infr_inhydro <- dplyr::select(infr_inhydro, InfrDensity, size, HYBAS_ID)
head(infr_inhydro)

##   InfrDensity    size   HYBAS_ID
## 1       14.88   10000 2070027050
## 2        2.98  140000 2070033080
## 3        1.25 1840000 2070033080
## 4        1.32  280000 2070032600
## 5        2.62  790000 2070032600
## 6        0.00  100000 2070032600

# Now lets sum the sizes of all the infrden polygons within each shed polygon
infrhydro2 <- aggregate(infr_inhydro$size, by = list(shed = infr_inhydro$HYBAS_ID), FUN = sum)
head(infrhydro2)

##         shed            x
## 1 2070001320  10289580.35
## 2 2070001360 206617398.24
## 3 2070001370     56663.07
## 4 2070001450   7408958.97
## 5 2070001460  83280775.02
## 6 2070001720  31264945.29

# Now we can add that back into the main df
infrhydro3 <- dplyr::left_join(infr_inhydro, infrhydro2, by = c("HYBAS_ID" = "shed"))
infrhydro3 <- dplyr::rename(infrhydro3, ripinshed = x)
head(infrhydro3)

##   InfrDensity    size   HYBAS_ID ripinshed
## 1       14.88   10000 2070027050  98088710
## 2        2.98  140000 2070033080 177751269
## 3        1.25 1840000 2070033080 177751269
## 4        1.32  280000 2070032600 575067545
## 5        2.62  790000 2070032600 575067545
## 6        0.00  100000 2070032600 575067545

# Now we calculate the proportion cover each infr polygon makes up for the total
# infr cover within each hydroshed
infrhydro4 <- dplyr::mutate(infrhydro3, proportion = size / ripinshed)
head(infrhydro4)

##   InfrDensity    size   HYBAS_ID ripinshed   proportion
## 1       14.88   10000 2070027050  98088710 0.0001019485
## 2        2.98  140000 2070033080 177751269 0.0007876174
## 3        1.25 1840000 2070033080 177751269 0.0103515435
## 4        1.32  280000 2070032600 575067545 0.0004868993
## 5        2.62  790000 2070032600 575067545 0.0013737517
## 6        0.00  100000 2070032600 575067545 0.0001738926

# Now we multiply the InfrDensity score of each polygon by the proportion cover
infrhydro4 <- dplyr::mutate(infrhydro4, weighted = InfrDensity * proportion)

# Then sum all the values for each watershed
infrhydro5 <- aggregate(infrhydro4$weighted, by = list(shed = infrhydro4$HYBAS_ID), FUN = sum)
head(infrhydro5)

##         shed            x
## 1 2070001320 0.0007815448
## 2 2070001360 0.7172531324
## 3 2070001370 0.0032223951
## 4 2070001450 0.3489788212
## 5 2070001460 2.5629695115
## 6 2070001720 0.9638749491

infrhydro5 <- dplyr::rename(infrhydro5, cum_wtd_infrden = x)
head(infrhydro5)

##         shed cum_wtd_infrden
## 1 2070001320    0.0007815448
## 2 2070001360    0.7172531324
## 3 2070001370    0.0032223951
## 4 2070001450    0.3489788212
## 5 2070001460    2.5629695115
## 6 2070001720    0.9638749491

#Put them back together
infrhydro4 <- dplyr::left_join(infrhydro4, infrhydro5, by = c("HYBAS_ID" = "shed"))

head(infrhydro4)

##   InfrDensity    size   HYBAS_ID ripinshed   proportion     weighted
## 1       14.88   10000 2070027050  98088710 0.0001019485 0.0015169942
## 2        2.98  140000 2070033080 177751269 0.0007876174 0.0023471000
## 3        1.25 1840000 2070033080 177751269 0.0103515435 0.0129394294
## 4        1.32  280000 2070032600 575067545 0.0004868993 0.0006427071
## 5        2.62  790000 2070032600 575067545 0.0013737517 0.0035992294
## 6        0.00  100000 2070032600 575067545 0.0001738926 0.0000000000
##   cum_wtd_infrden
## 1       0.7165765
## 2       2.0398237
## 3       2.0398237
## 4       1.0234892
## 5       1.0234892
## 6       1.0234892

# Looks nice!

Now lets plot it out to see how it looks spatially.

# We can combine the table we just made with the GIS layer we looked at earlier
# We'll keep just the hydroshed ID and the score we created to make the join clean
shedscore <- dplyr::select(infrhydro4, HYBAS_ID, cum_wtd_infrden)
shedscore$infraden <- sqrt(shedscore$cum_wtd_infrden)
shedscore <- dplyr::select(shedscore, HYBAS_ID, infraden)
str(shedscore)

## 'data.frame':    318215 obs. of  2 variables:
##  $ HYBAS_ID: int  2070027050 2070033080 2070033080 2070032600 2070032600 2070032600 2070032600 2070032280 2070025100 2070032600 ...
##  $ infraden: num  0.847 1.428 1.428 1.012 1.012 ...

str(goldbasin)

## Classes 'sf' and 'data.frame':   7820 obs. of  14 variables:
##  $ HYBAS_ID : num  2.07e+09 2.07e+09 2.07e+09 2.07e+09 2.07e+09 ...
##  $ NEXT_DOWN: num  0.00 0.00 0.00 0.00 2.07e+09 ...
##  $ NEXT_SINK: num  2.07e+09 2.07e+09 2.07e+09 2.07e+09 2.07e+09 ...
##  $ MAIN_BAS : num  2.07e+09 2.07e+09 2.07e+09 2.07e+09 2.07e+09 ...
##  $ DIST_SINK: num  0 0 0 0 21.2 ...
##  $ DIST_MAIN: num  0 0 0 0 21.2 ...
##  $ SUB_AREA : num  5.1 6264.2 7763.3 308.9 3612.7 ...
##  $ UP_AREA  : num  5.1 6264.2 7763.3 23856.5 3613.1 ...
##  $ PFAF_ID  : int  2110110 2110120 2110130 2110141 2110142 2110143 2110144 2110145 2110147 2110146 ...
##  $ ENDO     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ COAST    : int  1 0 1 0 0 0 0 0 0 0 ...
##  $ ORDER    : int  0 1 0 1 2 1 2 1 1 2 ...
##  $ SORT     : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ geometry :sfc_MULTIPOLYGON of length 7820; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:9, 1:2] 32.3 32.3 32.3 32.3 32.3 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:13] "HYBAS_ID" "NEXT_DOWN" "NEXT_SINK" "MAIN_BAS" ...

goldbasin2 <- dplyr::left_join(goldbasin, shedscore, by = c("HYBAS_ID" = "HYBAS_ID"))

goldbasin2 <- goldbasin2 %>% 
  drop_na(infraden)

goldbasin2 <- dplyr::select(goldbasin2, HYBAS_ID, infraden, geometry)

# Saving it locally - st_write(goldbasin2, "../InfrDensity/HSHEDS_infraden.gpkg")

iden <- st_read("../InfrDensity/HSHEDS_infraden.gpkg")

## Reading layer `HSHEDS_infraden' from data source 
##   `C:\Users\Aaron Sexton\Desktop\InfrDensity\HSHEDS_infraden.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 2416 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -10.66733 ymin: 34.63693 xmax: 41.01665 ymax: 71.18401
## Geodetic CRS:  WGS 84

hist(iden$infraden)

hist(sqrt(iden$infraden))

ggplot(data = iden) + geom_sf(data = iden, aes(fill = infraden)) + 
  scale_fill_steps(low = "royalblue2", high = "maroon", 
                   breaks = c(0.75, 1.25, 1.75, 2.25, 2.75))

Looks good. We likely will need to sqrt transform the data to normality, but this looks promising for a European scale assessment?

InfraDensity

Aaron Sexton

2022-05-24

Summary

Lets fist take a look at the datasource they provided

Generating One Value per Station

Now lets calculate it based off watersheds