Shipping Validation

Summary

Lock data covering shipping traffic from the 90s to 2017. Spatial coverage is Wolters data

RICOMEX data is contemporary (2022). Spaital coverage is pan-European.

Goal is to link them spatially and see how well the RICOMEX data does in representing shipping data. We want to use this because it is pan-European and can give us detailed data (ship speed, size, frequency) that the lock data cannot provide. Additionally, the RICOMEX data has a finer spatial grain than the lock data for a majority of Europe in areas where locks are less dense.

# Packages
library(dplyr)
library(tidyverse)
library(sf)
library(ggplot2)
library(mapview)

Locks

Within the Large River Databse (LRDB) there are several files relevant to lock data

Locks_WSV - shipping frequency for each lock across entire temporal range

Locks_Stabu - multiple names for each lock and each year

Locks_Birk_300km - number of and type of ships passing each site/lock for a 300m stretch. Currently an issue with lock name

Locks_Birk_150km - same as above, at 150km

Locks_Birk_10km - same as above, at 10km

Occasion - biodiveristy data that lists each collection effort, what the
site code is, and what lock is associated with that site

Site - collection sites - has data on each siteID, collection ID, and lockID

Need to link them all - Site says where spatially the collection is. Then can link the collection to a lock. Then we can plot out shipping intensity

# Lets start with the shipping data for each lock provided
lockswsv <- read.csv("../NAVIDIV/Data/Wolters/Locks_WSV.csv")
str(lockswsv)

## 'data.frame':    814 obs. of  16 variables:
##  $ lock_WSV       : chr  "Abwinden" "Abwinden" "Abwinden" "Abwinden" ...
##  $ river          : chr  "Danube" "Danube" "Danube" "Danube" ...
##  $ year           : int  1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 ...
##  $ FRA            : int  14523 14523 14523 14523 14523 14523 14523 14523 14523 14523 ...
##  $ Lad_t          : int  9986502 9986502 9986502 9986502 9986502 9986502 10979904 11633673 12316450 10737354 ...
##  $ Tragf_t        : int  16804264 16804264 16804264 16804264 16804264 16804264 16804264 16804264 16804264 16804264 ...
##  $ FTK            : int  4392 4392 4392 4392 4392 4392 4392 4392 4392 4392 ...
##  $ SpB            : int  786 786 786 786 786 786 786 786 786 786 ...
##  $ Son            : int  221 221 221 221 221 221 221 221 221 221 ...
##  $ SpB_Son        : int  880 880 880 880 880 880 880 880 880 880 ...
##  $ FTK_SpB_Son    : int  5273 5273 5273 5273 5273 5273 5273 5273 5273 5273 ...
##  $ FRA_FTK_SpB_Son: int  19796 19796 19796 19796 19796 19796 19796 19796 19796 19796 ...
##  $ lock_ID_WSV    : chr  "lo_01" "lo_01" "lo_01" "lo_01" ...
##  $ WSV_ID         : chr  "lo_01_1994" "lo_01_1995" "lo_01_1996" "lo_01_1997" ...
##  $ Efficiency     : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ n_empty        : int  NA NA NA NA NA NA NA NA NA NA ...

# How many locks do we have data on?
length(unique(lockswsv$lock_WSV))

## [1] 37

# Each lock has data from 90s to today

# Lets take a look at how shipping traffic changed across the years
ggplot(lockswsv, aes(year, FRA_FTK_SpB_Son)) +
  geom_point()

Remarkably stable over time

# Lets color code it by river
ggplot(lockswsv, aes(year, FRA_FTK_SpB_Son, group = lock_WSV)) +
  geom_point() + 
  geom_point(aes(colour = river)) +
  geom_line()

Again we can see it is stable across time even within rivers.
Only difference really is spatially.

# Plotting the same thing with just lines for each lock
ggplot(lockswsv, aes(year, FRA_FTK_SpB_Son, color = lock_WSV)) +
  geom_line(size = 1.5) + 
  theme(legend.position = "none")

Biodivesity Data

Lets plot out the biodiversity collection sites spatially

# This is the sites package trimmed down to a geopackage
woltersites <- st_read("../NAVIDIV/GIS/Creation/outputs/wolter_pts.gpkg")

## Reading layer `wolter_pts' from data source 
##   `C:\Users\Aaron Sexton\Desktop\NAVIDIV\GIS\Creation\outputs\wolter_pts.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 826 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 3936968 ymin: 2396709 xmax: 5847602 ymax: 3483496
## Projected CRS: ETRS89-extended / LAEA Europe

# Simple ggplot
sitesgg <- ggplot() + geom_sf(data = woltersites) + aes()
sitesgg

Now lets make an interactive map

# interactive map
mapview(woltersites)

# Fish Collection Occasions
occasion <- read.csv("../NAVIDIV/Data/Wolters/Occassion.csv")
str(occasion)

## 'data.frame':    2737 obs. of  68 variables:
##  $ River_name_copy           : chr  "Elbe" "Elbe" "Elbe" "Elbe" ...
##  $ Site_name_copy            : chr  "Abstiegskanal" "Abstiegskanal" "Abstiegskanal" "Abstiegskanal" ...
##  $ Wetted_width              : int  60 60 60 60 280 280 280 280 280 280 ...
##  $ Site_code_LR              : chr  "DELR0399N" "DELR0399N" "DELR0399N" "DELR0399N" ...
##  $ Occ_laufnr                : int  3608 3609 3610 3611 3849 3846 3847 3848 3835 3840 ...
##  $ Dist_ocean_copy           : num  383 383 383 383 237 ...
##  $ Sampling_location         : chr  "Mixed" "Mixed" "Mixed" "Mixed" ...
##  $ Date                      : chr  "01/09/2000" "11/10/2001" "25/10/2002" "17/10/2003" ...
##  $ Yearocc                   : int  2000 2001 2002 2003 1999 1997 1997 1998 1997 1998 ...
##  $ Monthocc                  : int  9 10 10 10 6 6 9 6 9 8 ...
##  $ Method                    : chr  "Electro" "Electro" "Electro" "Electro" ...
##  $ Fishing_time              : chr  "Day" "Day" "Day" "Day" ...
##  $ species_caught            : int  11 11 10 10 8 9 11 7 15 15 ...
##  $ specimen_caught           : int  427 266 159 563 100 310 5424 54 1445 1132 ...
##  $ n_Ind_100m2               : chr  "142.33" "8.87" "3.18" "11.26" ...
##  $ n_Ind_ha                  : chr  "14,233.33" "886.67" "318.00" "1,126.00" ...
##  $ Fished_area               : chr  "300.00" "3,000.00" "5,000.00" "5,000.00" ...
##  $ Fished_length             : int  100 1000 1000 1000 -999 -999 -999 -999 980 1150 ...
##  $ Fished_width              : num  3 3 5 5 -999 -999 -999 -999 3 3 ...
##  $ Average_depth             : num  -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
##  $ Equipment                 : chr  "Boat" "Boat" "Boat" "Boat" ...
##  $ Sampling_strategy         : chr  "Partial" "Partial" "Partial" "Partial" ...
##  $ Runs_separated            : chr  "Separated" "Separated" "Separated" "Separated" ...
##  $ Number_of_runs            : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Dominating_substrate      : chr  NA NA NA NA ...
##  $ Stop_nets_used            : chr  NA NA NA NA ...
##  $ No_of_anodes              : chr  NA NA NA NA ...
##  $ Type_of_anode             : chr  "Ring" "Ring" "Stripe" "Stripe" ...
##  $ Size_of_anode             : num  -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
##  $ Type_of_current           : chr  "DC" "DC" "DC" "DC" ...
##  $ Voltage_used              : num  300 300 300 300 300 300 300 300 300 300 ...
##  $ Country                   : chr  "Germany" "Germany" "Germany" "Germany" ...
##  $ Equivalent                : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Reporter_code             : chr  "DEIGB" "DEIGB" "DEIGB" "DEIGB" ...
##  $ Fishdata_owner_code       : chr  "DE002" "DE002" "DE002" "DE002" ...
##  $ lock_ID                   : chr  "lo_35" "lo_35" "lo_35" "lo_35" ...
##  $ WSV_ID                    : chr  "lo_35_2000" "lo_35_2001" "lo_35_2002" "lo_35_2003" ...
##  $ Stabu_ID                  : chr  "lo_99_2000" "lo_99_2001" "lo_99_2002" "lo_99_2003" ...
##  $ Barriers_catchment_down   : chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ Barriers_segment_up       : chr  "No" "No" "No" "No" ...
##  $ Barriers_segment_down     : chr  "No" "No" "No" "No" ...
##  $ Barriers_no_segment_up    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Barriers_no_segment_down  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Barriers_dist_segment_up  : num  -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
##  $ Barriers_dist_segment_down: num  -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
##  $ Impoundment               : chr  "No" "No" "No" "No" ...
##  $ Hydropeaking              : chr  "No" "No" "No" "No" ...
##  $ Water_abstraction         : chr  "Weak" "Weak" "Weak" "Weak" ...
##  $ Water_use                 : chr  "No" "No" "No" "No" ...
##  $ Hydro_mod                 : chr  "No" "No" "No" "No" ...
##  $ Temperature_impact        : chr  "No" "No" "No" "No" ...
##  $ Velocity_increase         : chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ Reservoir_flushing        : chr  "No" "No" "No" "No" ...
##  $ Sedimentation             : chr  "No" "No" "No" "No" ...
##  $ Channelisation            : chr  "Straightened" "Straightened" "Straightened" "Straightened" ...
##  $ Cross_section             : chr  "Technical crossec/U-profile" "Technical crossec/U-profile" "Technical crossec/U-profile" "Technical crossec/U-profile" ...
##  $ Instream_habitat          : chr  "High" "High" "High" "High" ...
##  $ Riparian_vegetation       : chr  "High" "High" "High" "High" ...
##  $ Embankment                : chr  "Continous no permeability" "Continous no permeability" "Continous no permeability" "Continous no permeability" ...
##  $ Floodprotection           : chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ Floodplain                : chr  "No" "No" "No" "No" ...
##  $ Toxic_substances          : chr  "Intermediate" "Intermediate" "Intermediate" "Intermediate" ...
##  $ Acidification             : chr  "No" "No" "No" "No" ...
##  $ Water_quality_index       : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ Eutrophication            : chr  "Low" "Low" "Low" "Low" ...
##  $ Organic_pollution         : chr  "Weak" "Weak" "Weak" "Weak" ...
##  $ Organic_siltation         : chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ Navigation                : chr  "Strong" "Strong" "Strong" "Strong" ...

# How many locks here?
length(unique(occasion$lock_ID))

## [1] 37

#  So the column lock_ID inside the df occasions matches the column
#  lock_WSV column in the df Locks_WSV (which has traffic data)  

# Lets trim Occasions for ease of plotting 
occasion2 <- dplyr::select(occasion, River_name_copy, Site_code_LR, Occ_laufnr,
                           Yearocc, lock_ID, WSV_ID, Stabu_ID)
str(occasion2)

## 'data.frame':    2737 obs. of  7 variables:
##  $ River_name_copy: chr  "Elbe" "Elbe" "Elbe" "Elbe" ...
##  $ Site_code_LR   : chr  "DELR0399N" "DELR0399N" "DELR0399N" "DELR0399N" ...
##  $ Occ_laufnr     : int  3608 3609 3610 3611 3849 3846 3847 3848 3835 3840 ...
##  $ Yearocc        : int  2000 2001 2002 2003 1999 1997 1997 1998 1997 1998 ...
##  $ lock_ID        : chr  "lo_35" "lo_35" "lo_35" "lo_35" ...
##  $ WSV_ID         : chr  "lo_35_2000" "lo_35_2001" "lo_35_2002" "lo_35_2003" ...
##  $ Stabu_ID       : chr  "lo_99_2000" "lo_99_2001" "lo_99_2002" "lo_99_2003" ...

length(unique(occasion2$Site_code_LR))

## [1] 842

length(unique(occasion2$WSV_ID)) # Value of locks x year (multiple years per lock)

## [1] 189

# Study sites (same as woltersites above, but a csv here)
studysites <- read.csv("../NAVIDIV/Data/Wolters/Sites_trimmed.csv")

# Combine studysites name, lat and long AND occasion2 locks
linking <- left_join(occasion2, studysites, c("Site_code_LR" = "Site_code_LR"))
str(linking)

## 'data.frame':    2737 obs. of  12 variables:
##  $ River_name_copy: chr  "Elbe" "Elbe" "Elbe" "Elbe" ...
##  $ Site_code_LR   : chr  "DELR0399N" "DELR0399N" "DELR0399N" "DELR0399N" ...
##  $ Occ_laufnr     : int  3608 3609 3610 3611 3849 3846 3847 3848 3835 3840 ...
##  $ Yearocc        : int  2000 2001 2002 2003 1999 1997 1997 1998 1997 1998 ...
##  $ lock_ID        : chr  "lo_35" "lo_35" "lo_35" "lo_35" ...
##  $ WSV_ID         : chr  "lo_35_2000" "lo_35_2001" "lo_35_2002" "lo_35_2003" ...
##  $ Stabu_ID       : chr  "lo_99_2000" "lo_99_2001" "lo_99_2002" "lo_99_2003" ...
##  $ River_name     : chr  "Elbe" "Elbe" "Elbe" "Elbe" ...
##  $ Site_name      : chr  "Abstiegskanal" "Abstiegskanal" "Abstiegskanal" "Abstiegskanal" ...
##  $ Y_Latitude     : num  52.2 52.2 52.2 52.2 53 ...
##  $ X_Longitude    : num  11.7 11.7 11.7 11.7 11.6 ...
##  $ Strahler       : int  7 7 7 7 7 7 7 7 7 7 ...

# Now lets add the column "FRA_FTK_SpB_Son" from Locks_WSV which gives 
# cumulative shipping per year per lock
sumshipdf <- dplyr::select(lockswsv, FRA_FTK_SpB_Son, lock_ID_WSV, WSV_ID)
str(sumshipdf)

## 'data.frame':    814 obs. of  3 variables:
##  $ FRA_FTK_SpB_Son: int  19796 19796 19796 19796 19796 19796 19796 19796 19796 19796 ...
##  $ lock_ID_WSV    : chr  "lo_01" "lo_01" "lo_01" "lo_01" ...
##  $ WSV_ID         : chr  "lo_01_1994" "lo_01_1995" "lo_01_1996" "lo_01_1997" ...

linking <- left_join(linking, sumshipdf, c("lock_ID" = "lock_ID_WSV", 
                                            "WSV_ID" = "WSV_ID"))
str(linking)

## 'data.frame':    2737 obs. of  13 variables:
##  $ River_name_copy: chr  "Elbe" "Elbe" "Elbe" "Elbe" ...
##  $ Site_code_LR   : chr  "DELR0399N" "DELR0399N" "DELR0399N" "DELR0399N" ...
##  $ Occ_laufnr     : int  3608 3609 3610 3611 3849 3846 3847 3848 3835 3840 ...
##  $ Yearocc        : int  2000 2001 2002 2003 1999 1997 1997 1998 1997 1998 ...
##  $ lock_ID        : chr  "lo_35" "lo_35" "lo_35" "lo_35" ...
##  $ WSV_ID         : chr  "lo_35_2000" "lo_35_2001" "lo_35_2002" "lo_35_2003" ...
##  $ Stabu_ID       : chr  "lo_99_2000" "lo_99_2001" "lo_99_2002" "lo_99_2003" ...
##  $ River_name     : chr  "Elbe" "Elbe" "Elbe" "Elbe" ...
##  $ Site_name      : chr  "Abstiegskanal" "Abstiegskanal" "Abstiegskanal" "Abstiegskanal" ...
##  $ Y_Latitude     : num  52.2 52.2 52.2 52.2 53 ...
##  $ X_Longitude    : num  11.7 11.7 11.7 11.7 11.6 ...
##  $ Strahler       : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ FRA_FTK_SpB_Son: int  17645 15078 14514 15436 37873 29557 29557 36800 29557 36800 ...

# Now lets save it as a spatial feature so that we can plot it
linking_sf <- st_as_sf(linking, coords = c("X_Longitude", "Y_Latitude"), crs = 4326)

# Lets change the name of that column to something sensible
linking_sf <- rename(linking_sf, Cumulative_Annual_Ships = FRA_FTK_SpB_Son)

# Lets plot it out
cumship_locks <- mapview(linking_sf, zcol = "Cumulative_Annual_Ships", legend = TRUE)
cumship_locks

Voila!

Whats nice about this is that it links the biodiveristy data with the lock shipping data from that year of collection.

# Can play around with colors if we want
mapviewOptions(vector.palette = c("plum1", "firebrick4", "black"))
mapview(linking_sf, zcol = "Cumulative_Annual_Ships", legend = TRUE)

For example, when we zoom in on two biodiversity collection sites and look at their metadata, we see that one uses data from a 2007 lock and the other from 1999.

However, there is currently an issue with missing data Locks lo_99, lo_88, and lo_30 in the “lock_ID_WSV” column have 0 values for all years. This comes from the Locks_WSV csv file.

lockswsv <- read.csv("../NAVIDIV/Data/Wolters/Locks_WSV.csv")

missinglocks <- dplyr::filter(lockswsv, lock_ID_WSV == c("lo_99", "lo_88",
                                                         "lo_30"))

## Warning in lock_ID_WSV == c("lo_99", "lo_88", "lo_30"): longer object length is
## not a multiple of shorter object length

These locks account for roughly ~30% of all the observations…

length(occasion$lock_ID)

## [1] 2737

length(occasion$lock_ID[occasion$lock_ID == 'lo_99'])

## [1] 646

length(occasion$lock_ID[occasion$lock_ID == 'lo_88'])

## [1] 105

length(occasion$lock_ID[occasion$lock_ID == 'lo_30'])

## [1] 38

(645 + 105 + 38) / 2737

## [1] 0.2879065

If you zoom in on the Danube, you can see a good amount of Zero values.

Currently working with CW to figure this out. Will update.

RICOMEX

Now working on plotting out the RICOMEX data. Currently only have March, but

once I receive April and the first half of May, I’ll plot them together.

For now, we can plot out March just to see spatial cover.

# For the sake of visualization, I'll just plot out one hour of one day.

marchpos1 <- st_read("../NAVIDIV/GIS/Shipping/March2022_Positions/28_03_22/28_03_22_16h.shp")

## Reading layer `28_03_22_16h' from data source 
##   `C:\Users\Aaron Sexton\Desktop\NAVIDIV\GIS\Shipping\March2022_Positions\28_03_22\28_03_22_16h.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 6660 features and 17 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -2.898483 ymin: 42.26117 xmax: 8.499942 ymax: 51.19984
## Geodetic CRS:  WGS 84

mapview(marchpos1)

Now lets plot them together.

mapview(marchpos1, color = "black", col.regions = "grey", lwd = 2, layer.name = "RICOMEX") +
  mapview(linking_sf, color = "black", col.regions = "pink", lwd = 2, layer.name = "CW")

We’ve got some good overlap along the upper-Rhine it looks like. Since they’re staked on top of eachother, if you want to turn on or off one of the layers, you can just hover over this button: on the map and then click the one you want off or change the basemap.

So, yes we have some overlap, but really I would like more if possible. Once we have the April and May data I’ll check that overlap as well. Then will keep everyone updated.