The HydroBASINS data were first filtered to remove any basins greater than 84,944 km2, which is the maximum size of a HUC8 in the Watershed Boundary Dataset. We could also use 3,504 km2, which is the median size, but I chose to be more inclusive.
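A minimal sketch of this size filter, on made-up data. The column names `HYBAS_ID` and `SUB_AREA` follow the HydroBASINS attribute table, but which area field the original analysis used is an assumption, and the values are purely illustrative.

```r
# Hypothetical attribute table; areas are in km2 and invented for illustration.
hydrobasins <- data.frame(
  HYBAS_ID = c(1, 2, 3, 4),
  SUB_AREA = c(120.5, 3504, 84944, 90210.7)
)

# Largest HUC8 area quoted in the text above.
max_huc8_km2 <- 84944

# Keep basins no larger than the cap; anything bigger is dropped.
hydrobasins_filtered <- hydrobasins[hydrobasins$SUB_AREA <= max_huc8_km2, ]
```

Swapping `max_huc8_km2` for the median HUC8 size (3,504 km2) would give the stricter alternative mentioned above.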
Next, remove basins whose size differs between the two datasets by more than a given threshold, expressed as a percentage of the basin size in the GAGESII dataset.
With both the 10% and 50% cutoffs, we get rid of pretty much all of our smallest basins (< ~65 km2). With the 10% cutoff, we also get rid of almost all (90%) of our basins < ~155 km2.
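The percent-difference filter can be sketched roughly as follows. The data frame, its column names, and the helper `keep_within()` are hypothetical, and using the absolute difference is an assumption about how the comparison was done.

```r
# Made-up paired areas (km2) for the same gages from each source.
basin_areas <- data.frame(
  gage_id        = c("A", "B", "C", "D"),
  gagesII_km2    = c(50, 150, 1000, 5000),
  hydrobasin_km2 = c(120, 180, 1040, 5100)
)

# Absolute size difference as a percentage of the GAGESII basin size.
basin_areas$pct_diff <- 100 *
  abs(basin_areas$hydrobasin_km2 - basin_areas$gagesII_km2) /
  basin_areas$gagesII_km2

# Hypothetical helper: keep basins whose difference is at or below the cutoff.
keep_within <- function(df, cutoff_pct) {
  df[df$pct_diff <= cutoff_pct, ]
}

kept_10 <- keep_within(basin_areas, 10)  # stricter cutoff
kept_50 <- keep_within(basin_areas, 50)  # looser cutoff
```

In this toy example the smallest basin fails both cutoffs, which mirrors the pattern noted above: a fixed area mismatch is a much larger fraction of a small basin.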
Highlight some gages with a size difference above 10% to see what is causing the discrepancy. The map below shows only a few basins to keep the size of the document down, but these are the broad categories that I saw.
selected_hydrobasin_basins <- readRDS('basin_compare_data/selected_hydrobasin_basins.RDS')
selected_gagesII_basins <- readRDS('basin_compare_data/selected_gagesII_basins.RDS')
selected_gagesII_sites <- readRDS('basin_compare_data/selected_gagesII_sites.RDS')
highlight_gage <- '10249280' #gage inside hydrobasin basins
highlight_gage <- '11239300'
highlight_gage <- '15478040'
highlight_gage <- '01617800' #huge discrepancy, hydrobasins really big
highlight_gage <- '1237200'
highlight_gage <- '05481300' #hydrobasins much smaller than gagesII
highlight_gage <- '12323770' #non-overlapping regions
highlight_gage <- '05551200'
#in hydrobasins but not in gagesII?
highlight_gage <- '210166029'
highlight_gage <- '208732885'
highlight_gages <- c('10249280', '11239300', '01617800', '05481300', '12323770', '210166029', '208732885', '15478040', '1237200', '05551200')
highlight_gages_pad <- stringr::str_pad(highlight_gages, 8, pad = '0', side = 'left')
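To make the effect of the padding step concrete, here is a base-R sketch that should behave the same way as the `str_pad()` call for these inputs: IDs shorter than 8 characters get leading zeros, while 8- and 9-character IDs pass through unchanged. The helper name `pad_gage_id()` is hypothetical.

```r
# Hypothetical helper: left-pad IDs with zeros to a minimum width,
# leaving longer IDs untouched (no truncation).
pad_gage_id <- function(ids, width = 8) {
  n_missing <- pmax(0, width - nchar(ids))
  paste0(strrep("0", n_missing), ids)
}

# Three of the gage IDs from the list above: 7, 8, and 9 characters long.
example_ids <- c("1237200", "10249280", "210166029")
padded_ids <- pad_gage_id(example_ids)
# padded_ids is c("01237200", "10249280", "210166029")
```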
If we assume we’ll use a size threshold of 10%, let’s look at the worst 100 basins that still meet that threshold. These basins all have a 7-10% difference in size.
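One way to pull out the worst basins that still pass the cutoff, sketched on made-up data. The data frame and its columns are hypothetical, and `n_worst` is set to 3 here only to keep the example small; the text above uses 100.

```r
# Made-up percent differences for ten gages.
basin_diffs <- data.frame(
  gage_id  = sprintf("%08d", 1:10),
  pct_diff = c(0.5, 12, 9.8, 7.1, 3.3, 25, 8.4, 9.9, 1.2, 6.0)
)

cutoff_pct <- 10
n_worst <- 3

# Keep only basins that meet the threshold...
passing <- basin_diffs[basin_diffs$pct_diff <= cutoff_pct, ]

# ...then sort by difference, largest first, and take the top n_worst.
worst <- head(passing[order(passing$pct_diff, decreasing = TRUE), ], n_worst)
```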