Comparison of basin sizes delineated by HydroBASINS and GAGESII

The HydroBASINS data were first filtered to remove any basins greater than 84,944 km2, which is the maximum size of a HUC8 in the Watershed Boundary Dataset. We could instead use 3,504 km2, which is the median size, but I chose to be more inclusive.
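As a minimal sketch of this size filter, the base R snippet below applies both candidate limits to a toy table. The data frame and its column names (`basin_id`, `area_km2`) are hypothetical and are not taken from the actual HydroBASINS files.

```r
# Toy data; `basin_id` and `area_km2` are hypothetical column names
basins <- data.frame(
  basin_id = c("a", "b", "c", "d"),
  area_km2 = c(120, 3504, 84944, 250000)
)

max_huc8_km2 <- 84944    # maximum HUC8 size quoted in the text
median_huc8_km2 <- 3504  # stricter alternative mentioned in the text

# Remove basins greater than the limit (a basin exactly at the limit is kept)
keep_inclusive <- basins[basins$area_km2 <= max_huc8_km2, ]
keep_strict <- basins[basins$area_km2 <= median_huc8_km2, ]

nrow(keep_inclusive)
nrow(keep_strict)
```

The only difference between the two options is the limit value, so switching to the stricter median-based filter would be a one-line change.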

Comparison of basin sizes

Removing basins where the size difference exceeds a given threshold, expressed as a percentage of the basin size in the GAGESII dataset
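One way to express this cutoff is sketched below in base R. It assumes the difference is taken as an absolute value, so that HydroBASINS basins that are too large and too small are treated alike; the toy data and the column names `gagesII_km2` and `hydrobasins_km2` are hypothetical.

```r
# Toy data; column names are hypothetical
areas <- data.frame(
  gage_id = c("g1", "g2", "g3", "g4"),
  gagesII_km2 = c(100, 100, 200, 50),
  hydrobasins_km2 = c(105, 140, 90, 50)
)

# Size difference as a percentage of the GAGESII area
# (assumption: absolute difference, so over- and under-estimates count equally)
areas$pct_diff <- abs(areas$hydrobasins_km2 - areas$gagesII_km2) /
  areas$gagesII_km2 * 100

# Keep basins whose difference is within each cutoff
keep_10 <- areas[areas$pct_diff <= 10, ]
keep_50 <- areas[areas$pct_diff <= 50, ]
```

Because the denominator is the GAGESII area, the same absolute difference in km2 counts for more in a small basin than in a large one.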

Threshold of 10% size cutoff

Threshold of 50% size cutoff

With both the 10% and 50% cutoffs, we lose nearly all of our smallest basins (< ~65 km2). With the 10% cutoff, we also lose almost all (90%) of our basins < ~155 km2.

Basins retained by area at different cutoff thresholds
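A rough sketch of how retention by size class could be tabulated is shown below. The data are invented for illustration, and the class breaks are hypothetical, chosen only to echo the ~65 and ~155 km2 breakpoints mentioned above.

```r
# Toy data; values and column names are hypothetical
areas <- data.frame(
  gagesII_km2 = c(10, 40, 60, 100, 150, 400, 900, 2000),
  pct_diff    = c(80, 60, 55, 30, 8, 12, 3, 1)
)

# Hypothetical size classes based on the GAGESII area
areas$size_class <- cut(
  areas$gagesII_km2,
  breaks = c(0, 65, 155, Inf),
  labels = c("<65", "65-155", ">155")
)

# Fraction of basins retained in each size class at each cutoff
retained <- sapply(c(cutoff_10 = 10, cutoff_50 = 50), function(cutoff) {
  tapply(areas$pct_diff <= cutoff, areas$size_class, mean)
})
retained
```

Each column of `retained` corresponds to one cutoff and each row to one size class, which makes it easy to see where a stricter threshold removes disproportionately many small basins.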

Mapping basins

Highlight some gages with a size difference above 10% to see what is causing the discrepancy. The map below shows only a few basins to keep the size of the document down, but these are the broad categories that I saw.

selected_hydrobasin_basins <- readRDS('basin_compare_data/selected_hydrobasin_basins.RDS')
selected_gagesII_basins <- readRDS('basin_compare_data/selected_gagesII_basins.RDS')
selected_gagesII_sites <- readRDS('basin_compare_data/selected_gagesII_sites.RDS')

highlight_gage <- '10249280' #gage inside hydrobasin basins
highlight_gage <- '11239300'
highlight_gage <- '15478040'
highlight_gage <- '01617800' #huge discrepancy: hydrobasins basin is really big
highlight_gage <- '1237200'


highlight_gage <- '05481300' #hydrobasins much smaller than gagesII
highlight_gage <- '12323770' #non-overlapping regions
highlight_gage <- '05551200'

#in hydrobasins but not in gagesII?
highlight_gage <- '210166029' 
highlight_gage <- '208732885'


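#all of the example gages listed above
highlight_gages <- c('10249280', '11239300', '01617800', '05481300', '12323770', '210166029', '208732885', '15478040', '1237200', '05551200')
#left-pad IDs shorter than 8 characters with zeros (e.g. '1237200' becomes '01237200'); longer IDs are unchanged
highlight_gages_pad <- stringr::str_pad(highlight_gages, 8, pad = '0', side = 'left')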

If we assume we’ll use a size threshold of 10%, let’s look at the worst 100 basins that still meet that threshold. These basins all have a 7-10% difference in size.
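Selecting the worst basins that still pass a cutoff can be sketched as a filter followed by a descending sort. The snippet below uses randomly generated toy values and hypothetical column names (`gage_id`, `pct_diff`); it does not reproduce the actual data.

```r
set.seed(1)  # toy data only, so the sketch is reproducible
areas <- data.frame(
  gage_id = sprintf("%08d", 1:2000),
  pct_diff = runif(2000, min = 0, max = 60)
)

# Basins that meet the 10% threshold
within_cutoff <- areas[areas$pct_diff <= 10, ]

# Sort from largest to smallest difference and keep at most 100 rows
worst <- head(
  within_cutoff[order(within_cutoff$pct_diff, decreasing = TRUE), ],
  100
)

range(worst$pct_diff)
```

The range of `pct_diff` in the result gives the kind of summary quoted above for the real data.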

Data question/clarification

Within the combined_watershed_metadata_01JUL2025.csv file there are many-to-one matches between gage_id_q and Downstream_HB_ID_q: several gages can share the same Downstream_HB_ID_q value, as the counts below show. (I added the _q suffix to keep track of which dataset each column comes from.)

library(readr) #read_csv
library(dplyr) #filter, rename_with, mutate, group_by, summarize, arrange

sf_meta <- read_csv('../data/google drive/combined_watershed_metadata_01JUL2025.csv')

q_meta <- sf_meta %>%
  #select the basins where the data have been pulled
  filter(processing_status == 'success') %>%
  #use _q to keep track of the columns that come from this dataset
  rename_with(~ paste0(.x, "_q")) %>%
  #change to character for merge
  mutate(Downstream_HB_ID_q = as.character(Downstream_HB_ID_q)) %>%
  #apply the same 84,944 maximum HUC8 size cutoff as above
  filter(basin_area_q < 84944)

many_to_one <- q_meta %>%
  group_by(Downstream_HB_ID_q) %>%
  summarize(n_gages_per_basin = n()) %>%
  filter(n_gages_per_basin != 1) %>%
  arrange(desc(n_gages_per_basin))
print(many_to_one)
## # A tibble: 228 × 2
##    Downstream_HB_ID_q n_gages_per_basin
##    <chr>                          <int>
##  1 7120064850                         5
##  2 7120344410                         5
##  3 7121010700                         5
##  4 7121074570                         5
##  5 7120400820                         4
##  6 7120500490                         4
##  7 7120543030                         4
##  8 7120570480                         4
##  9 7120607270                         4
## 10 7120657880                         4
## # ℹ 218 more rows
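To see which gages are involved in these many-to-one matches, one option is to list the gage IDs grouped by basin ID. The sketch below uses base R on a toy table; the ID values are made up, and only the two column names come from the metadata file.

```r
# Toy stand-in for q_meta; the ID values are hypothetical
q_meta_toy <- data.frame(
  Downstream_HB_ID_q = c("hb1", "hb1", "hb2", "hb3", "hb3", "hb3"),
  gage_id_q = c("g1", "g2", "g3", "g4", "g5", "g6")
)

# Count gages per basin and keep only basins shared by more than one gage
n_per_basin <- table(q_meta_toy$Downstream_HB_ID_q)
shared_ids <- names(n_per_basin)[n_per_basin > 1]
shared <- q_meta_toy[q_meta_toy$Downstream_HB_ID_q %in% shared_ids, ]

# List the gages that map to each shared basin
gages_by_basin <- split(
  as.character(shared$gage_id_q),
  as.character(shared$Downstream_HB_ID_q)
)
gages_by_basin
```

Listing the gages this way could help decide whether the duplicates are nested gages on the same river or an artifact of how the downstream basin was assigned.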