Basin size comparison

Comparison of basin sizes delineated by HydroBASINS and GAGESII

The HydroBASINS data were first filtered to remove any basins greater than 84,944 km², which is the maximum size of a HUC8 in the Watershed Boundary Dataset. We could also use 3,504 km², which is the median size, but I chose to be more inclusive.
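
A minimal sketch of this size filter, using a toy table in place of the real HydroBASINS attributes. The column names `hybas_id` and `area_km2` and all of the values are assumptions for illustration, not necessarily what the actual data use.

```r
library(dplyr)

# Maximum HUC8 size (km2) quoted above, used as the upper size limit
max_huc8_km2 <- 84944

# Toy stand-in for the HydroBASINS attribute table (hypothetical columns and values)
hydrobasins_toy <- tibble(
  hybas_id = c("a", "b", "c", "d"),
  area_km2 = c(120, 3504, 84944, 250000)
)

# Remove any basin greater than the limit (basins exactly at the limit are kept)
hydrobasins_filtered <- hydrobasins_toy %>%
  filter(area_km2 <= max_huc8_km2)

print(hydrobasins_filtered)
```

Swapping `max_huc8_km2` for the median HUC8 size would give the stricter version of the filter.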

Comparison of basin sizes

Removing basins where the size difference exceeds a given threshold, expressed as a percentage of the basin size from the GAGESII dataset
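
One way this comparison could be computed is sketched below. It assumes the difference is taken as an absolute value, and the column names (`gagesII_km2`, `hydrobasins_km2`, `pct_diff`) and the toy areas are made up for illustration.

```r
library(dplyr)

# Toy paired areas (km2) for the same gage from each dataset (hypothetical values)
basin_sizes_toy <- tibble(
  gage_id         = c("g1", "g2", "g3", "g4"),
  gagesII_km2     = c(100, 200, 50, 1000),
  hydrobasins_km2 = c(105, 290, 80, 1080)
)

# Absolute size difference as a percentage of the GAGESII basin size
basin_sizes_toy <- basin_sizes_toy %>%
  mutate(pct_diff = 100 * abs(hydrobasins_km2 - gagesII_km2) / gagesII_km2)

# Basins kept under each cutoff
keep_10 <- basin_sizes_toy %>% filter(pct_diff <= 10)
keep_50 <- basin_sizes_toy %>% filter(pct_diff <= 50)

print(keep_10)
print(keep_50)
```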

Threshold of 10% size cutoff

Threshold of 50% size cutoff

With both the 10% and 50% cutoffs, we get rid of pretty much all of our smallest basins (< ~65 km²). With the 10% cutoff we also get rid of almost all (90%) of our basins < ~155 km².
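
A sketch of how the retained fraction per size class could be tabulated. The toy values are made up and do not reproduce the percentages above; only the ~65 and ~155 km2 break points come from the text, and the column names are hypothetical.

```r
library(dplyr)

# Toy basins: GAGESII area (km2) and percent size difference (hypothetical values)
basins_toy <- tibble(
  gagesII_km2 = c(20, 40, 60, 100, 150, 400, 900, 2000),
  pct_diff    = c(80, 55, 30, 12, 9, 4, 45, 2)
)

retained_by_size <- basins_toy %>%
  # Bin basins by area; intervals are right-closed, so the labels are approximate
  mutate(size_class = cut(gagesII_km2,
                          breaks = c(0, 65, 155, Inf),
                          labels = c("< 65", "65-155", "> 155"))) %>%
  group_by(size_class) %>%
  summarize(
    n_basins     = n(),
    frac_keep_10 = mean(pct_diff <= 10),  # share of basins kept at the 10% cutoff
    frac_keep_50 = mean(pct_diff <= 50)   # share of basins kept at the 50% cutoff
  )

print(retained_by_size)
```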

Basins retained by area at different cutoff thresholds

Mapping basins

Highlight some gages with a difference in size above 10% to see what is causing the discrepancy. The map below only shows a few basins to keep the size of the document down, but these are the broad categories that I saw.

selected_hydrobasin_basins <- readRDS('basin_compare_data/selected_hydrobasin_basins.RDS')
selected_gagesII_basins <- readRDS('basin_compare_data/selected_gagesII_basins.RDS')
selected_gagesII_sites <- readRDS('basin_compare_data/selected_gagesII_sites.RDS')

# Gages inspected one at a time; each assignment overwrites the previous one
highlight_gage <- '10249280' # gage inside hydrobasin basins
highlight_gage <- '11239300'
highlight_gage <- '15478040'
highlight_gage <- '01617800' # huge discrepancy: hydrobasins really big
highlight_gage <- '1237200'

highlight_gage <- '05481300' # hydrobasins much smaller than gagesII
highlight_gage <- '12323770' # non-overlapping regions
highlight_gage <- '05551200'

#in hydrobasins but not in gagesII?
highlight_gage <- '210166029' 
highlight_gage <- '208732885'


highlight_gages <- c('10249280', '11239300', '01617800', '05481300', '12323770', '210166029', '208732885', '15478040', '1237200', '05551200')
highlight_gages_pad <- stringr::str_pad(highlight_gages, 8, pad = '0', side = 'left')
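
One thing to keep in mind with the padding step: `str_pad()` only pads strings that are shorter than the target width. A quick check on two of the IDs from the list above shows that the 7-character ID gains a leading zero while the 9-character ID passes through unchanged.

```r
library(stringr)

# A 7-character and a 9-character ID taken from the list above
ids <- c('1237200', '210166029')

# Pad on the left with zeros to a minimum width of 8
ids_pad <- str_pad(ids, 8, pad = '0', side = 'left')

print(ids_pad)
print(nchar(ids_pad))
```

Whether the 9-character IDs need a different width is not something this check can answer.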

If we assume we’ll use a size threshold of 10%, let’s look at the worst 100 basins that still meet that threshold. These basins all have a 7-10% difference in size.
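
A sketch of how those 100 basins could be pulled out, assuming a `pct_diff` column that holds the percent size difference. The table below is randomly generated toy data with hypothetical column names, not the real basins.

```r
library(dplyr)

set.seed(1)  # toy data only, so the example is reproducible

# Toy table of percent size differences (hypothetical values)
basins_toy <- tibble(
  gage_id  = sprintf("g%04d", 1:2000),
  pct_diff = runif(2000, min = 0, max = 60)
)

size_cutoff <- 10

# Keep basins within the cutoff, then take the 100 largest differences among them
worst_retained <- basins_toy %>%
  filter(pct_diff <= size_cutoff) %>%
  slice_max(pct_diff, n = 100, with_ties = FALSE)

print(range(worst_retained$pct_diff))
```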

Data question/clarification

Within the combined_watershed_metadata_01JUL2025.csv file there are many-to-one matches between gage_id_q and Downstream_HB_ID_q: several gages can share the same HydroBASINS ID (_q was added by me to keep track of where the columns were coming from).

library(readr)
library(dplyr)

sf_meta <- read_csv('../data/google drive/combined_watershed_metadata_01JUL2025.csv')

q_meta <- sf_meta %>%
  #select the basins where the data have been pulled
  filter(processing_status == 'success') %>%
  #use _q to keep track of the columns that come from this dataset
  rename_with(~ paste0(.x, "_q")) %>%
  #change to character for merge
  mutate(Downstream_HB_ID_q = as.character(Downstream_HB_ID_q)) %>%
  filter(basin_area_q < 84944)

many_to_one <- q_meta %>%
  group_by(Downstream_HB_ID_q) %>%
  summarize(n_gages_per_basin = n()) %>%
  filter(n_gages_per_basin != 1) %>%
  arrange(desc(n_gages_per_basin))

print(many_to_one)

## # A tibble: 228 × 2
##    Downstream_HB_ID_q n_gages_per_basin
##    <chr>                          <int>
##  1 7120064850                         5
##  2 7120344410                         5
##  3 7121010700                         5
##  4 7121074570                         5
##  5 7120400820                         4
##  6 7120500490                         4
##  7 7120543030                         4
##  8 7120570480                         4
##  9 7120607270                         4
## 10 7120657880                         4
## # ℹ 218 more rows
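
To see which gages share a basin, one option is to keep only the rows whose HydroBASINS ID appears more than once and list the gage IDs side by side. This is a sketch on a toy table that reuses the column names from above; the IDs are made up.

```r
library(dplyr)

# Toy stand-in for q_meta (hypothetical IDs)
q_meta_toy <- tibble(
  gage_id_q          = c("01000001", "01000002", "01000003", "01000004", "01000005"),
  Downstream_HB_ID_q = c("7000000001", "7000000001", "7000000002", "7000000003", "7000000003")
)

shared_basins <- q_meta_toy %>%
  group_by(Downstream_HB_ID_q) %>%
  # Keep only basins matched by more than one gage
  filter(n() > 1) %>%
  summarize(
    n_gages_per_basin = n(),
    gage_ids          = paste(gage_id_q, collapse = ", ")
  )

print(shared_basins)
```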