CUAHSI Virtual University Seminal Papers in Flood Hydrology Problem Set
Question 1-Big Floods in Big Thompson!
Question 1a
One key difference between the 1976 and 2013 Big Thompson floods was the scale and timing of the precipitation. In the 1976 event, the rainfall was highly localized, concentrated over only a portion of the watershed and falling over a much shorter period of time; it was much more of an early-summer, localized flash flood. This could in part explain why the 1976 peak streamflow was not particularly high relative to other years: most of the precipitation fell downstream of Estes Park, where the gage is located. In contrast, the 2013 rainfall fell over a much larger area and over the course of roughly eight days in early fall. During the 2013 event, I was a senior in high school in Boulder, Colorado, and our area was also heavily impacted by the storm. In that more recent event, regional flooding was more the result of rainfall accumulated over time and space. I remember how strange it was to keep getting rain day after day in a way I had never experienced before. That experience made me genuinely interested in understanding floods and eventually led me to a PhD in hydrology focused on them.
Question 1b
It appears that the 1982 Big Thompson flood of 5500 cfs was caused by a dam failure, as indicated by the USGS peak streamflow qualification code specifying “Discharge affected to unknown degree by Regulation or Diversion.” A quick Google search shows that this was the failure of the Lawn Lake earthen dam in Rocky Mountain National Park (https://history.denverlibrary.org/news/lawn-lake-flood). Jarrett and Costa (1988) provide evidence for a paleoflood event from 8-10k years ago with flows between 3,000-5,000 cfs. One potential issue with including either of these events in flood frequency analysis (FFA) would be a violation of the assumption that events are independent and identically distributed (iid). The dam-failure flood clearly comes from a different process population, and we do not really know what caused the paleoflood, so including these floods could introduce a mixed population. Including these floods could also violate the stationarity assumption needed to conduct FFA. Including the 1982 flood might result in what looks like an increasing trend in flood magnitude, even though without this event the Big Thompson River at Estes Park annual peak streamflow data visually appear stationary. Including a paleoflood from thousands of years ago could violate assumptions about the stationarity of climate between then and more modern times, introducing potential bias into the FFA.
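As a quick check on these qualification codes, a dataRetrieval sketch along these lines could pull the annual peak record and its codes directly; the Estes Park site number used below is my assumption and should be verified against the USGS site inventory.
# Sketch: retrieve annual peaks and qualification codes with dataRetrieval
library(dataRetrieval)
library(dplyr)
peaks <- readNWISpeak(siteNumbers = "06733000") # assumed site number for Big Thompson at Estes Park, verify before use
# peak_cd carries the qualification codes (e.g., regulation/diversion or dam-failure flags)
peaks %>%
  dplyr::select(peak_dt, peak_va, peak_cd) %>%
  dplyr::filter(!is.na(peak_cd) & peak_cd != "")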
Question 1c It appears that Big Thompson annual peak floods tend to occur around June. Yu et al. (2021) add that these Front Range summertime flood events are commonly driven by warm, moist air from the south, brought in by the monsoon and interacting with topography to produce orographic precipitation, with runoff amplified by steep slopes and shallow soils. Growing up in Boulder, Colorado, I remember getting frequent summertime afternoon thunderstorms that would occasionally come with a flash flood warning from the National Weather Service (NWS).
Question 2-The Times, They are a-Changing (or are they?)
# read in packages needed
library(tidyverse)
library(dataRetrieval)
library(EflowStats)
# load daily flood data
load("Q_daily.rda")
# calculate water year, day of water year, and filter out partial wys
Q_daily <- Q_daily %>%
dplyr::mutate(date_numeric = as.numeric(date),
wy = calcWaterYear(date),
dowy = get_waterYearDay(date)) %>%
dplyr::filter(wy > 1946, wy < 2018)
#calculate Spearman rank correlation for discharge vs. date
cor_q <- cor.test(Q_daily$date_numeric, Q_daily$discharge, method = "spearman")
#plot daily discharge time series with linear trendline
ggplot(Q_daily, aes(date, discharge)) +
geom_line() +
geom_smooth(method = "lm", se=F) +
labs(y="Streamflow (m3/s)", x="Date", title="Daily Streamflow Time Series") +
scale_y_log10() +
theme_light() +
annotate(
"text",
x = min(Q_daily$date),
y = min(Q_daily$discharge)+5,
label = paste0(
"Spearman Correlation Coefficient: ",
round(cor_q$estimate, 2),
"\nP-value: ",
signif(cor_q$p.value, 2)
),
hjust = 0,
vjust = 1
)
#calculate water year maximums (peaks occur in summer, so calendar vs. water year makes little difference)
Q_annual_max <- Q_daily %>%
group_by(wy) %>%
dplyr::mutate(discharge_max = max(discharge)) %>%
dplyr::filter(discharge == discharge_max) %>% #keep only the day(s) matching each wy maximum
dplyr::mutate(date_numeric = as.numeric(date)) %>%
distinct(wy, .keep_all = TRUE)
#calculate Spearman rank correlation for discharge vs. date
cor_qmax <- cor.test(Q_annual_max$wy, Q_annual_max$discharge, method = "spearman")
#plot annual maximum time series
ggplot(Q_annual_max, aes(wy, discharge)) +
geom_line() +
geom_smooth(method = "lm", se=F) +
labs(y="Streamflow (m3/s)", x="Date", title="Water Year Maximum Time Series") +
# scale_y_log10() +
theme_light() +
annotate(
"text",
x = min(Q_annual_max$wy),
y = 400,
label = paste0(
"Spearman Correlation Coefficient: ",
round(cor_qmax$estimate, 2),
"\nP-value: ",
signif(cor_qmax$p.value, 2)
),
hjust = 0,
vjust = 1
)
#calculate Spearman rank correlation for timing of annual max vs. year
cor_tmax <- cor.test(Q_annual_max$wy, Q_annual_max$dowy, method = "spearman")
#plot timing of annual maximum
ggplot(Q_annual_max, aes(wy, dowy)) +
geom_line() +
geom_smooth(method = "lm", se=F) +
labs(, title="Water Year Maximum Seasonality") +
# scale_y_log10() +
theme_light() +
annotate(
"text",
x = min(Q_annual_max$wy),
y = max(Q_annual_max$dowy),
label = paste0(
"Spearman Correlation Coefficient: ",
round(cor_tmax$estimate, 2),
"\nP-value: ",
signif(cor_tmax$p.value, 2)
),
hjust = 0,
vjust = 1
)
#calculate wy annual minimums
Q_annual_min <- Q_daily %>%
group_by(wy) %>%
dplyr::mutate(discharge_min = min(discharge)) %>%
dplyr::filter(discharge == discharge_min) %>%
dplyr::mutate(date_numeric = as.numeric(date)) %>%
distinct(wy, .keep_all = TRUE)
#calculate Spearman rank correlation for wy minimum discharge vs. water year
cor_qmin <- cor.test(Q_annual_min$wy, Q_annual_min$discharge, method = "spearman")
#plot annual minimum time series
ggplot(Q_annual_min, aes(wy, discharge)) +
geom_line() +
geom_smooth(method = "lm", se=F) +
labs(y="Streamflow (m3/s)", x="Date", title="Water Year Minimum Time Series") +
# scale_y_log10() +
theme_light() +
annotate(
"text",
x = min(Q_annual_min$wy),
y = min(Q_annual_min$discharge)+1,
label = paste0(
"Spearman Correlation Coefficient: ",
round(cor_qmin$estimate, 2),
"\nP-value: ",
signif(cor_qmin$p.value, 2)
),
hjust = 0,
vjust = 1
)
#calculate Spearman rank correlation for wy minimum timing vs. water year
cor_tmin <- cor.test(Q_annual_min$wy, Q_annual_min$dowy, method = "spearman")
#plot annual minimum timing
ggplot(Q_annual_min, aes(wy, dowy)) +
geom_line() +
geom_smooth(method = "lm", se=F) +
labs(title="Water Year Minimum Seasonality") +
# scale_y_log10() +
theme_light() +
annotate(
"text",
x = min(Q_annual_min$wy),
y = min(Q_annual_min$dowy)+300,
label = paste0(
"Spearman Correlation Coefficient: ",
round(cor_tmin$estimate, 2),
"\nP-value: ",
signif(cor_tmin$p.value, 2)
),
hjust = 0,
vjust = 1
)
Question 2a To detect trends in the streamflow data, I ran a series of Spearman rank tests. I chose the Spearman test because it is non-parametric and therefore better suited to data like streamflow records, which are not normally distributed. I removed the partial 1946 and 2018 water years (October 1 - September 30; wys) from the record, then calculated the wy maximum and minimum discharge and the associated day of the water year for each extreme.
The Spearman rank tests found a statistically significant (p-value < 0.10) increasing trend in daily flows, although the correlation coefficient is very small (0.08). There were no statistically significant trends in wy maximum discharge or its timing. There is a statistically significant decreasing trend in the magnitude of wy minimum flows, but no statistically significant change in the timing of wy minima. The decreasing trend in low flows is also visible in the daily streamflow plot, with low flows below 5 m3/s in six years between wys 2007 and 2016 (over half of those years). For the purposes of FFA, which focuses on annual maxima, I would say the dataset is stationary.
Question 2b It would be interesting to look at changes in inter- and intra-annual variability in streamflow. This could include calculating water-year coefficients of variation and the range from maximum to minimum flow. It would also be interesting to break things up seasonally to see whether there are trends in streamflow specific to a season (e.g., more winter streamflow as less water is stored in the snowpack). This could be complemented by looking at the driving processes behind streamflow peaks and considering things like precipitation phase, temperature, and timing, and how those have changed over time. Finally, I would be interested in looking at the slopes of the rising and falling limbs of the hydrograph and the time above baseflow for annual maxima, to better understand whether any characteristics of the annual flood events are changing even if the peak flow magnitudes are not.
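As a rough sketch of the first two ideas (reusing the Q_daily object from above; the season cutoffs below are my own assumption rather than anything from the problem set), one could compute water-year coefficients of variation and seasonal maxima and test each for a monotonic trend:
# Sketch: trends in intra-annual variability and in seasonal maxima (season cutoffs assumed)
wy_stats <- Q_daily %>%
  dplyr::group_by(wy) %>%
  dplyr::summarise(cv = sd(discharge) / mean(discharge),
                   range = max(discharge) - min(discharge),
                   .groups = "drop")
cor.test(wy_stats$wy, wy_stats$cv, method = "spearman")    # trend in within-year variability
cor.test(wy_stats$wy, wy_stats$range, method = "spearman") # trend in max-min range
# Seasonal maxima (e.g., winter = Dec-Feb) to look for season-specific trends
seasonal_max <- Q_daily %>%
  dplyr::mutate(season = dplyr::case_when(
    lubridate::month(date) %in% c(12, 1, 2) ~ "winter",
    lubridate::month(date) %in% 3:5 ~ "spring",
    lubridate::month(date) %in% 6:8 ~ "summer",
    TRUE ~ "fall")) %>%
  dplyr::group_by(wy, season) %>%
  dplyr::summarise(qmax = max(discharge), .groups = "drop")
winter_max <- dplyr::filter(seasonal_max, season == "winter")
cor.test(winter_max$wy, winter_max$qmax, method = "spearman") # trend in winter maxima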
Question 3-Assume Crash Position, because Things are About to Get Extreme
#load in separated peaks data sets
load('peaksdata.RData')
#plot time series using ggplot
ggplot() +
geom_point(data=snowpeaks, aes(date, discharge*0.0283,
color = "Snow Peaks")) +
geom_point(data=rainpeaks, aes(date, discharge*0.0283,
color = "Rain Peaks")) +
labs(y="Peak Discharge (m3/s)", x="Date") +
ylim(c(0,100)) +
theme_light() +
scale_color_manual(
values = c("Snow Peaks" = "black", "Rain Peaks" = "red") # Define colors
) +
theme(legend.title = element_blank(),
legend.position = c(0.1, 0.9),
legend.background = element_blank())
Question 3a
# Fitting GEV distributions to both snowmelt and rainfall peaks
library(extRemes)
## Fitting the GEV to snowpeaks with MLE
snowGEV_MLE <- fevd(x=discharge, data=snowpeaks, units = "cfs")
plot(snowGEV_MLE, main = "Snow GEV with MLE")
## Fitting the GEV to rainpeaks with MLE
rainGEV_MLE <- fevd(x=discharge, data=rainpeaks, units = "cfs")
plot(rainGEV_MLE, main = "Rain GEV with MLE")
## Fitting the GEV to snowpeaks with L-Moments
snowGEV_LM <- fevd(x=discharge, data=snowpeaks, units = "cfs", method = "Lmoments")
plot(snowGEV_LM, main = "Snow GEV with L-Moments")
## Fitting the GEV to rainpeaks with L-Moments
rainGEV_LM <- fevd(x=discharge, data=rainpeaks, units = "cfs", method = "Lmoments")
plot(rainGEV_LM, main = "Rain GEV with L-Moments")
Generally, I think the GEV works well to characterize both the rain and snow peaks time series. When using the default parameter estimation method, maximum likelihood estimation (MLE), the resulting GEV fits the snowmelt-driven floods well: the empirical vs. model quantiles and the return period vs. return level plots match most of the data closely. However, when the rainfall-driven peaks are fit with a GEV using MLE, the largest event in the time series, 2980 cfs from September 2013, is an outlier that falls well outside the confidence intervals of the FFA curve. With L-moments parameter estimation, the GEV fit to the 2013 outlier in the rainfall population is much improved, though the curve changes dramatically, with much higher upper-tail event magnitudes. The differences between parameter estimation methods are much smaller for the snowmelt-driven population, where the L-moments results look very similar to the MLE results.
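To put numbers on that visual comparison, the estimated parameters from the two fits can be pulled side by side. This assumes the MLE estimates sit in results$par (the optim output stored by fevd) and the L-moment estimates in results, the slot used elsewhere in this document:
# Compare rainfall GEV parameters from MLE vs. L-moments
# (assumes MLE estimates are stored in $results$par and L-moment estimates in $results)
rbind(MLE = rainGEV_MLE$results$par,
      Lmoments = rainGEV_LM$results)
# Differences in the shape parameter in particular translate into large differences in the upper tail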
Question 3b
# Get the flood magnitudes for different return intervals (2,20,100 years)
return.level(snowGEV_LM)
## fevd(x = discharge, data = snowpeaks, method = "Lmoments", units = "cfs")
## get(paste("return.level.fevd.", newcl, sep = ""))(x = x, return.period = return.period)
## [1] "GEV model fitted to discharge (cfs)"
## Data are assumed to be stationary
## [1] "Covariate data = snowpeaks"
## [1] "Return Levels for period units in years"
## 2-year level 20-year level 100-year level
## 845.8595 1560.8146 1922.0676
return.level(rainGEV_LM)
## fevd(x = discharge, data = rainpeaks, method = "Lmoments", units = "cfs")
## get(paste("return.level.fevd.", newcl, sep = ""))(x = x, return.period = return.period)
## [1] "GEV model fitted to discharge (cfs)"
## Data are assumed to be stationary
## [1] "Covariate data = rainpeaks"
## [1] "Return Levels for period units in years"
## 2-year level 20-year level 100-year level
## 517.4502 1431.7720 2567.3600
For the snowmelt peaks data, the 2-year return level is 846 cfs, the 20-year is 1561 cfs, and the 100-year is 1922 cfs. In comparison, for the rain peaks data, the 2-year return level is 517 cfs, the 20-year is 1432 cfs, and the 100-year is 2567 cfs. So the flood magnitudes for higher-frequency events are greater for the snowmelt-driven floods than for the rainfall-driven floods. However, the flood magnitude for the low-frequency 100-year event is higher for the rainfall-driven population than for the snowmelt-driven population.
Question 3c
# Compare values of location, scale, and shape parameters
snowLM_param <- snowGEV_LM$results
print("Snowmelt-Driven")
## [1] "Snowmelt-Driven"
print(snowLM_param)
## location scale shape
## 729.3594605 323.7449234 -0.1004036
rainLM_param <- rainGEV_LM$results
print("Rainfall-Driven")
## [1] "Rainfall-Driven"
print(rainLM_param)
## location scale shape
## 441.2583765 195.4717543 0.3325414
The snowmelt-driven population fit to a GEV curve using the L-moments parameter estimation method has a location parameter of 729, a scale parameter of 324, and a shape parameter of -0.10. For the rainfall-driven population, the location is 441, the scale is 195, and the shape is 0.33. Because the snowmelt population has a negative shape parameter, this population of floods has an upper limit to its potential size, which appears to be around 2300 cfs (where the plot from question 3a levels off). This helps explain why, even though the snowmelt population has larger high-frequency floods than the rainfall population, its low-frequency floods are smaller than those of the rainfall population at the 100-year recurrence interval. Conversely, the rainfall-driven population has a positive shape parameter, which indicates that the flood peaks are unbounded, with no asymptotic upper limit on the underlying process. We can see this in the plot of recurrence interval vs. streamflow magnitude for rainfall events from question 3a, where the slope of the GEV curve continues to increase with greater recurrence intervals. Especially eye-catching is the major increase in the upper confidence interval on the upper tail, suggesting values well above 15,000 cfs could be possible. As we discussed briefly in class, rainfall-driven events could be unbounded because of the nature of extreme precipitation events, where so many variables (e.g., storm extent/orientation/duration/seasonality, topography, and watershed physiographic characteristics) can come together to create very extreme conditions. On the other hand, for snowmelt-driven events, there is a physical limit to the rate of melt from a snowpack that caps the maximum size of floods.
Question 3d
#combine peaks from each subpopulation into single annual max peak data set
newframe<- data.frame(rainpeaks=rainpeaks$discharge,snowpeaks=snowpeaks$discharge)
newframe$maxpeak <- apply(newframe, 1, max)
#fit a GEV to the non-separated mixed-population
combinedGEV_LM <- fevd(x=maxpeak, data=newframe, method="Lmoments", units = "cfs")
plot(combinedGEV_LM)
#compare the low and high frequency flood magnitudes for the different distributions
return.level(snowGEV_LM, return.period = c(5,500))
## fevd(x = discharge, data = snowpeaks, method = "Lmoments", units = "cfs")
## get(paste("return.level.fevd.", newcl, sep = ""))(x = x, return.period = return.period)
## [1] "GEV model fitted to discharge (cfs)"
## Data are assumed to be stationary
## [1] "Covariate data = snowpeaks"
## [1] "Return Levels for period units in years"
## 5-year level 500-year level
## 1180.161 2225.925
return.level(rainGEV_LM, return.period = c(5,500))
## fevd(x = discharge, data = rainpeaks, method = "Lmoments", units = "cfs")
## get(paste("return.level.fevd.", newcl, sep = ""))(x = x, return.period = return.period)
## [1] "GEV model fitted to discharge (cfs)"
## Data are assumed to be stationary
## [1] "Covariate data = rainpeaks"
## [1] "Return Levels for period units in years"
## 5-year level 500-year level
## 821.4145 4494.4626
return.level(combinedGEV_LM, return.period = c(5,500))
## fevd(x = maxpeak, data = newframe, method = "Lmoments", units = "cfs")
## get(paste("return.level.fevd.", newcl, sep = ""))(x = x, return.period = return.period)
## [1] "GEV model fitted to maxpeak (cfs)"
## Data are assumed to be stationary
## [1] "Covariate data = newframe"
## [1] "Return Levels for period units in years"
## 5-year level 500-year level
## 1273.645 3477.354
The snowmelt population has a 5-year event magnitude of 1180 cfs and a 500-year event magnitude of 2226 cfs. The 5-year and 500-year magnitudes for the rainfall population are 821 cfs and 4494 cfs, respectively. The 5-year and 500-year magnitudes for the non-separated mixed population are 1274 cfs and 3477 cfs, respectively. When a mixed population of flood events is used to fit the GEV with L-moments parameter estimation, the resulting FFA curve potentially overestimates the magnitude of higher-frequency events like the 5-year flood while underestimating the magnitude of lower-frequency events like the 500-year flood. The largest, very rare flood events are driven by rainfall, and the magnitudes of these extreme floods could be underestimated when using a mixed population that ignores potential issues with iid. This could be dangerous, leaving infrastructure underdesigned for a larger-than-expected event.
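A small arithmetic check of those comparisons can be computed directly from the fitted objects; the sketch below compares the non-separated return levels against the larger of the two sub-population estimates at each return period, which is the comparison made in the text above:
# Quantify how the non-separated fit compares to the separated fits at 5 and 500 years
rl_snow <- as.numeric(return.level(snowGEV_LM, return.period = c(5, 500)))
rl_rain <- as.numeric(return.level(rainGEV_LM, return.period = c(5, 500)))
rl_comb <- as.numeric(return.level(combinedGEV_LM, return.period = c(5, 500)))
round((rl_comb - pmax(rl_snow, rl_rain)) / pmax(rl_snow, rl_rain) * 100, 1)
# positive = combined fit exceeds the larger sub-population estimate; negative = it falls short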
Question 3e
#find snow population upper bound
return.level(snowGEV_LM, return.period = c(1000, 5000, 10000, 1000000))
## fevd(x = discharge, data = snowpeaks, method = "Lmoments", units = "cfs")
## get(paste("return.level.fevd.", newcl, sep = ""))(x = x, return.period = return.period)
## [1] "GEV model fitted to discharge (cfs)"
## Data are assumed to be stationary
## [1] "Covariate data = snowpeaks"
## [1] "Return Levels for period units in years"
## 1000-year level 5000-year level 10000-year level 1e+06-year level
## 2342.167 2582.698 2674.881 3148.357
I would estimate the upper bound for the snowmelt-driven population of floods to be at 3200 cfs. I determined this by returning the flood magnitude for extremely low frequency events (with recurrence intervals between 1000-1 million years). The results start to level out (as seen also in the plot from question 3a for snow) around this value. I think that there is an upper bound for snowmelt-driven floods because there is a physical limit to the amount of melt from the snowpack. The volume of water stored in the snowpack is finite and the rate of melt of this snow is also finite.
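As a cross-check on that empirical estimate, a GEV with a negative shape parameter has a finite analytical upper endpoint at location - scale/shape. A minimal sketch using the L-moment parameters from question 3c follows; because return levels only approach this endpoint asymptotically, the analytical bound will sit somewhat above the level where the plotted curve appears to flatten out.
# Analytical GEV upper endpoint for the snowmelt fit (finite only because shape < 0):
# upper bound = location - scale / shape
unname(snowLM_param["location"] - snowLM_param["scale"] / snowLM_param["shape"])
# The rainfall fit has a positive shape parameter, so the same expression
# does not yield a finite upper bound (the fitted distribution is unbounded above)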
Question 3f 1. How important is the 2013 flood for determining the upper tail (e.g. 500-year return period) of the rainfall-driven flood peak distribution?
The 2013 flood event is very important for determining the upper tail of the rainfall-driven curve. Removing this event decreases the shape parameter from 0.33 to 0.06. While the shape remains positive, still suggesting unbounded upper-tail magnitudes, the estimated magnitudes of rare events drop substantially (e.g., the 500-year event goes from 4494 cfs with 2013 included to about 1976 cfs when 2013 is excluded). I would also argue that the 2013 event might not be from the same population as the other rainfall peaks, since it occurred in September and was driven by different mechanisms than those that typically cause the summertime rainfall-driven streamflow peaks seen in the rest of the rainfall time series (refer back to question #1).
# Fitting GEV distributions to rainfall peaks excluding the 2013 event
rainpeaks_no13 <- rainpeaks %>%
dplyr::filter(year(date) != 2013)
rainGEV_no13 <- fevd(x=discharge, data=rainpeaks_no13, units = "cfs", method = "Lmoments")
plot(rainGEV_no13, main = "Rain GEV with L-Moments (2013 excluded)")
# get flood magnitudes when we exclude the 2013 peak
print(rainGEV_no13$results)
## location scale shape
## 453.0831292 200.4351150 0.0626886
return.level(rainGEV_no13, return.period = c(2,20,100,500))
## fevd(x = discharge, data = rainpeaks_no13, method = "Lmoments",
## units = "cfs")
## get(paste("return.level.fevd.", newcl, sep = ""))(x = x, return.period = return.period)
## [1] "GEV model fitted to discharge (cfs)"
## Data are assumed to be stationary
## [1] "Covariate data = rainpeaks_no13"
## [1] "Return Levels for period units in years"
## 2-year level 20-year level 100-year level 500-year level
## 527.3956 1107.4454 1521.8175 1975.8851
As demonstrated in the plots I generated for question 3a, there are significant differences in the FFA curves for rainfall events when using L-moments vs. MLE parameter estimation. The MLE rainfall curve does a very poor job of fitting the 2013 event, which falls well outside the confidence interval of the GEV. However, the differences between the GEV fits for the snowmelt population using the MLE vs. L-moments methods are very minor. This supports the understanding that L-moments parameter estimation is more robust to outliers, which are very common in FFA streamflow series datasets.
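One small way to illustrate that robustness directly is to compare the conventional coefficient of variation with the sample L-CV of the rain peaks, with and without the 2013 event. The probability-weighted-moment estimator sketched below is the standard textbook formulation written out in base R, not a function from extRemes:
# Sample L-CV from unbiased probability-weighted moments (standard formulation)
sample_lcv <- function(x) {
  x <- sort(x)
  n <- length(x)
  b0 <- mean(x)
  b1 <- sum((seq_len(n) - 1) / (n - 1) * x) / n
  (2 * b1 - b0) / b0 # L-CV = l2 / l1
}
# Conventional CV vs. L-CV, with and without the 2013 outlier
c(cv_all = sd(rainpeaks$discharge) / mean(rainpeaks$discharge),
  cv_no13 = sd(rainpeaks_no13$discharge) / mean(rainpeaks_no13$discharge),
  lcv_all = sample_lcv(rainpeaks$discharge),
  lcv_no13 = sample_lcv(rainpeaks_no13$discharge))
# The L-CV should shift much less than the ordinary CV when the outlier is removed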
#Making a plot showing the different envelope curve options
# using methods from Yu et al. (2022)
# set a range of discharge values to compute the CDFs over
discharge_range <- seq(min(rainpeaks$discharge,snowpeaks$discharge),
10000, length.out = 1000)
# snowmelt CDF
snow_cdf <- pevd(discharge_range, loc = snowLM_param['location'],
shape = snowLM_param['shape'], scale = snowLM_param['scale'],
type = "GEV")
# rainfall CDF
rain_cdf <- pevd(discharge_range, loc = rainLM_param['location'],
shape = rainLM_param['shape'], scale = rainLM_param['scale'],
type = "GEV")
# non-separated process CDF
nonsep_param <- combinedGEV_LM$results
nonsep_cdf <- pevd(discharge_range, loc = nonsep_param['location'],
shape=nonsep_param['shape'], scale=nonsep_param['scale'],
type = "GEV")
# combine CDFs
mixture_cdf <- snow_cdf * rain_cdf
# combine back to plot envelope curves
all_dta <- data.frame(discharge=discharge_range, mixture_aep=1-mixture_cdf,
snow_aep=1-snow_cdf, rain_aep=1-rain_cdf, nonsep_aep=1-nonsep_cdf) %>%
dplyr::mutate(mixture_ri = 1/mixture_aep,
snow_ri = 1/snow_aep,
rain_ri = 1/rain_aep,
nonsep_ri = 1/nonsep_aep) %>%
mutate(across(everything(), ~ ifelse(is.infinite(.), NA, .)))
# plot all envelope curves together
ggplot(all_dta) +
geom_line(aes(mixture_ri, discharge, color = "New Mixture Distribution"), linewidth=1.5) +
geom_line(aes(nonsep_ri, discharge, color = "Non-Separated Mixture Distribution"),
linewidth=0.8, linetype="dashed") +
geom_line(aes(snow_ri, discharge, color = "Snowmelt-Driven Distribution")) +
geom_line(aes(rain_ri, discharge, color = "Rainfall-Driven Distribution")) +
scale_x_log10(limits = c(1,600)) +
ylim(0,5000) +
scale_color_manual( values = c("Snowmelt-Driven Distribution" = "forestgreen",
"Rainfall-Driven Distribution" = "red",
"New Mixture Distribution" = "black",
"Non-Separated Mixture Distribution" = "skyblue"),
name = element_blank()) +
labs(x = "Return Period (Years)", y = "Discharge (cfs)") +
theme_minimal()
Using a process-based approach to FFA that separates out mixed populations of floods allows for a more accurate FFA curve that better captures flood magnitudes across frequencies. Having events caused by different processes (e.g., snowmelt and rainfall) in one sample violates the iid assumption and distorts the FFA curve. In this analysis, we saw that the mixed population can artificially inflate the magnitude of higher-frequency floods while underestimating the magnitude of upper-tail events. If instead the methodology of Yu et al. (2022) is used, combining the distributions by multiplying the sub-population cumulative distribution functions (CDFs) together to estimate the mixture distribution (see figure above), the resulting envelope curve gives a more accurate idea of the size of potential floods than the original non-separated mixture distribution.
One potential limitation of process-based approaches is that it can be challenging to determine the driving process behind each event, making the methods significantly more data- and labor-intensive. Another complicating factor is that in some instances multiple drivers combine to create a flood (e.g., rain-on-snow combining rain and snowmelt), and the analysis has to assess whether these should be treated as independent categories. Splitting the record into sub-populations also reduces the sample size for each FFA curve, reducing statistical power and widening the confidence intervals (although using partial duration series for the different flood processes could help offset this).
Question 4
To estimate the largest imaginable flood in the Big Thompson watershed, irrespective of its probability of occurrence, I would begin by estimating the probable maximum precipitation (PMP) for the basin. The PMP represents the theoretical upper limit of rainfall that could occur over an area given the perfect combination of meteorological conditions and topography. As the above analysis suggests, rainfall-driven events are likely to produce larger floods than snowmelt events here. While the positive shape parameter of the rainfall-driven flood distribution suggests it is statistically unbounded, I think it is still possible to estimate a PMP because there is a physical limit to the water-holding capacity of the atmosphere.
The 2013 flood in the Big Thompson is very helpful for determining what the largest imaginable flood might be like. This flood illustrates that the biggest flood in this basin is likely to come from multi-day storm sequences and be very different than the typical annual maximum flood events. I would take the 2013 storm and any other large anomalous Front Range storms I could find in the historical record and play around with transposing them onto the region. I would shift things like the storm duration, orientation, winds, atmospheric water content and stability, and event sequencing (while maintaining conditions that are physically possible). I believe that some of the largest precipitation-producing storms are going to come as climate warming increases the capacity of the atmosphere to hold water vapor. To account for this, I would like to recreate these storms under climate change conditions. This approach combining storm transposition with climate change considerations allows me to trade space for time when considering what the most extreme storm could possibly be.
I would run the resulting ensemble of PMP events through a rainfall-runoff model. Saturated antecedent hydrologic conditions would be used to maximize runoff generation. This would produce an ensemble of probable maximum flood (PMF) events for the Big Thompson spanning a range of physically realistic outcomes. The largest of these flows would represent the largest imaginable flood. The ensemble approach is useful because it considers a broad range of extreme combinations. Given that this is a managed watershed with dams, reservoirs, and diversions across the continental divide to the west slope of the Rocky Mountains, it would also be interesting to run the PMF flows through an operational model. Given that the largest flood on record for the Big Thompson is from the 1982 dam failure, it would be important to see if the PMF could potentially result in any infrastructure failures that could bring an even larger flood to the area.