Regional analysis applied to the hydrologic context aims to develop mathematical relations so that information from gauged catchments can be transferred to neighboring ungauged catchments (McCuen, 1998). A branch of hydrologic regional analysis is the regional frequency analysis. The regional frequency analysis makes use of widespread spatial data of a specific variable of interest (e.g., discharges and precipitation) within a region deemed homogeneous (in respect to statistical or physical behaviors) to estimate the quantiles for different exceedance probabilities in an ungauged spot in the study region (Naghettini & Pinto, 2007).
There are several technical guidelines to support the regional frequency analysis in the hydrologic community, among the pioneer’s guidance documents one may refer to the Flood-frequency analysis, Manual of Hydrology: Part 3 (Dalrymple, 1960). To name a few of the available approaches, following are the highlighted ones by Naghettini & Pinto (2007):
Given the time constraints to explore and compare all the mentioned approaches, for this question we will focus on the index-flood method proposed in Dalrymple (1960), and also described in Naghettini & Pinto (2007), Ponce (1989), and McCuen (1998).
The following sections describe the applied methodology to estimate the design flow events corresponding to the 25 (Q25), 50 (Q50), 100 (Q100), and 200 (Q200) year return periods for the Mossom Creek system.
The available data consisted of annual peak discharge time series for each catchment, and their respective drainage areas (DA), as indicated in Table 1, presented before. The excel data were converted to individual .txt files for each catchment. Data were read individually and the time series plots were generated for each catchment by the R script code presented as follows.
#Clean R memory
rm(list=ls(all=TRUE))
#Setting working directory
setwd("C:/Users/cassi/OneDrive/ComprehensiveExam_CassioRampinelli/ProfessorBaki/data")
#Save files names inside the working directory
filesNames<-dir()
#Create a variable defined as a list to store all data
data<-0
data<-as.list(data)
#Loop to read all data files automatically
for(i in 1:length(filesNames)){
data[[i]]<-read.table(file=filesNames[i],header=T,sep="\t")
}
#Vector to store the drainage area
dA<-c(172,3.63,2.59,37.3,117,47.7,54.7)
#Vector do store catchment ID
stationID<-substr(filesNames,1,nchar(filesNames)-4)
#Loop to plot time series of annual peak discharge for each catchment
for(i in 1:length(filesNames)){
plot(x=data[[i]]$Year,y=data[[i]]$Qpeak.m3.s.,type="l",ylab="Peak Discharge (m³/s)",xlab="Year",main=paste(c(stationID[i]," (DA=",dA[i]," km²)"), collapse = ''))
points(data[[i]]$Year, data[[i]]$Qpeak.m3.s., pch = 19)
}
The plots indicate the time series ranges are different. Although selecting a time base common to all the stations would be argued as a strategy to eliminate or reduce the effect of variability with time, this would also restrict or eliminate information captured by catchments with long timeline records.
For instance, an approach like that would be limited to the use of 21 years of data for all catchments, since this is the maximum period covered by catchment 08GA065 (the one with less data). If we consider the records for catchment 08GA010, that covers 97 years, this procedure will result in eliminating 76 years of flood records for this station. For the sake of simplicity, in this question we will use all available records for all catchments to extract all data information. However, additional analysis might be performed in further studies to evaluate the impact of potential variability in the time length of the adopted series.
Before proceeding with the regional analysis, a local frequency analysis for each catchment will be performed individually considering annual flow time series. The goal will be selecting a suitable distribution to use in the regional analysis. Among several statistical distributions available, here three frequently used distributions for extreme values will be tested: Generalized Extreme Values-GEV, Gumbel (Extreme-value Type I), and Pearson Type III-P3.
For each of the distributions listed above the form of the probability density function (pdf)\(f(x)\), the cumulative distribution function (cdf) \(F(x)\), and the quantile function \(x(F)\) are given. The equations are listed in Hosking & Wallis (1997).
The pdf for the GEV is given by :
\[ f(x)=\frac{1}{\alpha}t(x)^{k+1}exp(-t(x)) \] with
\[ t(x)=\Bigg( 1+k\Big(\frac{x-\xi}{\alpha}\Big)\Bigg)^{-1/k},for \ k \neq0 \]
\[ t(x)= exp\Bigg[-\Big(\frac{x-\xi}{\alpha}\Big)\Bigg],for \ k =0 \]
The cdf is given by:
\[ F(x)=exp\Big[-t(x)\Big] \]
The quantile function is given by:
\[ x(F)=\xi +\frac{\alpha}{k} (1-(-log (F))^k) \] where \(\xi\), \(\alpha\), and \(k\) are location, scale, and shape parameters respectively, and \(x\) represents the discharge (associated with a return period).
The pdf for the Type I extreme-value distribution (Gumbel) for maximum values is given by :
\[ f(x)=\frac{1}{\alpha}exp \Bigg[-\frac{x-\xi}{\alpha}-exp\Bigg(-\frac{x-\xi}{\alpha}\Bigg)\Bigg] \]
The cdf is given by:
\[ F(x)=exp\Bigg[-exp\Bigg(-\frac{x-\xi}{\alpha}\Bigg)\Bigg] \]
The quantile function is given by:
\[ x(F)=\xi -\alpha \ log(-log (F)) \]
In which \(\xi\), and \(\alpha\) are the location, and scale parameters respectively, and \(x\) represents the discharge (associated with a return period).
The P3 distribution has the following pdf:
\[ f(x)=\frac{(x-\xi)^{(\alpha-1)}\ exp(-(x-\xi)/\beta)}{\beta^\alpha\Gamma(\alpha)} \]
The cdf is given by:
\[ F(x)=\frac{G\Bigg(\alpha, \frac{x-\xi}{\beta}\Bigg)}{\Gamma(\alpha)} \]
The quantile function \(x(F)\) has no explicit analytical form.
For \(\gamma \neq0\), let \(\alpha=4/\gamma^2\), \(\beta=\frac{\sigma}2|\gamma|\), and \(\xi=\mu-\frac{2\sigma}{\gamma}\). If \(\gamma>0\), then the range of \(x\) is \(\xi \leq x < \infty\).
With \(\mu\), \(\sigma\), and \(\gamma\) the location, scale and shape parameters respectively, \(x\) represents the discharge (associated with a return period), and the incomplete gamma function given by:
\[ G\Bigg(\alpha, x\Bigg)=\int_0^xt^{\alpha-1}exp(-t)dt \]
The population parameters (location, scale, and shape) must be estimated. The Method of Moments, Method of L-Moments or the Maximum-Likelihood Method might be applied. Details about each method can be found in Chin (2014). Here, for the parameter estimates, a function from the R package (lmon) will be applied. This function uses the Method of L-Moments.
Next, the R code developed to fit the three distributions to all catchments is shown with comments.
Plots of the empirical (data) x fitted distributions are presented. For each catchment two plots are shown: cdf x Discharge, and Return Period X Discharge.
#Package with the probabilities funtions
library(lmom)
#Matrix to store GEV fitted parameters
GEV.parameters<-matrix(NA,nrow=7,ncol=3)
colnames(GEV.parameters)=c("xi","alpha","k")
rownames(GEV.parameters)=stationID
#Matrix to store Gumbel fitted parameters
Gumbel.parameters<-matrix(NA,nrow=7,ncol=2)
colnames(Gumbel.parameters)=c("xi","alpha")
rownames(Gumbel.parameters)=stationID
#Matrix to store P3 fitted parameters
P3.parameters<-matrix(NA,nrow=7,ncol=3)
colnames(P3.parameters)=c("mu","sigma","gamma")
rownames(P3.parameters)=stationID
#Loop to process frequency analysis to all catchments
for(i in 1:length(filesNames)){
#Sorting extreme values in decreasing order
sorted.maximas<-sort(data[[i]]$Qpeak.m3.s.,decreasing=T)
#Computing the empirical probabilities
p<-(c(1:length(sorted.maximas)))/(length(sorted.maximas)+1)
#Computing the empirical return period
tr<-1/p
#Computing sample L-moments
lmon<-samlmu(data[[i]]$Qpeak.m3.s.)
############################################
#Fitting parameters for GEV distribution
############################################
para.GEV<-pelgev(lmon)
GEV.parameters[i,]=para.GEV
############################################
#Fitting parameters for Gumbel distribution
############################################
para.Gumbel<-pelgum(lmon)
Gumbel.parameters[i,]=para.Gumbel
############################################
#Fitting parameters for P3 distribution
############################################
para.P3<-pelpe3(lmon)
P3.parameters[i,]=para.P3
##############################################
#Plot fitted cdf x empirical cdf
##############################################
plot(1-p,sorted.maximas,main=stationID[i],pch=19,cex.axis=1.3,cex.lab=1.3,xlab="",ylab="")
mtext(side=1, line=2, "cdf", font=1,cex=1.3)
mtext(side=2, line=2, expression(Discharge ~ (m^3/s)), font=1, cex=1.3)
#Pearson type 3 fitting
lines(cdfpe3(sorted.maximas,para.P3),sorted.maximas,col="red",lty=1,lwd=2)
#GEV fitting
lines(cdfgev(sorted.maximas,para.GEV),sorted.maximas,col="black",lty=4,lwd=2)
#Gumbel fitting
lines(cdfgum(sorted.maximas,para.Gumbel),sorted.maximas,col="blue",lty=2,lwd=2)
grid()
#Legend
legend("topleft", legend=c("Pearson Type 3", "Gumbel","GEV","Observed"),
col=c("red", "blue","black","black"), lty=c(1,4,2,NA),pch=c(NA,NA,NA,19), cex=1,lwd=rep(2,4))
##############################################
#Return Period x Discharge
##############################################
#Empirical data
plot(tr,sorted.maximas,main=stationID[i],xlab="",ylab="",ylim=c(min(sorted.maximas),max(sorted.maximas)*1.5),xlim=c(0,200),pch=19,cex.axis=1.3,cex.main=1.3,cex.lab=1.3)
mtext(side=1, line=2, "Return Period (year)", font=2,cex=1.3)
mtext(side=2, line=3, "Discharge (m³/s)", font=2, cex=1.3)
#Return Periods
returnPeriods<-c(2.33,5,10,25,50,100,200)
#Associated Probabilities
probabilities<-1-1/returnPeriods
#Pearson Type III
lines(returnPeriods,quape3(probabilities, para.P3),lty=1,col="red",lwd=2)
#GEV
lines(returnPeriods,quagev(probabilities, para.GEV),lty=4,col="black",lwd=2)
#Gumbel
lines(returnPeriods,quagum(probabilities, para.Gumbel),lty=2,col="blue",lwd=2)
grid()
legend("topleft", legend=c("Pearson Type 3", "Gumbel","GEV","Observado"),
col=c("red", "blue","black","black"), lty=c(1,4,2,NA),pch=c(NA,NA,NA,19), cex=1,lwd=rep(2,4))
}
The fitted parameters for all catchments are summarized in the following tables for each assessed distribution.
| \(\xi\) | \(\alpha\) | \(\kappa\) | |
|---|---|---|---|
| 08GA010 | 197.326996 | 60.9088633 | 0.0718144 |
| 08GA061 | 2.474685 | 1.0386982 | -0.1461842 |
| 08GA065 | 3.202304 | 0.8993236 | -0.1948388 |
| 08MH006 | 36.330789 | 14.8005734 | -0.0188384 |
| 08MH058 | 90.723492 | 35.0436162 | -0.0117970 |
| 08MH076 | 30.435422 | 11.5952163 | -0.2011454 |
| 08MH141 | 65.835591 | 16.4633580 | 0.1761987 |
| \(\xi\) | \(\alpha\) | |
|---|---|---|
| 08GA010 | 195.383261 | 57.239298 |
| 08GA061 | 2.548613 | 1.212163 |
| 08GA065 | 3.289938 | 1.115478 |
| 08MH006 | 36.458825 | 15.064942 |
| 08MH058 | 90.912802 | 35.431784 |
| 08MH076 | 31.606191 | 14.501914 |
| 08MH141 | 64.584866 | 14.332942 |
| \(\mu\) | \(\sigma\) | \(\gamma\) | |
|---|---|---|---|
| 08GA010 | 228.422680 | 71.601772 | 0.7598453 |
| 08GA061 | 3.248293 | 1.612482 | 1.6078114 |
| 08GA065 | 3.933810 | 1.514361 | 1.8093636 |
| 08MH006 | 45.154545 | 19.223800 | 1.1037393 |
| 08MH058 | 111.364583 | 45.130608 | 1.0766331 |
| 08MH076 | 39.976923 | 19.743356 | 1.8359664 |
| 08MH141 | 72.858064 | 17.687562 | 0.3773304 |
Based on the fitting plots from the previous section a visual selection of the most suitable distributions would be difficult since for most of the sites the three tested distributions seem to be reasonably adjusted to the data.
There are several statistical goodness-of-fit procedures to hypothesis testing. The chi-square test, and the Kolmogorov-Smirnov tests are maybe the most well-known examples. However, here we apply other powerful goodness-of-fit criteria known as Probability Plot Correlation Coefficient (PPCC) given its practical convenience.
The PPCC was originally developed by Filliben (1975) for normal distribution and later adjusted for other distribution types. The PPCC represents the correlation coefficient r, between the ordered observations {\({x_1,x_2,...,x_i,...x_N}\)}, and the corresponding teoretical quantiles {\(w_1, w_2,...,w_i,...w_N\)} which are computed by \(w_i=F^{-1}_X(1-q_i)\), and \(q_i\) represents the empirical probability related to \(i\). The formal expression for PPCC statistic is given by:
\[ PPCC=r=\frac{\sum_{i=1}^N(x_{i}-\bar{x})(w_i-\bar{w})}{\sqrt{\sum_{i=1}^N(x_{i}-\bar{x})^2(w_i-\bar{w})^2}} \] in which
\[ \bar{x}=\sum_{i=1}^Nx_{i}/N \] \[ \bar{w}=\sum_{i=1}^Nw_{i}/N \]
Here, for practical purpose, we will select the distribution model that yields the highest value for PPCC (as proposed in Tung et al. 2006 ) rather than comparing it to the critical reference values.
The R code developed to compute the PPCC metric for each catchment based on the three fitted distributions is presented below. The code makes use of the package ppcc to compute the PPCC values. At the end of the code lines, a summary table with the computed PPCC values is presented indicating the selected distribution.
##Package to compute PPCC metric
library(ppcc)
#Matrix to store ppcc metrics for all distributions in all catchments
ppccValues<-matrix(NA,nrow=7,ncol=3)
colnames(ppccValues)=c("GEV","Gumbel","P3")
rownames(ppccValues)=stationID
#Matrix to store p-values for all distributions in all catchments
pValues<-matrix(NA,nrow=7,ncol=3)
colnames(pValues)=c("GEV","Gumbel","P3")
rownames(pValues)=stationID
#Loop to compute ppcc and pvalues along catchments
for(i in 1:length(filesNames)){
#Computing ppcc coefficient for GEV
ppccGEV=ppccTest(data[[i]]$Qpeak.m3.s., "qgev",shape=GEV.parameters[i,3])
ppccValues[i,1]=ppccGEV$statistic
pValues[i,1]=ppccGEV$p.value
#Computing ppcc coefficient for Gumbel
ppccGumbel=ppccTest(data[[i]]$Qpeak.m3.s., "qgumbel")
ppccValues[i,2]=ppccGumbel$statistic
pValues[i,2]=ppccGumbel$p.value
#Computing ppcc coefficient for P3
ppccP3=ppccTest(data[[i]]$Qpeak.m3.s., "qpearson3",shape=P3.parameters[i,3])
ppccValues[i,3]=ppccP3$statistic
pValues[i,3]=ppccP3$p.value
}
#Selecting the most suitable distribution
selected=0
for(i in 1:length(filesNames)){
temp=which(ppccValues[i,]==max(ppccValues[i,]))
if(temp==1){selected[i]="GEV"}
else{
if(temp==2){selected[i]="Gumbel"}
else(selected[i]="P3")
}
}
#Final Tabble with ppcc values and the selected distribution
Selected.distribution=cbind(ppccValues,selected)
| GEV | Gumbel | P3 | Selected Distribution | |
|---|---|---|---|---|
| 08GA010 | 0.99548927151705 | 0.994377599208432 | 0.995570851137933 | P3 |
| 08GA061 | 0.992062881537118 | 0.982719566817776 | 0.988939542659771 | GEV |
| 08GA065 | 0.968038881587764 | 0.941387131654744 | 0.955484645569975 | GEV |
| 08MH006 | 0.992492810714673 | 0.9927431495946 | 0.994893903035613 | P3 |
| 08MH058 | 0.97579000031266 | 0.975826030556301 | 0.980726892001789 | P3 |
| 08MH076 | 0.972717159789732 | 0.978644474004069 | 0.986489253266378 | P3 |
| 08MH141 | 0.991888840111498 | 0.976573641282855 | 0.991290305595479 | GEV |
As mentioned in the Introduction we have adopted the Index-flood method to perform the region analysis. The index-flood method was chosen based on the following reasons: (a) the method only requires calibration of one index equation, (b) the approach is computationally simple to apply, and (c) the index ratios assume consistency across return periods (McCuen, 1998).
The procedure to develop a regional frequency curve by the index-flood method is summarized in the following steps adapted from Ponce (1989).
Assemble the records (annual exceedance or annual maxima series) for the catchments of interest.
For each \(i^{th}\) station, rank the records in descending order and compute return periods fitting a statistical distribution suitable to the local data such as GEV, Gumbel, etc.
For each \(i^{th}\) station, determine the mean annual flood, that is, the peak flow corresponding to the 2.33 return period.
Note: we have used the selected distributions indicated in Table 4 for each catchment. The R code implemented to compute the discharges associated to different return periods is presented below. Table 5 summarizes the results for these steps.
################################################
#Computing fitted discharges for return periods
##################################################
fitted.discharges=matrix(NA,nrow=7,ncol=length(returnPeriods))
colnames(fitted.discharges)=c("Q2.33","Q5","Q10","Q25","Q50","Q100","Q200")
rownames(fitted.discharges)=stationID
fitted.discharges[1,]=quape3(probabilities, P3.parameters[1,])
fitted.discharges[2,]=quagev(probabilities, GEV.parameters[2,])
fitted.discharges[3,]=quagev(probabilities, GEV.parameters[3,])
fitted.discharges[4,]=quape3(probabilities, P3.parameters[4,])
fitted.discharges[5,]=quape3(probabilities, P3.parameters[5,])
fitted.discharges[6,]=quape3(probabilities, P3.parameters[6,])
fitted.discharges[7,]=quagev(probabilities, GEV.parameters[7,])
fitted.discharges %>%
kbl(caption = "Table 5. Catchments ID and their respective discharges for different return periods",col.names = c("Q2.33","Q5","Q10","Q25","Q50","Q100","Q200")) %>%
kable_classic(full_width = F, html_font = "Cambria")
| Q2.33 | Q5 | Q10 | Q25 | Q50 | Q100 | Q200 | |
|---|---|---|---|---|---|---|---|
| 08GA010 | 232.141807 | 284.560032 | 324.020377 | 370.380533 | 402.739541 | 433.494605 | 463.02993 |
| 08GA061 | 3.101811 | 4.216678 | 5.242480 | 6.710354 | 7.938584 | 9.289257 | 10.77933 |
| 08GA065 | 3.753105 | 4.769036 | 5.742416 | 7.194374 | 8.458740 | 9.897372 | 11.53922 |
| 08MH006 | 45.025790 | 59.474385 | 70.932096 | 84.881854 | 94.874516 | 104.536355 | 113.94725 |
| 08MH058 | 111.270888 | 145.134307 | 171.879423 | 204.354099 | 227.572413 | 249.994253 | 271.81178 |
| 08MH076 | 37.440124 | 52.561128 | 65.942643 | 83.378143 | 96.448777 | 109.449668 | 122.39808 |
| 08MH141 | 74.891639 | 87.535915 | 96.420816 | 106.090837 | 112.289952 | 117.728460 | 122.52093 |
Choose several frequencies, and for each \(i^{th}\) station and \(j^{th}\) frequency (or Return Period) calculate the peak flow ratio, i.e, the ratio of peak flow for the \(j^{th}\) frequency to the mean annual flood.
For each \(j^{th}\) frequency, determine the median value of peak flow ratios for all stations, that is , the median peak flow ratio.
Note: The code developed to compute the dimensionless fitted discharges using the Q2.33 as denominator for the index-ratio is presented next. Table 6 shows the results
################################################
#Computing dimensionless discharges for return periods
##################################################
adim.discharges=matrix(NA,nrow=8,ncol=length(returnPeriods))
colnames(adim.discharges)=c("Q2.33/Q2.33","Q5/Q2.33","Q10/Q2.33","Q25/Q2.33","Q50/Q2.33","Q100/Q2.33","Q200/Q2.33")
rownames(adim.discharges)=c(stationID,"Median")
adim.discharges[1,]=fitted.discharges[1,]/fitted.discharges[1,1]
adim.discharges[2,]=fitted.discharges[2,]/fitted.discharges[2,1]
adim.discharges[3,]=fitted.discharges[3,]/fitted.discharges[3,1]
adim.discharges[4,]=fitted.discharges[4,]/fitted.discharges[4,1]
adim.discharges[5,]=fitted.discharges[5,]/fitted.discharges[5,1]
adim.discharges[6,]=fitted.discharges[6,]/fitted.discharges[6,1]
adim.discharges[7,]=fitted.discharges[7,]/fitted.discharges[7,1]
adim.discharges[8,]=c(median(adim.discharges[1:7,1]),median(adim.discharges[1:7,2]),median(adim.discharges[1:7,3]),median(adim.discharges[1:7,4]),median(adim.discharges[1:7,5]),median(adim.discharges[1:7,6]),median(adim.discharges[1:7,7]))
adim.discharges %>%
kbl(caption = "Table 6. Catchments ID and their respective dimensionless discharges for different return periods",col.names = c("Q2.33/Q2.33","Q5/Q2.33","Q10/Q2.33","Q25/Q2.33","Q50/Q2.33","Q100/Q2.33","Q200/Q2.33")) %>%
kable_classic(full_width = F, html_font = "Cambria")
| Q2.33/Q2.33 | Q5/Q2.33 | Q10/Q2.33 | Q25/Q2.33 | Q50/Q2.33 | Q100/Q2.33 | Q200/Q2.33 | |
|---|---|---|---|---|---|---|---|
| 08GA010 | 1 | 1.225803 | 1.395786 | 1.595493 | 1.734886 | 1.867370 | 1.994600 |
| 08GA061 | 1 | 1.359424 | 1.690135 | 2.163366 | 2.559338 | 2.994785 | 3.475173 |
| 08GA065 | 1 | 1.270691 | 1.530044 | 1.916912 | 2.253798 | 2.637115 | 3.074579 |
| 08MH006 | 1 | 1.320896 | 1.575366 | 1.885183 | 2.107115 | 2.321699 | 2.530711 |
| 08MH058 | 1 | 1.304333 | 1.544693 | 1.836546 | 2.045211 | 2.246717 | 2.442793 |
| 08MH076 | 1 | 1.403872 | 1.761283 | 2.226973 | 2.576081 | 2.923325 | 3.269169 |
| 08MH141 | 1 | 1.168834 | 1.287471 | 1.416591 | 1.499366 | 1.571984 | 1.635976 |
| Median | 1 | 1.304333 | 1.544693 | 1.885183 | 2.107115 | 2.321699 | 2.530711 |
#Regression
#Q2.33 discharge for all catchments
Q2.33=fitted.discharges[,1]
#Regressao linear
regression<-lm(log(Q2.33)~log(dA))
#Coefficients
coeff<-regression$coefficients
y<-log(dA)*coeff[2]+coeff[1]
plot(log(dA),log(Q2.33),xlab="",ylab="",main="Regression Analysis- Drainage Area X Q2.33")
lines(log(dA),y,lty=1,col="red")
eq <- paste0("ln(Q2.33)= ", coeff[1],
ifelse(sign(coeff[2])==1, " + ", " - "), abs(coeff[2]), " ln(DA) ")
mtext(side=1, line=3, expression('ln(DA[' ~ km^2 ~'])'), font=2,cex=1.3)
mtext(side=2, line=2, expression('ln(Q2.33[' ~ m^3/s ~'])'), font=2, cex=1.3)
## printing of the equation
mtext(eq, 3, line=-2)
Regional analysis requires the definition of regions deemed homogeneous. Gauging stations within homogeneous areas are selected to proceed with the regional frequency analysis. This is a controversial issue in hydrology and is affected by subjective judgments. In general, the type of data used to define these regions might be local statistics or local characteristics (Naghettini & Pinto, 2007). Examples of local statistics are metrics such as skewness and kurtosis for the available sample data of the gauges. Local characteristics encompass hydrometeorological variables, hydrologic signatures, or other physical attributes such as coordinates, and altitude, to name a few.
Among the several approaches to define the gauging stations that are part of a homogeneous region one might list: geographical proximity (nearby catchments are grouped based on an arbitrary proximity criterion), subjective grouping (catchments are arbitrarily grouped based on criteria such as similar climate features, topography , rain patters, etc.), objective grouping (a metric and a threshold are specified to group similar catchments such as likelihood ratios, skewness, kurtoses, etc.), cluster analysis, to name a few.
In this question, we have used the test of hydrologic homogeneity proposed in Dalrymple (1960), although other tests might also be applied and provide different conclusions.
The test is also described in Ponce (1989) as follows:
For each \(i^{th}\) station, use its frequency curve to determine the 2.33 year and the 10 year floods.
For each \(i^{th}\) station, calculate the 10 year peak flow ratio, i.e., the ratio of the 10 year flood to the 2.33 year flood.
Calculate the average of the 10 year peak flow ratios for all stations.
For each \(i^{th}\) station, multiply the 2.33 year flood by the average 10 year peak flow ratio to obtain an adjusted 10 year peak flow.
For each \(i^{th}\) station, use its frequency curve to determine the return period \(T_i\) for the adjusted 10 year peak flow.
For each \(i^{th}\) station, plot the return period \(T_i\) versus the length of the record \(n\), in years. Points located within the confidence limits are considered to be hydrologically homogeneous. Points lying outside of the solid lines should not be used in the calculation of the median peak flow ratio.
The described procedure was implemented in R. Code lines are presented below. The assessment has shown that Catchment 08MH141 lies outside the confidence limits (red solid lines in the Plot). Therefore, this Catchment should not be used in the analysis based on this method.
#Test of hydrologic homogeneity
#Adjusted Q10
adustedQ10=fitted.discharges[,1]*mean(adim.discharges[1:7,3])
adustedQ10
## 08GA010 08GA061 08GA065 08MH006 08MH058 08MH076 08MH141
## 357.656849 4.778907 5.782344 69.370453 171.433125 57.683349 115.384248
#Computing Ti for all catchments
accum<-0
fitted.tr10<-0
accum[1]<-cdfpe3(adustedQ10[1],P3.parameters[1,])
fitted.tr10[1]<-1/(1-accum[1])
accum[2]<-cdfgev(adustedQ10[2],GEV.parameters[2,])
fitted.tr10[2]<-1/(1-accum[2])
accum[3]<-cdfgev(adustedQ10[3],GEV.parameters[3,])
fitted.tr10[3]<-1/(1-accum[3])
accum[4]<-cdfpe3(adustedQ10[4],P3.parameters[4,])
fitted.tr10[4]<-1/(1-accum[4])
accum[5]<-cdfpe3(adustedQ10[5],P3.parameters[5,])
fitted.tr10[5]<-1/(1-accum[5])
accum[6]<-cdfpe3(adustedQ10[6],P3.parameters[6,])
fitted.tr10[6]<-1/(1-accum[6])
accum[7]<-cdfgev(adustedQ10[7],GEV.parameters[7,])
fitted.tr10[7]<-1/(1-accum[7])
ajustedTR10<-c(fitted.tr10[1],fitted.tr10[2],fitted.tr10[3],fitted.tr10[4],fitted.tr10[5],fitted.tr10[6],fitted.tr10[7])
len<-c(length(data[[1]]$Qpeak.m3.s.),length(data[[2]]$Qpeak.m3.s.),length(data[[3]]$Qpeak.m3.s.),length(data[[4]]$Qpeak.m3.s.),length(data[[5]]$Qpeak.m3.s.),length(data[[6]]$Qpeak.m3.s.),length(data[[7]]$Qpeak.m3.s.))
#Plot
plot(x=len,y=ajustedTR10,xlab="Length of record (years)",ylab="Return Period (Years)",main="Homogeneity Test Chart",log="y",xlim=c(5,100),ylim = c(1.2,150))
#Confidence limits based in Dalrymple (1960)
xLimits<-c(5,7.5,10,12.5,15,17.5,20,25,30,35,40,45,50,60,80,100)
upperLimity<-c(150.5,95,70,60,50,45,40,33,30,28,25,23,21,20.3,19.9,19.99)
lowerLimity<-c(1.21,1.5,1.75,2.25,2.4,2.6,2.7,2.85,3.1,3.6,3.95,4.1,4.5,4.8,4.95,4.99)
lines(xLimits,upperLimity,col="red")
lines(xLimits,lowerLimity,col="red")
#Removed station
text(31.5, 75,stationID[7],
cex=0.8, pos=3,col="red")
To compute the design flow events for the Mossom Creek, the regional analysis described in the previous section was updated following the same described steps after removing Catchment 08MH141 from the analysis, since it was considered not part of a homogeneous area.
This procedure resulted in the following Regional Fit, and Regression Analysis.
########################################################################
########################################################################
#####Updated regional analysis
########################################################################
################################################
#Computing dimensionless discharges for return periods removing station 08MH141
##################################################
adim.discharges.final=matrix(NA,nrow=7,ncol=length(returnPeriods))
colnames(adim.discharges.final)=c("Q2.33/Q2.33","Q5/Q2.33","Q10/Q2.33","Q25/Q2.33","Q50/Q2.33","Q100/Q2.33","Q200/Q2.33")
rownames(adim.discharges.final)=c(stationID[-7],"Median")
adim.discharges.final[1,]=fitted.discharges[1,]/fitted.discharges[1,1]
adim.discharges.final[2,]=fitted.discharges[2,]/fitted.discharges[2,1]
adim.discharges.final[3,]=fitted.discharges[3,]/fitted.discharges[3,1]
adim.discharges.final[4,]=fitted.discharges[4,]/fitted.discharges[4,1]
adim.discharges.final[5,]=fitted.discharges[5,]/fitted.discharges[5,1]
adim.discharges.final[6,]=fitted.discharges[6,]/fitted.discharges[6,1]
adim.discharges.final[7,]=c(median(adim.discharges.final[1:6,1]),median(adim.discharges.final[1:6,2]),median(adim.discharges.final[1:6,3]),median(adim.discharges.final[1:6,4]),median(adim.discharges.final[1:6,5]),median(adim.discharges.final[1:6,6]),median(adim.discharges.final[1:6,7]))
adim.discharges.final %>%
kbl(caption = "Table 7. Catchments ID and their respective dimensionless discharges for different return periods after removing 08MH141",col.names = c("Q2.33/Q2.33","Q5/Q2.33","Q10/Q2.33","Q25/Q2.33","Q50/Q2.33","Q100/Q2.33","Q200/Q2.33")) %>%
kable_classic(full_width = F, html_font = "Cambria")
| Q2.33/Q2.33 | Q5/Q2.33 | Q10/Q2.33 | Q25/Q2.33 | Q50/Q2.33 | Q100/Q2.33 | Q200/Q2.33 | |
|---|---|---|---|---|---|---|---|
| 08GA010 | 1 | 1.225803 | 1.395786 | 1.595493 | 1.734886 | 1.867370 | 1.994600 |
| 08GA061 | 1 | 1.359424 | 1.690135 | 2.163366 | 2.559338 | 2.994785 | 3.475173 |
| 08GA065 | 1 | 1.270691 | 1.530044 | 1.916912 | 2.253798 | 2.637115 | 3.074579 |
| 08MH006 | 1 | 1.320896 | 1.575366 | 1.885183 | 2.107115 | 2.321699 | 2.530711 |
| 08MH058 | 1 | 1.304333 | 1.544693 | 1.836546 | 2.045211 | 2.246717 | 2.442793 |
| 08MH076 | 1 | 1.403872 | 1.761283 | 2.226973 | 2.576081 | 2.923325 | 3.269169 |
| Median | 1 | 1.312615 | 1.560030 | 1.901048 | 2.180456 | 2.479408 | 2.802645 |
#Plot Regional Fit-Index Flood
plot(x=returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[1,2:length(returnPeriods)],xlab="",ylab="",ylim=c(1,4),pch=0,cex.axis=1.3,cex.main=1.3,cex.lab=1.3,log="x",main="Index Flood - Regional Fit-Final")
points(returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[2,2:length(returnPeriods)],pch=2,col="black")
points(returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[3,2:length(returnPeriods)],pch=3,col="black")
points(returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[4,2:length(returnPeriods)],pch=11,col="black")
points(returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[5,2:length(returnPeriods)],pch=15,col="blue")
points(returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[6,2:length(returnPeriods)],pch=16,col="red")
lines(returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[7,2:length(returnPeriods)],lty=1,lwd=2,col="red")
mtext(side=1, line=2, "Return Period (year)", font=2,cex=1.3)
mtext(side=2, line=3, "Q/Q2.33", font=2, cex=1.3)
grid(nx=NULL, ny=NULL, col= "lightgray", lty="dotted", equilogs=FALSE)
legend("topleft", legend=c(stationID[-7],"Median (Regional Fit)"),
col=c("black","black","black","black","blue","red","red"), lty=c(NA,NA,NA,NA,NA,NA,1),lwd=c(NA,NA,NA,NA,NA,NA,2),pch=c(0,2,3,11,15,16,NA), cex=1)
#Regression after removing Catchment 08MH141
#Q2.33 discharge for all catchments
#Q2.33 discharge for all catchments
Q2.33=fitted.discharges[1:6,1]
#Linear Regression
regression<-lm(log(Q2.33)~log(dA[1:6]))
#Coefficients
coeff<-regression$coefficients
y<-log(dA[1:6])*coeff[2]+coeff[1]
plot(log(dA[1:6]),log(Q2.33),xlab="",ylab="",main="Regression Analysis - Final - Drainage Area X Q2.33")
lines(log(dA[1:6]),y,lty=1,col="red")
## sign check to avoid having plus followed by minus for negative coefficients
eq <- paste0("ln(Q2.33)= ", coeff[1],
ifelse(sign(coeff[2])==1, " + ", " - "), abs(coeff[2]), " ln(DA) ")
mtext(side=1, line=3, expression('ln(DA[' ~ km^2 ~'])'), font=2,cex=1.3)
mtext(side=2, line=2, expression('ln(Q2.33[' ~ m^3/s ~'])'), font=2, cex=1.3)
## printing of the equation
mtext(eq, 3, line=-2)
To regionalize the design flow events for the Mossom Creek we first compute the Q2.33 discharge at the study site using the Mossom Creek drainage area (3.188 km²) and the final regression analysis equation.
\[ ln(Q_{2.33})=0.09769367087988+0.990996382591051 \cdot ln(DA) \] \[ ln(Q_{2.33})=0.09769367087988+0.990996382591051 \cdot ln(3.188) \] \[ ln(Q_{2.33})=4.150293 \] \[ Q_{2.33}=e^{4.150293} \] \[ Q_{2.33}=63.45 \ m^3/s \]
Now the dimensionless regional discharges for the different return periods from the regional fit (represented by the median in Table 7) is multiplied by the Q2.33 discharge computed to Mossom Creek resulting in the design flow events for the Mossom Creek, summarized in Table 8.
#Computing Q2.33 at MossomCreek using regression analysis
lnQ2.33=coeff[2]+coeff[2]*3.188
Q2.33=exp(lnQ2.33)
MossomCreek=adim.discharges.final[7,4:7]*Q2.33
names(MossomCreek)=c("Q25","Q50","Q100","Q200")
MossomCreek=rbind(c("Q25","Q50","Q100","Q200"),MossomCreek)
MossomCreek %>%
kbl(caption = "Table 8. Design flows at Mossom Creek (m³/s)",col.names = c("","","","")) %>%
kable_classic(full_width = F, html_font = "Cambria")
| Q25 | Q50 | Q100 | Q200 | |
| MossomCreek | 120.626381183791 | 138.355578475596 | 157.324799272791 | 177.83504230504 |
We have computed the design flow events corresponding to 25, 50, 100, and 200 year return periods for the Mossom Creek system by applying a regionalization analysis based on the index-flood method. There are different regionalization techniques that might bring different outputs. For this assignment we have focus only on the index-flood method, however, it would be a good designing practice evaluating the sensitivity of the design flow events to different regionalization methods depending on the level of significance and risk tolerance assumed for the project.
Similarly, there are also different homogeneity test strategies that might be complementary evaluated. For the sake of simplicity, we have used the 10 year flow ratio. Although some studies have shown that homogeneity may be assumed on the basis of the 10 year peak flow ratio (Ponce, 1998), the individual frequency curves may show wide and sometimes systematic differences at longer run periods.
Chin, D.A. (2014). Water Resources Engineering. 3rd Edition. Pearson Education, Inc. upper Saddle River, New Jersey. ISBN:0-13-283321-2.
Dalrymple, T. (1960). Flood-frequency analyses, Manual of Hydrology: Part.3. Flood-flow Techniques, Geological Survey Water Supply Paper 1543-A. U.S. Government Printing Office, Washington, D.C.
Filliben, J.J. (1975). The probability plot correlation coefficient test for normality. Technometrics 17, 111–117. http://dx.doi.org/10.2307/1268008.
Hosking, J. R. M. & Wallis, J. R. (1997). Regional Frequency Analysis - An Approach Based on L-Moments. Cambridge University Press, Cambridge, UK. ISBN: 13-978-0-521-43045-6.
McCuen, R.H. (1998). Hydrologic Analysis and Design. 2nd Edition. Pretence-Hall. Upper Saddle River, New Jersey. ISBN: 0-13-134958-9.
Naghettini, M. & Pinto, E.J.A. (2007). Hidrologia Estatística. CPRM. Belo Horizonte. ISBN: 978-85-7499-023-1.
Ponce, V.M. (1989). Engineering Hydrology: Principles and Practices. Pretence-Hall, Englewood Cliffs, New Jersey. NY. ISBN: 0-13-277831-9.
Tung, Y.K.; Yen, B.C.; Melching, C.S. (2006). Hydrosystems Enineering Reliability Assessment and Risk Analysis.. McGraw-Hill, New York. ISBN: 0-07-145158-7.