Comprehensive Exam - Professor Baki Question

 

By: Cássio Rampinelli

Date: May 20, 2020

_____________________________________________________________________

QUESTION: An ungauged creek/stream, named Mossom Creek has a relatively small watershed with a drainage area of 3.188 km² , has no hydrological gauging station. However, there are nearby total of seven watersheds where gauging stations are available (Table 1), and their time series flow data are in a separate Excel file (Time Series Historical Peak flow data all Gauging Stations). Therefore, a regional flood frequency analysis needs to be conducted using the hydrological information available at nearby watersheds having gauging stations to estimate the design flow events corresponding to the 25 (Q25), 50 (Q50), 100 (Q100), and 200 (Q200) year return periods for the Mossom Creek system.

 

SOLUTION

______________________________________________________________________________

1.INTRODUCTION

Regional analysis applied to the hydrologic context aims to develop mathematical relations so that information from gauged catchments can be transferred to neighboring ungauged catchments (McCuen, 1998). A branch of hydrologic regional analysis is the regional frequency analysis. The regional frequency analysis makes use of widespread spatial data of a specific variable of interest (e.g., discharges and precipitation) within a region deemed homogeneous (in respect to statistical or physical behaviors) to estimate the quantiles for different exceedance probabilities in an ungauged spot in the study region (Naghettini & Pinto, 2007).

There are several technical guidelines to support the regional frequency analysis in the hydrologic community, among the pioneer’s guidance documents one may refer to the Flood-frequency analysis, Manual of Hydrology: Part 3 (Dalrymple, 1960). To name a few of the available approaches, following are the highlighted ones by Naghettini & Pinto (2007):

  • Methods that aim to regionalize the quantiles associated with a pre-defined risk;
  • Methods that aim to regionalize the parameters of the probability distribution functions (pdfs), and
  • Methods that aim to regionalize standardized quantile curves or to generate a regional frequency curve named index-flood methods.

Given the time constraints to explore and compare all the mentioned approaches, for this question we will focus on the index-flood method proposed in Dalrymple (1960), and also described in Naghettini & Pinto (2007), Ponce (1989), and McCuen (1998).

The following sections describe the applied methodology to estimate the design flow events corresponding to the 25 (Q25), 50 (Q50), 100 (Q100), and 200 (Q200) year return periods for the Mossom Creek system.

2.DATA ASSESSMENT

The available data consisted of annual peak discharge time series for each catchment, and their respective drainage areas (DA), as indicated in Table 1, presented before. The excel data were converted to individual .txt files for each catchment. Data were read individually and the time series plots were generated for each catchment by the R script code presented as follows.

 

#Clean R memory
rm(list=ls(all=TRUE))

#Setting working directory
setwd("C:/Users/cassi/OneDrive/ComprehensiveExam_CassioRampinelli/ProfessorBaki/data")

#Save files names inside the working directory
filesNames<-dir()

#Create a variable defined as a list to store all data
data<-0
data<-as.list(data)

#Loop to read all data files automatically
for(i in 1:length(filesNames)){
  
  data[[i]]<-read.table(file=filesNames[i],header=T,sep="\t")
  
}

#Vector to store the drainage area
dA<-c(172,3.63,2.59,37.3,117,47.7,54.7)

#Vector do store catchment ID
stationID<-substr(filesNames,1,nchar(filesNames)-4)


#Loop to plot time series of annual peak discharge for each catchment

for(i in 1:length(filesNames)){
  
  plot(x=data[[i]]$Year,y=data[[i]]$Qpeak.m3.s.,type="l",ylab="Peak Discharge (m³/s)",xlab="Year",main=paste(c(stationID[i]," (DA=",dA[i]," km²)"), collapse = ''))
  points(data[[i]]$Year, data[[i]]$Qpeak.m3.s., pch = 19)
  
}

2.1 Time Base Assumption

The plots indicate the time series ranges are different. Although selecting a time base common to all the stations would be argued as a strategy to eliminate or reduce the effect of variability with time, this would also restrict or eliminate information captured by catchments with long timeline records.

For instance, an approach like that would be limited to the use of 21 years of data for all catchments, since this is the maximum period covered by catchment 08GA065 (the one with less data). If we consider the records for catchment 08GA010, that covers 97 years, this procedure will result in eliminating 76 years of flood records for this station. For the sake of simplicity, in this question we will use all available records for all catchments to extract all data information. However, additional analysis might be performed in further studies to evaluate the impact of potential variability in the time length of the adopted series.

3.LOCAL FREQUENCY ANALYSIS

Before proceeding with the regional analysis, a local frequency analysis for each catchment will be performed individually considering annual flow time series. The goal will be selecting a suitable distribution to use in the regional analysis. Among several statistical distributions available, here three frequently used distributions for extreme values will be tested: Generalized Extreme Values-GEV, Gumbel (Extreme-value Type I), and Pearson Type III-P3.

For each of the distributions listed above the form of the probability density function (pdf)\(f(x)\), the cumulative distribution function (cdf) \(F(x)\), and the quantile function \(x(F)\) are given. The equations are listed in Hosking & Wallis (1997).

3.1 Generalized Extreme Values (GEV)

The pdf for the GEV is given by :

\[ f(x)=\frac{1}{\alpha}t(x)^{k+1}exp(-t(x)) \] with

\[ t(x)=\Bigg( 1+k\Big(\frac{x-\xi}{\alpha}\Big)\Bigg)^{-1/k},for \ k \neq0 \]

\[ t(x)= exp\Bigg[-\Big(\frac{x-\xi}{\alpha}\Big)\Bigg],for \ k =0 \]

The cdf is given by:

\[ F(x)=exp\Big[-t(x)\Big] \]

The quantile function is given by:

\[ x(F)=\xi +\frac{\alpha}{k} (1-(-log (F))^k) \] where \(\xi\), \(\alpha\), and \(k\) are location, scale, and shape parameters respectively, and \(x\) represents the discharge (associated with a return period).

3.2 Gumbel distribution (Extreme-value Type I)

The pdf for the Type I extreme-value distribution (Gumbel) for maximum values is given by :

\[ f(x)=\frac{1}{\alpha}exp \Bigg[-\frac{x-\xi}{\alpha}-exp\Bigg(-\frac{x-\xi}{\alpha}\Bigg)\Bigg] \]

The cdf is given by:

\[ F(x)=exp\Bigg[-exp\Bigg(-\frac{x-\xi}{\alpha}\Bigg)\Bigg] \]

The quantile function is given by:

\[ x(F)=\xi -\alpha \ log(-log (F)) \]

In which \(\xi\), and \(\alpha\) are the location, and scale parameters respectively, and \(x\) represents the discharge (associated with a return period).

3.3 Pearson Type III Distribution (P3)

The P3 distribution has the following pdf:

\[ f(x)=\frac{(x-\xi)^{(\alpha-1)}\ exp(-(x-\xi)/\beta)}{\beta^\alpha\Gamma(\alpha)} \]

The cdf is given by:

\[ F(x)=\frac{G\Bigg(\alpha, \frac{x-\xi}{\beta}\Bigg)}{\Gamma(\alpha)} \]

The quantile function \(x(F)\) has no explicit analytical form.

For \(\gamma \neq0\), let \(\alpha=4/\gamma^2\), \(\beta=\frac{\sigma}2|\gamma|\), and \(\xi=\mu-\frac{2\sigma}{\gamma}\). If \(\gamma>0\), then the range of \(x\) is \(\xi \leq x < \infty\).

With \(\mu\), \(\sigma\), and \(\gamma\) the location, scale and shape parameters respectively, \(x\) represents the discharge (associated with a return period), and the incomplete gamma function given by:

\[ G\Bigg(\alpha, x\Bigg)=\int_0^xt^{\alpha-1}exp(-t)dt \]

3.4 Estimation of Distributional Parameters

The population parameters (location, scale, and shape) must be estimated. The Method of Moments, Method of L-Moments or the Maximum-Likelihood Method might be applied. Details about each method can be found in Chin (2014). Here, for the parameter estimates, a function from the R package (lmon) will be applied. This function uses the Method of L-Moments.

Next, the R code developed to fit the three distributions to all catchments is shown with comments.

Plots of the empirical (data) x fitted distributions are presented. For each catchment two plots are shown: cdf x Discharge, and Return Period X Discharge.

#Package with the probabilities funtions
library(lmom)

#Matrix to store GEV fitted parameters
GEV.parameters<-matrix(NA,nrow=7,ncol=3)
colnames(GEV.parameters)=c("xi","alpha","k")
rownames(GEV.parameters)=stationID


#Matrix to store Gumbel fitted parameters
Gumbel.parameters<-matrix(NA,nrow=7,ncol=2)
colnames(Gumbel.parameters)=c("xi","alpha")
rownames(Gumbel.parameters)=stationID


#Matrix to store P3 fitted parameters
P3.parameters<-matrix(NA,nrow=7,ncol=3)
colnames(P3.parameters)=c("mu","sigma","gamma")
rownames(P3.parameters)=stationID


#Loop to process frequency analysis to all catchments
for(i in 1:length(filesNames)){
  
  
  
  #Sorting extreme values in decreasing order
  sorted.maximas<-sort(data[[i]]$Qpeak.m3.s.,decreasing=T)
  
  #Computing the empirical probabilities
  p<-(c(1:length(sorted.maximas)))/(length(sorted.maximas)+1)
  
  #Computing the empirical return period
  tr<-1/p
  
  #Computing sample L-moments
  lmon<-samlmu(data[[i]]$Qpeak.m3.s.)
  
  
  ############################################
  #Fitting parameters for GEV distribution
  ############################################
  para.GEV<-pelgev(lmon)
  GEV.parameters[i,]=para.GEV
  
  
  ############################################
  #Fitting parameters for Gumbel distribution
  ############################################
  para.Gumbel<-pelgum(lmon)
  Gumbel.parameters[i,]=para.Gumbel
  
  ############################################
  #Fitting parameters for P3 distribution
  ############################################
  para.P3<-pelpe3(lmon)
  P3.parameters[i,]=para.P3
  
  ##############################################
  #Plot fitted cdf x empirical cdf
  ##############################################
  
  plot(1-p,sorted.maximas,main=stationID[i],pch=19,cex.axis=1.3,cex.lab=1.3,xlab="",ylab="")
  mtext(side=1, line=2, "cdf", font=1,cex=1.3)
  mtext(side=2, line=2, expression(Discharge ~ (m^3/s)), font=1, cex=1.3)
  
  #Pearson type 3 fitting
  lines(cdfpe3(sorted.maximas,para.P3),sorted.maximas,col="red",lty=1,lwd=2)
  
  
  #GEV fitting
  lines(cdfgev(sorted.maximas,para.GEV),sorted.maximas,col="black",lty=4,lwd=2)
  
  
  #Gumbel fitting
  lines(cdfgum(sorted.maximas,para.Gumbel),sorted.maximas,col="blue",lty=2,lwd=2)
  
  grid()
  #Legend
  legend("topleft", legend=c("Pearson Type 3", "Gumbel","GEV","Observed"),
         col=c("red", "blue","black","black"), lty=c(1,4,2,NA),pch=c(NA,NA,NA,19), cex=1,lwd=rep(2,4))
  
  
  ##############################################
  #Return Period x Discharge
  ##############################################
  
  #Empirical data
  plot(tr,sorted.maximas,main=stationID[i],xlab="",ylab="",ylim=c(min(sorted.maximas),max(sorted.maximas)*1.5),xlim=c(0,200),pch=19,cex.axis=1.3,cex.main=1.3,cex.lab=1.3)
  mtext(side=1, line=2, "Return Period (year)", font=2,cex=1.3)
  mtext(side=2, line=3, "Discharge (m³/s)", font=2, cex=1.3)
  
  #Return Periods
  returnPeriods<-c(2.33,5,10,25,50,100,200)
  
  #Associated Probabilities
  probabilities<-1-1/returnPeriods
  
  #Pearson Type III
  lines(returnPeriods,quape3(probabilities, para.P3),lty=1,col="red",lwd=2)
  
  #GEV
  lines(returnPeriods,quagev(probabilities, para.GEV),lty=4,col="black",lwd=2)
  
  #Gumbel
  lines(returnPeriods,quagum(probabilities, para.Gumbel),lty=2,col="blue",lwd=2)
  
  grid()
  
  legend("topleft", legend=c("Pearson Type 3", "Gumbel","GEV","Observado"),
         col=c("red", "blue","black","black"), lty=c(1,4,2,NA),pch=c(NA,NA,NA,19), cex=1,lwd=rep(2,4))
  
  
  
}

The fitted parameters for all catchments are summarized in the following tables for each assessed distribution.

Table 1. Catchment IDs and fitted GEV Parameters
\(\xi\) \(\alpha\) \(\kappa\)
08GA010 197.326996 60.9088633 0.0718144
08GA061 2.474685 1.0386982 -0.1461842
08GA065 3.202304 0.8993236 -0.1948388
08MH006 36.330789 14.8005734 -0.0188384
08MH058 90.723492 35.0436162 -0.0117970
08MH076 30.435422 11.5952163 -0.2011454
08MH141 65.835591 16.4633580 0.1761987
Table 2. Catchment IDs and fitted Gumbel Parameters
\(\xi\) \(\alpha\)
08GA010 195.383261 57.239298
08GA061 2.548613 1.212163
08GA065 3.289938 1.115478
08MH006 36.458825 15.064942
08MH058 90.912802 35.431784
08MH076 31.606191 14.501914
08MH141 64.584866 14.332942
Table 3. Catchment IDs and fitted Pearson Type III Parameters
\(\mu\) \(\sigma\) \(\gamma\)
08GA010 228.422680 71.601772 0.7598453
08GA061 3.248293 1.612482 1.6078114
08GA065 3.933810 1.514361 1.8093636
08MH006 45.154545 19.223800 1.1037393
08MH058 111.364583 45.130608 1.0766331
08MH076 39.976923 19.743356 1.8359664
08MH141 72.858064 17.687562 0.3773304

3.5 Selection of Distribution Model

Based on the fitting plots from the previous section a visual selection of the most suitable distributions would be difficult since for most of the sites the three tested distributions seem to be reasonably adjusted to the data.

There are several statistical goodness-of-fit procedures to hypothesis testing. The chi-square test, and the Kolmogorov-Smirnov tests are maybe the most well-known examples. However, here we apply other powerful goodness-of-fit criteria known as Probability Plot Correlation Coefficient (PPCC) given its practical convenience.

The PPCC was originally developed by Filliben (1975) for normal distribution and later adjusted for other distribution types. The PPCC represents the correlation coefficient r, between the ordered observations {\({x_1,x_2,...,x_i,...x_N}\)}, and the corresponding teoretical quantiles {\(w_1, w_2,...,w_i,...w_N\)} which are computed by \(w_i=F^{-1}_X(1-q_i)\), and \(q_i\) represents the empirical probability related to \(i\). The formal expression for PPCC statistic is given by:

\[ PPCC=r=\frac{\sum_{i=1}^N(x_{i}-\bar{x})(w_i-\bar{w})}{\sqrt{\sum_{i=1}^N(x_{i}-\bar{x})^2(w_i-\bar{w})^2}} \] in which

\[ \bar{x}=\sum_{i=1}^Nx_{i}/N \] \[ \bar{w}=\sum_{i=1}^Nw_{i}/N \]

Here, for practical purpose, we will select the distribution model that yields the highest value for PPCC (as proposed in Tung et al. 2006 ) rather than comparing it to the critical reference values.

The R code developed to compute the PPCC metric for each catchment based on the three fitted distributions is presented below. The code makes use of the package ppcc to compute the PPCC values. At the end of the code lines, a summary table with the computed PPCC values is presented indicating the selected distribution.

##Package to compute PPCC metric
library(ppcc)

#Matrix to store ppcc metrics for all distributions in all catchments
ppccValues<-matrix(NA,nrow=7,ncol=3)
colnames(ppccValues)=c("GEV","Gumbel","P3")
rownames(ppccValues)=stationID

#Matrix to store p-values  for all distributions in all catchments
pValues<-matrix(NA,nrow=7,ncol=3)
colnames(pValues)=c("GEV","Gumbel","P3")
rownames(pValues)=stationID

#Loop to compute ppcc and pvalues along catchments
for(i in 1:length(filesNames)){
  
  #Computing ppcc coefficient for GEV
  ppccGEV=ppccTest(data[[i]]$Qpeak.m3.s., "qgev",shape=GEV.parameters[i,3])
  ppccValues[i,1]=ppccGEV$statistic
  pValues[i,1]=ppccGEV$p.value
  
  #Computing ppcc coefficient for Gumbel
  ppccGumbel=ppccTest(data[[i]]$Qpeak.m3.s., "qgumbel")
  ppccValues[i,2]=ppccGumbel$statistic
  pValues[i,2]=ppccGumbel$p.value
  
  
  #Computing ppcc coefficient for P3
  ppccP3=ppccTest(data[[i]]$Qpeak.m3.s., "qpearson3",shape=P3.parameters[i,3])
  ppccValues[i,3]=ppccP3$statistic
  pValues[i,3]=ppccP3$p.value
  
  
  
}

#Selecting the most suitable distribution
selected=0
for(i in 1:length(filesNames)){
  
  temp=which(ppccValues[i,]==max(ppccValues[i,]))
  
  if(temp==1){selected[i]="GEV"}
  else{
    
      if(temp==2){selected[i]="Gumbel"}
    
    else(selected[i]="P3")
    
  }
  
  
}

#Final Tabble with ppcc values and the selected distribution
Selected.distribution=cbind(ppccValues,selected)
Table 4. Catchment IDs and the PPCC values for each distribution. Selected distribution is the one with largest PPCC value
GEV Gumbel P3 Selected Distribution
08GA010 0.99548927151705 0.994377599208432 0.995570851137933 P3
08GA061 0.992062881537118 0.982719566817776 0.988939542659771 GEV
08GA065 0.968038881587764 0.941387131654744 0.955484645569975 GEV
08MH006 0.992492810714673 0.9927431495946 0.994893903035613 P3
08MH058 0.97579000031266 0.975826030556301 0.980726892001789 P3
08MH076 0.972717159789732 0.978644474004069 0.986489253266378 P3
08MH141 0.991888840111498 0.976573641282855 0.991290305595479 GEV

4.REGIONAL ANALYSIS

As mentioned in the Introduction we have adopted the Index-flood method to perform the region analysis. The index-flood method was chosen based on the following reasons: (a) the method only requires calibration of one index equation, (b) the approach is computationally simple to apply, and (c) the index ratios assume consistency across return periods (McCuen, 1998).

4.1 Index-Flood Method

The procedure to develop a regional frequency curve by the index-flood method is summarized in the following steps adapted from Ponce (1989).

  • Assemble the records (annual exceedance or annual maxima series) for the catchments of interest.

  • For each \(i^{th}\) station, rank the records in descending order and compute return periods fitting a statistical distribution suitable to the local data such as GEV, Gumbel, etc.

  • For each \(i^{th}\) station, determine the mean annual flood, that is, the peak flow corresponding to the 2.33 return period.

Note: we have used the selected distributions indicated in Table 4 for each catchment. The R code implemented to compute the discharges associated to different return periods is presented below. Table 5 summarizes the results for these steps.

################################################
#Computing fitted discharges for return periods
##################################################

fitted.discharges=matrix(NA,nrow=7,ncol=length(returnPeriods))
colnames(fitted.discharges)=c("Q2.33","Q5","Q10","Q25","Q50","Q100","Q200")
rownames(fitted.discharges)=stationID

fitted.discharges[1,]=quape3(probabilities, P3.parameters[1,])
fitted.discharges[2,]=quagev(probabilities, GEV.parameters[2,])
fitted.discharges[3,]=quagev(probabilities, GEV.parameters[3,])
fitted.discharges[4,]=quape3(probabilities, P3.parameters[4,])
fitted.discharges[5,]=quape3(probabilities, P3.parameters[5,])
fitted.discharges[6,]=quape3(probabilities, P3.parameters[6,])
fitted.discharges[7,]=quagev(probabilities, GEV.parameters[7,])


fitted.discharges %>%
  kbl(caption = "Table 5. Catchments ID and their respective discharges for different return periods",col.names = c("Q2.33","Q5","Q10","Q25","Q50","Q100","Q200")) %>%
  kable_classic(full_width = F, html_font = "Cambria")
Table 5. Catchments ID and their respective discharges for different return periods
Q2.33 Q5 Q10 Q25 Q50 Q100 Q200
08GA010 232.141807 284.560032 324.020377 370.380533 402.739541 433.494605 463.02993
08GA061 3.101811 4.216678 5.242480 6.710354 7.938584 9.289257 10.77933
08GA065 3.753105 4.769036 5.742416 7.194374 8.458740 9.897372 11.53922
08MH006 45.025790 59.474385 70.932096 84.881854 94.874516 104.536355 113.94725
08MH058 111.270888 145.134307 171.879423 204.354099 227.572413 249.994253 271.81178
08MH076 37.440124 52.561128 65.942643 83.378143 96.448777 109.449668 122.39808
08MH141 74.891639 87.535915 96.420816 106.090837 112.289952 117.728460 122.52093
  • Choose several frequencies, and for each \(i^{th}\) station and \(j^{th}\) frequency (or Return Period) calculate the peak flow ratio, i.e, the ratio of peak flow for the \(j^{th}\) frequency to the mean annual flood.

  • For each \(j^{th}\) frequency, determine the median value of peak flow ratios for all stations, that is , the median peak flow ratio.

Note: The code developed to compute the dimensionless fitted discharges using the Q2.33 as denominator for the index-ratio is presented next. Table 6 shows the results

################################################
#Computing dimensionless discharges for return periods
##################################################

adim.discharges=matrix(NA,nrow=8,ncol=length(returnPeriods))
colnames(adim.discharges)=c("Q2.33/Q2.33","Q5/Q2.33","Q10/Q2.33","Q25/Q2.33","Q50/Q2.33","Q100/Q2.33","Q200/Q2.33")
rownames(adim.discharges)=c(stationID,"Median")

adim.discharges[1,]=fitted.discharges[1,]/fitted.discharges[1,1]
adim.discharges[2,]=fitted.discharges[2,]/fitted.discharges[2,1]
adim.discharges[3,]=fitted.discharges[3,]/fitted.discharges[3,1]
adim.discharges[4,]=fitted.discharges[4,]/fitted.discharges[4,1]
adim.discharges[5,]=fitted.discharges[5,]/fitted.discharges[5,1]
adim.discharges[6,]=fitted.discharges[6,]/fitted.discharges[6,1]
adim.discharges[7,]=fitted.discharges[7,]/fitted.discharges[7,1]
adim.discharges[8,]=c(median(adim.discharges[1:7,1]),median(adim.discharges[1:7,2]),median(adim.discharges[1:7,3]),median(adim.discharges[1:7,4]),median(adim.discharges[1:7,5]),median(adim.discharges[1:7,6]),median(adim.discharges[1:7,7]))



adim.discharges %>%
  kbl(caption = "Table 6. Catchments ID and their respective dimensionless discharges for different return periods",col.names = c("Q2.33/Q2.33","Q5/Q2.33","Q10/Q2.33","Q25/Q2.33","Q50/Q2.33","Q100/Q2.33","Q200/Q2.33")) %>%
  kable_classic(full_width = F, html_font = "Cambria")
Table 6. Catchments ID and their respective dimensionless discharges for different return periods
Q2.33/Q2.33 Q5/Q2.33 Q10/Q2.33 Q25/Q2.33 Q50/Q2.33 Q100/Q2.33 Q200/Q2.33
08GA010 1 1.225803 1.395786 1.595493 1.734886 1.867370 1.994600
08GA061 1 1.359424 1.690135 2.163366 2.559338 2.994785 3.475173
08GA065 1 1.270691 1.530044 1.916912 2.253798 2.637115 3.074579
08MH006 1 1.320896 1.575366 1.885183 2.107115 2.321699 2.530711
08MH058 1 1.304333 1.544693 1.836546 2.045211 2.246717 2.442793
08MH076 1 1.403872 1.761283 2.226973 2.576081 2.923325 3.269169
08MH141 1 1.168834 1.287471 1.416591 1.499366 1.571984 1.635976
Median 1 1.304333 1.544693 1.885183 2.107115 2.321699 2.530711
  • Plot median peak flow ratios versus frequencies on extreme value and draw a line of best fit to obtain a regional flood frequency curve for the given data.

  • Perform a regression analysis and plot the mean annual flood (Q2.33) as a function of the drainage area (DA).
#Regression

#Q2.33 discharge for all catchments
Q2.33=fitted.discharges[,1]

#Regressao linear
regression<-lm(log(Q2.33)~log(dA))

#Coefficients
coeff<-regression$coefficients
y<-log(dA)*coeff[2]+coeff[1]

plot(log(dA),log(Q2.33),xlab="",ylab="",main="Regression Analysis- Drainage Area X Q2.33")
lines(log(dA),y,lty=1,col="red")
eq <- paste0("ln(Q2.33)= ", coeff[1],
             ifelse(sign(coeff[2])==1, " + ", " - "), abs(coeff[2]), " ln(DA) ")

mtext(side=1, line=3, expression('ln(DA[' ~ km^2 ~'])'), font=2,cex=1.3)
mtext(side=2, line=2, expression('ln(Q2.33[' ~ m^3/s ~'])'), font=2, cex=1.3)

## printing of the equation
mtext(eq, 3, line=-2)

4.2 Testing Regional Homogeneity Assumption

Regional analysis requires the definition of regions deemed homogeneous. Gauging stations within homogeneous areas are selected to proceed with the regional frequency analysis. This is a controversial issue in hydrology and is affected by subjective judgments. In general, the type of data used to define these regions might be local statistics or local characteristics (Naghettini & Pinto, 2007). Examples of local statistics are metrics such as skewness and kurtosis for the available sample data of the gauges. Local characteristics encompass hydrometeorological variables, hydrologic signatures, or other physical attributes such as coordinates, and altitude, to name a few.

Among the several approaches to define the gauging stations that are part of a homogeneous region one might list: geographical proximity (nearby catchments are grouped based on an arbitrary proximity criterion), subjective grouping (catchments are arbitrarily grouped based on criteria such as similar climate features, topography , rain patters, etc.), objective grouping (a metric and a threshold are specified to group similar catchments such as likelihood ratios, skewness, kurtoses, etc.), cluster analysis, to name a few.

In this question, we have used the test of hydrologic homogeneity proposed in Dalrymple (1960), although other tests might also be applied and provide different conclusions.

The test is also described in Ponce (1989) as follows:

  • For each \(i^{th}\) station, use its frequency curve to determine the 2.33 year and the 10 year floods.

  • For each \(i^{th}\) station, calculate the 10 year peak flow ratio, i.e., the ratio of the 10 year flood to the 2.33 year flood.

  • Calculate the average of the 10 year peak flow ratios for all stations.

  • For each \(i^{th}\) station, multiply the 2.33 year flood by the average 10 year peak flow ratio to obtain an adjusted 10 year peak flow.

  • For each \(i^{th}\) station, use its frequency curve to determine the return period \(T_i\) for the adjusted 10 year peak flow.

  • For each \(i^{th}\) station, plot the return period \(T_i\) versus the length of the record \(n\), in years. Points located within the confidence limits are considered to be hydrologically homogeneous. Points lying outside of the solid lines should not be used in the calculation of the median peak flow ratio.

The described procedure was implemented in R. Code lines are presented below. The assessment has shown that Catchment 08MH141 lies outside the confidence limits (red solid lines in the Plot). Therefore, this Catchment should not be used in the analysis based on this method.

#Test of hydrologic homogeneity

#Adjusted Q10
adustedQ10=fitted.discharges[,1]*mean(adim.discharges[1:7,3])
adustedQ10
##    08GA010    08GA061    08GA065    08MH006    08MH058    08MH076    08MH141 
## 357.656849   4.778907   5.782344  69.370453 171.433125  57.683349 115.384248
#Computing Ti for all catchments
accum<-0
fitted.tr10<-0

accum[1]<-cdfpe3(adustedQ10[1],P3.parameters[1,])
fitted.tr10[1]<-1/(1-accum[1])

accum[2]<-cdfgev(adustedQ10[2],GEV.parameters[2,])
fitted.tr10[2]<-1/(1-accum[2])

accum[3]<-cdfgev(adustedQ10[3],GEV.parameters[3,])
fitted.tr10[3]<-1/(1-accum[3])

accum[4]<-cdfpe3(adustedQ10[4],P3.parameters[4,])
fitted.tr10[4]<-1/(1-accum[4])

accum[5]<-cdfpe3(adustedQ10[5],P3.parameters[5,])
fitted.tr10[5]<-1/(1-accum[5])

accum[6]<-cdfpe3(adustedQ10[6],P3.parameters[6,])
fitted.tr10[6]<-1/(1-accum[6])

accum[7]<-cdfgev(adustedQ10[7],GEV.parameters[7,])
fitted.tr10[7]<-1/(1-accum[7])


ajustedTR10<-c(fitted.tr10[1],fitted.tr10[2],fitted.tr10[3],fitted.tr10[4],fitted.tr10[5],fitted.tr10[6],fitted.tr10[7])
len<-c(length(data[[1]]$Qpeak.m3.s.),length(data[[2]]$Qpeak.m3.s.),length(data[[3]]$Qpeak.m3.s.),length(data[[4]]$Qpeak.m3.s.),length(data[[5]]$Qpeak.m3.s.),length(data[[6]]$Qpeak.m3.s.),length(data[[7]]$Qpeak.m3.s.))


#Plot
plot(x=len,y=ajustedTR10,xlab="Length of record (years)",ylab="Return Period (Years)",main="Homogeneity Test Chart",log="y",xlim=c(5,100),ylim = c(1.2,150))


#Confidence limits based in Dalrymple (1960)
xLimits<-c(5,7.5,10,12.5,15,17.5,20,25,30,35,40,45,50,60,80,100)
upperLimity<-c(150.5,95,70,60,50,45,40,33,30,28,25,23,21,20.3,19.9,19.99)
lowerLimity<-c(1.21,1.5,1.75,2.25,2.4,2.6,2.7,2.85,3.1,3.6,3.95,4.1,4.5,4.8,4.95,4.99)

lines(xLimits,upperLimity,col="red")
lines(xLimits,lowerLimity,col="red")

#Removed station
text(31.5, 75,stationID[7],
     cex=0.8, pos=3,col="red") 

4.3 Design Flow Events for the Mossom Creek System

To compute the design flow events for the Mossom Creek, the regional analysis described in the previous section was updated following the same described steps after removing Catchment 08MH141 from the analysis, since it was considered not part of a homogeneous area.

This procedure resulted in the following Regional Fit, and Regression Analysis.

########################################################################
########################################################################
#####Updated regional analysis
########################################################################

################################################
#Computing dimensionless discharges for return periods removing station 08MH141
##################################################

adim.discharges.final=matrix(NA,nrow=7,ncol=length(returnPeriods))
colnames(adim.discharges.final)=c("Q2.33/Q2.33","Q5/Q2.33","Q10/Q2.33","Q25/Q2.33","Q50/Q2.33","Q100/Q2.33","Q200/Q2.33")
rownames(adim.discharges.final)=c(stationID[-7],"Median")

adim.discharges.final[1,]=fitted.discharges[1,]/fitted.discharges[1,1]
adim.discharges.final[2,]=fitted.discharges[2,]/fitted.discharges[2,1]
adim.discharges.final[3,]=fitted.discharges[3,]/fitted.discharges[3,1]
adim.discharges.final[4,]=fitted.discharges[4,]/fitted.discharges[4,1]
adim.discharges.final[5,]=fitted.discharges[5,]/fitted.discharges[5,1]
adim.discharges.final[6,]=fitted.discharges[6,]/fitted.discharges[6,1]
adim.discharges.final[7,]=c(median(adim.discharges.final[1:6,1]),median(adim.discharges.final[1:6,2]),median(adim.discharges.final[1:6,3]),median(adim.discharges.final[1:6,4]),median(adim.discharges.final[1:6,5]),median(adim.discharges.final[1:6,6]),median(adim.discharges.final[1:6,7]))



adim.discharges.final %>%
  kbl(caption = "Table 7. Catchments ID and their respective dimensionless discharges for different return periods after removing 08MH141",col.names = c("Q2.33/Q2.33","Q5/Q2.33","Q10/Q2.33","Q25/Q2.33","Q50/Q2.33","Q100/Q2.33","Q200/Q2.33")) %>%
  kable_classic(full_width = F, html_font = "Cambria")
Table 7. Catchments ID and their respective dimensionless discharges for different return periods after removing 08MH141
Q2.33/Q2.33 Q5/Q2.33 Q10/Q2.33 Q25/Q2.33 Q50/Q2.33 Q100/Q2.33 Q200/Q2.33
08GA010 1 1.225803 1.395786 1.595493 1.734886 1.867370 1.994600
08GA061 1 1.359424 1.690135 2.163366 2.559338 2.994785 3.475173
08GA065 1 1.270691 1.530044 1.916912 2.253798 2.637115 3.074579
08MH006 1 1.320896 1.575366 1.885183 2.107115 2.321699 2.530711
08MH058 1 1.304333 1.544693 1.836546 2.045211 2.246717 2.442793
08MH076 1 1.403872 1.761283 2.226973 2.576081 2.923325 3.269169
Median 1 1.312615 1.560030 1.901048 2.180456 2.479408 2.802645
#Plot Regional Fit-Index Flood
plot(x=returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[1,2:length(returnPeriods)],xlab="",ylab="",ylim=c(1,4),pch=0,cex.axis=1.3,cex.main=1.3,cex.lab=1.3,log="x",main="Index Flood - Regional Fit-Final")
points(returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[2,2:length(returnPeriods)],pch=2,col="black")
points(returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[3,2:length(returnPeriods)],pch=3,col="black")
points(returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[4,2:length(returnPeriods)],pch=11,col="black")
points(returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[5,2:length(returnPeriods)],pch=15,col="blue")
points(returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[6,2:length(returnPeriods)],pch=16,col="red")
lines(returnPeriods[2:length(returnPeriods)],y=adim.discharges.final[7,2:length(returnPeriods)],lty=1,lwd=2,col="red")

mtext(side=1, line=2, "Return Period (year)", font=2,cex=1.3)
mtext(side=2, line=3, "Q/Q2.33", font=2, cex=1.3)

grid(nx=NULL, ny=NULL, col= "lightgray", lty="dotted", equilogs=FALSE)

legend("topleft", legend=c(stationID[-7],"Median (Regional Fit)"),
       col=c("black","black","black","black","blue","red","red"), lty=c(NA,NA,NA,NA,NA,NA,1),lwd=c(NA,NA,NA,NA,NA,NA,2),pch=c(0,2,3,11,15,16,NA), cex=1)

#Regression after removing Catchment 08MH141

#Q2.33 discharge for all catchments
#Q2.33 discharge for all catchments
Q2.33=fitted.discharges[1:6,1]


#Linear Regression
regression<-lm(log(Q2.33)~log(dA[1:6]))

#Coefficients
coeff<-regression$coefficients
y<-log(dA[1:6])*coeff[2]+coeff[1]

plot(log(dA[1:6]),log(Q2.33),xlab="",ylab="",main="Regression Analysis - Final - Drainage Area X Q2.33")
lines(log(dA[1:6]),y,lty=1,col="red")
## sign check to avoid having plus followed by minus for negative coefficients
eq <- paste0("ln(Q2.33)= ", coeff[1],
             ifelse(sign(coeff[2])==1, " + ", " - "), abs(coeff[2]), " ln(DA) ")

mtext(side=1, line=3, expression('ln(DA[' ~ km^2 ~'])'), font=2,cex=1.3)
mtext(side=2, line=2, expression('ln(Q2.33[' ~ m^3/s ~'])'), font=2, cex=1.3)


## printing of the equation
mtext(eq, 3, line=-2)

To regionalize the design flow events for the Mossom Creek we first compute the Q2.33 discharge at the study site using the Mossom Creek drainage area (3.188 km²) and the final regression analysis equation.

\[ ln(Q_{2.33})=0.09769367087988+0.990996382591051 \cdot ln(DA) \] \[ ln(Q_{2.33})=0.09769367087988+0.990996382591051 \cdot ln(3.188) \] \[ ln(Q_{2.33})=4.150293 \] \[ Q_{2.33}=e^{4.150293} \] \[ Q_{2.33}=63.45 \ m^3/s \]

Now the dimensionless regional discharges for the different return periods from the regional fit (represented by the median in Table 7) is multiplied by the Q2.33 discharge computed to Mossom Creek resulting in the design flow events for the Mossom Creek, summarized in Table 8.

#Computing Q2.33 at MossomCreek using regression analysis
lnQ2.33=coeff[2]+coeff[2]*3.188
Q2.33=exp(lnQ2.33)


MossomCreek=adim.discharges.final[7,4:7]*Q2.33
names(MossomCreek)=c("Q25","Q50","Q100","Q200")

MossomCreek=rbind(c("Q25","Q50","Q100","Q200"),MossomCreek)

MossomCreek %>%
  kbl(caption = "Table 8. Design flows at Mossom Creek (m³/s)",col.names = c("","","","")) %>%
  kable_classic(full_width = F, html_font = "Cambria")
Table 8. Design flows at Mossom Creek (m³/s)
Q25 Q50 Q100 Q200
MossomCreek 120.626381183791 138.355578475596 157.324799272791 177.83504230504

5.CONCLUSIONS

We have computed the design flow events corresponding to 25, 50, 100, and 200 year return periods for the Mossom Creek system by applying a regionalization analysis based on the index-flood method. There are different regionalization techniques that might bring different outputs. For this assignment we have focus only on the index-flood method, however, it would be a good designing practice evaluating the sensitivity of the design flow events to different regionalization methods depending on the level of significance and risk tolerance assumed for the project.

Similarly, there are also different homogeneity test strategies that might be complementary evaluated. For the sake of simplicity, we have used the 10 year flow ratio. Although some studies have shown that homogeneity may be assumed on the basis of the 10 year peak flow ratio (Ponce, 1998), the individual frequency curves may show wide and sometimes systematic differences at longer run periods.

REFERENCES

Chin, D.A. (2014). Water Resources Engineering. 3rd Edition. Pearson Education, Inc. upper Saddle River, New Jersey. ISBN:0-13-283321-2.

Dalrymple, T. (1960). Flood-frequency analyses, Manual of Hydrology: Part.3. Flood-flow Techniques, Geological Survey Water Supply Paper 1543-A. U.S. Government Printing Office, Washington, D.C.

Filliben, J.J. (1975). The probability plot correlation coefficient test for normality. Technometrics 17, 111–117. http://dx.doi.org/10.2307/1268008.

Hosking, J. R. M. & Wallis, J. R. (1997). Regional Frequency Analysis - An Approach Based on L-Moments. Cambridge University Press, Cambridge, UK. ISBN: 13-978-0-521-43045-6.

McCuen, R.H. (1998). Hydrologic Analysis and Design. 2nd Edition. Pretence-Hall. Upper Saddle River, New Jersey. ISBN: 0-13-134958-9.

Naghettini, M. & Pinto, E.J.A. (2007). Hidrologia Estatística. CPRM. Belo Horizonte. ISBN: 978-85-7499-023-1.

Ponce, V.M. (1989). Engineering Hydrology: Principles and Practices. Pretence-Hall, Englewood Cliffs, New Jersey. NY. ISBN: 0-13-277831-9.

Tung, Y.K.; Yen, B.C.; Melching, C.S. (2006). Hydrosystems Enineering Reliability Assessment and Risk Analysis.. McGraw-Hill, New York. ISBN: 0-07-145158-7.