CE 470: Stream Riparian Systems and Fluvial Morphology

Name: Cassio Rampinelli

Date: September 15, 2019

Assignment: Homework 1

Note: This script was done in R programming language

______________________________________________________________________________

1.PURPOSE

This assignment is intended to evaluate hydrologic data from a flow gauge in the East Branch of the Ausable River at Ausable Froks, NY (Site Number: 04275000). The analysis was performed in order to achieve the following goals:

Develop a flow duration curve for the daily mean flow values;
Develop flood frequency curves using both the Gumbel and Log Pearson Distributions for:
- Annual Duration Series;
- Partial Duration Series;
Develop a stage-discharge rating curve using both the measured and gauge data and fit a regression curve to each set of data.

The description of the methodology , data collection, as well as all calculations are presented in details in the following sections.

2.METHODS

2.1 Site Location and Data

The study area is located at Au Sable Forks, Essex County, NY. The USGS flow gage number is 04275000, positioned at the East branch Ausable River (Latitude 44°26’14.6“, Longitude 73°40’51.5”). The drainage area is 198 \(mi^2\). Two data set types were employed. One consists of daily mean discharge and the other refers to the measured discharges with the related water levels. The former was employed to develop the flow duration curve and the frequency analysis. The latter was used to stablish the rating curve. Both data sets were downloaded from the USGS web site as a txt table format and loaded as a data frame format in R programming language. Figure 1 shows a layout of the study site.

Figure 2.1. Layout of the study area

2.1.1 Daily mean discharge data set

The daily mean discharge data set covers the period between 1924/09/05 to 2019/09/10, with a gap between 1995/09/30 to 2016/03/08.Figure 2 presents the available daily discharge time series.

Figure 2.2. Daily discharge record at Flow Gage 04275000, covering 1924-09-05 to 2019-09-10

Some data manipulation was employed to better accommodate the data by removing unnecessary columns and lines until it resulted in only two columns in which the first column refers to date and the second column contains the discharges. The R code used in this manipulation is presented bellow with some comments. The initial lines of the resultant table is also presented.

#Clear R memory
rm(list=ls()) 

#Load packae to handle time series
library(tseries)

#Load txt file and do manipulations/adjustments
dataSet=read.table(file="dataSet.txt",sep="\t")
dataSet=as.matrix(dataSet)
dataSet=dataSet[,-(1:2)]
dataSet=dataSet[,-3]
dataSet=dataSet[-(1:2),]

#Provide names for the two remaining columns
colnames(dataSet)=c("date","discharge_cfs")

#Provide a brief overview from the initial rows of the columns
head(dataSet)

##      date         discharge_cfs
## [1,] "1924-09-05" "95.0"       
## [2,] "1924-09-06" "91.0"       
## [3,] "1924-09-07" "99.0"       
## [4,] "1924-09-08" "91.0"       
## [5,] "1924-09-09" "83.0"       
## [6,] "1924-09-10" "445"

Finally, a time series object named discharges.ts was created to facilitate the next step computations.The gaps were filled with “NA” (Not a Number) notation. A replot from Figure 2.2 is presented considering the discharges.ts object, as well as a summary of the main statistics.

#Save only the discharges from the dataSet table in a variable named discharges
discharges<-dataSet[,2]
#Convert the variables as numeric
discharges=as.numeric(discharges)

#Create a time series object to be used next
discharges.ts=ts(discharges,frequency=365.25,start=c(1924,9,5))

#Plotting the time series
plot(discharges.ts,main="Flow time series", ylab="Discharges(cms)")

#Present a summary of the main statistics
summary(discharges.ts)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    20.0    85.0   151.0   316.8   330.0  8200.0    7466

2.1.2 Daily mean discharge data set

The following routine show how the original stage-discharge data was loaded. The data consists of water levels (stage) and the associated discharges. After loading the original data, the initial lines of the resultant table is shown as well as a scatter plot. In the Rating Curve section this original data will be manipulated to stablish a rating curve.

#Loading the original data
rc.dataSet=read.table(file="ratingCurveData.txt",sep="\t",header=T)

#Showing the initial rows of the resultant table
head(rc.dataSet)

##   flow_cfs height_ft
## 1     70.1      0.65
## 2    289.0     17.75
## 3     57.5     18.65
## 4     50.3      0.60
## 5     73.4      0.65
## 6     52.6      1.06

#Scatter plot of the data
plot(x=rc.dataSet[,1],y=rc.dataSet[,2],xlab="Discharge (cfs)",ylab="Stage (ft)",main="Original Stage-Discharge Data")

2.2 Flow Duration Curve

The flow-duration curve (FDC) is the relation between the magnitudes of streamflow, q, at a point and the frequency (probability) with which those magnitudes are exceeded over an extended time period [2]. In statistical terms, the FDC is a graph plotting the magnitudes, q, of the variable Q (average daily flow, y-axis) vs. the fraction of time, \(EP_Q(q)\) that Q exceeds any specified value Q = q(x-axis). \(EP_Q(q)\) is called the exceedance probability (or exceedance frequency), and it is related to the pth quantile of streamflow, \(q_p\), as

\[ EP_Q(q_p)=1-F_Q(q_p)=Pr{Q>q_p} \] The R code below manipulates the discharges values in order to obtain the flow duration curve. The code is structured according to the following sequence. • Removing NA values from the flow data base • The flow data is sorted in decreasing order • The percentage of time the flow is equaled or less is computed as the order of the flow divided by the total number of observation multiplied by 100; • A data frame is built joining the percentage of time flow is equaled or less than and the associated discharge

#Re-loading discharges removing NA values
flow<-na.omit(discharges)

#Sorting discharges in decreasing order
flow<-sort(flow,decreasing=T)

#Creating a data frame in which x column is hte percent of ie less than a specific time and
#y is the correspondent discharge.
df<-data.frame(x=100/length(flow)*1:length(flow),y=flow)

#Plot
plot(x = df$x, y = df$y, type = "l", log = "y",ylab="Discharge (cfs)",xlab="Percentage of Time Flow is Equaled or Less Than (%)",main="Flow Duration Curve")
grid()

x=df$x
y=df$y

#Table with the flow duration

percentage=c(5,10,20,30,40,50,60,70,80,90,95,99)

vazoes=c(y[which.min(abs(x - 5))],y[which.min(abs(x - 10))],y[which.min(abs(x - 20))],y[which.min(abs(x - 30))],y[which.min(abs(x - 40))],y[which.min(abs(x - 50))],y[which.min(abs(x - 60))],y[which.min(abs(x - 70))],y[which.min(abs(x - 80))],y[which.min(abs(x - 90))],y[which.min(abs(x - 95))],y[which.min(abs(x - 99))])

duration.dataframe<-cbind(percentage,vazoes)


colnames(duration.dataframe)=c("%","Discharge(cfs)")

duration.dataframe

##        % Discharge(cfs)
##  [1,]  5           1150
##  [2,] 10            738
##  [3,] 20            414
##  [4,] 30            273
##  [5,] 40            200
##  [6,] 50            151
##  [7,] 60            120
##  [8,] 70             94
##  [9,] 80             76
## [10,] 90             57
## [11,] 95             46
## [12,] 99             35

2.3 Flood Frequency Analysis

The flood frequency analysis employed both the Gumbel and Log Pearson Distributions for annual and partial duration series. For the annual duration series each year contributes one value to the series. For the partial duration series those events whose magnitude exceeded 2000 cfs were chosen. This threshold was arbitrarily adopted based on the order of magnitude of the lowest floods along all the years.A brief overview of Probability Density Function (PDF) and of the Cumulative Density Function (CDF) for both distributions is introduced. Next, the R code developed to fitting both distributions is shown with comments. Plots confronting the empirical probabilities and recurrence time with the fitting for both distributions are presented.

Gumbel distribution (Extreme-value Type I)

The Probability Density Function (PDF) for the Type I extreme-value distribution (Gumbel) for maximum values is given by the following expression:

\[ f(x)=\frac{1}{\alpha}exp \Bigg[-\frac{x-\xi}{\alpha}-exp\Bigg(-\frac{x-\xi}{\alpha}\Bigg)\Bigg] \]

In which \(x\) represents the discharge (associated with a recurrence time), \(\alpha\) and \(\xi\) are the scale and location parameters respectively.

The Cumulative Distribution Function (CDF) is given by:

\[ F(x)=exp\Bigg[-exp\Bigg(-\frac{x-\xi}{\alpha}\Bigg)\Bigg] \] The population parameters (\(\alpha\), \(\xi\)) must be estimated. To do so, the Method of Moments, Method of L-Moments or the Maximum-Likelihood Method can be applied. The details about each method is described in [1]. For the parameters estimates, a function from the R package (lmon) will be applied. This function uses the Method of L-Moments. The developed R code is presented bellow, as well as some plots with the outputs of the fit. Some comments are provided in the text code.

Log Pearson Type II Distribution (LP3)

The LP3 distribution has shape, scale and location parameters \(\alpha\), \(\beta\), \(\tau\) with the following PDF[3]

\[ f(x)=\frac{1}{|\beta|\Gamma(\alpha)}\Bigg(\frac{x-\tau}{\beta}\Bigg)^{\alpha-1}exp\Bigg(-\frac{x-\tau}{\beta}\Bigg) \] defined for \(\alpha>0\) and \((x-\tau)/\beta>0\), with \(\Gamma(\alpha)=\)complete gamma function. In this case, \(x\) is the natural logarithm of the discharges.

The CDF of the LP3 is written with \(z=\frac{ln(x)-\tau}{\beta}\) [3] as:

\[ F(x)=\frac{G[\alpha,z]}{\Gamma(\alpha)}, for\ \gamma_x>0, \ exp[\tau]\le x < \infty; \]

\[ F(x)=1-\frac{G[\alpha,z]}{\Gamma(\alpha)}, for \ \gamma_x<0, \ 0< x\le \infty; \]

\(G[\alpha,z]=\)lower incomplete gamma function.

As the Gumbel distribution, a R function was applied to estimate the parameters as shown in the R code bellow.

2.3.1 Annual Duration Series

#Loading Packages to manipulate time series data
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

#Loading
library(lmom)

#Creating a daily index for all data range
data<-data_frame(date=seq(as.Date("1924-09-05"),as.Date("2019-09-10"),by=1),amount=discharges.ts)

#Grouping maximum average daily discharge for each year
max.by.year<-data %>% group_by(year=floor_date(date, "year")) %>% summarize(amount=max(amount))

#Plotting the maximum discharges by year
plot(max.by.year,type="l",ylab="Discharge (cfs)",main="Maximum Daily Average Discharge")

#Recording the maximum discharges by year and removing N.A. values
maximas<-max.by.year$amount
maximas<-maximas[!is.na(maximas)]

#Sorting maxima by decreasing order
sorted.maximas<-sort(maximas,decreasing=T)

#Computing the empirical probabilities
p<-(c(1:length(sorted.maximas)))/(length(sorted.maximas)+1)

#Computing the recurrence time
tr<-1/p

#Estimating the parameters for Gumbel distribution
fit<-samlmu(maximas)
para<-pelgum(fit)
para

##       xi    alpha 
## 3653.891 1150.727

#Estimating the parameters for Log Pearson type 3 distribution
para3<-pelpe3(fit)
para3

##           mu        sigma        gamma 
## 4318.1081081 1439.4404259    0.7594543

#Plot cumulative probability x discharges for empirical and fitting distribution
plot(1-p,sorted.maximas,ylab="discharge (cfs)",xlab="Cumulative probability",main="")

#Log pearson type 3 fitting
lines(cdfpe3(sorted.maximas,para3),sorted.maximas,col="red")

#Gumbel fitting
lines(cdfgum(sorted.maximas,para),sorted.maximas,col="blue",lty=2)
grid()
#Legend
legend("topleft", legend=c("LP3", "Gumbel"),
       col=c("red", "blue"), lty=1:2, cex=1)

#Plotting empirical recurrence time and discharges
plot(tr,sorted.maximas,xlab="Recurrence Time (years)",ylab="discharge (cfs)",ylim=c(1200,9000),xlim=c(0,80))
grid()

#Fitting recurrence time employing Gumbel distribution
y<-c(9000,sorted.maximas)
gumbel.accum<-cdfgum(y,para)
fitted.tr<-1/(1-gumbel.accum)
lines(fitted.tr,y,col="blue",lty=2)

#Fitting recurrence time emplyoing Log Pearson 3 distribution
lp3.accum<-cdfpe3(y,para3)
fitted.tr3<-1/(1-lp3.accum)
lines(fitted.tr3,y,col="red")
legend("topleft", legend=c("LP3", "Gumbel"),
       col=c("red", "blue"), lty=1:2, cex=1)

2.3.2 Partial Duration Series

The same general process employed for the partial annual duration series was used for the partial duration series, except for the fact that all the discharges higher the 2000 cfs were taking in consideration to compose the maximum discharges time series. The R code below starts selecting the discharges higher than 2000 cfs and sorting them by decreasing order. Then, the same process from the previous section was repeated.

#Selecting 
flow<-na.omit(discharges)
partial.Data<-flow[which(flow>2000)]
partial.Data.sorted<-sort(partial.Data,decreasing=T)


#Plotting the maximum discharges by year
plot(partial.Data,type="l",ylab="Discharge (cfs)",xlab="# observations higher than 2000 cfs",main="Discharges higher than 2000 cfs")

#Computing the empirical probabilities
p<-0
p<-(c(1:length(partial.Data.sorted)))/(length(partial.Data.sorted)+1)

#Computing the recurrence time
tr<-1/p

#Estimating the parameters for Gumbel distribution
fit<-samlmu(partial.Data)
para<-pelgum(fit)
para

##        xi     alpha 
## 2552.8249  750.5112

#Estimating the parameters for Log Pearson type 3 distribution
para3<-pelpe3(fit)
para3

##          mu       sigma       gamma 
## 2986.031746 1059.235634    2.153598

#Plot cumulative probability x discharges for empirical and fitting distribution
plot(1-p,partial.Data.sorted,ylab="discharge (cfs)",xlab="Cumulative probability",main="")

#Log pearson type 3 fitting
lines(cdfpe3(partial.Data.sorted,para3),partial.Data.sorted,col="red")

#Gumbel fitting
lines(cdfgum(partial.Data.sorted,para),partial.Data.sorted,col="blue",lty=2)
grid()
#Legend
legend("topleft", legend=c("LP3", "Gumbel"),
       col=c("red", "blue"), lty=1:2, cex=1)

#Plotting empirical recurrence time and discharges
plot(tr,partial.Data.sorted,xlab="Recurrence Time (years)",ylab="discharge (cfs)",ylim=c(1200,9000),xlim=c(0,80))
grid()

#Fitting recurrence time employing Gumbel distribution
y<-c(9000,partial.Data.sorted)
gumbel.accum<-cdfgum(y,para)
fitted.tr<-1/(1-gumbel.accum)
lines(fitted.tr,y,col="blue",lty=2)

#Fitting recurrence time emplyoing Log Pearson 3 distribution
lp3.accum<-cdfpe3(y,para3)
fitted.tr3<-1/(1-lp3.accum)
lines(fitted.tr3,y,col="red")
legend("topleft", legend=c("LP3", "Gumbel"),
       col=c("red", "blue"), lty=1:2, cex=1)

2.4 Rating Curve

The Stage-Discharge rating curve refers to the relation between the water level and the associated discharge. For this task a power regression was applied to fit a rating curve to the data. The following equation model was adopted:

\[ h=a\cdot Q ^b \] Where

h= stage, or water level (ft)

Q = discharge (cfs)

a,b = coefficients to be determined

Two models were adjusted. One model considered all the available data without any analysis and another regression model was fitted after taking in account only the clear measurements (days with sediment, ice cover or unreported condition was disregarded). The developed R code as well as the final plot after the regression analysis is presented below. The two pairs of coefficients for each model is also shown.

#Generating a power regression for all data
m<-nls(rc.dataSet$height_ft~a*rc.dataSet$flow_cfs^b,data=rc.dataSet,start=list(a=1,b=1))

#Summary of the regression statistics
summary(m)

## 
## Formula: rc.dataSet$height_ft ~ a * rc.dataSet$flow_cfs^b
## 
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)    
## a 0.301915   0.018723   16.12   <2e-16 ***
## b 0.357766   0.008995   39.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.032 on 844 degrees of freedom
## 
## Number of iterations to convergence: 11 
## Achieved convergence tolerance: 8.067e-06

#Coefficients for the first model
a <- round(summary(m)$coefficients[1], 3)
b <- round(summary(m)$coefficients[2], 3)
a

## [1] 0.302

## [1] 0.358

#Plotting all data
plot(rc.dataSet$height_ft~ rc.dataSet$flow_cfs, main = "Rating Curve ",xlab="Discharge (cfs)",ylab="Height")

#plotting the first regression model
q <- seq(10, 25000, length = 300)
lines(q, a*q^b, lty = 1, col = "black")

#Loading the clear measurements
clear.data<-read.table(file="clearData.txt",header = T)

#Ploting the clear measurements
x=clear.data$discharge_cfs
y=clear.data$height_ft
points(x,y,pch=17,col="red")


#Generating a power regression for the second model
m2<-nls(clear.data$height_ft~a2*clear.data$discharge_cfs^b2,data=clear.data,start=list(a2=1,b2=1))

#Summary of the second regression
summary(m2)

## 
## Formula: clear.data$height_ft ~ a2 * clear.data$discharge_cfs^b2
## 
## Parameters:
##    Estimate Std. Error t value Pr(>|t|)    
## a2 0.187959   0.004580   41.04   <2e-16 ***
## b2 0.420344   0.003171  132.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1525 on 121 degrees of freedom
## 
## Number of iterations to convergence: 12 
## Achieved convergence tolerance: 4.02e-07

#Coefficients for the second model
a2 <- round(summary(m2)$coefficients[1], 3)
b2 <- round(summary(m2)$coefficients[2], 3)
lines(q, a2*q^b2, lty = 2, col = "red")

a2

## [1] 0.188

b2

## [1] 0.42

#Plot legend
legend("top", legend=c("All Data", "Clear Measurements","Regression-all data", "Regression-clear Measurements"),
       col=c("black", "red","black","red"), lty=c(NA,NA,1,2),pch=c(1,17,NA,NA))
grid()

All Data Regression \[ h=0.302\cdot Q ^{0.358} \]
Clear Measurements Regression \[ h=0.188\cdot Q ^{0.42} \]

3.DISCUSSION

The flow duration curve indicates the percentage of days in a year the mean daily flow is equaled or exceeded. The shape of the flow duration curve reflects the ability of the basin to store water temporarily in the ground and to release it later as a contribution from groundwater[4] . As can be seen in the presented flow duration curve, the low-flow end of the curve is steep what might indicate that the amount of groundwater storage above the channel bed level is small for this site. In addition, the general configuration of the obtained flow duration curve is relatively steep what is related to a variable stream. The relatively high variability of the streamflow can also be checked by comparing the discharge that is sustained 50% (151 cfs) of the time with the one that is maintained 99% of the time (35 cfs). The former is more than 4 times greater than the latter.

Regarding the frequency analysis, for the annual duration time series, as observed in the cumulative probability plot, by a visual assessment, both fit performed relatively well considering the observed empirical probability and the fitted models (Gumbel and LP3). However, comparing both models by the Discharge x Recurrence time plot, the graph indicates that from the recurrence time of around 15 years, the Gumbel model results in higher discharges than the LP3 model. For flows associated with recurrence times less than 15 years, both models are basically equivalent.

For the partial duration series, the cumulative probability plot visually indicates a better performance for the LP3 model that generated a regression curve well fitted to the data in comparison to the Gumbel model. The Discharge x Recurrence Time plot also corroborates to visually demonstrate that LP3 better fitted the observed data. LP3 model also resulted in higher magnitudes for predicted flows when compared to the Gumbel model.

Comparing the fits for both types of series, the annual time series results in higher magnitudes for the discharges, considering the same recurrence time of the partial duration series, regardless the adopted distribution. For designing purpose, the annual time series is usually most adopted in practical designing models around the world. Once for the annual time series both models well fitted the data, it is believed that both could be applied in practical cases. However, taking in account the different sources of uncertainties (statistical model parameters uncertainties, uncertainty related to the measurements and to the rating curve model), one can argue that employing the Gumbel model (that generates higher flow values) would be interesting for the sake of the safety.

Finally, regarding the rating curve, the scatter plot for all data shows some discrepancies associated with some data. The station report provides several notes indicating days in which ice and sediment debris were observed during the discharge measurements. Certainly, this occurrence interferes in the quality and representativeness of the data. In order to avoid that, a regression model was fitted only for the clear measurements resulting in a better agreement between the curve and the data. However, in the rating curve plot for discharges higher than 7,500 cfs the regression model underestimates the flow. This might be related to the channel characteristics that changes its main features for water levels higher than 7.5 ft. An alternative to better adjust the regression, would be fit another curve only for data above this threshold. In this case, the rating curve should be composed by a piece wise function, and the uncertainty associated with higher discharges would be slightly reduced.

4.REFERENCES

[1] Chin, D.A. (2014) Water Resources Engineering, 3rd Edition.

[2] Dingman, S. L. (2002) Physical Hydrology, 2nd Edition.

[3] Griffis, V.W. Stendiger, J.R. (2007). Log-Pearson Type 3 Distribution and Its Application in Flood Frequency Analysis. I: Distribution Characteristics. Journal of Hydrologic Engineering, 12 (5):482-491

[4] Leopold,L.B. (1994) A View of the River, Harvard University Press.

[5] USGS. National Water Information System. Accessed in 9/12/2019. https://waterdata.usgs.gov/nwis/dv?cb_00060=on&format=gif_stats&site_no=04275000&referred_module=sw&period=&begin_date=1924-09-05&end_date=2019-09-10