Tutorial 3: Hydrological analysis using probability concepts

This tutorial will look into different ways historical streamflow data can be analysed in R using Probability concepts.

Looking at the Monthly flows in the South Maroochy River

Plotting summary statistics for each month can be used to identify seasonality in flow data. A box and whisker plot of monthly flows is a good way to show this seasonality. A plot will be made of flows in the South Maroochy River.

First we get the monthly flow report for the 141001B South Maroochy River at Kiamba. The information can be accessed from the [Water Monitoring Information Portal] (https://water-monitoring.information.qld.gov.au/). Select custom outputs and check Stream Discharge (Megalitres/Day) and make sure Rainfall is unchecked. Select Download for the output and Monthly for the data interval.

library(zoo)
library(hydroTSM)
library(lubridate)  # to create an index of months from zoo time series.
library(lattice)

stdate<-strsplit(toString(dd[1,1])," ")[[1]][1]
findate<-strsplit(toString(dd[nrow(dd),1])," ")[[1]][1]
date_ind<-seq(as.Date(stdate,format="%d/%m/%y"),as.Date(findate,format="%d/%m/%y"),by="month")

#can automate the number of column numbers (eg. c(2, 4, 6)) based on the number of variables
zd<-zoo(dd[,c(2,4,6)],date_ind)
tt <- time(zd)
zz <- cbind(zd, year = year(tt), month = month(tt), day = day(tt))

Jan <- dd[,2][zz$month==1]
Jan <- na.omit(Jan)
Feb <- dd[,2][zz$month==2]
Feb <- na.omit(Feb)
Mar <- dd[,2][zz$month==3]
Mar <- na.omit(Mar)
Apr <- dd[,2][zz$month==4]
Apr <- na.omit(Apr)
May <- dd[,2][zz$month==5]
May <- na.omit(May)
Jun <- dd[,2][zz$month==6]
Jun <- na.omit(Jun)
Jul <- dd[,2][zz$month==7]
Jul <- na.omit(Jul)
Aug <- dd[,2][zz$month==8]
Aug <- na.omit(Aug)
Sep <- dd[,2][zz$month==9]
Sep <- na.omit(Sep)
Oct <- dd[,2][zz$month==10]
Oct <- na.omit(Oct)
Nov <- dd[,2][zz$month==11]
Nov <- na.omit(Nov)
Dec <- dd[,2][zz$month==12]
Dec <- na.omit(Dec)

boxplot(Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec, log="y", main=" South Maroochy River at Kiamba", ylab="Average Daily Flow (ML/day)", names=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"), range=20)
points(c(mean(log(Jan)),mean(log(Feb)),mean(log(Mar)),mean(log(Apr)),mean(log(May)),mean(log(Jun)),mean(log(Jul)),mean(log(Aug)),mean(log(Sep)),mean(log(Oct)),mean(log(Nov)),mean(log(Dec))), col="red", pch=18)
legend('bottomright',legend="mean", col="red", pch=18)
Figure 1.2.6: Boxplot of the monthly flows at Kiamba

Figure 1.2.6: Boxplot of the monthly flows at Kiamba

Figure 1.2.6 shows that the greatest flows are from February to April and the lowest flows are in September and October. This is fairly typical of a river in South East Queensland.

Coefficient of variation and L-moments

The Coefficient of variation is often used to measure the variability of annual flows. It is described by the following equation: \[C_v = \frac{S_Q}{Q}\] \[\text{where } Q = \text{ mean annual discharge and } S_Q = \text{ standard deviation of the mean annual discharge.}\]

A high value of Cv indicates that flows are more variable from year to year. For example, Joyces creek has a mean annual discharge of 9870 ML and a standard deviation of 7850. The coefficient of variation is 7850/9870 = 0.80. The average coefficient of variation for Australian streams under 1000km^2 is 0.59. This means that Joyces creek is more variable than other Australian creeks of similar sizes.

L-moments are an alternative method of summarising data. This is done by calculating 4 L-moments which can be used to obtain the variation and skewness of the information. To show the calculation of an L-moment a small data set called example will be created example <- c(12,42,54,16). The first L moment is simply the mean of the mean of the data L1 <- mean(c(12,42,54,16)). The second L-moment is calculated as follows:

  1. Draw a subsample of two from all the available data and subtract the smaller value from the larger value. For example a sub sample with the first two numbers 12 and 42: 42-12.

  2. Do this for every possible combination of the two data values c(42-12,54-12,16-12,54-42,42-16,54-16).

  3. Get the average difference between the two values mean(c(42-12,54-12,16-12,54-42,42-16,54-16)).

  4. Divide this average by two and you have the estimated value of the second L moment mean(c(42-12,54-12,16-12,54-42,42-16,54-16))/2.

The equations for the 1st and second L-moments are: \[\lambda_1 = E[X_{(1:1)}]\] \[\lambda_2 = \frac{1}{2}E[X_{(2:2)}-X_{(1:2)}]\] The equations for the 3rd and 4th are: \[\lambda_3 = \frac{1}{3}E[X_{(3:3)}-2X_{(2:3)}+X_{(1:3)}]\] \[\lambda_4 = \frac{1}{4}E[X_{(4:4)}-3X_{(3:4)}+3X_{(2:4)}-X_{(1:4)}]\]

\[\text{where } X_{(i:j)} \text{ stands for the } i \text{th smallest variable in a sample size } j\]

The code below shows how each L-moment is calculated.

example <- c(12,42,54,16)
L1 <- mean(c(12,42,54,16))
L2 <- mean(c(42-12,54-12,16-12,54-42,42-16,54-16))/2
L3 <- mean(c(54-2*42+12,42-2*16+14,54-2*16+12,54-2*42+16))/3
L4 <- mean(c(54-3*42+3*16-12))/4
paste(c("1st L-moment: ", signif(L1,3), ", 2nd L-moment: ", signif(L2,3), ", 3rd L-moment: ", signif(L3,3), ", 4th L-moment: ", signif(L4,3)), collapse="")
## [1] "1st L-moment: 31, 2nd L-moment: 12.7, 3rd L-moment: 2.17, 4th L-moment: -9"

Information about the data can now be obtained using the L-moment ratios. The ratios are defined as: \[\tau_2=\frac{\lambda_2}{\lambda_1}\] \[\tau_3=\frac{\lambda_3}{\lambda_2}\] \[\tau_4=\frac{\lambda_4}{\lambda_2}\]

\[\text{where } \tau_2 \text{ is the L-coefficient of variation } (L-C_v), \tau_3 \text{ is the L-coefficient of skewness, and }\] \[\tau_4 \text{ is the L-coefficient of kurtosis.}\] It has been shown how to calculate the L-moments of a data set containing 4 values but what about the typical large data sets containing many years of historical information that we are likely to encounter in out engineering career? It is impractical to calculate these by hand and even time consuming to do it in Excel. Fortunately R has a function that can calculate these almost instantly.

The data frame Joyces contains annual discharges for Joyces Creek at Strathlea (the same information which can be found in Appendix 1 of Ladson). The Lmoments() function from the Lmoments package will be used to calculate the L-moments of the data frame.

#install.packages("Lmoments", repos="http://cran.rstudio.com/")
library(Lmoments)
Lmom <- Lmoments(Joyces[,2])

paste(c("The L-moment coefficient of variation is ", signif((Lmom[2]/Lmom[1]),3)), collapse="")
## [1] "The L-moment coefficient of variation is 0.452"
paste(c("The L-moment coefficient of skewness is ", signif((Lmom[3]/Lmom[2]),3)), collapse="")
## [1] "The L-moment coefficient of skewness is 0.174"
paste(c("The L-moment coefficient of kurtosis is ", signif((Lmom[4]/Lmom[2]),3)), collapse="")
## [1] "The L-moment coefficient of kurtosis is 0.0323"

Questions

Question 1: Create code in R to determine probability (P) when AEP (p), successes (k) and trials (n) are input. Use this code to determine the probability of 2 50 year floods occurring in 100 years.

Question 2: Present the answer from question 2 as a sentence using R. Example output: P(3 in 100) = 0.182, so therefore there is an 18.2% chance that three 50-year floods will occur in any 100 year period. Hint: the c() function adds the strings and variables to a list and the paste function with collapse = “” combines the list elements and prints it. for example paste(c(“The AEP is”, AEP), collapse = “”) will print “The AEP is 0.02” if the AEP variable is set as 1/50.

Question 3: Create code in R to determine the design AEP when probability (P), successes (k) and trials (n) are input.