EX 5. Streamflow analysis: Annual water balance

In this exercise, you will: - read USGS streamflow data into R
- calculate monthly discharge
- calculate runoff coefficients

I. Load the libraries you’ll need: dataRetrieval

Additional information: The USGS has a website dedicated to the dataRetrieval package: https://waterdata.usgs.gov/blog/dataretrieval/ There are packages for streamflow and water quality data retrieval and analysis.

Here, we install and use the dataRetrieval package from the USGS.

install.packages("dataRetrieval")

library(dataRetrieval)  # USGS package that gets streamflow data direct from the USGS website

See what dataRetrieval can do, using “vignette”:

vignette("dataRetrieval",package = "dataRetrieval")

II. Load and summarize rainfall data

A. Load the daily time series for Poway station.

See Exercise 4. Subset to only post-Oct-1955 (see Ex4).

Name the annual precipitation data.frame “P.annual.wy” with columns “WaterYear” and “Rainfall.mm”.

B. Summarize preciptation data to annual values by water year and plot

See Exercise 4… Look at your data. Is WaterYear correct?

Aggregate the data to annual (water year) total precipitation (see Ex 4) and plot to make sure it worked:

Make sure your column names are “WaterYear” and “Rainfall.mm”.

I. Load and summarize streamflow data

A. Load streamflow at the Los Penasquitos River.

1. Specify what station you’re interested in.

2. See what data is available.

station.site.meta.data <- read_waterdata_monitoring_location("USGS-11023340")
print(station.site.meta.data)

## Simple feature collection with 1 feature and 40 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -117.1217 ymin: 32.9431 xmax: -117.1217 ymax: 32.9431
## Geodetic CRS:  WGS 84
## # A tibble: 1 × 41
##   monitoring_location_id monitoring_location_…¹ county_name altitude_method_code
##   <chr>                  <chr>                  <chr>       <chr>               
## 1 USGS-11023340          11023340               San Diego … X                   
## # ℹ abbreviated name: ¹monitoring_location_number
## # ℹ 37 more variables: horizontal_position_method_name <chr>,
## #   national_aquifer_code <chr>, monitoring_location_name <chr>,
## #   minor_civil_division_code <chr>, altitude_method_name <chr>,
## #   original_horizontal_datum <chr>, aquifer_type_code <chr>,
## #   agency_code <chr>, agency_name <chr>, district_code <chr>,
## #   site_type_code <chr>, vertical_datum <chr>, …

This view is truncated…to see the full view, click on the station.site.meta.data object in the Environment tab.

Read metadata about the time series available at this site:

station.ts.meta.data <- read_waterdata_ts_meta("USGS-11023340")
print(station.ts.meta.data)

## Simple feature collection with 6 features and 19 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -117.1217 ymin: 32.9431 xmax: -117.1217 ymax: 32.9431
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 20
##   time_series_id            parameter_name parameter_code computation_identifier
##   <chr>                     <chr>          <chr>          <chr>                 
## 1 2322b169ab07447fbbba0843… Discharge      00060          Mean                  
## 2 28d9c4a8e02a41e796503689… Discharge      00060          Instantaneous         
## 3 475cc462e04248c5b1b85bd8… Gage height    00065          Instantaneous         
## 4 9dff54f6f7eb40e2942a850d… Stream level,… 63160          Instantaneous         
## 5 da30330107734ebbbafd128c… Discharge      00060          Max At Event Time     
## 6 f021b825fe1949b5bcc2717b… Gage height    00065          Max At Event Time     
## # ℹ 16 more variables: begin_utc <dttm>, end <dttm>, thresholds <chr>,
## #   parent_time_series_id <chr>, unit_of_measure <chr>, last_modified <dttm>,
## #   computation_period_identifier <chr>, parameter_description <chr>,
## #   end_utc <dttm>, sublocation_identifier <chr>, statistic_id <chr>,
## #   web_description <chr>, begin <dttm>, primary <chr>,
## #   monitoring_location_id <chr>, geometry <POINT [°]>

This view is also truncated…to see the full view, click on the station.ts.meta.data object in the Environment tab.

3. Read in daily discharge data direct from the USGS website.

parameter.code = "00060"  # this is the code for stream discharge.
start.date = ""  # Blanks get all of the data
end.date = ""
Qin = read_waterdata_daily("USGS-11023340",parameter.code,time=c(start.date,end.date))
head(Qin)

## Simple feature collection with 6 features and 11 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -117.1217 ymin: 32.9431 xmax: -117.1217 ymax: 32.9431
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 12
##   daily_id              parameter_code time_series_id time       unit_of_measure
##   <chr>                 <chr>          <chr>          <date>     <chr>          
## 1 b163b966-3de2-4786-a… 00060          2322b169ab074… 1964-10-01 ft^3/s         
## 2 07434da0-3f85-4d1c-9… 00060          2322b169ab074… 1964-10-02 ft^3/s         
## 3 9f9c4e16-9972-4431-a… 00060          2322b169ab074… 1964-10-03 ft^3/s         
## 4 0a76230a-e5ed-48fd-a… 00060          2322b169ab074… 1964-10-04 ft^3/s         
## 5 7a0f074c-3b33-4e88-a… 00060          2322b169ab074… 1964-10-05 ft^3/s         
## 6 21270e0e-dcac-4a90-8… 00060          2322b169ab074… 1964-10-06 ft^3/s         
## # ℹ 7 more variables: last_modified <dttm>, monitoring_location_id <chr>,
## #   statistic_id <chr>, value <dbl>, approval_status <chr>, qualifier <chr>,
## #   geometry <POINT [°]>

4. Rename the columns of x.

The names for the discharge and QA columns aren’t very nice, so rename them:

Q.daily.df <- data.frame(Qin[,c("time","value","qualifier")])
names(Q.daily.df) = c("Date","Q.ft.s","QA")
head(Q.daily.df)

##         Date Q.ft.s QA                        NA
## 1 1964-10-01    0.5    POINT (-117.1217 32.9431)
## 2 1964-10-02    0.5    POINT (-117.1217 32.9431)
## 3 1964-10-03    0.5    POINT (-117.1217 32.9431)
## 4 1964-10-04    0.6    POINT (-117.1217 32.9431)
## 5 1964-10-05    0.6    POINT (-117.1217 32.9431)
## 6 1964-10-06    0.6    POINT (-117.1217 32.9431)

5. Tabulate the quality infomation.

The quality codes can be found by googling “usgs streamflow data quality flags”, which will get you here: https://help.waterdata.usgs.gov/codes-and-parameters/instantaneous-value-qualification-code-uv_rmk_cd

Here are the values you’ll see for Los Pen:

Blank: Approved for publication – Processing and review completed.

Create a table of the number of occurrences of each flag:

table(Q.daily.df$QA)  #  Creates a table of the counts of each quality flag value.
#  Based on the quality flags and this table, can you use all of the data?

B. Calculate the depth of annual discharge in mm/yr.

1. Use the aggregate function as you did in Exercise 4 to calculate total rainfall by water year.

Be sure to rename the columns of the data frame that has the annual mean values. ** Also be sure to convert the year and mean to numeric, if they are factors. **

  # What from and to dates should you use for your dataset?
breaks <- seq(from=as.Date("1965-10-01"), to=as.Date("2024-10-01"), by="year")  
years.breaks = as.numeric(format(breaks,"%Y"))
labels.wy = years.breaks[2:length(breaks)]
Q.daily.df$WaterYear <- cut(Q.daily.df$Date, breaks,labels=labels.wy)

Q.data.avail.by.wy = aggregate(Q.daily.df$Q.ft.s,by=list(Q.daily.df$WaterYear),FUN=length)
Q.annual.mean.cfs = aggregate(Q.daily.df$Q.ft.s,by=list(Q.daily.df$WaterYear),FUN=mean)
names(Q.annual.mean.cfs) = c("WaterYear","MeanQcfs")

# The output of aggregate is sometimes a factor, so convert both Year and MeanQ to numeric.
Q.annual.mean.cfs$WaterYear = as.numeric(as.character(Q.annual.mean.cfs$WaterYear))

Q.annual.mean.cfs$MeanQcfs = as.numeric(as.character(Q.annual.mean.cfs$MeanQcfs))
head(Q.annual.mean.cfs)

##   WaterYear MeanQcfs
## 1      1966 8.196438
## 2      1967 6.814000
## 3      1968 1.462350
## 4      1969 6.770712
## 5      1970 2.001315
## 6      1971 2.317260

2. Calculate annual runoff in mm/year.

For the calculation, you need the drainage area of your watershed.

drain.area.mi2 = station.site.meta.data$drainage_area
print(drain.area.mi2)

## [1] 42.1

Now calculate annual discharge from your watershed in mm by doing units conversions on annual.mean.cfs. (write out the units conversion first on paper!). I’m not showing you the code…you have to write it.

Call this new annual discharge data frame “Q.ann.mm.df”. Use the data.frame command to make sure the result is a data.frame class. Make sure it has column names of “WaterYear” and “Q.ann.mm”.

II. Annual rainfall-runoff relationships.

A. Plot annual rainfall vs annual runoff:

First, you need to merge the datasets, so you have a data frame with only the years that have data for both.

Q.P.merge = merge(Q.ann.mm.df,P.annual.wy,by.x="WaterYear",by.y="WaterYear")
#  Print Q.P.merge to make sure it worked.

plot(Q.P.merge$Rainfall.mm,Q.P.merge$Q.ann.mm,xlab="Annual precipitation, mm",ylab="Annual runoff, mm")
#  Add a 1:1 line to see if runoff is ever greater than precipitation.
abline(0,1)  #  Run ?abline to see how to use it.

B. Calculate the annual runoff coefficient (Q/P) for each water year and plot its timeseries.

Note: below I call the annual runoff coefficient variable “Q.P”. I’m not showing he code to do this… There’s one value that looks odd…is that value realistic? Why or why not? Look back at Ex 4 and the number of days with rainfall values.

Exclude the year with low rainfall due to missing data. You could remove the whole year, or just set the value to NA in the time series.

Q.P.merge$Rainfall.mm[Q.P.merge$Rainfall.mm<50]=NA
# And with the Q.P time series:
Q.P[is.na(Q.P.merge$Rainfall.mm)] = NA

And replot:

plot(Q.P.merge$WaterYear,Q.P,xlab="Year",ylab="Annual Q/P",type="l")

C. Multipanel plots.

In order to clarify the relationship between P, and Q/P, plot both on a single, multipanel plot.

par(mfrow=c(2,1))  # mfrow creates a multipanel plot.  c(3,1) makes the plot have 3 rows and 1 column.
par(mar=c(0,0,0,0),oma=c(4,4,1,1))  # mar is the margins between each plot.  oma is the outer margins of the plot.
# First panel.
plot(Q.P.merge$WaterYear,Q.P.merge$Rainfall.mm,xaxt="n",type="l")
  #  xaxt="n" means don't plot the x-axis tics or labels.
axis(side=1,labels=FALSE)  #  side=1 means the bottom, side=2 means the left side.
mtext("Annual P, mm",side=2,line=2.5) # adds text to the left side (y-axis).

# Second panel.
plot(Q.P.merge$WaterYear,Q.P,xlab="Year",ylab="Annual Q/P",type="l")
mtext("Annual Q/P",side=2,line=2)

D. Plotting multiple colors for points in a series.

Plot Q/P vs P, and color the points by the year

# Generate a vector where the text is "black"
wy = Q.P.merge$WaterYear
colvec = rep("black",times=length(wy))

# Choose some break years, and assign the colors to years in those intervals.
colvec[(wy>1990) & (wy<=2000)] = "grey"  # 1991-2000 will be colored grey
colvec[(wy>2000)] = "white"              # 2000-2015 will be colored white

plot(Q.P.merge$Rainfall.mm,Q.P,xlab="Precipitation, mm",ylab="Q/P",bg=colvec,col="black",pch=22)
#  "bg" is the argument for the fill color.
#  pch is the symbol type

#  try ?pch to see the options.

legend("bottomright",c("1965-1990","1991-2000","2001-2022"),pch=22,pt.bg=c("black","grey","white"))

  # legend uses the argument "pt.bg" for the background color of the points

E. Plotting multiple colors/symbols: Method using “points”

A second way to plot different series as different point symbols is to use “points”.

#  First, add Q/P to the original dataframe that has all the data.  This makes sure that Q/P is attached to each of the smaller dataframes you'll make in the next step.
Q.P.merge$Q.P = Q.P 
# Next, separate the one dataframe into 3 separate dataframes by the year.
Q.P.merge.pre.1990 = Q.P.merge[Q.P.merge$WaterYear <= 1990,]
Q.P.merge.1990.2000 = Q.P.merge[(Q.P.merge$WaterYear > 1990) & (Q.P.merge$WaterYear <= 2000),]
Q.P.merge.post.2000 = Q.P.merge[Q.P.merge$WaterYear > 2000,]

#  Find the full range of y values in the original dataframe.
y.value.range = range(Q.P.merge$Q.P,na.rm=TRUE)
x.value.range = range(Q.P.merge$Rainfall.mm,na.rm=TRUE)

#  Now, plot the first series using "plot"
plot(Q.P.merge.pre.1990$Rainfall.mm,Q.P.merge.pre.1990$Q.P,col="black",pch=22,xlim=x.value.range,ylim=y.value.range)

# Now, add additional series using "points"
points(Q.P.merge.1990.2000$Rainfall.mm,Q.P.merge.1990.2000$Q.P,col="grey",pch=19)
points(Q.P.merge.post.2000$Rainfall.mm,Q.P.merge.post.2000$Q.P,col="blue",pch=12)

# Add a legend
legend("bottomright",c("1965-1990","1991-2000","2001-2022"),pch=c(22,19,12),col=c("black","grey","blue"))

Plotting tip: let’s turn the y-axis numbers so they are not turned 90 degrees:

plot(Q.P.merge.pre.1990$Rainfall.mm,Q.P.merge.pre.1990$Q.P,col="black",pch=22,xlim=x.value.range,ylim=y.value.range,las=1)

# Now, add additional series using "points"
points(Q.P.merge.1990.2000$Rainfall.mm,Q.P.merge.1990.2000$Q.P,col="grey",pch=19)
points(Q.P.merge.post.2000$Rainfall.mm,Q.P.merge.post.2000$Q.P,col="blue",pch=12)

# Add a legend
legend("bottomright",c("1965-1990","1991-2000","2001-2022"),pch=c(22,19,12),col=c("black","grey","blue"))

For the HW, you’ll make the axis labels nice…!

You’ll also plot rainfall vs runoff:

plot(Q.P.merge$Rainfall.mm,Q.P.merge$Q.ann.mm,xlab="Annual Precipitation, mm",ylab="Annual Q, mm",bg=colvec,col="black",pch=22)
#  "bg" is the argument for the fill color.
#  pch is the symbol type

#  try ?pch to see the options.

legend("bottomright",c("1965-1990","1991-2000","2001-2022"),pch=22,pt.bg=c("black","grey","white"))