In this exercise, you will: - read USGS streamflow data into R
- calculate monthly discharge
- calculate runoff coefficients
The USGS has a website dedicated to R: https://owi.usgs.gov/R/ There are packages for streamflow and water quality data retrieval and analysis. Here, we install and use the dataRetrieval package from the USGS.
install.packages("dataRetrieval",
repos=c("http://owi.usgs.gov/R",
getOption("repos")))
library(zoo)
library(raster)
library(rgdal)
library(hydroTSM)
library(dataRetrieval) # USGS package that gets streamflow data direct from the USGS website
See what dataRetrieval can do, using “vignette”:
vignette("dataRetrieval",package = "dataRetrieval")
First, calculate daily rainfall for Poway station by running Exercise 4, and then save the daily precipitation series in your personal directory, so you don’t have to re-run the code every time:
outdir = << your class data directory >>
setwd(outdir)
write.csv(poway.precip.daily,file="poway_daily_precip.csv")
# write.table is used for data frames
Close Rstudio (don’t save the workspace), and restart Rstudio. This will clear the variables. Now, reload your daily precipitation data. Be sure to have the exact same name when saving the csv and when reading it back to R.
What happens if the name you use for the file in read.csv doesn’t match? Experiment with a wrong file name and see what error you get.
daily.rainfall = read.csv(file="poway_precip_daily.csv", stringsAsFactors = FALSE)
# stringsAsFactors = FALSE means all text will be read as character strings, rather than being converted to factors.
# Factors can be useful for some analyses, but they royally mess up Date functions.
# and look at the data:
head(daily.rainfall)
## X Date.text Precip.inches Date
## 1 1 10/4/1962 0.02 10/4/1962
## 2 2 10/5/1962 0.00 10/5/1962
## 3 3 10/6/1962 0.00 10/6/1962
## 4 4 10/7/1962 0.00 10/7/1962
## 5 5 10/8/1962 0.00 10/8/1962
## 6 6 10/9/1962 0.00 10/9/1962
What class is your Date column?
class(daily.rainfall$Date)
Now convert the Date to a Date object. NOTE: The “format” argument depends on the format of the Date.text column.
If your Date.text column looks like this: “10/4/1962”, the format argument will be “%m/%d/%Y” If your Date.text column looks like this: “1962-10-04”, then the format argument will be “%Y-%m-%d”
What happens if you use the wrong format? What error do you get?
daily.rainfall$Date = as.Date(daily.rainfall$Date,format="%m/%d/%Y")
And confirm that it worked as expected:
head(daily.rainfall)
class(daily.rainfall$Date)
*** If this doesnt’ work, check the “format” argument.
breaks <- seq(from=as.Date("1962-10-01"), to=as.Date("2015-10-01"), by="year")
years.breaks = as.numeric(format(breaks,"%Y"))
labels.wy = years.breaks[2:length(breaks)] # Take all but the first year in the breaks. Why?
daily.rainfall$WaterYear <- cut(daily.rainfall$Date, breaks,labels=labels.wy)
Look at your data. Is WaterYear correct?
# First look at the first 600 lines
daily.rainfall[1:600,]
# Now look at the last 600 lines. Is WaterYear correct?
nrecords = length(daily.rainfall$Date)
daily.rainfall[(nrecords-600):nrecords,]
## WaterYear Precip.in
## 1 1963 9.22
## 2 1964 7.56
## 3 1965 8.88
## 4 1966 14.16
## 5 1967 12.80
## 6 1968 6.50
Look at the class of the y
class(rainfall.annual.wy$WaterYear)
## [1] "factor"
It’s a factor, which is a difficult class to use. For example, it results in crummy plots:
plot(rainfall.annual.wy$WaterYear,rainfall.annual.wy$Precip.in,ylab="Annual Precipitation, inches",xlab="",type="l")
What if you try to convert this factor to numeric?
as.numeric(rainfall.annual.wy$WaterYear)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
Yep, doesn’t work…you need to first convert the factor to a character, and then to numeric. We’ll do that in one line:
rainfall.annual.wy$WaterYear = as.numeric(as.character(rainfall.annual.wy$WaterYear))
Now see if the plot looks better:
plot(rainfall.annual.wy$WaterYear,rainfall.annual.wy$Precip.in,ylab="Annual Precipitation, inches",xlab="",type="l")
# Try the code below with the site.code here, then use the site code for your watershed.
site.code = "11023340" # The USGS streamgage code for Los Penasquitos
readNWISsite(site.code) # Note: doing readNWISsite("11022480") gives the same result.
what.data = whatNWISdata(siteNumber = site.code)
what.data[1:10,] # just look a first 10 records
parameter.code = "00060" # this is the code for stream discharge.
start.date = "" # Blanks get all of the data
end.date = ""
x = readNWISdv(site.code,parameter.code,start.date,end.date)
head(x)
## agency_cd site_no Date X_00060_00003 X_00060_00003_cd
## 1 USGS 11023340 1964-10-01 0.5 A
## 2 USGS 11023340 1964-10-02 0.5 A
## 3 USGS 11023340 1964-10-03 0.5 A
## 4 USGS 11023340 1964-10-04 0.6 A
## 5 USGS 11023340 1964-10-05 0.6 A
## 6 USGS 11023340 1964-10-06 0.6 A
The names for the discharge and QA columns aren’t very nice, so rename them:
names(x)[c(4,5)] = c("Q.ft.s","QA")
head(x)
## agency_cd site_no Date Q.ft.s QA
## 1 USGS 11023340 1964-10-01 0.5 A
## 2 USGS 11023340 1964-10-02 0.5 A
## 3 USGS 11023340 1964-10-03 0.5 A
## 4 USGS 11023340 1964-10-04 0.6 A
## 5 USGS 11023340 1964-10-05 0.6 A
## 6 USGS 11023340 1964-10-06 0.6 A
The quality codes can be found by googling “usgs streamflow data quality flags”, which points you to http://waterdata.usgs.gov/usa/nwis/uv?codes_help.
Scroll down the webiste to “Daily Value Qualification Code” to see what the quality flags mean.
table(x$QA) # Creates a table of the counts of each quality flag value.
# Based on the quality flags and this table, can you use all of the data?
Be sure to rename the columns of the data frame that has the annual mean values. ** Also be sure to convert the year and mean to numeric, if they are factors. **
# What from and to dates should you use for your dataset?
breaks <- seq(from=as.Date("1965-10-01"), to=as.Date("2015-10-01"), by="year")
years.breaks = as.numeric(format(breaks,"%Y"))
labels.wy = years.breaks[2:length(breaks)]
x$WaterYear <- cut(x$Date, breaks,labels=labels.wy)
data.avail.by.wy = aggregate(x$Q.ft.s,by=list(x$WaterYear),FUN=length)
annual.mean.cfs = aggregate(x$Q.ft.s,by=list(x$WaterYear),FUN=mean)
names(annual.mean.cfs) = c("WYear","MeanQcfs")
# The output of aggregate is sometimes a factor, so convert both Year and MeanQ to numeric.
annual.mean.cfs$WYear = as.numeric(as.character(annual.mean.cfs$WYear))
annual.mean.cfs$MeanQcfs = as.numeric(as.character(annual.mean.cfs$MeanQcfs))
head(annual.mean.cfs)
## WYear MeanQcfs
## 1 1966 8.196438
## 2 1967 6.814000
## 3 1968 1.462350
## 4 1969 6.770712
## 5 1970 2.001315
## 6 1971 2.317260
site.data = readNWISsite(site.code)
drain.area.mi2 = site.data$drain_area_va
drain.area.mi2
## [1] 42.1
Call this new annual discharge data frame (in mm), and call it “Q.ann.df”. Use the data.frame command to make sure the result is a data.frame class. Make sure it has column names of “WaterYear” and “Q.ann.mm”.
First, you need to merge the datasets, so you have a data frame with only the years that have data for both.
Q.P.merge = merge(Q.ann.df,rainfall.annual.wy,by.x="WaterYear",by.y="WaterYear")
# Print Q.P.merge to make sure it worked.
Now, calculate precipitation in mm, and give it the name Precip.mm in the dataframe Q.P.merge. I’m not showing the code to do this…
Q.P.merge$Precip.mm = Q.P.merge$Precip.in * 25.4
plot(Q.P.merge$Precip.mm,Q.P.merge$Q.ann.mm,xlab="Annual precipitation, mm",ylab="Annual runoff, mm")
# Add a 1:1 line to see if runoff is ever greater than precipitation.
abline(0,1) # Run ?abline to see how to use it.
Q.P = Q.P.merge$Q.ann.mm/Q.P.merge$Precip.mm
plot(Q.P.merge$WaterYear,Q.P,xlab="Year",ylab="Annual Q/P",type="l")
In order to clarify the relationship between P, and Q/P, plot both on a single, multipanel plot.
par(mfrow=c(2,1)) # mfrow creates a multipanel plot. c(3,1) makes the plot have 3 rows and 1 column.
par(mar=c(0,0,0,0),oma=c(4,4,1,1)) # mar is the margins between each plot. oma is the outer margins of the plot.
# First panel.
plot(Q.P.merge$WaterYear,Q.P.merge$Precip.mm,xaxt="n",type="l")
# xaxt="n" means don't plot the x-axis tics or labels.
axis(side=1,labels=FALSE) # side=1 means the bottom, side=2 means the left side.
mtext("Annual P, mm",side=2,line=2.5) # adds text to the left side (y-axis).
# Second panel.
plot(Q.P.merge$WaterYear,Q.P,xlab="Year",ylab="Annual Q/P",type="l")
mtext("Annual Q/P",side=2,line=2)
Plot Q/P vs P, and color the points by the year
# Generate a vector where the text is "black"
wy = Q.P.merge$WaterYear
colvec = rep("black",times=length(wy))
# Choose some break years, and assign the colors to years in those intervals.
colvec[(wy>1990) & (wy<=2000)] = "grey" # 1991-2000 will be colored grey
colvec[(wy>2000)] = "white" # 2000-2008 will be colored white
plot(Q.P.merge$Precip.mm,Q.P,xlab="Precipitation, mm",ylab="Q/P",bg=colvec,col="black",pch=22)
# "bg" is the argument for the fill color.
# pch is the symbol type
# try ?pch to see the options.
legend("bottomright",c("1965-1990","1991-2000","2001-2008"),pch=22,pt.bg=c("black","grey","white"))
# legend uses the argument "pt.bg" for the background color of the points
A second way to plot different series as different point symbols is to use “points”.
# First, add Q/P to the original dataframe that has all the data. This makes sure that Q/P is attached to each of the smaller dataframes you'll make in the next step.
Q.P.merge$Q.P = Q.P
# Next, separate the one dataframe into 3 separate dataframes by the year.
Q.P.merge.pre.1990 = Q.P.merge[Q.P.merge$WaterYear <= 1990,]
Q.P.merge.1990.2000 = Q.P.merge[(Q.P.merge$WaterYear > 1990) & (Q.P.merge$WaterYear <= 2000),]
Q.P.merge.post.2000 = Q.P.merge[Q.P.merge$WaterYear > 2000,]
# Find the full range of y values in the original dataframe.
y.value.range = range(Q.P.merge$Q.P)
x.value.range = range(Q.P.merge$Precip.mm)
# Now, plot the first series using "plot"
plot(Q.P.merge.pre.1990$Precip.mm,Q.P.merge.pre.1990$Q.P,col="black",pch=22,xlim=x.value.range,ylim=y.value.range)
# Now, add additional series using "points"
points(Q.P.merge.1990.2000$Precip.mm,Q.P.merge.1990.2000$Q.P,col="grey",pch=19)
points(Q.P.merge.post.2000$Precip.mm,Q.P.merge.post.2000$Q.P,col="blue",pch=12)
# Add a legend
legend("bottomright",c("1965-1990","1991-2000","2001-2008"),pch=c(22,19,12),col=c("black","grey","blue"))