Introduction

Now that you know a bit more about the dataRetrieval package, let’s use it to import and examine water quality data.

Water quality data includes parameters such as pH, temperature, and concentrations of things like chloride, nitrates, phosphorus, chlorophyll a, and many others. To see the full list of available parameters and their five-digit parameter codes, either visit [this USGS website link](https://help.waterdata.usgs.gov/code/parameter_cd_query?fmt=rdb&inline=true&group_cd=%) or list them from R. The last six rows of that list look like this:

##       parameter_cd parameter_group_nm
## 16975        80295           Sediment
## 16976        80296           Sediment
## 16977        80297           Sediment
## 16978        80298           Sediment
## 16979        91145           Sediment
## 16980        99409           Sediment
##                                                                                                                                                       parameter_nm
## 16975                                                              Suspended sediment load, water, unfiltered, estimated by regression equation, pounds per second
## 16976                                                          Suspended sediment load, water, unfiltered, estimated by regression equation, short tons per second
## 16977 Suspended sediment load, water, unfiltered, computed, the product of regression-computed suspended sediment concentration and streamflow, short tons per day
## 16978                                  Suspended sediment load, water, unfiltered, regression computed, turbidity and streamflow as regressors, short tons per day
## 16979                                                                                                       Bedload sediment, total sample mass, dry weight, grams
## 16980                                                  Suspended sediment concentration, water, unfiltered, estimated by regression equation, milligrams per liter
##       casrn  srsname parameter_units
## 16975  <NA>     <NA>            lb/s
## 16976  <NA>     <NA>          tons/s
## 16977  <NA>     <NA>        tons/day
## 16978  <NA>     <NA>        tons/day
## 16979  <NA>     <NA>               g
## 16980  <NA> Sediment            mg/l
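
A listing like the one above can be produced from the parameterCdFile lookup table that ships with dataRetrieval. This is a minimal sketch, assuming the installed version of the package still bundles that table; the subsetting step is an added illustration and simply looks up the two codes used in Example 1.

```r
library(dataRetrieval)

# parameterCdFile is a data frame of USGS parameter codes bundled with dataRetrieval
# (assumption: your installed version still includes it)
tail(parameterCdFile)

# Look up specific codes, keeping only the most useful columns
codes <- c("91050", "91055")
parameterCdFile[parameterCdFile$parameter_cd %in% codes,
                c("parameter_cd", "parameter_nm", "parameter_units")]
```

Checking the parameter_units column before combining two parameters is a cheap way to catch unit mismatches early.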

Example 1

Let’s import and analyze some water quality data. The code below pulls daily phosphorus discharge (parameter code 91050, pounds per day) and suspended solids discharge (parameter code 91055, short tons per day) for Honey Creek in Wauwatosa (USGS site 04087119).

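library(dataRetrieval)

# 91050 = phosphorus (lb/day), 91055 = suspended solids (short tons/day)
parametercodes <- c("91050", "91055")
startT <- "2008-12-01"
endT <- "2009-09-29"
# Daily values for USGS site 04087119 (Honey Creek)
wqdata <- readNWISdv(siteNumbers = "04087119", parameterCd = parametercodes,
                     startDate = startT, endDate = endT)
wqdata <- renameNWISColumns(wqdata)  # shorten the default column names
head(wqdata)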
##   agency_cd  site_no       Date X_91050 X_91050_cd X_91055 X_91055_cd
## 1      USGS 04087119 2008-12-01    2.29          A   0.060          A
## 2      USGS 04087119 2008-12-02    0.61          A   0.010          A
## 3      USGS 04087119 2008-12-03    0.75          A   0.020          A
## 4      USGS 04087119 2008-12-04    0.49          A   0.010          A
## 5      USGS 04087119 2008-12-05    0.27          A   0.006          A
## 6      USGS 04087119 2008-12-06    0.27          A   0.005          A

Example 2

Next, let’s divide the phosphorus data by the sediment data. The new column gives the percentage of phosphorus, by mass, in the suspended solids. Because phosphorus is reported in pounds and suspended solids in short tons, and one short ton = 2000 pounds, we first convert phosphorus to short tons so that both quantities share a unit. Then we’ll plot the percentage over time to see how it changes.
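
As a quick check on the arithmetic, the sketch below applies the same conversion by hand to the first three rows shown in the Example 1 output. The variable names here are illustrative and are not part of the tutorial's data frame.

```r
# First three rows copied from the head(wqdata) output in Example 1
phosphorus_lb <- c(2.29, 0.61, 0.75)     # X_91050, pounds per day
solids_tons   <- c(0.060, 0.010, 0.020)  # X_91055, short tons per day

# Step 1: put both quantities in short tons (1 short ton = 2000 lb)
phosphorus_tons <- phosphorus_lb / 2000

# Step 2: ratio of phosphorus to suspended solids, as a percentage
percent <- phosphorus_tons / solids_tons * 100
round(percent, 4)
```

For the first row this is (2.29 / 2000) / 0.060 * 100, or about 1.9083 percent. Note that a day with a reported suspended solids value of zero would yield Inf or NaN, so it may be worth checking for such rows before plotting.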

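library(ggplot2)
# Convert phosphorus from lb/day to short tons/day, then take it as a percentage of suspended solids
wqdata$percent <- ((wqdata$X_91050 / 2000) / wqdata$X_91055) * 100
ggplot(wqdata, aes(x = Date, y = percent)) + geom_point()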

It appears that the highest percentage of phosphorus occurred in January or February.

For the next analysis, we’ll first add a column holding just the month in which each sample was taken. The month() function is part of the lubridate package.

The “label” argument lets you choose between an abbreviated month name such as “Dec” or “Feb” (label=TRUE) and a month number such as 1 or 12 (label=FALSE, the default).
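
To make the difference concrete, here is a small sketch of month() on three arbitrary dates inside the study period. The dates are chosen only for illustration, and the month names returned depend on your system locale.

```r
library(lubridate)

# Three illustrative dates within the Example 1 date range
sample_dates <- as.Date(c("2008-12-01", "2009-01-15", "2009-02-20"))

# Default (label = FALSE): plain month numbers
month(sample_dates)

# label = TRUE: an ordered factor of abbreviated month names
month(sample_dates, label = TRUE)

# abbr = FALSE spells the month names out in full
month(sample_dates, label = TRUE, abbr = FALSE)
```

Because label=TRUE returns an ordered factor whose levels run from January to December, ggplot will arrange the months in calendar order on an axis, not in the order they occur in the record. Here that means December 2008 appears at the far right even though it is the first month of data.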

library(lubridate)
# label = TRUE stores the month as an abbreviated name rather than a number
wqdata$months <- month(wqdata$Date, label = TRUE)
head(wqdata)
##   agency_cd  site_no       Date X_91050 X_91050_cd X_91055 X_91055_cd
## 1      USGS 04087119 2008-12-01    2.29          A   0.060          A
## 2      USGS 04087119 2008-12-02    0.61          A   0.010          A
## 3      USGS 04087119 2008-12-03    0.75          A   0.020          A
## 4      USGS 04087119 2008-12-04    0.49          A   0.010          A
## 5      USGS 04087119 2008-12-05    0.27          A   0.006          A
## 6      USGS 04087119 2008-12-06    0.27          A   0.005          A
##    percent months
## 1 1.908333    Dec
## 2 3.050000    Dec
## 3 1.875000    Dec
## 4 2.450000    Dec
## 5 2.250000    Dec
## 6 2.700000    Dec

Next we’ll use ggplot to create box plots of the phosphorus percentages, grouped by month.

ggplot(wqdata, aes(x=months, y=percent)) + geom_boxplot()

This plot shows that the highest percentages of phosphorus content were in January and December.
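
A reading taken from a box plot can be backed up with a numeric summary per month. The sketch below uses base R's aggregate() on a toy data frame; the values are invented purely to show the mechanics and are not Honey Creek data.

```r
# Toy stand-in for wqdata: these values are invented for illustration only
toy <- data.frame(
  months  = factor(c("Dec", "Dec", "Jan", "Jan", "Feb", "Feb"),
                   levels = c("Dec", "Jan", "Feb")),
  percent = c(2.0, 3.0, 4.0, 6.0, 1.0, 2.0)
)

# Median percentage for each month
monthly <- aggregate(percent ~ months, data = toy, FUN = median)

# Sort so the month with the highest median comes first
monthly[order(-monthly$percent), ]
```

With the real data, the same call would be aggregate(percent ~ months, data = wqdata, FUN = median). The median is used here because it is the center line that a box plot draws.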

Conclusion

With the dataRetrieval and ggplot tools previewed here, many different kinds of analysis are possible. Water quality data is often freely available from government agencies such as the USGS, and records go back decades, or even longer for some streams.