Now that you know a bit more about the dataRetrieval package, let’s use it to import and examine water quality data.
Water quality data include parameters such as pH, temperature, and concentrations of constituents like chloride, nitrates, phosphorus, chlorophyll a, and many others. To see the full list of available parameters and their five-digit parameter codes, either visit [this USGS website link](https://help.waterdata.usgs.gov/code/parameter_cd_query?fmt=rdb&inline=true&group_cd=%) or look them up from R with dataRetrieval. A few rows of that table are shown below:
## parameter_cd parameter_group_nm
## 16975 80295 Sediment
## 16976 80296 Sediment
## 16977 80297 Sediment
## 16978 80298 Sediment
## 16979 91145 Sediment
## 16980 99409 Sediment
## parameter_nm
## 16975 Suspended sediment load, water, unfiltered, estimated by regression equation, pounds per second
## 16976 Suspended sediment load, water, unfiltered, estimated by regression equation, short tons per second
## 16977 Suspended sediment load, water, unfiltered, computed, the product of regression-computed suspended sediment concentration and streamflow, short tons per day
## 16978 Suspended sediment load, water, unfiltered, regression computed, turbidity and streamflow as regressors, short tons per day
## 16979 Bedload sediment, total sample mass, dry weight, grams
## 16980 Suspended sediment concentration, water, unfiltered, estimated by regression equation, milligrams per liter
## casrn srsname parameter_units
## 16975 <NA> <NA> lb/s
## 16976 <NA> <NA> tons/s
## 16977 <NA> <NA> tons/day
## 16978 <NA> <NA> tons/day
## 16979 <NA> <NA> g
## 16980 <NA> Sediment mg/l
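The full code table has thousands of rows, so searching it is easier than scrolling. The sketch below defines a hypothetical helper, `find_pcodes()`, in base R. It assumes only that the table is a data frame with `parameter_cd`, `parameter_nm`, and `parameter_units` columns like the listing above (dataRetrieval's `readNWISpCode("all")` should return such a table, but confirm that against the documentation for your installed version). To keep the sketch self-contained, it runs on the six rows printed above.

```r
# Hypothetical helper: case-insensitive keyword search over a parameter-code table.
# Assumes a data frame with parameter_cd, parameter_nm, and parameter_units columns.
find_pcodes <- function(tbl, pattern) {
  hits <- grepl(pattern, tbl$parameter_nm, ignore.case = TRUE)
  tbl[hits, c("parameter_cd", "parameter_nm", "parameter_units"), drop = FALSE]
}

# The six rows printed above, copied by hand (casrn and srsname omitted)
pcodes <- data.frame(
  parameter_cd = c("80295", "80296", "80297", "80298", "91145", "99409"),
  parameter_nm = c(
    "Suspended sediment load, water, unfiltered, estimated by regression equation, pounds per second",
    "Suspended sediment load, water, unfiltered, estimated by regression equation, short tons per second",
    "Suspended sediment load, water, unfiltered, computed, the product of regression-computed suspended sediment concentration and streamflow, short tons per day",
    "Suspended sediment load, water, unfiltered, regression computed, turbidity and streamflow as regressors, short tons per day",
    "Bedload sediment, total sample mass, dry weight, grams",
    "Suspended sediment concentration, water, unfiltered, estimated by regression equation, milligrams per liter"
  ),
  parameter_units = c("lb/s", "tons/s", "tons/day", "tons/day", "g", "mg/l"),
  stringsAsFactors = FALSE
)

# Which of these codes are loads reported in short tons per day?
find_pcodes(pcodes, "short tons per day")
```

On the full table, a call such as `find_pcodes(tbl, "phosphorus")` should narrow the list to the phosphorus-related codes.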
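Let’s import and analyze some water quality data. The code below pulls daily values of phosphorus load (parameter code 91050, pounds per day) and suspended solids load (parameter code 91055, short tons per day) for Honey Creek at Wauwatosa, Wisconsin (USGS site 04087119), from December 1, 2008 through September 29, 2009.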
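library(dataRetrieval)
# Daily-value parameter codes: 91050 = phosphorus (lb/day), 91055 = suspended solids (tons/day)
parametercodes <- c("91050", "91055")
startT <- "2008-12-01"
endT <- "2009-09-29"
# Download daily values for Honey Creek at Wauwatosa, WI (site 04087119)
wqdata <- readNWISdv(siteNumbers = "04087119", parameterCd = parametercodes,
                     startDate = startT, endDate = endT)
# Readable names where known; 91050 and 91055 have none, so columns stay X_91050 and X_91055
wqdata <- renameNWISColumns(wqdata)
head(wqdata)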
## agency_cd site_no Date X_91050 X_91050_cd X_91055 X_91055_cd
## 1 USGS 04087119 2008-12-01 2.29 A 0.060 A
## 2 USGS 04087119 2008-12-02 0.61 A 0.010 A
## 3 USGS 04087119 2008-12-03 0.75 A 0.020 A
## 4 USGS 04087119 2008-12-04 0.49 A 0.010 A
## 5 USGS 04087119 2008-12-05 0.27 A 0.006 A
## 6 USGS 04087119 2008-12-06 0.27 A 0.005 A
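Each value column has a matching `_cd` column holding its data-qualification code. In NWIS daily values, `A` marks data approved for publication and `P` marks provisional data that may still be revised; codes can also carry extra flags (such as `e` for estimated), so this sketch keeps any code that starts with `A`. The rows are copied from the `head(wqdata)` output above so the example runs without a network connection; with the real download you would apply the same filter to `wqdata` directly. The helper name `is_approved()` is illustrative, not part of dataRetrieval.

```r
# Rows copied from the head(wqdata) output above
wq_head <- data.frame(
  Date = as.Date(c("2008-12-01", "2008-12-02", "2008-12-03",
                   "2008-12-04", "2008-12-05", "2008-12-06")),
  X_91050 = c(2.29, 0.61, 0.75, 0.49, 0.27, 0.27),
  X_91050_cd = rep("A", 6),
  X_91055 = c(0.060, 0.010, 0.020, 0.010, 0.006, 0.005),
  X_91055_cd = rep("A", 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a qualification code starts with "A"; NA codes count as FALSE
is_approved <- function(cd) !is.na(cd) & startsWith(cd, "A")

# Keep rows where both the phosphorus and the suspended solids values are approved
approved <- wq_head[is_approved(wq_head$X_91050_cd) & is_approved(wq_head$X_91055_cd), ]
nrow(approved)
```

All six of these rows are approved, so nothing is dropped here; on a longer record that reaches into recent months, provisional rows would be filtered out.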
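Next, let’s divide the phosphorus load by the suspended solids load. The result, stored in a new column, gives the percentage of phosphorus in the suspended solids. Because phosphorus is reported in pounds per day and suspended solids in short tons per day, and one short ton equals 2,000 pounds, we first convert the phosphorus to short tons so that both loads share a unit and the percentage is accurate. Then we’ll plot the percentage over time to see how it changes.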
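Written out, the calculation for each day is the following, where $P$ is the phosphorus load in pounds per day and $S$ is the suspended solids load in short tons per day. Plugging in the first row of `wqdata` (2.29 lb/day and 0.060 tons/day) reproduces the 1.908333 that appears in the `percent` column further down.

```latex
\text{percent} = \frac{P / 2000}{S} \times 100,
\qquad
\frac{2.29 / 2000}{0.060} \times 100 = \frac{0.001145}{0.060} \times 100 \approx 1.9083
```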
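# Convert phosphorus from pounds to short tons (2000 lb per ton), divide by suspended solids, scale to percent
wqdata$percent <- ((wqdata$X_91050 / 2000) / wqdata$X_91055) * 100
library(ggplot2)
ggplot(wqdata, aes(x = Date, y = percent)) + geom_point()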
It appears that the highest percentages of phosphorus occurred in January or February.
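Reading the peak off a scatter plot is approximate. To pin down the exact day, `which.max()` returns the index of the largest value, and `is.finite()` screens out days where the suspended solids load is zero or missing, which would otherwise produce `Inf` or `NaN` percentages. The sketch below runs on the six rows printed by `head(wqdata)` above, so it finds the peak of those rows only; on the full `wqdata` the same lines should locate the peak for the whole period.

```r
# Phosphorus and suspended solids loads copied from the head(wqdata) output above
wq_head <- data.frame(
  Date = as.Date("2008-12-01") + 0:5,
  X_91050 = c(2.29, 0.61, 0.75, 0.49, 0.27, 0.27),      # pounds per day
  X_91055 = c(0.060, 0.010, 0.020, 0.010, 0.006, 0.005)  # short tons per day
)

# Same formula as the tutorial: pounds to short tons, divide, scale to a percentage
wq_head$percent <- ((wq_head$X_91050 / 2000) / wq_head$X_91055) * 100

# Drop Inf/NaN/NA percentages (zero or missing sediment load), then find the peak day
ok <- is.finite(wq_head$percent)
peak <- wq_head[ok, ][which.max(wq_head$percent[ok]), c("Date", "percent")]
peak
```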
For the next analysis, we’ll first add a column holding just the month in which each value was recorded. The month() function comes from the lubridate package.
The label argument controls the output: label = TRUE returns abbreviated month names such as “Dec” or “Feb” (as an ordered factor), while label = FALSE, the default, returns month numbers such as 1 or 12.
library(lubridate)
wqdata$months <- month(wqdata$Date, label = TRUE)
head(wqdata)
## agency_cd site_no Date X_91050 X_91050_cd X_91055 X_91055_cd
## 1 USGS 04087119 2008-12-01 2.29 A 0.060 A
## 2 USGS 04087119 2008-12-02 0.61 A 0.010 A
## 3 USGS 04087119 2008-12-03 0.75 A 0.020 A
## 4 USGS 04087119 2008-12-04 0.49 A 0.010 A
## 5 USGS 04087119 2008-12-05 0.27 A 0.006 A
## 6 USGS 04087119 2008-12-06 0.27 A 0.005 A
## percent months
## 1 1.908333 Dec
## 2 3.050000 Dec
## 3 1.875000 Dec
## 4 2.450000 Dec
## 5 2.250000 Dec
## 6 2.700000 Dec
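If lubridate isn't installed, base R can build an equivalent column. The sketch below is an alternative, not what this tutorial uses: a hypothetical helper, `month_label()`, pulls the month number with `format(x, "%m")` and maps it onto the built-in `month.abb` constant, which keeps English abbreviations regardless of the system locale. Making the result a factor with `levels = month.abb` keeps the months in calendar order rather than alphabetical order, which is what lets a box plot's x-axis run Jan, Feb, Mar and so on; `month(..., label = TRUE)` returns an ordered factor for the same reason.

```r
# Hypothetical base-R alternative to lubridate::month(x, label = TRUE)
month_label <- function(dates) {
  month_num <- as.integer(format(dates, "%m"))  # 1 to 12
  factor(month.abb[month_num],
         levels = month.abb,  # calendar order, not alphabetical
         ordered = TRUE)
}

# Example dates within the tutorial's download window
dates <- as.Date(c("2008-12-01", "2009-01-15", "2009-02-10"))
month_label(dates)
```

With the real data this would be `wqdata$months <- month_label(wqdata$Date)`.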
Next, we’ll use ggplot to create box plots of the phosphorus percentages, grouped by month.
ggplot(wqdata, aes(x=months, y=percent)) + geom_boxplot()
The box plots show that phosphorus percentages were highest in January and December.
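To check the visual reading with numbers, the per-month medians (the thick line inside each box) can be computed with base R's `aggregate()`. The sketch runs on the six December rows printed earlier by `head(wqdata)`, so it returns only a December value; applied to the full `wqdata`, it should give one median per month, sorted so the highest comes first.

```r
# Percent values and month labels copied from the head(wqdata) output above
wq_head <- data.frame(
  percent = c(1.908333, 3.050000, 1.875000, 2.450000, 2.250000, 2.700000),
  months  = factor(rep("Dec", 6), levels = month.abb)
)

# Median phosphorus percentage for each month (the formula interface drops NA rows)
monthly <- aggregate(percent ~ months, data = wq_head, FUN = median)
monthly[order(-monthly$percent), ]  # highest median first
```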
With the dataRetrieval and ggplot2 tools previewed here, many other kinds of analysis are possible. Water quality data are often freely available from government agencies such as the USGS, and records go back decades, or even longer for some streams.