To better interpret the data collected by volunteers in our water quality snapshots, we’ll look at seasonal patterns in a high-frequency, long-term dataset. The Ames Water and Pollution Control Department has been monitoring water quality at three locations on the South Skunk River, above and below the wastewater treatment plant, on a weekly basis since 2003.
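# Load packages: tidyverse (readr, dplyr, ggplot2) and lubridate for date parts
library(tidyverse)
library(lubridate)
# Import from csv.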
# Ames Water and Pollution Control's Lab has provided data from their three sites on the South Skunk River from Jan 2003 to May 2021. The most recent date in this version is 2021-05-05
# Note that the LabID is unique for each batch of samples collected on the same date and place
# The "Comment" column is not imported; it is for reporting limits on CBOD and can be ignored here.
# CollectionTime is imported but is usually blank.
ames_data <- read_csv("data/wpc_skunk_2003-2021-05.csv",
col_types = cols(CollectionDate = col_date(format = "%m/%d/%Y"),
CollectionTime = col_time(), LabID = col_integer(), Comment = col_skip(),
Note = col_character(), Symbol = col_character())) %>%
rename(Site = Description, Date = CollectionDate) %>%
# Parse date into year and month (3 letter abr) and week for later subgroup analysis
mutate(Year = year(Date), Month = month(Date, label = TRUE, abbr = TRUE), Week = week(Date)) %>%
# combine two similar methods for measuring nitrate. Methodology changed.
mutate(Analyte = ifelse(Analyte == "Nitrate + Nitrite Nitrogen as N", "Nitrate Nitrogen as N", Analyte)) %>%
# Drop most recent few months of data, to focus on complete years
filter(Year < 2021)
# Import weather data
weather_raw <- read_csv("data/weather-AMW-2003-2020.csv",
col_types = cols(day = col_date(format = "%Y-%m-%d"),
max_temp_f = col_double(), station = col_skip()),
skip = 3, na = "None")
# Precipitation, in inches, over past two days
weather <- select(weather_raw, date = day, precip_in)
# Replace null cells with zero
weather[is.na(weather)] <- 0
# Summarize last two days of precipitation
weather <- rename(weather, Date = date) %>%
mutate(precip_yesterday = lag(precip_in, default = 0)) %>%
mutate(precip_2day = (precip_in + precip_yesterday),
weather = ifelse(precip_2day >= 1.25, "1.25 inch, Heavy rain", ifelse(precip_2day < 0.1, "0 in, no rain", "0.1 in, Light rain")))
# Join to table
ames_wpc <- left_join(ames_data, weather, by = "Date")
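To make the two-day rainfall logic concrete, here is a minimal sketch on a made-up five-day record. The `toy_weather` values are invented for illustration; the thresholds and labels are copied from the code above. It uses `case_when()` in place of the nested `ifelse()`, which is a stylistic substitution and not the original code.

```r
library(dplyr)

# Hypothetical daily precipitation totals, in inches (not real data)
toy_weather <- data.frame(
  Date = as.Date("2020-05-01") + 0:4,
  precip_in = c(0, 0.05, 1.3, 0.2, 0)
)

toy_classified <- toy_weather %>%
  mutate(
    # lag() shifts the column down one row; the first day has no "yesterday"
    precip_yesterday = lag(precip_in, default = 0),
    precip_2day = precip_in + precip_yesterday,
    # Conditions are checked in order, so the last line catches everything in between
    weather = case_when(
      precip_2day >= 1.25 ~ "1.25 inch, Heavy rain",
      precip_2day < 0.1   ~ "0 in, no rain",
      TRUE                ~ "0.1 in, Light rain"
    )
  )

print(toy_classified)
```

In this toy record the day after the storm is still classified as heavy rain, because the previous day's total carries over into the two-day sum.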
“Box and whiskers” plots pack a lot of information into one graph. You may be familiar with percentiles from school, where they are sometimes used to display standardized test results. The bottom, middle, and top of the box represent the 25th percentile, 50th percentile (median), and 75th percentile. The whiskers normally extend to the minimum and maximum values, but points that fall more than 1.5 times the “interquartile range” (the height of the box) beyond the edge of the box are considered outliers and marked separately. Nitrate has a wide box (large interquartile range) and few outliers. Phosphorus has a narrow box (small interquartile range) and many outliers: phosphorus is usually close to zero, but on the days when it is high, it can get really high!
library(patchwork)
filter(ames_wpc, Site == "River - Upstream", Analyte == "Nitrate Nitrogen as N") %>%
ggplot(aes(x = Site, group = Site, y = Result)) +
geom_boxplot(alpha = 0.5) +
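# annotate() draws each label once; geom_text() would redraw it for every row of data
annotate("text", x = "River - Upstream", y = 10, label = "Median: 8.4 mg/L") +
annotate("text", x = "River - Upstream", y = 5, label = "25th percentile: 4.0 mg/L") +
annotate("text", x = "River - Upstream", y = 14, label = "75th percentile: 13.0 mg/L") +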
labs(y = "Nitrate-N (mg/L)") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.x = element_blank() )+
filter(ames_wpc, Site == "River - Upstream", Analyte == "Total Phosphorus as P") %>%
ggplot(aes(x = Site, group = Site, y = Result)) +
geom_boxplot(alpha = 0.5) +
labs(y = "Total Phosphorus (mg/L)") +
annotate("text", x = "River - Upstream", y = 0.3, label = "Median: 0.20 mg/L") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.x = element_blank() )+
# Layout, using patchwork library
plot_layout(nrow = 1) +
plot_annotation(title = "Examples of boxplots")
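The pieces of a boxplot can also be worked out by hand. The sketch below uses a small invented vector (`x` is not real monitoring data) and base R's `quantile()` and `IQR()` to find the box edges and the 1.5 × IQR fences that separate whiskers from outliers.

```r
# Hypothetical concentrations in mg/L (not real data)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 30)

# Box edges and median: 25th, 50th, and 75th percentiles
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# Interquartile range: the height of the box
iqr <- IQR(x)

# Points beyond these fences are drawn as separate outlier dots
upper_fence <- q[["75%"]] + 1.5 * iqr
lower_fence <- q[["25%"]] - 1.5 * iqr
outliers <- x[x > upper_fence | x < lower_fence]

print(q)
print(outliers)
```

Here the box runs from 3 to 7, the fences sit at -3 and 13, and only the value of 30 is flagged as an outlier.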
To help you visualize what’s going on, we’ll also add individual data points (color coded by weather conditions) and the mean (shown as a triangle). In a data set with many outliers, like phosphorus (or, to give a more familiar example, income), the median and the mean will not be the same. The median phosphorus concentration (0.20 mg/L) is what we might see on a typical day, which is relevant for drinking water or fisheries. The mean phosphorus concentration (0.28 mg/L) is what you’d measure if you pooled all your water samples, a better representation of impacts to downstream waters like the Gulf of Mexico.
library(patchwork)
filter(ames_wpc, Site == "River - Upstream", Analyte == "Nitrate Nitrogen as N") %>%
ggplot(aes(x = Site, group = Site, y = Result)) +
geom_jitter(aes(color= weather), alpha = 0.25) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
aes(group = Site), fill = "black", position = position_dodge2(1)) +
labs(y = "Nitrate-N (mg/L)", color = "Inches of rain in past two days") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.x = element_blank() )+
filter(ames_wpc, Site == "River - Upstream", Analyte == "Total Phosphorus as P") %>%
ggplot(aes(x = Site, group = Site, y = Result)) +
geom_jitter(aes(color= weather), alpha = 0.25) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
aes(group = Site), fill = "black", position = position_dodge2(1)) +
labs(y = "Total Phosphorus (mg/L)", color = "Inches of rain in past two days") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.x = element_blank() )+
# Layout, using patchwork library
plot_layout(nrow = 1) +
plot_annotation(title = "Examples of boxplots")
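A quick way to see why the mean and median diverge in skewed data is to compute both on a handful of values. The numbers in `phos` below are invented for illustration: mostly low readings plus one storm sample.

```r
# Hypothetical phosphorus results in mg/L: mostly low, one storm sample
phos <- c(0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 2.00)

typical_day <- median(phos)  # the middle value, unaffected by the storm sample
pooled      <- mean(phos)    # equal-volume samples mixed together

print(c(median = typical_day, mean = pooled))
```

The single storm sample pulls the mean to more than twice the median, while the median stays at the value seen on an ordinary day.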
filter(ames_wpc, Site == "River - Upstream", Analyte == "Nitrate Nitrogen as N") %>%
ggplot(aes(x = Site, y = Result)) +
geom_boxplot(alpha = 0.5) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
fill = "black", position = position_dodge2(1)) +
labs(y = "Nitrate-N (mg/L)", x = "Entire year") +
theme(axis.text.x = element_blank())+
filter(ames_wpc, Site == "River - Upstream", Analyte == "Nitrate Nitrogen as N") %>%
ggplot(aes(x = Month, group = Month, y = Result)) +
geom_jitter(aes(color= weather), alpha = 0.25) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
aes(group = Month), fill = "black", position = position_dodge2(1)) +
labs(y = "Nitrate-N (mg/L)", color = "Inches of rain in past two days") +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank() )+
plot_layout(nrow = 1, widths = c(1,3)) +
plot_annotation(title = "Nitrate in the South Skunk River")
The median nitrate concentration in the South Skunk River is 8.4 mg/L. Nitrate concentrations are usually higher than this during our spring snapshot in May and lower than this during our fall snapshot in October. In October, nitrate concentrations can vary a lot depending on whether river levels are high or low. Nitrate is often diluted by a heavy rain, but concentrations do tend to be higher when drain tiles are flowing and rivers are high.
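The same comparison can be made in a table instead of read off the plot. This is a minimal sketch with invented values (`toy_nitrate` is not the Ames data) that computes a median and interquartile range per month and compares each month to the overall median.

```r
library(dplyr)

# Hypothetical nitrate results (mg/L) for three months; not real data
toy_nitrate <- data.frame(
  Month = rep(c("May", "Aug", "Oct"), each = 3),
  Result = c(11, 12, 13, 5, 6, 7, 2, 6, 10)
)

overall_median <- median(toy_nitrate$Result)

monthly <- toy_nitrate %>%
  group_by(Month) %>%
  summarise(
    month_median = median(Result),  # typical value for the month
    month_iqr = IQR(Result),        # spread: how much the month varies
    .groups = "drop"
  ) %>%
  mutate(vs_overall = ifelse(month_median > overall_median, "higher", "not higher"))

print(monthly)
```

In this made-up example May sits above the overall median with little spread, while October has a lower median and a much wider interquartile range.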
How dirty is the water, literally? There are several ways to measure sediment in the water. Laboratories allow sediment to settle out of a water sample, then dry and weigh it; this is called total suspended solids (TSS). Volunteers measure water clarity by letting water out of a Secchi tube until the pattern at the bottom becomes visible, and then recording the depth of water. Instruments can be installed to continuously measure turbidity, a measure of how much light is scattered by floating particles in the water.
Total suspended solids are usually low, but during large storms or spring snowmelt (see February and March), the water can be very muddy. TSS has a highly skewed distribution: a few large storms (especially two in May) have a disproportionate impact on how much sediment is sent downstream.
filter(ames_wpc, Site == "River - Upstream", Analyte == "Total Suspended Solids") %>%
ggplot(aes(x = Site, y = Result)) +
geom_boxplot(alpha = 0.5) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
fill = "black", position = position_dodge2(1)) +
labs(y = "Total suspended solids (mg/L)", x = "Entire year") +
theme(axis.text.x = element_blank())+
filter(ames_wpc, Site == "River - Upstream", Analyte == "Total Suspended Solids") %>%
ggplot(aes(x = Month, group = Month, y = Result)) +
geom_jitter(aes(color= weather), alpha = 0.25) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
aes(group = Month), fill = "black", position = position_dodge2(1)) +
labs(color = "Inches of rain in past two days") +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank() )+
plot_layout(nrow = 1, widths = c(1,3)) +
plot_annotation(title = "Sediment in the South Skunk River")
Typical conditions are easier to see on a logarithmic scale (where each tick mark is a factor-of-ten increase). The median sediment concentration in the South Skunk River is 25 mg/L. During our May snapshot, concentrations are usually higher than this, but we usually schedule the event to avoid heavy rains. In October, average sediment concentrations are lower, but can vary widely depending on weather conditions and streamflow.
filter(ames_wpc, Site == "River - Upstream", Analyte == "Total Suspended Solids") %>%
ggplot(aes(x = Site, y = Result)) +
geom_boxplot(alpha = 0.5) +
scale_y_log10() +
stat_summary(fun=mean, geom="point", shape=24, size=3,
fill = "black", position = position_dodge2(1)) +
labs(y = "Total suspended solids (mg/L)", x = "Entire year") +
theme(axis.text.x = element_blank())+
filter(ames_wpc, Site == "River - Upstream", Analyte == "Total Suspended Solids") %>%
ggplot(aes(x = Month, group = Month, y = Result)) +
geom_jitter(aes(color= weather), alpha = 0.25) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
scale_y_log10() +
stat_summary(fun=mean, geom="point", shape=24, size=3,
aes(group = Month), fill = "black", position = position_dodge2(1)) +
labs(color = "Inches of rain in past two days") +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank() )+
plot_layout(nrow = 1, widths = c(1,3)) +
plot_annotation(title = "Sediment in the South Skunk River, log scale")
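One subtlety of the log-scale figure: ggplot2 applies scale transformations before it computes statistics, so with `scale_y_log10()` the triangle drawn by `stat_summary(fun = mean)` is the average of the log10 values, which corresponds to the geometric mean and not the arithmetic mean. The sketch below shows the difference in base R, using invented values.

```r
# Hypothetical TSS results in mg/L (not real data)
tss <- c(10, 10, 100, 1000)

# Average on the original scale
arithmetic_mean <- mean(tss)

# Average of the log10 values, transformed back to mg/L
geometric_mean <- 10^mean(log10(tss))

print(c(arithmetic = arithmetic_mean, geometric = geometric_mean))
```

The geometric mean is always at or below the arithmetic mean, and the gap grows with skew. If the arithmetic mean is wanted on a log axis, one option is a coordinate transformation such as `coord_trans(y = "log10")`, since coordinate transformations are applied after statistics are computed.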
Phosphorus can be bound to soil, so the patterns are similar to what we see for sediment: we often see high levels after heavy rains or during spring snowmelt. Unlike sediment, average monthly concentrations are highest in February and March. This could be related to manure.
filter(ames_wpc, Site == "River - Upstream", Analyte == "Total Phosphorus as P") %>%
ggplot(aes(x = Site, y = Result)) +
geom_boxplot(alpha = 0.5) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
fill = "black", position = position_dodge2(1)) +
labs(y = "Total phosphorus (mg/L)", x = "Entire year") +
theme(axis.text.x = element_blank())+
filter(ames_wpc, Site == "River - Upstream", Analyte == "Total Phosphorus as P") %>%
ggplot(aes(x = Month, group = Month, y = Result)) +
geom_jitter(aes(color= weather), alpha = 0.25) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
aes(group = Month), fill = "black", position = position_dodge2(1)) +
labs(y = "Total phosphorus (mg/L)", color = "Inches of rain in past two days") +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank() )+
plot_layout(nrow = 1, widths = c(1,3)) +
plot_annotation(title = "Phosphorus in the South Skunk River")
For the snapshot events, we don’t send our volunteers out in the rain and we measure just orthophosphate, the form that is dissolved in water. Thus, we can expect that phosphorus results from volunteer snapshots will underestimate phosphorus concentrations. The City of Ames measures total phosphorus every week, and orthophosphate once a month. Here are the two forms compared.
filter(ames_wpc, Site == "River - Upstream", Analyte %in% c("Orthophosphate Phosphorus as P", "Total Phosphorus as P")) %>%
ggplot(aes(x = Analyte, group = Analyte, y = Result)) +
geom_jitter(aes(color= weather), alpha = 0.25) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
ylim(0, 2.5) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
fill = "black", position = position_dodge2(1)) +
labs(y = "Phosphorus as P (mg/L)", x = "Form of phosphorus", color = "Inches of rain in past two days")
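A note on `ylim()`: it sets scale limits, so any result above the limit is dropped before the boxplot and the mean are computed. `coord_cartesian()` zooms the view without dropping data. The sketch below compares the two on an invented data frame (`toy` is not real data), using `layer_data()` to read back the mean that ggplot2 would draw.

```r
library(ggplot2)

# Hypothetical phosphorus results in mg/L, with one value above the 2.5 limit
toy <- data.frame(form = "Total P", result = c(0.1, 0.2, 0.2, 0.3, 4.0))

p <- ggplot(toy, aes(x = form, y = result)) +
  stat_summary(fun = mean, geom = "point")

# ylim() sets scale limits: the 4.0 value is removed before the mean is computed
# (ggplot2 warns about the removed row, so the warning is suppressed here)
mean_with_ylim <- suppressWarnings(layer_data(p + ylim(0, 2.5))$y)

# coord_cartesian() only zooms the view: all five values go into the mean
mean_with_zoom <- layer_data(p + coord_cartesian(ylim = c(0, 2.5)))$y

print(c(ylim = mean_with_ylim, coord_cartesian = mean_with_zoom))
```

Which behavior is preferable depends on whether the values beyond the limit are measurement errors to exclude or real storm samples that should still count toward the summary.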
Many species of fish and aquatic life are sensitive to the amount of dissolved oxygen in the water. Dissolved oxygen is lowest in August and highest in January. Summer lows are partly a result of temperature (warmer water can’t hold as much oxygen) but could also be related to algae blooms and nutrient enrichment. Oxygen also has a daily cycle: it is lowest overnight and in the early morning, when plants and algae are respiring, and highest in the afternoon, when plants and algae are photosynthesizing.
During May and October snapshot events, oxygen is usually at a level typical for the year.
filter(ames_wpc, Site == "River - Upstream", Analyte == "Dissolved Oxygen") %>%
ggplot(aes(x = Site, y = Result)) +
geom_boxplot(alpha = 0.5) +
ylim(0,20) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
fill = "black", position = position_dodge2(1)) +
labs(y = "Dissolved Oxygen (mg/L)", x = "Entire year") +
theme(axis.text.x = element_blank())+
filter(ames_wpc, Site == "River - Upstream", Analyte == "Dissolved Oxygen") %>%
ggplot(aes(x = Month, group = Month, y = Result)) +
geom_jitter(aes(color= weather), alpha = 0.25) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
ylim(0,20) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
aes(group = Month), fill = "black", position = position_dodge2(1)) +
labs(y = "Dissolved Oxygen (mg/L)", color = "Inches of rain in past two days") +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank() )+
plot_layout(nrow = 1, widths = c(1,3)) +
plot_annotation(title = "Oxygen in the South Skunk River")
Because bedrock is deeply buried by glacial till, coldwater trout streams are rare in central Iowa. Temperatures range from near zero degrees Celsius in winter to around 20 degrees Celsius in July. Temperatures during the October snapshot are more variable but generally colder than in May.
# Note: there are a few outliers where it appears that temperature was measured in F instead of C; ylim(0, 30) drops them from the plot
filter(ames_wpc, Site == "River - Upstream", Analyte == "Temperature") %>%
ggplot(aes(x = Site, y = Result)) +
geom_boxplot(alpha = 0.5) +
ylim(0,30) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
fill = "black", position = position_dodge2(1)) +
labs(y = "Water Temperature (Degrees Celsius)", x = "Entire year") +
theme(axis.text.x = element_blank())+
filter(ames_wpc, Site == "River - Upstream", Analyte == "Temperature") %>%
ggplot(aes(x = Month, group = Month, y = Result)) +
geom_jitter(aes(color= weather), alpha = 0.25) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
ylim(0,30) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
aes(group = Month), fill = "black", position = position_dodge2(1)) +
labs(y = "Degrees Celsius", color = "Inches of rain in past two days") +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank() )+
plot_layout(nrow = 1, widths = c(1,3)) +
plot_annotation(title = "Water temperature in the South Skunk River")
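The plots above handle the suspected Fahrenheit readings by cropping the axis. An alternative, sketched here on invented values, is to flag and convert them. The cutoff of 35 is an assumption chosen for illustration, not a rule from the Ames lab, and any real correction would need to be checked against the original records.

```r
# Hypothetical water temperatures; 68 and 72 look like Fahrenheit readings
temp <- c(2.5, 14.0, 21.5, 68, 72)

# Assumed cutoff: treat anything above 35 as a Fahrenheit entry
suspect_f <- temp > 35

# Convert only the flagged values: C = (F - 32) * 5 / 9
temp_c <- ifelse(suspect_f, (temp - 32) * 5 / 9, temp)

print(data.frame(recorded = temp, suspect_f = suspect_f, celsius = round(temp_c, 1)))
```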
Acidity (pH) of the South Skunk River mostly stays between 7 and 9. Rainwater is more acidic than groundwater in our area, so we see pH drop after heavy rains. pH tends to be higher in spring than in fall, but the differences are too small, and test strips are not sensitive enough (measuring 7, 8, or 9) for this to be apparent in volunteer snapshot events.
filter(ames_wpc, Site == "River - Upstream", Analyte == "pH") %>%
ggplot(aes(x = Site, y = Result)) +
geom_boxplot(alpha = 0.5) +
ylim(5,10) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
fill = "black", position = position_dodge2(1)) +
labs(y = "pH", x = "Entire year") +
theme(axis.text.x = element_blank())+
filter(ames_wpc, Site == "River - Upstream", Analyte == "pH") %>%
ggplot(aes(x = Month, group = Month, y = Result)) +
geom_jitter(aes(color= weather), alpha = 0.25) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
ylim(5,10) +
stat_summary(fun=mean, geom="point", shape=24, size=3,
aes(group = Month), fill = "black", position = position_dodge2(1)) +
labs(y = "pH", color = "Inches of rain in past two days") +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank() )+
plot_layout(nrow = 1, widths = c(1,3)) +
plot_annotation(title = "Acidity (pH) in the South Skunk River")
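To illustrate why a small seasonal difference in pH disappears on test strips, the sketch below rounds invented lab readings to the nearest whole unit, on the assumption that a strip effectively reports 7, 8, or 9. The `spring_ph` and `fall_ph` values are made up.

```r
# Hypothetical lab pH readings (not real data)
spring_ph <- c(8.3, 8.4, 8.2)
fall_ph   <- c(8.0, 8.1, 7.9)

# Difference as a lab meter would see it
lab_difference <- mean(spring_ph) - mean(fall_ph)

# Difference after rounding each reading to a whole pH unit, as a strip would
strip_difference <- mean(round(spring_ph)) - mean(round(fall_ph))

print(c(lab = lab_difference, strip = strip_difference))
```

Every reading in both seasons rounds to 8, so the 0.3 unit difference seen by the lab vanishes entirely at test-strip resolution.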