This is a prototype for an interactive map that would inform recreational users about the water quality at both swimming beaches and rivers in Iowa. Like the state’s 2024 Integrated Report, it uses data from the 2020-2022 recreational seasons to evaluate which lakes and rivers meet recreational standards. However, the purpose is to educate the public rather than making regulatory determinations. Toward that end, it offers the following improvements.

Importing the data

library(tidyverse)
data_raw <- read_csv("data_raw_AQuIA.csv", 
              col_types = cols(facilityID = col_number(), 
              siteID = col_character(), huc8 = col_character(), 
              sampleDate = col_datetime(format = "%m/%d/%Y %H:%M"),
              quantFlag = col_character(), quantLimit = col_number()))
data <- data_raw %>%
  # Split out date and time
    mutate(sampleDate = as_date(sampleDate), 
           sampleTime = format(as.POSIXct(sampleDate),format = "%H:%M:%S", tz = "America/Chicago"),
           sampleYear = year(sampleDate),
           sampleDate = as_date(sampleDate)) %>%
  # Fix coding of quantFlag
  mutate(upperLimit = ifelse(is.na(quantFlag), FALSE, TRUE)) %>%
  # remove extra columns
  select(siteID, siteName = name, type, county, latitude, longitude, sampleYear, sampleDate, sampleTime, result, unit, detect, detectLimit, upperLimit, quantLimit) %>%
  mutate(SS235 = ifelse(result > 235, TRUE, FALSE),
         SS2880 = ifelse(result > 2880, TRUE, FALSE)  )

The data comes from the Iowa Department of Natural Resources’ AQuIA database. The following query was used to compile all available E. coli data collected from the last four seasons.

In this dataset, E. coli counts are expressed as “Most Probable Number” or MPN/100mL and typically have an lower detection limit of 10 and upper limit of 24,000. Readings outside the bounds of the test are recorded in “result” as the detection limit and flagged in a separate column. In other datasets I have worked with, inconsistent handling of non-detects have resulted in non-numeric values in numeric fields that require cleanup before an average can be computed. Inconsistencies in spelling and capitalization for analyte names can cause part of dataset to be excluded from analysis if not corrected. This dataset requires very little cleanup.

To make the data table easier to work with, some columns not relevant to this analysis are dropped. Time information is split out into date, time, and year columns, and samples that exceed the single sample maximum (235 colonies/100 mL for primary contact, 2,880 for secondary contact) are flagged for later analysis.

Here is what the raw data table looks like.

data_raw

And here is the cleaned up version.

data

Summarizing the recreational season

E. coli bacteria criteria for recreational waters apply from March 15-November 15, when the water is warm enough that recreational use is possible. The filter step used here is a close approximation (the 75th thru 320th day of the year) but may be off by a day or two in a leap year. A minimum of seven samples are needed in a season to apply the standards. The standard has two parts:

1) A geometric mean for the season should not exceed 126 colonies/100mL (primary contact recreation) or 630 colonies/100 mL (secondary contact recreation).

2) No single sample should exceed 235 colonies/100 mL (primary contact) or 2,880 colonies/100mL (secondary contact). If more than 9 samples are collected, no more than 10% of samples should exceed this threshold.

# Calculate geometric mean
geomean <- data %>%
    group_by(siteID, siteName, type, county, latitude, longitude, sampleYear) %>%
  # Null values cause problems
filter(!is.na(result)) %>%
filter(yday(sampleDate) %in% c(75:320)) %>%
  summarize(n_samples = n(),
            detects = sum(detect),
            above_detect = sum(upperLimit),
            exceeds235 = sum(SS235),
            exceeds2880 = sum(SS2880),
            geomean = round(exp(mean(log(result))))) %>%
           # Flag waters that violate primary contact recreation standard
  mutate(
         violation = ifelse(exceeds235 > 0, TRUE, ifelse(geomean> 126, TRUE, FALSE))) %>%
  # Eliminate site-year combos with less than 7 samples
  filter(n_samples >= 7) %>%
  ungroup()
  

geomean

If a water body has sufficient data (7 samples during a recreational season) and either exceeds the geomean standard or the single sample maximum, it could potentially be impaired. Data from 2020-2022 are used for the 2024 assessment cycle.

impaired <- geomean %>%
  select(siteID, siteName, type, county, latitude, longitude, sampleYear, violation) %>%
  filter(sampleYear %in% c(2020, 2021, 2022)) %>%
  group_by(siteID, siteName, type, county, latitude, longitude) %>%
  pivot_wider(names_prefix = "yr", names_from = sampleYear, values_from = violation) %>%
  mutate(yrsImpaired = sum(yr2020, yr2021, yr2022, na.rm = TRUE),
         potentialImpaired = ifelse(yrsImpaired > 0, TRUE, FALSE))
 impaired        

Standards

The color coding on the map is based on the E. coli geometric mean. You can use the radio buttons to select a year, and click on a site to get a pop-up with additional information.

Green = Meets primary contact recreation standard (<126 colonies/100mL)

Yellow = Exceeds primary contact standard but meets secondary contact recreation standard (126-630)

Red = Exceeds secondary contact recreation standard (> 630)

library(leaflet)
color_ecoli <- colorBin(c('green','yellow', 'red'), bins = c(0,126,630, 10000))
popper <-  ~paste0(siteID, "<br/>",
                   siteName,
                   "<br/>E. coli geomean (colonies/100ML): ",
                   geomean,
                          "<br/>Number of samples: ",
                          n_samples,
                  "<br/>Number of samples exceeding 235 colonies/100mL: ",
                  exceeds235,
                  "<br/>Number of samples exceeding 2,880 colonies/100mL: ",
                  exceeds2880)

# Split out bigger data table by type and year
geomean_2023_lakes <- filter(geomean, sampleYear == "2023", type == "Lake")
geomean_2023_rivers <- filter(geomean, sampleYear == "2023", type == "River/Stream")
geomean_2022_lakes <- filter(geomean, sampleYear == "2022", type == "Lake")
geomean_2022_rivers <- filter(geomean, sampleYear == "2022", type == "River/Stream")
geomean_2021_lakes <- filter(geomean, sampleYear == "2021", type == "Lake")
geomean_2021_rivers <- filter(geomean, sampleYear == "2021", type == "River/Stream")
geomean_2020_lakes <- filter(geomean, sampleYear == "2020", type == "Lake")
geomean_2020_rivers <- filter(geomean, sampleYear == "2020", type == "River/Stream")

Lakes

leaflet() %>%
  addTiles() %>%
  addCircleMarkers(data = geomean_2023_lakes, group = "2023", 
                   lng = ~longitude, lat = ~latitude,
                   color = ~color_ecoli(geomean),
                   radius = 6,
                   stroke = FALSE, fillOpacity = 100,
                   popup = popper) %>%
    addCircleMarkers(data = geomean_2022_lakes, group = "2022", 
                   lng = ~longitude, lat = ~latitude,
                   color = ~color_ecoli(geomean),
                   radius = 6,
                   stroke = FALSE, fillOpacity = 100,
                   popup = popper) %>%
  addCircleMarkers(data = geomean_2021_lakes, group = "2021", 
                   lng = ~longitude, lat = ~latitude,
                   color = ~color_ecoli(geomean),
                   radius = 6,
                   stroke = FALSE, fillOpacity = 100,
                   popup = popper) %>%
    addCircleMarkers(data = geomean_2020_lakes, group = "2020", 
                   lng = ~longitude, lat = ~latitude,
                   color = ~color_ecoli(geomean),
                   radius = 6,
                   stroke = FALSE, fillOpacity = 100,
                   popup = popper) %>%
      # Hydrography
  addWMSTiles(
              "https://basemap.nationalmap.gov/arcgis/services/USGSHydroCached/MapServer/WmsServer",
              opt = WMSTileOptions(format = "image/png", transparent = TRUE),
                              layers = "0", group = "Hydrography", attribution = "USGS Hydrography") %>%
  
  # Layers control
  addLayersControl(baseGroups = c("2023","2022","2021","2020"),
                   overlayGroups = c("Hydrography"),
                   options = layersControlOptions(collapsed = FALSE))

Streams

leaflet() %>%
  addTiles() %>%
  addCircleMarkers(data = geomean_2023_rivers, group = "2023", 
                   lng = ~longitude, lat = ~latitude,
                   color = ~color_ecoli(geomean),
                   radius = 6,
                   stroke = FALSE, fillOpacity = 100,
                   popup = popper) %>%
    addCircleMarkers(data = geomean_2022_rivers, group = "2022", 
                   lng = ~longitude, lat = ~latitude,
                   color = ~color_ecoli(geomean),
                   radius = 6,
                   stroke = FALSE, fillOpacity = 100,
                   popup = popper) %>%
  addCircleMarkers(data = geomean_2021_rivers, group = "2021", 
                   lng = ~longitude, lat = ~latitude,
                   color = ~color_ecoli(geomean),
                   radius = 6,
                   stroke = FALSE, fillOpacity = 100,
                   popup = popper) %>%
    addCircleMarkers(data = geomean_2020_rivers, group = "2020", 
                   lng = ~longitude, lat = ~latitude,
                   color = ~color_ecoli(geomean),
                   radius = 6,
                   stroke = FALSE, fillOpacity = 100,
                   popup = popper) %>%
      # Hydrography
  addWMSTiles(
              "https://basemap.nationalmap.gov/arcgis/services/USGSHydroCached/MapServer/WmsServer",
              opt = WMSTileOptions(format = "image/png", transparent = TRUE),
                              layers = "0", group = "Hydrography", attribution = "USGS Hydrography") %>%
  
  # Layers control
  addLayersControl(baseGroups = c("2023","2022","2021","2020"),
                   overlayGroups = c("Hydrography"),
                   options = layersControlOptions(collapsed = FALSE))