Using R to Automate Workflows for a Water Quality Management System

Alex Manos, Pinellas County Division of Environmental Management

Agenda

1

Sampling Program

Overview of Pinellas County water quality sampling program design and implementation

2

Open Source Tools

Open-source tools used to automate processes for the water quality sampling program

3

Data Products

Publicly available tools and services developed to facilitate data sharing and transparency

Sampling Program

Tanim & Tobin 2018

Water Quality Monitoring Program

  • Sampling divided into 2 parts:

    • Fixed Sites (streams)

    • Stratified random (lakes, coastal marine)

  • 8 sampling periods per year for wet/dry season coverage
  • Parameters: nutrients, chlorophyll-a (chl-a), water clarity, bacteria, and in-situ measurements
  • Sampling divided into “runs”, each including tidally influenced stream sites

Stream Sampling Date Selection

  • Stream sites selected in advance based on watershed coverage
  • Daily tide heights checked at tidally influenced sites

    • Sampling dates selected based on optimal outgoing tide
    • Water quality should capture what’s leaving the watershed

Stratified Random Sampling

  • Hexagonal grids overlaid on each stratum
    • Coastal waters and large lakes
  • Random sites selected within each stratum to obtain 4 sample locations per stratum per period (32 per stratum per year)
  • Captures water quality variability across large water bodies
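
A minimal R sketch of the stratified random draw described above, assuming a table of hexagon cells with hypothetical columns HexID and Stratum. It draws 4 cells per stratum for each of the 8 periods. Sampling without replacement across the whole year is an assumption of this sketch, not a rule stated in the program design.

```r
# Hypothetical sketch: draw n_per_stratum hexagon cells per stratum for each
# sampling period. `hexes` is assumed to have one row per grid cell with
# columns HexID and Stratum (illustrative names).
# Assumption: a cell is not reused within the year (sampling without
# replacement), so each stratum needs at least n_per_stratum * n_periods cells.
draw_strata_sites <- function(hexes, n_per_stratum = 4, n_periods = 8, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  by_stratum <- split(hexes$HexID, hexes$Stratum)
  picks <- lapply(names(by_stratum), function(s) {
    ids <- by_stratum[[s]]
    # Random cells for the whole year, then assigned to periods in blocks
    chosen <- ids[sample.int(length(ids), n_per_stratum * n_periods)]
    data.frame(Stratum = s,
               Period  = rep(seq_len(n_periods), each = n_per_stratum),
               HexID   = chosen)
  })
  do.call(rbind, picks)
}
```

With the defaults this yields 32 locations per stratum per year, matching the 4 per period times 8 periods above.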

Sampling Plan – Start to Finish

  1. Get sampling dates/locations
  2. Request sampling bottles from county utilities laboratory
  3. Conduct ambient water quality sampling
  4. Compile data from all data sources
  5. Perform QA/QC on all the data and make corrections where applicable
  6. Upload data to FDEP WIN
  7. Submit data to other partners (USF Water Atlas)
  8. Conduct statistical analysis for trends, nutrient loading, etc.

Goals for Automation

  1. Use open-source tools to automate as much of the sampling program as possible
  2. Publish code and data for transparency and reproducibility


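library(dplyr)

# Get land run dates for each period and make sure no more than 2 of the
# same dates are selected.
# x: candidate dates, one row per land run (LR) and candidate Date
LRdate <- function(x, max_tries = 1000) {
  for (i in seq_len(max_tries)) {
    # Randomly pick one candidate date for each land run
    y <- x |>
      group_by(LR) |>
      slice_sample(n = 1) |>
      ungroup()
    # Accept the draw only if no date is shared by more than 2 land runs
    if (!any(count(y, Date)$n > 2)) return(y)
  }
  # Guard against looping forever when no valid combination exists
  stop("No valid date combination found after ", max_tries, " tries")
}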

Open Source Tools

Stream Sampling Dates

  • No previous automation
  • Tide charts previously inspected manually for each site to find optimal sampling dates
  • rtide package now used to pull tidal heights automatically
  • Algorithm developed to select optimal sampling dates based on time of outgoing tide
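
The date-selection idea can be sketched in R as follows. This is not the county's algorithm: it simply ranks dates by how much the tide falls across a sampling window. It assumes hourly predictions in a data frame with columns DateTime (POSIXct) and TideHeight, such as tide heights pulled with the rtide package; the 08:00 to 12:00 window is a hypothetical default.

```r
# Sketch: rank candidate dates by how strongly the tide falls (ebbs) across a
# sampling window. Assumes `tides` has columns DateTime (POSIXct) and
# TideHeight. The default window (08:00-12:00) is hypothetical.
rank_outgoing_dates <- function(tides, start_hour = 8, end_hour = 12) {
  tides <- tides[order(tides$DateTime), ]           # ensure time order
  hrs <- as.integer(format(tides$DateTime, "%H"))
  win <- tides[hrs >= start_hour & hrs <= end_hour, ]
  # Calendar date taken in the same time zone as the timestamps
  win$Date <- as.Date(format(win$DateTime, "%Y-%m-%d"))
  # Last minus first height in the window; negative means an outgoing tide
  change <- tapply(win$TideHeight, win$Date, function(h) h[length(h)] - h[1])
  out <- data.frame(Date = as.Date(names(change)), Change = as.numeric(change))
  out <- out[out$Change < 0, ]   # keep only dates with a falling tide
  out[order(out$Change), ]       # strongest outgoing tide first
}
```

Each period's stream run could then be scheduled on the top-ranked date, subject to other constraints such as not stacking too many runs on one day.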

Randomized Strata Sampling – Old

  • SAS code 20+ years old and divided into multiple files

  • Only 1 license available within our group

  • Limited staff knowledge of SAS

Randomized Strata Sampling – New

  • Old SAS code converted to R

  • Code available to all staff to view, edit, run

  • In-house expertise allows for customization

  • Combined with stream date code to generate the yearly sampling schedule in one streamlined process

Sample Bottle Kits

Data QA/QC Semi-Automation

  • Previous automation was limited to HACH systems; all other QA/QC checks (e.g., time/date, missing data) were done manually
  • Current process performs all checks automatically and generates a PDF report detailing each check conducted

    • Reproducible QA/QC
    • Provides digital paper trail
  • Processing time before automation: 5-8 weeks
  • Processing time after automation: 2-4 weeks
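
A minimal R sketch of the kind of automated checks described above. The column names (Site, DateTime, Parameter, Value) and the duplicate-record check are assumptions for illustration; the time/date and missing-data checks mirror the examples on the slide. The returned summary table is the sort of content a generated report could list check by check.

```r
# Hypothetical sketch of automated QA/QC checks. Column names and the set of
# checks are illustrative assumptions, not the county's actual rules.
qaqc_checks <- function(df, period_start, period_end) {
  checks <- list(
    missing_value    = is.na(df$Value),
    missing_time     = is.na(df$DateTime),
    # Sample timestamp falls outside the sampling period
    outside_period   = !is.na(df$DateTime) &
                       (df$DateTime < period_start | df$DateTime > period_end),
    # Same site, time, and parameter reported more than once
    duplicate_record = duplicated(df[c("Site", "DateTime", "Parameter")])
  )
  # One row per check with the number of flagged records
  data.frame(
    Check   = names(checks),
    Flagged = unname(vapply(checks, sum, integer(1))),
    row.names = NULL
  )
}
```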

Improvements in Productivity

  • Open-source tools provide benefits to all aspects of the sampling program:

    • Faster processing time
    • Less chance for human error
    • Increased data reliability
    • Faster output of trends and other data products
  • Staff have more time to focus on other projects

Annual Time by Task
Task                 Without Automation   With Automation
Sampling Dates       20 hours             4 hours
Bottle Kit Request   8 hours              8 minutes
Data QA/QC           160 hours            24 hours
Reformatting Data    20 hours             2 hours
Total                208 hours            30 hours
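
As a quick check on the annual time figures above, this R snippet recomputes the totals and the overall saving. It treats the 8-minute bottle kit request as 8/60 hour, which is why the with-automation total comes to about 30.1 hours rather than exactly 30.

```r
# Annual hours per task from the figures above (8 minutes = 8/60 hour)
hours <- data.frame(
  Task    = c("Sampling Dates", "Bottle Kit Request", "Data QA/QC", "Reformatting Data"),
  Without = c(20, 8, 160, 20),
  With    = c(4, 8 / 60, 24, 2)
)
hours$Saved <- hours$Without - hours$With
total_without <- sum(hours$Without)   # 208 hours
total_with    <- sum(hours$With)      # about 30.1 hours (reported as 30)
pct_saved     <- 100 * (1 - total_with / total_without)
round(pct_saved, 1)                   # 85.5
```

In other words, automation saves roughly 178 staff hours per year, with Data QA/QC accounting for most of it (136 hours).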

Data Products

Faster Data Updates

  • The county shares data with many partners:

    • USF Water Atlas updates
    • FDEP WIN
    • Local municipalities
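
Much of the reformatting time saved comes from producing partner-specific layouts in code rather than by hand. A minimal R sketch of that idea follows. The mapping below is hypothetical and does not reflect the actual FDEP WIN or USF Water Atlas formats.

```r
# Hypothetical column mappings: internal name -> partner's required name.
# Illustrative only; real partner formats are not shown in this presentation.
partner_maps <- list(
  example_partner = c(Site = "StationID", DateTime = "SampleDate",
                      Parameter = "Analyte", Value = "Result")
)

# Select the internal columns a partner needs and rename them
format_for_partner <- function(df, map) {
  missing <- setdiff(names(map), names(df))
  if (length(missing) > 0) stop("Missing columns: ", paste(missing, collapse = ", "))
  out <- df[names(map)]        # list-style subsetting always keeps a data frame
  names(out) <- unname(map)
  out
}
```

Adding a partner then means adding one entry to the mapping list instead of reformatting spreadsheets by hand.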

Water Quality Dashboard

https://pcdem.shinyapps.io/dashboard/

Keeping an Open Mind

  • Developed a division GitHub account to provide public access to:

    • Code
    • Datasets
    • GIS files
  • Future efforts:

    • Provide public access to more DEM data
    • Create more applications for public use
    • Package development?

Thank you!