This document summarizes and explores the water quality data to be used in my analysis of Mystic River bacteria data at outfalls. (Geographic data to come.)
Water quality data was extracted from the Mystic River Watershed Association (MyRWA) Water Quality Database. The database stores water quality data collected for multiple sampling programs conducted by MyRWA, as well as data shared from other agencies and organizations. The database contains important metadata such as field and lab methods as well as quality assurance information to ensure all data are properly characterized and used appropriately.
To extract the data, I used the myrwaR R package, currently under development by MyRWA. This package contains R functions for loading data from the MyRWA Database, merging precipitation data with water quality records, and computing wet/dry conditions.
Precipitation information was obtained by merging the sampling data with an hourly precipitation dataset for Logan Airport, obtained from the Northeast Regional Climate Center and the NOAA Climate Data Online warehouse. Precipitation data is used to characterize each water quality sample as either dry or wet weather. A sample is classified as wet weather when the 48-hour antecedent precipitation exceeds 0.25 inches; otherwise it is classified as dry weather. This default threshold can be adjusted.
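The wet/dry rule can be sketched in a few lines of Python. This is an illustration only, not the myrwaR implementation: the function names, the `(timestamp, inches)` record layout, and the choice to include the sampling hour while excluding the hour exactly 48 hours earlier are all assumptions.

```python
from datetime import datetime, timedelta

def antecedent_precip(sample_time, hourly_precip, hours=48):
    """Sum hourly precipitation (inches) over the `hours` before sample_time.

    hourly_precip is an iterable of (timestamp, inches) pairs.
    Window convention (an assumption): start < timestamp <= sample_time.
    """
    start = sample_time - timedelta(hours=hours)
    return sum(p for t, p in hourly_precip if start < t <= sample_time)

def classify_weather(sample_time, hourly_precip, threshold=0.25, hours=48):
    """Return "wet" if antecedent precipitation exceeds threshold, else "dry"."""
    total = antecedent_precip(sample_time, hourly_precip, hours)
    return "wet" if total > threshold else "dry"

# Made-up records for illustration; not values from the Logan Airport dataset.
t0 = datetime(2015, 6, 1, 12)
records = [
    (t0 - timedelta(hours=50), 1.0),  # outside the 48-hour window, ignored
    (t0 - timedelta(hours=10), 0.2),
    (t0 - timedelta(hours=2), 0.1),
]
print(classify_weather(t0, records))                 # uses the default 0.25" threshold
print(classify_weather(t0, records, threshold=0.5))  # stricter threshold
```

Exposing `threshold` and `hours` as parameters mirrors the adjustable default described above.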
The core set of water quality data used in this analysis comes from the 15-year-old Hotspot sampling program. This program tests for bacteria and other parameters at stormwater outfalls and in streams.
The entire Hotspot dataset (ProjectID = HOTSPOT) can be found here: Hotspot_all_data.csv.
Each observation (row) represents a measured parameter (or Characteristic) at a location at a time. Each row includes fields characterizing the sampling event (Datetime, LocationID, VisitID, ProjectID, SampleTypeID); the measurement (CharacteristicID/Name, ResultValue, Units, Qualifier, FlagID); the location (MunicipalityID, WaterBodyID, Latitude, Longitude, LocationTypeID/Name); and the weather (Precipitation in last 48 hours, Wet/Dry code).
The Hotspot program includes data from both streams (LocationTypeID = 22) and stormwater sewer outfalls (LocationTypeID = 27). Summaries below are for outfall data only.
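Restricting the dataset to outfalls amounts to a filter on LocationTypeID. The sketch below uses Python's standard `csv` module. It assumes the CSV header uses the field name `LocationTypeID` exactly as listed above and stores the code as a plain integer string. The helper name and the sample rows are made up for illustration.

```python
import csv
import io

OUTFALL_TYPE_ID = "27"  # stormwater sewer outfalls; streams are 22

def outfall_rows(csv_file):
    """Return only the rows whose LocationTypeID marks a stormwater outfall."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if (row.get("LocationTypeID") or "").strip() == OUTFALL_TYPE_ID
    ]

# Made-up rows standing in for Hotspot_all_data.csv.
sample = io.StringIO(
    "LocationID,LocationTypeID,ResultValue\n"
    "SITE_A,22,120\n"
    "SITE_B,27,4500\n"
    "SITE_C, 27,80\n"
)
outfalls = outfall_rows(sample)
print([row["LocationID"] for row in outfalls])
```

To run it against the real file, pass an open file handle instead of the `StringIO` object, for example `open("Hotspot_all_data.csv", newline="")`.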
The total number of outfall locations in the watershed, the number of outfalls tested, and the number of bacteria samples are as follows:
| Total Outfalls | Outfalls Tested | Number of Samples |
|---|---|---|
| 1757 | 376 | 1233 |
The following table shows the same totals broken out by town.
| MunicipalityID | TotalOutfalls | OutfallsTested | NumSamples |
|---|---|---|---|
| Arlington | 130 | 40 | 208 |
| Belmont | 23 | 13 | 81 |
| Boston | 51 | 21 | 36 |
| Burlington | 28 | 5 | 5 |
| Cambridge | 35 | 12 | 83 |
| Chelsea | 84 | 47 | 197 |
| East Boston | 35 | 17 | 44 |
| Everett | 6 | 6 | 11 |
| Lexington | 79 | 4 | 14 |
| Malden | 35 | 26 | 114 |
| Medford | 237 | 33 | 59 |
| Melrose | 25 | 20 | 54 |
| Revere | 49 | 25 | 71 |
| Somerville | 29 | 29 | 113 |
| Stoneham | 54 | 10 | 20 |
| Winchester | 102 | 23 | 48 |
| Winthrop | 5 | 5 | 15 |
| Woburn | 735 | 40 | 60 |
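A per-town summary like the one above can be sketched as a simple grouping over the outfall rows. This is a hedged illustration, not the method used to build the table. It assumes each input row is one bacteria sample, so the data would first need to be filtered to the bacteria parameters. It counts distinct LocationID values as outfalls tested, and it uses a hypothetical helper name. TotalOutfalls is omitted because untested outfalls do not appear in the sampling rows.

```python
from collections import defaultdict

def summarize_by_town(rows):
    """Count distinct tested outfalls and samples per MunicipalityID.

    rows: iterable of dicts with "MunicipalityID" and "LocationID" keys.
    Assumption: one row corresponds to one bacteria sample.
    """
    locations = defaultdict(set)
    samples = defaultdict(int)
    for row in rows:
        town = row["MunicipalityID"]
        locations[town].add(row["LocationID"])
        samples[town] += 1
    return {
        town: {"OutfallsTested": len(locations[town]), "NumSamples": samples[town]}
        for town in sorted(locations)
    }

# Made-up rows for illustration only.
rows = [
    {"MunicipalityID": "Arlington", "LocationID": "SITE_A"},
    {"MunicipalityID": "Arlington", "LocationID": "SITE_A"},
    {"MunicipalityID": "Arlington", "LocationID": "SITE_B"},
    {"MunicipalityID": "Belmont", "LocationID": "SITE_C"},
]
for town, counts in summarize_by_town(rows).items():
    print(town, counts["OutfallsTested"], counts["NumSamples"])
```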
This figure shows the distribution of dry-weather (green) and wet-weather (blue) values for three bacteria parameters across all events. Values are shown as the log of MPN/100 ml. The red line marks the regulatory standard for safe boating for each parameter, a standard used in appraising outfalls.
The figure illustrates:
This figure shows the distribution of values at the outfalls with the largest number of E. coli measurements.
The figure illustrates: