The existing grading scheme for the Mystic has a few virtues:

Water quality is tied (more or less clearly) to recreation standards
It is a fairly simple calculation that people can understand
It weights wet- and dry-weather data in ways that are realistic
It is drawn from MyRWA baseline data, which puts MyRWA at the center of the conversation

It has flaws:

The baseline sampling scheme radically under-represents the Mystic River, especially in the fresh water section. There is only one freshwater sampling site in the fresh water section, very near Mystic Lakes.
The grade conflates data from different water bodies, by simple averaging, regardless of the size of the stream (or lake) or number of sites on that water body, resulting in a vague, cherry-picked, and hard-to-interpret “watershed” grade
The grade therefore tells little directly about water quality in the Mystic River itself, which causes great problems in communicating clearly.

The simplest proposal for a new grading rubric would be one that applies the same calculations used for the old grade, but broken out by water body. To get enough representative data from the Mystic River we need to look to MWRA data from the freshwater section of the Mystic.

The following calculations try to keep as much of the old system as possible, while addressing its weaknesses.

Load Water Quality Data

library(RODBC)
library(reshape2)
library(lubridate)
library(dplyr)
library(pander)
library(ggplot2)
library(dplyr)
library(knitr)
library(tidyr)

setwd("\\\\psf/Home/Dropbox/MysticDB")
source("./Rcode/Sandbox/Jeff/wq-precip-scripts/load_wq.R")
wq <- load_wq()

## Loading wq database: \\psf\Home\Dropbox\MysticDB\MysticDB_20150624.accdb
## Fetching tables...done
## Merging tables...done
## Excluding field blanks, duplicates...done
## Excluding flagged results...done

source('./Rcode/Sandbox/Jeff/wq-precip-scripts/load_precip.R')
precip <- load_precip()

## Loading precip file: \\psf\Home\Dropbox\MysticDB\Processed\Precip\LoganPrecip.xlsx

wq2 <- append_weather(wq,precip)

## Computing antecedent precip...done
## Computing DateHour column in wq dataframe...done
## Merging wq and precip...done

# Filter for Baseline and MWRA data only, surface samples only

wq3 <- tbl_df(wq2) %>%
  filter(ProjectID %in% c("BASE", "BHWQM", "CSORWM")) %>%
  filter(SampleDepthType %in% c("S", NA))

Create PassBoating and PassSwim columns

The old grading system assigned a grade by computing percent samples that met boating and swimming standards for the bacteria sampled. It combined E. coli and Enterococcus data by converting concentration data to the question of whether it exceeded the standard or not.

Appealing to the standards, we can associate each result with PassBoat and PassSwim variables (1 if pass, 0 if fail).

The mean of values in these columns for grouped rows (by WaterBody, Year, etc) will give proportion of samples in that group that pass the relevant standard, which is the old yardstick.

# Narrow results to ECOLI and ENT.
# Create new columns for PassBoat and PassSwim for each result, appealing to appropriate standard.
# Values are 1 if Pass and zero if fail.  Average  for grouped rows will now give 
# proportion/percentage of samples that pass the relevant standard for that group

wqgrade <- wq3 %>%
  filter(CharacteristicID == "ECOLI" | CharacteristicID == "ENT")%>%
  mutate(PassBoat = ifelse(
      ((CharacteristicID == "ECOLI" & ResultValue < 1260) |
        (CharacteristicID =="ENT" & ResultValue < 350))
      , 1, 0)) %>%
  mutate(PassSwim = ifelse(
    ((CharacteristicID == "ECOLI" & ResultValue < 235) |
       (CharacteristicID =="ENT" & ResultValue < 104))
    , 1, 0))

# This is the new dataframe with new columns PassBoat and PassSwim, =1 if pass =0 if not
# View(wqgrade)

Calculating weather-weighted percentages of samples meeting standards

In order to weight wet-weather days appropriately, the grade has calculated wet and dry proportions separately and then weights the dry value 75% and the wet value 25%. The last two columns of the table below represent the old scheme applied to each waterbody for 2014. Water bodies arranged in decreasing order of “swimmability.”

# now calculate the percentage of samples passing boat/swim by Weather, WaterBody, and year

# percentage of samples in 2014 meeting Swimming and boating standards in wet weather

wq_wet <- mutate(wqgrade, year = year(Datetime)) %>%
  filter(year == 2014) %>%
  filter(Weather == "Wet") %>%
  group_by(WaterBodyID, year) %>%
  summarize(PercPassBoatWet = mean(PassBoat)*100, PercPassSwimWet = mean(PassSwim)*100, n_wet = n())

# percentage of samples in 2014 meeting Swimming and boating standards in dry weather

wq_dry <- mutate(wqgrade, year = year(Datetime)) %>%
  filter(year == 2014) %>%
  filter(Weather == "Dry") %>%
  group_by(WaterBodyID, year) %>%
  summarize(PercPassBoatDry = mean(PassBoat)*100, PercPassSwimDry = mean(PassSwim)*100, n_dry = n())

# Join the tables and created a weighted sum column for swimming and boating

wq4 <- inner_join(wq_wet, wq_dry)%>%
  mutate(PassBoatWeighted = (PercPassBoatDry*.75 +PercPassBoatWet*.25)) %>%
  mutate(PassSwimWeighted = (PercPassSwimDry*.75 +PercPassSwimWet*.25))

wq4 <- tbl_df(wq4) %>%
  arrange(desc(PassSwimWeighted))

WaterBodyID	year	PercPassBoatWet	PercPassSwimWet	n_wet	PercPassBoatDry	PercPassSwimDry	n_dry	PassBoatWeighted	PassSwimWeighted
Upper Mystic Lake	2014	100.00000	80.000000	5	100.00000	100.00000	7	100.00000	95.000000
Mystic River (Salt)	2014	75.55556	60.000000	45	98.46154	92.30769	65	92.73504	84.230769
Mystic River (Fresh)	2014	75.63025	54.621849	119	99.29078	86.52482	282	93.37565	78.549079
Chelsea River	2014	66.66667	33.333333	3	100.00000	88.88889	9	91.66667	75.000000
Malden River	2014	45.00000	25.000000	20	100.00000	71.87500	32	86.25000	60.156250
Belle Isle Inlet	2014	100.00000	66.666667	3	100.00000	55.55556	9	100.00000	58.333333
Aberjona River	2014	60.00000	6.666667	15	100.00000	47.36842	19	90.00000	37.192982
Meetinghouse Brook	2014	40.00000	40.000000	5	57.14286	28.57143	7	52.85714	31.428571
Little River	2014	43.75000	6.250000	32	73.07692	38.46154	26	65.74519	30.408654
Alewife Brook	2014	41.12150	5.607477	107	75.86207	34.48276	87	67.17693	27.263938
Mill Creek	2014	66.66667	0.000000	3	77.77778	22.22222	9	75.00000	16.666667
Winns Brook	2014	20.00000	0.000000	5	50.00000	16.66667	6	42.50000	12.500000
Mill Brook	2014	60.00000	0.000000	5	71.42857	14.28571	7	68.57143	10.714286
Island End River	2014	0.00000	0.000000	3	33.33333	11.11111	9	25.00000	8.333333

If you look at the last three years of data, instead of just one, rank order is largely the same, certainly in the cleanest streams.

wq_wet2 <- mutate(wqgrade, year = year(Datetime)) %>%
  filter(year <= 2014 & year >= 2010) %>%
  filter(Weather == "Wet") %>%
  group_by(LocationID) %>%
  summarize(PercPassBoatWet = mean(PassBoat)*100, PercPassSwimWet = mean(PassSwim)*100, n_wet = n())

wq_dry2 <- mutate(wqgrade, year = year(Datetime)) %>%
  filter(year <= 2014 & year >= 2010) %>%
  filter(Weather == "Dry") %>%
  group_by(LocationID) %>%
  summarize(PercPassBoatDry = mean(PassBoat)*100, PercPassSwimDry = mean(PassSwim)*100, n_dry = n())

wq5 <- inner_join(wq_wet2, wq_dry2)%>%
  mutate(PassBoatWeighted = (PercPassBoatDry*.75 +PercPassBoatWet*.25)) %>%
  mutate(PassSwimWeighted = (PercPassSwimDry*.75 +PercPassSwimWet*.25))

wq5 <- tbl_df(wq5) %>%
  arrange(desc(PassSwimWeighted))

kable(wq5)

LocationID	PercPassBoatWet	PercPassSwimWet	n_wet	PercPassBoatDry	PercPassSwimDry	n_dry	PassBoatWeighted	PassSwimWeighted
MWRA137	95.65217	91.304348	23	100.00000	96.87500	96	98.91304	95.48234
MWRA059	95.83333	75.000000	48	100.00000	97.50000	160	98.95833	91.87500
UPL001	100.00000	83.333333	18	100.00000	94.59459	37	100.00000	91.77928
MWRA067	84.61538	65.384615	52	100.00000	96.87500	160	96.15385	89.00240
MYR071	90.47619	71.428571	21	100.00000	94.59459	37	97.61905	88.80309
MWRA167	81.03448	65.517241	58	100.00000	95.16129	186	95.25862	87.75028
MWRA069	82.75862	58.620690	58	98.78049	92.68293	82	94.77502	84.16737
MYRMMP	80.00000	60.000000	10	96.77419	90.32258	31	92.58065	82.74194
MWRA083	76.78571	41.964286	112	99.42529	91.66667	348	93.76539	79.24107
MWRA052	75.43860	50.877193	57	94.44444	86.66667	90	89.69298	77.71930
MYR275	77.77778	55.555556	9	93.75000	84.37500	32	89.75694	77.17014
MWRA176	54.71698	32.075472	53	98.74214	90.56604	159	87.73585	75.94340
CHR95S	70.00000	30.000000	10	100.00000	87.09677	31	92.50000	72.82258
MWRA057	75.00000	38.461539	52	98.73418	81.64557	158	92.80063	70.84956
MWRA056	51.92308	19.230769	52	100.00000	84.37500	160	87.98077	68.08894
MWRA066	58.53659	23.170732	82	96.96970	79.79798	198	87.36142	65.64117
MWRA177	55.81395	20.930233	86	98.93617	77.12766	188	88.15562	63.07830
BEI093	70.00000	50.000000	10	83.87097	48.38710	31	80.40323	48.79032
ABR049	55.55556	0.000000	18	94.59459	59.45946	37	84.83483	44.59459
ABR006	57.89474	21.052632	19	97.22222	50.00000	36	87.39035	42.76316
MEB001	47.61905	23.809524	21	84.21053	36.84211	38	75.06266	33.58396
ALB006	66.66667	4.761905	21	97.36842	39.47368	38	89.69298	30.79574
MWRA174	31.25000	4.687500	128	79.76190	38.09524	168	67.63393	29.74330
MWRA074	39.06250	4.687500	128	85.11905	36.90476	168	73.60491	28.85045
ABR028	61.90476	9.523810	21	94.59459	29.72973	37	86.42214	24.67825
MWRA183	37.50000	12.500000	8	50.00000	25.00000	24	46.87500	21.87500
MWRA172	30.76923	2.307692	130	84.52381	27.97619	168	71.08516	21.55907
MIB001	50.00000	0.000000	20	76.31579	23.68421	38	69.73684	17.76316
MAR036	22.22222	5.555556	18	72.97297	21.62162	37	60.28529	17.60511
MWRA070	21.53846	2.307692	130	80.00000	22.35294	170	65.38462	17.34163
MIC004	33.33333	0.000000	9	50.00000	21.87500	32	45.83333	16.40625
WIB001	42.10526	10.526316	19	69.44444	16.66667	36	62.60965	15.13158

Percent meeting Boating standards 2010-2014

g <- ggplot(wq5, aes(x=reorder(LocationID,-PassBoatWeighted), PassBoatWeighted))
g + geom_bar(stat = "identity", position="dodge") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous(limits = c(0,100))  + xlab("Water Body") + ylab("Weighted Percent Boatable")

Percent meeting Swimming standards 2010-2014

g <- ggplot(wq5, aes(x=reorder(LocationID,-PassSwimWeighted), y = PassSwimWeighted))
g + geom_bar(stat = "identity", position="dodge") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous(limits = c(0,100)) + xlab("Water Body") + ylab("Weighted Percent Swimmable")

Percent meeting Swimming and Boating Standards 2012-14, Alternate graph

# combine PassSwimWeighted and PassBoatWeighted into long form: PassType, PassValue

wq5_long <- gather(wq5, PassType, PassValue, PassBoatWeighted:PassSwimWeighted)
#View(wq5_long)

g <- ggplot(wq5_long, aes(x = reorder(LocationID, PassValue), PassValue, fill = PassType))
g + geom_bar(stat = "identity", position="dodge") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous(limits = c(0,100)) + xlab("Water Body") + ylab("Weighted Percent Meeting Standard") + scale_fill_brewer(palette = "Paired") + coord_flip()

New grade calculations by water body