The existing grading scheme for the Mystic has a few virtues:

It has flaws:

The simplest proposal for a new grading rubric would be one that applies the same calculations used for the old grade, but broken out by water body. To get enough representative data from the Mystic River we need to look to MWRA data from the freshwater section of the Mystic.

The following calculations try to keep as much of the old system as possible, while addressing its weaknesses.

Load Water Quality Data

library(RODBC)
library(reshape2)
library(lubridate)
library(dplyr)
library(pander)
library(ggplot2)
library(dplyr)
library(knitr)
library(tidyr)

setwd("\\\\psf/Home/Dropbox/MysticDB")
source("./Rcode/Sandbox/Jeff/wq-precip-scripts/load_wq.R")
wq <- load_wq()
## Loading wq database: \\psf\Home\Dropbox\MysticDB\MysticDB_20150624.accdb
## Fetching tables...done
## Merging tables...done
## Excluding field blanks, duplicates...done
## Excluding flagged results...done
source('./Rcode/Sandbox/Jeff/wq-precip-scripts/load_precip.R')
precip <- load_precip()
## Loading precip file: \\psf\Home\Dropbox\MysticDB\Processed\Precip\LoganPrecip.xlsx
wq2 <- append_weather(wq,precip)
## Computing antecedent precip...done
## Computing DateHour column in wq dataframe...done
## Merging wq and precip...done
# Filter for Baseline and MWRA data only, surface samples only

wq3 <- tbl_df(wq2) %>%
  filter(ProjectID %in% c("BASE", "BHWQM", "CSORWM")) %>%
  filter(SampleDepthType %in% c("S", NA))

Create PassBoating and PassSwim columns

The old grading system assigned a grade by computing percent samples that met boating and swimming standards for the bacteria sampled. It combined E. coli and Enterococcus data by converting concentration data to the question of whether it exceeded the standard or not.

Appealing to the standards, we can associate each result with PassBoat and PassSwim variables (1 if pass, 0 if fail).

The mean of values in these columns for grouped rows (by WaterBody, Year, etc) will give proportion of samples in that group that pass the relevant standard, which is the old yardstick.

# Narrow results to ECOLI and ENT.
# Create new columns for PassBoat and PassSwim for each result, appealing to appropriate standard.
# Values are 1 if Pass and zero if fail.  Average  for grouped rows will now give 
# proportion/percentage of samples that pass the relevant standard for that group

wqgrade <- wq3 %>%
  filter(CharacteristicID == "ECOLI" | CharacteristicID == "ENT")%>%
  mutate(PassBoat = ifelse(
      ((CharacteristicID == "ECOLI" & ResultValue < 1260) |
        (CharacteristicID =="ENT" & ResultValue < 350))
      , 1, 0)) %>%
  mutate(PassSwim = ifelse(
    ((CharacteristicID == "ECOLI" & ResultValue < 235) |
       (CharacteristicID =="ENT" & ResultValue < 104))
    , 1, 0))

# This is the new dataframe with new columns PassBoat and PassSwim, =1 if pass =0 if not
# View(wqgrade)

Calculating weather-weighted percentages of samples meeting standards

In order to weight wet-weather days appropriately, the grade has calculated wet and dry proportions separately and then weights the dry value 75% and the wet value 25%. The last two columns of the table below represent the old scheme applied to each waterbody for 2014. Water bodies arranged in decreasing order of “swimmability.”

# now calculate the percentage of samples passing boat/swim by Weather, WaterBody, and year

# percentage of samples in 2014 meeting Swimming and boating standards in wet weather

wq_wet <- mutate(wqgrade, year = year(Datetime)) %>%
  filter(year == 2014) %>%
  filter(Weather == "Wet") %>%
  group_by(WaterBodyID, year) %>%
  summarize(PercPassBoatWet = mean(PassBoat)*100, PercPassSwimWet = mean(PassSwim)*100, n_wet = n())

# percentage of samples in 2014 meeting Swimming and boating standards in dry weather

wq_dry <- mutate(wqgrade, year = year(Datetime)) %>%
  filter(year == 2014) %>%
  filter(Weather == "Dry") %>%
  group_by(WaterBodyID, year) %>%
  summarize(PercPassBoatDry = mean(PassBoat)*100, PercPassSwimDry = mean(PassSwim)*100, n_dry = n())

# Join the tables and created a weighted sum column for swimming and boating

wq4 <- inner_join(wq_wet, wq_dry)%>%
  mutate(PassBoatWeighted = (PercPassBoatDry*.75 +PercPassBoatWet*.25)) %>%
  mutate(PassSwimWeighted = (PercPassSwimDry*.75 +PercPassSwimWet*.25))

wq4 <- tbl_df(wq4) %>%
  arrange(desc(PassSwimWeighted))
WaterBodyID year PercPassBoatWet PercPassSwimWet n_wet PercPassBoatDry PercPassSwimDry n_dry PassBoatWeighted PassSwimWeighted
Upper Mystic Lake 2014 100.00000 80.000000 5 100.00000 100.00000 7 100.00000 95.000000
Mystic River (Salt) 2014 75.55556 60.000000 45 98.46154 92.30769 65 92.73504 84.230769
Mystic River (Fresh) 2014 75.63025 54.621849 119 99.29078 86.52482 282 93.37565 78.549079
Chelsea River 2014 66.66667 33.333333 3 100.00000 88.88889 9 91.66667 75.000000
Malden River 2014 45.00000 25.000000 20 100.00000 71.87500 32 86.25000 60.156250
Belle Isle Inlet 2014 100.00000 66.666667 3 100.00000 55.55556 9 100.00000 58.333333
Aberjona River 2014 60.00000 6.666667 15 100.00000 47.36842 19 90.00000 37.192982
Meetinghouse Brook 2014 40.00000 40.000000 5 57.14286 28.57143 7 52.85714 31.428571
Little River 2014 43.75000 6.250000 32 73.07692 38.46154 26 65.74519 30.408654
Alewife Brook 2014 41.12150 5.607477 107 75.86207 34.48276 87 67.17693 27.263938
Mill Creek 2014 66.66667 0.000000 3 77.77778 22.22222 9 75.00000 16.666667
Winns Brook 2014 20.00000 0.000000 5 50.00000 16.66667 6 42.50000 12.500000
Mill Brook 2014 60.00000 0.000000 5 71.42857 14.28571 7 68.57143 10.714286
Island End River 2014 0.00000 0.000000 3 33.33333 11.11111 9 25.00000 8.333333

If you look at the last three years of data, instead of just one, rank order is largely the same, certainly in the cleanest streams.

wq_wet2 <- mutate(wqgrade, year = year(Datetime)) %>%
  filter(year <= 2014 & year >= 2010) %>%
  filter(Weather == "Wet") %>%
  group_by(LocationID) %>%
  summarize(PercPassBoatWet = mean(PassBoat)*100, PercPassSwimWet = mean(PassSwim)*100, n_wet = n())

wq_dry2 <- mutate(wqgrade, year = year(Datetime)) %>%
  filter(year <= 2014 & year >= 2010) %>%
  filter(Weather == "Dry") %>%
  group_by(LocationID) %>%
  summarize(PercPassBoatDry = mean(PassBoat)*100, PercPassSwimDry = mean(PassSwim)*100, n_dry = n())

wq5 <- inner_join(wq_wet2, wq_dry2)%>%
  mutate(PassBoatWeighted = (PercPassBoatDry*.75 +PercPassBoatWet*.25)) %>%
  mutate(PassSwimWeighted = (PercPassSwimDry*.75 +PercPassSwimWet*.25))

wq5 <- tbl_df(wq5) %>%
  arrange(desc(PassSwimWeighted))

kable(wq5)
LocationID PercPassBoatWet PercPassSwimWet n_wet PercPassBoatDry PercPassSwimDry n_dry PassBoatWeighted PassSwimWeighted
MWRA137 95.65217 91.304348 23 100.00000 96.87500 96 98.91304 95.48234
MWRA059 95.83333 75.000000 48 100.00000 97.50000 160 98.95833 91.87500
UPL001 100.00000 83.333333 18 100.00000 94.59459 37 100.00000 91.77928
MWRA067 84.61538 65.384615 52 100.00000 96.87500 160 96.15385 89.00240
MYR071 90.47619 71.428571 21 100.00000 94.59459 37 97.61905 88.80309
MWRA167 81.03448 65.517241 58 100.00000 95.16129 186 95.25862 87.75028
MWRA069 82.75862 58.620690 58 98.78049 92.68293 82 94.77502 84.16737
MYRMMP 80.00000 60.000000 10 96.77419 90.32258 31 92.58065 82.74194
MWRA083 76.78571 41.964286 112 99.42529 91.66667 348 93.76539 79.24107
MWRA052 75.43860 50.877193 57 94.44444 86.66667 90 89.69298 77.71930
MYR275 77.77778 55.555556 9 93.75000 84.37500 32 89.75694 77.17014
MWRA176 54.71698 32.075472 53 98.74214 90.56604 159 87.73585 75.94340
CHR95S 70.00000 30.000000 10 100.00000 87.09677 31 92.50000 72.82258
MWRA057 75.00000 38.461539 52 98.73418 81.64557 158 92.80063 70.84956
MWRA056 51.92308 19.230769 52 100.00000 84.37500 160 87.98077 68.08894
MWRA066 58.53659 23.170732 82 96.96970 79.79798 198 87.36142 65.64117
MWRA177 55.81395 20.930233 86 98.93617 77.12766 188 88.15562 63.07830
BEI093 70.00000 50.000000 10 83.87097 48.38710 31 80.40323 48.79032
ABR049 55.55556 0.000000 18 94.59459 59.45946 37 84.83483 44.59459
ABR006 57.89474 21.052632 19 97.22222 50.00000 36 87.39035 42.76316
MEB001 47.61905 23.809524 21 84.21053 36.84211 38 75.06266 33.58396
ALB006 66.66667 4.761905 21 97.36842 39.47368 38 89.69298 30.79574
MWRA174 31.25000 4.687500 128 79.76190 38.09524 168 67.63393 29.74330
MWRA074 39.06250 4.687500 128 85.11905 36.90476 168 73.60491 28.85045
ABR028 61.90476 9.523810 21 94.59459 29.72973 37 86.42214 24.67825
MWRA183 37.50000 12.500000 8 50.00000 25.00000 24 46.87500 21.87500
MWRA172 30.76923 2.307692 130 84.52381 27.97619 168 71.08516 21.55907
MIB001 50.00000 0.000000 20 76.31579 23.68421 38 69.73684 17.76316
MAR036 22.22222 5.555556 18 72.97297 21.62162 37 60.28529 17.60511
MWRA070 21.53846 2.307692 130 80.00000 22.35294 170 65.38462 17.34163
MIC004 33.33333 0.000000 9 50.00000 21.87500 32 45.83333 16.40625
WIB001 42.10526 10.526316 19 69.44444 16.66667 36 62.60965 15.13158

Percent meeting Boating standards 2010-2014

g <- ggplot(wq5, aes(x=reorder(LocationID,-PassBoatWeighted), PassBoatWeighted))
g + geom_bar(stat = "identity", position="dodge") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous(limits = c(0,100))  + xlab("Water Body") + ylab("Weighted Percent Boatable")

Percent meeting Swimming standards 2010-2014

g <- ggplot(wq5, aes(x=reorder(LocationID,-PassSwimWeighted), y = PassSwimWeighted))
g + geom_bar(stat = "identity", position="dodge") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous(limits = c(0,100)) + xlab("Water Body") + ylab("Weighted Percent Swimmable") 

Percent meeting Swimming and Boating Standards 2012-14, Alternate graph

# combine PassSwimWeighted and PassBoatWeighted into long form: PassType, PassValue

wq5_long <- gather(wq5, PassType, PassValue, PassBoatWeighted:PassSwimWeighted)
#View(wq5_long)

g <- ggplot(wq5_long, aes(x = reorder(LocationID, PassValue), PassValue, fill = PassType))
g + geom_bar(stat = "identity", position="dodge") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous(limits = c(0,100)) + xlab("Water Body") + ylab("Weighted Percent Meeting Standard") + scale_fill_brewer(palette = "Paired") + coord_flip()