1. Goal of this document and Summary

Goal: to inform planning of 2015 hotspot surveys by looking at historical summaries of MyRWA hotspot data.

Questions:

What municipalities or classes of site or particular locations should we prioritize in 2015?

2. Assemble Data

First, load water quality data.

setwd("\\\\psf/Home/Dropbox/MysticDB")
#setwd("C:/Users/Monitoring Director/Dropbox/MysticDB")

source('./Rcode/Sandbox/Jeff/wq-precip-scripts/load_wq.R')
wq <- load_wq()

source('./Rcode/Sandbox/Jeff/wq-precip-scripts/load_precip.R')
precip <- load_precip()
wq2 <- append_weather(wq,precip)

Narrow to Hotspot bacteria data, and create a subset of samples taken at outfalls only

# filter for hotspot data only 
hot <- tbl_df(wq2)  %>%
  filter(ProjectID == "HOTSPOT") %>%
  filter(CharacteristicID == "ECOLI" | CharacteristicID == "ENT") 
 # filter(LocationTypeID == 27)

# filter for outfalls only
hot_outfalls <- tbl_df(wq2)  %>%
  filter(ProjectID == "HOTSPOT") %>%
  filter(CharacteristicID == "ECOLI" | CharacteristicID == "ENT") %>%
  filter(LocationTypeID == 27)

3. Percentage of outfalls sampled, by town

# To calculate percentage of outfalls visited by town, need number of outfalls for which there are bacteria results, divided by total number of outfalls, sorted by town. 

# First, start with hotspot results for outfalls, then find the number of unique outfalls associated with bacteria results, grouped by town

VisitedByTown <- hot_outfalls %>%
  group_by(MunicipalityID, LocationID) %>%
  summarize(n = n())

all_vis_town <- unique(VisitedByTown[,c("MunicipalityID", "LocationID")]) %>%
  group_by(MunicipalityID) %>%
  summarize(n = n())

#print(all_vis_town)

# For total outfalls by town, get Locations table, exported manually from the database, filter for type 27 (stormater outfall), relabel location ID field, and group by town

Location <- read.csv("//psf/Home/Dropbox/MysticDB/Rcode/Sandbox/Andy/data_scratch/Location.csv")

all_locs <- tbl_df(Location) %>%
  filter(LocationTypeID == 27) %>%
  select(LocationID = ID, MunicipalityID, LocationTypeID) %>%
  group_by(MunicipalityID) %>%
  summarize(n = n())
 
#print(all_locs)

#Join the two resulting tables (visited outfalls by town, all outfalls by town),  relabel columns, and created a percent-sampled column by dividing Sampled by All

perc_sampled <- all_locs %>%
  left_join(all_vis_town, by = "MunicipalityID") %>%
  select(MunicipalityID, Total = n.x, Sampled = n.y) %>%
  mutate(PercSampled = (Sampled / Total) * 100) %>%
  arrange(desc(PercSampled))
## Warning: joining factors with different levels, coercing to character
## vector
pander(perc_sampled)
MunicipalityID Total Sampled PercSampled
Somerville 29 29 100
Winthrop 5 5 100
Everett 6 5 83.33
Malden 35 26 74.29
Melrose 25 18 72
Belmont 23 13 56.52
Chelsea 84 47 55.95
Revere 49 24 48.98
East Boston 35 17 48.57
Boston 51 21 41.18
Cambridge 35 12 34.29
Arlington 130 37 28.46
Winchester 102 23 22.55
Burlington 28 5 17.86
Medford 237 33 13.92
Stoneham 54 7 12.96
Woburn 735 38 5.17
Lexington 79 4 5.063
Reading 15 NA NA

4. Summaries by Municipality

Note: These are all hotspot samples (unlike summary above, which was outfalls), including stream samples.

I plan to use these and other tables in planning future surveys. Where have we not focussed in the last three years that would make sense to return to? [This will need more thought. But these results themselves are interesting to look through.]

Three-year summary

  • How many samples by town, in last three years?
  • What were the median, 75th and 90th quantile values in each town?
three <- hot %>%
  filter(Datetime > "2012-01-01")

byTown3yr <- three %>%
  group_by(MunicipalityID, year(Datetime)) %>%
  summarize(q90 = quantile(ResultValue, 0.9, na.rm = TRUE), 
            q75 = quantile(ResultValue, 0.75, na.rm = TRUE),
            median = median(ResultValue, na.rm = TRUE), n = n()) %>%
  arrange(desc(n), desc(q90))

Total number of samples in last three years = 322

MunicipalityID year(Datetime) q90 q75 median n
Arlington 2014 5655 1901 1500 10
Arlington 2012 26498 13918 2611 8
Arlington 2013 8018 5528 1379 3
Belmont 2012 5071 2927 444 12
Belmont 2014 4306 2329 49 8
Belmont 2013 1790 1475 1265 4
Boston 2013 110 110 110 1
Boston 2014 75 75 75 1
Cambridge 2013 114196 30838 14500 10
Cambridge 2014 2221 1432 117 3
Cambridge 2012 2747 2747 2747 1
Chelsea 2013 2793 207.8 57 12
Chelsea 2012 310 150 10 11
Chelsea 2014 10 10 10 1
East Boston 2013 784 784 784 1
Everett 2013 241960 138220 9208 7
Everett 2014 53980 19434 127 4
Everett 2012 104 104 104 1
Lexington 2012 68638 24536 29.5 4
Lexington 2014 4869 1774 35 4
Malden 2013 103366 13272 2600 22
Malden 2012 17544 6420 1596 10
Malden 2014 9360 1549 516 9
Medford 2014 82390 10808 1743 26
Medford 2013 7471 2356 465.5 26
Medford 2012 857 857 857 1
Melrose 2012 17433 2517 767 12
Melrose 2013 7040 2134 252 8
Melrose 2014 11034 4037 744 7
Revere 2014 673.4 182 10 8
Revere 2013 200.6 84.5 20 4
Somerville 2014 9678 3266 2069 21
Somerville 2013 2000 1107 169 7
Stoneham 2014 30140 7945 2747 9
Stoneham 2013 38 38 38 1
Winchester 2013 20585 1642 455 13
Winchester 2014 217798 181554 121148 2
Winchester 2012 1 1 1 1
Winthrop 2014 28.9 25.75 20.5 2
Woburn 2012 122 93.5 12 11
Woburn 2014 182.6 131 58 9
NA 2013 3920 3080 270 5
NA 2014 518.8 433 290 2

Five-year history

  • How many samples by town, in last five years?
  • What were the median, 75th and 90th quantile values in each town?
five <- hot %>%
  filter(Datetime > "2010-01-01")

byTown5yr <- five %>%
  group_by(MunicipalityID) %>%
  summarize(q90 = quantile(ResultValue, 0.9, na.rm = TRUE), 
            q75 = quantile(ResultValue, 0.75, na.rm = TRUE),
            median = median(ResultValue, na.rm = TRUE), n = n()) %>%
  arrange(desc(n), desc(q90))

Total number of samples in last five years = 625

MunicipalityID q90 q75 median n
Chelsea 6867.0 2416.50 362.0 74
Malden 16366.0 3808.50 970.0 68
Medford 63490.0 7201.50 1159.5 64
Somerville 18876.0 4826.00 2069.0 59
Arlington 14138.0 2523.50 395.0 55
Belmont 7945.0 2456.50 526.0 51
Melrose 9128.8 3237.00 395.0 43
Revere 24060.0 833.00 120.0 37
Winchester 20486.0 2400.00 432.0 35
Woburn 372.0 196.00 64.0 35
Cambridge 35796.0 19945.00 3922.0 23
Stoneham 9774.6 5885.75 580.5 18
Everett 192475.0 33550.00 1280.0 14
Lexington 6932.0 312.50 34.0 11
Boston 41754.8 19863.00 5794.0 9
East Boston 30120.0 5040.00 1726.0 9
Winthrop 3991.6 1342.00 537.0 7
NA 3640.0 1828.00 270.0 7
Burlington 2120.0 1700.00 770.0 5
Wakefield 324.0 324.00 324.0 1
  • How many samples by town, in all years?
  • What were the median, 75th and 90th quantile values in each town?
all <- hot 
  #filter(Datetime > "2012-01-01")

# to create a column with just the year as the name for matrix below

byTownAll <- all %>%
  group_by(MunicipalityID) %>%
  summarize(q90 = quantile(ResultValue, 0.9, na.rm = TRUE), 
            q75 = quantile(ResultValue, 0.75, na.rm = TRUE),
            median = median(ResultValue, na.rm = TRUE), n = n()) %>%
  arrange(desc(n), desc(q90))

Total number of samples in all years = 2089

MunicipalityID q90 q75 median n
Arlington 9678.0 1900.00 399.0 321
Medford 10792.0 1956.50 540.0 219
Chelsea 5055.8 1080.00 199.0 217
Belmont 8600.5 2098.50 356.0 180
Winchester 2418.0 1077.00 289.0 163
Woburn 1161.1 473.25 131.5 163
Malden 15531.0 7700.00 1050.0 153
Somerville 17765.0 3274.50 725.5 132
Cambridge 12200.0 2000.00 800.0 117
Revere 24380.2 2572.50 329.5 98
East Boston 9625.2 939.00 120.0 77
Melrose 10543.2 1649.50 593.0 75
Boston 9403.0 944.00 78.5 42
Stoneham 8118.3 1109.50 384.2 40
Everett 33736.0 6400.00 560.0 33
Lexington 6165.2 1709.50 138.0 27
Winthrop 1811.0 917.00 374.5 16
Burlington 2167.6 1792.00 936.0 8
NA 3640.0 1828.00 270.0 7
Wakefield 324.0 324.00 324.0 1

–How many samples by town in EACH year?

MunicipalityID 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
Arlington 23 83 19 32 10 36 13 15 35 15 19 8 3 10
Belmont 10 28 10 26 1 24 4 12 14 16 11 12 4 8
Boston 0 0 20 0 1 8 4 0 0 7 0 0 1 1
Burlington 0 0 0 0 3 0 0 0 0 0 5 0 0 0
Cambridge 10 30 13 19 7 8 0 5 2 8 1 1 10 3
Charlestown 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Chelsea 0 0 47 1 0 17 22 20 36 10 40 11 12 1
East Boston 0 0 8 18 4 19 10 9 0 8 0 0 1 0
Everett 0 0 8 0 0 7 4 0 0 2 0 1 7 4
Lexington 0 0 0 0 10 3 3 0 0 3 0 4 0 4
Malden 0 6 12 4 0 6 42 15 0 4 23 10 22 9
Medford 0 45 12 31 0 17 23 26 1 10 1 1 26 26
Melrose 0 1 0 0 9 4 12 6 0 6 10 12 8 7
Reading 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Revere 0 0 6 7 11 5 17 7 8 10 15 0 4 8
Somerville 1 21 9 6 0 17 5 7 7 29 2 0 7 21
Stoneham 0 0 0 0 2 2 1 2 15 4 4 0 1 9
Wakefield 0 0 0 0 0 0 0 0 0 0 1 0 0 0
Winchester 0 19 0 14 13 26 11 25 20 4 15 1 13 2
Winthrop 0 0 0 4 0 0 2 3 0 2 3 0 0 2
Woburn 0 23 0 17 21 22 18 5 22 13 2 11 0 9
NA 0 0 0 0 0 0 0 0 0 0 0 0 5 2