Goal: to inform planning of 2015 hotspot surveys by looking at historical summaries of MyRWA hotspot data.
Questions:
What municipalities or classes of site or particular locations should we prioritize in 2015?
First, load water quality data.
setwd("\\\\psf/Home/Dropbox/MysticDB")
#setwd("C:/Users/Monitoring Director/Dropbox/MysticDB")
source('./Rcode/Sandbox/Jeff/wq-precip-scripts/load_wq.R')
wq <- load_wq()
source('./Rcode/Sandbox/Jeff/wq-precip-scripts/load_precip.R')
precip <- load_precip()
wq2 <- append_weather(wq,precip)
Narrow to Hotspot bacteria data, and create a subset of samples taken at outfalls only
# filter for hotspot data only
hot <- tbl_df(wq2) %>%
filter(ProjectID == "HOTSPOT") %>%
filter(CharacteristicID == "ECOLI" | CharacteristicID == "ENT")
# filter(LocationTypeID == 27)
# filter for outfalls only
hot_outfalls <- tbl_df(wq2) %>%
filter(ProjectID == "HOTSPOT") %>%
filter(CharacteristicID == "ECOLI" | CharacteristicID == "ENT") %>%
filter(LocationTypeID == 27)
# To calculate percentage of outfalls visited by town, need number of outfalls for which there are bacteria results, divided by total number of outfalls, sorted by town.
# First, start with hotspot results for outfalls, then find the number of unique outfalls associated with bacteria results, grouped by town
VisitedByTown <- hot_outfalls %>%
group_by(MunicipalityID, LocationID) %>%
summarize(n = n())
all_vis_town <- unique(VisitedByTown[,c("MunicipalityID", "LocationID")]) %>%
group_by(MunicipalityID) %>%
summarize(n = n())
#print(all_vis_town)
# For total outfalls by town, get Locations table, exported manually from the database, filter for type 27 (stormater outfall), relabel location ID field, and group by town
Location <- read.csv("//psf/Home/Dropbox/MysticDB/Rcode/Sandbox/Andy/data_scratch/Location.csv")
all_locs <- tbl_df(Location) %>%
filter(LocationTypeID == 27) %>%
select(LocationID = ID, MunicipalityID, LocationTypeID) %>%
group_by(MunicipalityID) %>%
summarize(n = n())
#print(all_locs)
#Join the two resulting tables (visited outfalls by town, all outfalls by town), relabel columns, and created a percent-sampled column by dividing Sampled by All
perc_sampled <- all_locs %>%
left_join(all_vis_town, by = "MunicipalityID") %>%
select(MunicipalityID, Total = n.x, Sampled = n.y) %>%
mutate(PercSampled = (Sampled / Total) * 100) %>%
arrange(desc(PercSampled))
## Warning: joining factors with different levels, coercing to character
## vector
pander(perc_sampled)
| MunicipalityID | Total | Sampled | PercSampled |
|---|---|---|---|
| Somerville | 29 | 29 | 100 |
| Winthrop | 5 | 5 | 100 |
| Everett | 6 | 5 | 83.33 |
| Malden | 35 | 26 | 74.29 |
| Melrose | 25 | 18 | 72 |
| Belmont | 23 | 13 | 56.52 |
| Chelsea | 84 | 47 | 55.95 |
| Revere | 49 | 24 | 48.98 |
| East Boston | 35 | 17 | 48.57 |
| Boston | 51 | 21 | 41.18 |
| Cambridge | 35 | 12 | 34.29 |
| Arlington | 130 | 37 | 28.46 |
| Winchester | 102 | 23 | 22.55 |
| Burlington | 28 | 5 | 17.86 |
| Medford | 237 | 33 | 13.92 |
| Stoneham | 54 | 7 | 12.96 |
| Woburn | 735 | 38 | 5.17 |
| Lexington | 79 | 4 | 5.063 |
| Reading | 15 | NA | NA |
Note: These are all hotspot samples (unlike summary above, which was outfalls), including stream samples.
I plan to use these and other tables in planning future surveys. Where have we not focussed in the last three years that would make sense to return to? [This will need more thought. But these results themselves are interesting to look through.]
three <- hot %>%
filter(Datetime > "2012-01-01")
byTown3yr <- three %>%
group_by(MunicipalityID, year(Datetime)) %>%
summarize(q90 = quantile(ResultValue, 0.9, na.rm = TRUE),
q75 = quantile(ResultValue, 0.75, na.rm = TRUE),
median = median(ResultValue, na.rm = TRUE), n = n()) %>%
arrange(desc(n), desc(q90))
Total number of samples in last three years = 322
| MunicipalityID | year(Datetime) | q90 | q75 | median | n |
|---|---|---|---|---|---|
| Arlington | 2014 | 5655 | 1901 | 1500 | 10 |
| Arlington | 2012 | 26498 | 13918 | 2611 | 8 |
| Arlington | 2013 | 8018 | 5528 | 1379 | 3 |
| Belmont | 2012 | 5071 | 2927 | 444 | 12 |
| Belmont | 2014 | 4306 | 2329 | 49 | 8 |
| Belmont | 2013 | 1790 | 1475 | 1265 | 4 |
| Boston | 2013 | 110 | 110 | 110 | 1 |
| Boston | 2014 | 75 | 75 | 75 | 1 |
| Cambridge | 2013 | 114196 | 30838 | 14500 | 10 |
| Cambridge | 2014 | 2221 | 1432 | 117 | 3 |
| Cambridge | 2012 | 2747 | 2747 | 2747 | 1 |
| Chelsea | 2013 | 2793 | 207.8 | 57 | 12 |
| Chelsea | 2012 | 310 | 150 | 10 | 11 |
| Chelsea | 2014 | 10 | 10 | 10 | 1 |
| East Boston | 2013 | 784 | 784 | 784 | 1 |
| Everett | 2013 | 241960 | 138220 | 9208 | 7 |
| Everett | 2014 | 53980 | 19434 | 127 | 4 |
| Everett | 2012 | 104 | 104 | 104 | 1 |
| Lexington | 2012 | 68638 | 24536 | 29.5 | 4 |
| Lexington | 2014 | 4869 | 1774 | 35 | 4 |
| Malden | 2013 | 103366 | 13272 | 2600 | 22 |
| Malden | 2012 | 17544 | 6420 | 1596 | 10 |
| Malden | 2014 | 9360 | 1549 | 516 | 9 |
| Medford | 2014 | 82390 | 10808 | 1743 | 26 |
| Medford | 2013 | 7471 | 2356 | 465.5 | 26 |
| Medford | 2012 | 857 | 857 | 857 | 1 |
| Melrose | 2012 | 17433 | 2517 | 767 | 12 |
| Melrose | 2013 | 7040 | 2134 | 252 | 8 |
| Melrose | 2014 | 11034 | 4037 | 744 | 7 |
| Revere | 2014 | 673.4 | 182 | 10 | 8 |
| Revere | 2013 | 200.6 | 84.5 | 20 | 4 |
| Somerville | 2014 | 9678 | 3266 | 2069 | 21 |
| Somerville | 2013 | 2000 | 1107 | 169 | 7 |
| Stoneham | 2014 | 30140 | 7945 | 2747 | 9 |
| Stoneham | 2013 | 38 | 38 | 38 | 1 |
| Winchester | 2013 | 20585 | 1642 | 455 | 13 |
| Winchester | 2014 | 217798 | 181554 | 121148 | 2 |
| Winchester | 2012 | 1 | 1 | 1 | 1 |
| Winthrop | 2014 | 28.9 | 25.75 | 20.5 | 2 |
| Woburn | 2012 | 122 | 93.5 | 12 | 11 |
| Woburn | 2014 | 182.6 | 131 | 58 | 9 |
| NA | 2013 | 3920 | 3080 | 270 | 5 |
| NA | 2014 | 518.8 | 433 | 290 | 2 |
five <- hot %>%
filter(Datetime > "2010-01-01")
byTown5yr <- five %>%
group_by(MunicipalityID) %>%
summarize(q90 = quantile(ResultValue, 0.9, na.rm = TRUE),
q75 = quantile(ResultValue, 0.75, na.rm = TRUE),
median = median(ResultValue, na.rm = TRUE), n = n()) %>%
arrange(desc(n), desc(q90))
Total number of samples in last five years = 625
| MunicipalityID | q90 | q75 | median | n |
|---|---|---|---|---|
| Chelsea | 6867.0 | 2416.50 | 362.0 | 74 |
| Malden | 16366.0 | 3808.50 | 970.0 | 68 |
| Medford | 63490.0 | 7201.50 | 1159.5 | 64 |
| Somerville | 18876.0 | 4826.00 | 2069.0 | 59 |
| Arlington | 14138.0 | 2523.50 | 395.0 | 55 |
| Belmont | 7945.0 | 2456.50 | 526.0 | 51 |
| Melrose | 9128.8 | 3237.00 | 395.0 | 43 |
| Revere | 24060.0 | 833.00 | 120.0 | 37 |
| Winchester | 20486.0 | 2400.00 | 432.0 | 35 |
| Woburn | 372.0 | 196.00 | 64.0 | 35 |
| Cambridge | 35796.0 | 19945.00 | 3922.0 | 23 |
| Stoneham | 9774.6 | 5885.75 | 580.5 | 18 |
| Everett | 192475.0 | 33550.00 | 1280.0 | 14 |
| Lexington | 6932.0 | 312.50 | 34.0 | 11 |
| Boston | 41754.8 | 19863.00 | 5794.0 | 9 |
| East Boston | 30120.0 | 5040.00 | 1726.0 | 9 |
| Winthrop | 3991.6 | 1342.00 | 537.0 | 7 |
| NA | 3640.0 | 1828.00 | 270.0 | 7 |
| Burlington | 2120.0 | 1700.00 | 770.0 | 5 |
| Wakefield | 324.0 | 324.00 | 324.0 | 1 |
all <- hot
#filter(Datetime > "2012-01-01")
# to create a column with just the year as the name for matrix below
byTownAll <- all %>%
group_by(MunicipalityID) %>%
summarize(q90 = quantile(ResultValue, 0.9, na.rm = TRUE),
q75 = quantile(ResultValue, 0.75, na.rm = TRUE),
median = median(ResultValue, na.rm = TRUE), n = n()) %>%
arrange(desc(n), desc(q90))
Total number of samples in all years = 2089
| MunicipalityID | q90 | q75 | median | n |
|---|---|---|---|---|
| Arlington | 9678.0 | 1900.00 | 399.0 | 321 |
| Medford | 10792.0 | 1956.50 | 540.0 | 219 |
| Chelsea | 5055.8 | 1080.00 | 199.0 | 217 |
| Belmont | 8600.5 | 2098.50 | 356.0 | 180 |
| Winchester | 2418.0 | 1077.00 | 289.0 | 163 |
| Woburn | 1161.1 | 473.25 | 131.5 | 163 |
| Malden | 15531.0 | 7700.00 | 1050.0 | 153 |
| Somerville | 17765.0 | 3274.50 | 725.5 | 132 |
| Cambridge | 12200.0 | 2000.00 | 800.0 | 117 |
| Revere | 24380.2 | 2572.50 | 329.5 | 98 |
| East Boston | 9625.2 | 939.00 | 120.0 | 77 |
| Melrose | 10543.2 | 1649.50 | 593.0 | 75 |
| Boston | 9403.0 | 944.00 | 78.5 | 42 |
| Stoneham | 8118.3 | 1109.50 | 384.2 | 40 |
| Everett | 33736.0 | 6400.00 | 560.0 | 33 |
| Lexington | 6165.2 | 1709.50 | 138.0 | 27 |
| Winthrop | 1811.0 | 917.00 | 374.5 | 16 |
| Burlington | 2167.6 | 1792.00 | 936.0 | 8 |
| NA | 3640.0 | 1828.00 | 270.0 | 7 |
| Wakefield | 324.0 | 324.00 | 324.0 | 1 |
–How many samples by town in EACH year?
| MunicipalityID | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arlington | 23 | 83 | 19 | 32 | 10 | 36 | 13 | 15 | 35 | 15 | 19 | 8 | 3 | 10 |
| Belmont | 10 | 28 | 10 | 26 | 1 | 24 | 4 | 12 | 14 | 16 | 11 | 12 | 4 | 8 |
| Boston | 0 | 0 | 20 | 0 | 1 | 8 | 4 | 0 | 0 | 7 | 0 | 0 | 1 | 1 |
| Burlington | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 |
| Cambridge | 10 | 30 | 13 | 19 | 7 | 8 | 0 | 5 | 2 | 8 | 1 | 1 | 10 | 3 |
| Charlestown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Chelsea | 0 | 0 | 47 | 1 | 0 | 17 | 22 | 20 | 36 | 10 | 40 | 11 | 12 | 1 |
| East Boston | 0 | 0 | 8 | 18 | 4 | 19 | 10 | 9 | 0 | 8 | 0 | 0 | 1 | 0 |
| Everett | 0 | 0 | 8 | 0 | 0 | 7 | 4 | 0 | 0 | 2 | 0 | 1 | 7 | 4 |
| Lexington | 0 | 0 | 0 | 0 | 10 | 3 | 3 | 0 | 0 | 3 | 0 | 4 | 0 | 4 |
| Malden | 0 | 6 | 12 | 4 | 0 | 6 | 42 | 15 | 0 | 4 | 23 | 10 | 22 | 9 |
| Medford | 0 | 45 | 12 | 31 | 0 | 17 | 23 | 26 | 1 | 10 | 1 | 1 | 26 | 26 |
| Melrose | 0 | 1 | 0 | 0 | 9 | 4 | 12 | 6 | 0 | 6 | 10 | 12 | 8 | 7 |
| Reading | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Revere | 0 | 0 | 6 | 7 | 11 | 5 | 17 | 7 | 8 | 10 | 15 | 0 | 4 | 8 |
| Somerville | 1 | 21 | 9 | 6 | 0 | 17 | 5 | 7 | 7 | 29 | 2 | 0 | 7 | 21 |
| Stoneham | 0 | 0 | 0 | 0 | 2 | 2 | 1 | 2 | 15 | 4 | 4 | 0 | 1 | 9 |
| Wakefield | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Winchester | 0 | 19 | 0 | 14 | 13 | 26 | 11 | 25 | 20 | 4 | 15 | 1 | 13 | 2 |
| Winthrop | 0 | 0 | 0 | 4 | 0 | 0 | 2 | 3 | 0 | 2 | 3 | 0 | 0 | 2 |
| Woburn | 0 | 23 | 0 | 17 | 21 | 22 | 18 | 5 | 22 | 13 | 2 | 11 | 0 | 9 |
| NA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 2 |