Reference: Guidance from, and previous work by, Jeff Walker (PhD, Tufts University), http://rpubs.com/heflopod/
Data Source: MysticDB_20140602.accdb by MyRWA
library(myrwaR) #package by Jeff that loads MyRWA's water data
library(dplyr) # for manipulating datasets
library(lubridate) #for manipulating date/times
library(ggplot2) #powerhouse for data visualization
library(ggmap) # for map visualization
theme_set(theme_bw()) #change the default theme to black/white
knitr::opts_chunk$set(tidy = FALSE) # opts_chunk lives in knitr; keep code formatted as written
#wq is the data.frame pulled out from the water quality database
#CharacteristicID and ProjectID are corresponding IDs from the database
ggplot(wq, aes(CharacteristicID, fill = ProjectID)) +
  geom_bar(position = "dodge") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Number of Samples vs Water Quality Testing", y = "# Samples")
The results show that MyRWA's CSORWM project (MWRA CSO Testing Program) collected the highest number of samples, particularly for DO, DO SAT, PH, SPCOND and TEMP_WATER. MyRWA's BHWQM project (MWRA Boston Harbor Testing) is the most comprehensive compared with the other projects.
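The counts behind the bar chart can also be tabulated directly. The following is only a minimal sketch: `wq_toy` is a hypothetical stand-in for `wq` with invented rows, and it assumes the same `ProjectID` and `CharacteristicID` columns.

```r
library(dplyr)

# Hypothetical stand-in for wq; the rows are invented for illustration only
wq_toy <- data.frame(
  ProjectID        = c("CSORWM", "CSORWM", "CSORWM", "BHWQM", "BASE"),
  CharacteristicID = c("DO", "DO", "TEMP_WATER", "DO", "ECOLI"),
  stringsAsFactors = FALSE
)

# Count samples per project and characteristic, largest count first
sample_counts <- wq_toy %>%
  count(ProjectID, CharacteristicID, name = "n_samples") %>%
  arrange(desc(n_samples))

print(sample_counts)
```

A table like this makes it easier to read off exact sample counts that are hard to compare by eye in a dodged bar chart.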
#hotspot is the data frame filtered to only contain HOSPOT project data
ggplot(filter(hotspot, CharacteristicID == "ECOLI"), aes(Datetime, ResultValue)) +
  geom_point(size = 3, colour = "#CC0000", na.rm = TRUE) +
  labs(title = "Time Series of ECOLI concentration (HOSPOT)",
       y = "ECOLI concentrations, CFU/100ml") +
  theme(axis.text.x = element_text(face = "bold", angle = 90, hjust = 1))
One can easily spot the extreme event (possibly an outlier) at the beginning of 2004. The data come from MyRWA's Hotspot program (Hotspot Mystic River Watershed MA), in which staff and volunteers sample intensively along a river or stream each month. The goal of the program is to identify pollution 'hot spots' in the watershed.
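To look at such an extreme point more closely, the row holding the largest value can be pulled out of the filtered data. This is a hedged sketch only: `wq_toy` and its values are invented, and the project ID string is assumed to match the one used in the plot title above.

```r
library(dplyr)

# Hypothetical stand-in for wq; all values are invented for illustration
wq_toy <- data.frame(
  ProjectID        = c("HOSPOT", "HOSPOT", "HOSPOT", "BASE"),
  CharacteristicID = c("ECOLI", "ECOLI", "TSS", "ECOLI"),
  Datetime         = as.POSIXct(c("2004-01-15 10:00:00", "2005-06-01 09:30:00",
                                  "2004-01-15 10:00:00", "2004-02-01 08:00:00"),
                                tz = "UTC"),
  ResultValue      = c(24000, 150, 3.2, 80),
  stringsAsFactors = FALSE
)

# Keep only ECOLI results from the one project (ID string is an assumption)
hotspot_ecoli <- wq_toy %>%
  filter(ProjectID == "HOSPOT", CharacteristicID == "ECOLI")

# Row with the largest value, a candidate for closer inspection
extreme <- hotspot_ecoli %>%
  slice(which.max(ResultValue))

print(extreme)
```

When plotting, adding ggplot2's `scale_y_log10()` is one way to keep a single very large value from compressing the rest of the series.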
ggplot(wq.base, aes(LocationID, fill = CharacteristicID,
                    order = -as.numeric(CharacteristicID))) +
  geom_bar() +
  labs(title = "The temporal distribution of samples for each water quality testing for the BASE project.",
       y = "# Samples") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
We can see that, for each site in the BASE project, most of the tests are evenly distributed. Slight emphasis was placed on traditional water quality tests such as DO, ECOLI, water temperature and TSS. The BASE project (Baseline Mystic River Watershed MA) contains water quality data collected by MyRWA since 2000 from fifteen sites across the watershed to document trends in water quality.
sites <- data.frame(
  SITE  = c("MEDOF00900", "MEDOF00910"),
  GROUP = c("Medford Site", "Medford Site"),
  LON   = c(-71.1303049996, -71.1305350005),
  LAT   = c(42.4172916677, 42.4174366679)
)
map <- get_map(
  location = c(lon = mean(range(sites$LON)), lat = mean(range(sites$LAT))),
  zoom = 18, scale = "auto", maptype = "satellite"
)
ggmap(map) +
  geom_point(aes(x = LON, y = LAT, color = GROUP), data = sites, size = 5) +
  labs(title = "Satellite View of Two Medford Sites") +
  geom_text(aes(x = LON, y = LAT, label = SITE, colour = GROUP), data = sites, hjust = -0.3)
The map shows two stations near Tufts University. The two sites, which I picked at random, turn out to be across the river and a bridge from each other.
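How far apart the two sites are can be estimated from their coordinates. The sketch below uses the haversine great-circle formula in base R; the helper `haversine_m` and the Earth radius of 6,371 km are assumptions introduced here for illustration, not part of the original analysis.

```r
# Great-circle (haversine) distance in metres between two lon/lat points.
# haversine_m is a hypothetical helper; radius_m assumes a spherical Earth.
haversine_m <- function(lon1, lat1, lon2, lat2, radius_m = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_m * asin(sqrt(a))
}

# Same two sites and coordinates as in the map above
sites <- data.frame(
  SITE = c("MEDOF00900", "MEDOF00910"),
  LON  = c(-71.1303049996, -71.1305350005),
  LAT  = c(42.4172916677, 42.4174366679)
)

d <- haversine_m(sites$LON[1], sites$LAT[1], sites$LON[2], sites$LAT[2])
print(d)
```

With these coordinates the separation comes out at roughly 25 m, which is consistent with two sampling points on opposite sides of a narrow river crossing.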