Exploratory Data Visualization of Water Quality Data

this time with RMarkdown

Yi Xuan (MyRWA Programmer Intern)

Summer 2014, Mystic River Watershed Association

Reference: Guidance from and previous work done by Jeff Walker (PhD Tufts Uni.), http://rpubs.com/heflopod/

Data Source: MysticDB_20140602.accdb by MyRWA

library(myrwaR)    # package by Jeff Walker that loads MyRWA's water quality data
library(dplyr)     # for manipulating datasets
library(lubridate) # for manipulating dates/times
library(ggplot2)   # powerhouse for data visualization
library(ggmap)     # for map visualization
library(knitr)     # provides opts_chunk
theme_set(theme_bw()) # change the default theme to black/white
opts_chunk$set(tidy = FALSE)

Number of Samples Across Different Tests and Projects

#wq is the data.frame pulled out from the water quality database
#CharacteristicID and ProjectID are corresponding IDs from the database 
ggplot(wq, aes(CharacteristicID, fill = ProjectID)) +
  geom_bar(position = "dodge") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Number of Samples vs Water Quality Testing", y = "# Samples")


The results show that MyRWA's CSORWM project (MWRA CSO Testing Program) collected the most samples, particularly for DO, DO SAT, PH, SPCOND and TEMP_WATER. MyRWA's BHWQM project (MWRA Boston Harbor Testing) is the most comprehensive compared with the other projects.
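
The counts behind the bar chart can also be tabulated directly. The following is only a minimal sketch: `wq_toy` is a hypothetical data frame with invented rows, standing in for `wq` and sharing its `ProjectID` and `CharacteristicID` columns.

```r
library(dplyr)

# Hypothetical toy data standing in for wq; the rows are invented for illustration
wq_toy <- data.frame(
  ProjectID = c("CSORWM", "CSORWM", "CSORWM", "BHWQM", "BASE"),
  CharacteristicID = c("DO", "DO", "PH", "DO", "ECOLI"),
  stringsAsFactors = FALSE
)

# Number of samples per project and characteristic, largest count first
# (count() stores the result in a column named n)
sample_counts <- count(wq_toy, ProjectID, CharacteristicID, sort = TRUE)

print(sample_counts)
```

Running the same `count()` call on the real `wq` data frame should give the numbers that the dodged bars display.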

E. coli Counts from MyRWA's Hotspot Project Showing an Abnormality in 2004

#hotspot is the data frame filtered to only contain HOSPOT project data
ggplot(filter(hotspot, CharacteristicID == "ECOLI"), aes(Datetime, ResultValue)) +
  geom_point(size = 3, colour = "#CC0000", na.rm = TRUE) +
  labs(title = "Time Series of ECOLI concentration (HOSPOT)", y = "ECOLI concentrations, CFU/100ml") +
  theme(axis.text.x = element_text(face = "bold", angle = 90, hjust = 1))


One can easily spot the extreme event (possibly an outlier) at the beginning of 2004. The data come from MyRWA's Hotspot program (Hotspot Mystic River Watershed MA), in which staff and volunteers sample intensively along a river or stream each month. The goal of the program is to identify pollution 'hot spots' in the watershed.
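
An extreme value like this can also be located programmatically rather than by eye. The sketch below is an illustration only: `ecoli_toy` is a hypothetical data frame with invented values, standing in for the ECOLI rows of `hotspot`, and the ratio to the median is just one rough way to judge how far a point stands out.

```r
# Hypothetical toy data standing in for the ECOLI rows of hotspot;
# the dates and values are invented and are not MyRWA measurements
ecoli_toy <- data.frame(
  Datetime = as.POSIXct(c("2003-11-05", "2004-01-14", "2004-03-10", "2004-06-02"), tz = "UTC"),
  ResultValue = c(120, 250000, 340, 90)
)

# Row holding the largest result value
extreme <- ecoli_toy[which.max(ecoli_toy$ResultValue), ]

# Ratio of the maximum to the median, a rough indication of how far it stands out
ratio_to_median <- extreme$ResultValue / median(ecoli_toy$ResultValue)

print(extreme)
print(ratio_to_median)
```

Whether such a point is a genuine pollution event or a data-entry problem still has to be checked against the original sampling records.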

Plot of MyRWA's BASE (Baseline) Project

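# LocationID on the x-axis: one bar per sampling site, stacked by characteristic
# (the order aesthetic set the stacking order in ggplot2 < 2.0; it is deprecated in later versions)
ggplot(wq.base, aes(LocationID, fill = CharacteristicID, order = -as.numeric(CharacteristicID))) +
  geom_bar() +
  labs(title = "Distribution of samples across sites for each water quality test (BASE project)", y = "# Samples") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))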


We can see that, for each site in the BASE project, the tests are fairly evenly distributed. Slight emphasis was placed on traditional water quality tests such as DO, ECOLI, water temperature, and TSS. The BASE project (Baseline Mystic River Watershed MA) contains water quality data collected by MyRWA since 2000 from fifteen sites across the watershed, documenting trends in water quality.

Map of Two Medford Sites in Satellite View

# Two sampling sites in Medford with their longitude/latitude
sites <- data.frame(SITE = c("MEDOF00900", "MEDOF00910"),
                    GROUP = c("Medford Site", "Medford Site"),
                    LON = c(-71.1303049996, -71.1305350005),
                    LAT = c(42.4172916677, 42.4174366679))
# Center the map on the midpoint of the two sites
# (recent ggmap versions need a Google API key, set with register_google(), for Google map types)
map <- get_map(location = c(lon = mean(range(sites$LON)), lat = mean(range(sites$LAT))),
               zoom = 18, scale = "auto", maptype = "satellite")
ggmap(map) +
  geom_point(aes(x = LON, y = LAT, colour = GROUP), data = sites, size = 5) +
  geom_text(aes(x = LON, y = LAT, label = SITE, colour = GROUP), data = sites, hjust = -0.3) +
  labs(title = "Satellite View of Two Medford Sites")


This map shows two stations near Tufts University. The two sites, which I picked at random, turn out to lie across the river, and a bridge, from each other.
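
How far apart the two sites are can be estimated from their coordinates. The following is a minimal sketch using the haversine formula in base R; the helper `haversine_m` is a hypothetical name introduced here, and it assumes a spherical Earth with a mean radius of 6,371 km, so the result is an approximation.

```r
# Great-circle (haversine) distance in metres between two lon/lat points given in degrees.
# Assumption: spherical Earth with mean radius 6,371,000 m.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(a))
}

# Coordinates of MEDOF00900 and MEDOF00910 as listed in the sites data frame
d <- haversine_m(-71.1303049996, 42.4172916677, -71.1305350005, 42.4174366679)
print(round(d, 1))
```

The function is vectorised, so it could also be applied to whole columns of longitudes and latitudes when comparing more than two sites.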