========================================================
Reference: Guidance from and previous work done by Jeff Walker (PhD Tufts), http://rpubs.com/heflopod/
Data Source: MyRWA’s Access Database
After pulling out data from the database, select a random HOTSPOT sampling date and futher select only ECOLI data
HOTSPOT (Hotspot Mystic River Watershed MA) were water data collected by MyRWA staff and volunteers
#override the default wrapping width
options(width=130)
##changing DateHour format
wq2$DateHour <-as.Date(wq2$DateHour,'%m/%d/%Y')
#select a subset from the data.frame
hotspotBarPlot<-subset(subset(wq2, DateHour=="2013-08-15") , CharacteristicID=="ECOLI")
#check out what's in the data.frame
head(hotspotBarPlot,5)
## DateHour LocationTypeID LocationID VisitID CharacteristicID ResultValue Units Qualifier FlagID SampleTypeID
## 12348 2013-08-15 27 MEDOF01030 30676 ECOLI 568 MPN/100ml <NA> <NA> S
## 12352 2013-08-15 27 MEDOFWB010 30678 ECOLI 2452 MPN/100ml <NA> <NA> S
## 12355 2013-08-15 22 WNB04 30677 ECOLI 6932 MPN/100ml <NA> <NA> S
## 12364 2013-08-15 27 MEDOF50009 30679 ECOLI 242 MPN/100ml <NA> <NA> S
## 12372 2013-08-15 27 MEDOF04100 30680 ECOLI 626 MPN/100ml <NA> <NA> S
## CharacteristicName SampleType Datetime ProjectID SampleDepthType Comment MunicipalityID
## 12348 Escherichia coli S 2013-08-15 07:26:00 HOTSPOT <NA> Some suspended material Medford
## 12352 Escherichia coli S 2013-08-15 08:23:00 HOTSPOT <NA> <NA> Medford
## 12355 Escherichia coli S 2013-08-15 08:03:00 HOTSPOT <NA> <NA> Medford
## 12364 Escherichia coli S 2013-08-15 08:45:00 HOTSPOT <NA> Insufficient flow for YSI data Medford
## 12372 Escherichia coli S 2013-08-15 09:02:00 HOTSPOT <NA> <NA> Medford
## WaterBodyID Latitude Longitude WaterType LocationTypeName Precip.48 Weather
## 12348 Mystic River (Fresh) 42.42 -71.13 Fresh Storm Sewer 0 Dry
## 12352 Winter Hill Brook 42.40 -71.10 Fresh Storm Sewer 0 Dry
## 12355 Winter Hill Brook 42.40 -71.10 Fresh River/Stream 0 Dry
## 12364 Mystic River (Fresh) 42.42 -71.10 Fresh Storm Sewer 0 Dry
## 12372 Mystic River (Fresh) 42.42 -71.10 Fresh Storm Sewer 0 Dry
Levels of ECOLI found during Hotspot sampling 08/15/2013
ggplot(hotspotBarPlot, aes(x=LocationID, y=ResultValue, width =.65)) +
geom_bar(fill="lightgoldenrod1", colour="black",stat = "identity") +
#control x axis labeling and its text display angle
theme(axis.text.x=element_text(angle=45, hjust=1, vjust=NULL)) +
labs(title="Levels of ECOLI Found \n During Hotspot sampling 08/15/2013",
y='ResultValue, MPN/100ml') +
geom_text(aes(label=ResultValue, x = LocationID, y = ResultValue),
position = position_dodge(width = 0.8), vjust=-.6) +
# add two horizontal lines intercept at particular values
geom_hline(aes(yintercept=1260), colour="#CC0033", size=1.25)+
geom_hline(aes(yintercept=235), colour="#FF9900", size=1.25)+
#title and its size specification
theme(plot.title = element_text(size = rel(1.5)))+
#adding text/ standard to the plot
annotate("text", label = "Boating Standard, 1,260 E.coli/100ml",
x = 10, y = 1000, size = 4, colour = "black") +
annotate("text", label = "Swimming Standard, 235 E.coli/100ml",
x = 10, y = 130, size = 4, colour = "black") +
scale_y_log10(limits=c(1,20000),breaks=c(1, 10, 100,1000,10000))
On that data, only three sites, all from Winchester had ECOLI levels under regulatory limits.The worst site was from Winter Hill Brook (WINB04) at Medford, which had ECOLI at 6932 MPN per 100ml.
Now let’s take a look at ECOLI data from Lexington
#select a subset of the data.frame
multiplot <- subset(subset(wq2, MunicipalityID=="Lexington" ), CharacteristicID=="ECOLI" )
#override the default wrapping width
options(width=130)
head(multiplot,6)
## DateHour LocationTypeID LocationID VisitID CharacteristicID ResultValue Units Qualifier FlagID SampleTypeID
## 5457 2005-08-30 22 MIB032 27354 ECOLI 1102 CFU/100ml <NA> <NA> S
## 5494 2005-08-30 22 FEB018 27362 ECOLI 586 CFU/100ml <NA> <NA> S
## 5495 2005-08-30 22 MUB019W 27361 ECOLI 1041 CFU/100ml <NA> <NA> S
## 5496 2005-08-30 22 MUB019E 27359 ECOLI 3921 CFU/100ml <NA> <NA> S
## 5497 2005-08-30 22 MIB030 27357 ECOLI 5654 CFU/100ml <NA> <NA> S
## 5530 2005-08-30 22 SHB183 27367 ECOLI 2317 CFU/100ml <NA> <NA> S
## CharacteristicName SampleType Datetime ProjectID SampleDepthType Comment
## 5457 Escherichia coli S 2005-08-30 06:49:00 HOTSPOT S 1st called SIB047
## 5494 Escherichia coli S 2005-08-30 07:53:00 HOTSPOT S <NA>
## 5495 Escherichia coli S 2005-08-30 07:35:00 HOTSPOT S no oily sheen
## 5496 Escherichia coli S 2005-08-30 07:28:00 HOTSPOT S oily sheen
## 5497 Escherichia coli S 2005-08-30 07:06:00 HOTSPOT S 1st called SIB027, strong H2S odor, low flow
## 5530 Escherichia coli S 2005-08-30 08:15:00 HOTSPOT S slight H2S smell
## MunicipalityID WaterBodyID Latitude Longitude WaterType LocationTypeName Precip.48 Weather
## 5457 Lexington Mill Brook 42.43 -71.20 Fresh River/Stream 0.57 Wet
## 5494 Lexington Fessenden Brook 42.44 -71.20 Fresh River/Stream 0.57 Wet
## 5495 Lexington Munroe Brook 42.43 -71.19 Fresh River/Stream 0.57 Wet
## 5496 Lexington Munroe Brook 42.43 -71.19 Fresh River/Stream 0.57 Wet
## 5497 Lexington Mill Brook 42.43 -71.19 Fresh River/Stream 0.57 Wet
## 5530 Lexington Shaker Glen Brook 42.45 -71.20 Fresh River/Stream 0.57 Wet
ggplot(multiplot, aes(DateHour, ResultValue,color=Weather)) +
#setting the shape and size of points
geom_point( stat="identity",
shape= 20,size = 4,na.rm=TRUE) +
#overwriting default colour, assign new colours to points
scale_colour_manual(values=c("red", "blue"))+
#warp multiple plots by locationID, set arbitrary row number
facet_wrap(~LocationID, nrow=5, scale="free_x") +
#specify the x axis datetime tick mark
scale_x_date(breaks="12 month",limits=c(min(multiplot$DateHour),max=max(multiplot$DateHour)),
labels = date_format("%Y")) +
labs(title="Multi plot of ECOLI levels From Lexiton Mulnicipality Data ",
y='ResultValue, Ecoli Count')+
#set title size and colour
theme(plot.title = element_text(face="bold",size = rel(2), colour = "dodgerblue4"))+
theme(axis.text.x=element_text(angle=90, hjust=1, vjust=0.5))+
# add two horizontal lines intercept at particular values
geom_hline(aes(yintercept=1260), colour="maroon1", size=1.25,alpha=0.5)+
geom_hline(aes(yintercept=235), colour="steelblue", size=1.25,alpha=0.5)+
# set log y axis and its ticks points, and y axis break/ tick marks
scale_y_log10(limits=c(1,1.2e5),breaks=c(1, 10, 100,1000,10000))
Turn out there aren’t many sites and samples from Lexington. Upon futher inspection, in reality Mystic River doesn’t flow through Lexington, where just some ponds and small streams scatter trhoughout.
A summary of all the HOTSPOT sites in the database, Arlington comes up on top with 2220 sites
summary(wq2$MunicipalityID)
## Arlington Belmont Boston Burlington Cambridge Charlestown Chelsea East Boston Everett Lexington
## 2220 1125 235 87 758 0 1413 418 232 209
## Malden Medford Melrose Reading Revere Somerville Stoneham Wakefield Winchester Winthrop
## 855 1181 416 12 638 690 197 14 932 220
## Woburn NA's
## 1019 31
Excel style, a quick achive plot for ECOLI level at WIN2-5d, a site at Aberjona River, Winchester, MA
ggplot(archivePlot, aes(x=Datetime, y=ResultValue,color=Weather)) +
labs(title="Achive plot of ECOLI levels found at site WIN2-5d ",
y='ResultValue, Ecoli Count')+
#hide vertical gridlines and make it color black
theme(panel.grid.minor.x=element_blank(),
panel.grid.major.x=element_blank(),
panel.grid.major.y=element_line(colour = 'lightyellow', size = 0.5),
panel.grid.minor.y=element_line(colour = 'lightyellow', size = 0.5))+
# add two horizontal lines intercept at particular values
geom_hline(aes(yintercept=1260), colour="#CC0033", size=1.25,alpha=0.5)+
geom_hline(aes(yintercept=235), colour="#FF9900", size=1.25,alpha=0.5)+
geom_point(aes(), stat="identity",
shape= 19,size = 5,na.rm=TRUE) +
geom_text(aes(label=ResultValue, x = Datetime, y = ResultValue),
position = position_dodge(width = 0.8), vjust=-.7)+
#set title size, font and colour
theme(plot.title = element_text(face="bold",size = rel(2), colour = "lemonchiffon3"))+
# set log y axis and its range, and y axis break/ tick marks
scale_y_log10(limits=c(1,1.0e5),breaks=c(1, 10, 100,1000,10000))
## ymax not defined: adjusting position using y instead
Next, let’s check out what’s going on for the Aberjona River. Visualizing using geom_tile
Aberjona<-subset(subset(wq2, WaterBodyID=="Aberjona River"), LocationTypeName="Storm Sewer")
ggplot(Aberjona, aes(LocationID, CharacteristicID)) +
geom_tile(aes(fill=Weather),width=0.9, height=0.5) +
theme(axis.text.x=element_text(angle=90, hjust=1, vjust=0.5))
From the geom_tile visualization, we can tell that the majority of the samples during dry, and some during rainy days (thanks to MyRWA’s staff and volunteers).
Important water quality parameters (ECOLI, Salinity, DO and the like) were captured for most of the sites at the Aberjona River. On the other hand, Potassium, TSS and TP(Total Phosphorus), not so much.
ggplot(Aberjona, aes(LocationID, CharacteristicID)) +
geom_tile(aes(fill=LocationTypeName),width=0.9, height=0.5) +
theme(axis.text.x=element_text(angle=90, hjust=1, vjust=0.5))
Water quality visualization by LocationTypeName, which has two values: River/Stream and STorm Sewer. Sites initial started with ABR (Aberjona River) are River/Stream type, on the other hand, other sites with initial started with WIN (Winchester, MA) or WOB (Woburn, MA) are Storm Sewer type, and from local’s combined sewer system.