1 Introduction

1.1 Problem Statement

Dissolved oxygen (DO) is an important indicator of water quality that, until recently, has been cost-prohibitive to monitor extensively in both space and time. Continuous water-quality data, particularly in coastal environments with tidal flow, are necessary for resource managers to understand the dynamic changes in water quality that occur tidally, daily, seasonally, and spatially. Great South Bay, New York, is an important ecological system for the communities of Long Island’s southern shore. During the 1970s, approximately 70% of the United States’ shellfish harvest came from Great South Bay. Since then, harvests have dwindled for a multitude of reasons, including government restrictions and water-quality concerns.

Low DO concentrations limit the survival of some economically and ecologically valuable species, such as the hard clam. For this reason, investing in the restoration of these species without fully understanding local water-quality conditions can result in unsuccessful restoration efforts. Continuous and spatially distributed water-quality data are critical for regulatory agencies to make informed decisions in managing aquatic resources. Specifically, the proposed DO monitoring would provide more complete and accurate information to aid in classifying the water-quality impairments of Long Island estuaries, as tabulated in the New York State Department of Environmental Conservation (NYSDEC) Waterbody Inventory and Priority Waterbodies List. This list aims to describe water quality and supported water uses, identify problems, and prioritize restorative measures.

1.2 Objectives of this project

The goal of this project is to provide an exploratory data analysis of a DO study conducted in 2016 and 2017 by the U.S. Geological Survey (USGS) in conjunction with The Nature Conservancy. Trends in DO concentrations will be evaluated in terms of tidal, diurnal (day/night), seasonal, and spatial influences. The data will also be analyzed in terms of NYSDEC classifications of chronic and acute events. DO concentrations below 3.0 mg/L at any time indicate an acute event. Daily average concentrations between 3.0 and 4.8 mg/L (a chronic event) are allowable only for a limited number of days. These thresholds are discussed in detail in the following sections.

1.3 Data collection method

During the warm-weather months of 2016 and 2017, 12 optical sensors were deployed in Great South Bay, recording salinity (ppt) and dissolved oxygen (mg/L) at 6-minute intervals. Site locations are shown in the map below. The sensors were serviced periodically: they were cleaned and recalibrated, and their raw data were downloaded. The raw DO data were then adjusted for salinity, and corrections for fouling and sensor drift were applied.

Map created in ArcGIS of DO sensor locations. Contributing watershed boundaries are shaded. A watershed can be thought of as the area that contributes surface water to a given point, as determined by topographic features.

2 Data preparation

In this section, data are imported into RStudio and tidied to prepare for the exploratory data analysis that follows. The objective of this section is to create two related tables. The first table provides site information. It can be thought of as a spatial table describing the locations of the deployed sensors. The second table provides a time series for the period of record at six-minute intervals and includes time-varying descriptive variables. The two tables are linked by the USGS-assigned site number of each sensor location.

2.1 Site Information Table

Information for the 12 sites was loaded into RStudio from the National Water Information System (NWIS) database (https://waterdata.usgs.gov/nwis). NWIS is an online database that houses water-quality datasets on a national scale. The R package ‘dataRetrieval’ was developed to load these datasets directly into R. The package is used to retrieve site information for the DO study, including:

- the USGS-assigned site number (SiteNum),
- the station name (StationName),
- latitude (Lat), and
- longitude (Long).

A watershed variable was then derived to indicate the source of contributing surface water. The SiteNum variable is used as a primary key to relate the site information table (spatial) to the time-series table (temporal).

2.1.1 Load data from NWIS

##            SiteNum      Lat         StationName    watershed      Long
## 1  404133073041801 40.69250  Great South Bay 9  Browns River -73.07167
## 2  404146073074301 40.69611 Great South Bay 15  Greens Creek -73.12861
## 3  404200073034301 40.70000  Great South Bay 8  Browns River -73.06194
## 4  404211073061701 40.70306 Great South Bay 13  Greens Creek -73.10472
## 5  404213073041801 40.70361  Great South Bay 7  Browns River -73.07167
## 6  404213073070801 40.70361 Great South Bay 14  Greens Creek -73.11889
## 7  404226073034201 40.70722  Great South Bay 6  Browns River -73.06167
## 8  404240073074301 40.71111 Great South Bay 12  Greens Creek -73.12861
## 9  404253073065101 40.71472 Great South Bay 11  Greens Creek -73.11417
## 10 404254073041701 40.71500  Great South Bay 5  Browns River -73.07139
## 11 404306073034301 40.71833  Great South Bay 4  Browns River -73.06194
## 12 404320073072601 40.72222 Great South Bay 10  Greens Creek -73.12389
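The watershed column in the table above can be derived from the station number embedded in StationName. The sketch below is hypothetical. The helper name `assign_watershed` is illustrative, and the rule it applies is read off the table above rather than taken from the project code: stations 4 to 9 drain to Browns River, and stations 10 to 15 drain to Greens Creek.

```r
# Hypothetical helper: map "Great South Bay N" to its contributing watershed.
# The station ranges are inferred from the site table, not from the original code.
assign_watershed <- function(station_name) {
  # Pull the trailing station number, e.g. "Great South Bay 9" -> 9
  station_num <- as.integer(sub(".*[[:space:]]([0-9]+)$", "\\1", station_name))
  ifelse(station_num %in% 4:9, "Browns River",
         ifelse(station_num %in% 10:15, "Greens Creek", NA_character_))
}

sites <- data.frame(
  SiteNum     = c("404133073041801", "404146073074301", "404306073034301"),
  StationName = c("Great South Bay 9", "Great South Bay 15", "Great South Bay 4"),
  stringsAsFactors = FALSE
)
sites$watershed <- assign_watershed(sites$StationName)
print(sites)
```

Keeping SiteNum as a character column preserves it as an exact key for joining to the time-series table.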

2.2 Dissolved Oxygen Time Series Table

This table contains a continuous time series for each year of the study at 6 minute intervals.

2.2.1 Load DO data from NWIS

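DO data were read directly from the NWIS database using the dataRetrieval package. Each record provides:

- the USGS-assigned site number (SiteNum),
- the date and time of the recorded measurement (dateTime), and
- the DO concentration recorded by the sensor in mg/L (DO_mgL).

Date (date) and month-day (monthday) columns were then derived from dateTime.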

##           SiteNum            dateTime DO_mgL        date monthday
## 1 404133073041801 2016-07-26 13:18:00    4.7 2016-07-26     07-26
## 2 404133073041801 2016-07-26 13:24:00    4.8 2016-07-26     07-26
## 3 404133073041801 2016-07-26 13:30:00    4.9 2016-07-26     07-26
## 4 404133073041801 2016-07-26 13:36:00    5.0 2016-07-26     07-26
## 5 404133073041801 2016-07-26 13:42:00    5.1 2016-07-26     07-26
## 6 404133073041801 2016-07-26 13:48:00    5.2 2016-07-26     07-26
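The date and monthday columns shown above can be derived from dateTime, and site attributes can then be attached through SiteNum. Below is a minimal sketch in base R using a few illustrative rows. The UTC time zone is an assumption made for the example.

```r
# Sketch: derive date and monthday from dateTime, then join site attributes
# using SiteNum as the key. Rows and time zone are illustrative assumptions.
do_data <- data.frame(
  SiteNum  = rep("404133073041801", 3),
  dateTime = as.POSIXct(c("2016-07-26 13:18:00", "2016-07-26 13:24:00",
                          "2016-07-27 00:06:00"), tz = "UTC"),
  DO_mgL   = c(4.7, 4.8, 3.9),
  stringsAsFactors = FALSE
)

# Calendar date, plus a month-day key that lines up across 2016 and 2017
do_data$date     <- as.Date(format(do_data$dateTime, "%Y-%m-%d"))
do_data$monthday <- format(do_data$dateTime, "%m-%d")

sites <- data.frame(
  SiteNum   = "404133073041801",
  watershed = "Browns River",
  stringsAsFactors = FALSE
)

# SiteNum relates the temporal table to the spatial table
do_joined <- merge(do_data, sites, by = "SiteNum", all.x = TRUE)
print(do_joined)
```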

2.2.2 Add descriptive variables

Concentrations of water-quality parameters such as DO are influenced by a multitude of factors, as described in the introduction. Adding descriptive variables to the time series allows us to analyze the conditions associated with changes in DO concentrations.

Date/Time variables

Tide Stage
The estuaries of Long Island, including Great South Bay, are tidally influenced: freshwater from surface runoff and groundwater discharge mixes with saltwater from the Atlantic Ocean. Here, tide-stage classifications are derived.

Methodology: each tide cycle is classified into four stages of tide condition. Based on observation, one cycle (high tide to high tide) is taken to last 12 hours and 25 minutes. The four stages are stored as factors:

- high tide/falling tide
- high tide/rising tide
- low tide/rising tide
- low tide/falling tide

They are created by cutting each tide-cycle interval into four equal parts. The cycle is then repeated for the rest of the time series, and the result is joined to the main DO time-series table on the dateTime variable.

  1. 2017 tide stage factor
  2. 2016 tide data: the same steps described above were used to create the 2016 tide factors.
  3. Combine tide.2016 and tide.2017
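The tide-stage steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the project's code:

- the cycle has a fixed length of 12 h 25 min;
- the four quarters are ordered starting from high tide (high/falling, low/falling, low/rising, high/rising);
- the reference high-tide times and the one-day windows are hypothetical.

```r
# Sketch of the tide-stage classification described above (assumptions noted).
classify_tide <- function(date_time, ref_high_tide, cycle_min = 12 * 60 + 25) {
  # Minutes since the reference high tide, wrapped into a single cycle;
  # R's %% returns a non-negative result for a positive divisor
  elapsed <- as.numeric(difftime(date_time, ref_high_tide, units = "mins"))
  phase <- elapsed %% cycle_min
  # Cut the cycle into four equal parts (ordering is an assumption)
  cut(phase,
      breaks = seq(0, cycle_min, length.out = 5),
      labels = c("high tide/falling", "low tide/falling",
                 "low tide/rising", "high tide/rising"),
      right = FALSE, include.lowest = TRUE)
}

# Build a 6-minute tide table for one window
make_tide_table <- function(start, end, ref_high_tide) {
  dateTime <- seq(start, end, by = "6 min")
  data.frame(dateTime = dateTime,
             tide.fac = classify_tide(dateTime, ref_high_tide))
}

utc <- function(x) as.POSIXct(x, tz = "UTC")
tide.2016 <- make_tide_table(utc("2016-07-26 00:00:00"), utc("2016-07-27 00:00:00"),
                             ref_high_tide = utc("2016-07-26 03:10:00"))  # hypothetical
tide.2017 <- make_tide_table(utc("2017-07-01 00:00:00"), utc("2017-07-02 00:00:00"),
                             ref_high_tide = utc("2017-07-01 05:40:00"))  # hypothetical

# Stack both years, then join onto the DO series by dateTime
tide <- rbind(tide.2016, tide.2017)
do_data <- data.frame(dateTime = utc(c("2016-07-26 13:18:00", "2017-07-01 09:00:00")),
                      DO_mgL = c(4.7, 5.6))
do_data <- merge(do_data, tide, by = "dateTime", all.x = TRUE)
print(do_data)
```

A fixed-length cycle is a simplification. Real tides drift relative to any single reference time, so observed high-tide times from a nearby tide gauge would give more accurate stage boundaries.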

Water Temperature
Water temperature is provided from a nearby monitoring station, but is only available for 2017.

3 Exploratory Data Analysis

In this section, DO data will be explored, primarily using visualizations generated from the ggplot2 package.

3.2 Summary of Time of Day by year and watershed

Further yearly trends are depicted in the bar graph below. The data are grouped by month, dodged (placed side by side) by watershed, and faceted by time of day. The Greens Creek watershed had consistently higher DO averages in both years. The Browns River watershed shows a different pattern between years, with DO dropping between July and August.
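The bar graph summarizes mean DO by month, watershed, and time of day for each year. A minimal base-R sketch of that summary is below. It assumes the day/night factor (tod.fac) has already been added. The column names follow the tables in this report, and the values are made up for illustration.

```r
# Sketch: mean DO by year, month, watershed and time of day.
# Illustrative values only; tod.fac is assumed to exist already.
do_data <- data.frame(
  year      = c("2016", "2016", "2016", "2016", "2017", "2017"),
  month     = c("07", "07", "08", "08", "07", "07"),
  watershed = c("Browns River", "Greens Creek", "Browns River",
                "Browns River", "Greens Creek", "Greens Creek"),
  tod.fac   = c("day", "day", "night", "night", "day", "night"),
  DO_mgL    = c(5.0, 6.0, 3.0, 4.0, 6.5, 5.5)
)

# One row per combination, holding the mean DO for that group
monthly <- aggregate(DO_mgL ~ year + month + watershed + tod.fac,
                     data = do_data, FUN = mean)
print(monthly)
```

A table like this could then be drawn with ggplot2, for example with `geom_col(position = "dodge")` and `facet_wrap(~ tod.fac)`.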

3.3 Distribution of Time of Day by year and watershed

The dotted lines in the boxplot below show the threshold values of 4.9 and 3.0 mg/L (discussed in more detail later). The data are also split by time of day (day/night) to show diurnal influences. The 2016 records are once again lower. Here it can be seen that the Browns River data are not only lower but also more variable.

3.4 Distribution of tide stage by year and month

Tide stage does not appear to have a strong effect on DO in the boxplot below. The greater variation in the 2016 data, particularly in September and October, is again evident.

3.5 Distribution of time of day by watershed

The histograms below show differences in the DO distributions of the two watersheds. Greens Creek shows an approximately normal distribution, while Browns River has a pronounced skew, resulting in more data points below the 3.0 mg/L threshold.

3.7 Water temperature

As water temperature increases, oxygen solubility decreases, which likely explains some of the trends in DO concentrations presented above. This relationship is visible in the data and is shown with a scatterplot of daily averaged DO versus water temperature. Because water temperature is only available for 2017, this comparison covers 2017 only. Pearson’s correlation coefficient is calculated to quantify the relationship.

## [1] -0.4539608
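The daily averaging and correlation step can be sketched in base R. The DO readings are first averaged by day, then matched to daily water temperature by date before computing Pearson's r. The temperature column name (temp_C) and all values here are hypothetical.

```r
# Sketch: daily mean DO joined to daily water temperature, then Pearson's r.
# temp_C and all values are hypothetical.
do_data <- data.frame(
  date   = as.Date(c("2017-07-01", "2017-07-01", "2017-07-02",
                     "2017-07-02", "2017-07-03", "2017-07-03")),
  DO_mgL = c(6.2, 5.8, 5.1, 4.9, 4.0, 4.4)
)
temp_data <- data.frame(
  date   = as.Date(c("2017-07-01", "2017-07-02", "2017-07-03")),
  temp_C = c(22.0, 24.5, 26.0)
)

# Average the 6-minute DO readings for each day
daily_do <- aggregate(DO_mgL ~ date, data = do_data, FUN = mean)
# Keep only days present in both tables
daily <- merge(daily_do, temp_data, by = "date")

r <- cor(daily$DO_mgL, daily$temp_C, method = "pearson")
print(r)
```

A negative r, like the -0.45 reported above, means daily DO tends to be lower on warmer days.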

4 NYSDEC criteria

Criteria for regulatory and management purposes are governed by the New York State Department of Environmental Conservation (NYSDEC), Division of Water. NYSDEC has defined what constitutes a low-DO event based on U.S. Environmental Protection Agency guidelines. Additional information can be found at https://www.dec.ny.gov/docs/water_pdf/togs116.pdf

In this section the DO.data dataframe will be evaluated in terms of NYSDEC guidelines. Chronic and acute DO events will be derived.

4.1 Chronic Events

NYSDEC defines chronic DO events as daily average concentrations between 3.0 and 4.8 mg/L. Concentrations in this range are allowed only for a limited period of time, dictated by an empirical formula.

NYSDEC regulation regarding chronic DO criteria

The equation governing chronic events is evaluated below.
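The chronic criterion is an empirical curve relating a daily average DO level to the number of days that level may persist. The sketch below assumes the curve has the form DO_i = 13.0 / (2.80 + 1.84 e^(-0.1 t_i)), with DO_i in mg/L and t_i in days. It inverts the curve to get the allowable days for a given DO level. Check the constants against the NYSDEC document before reuse. The function names are illustrative.

```r
# Sketch, assuming the NYSDEC chronic curve has the form
#   DO_i = 13.0 / (2.80 + 1.84 * exp(-0.1 * t_i))
# with DO_i a daily-average DO (mg/L) and t_i a duration in days.
chronic_do <- function(t_days) 13.0 / (2.80 + 1.84 * exp(-0.1 * t_days))

# Inverse: allowable days for a DO level; only defined between the curve's
# lower bound (about 2.80 mg/L) and its upper asymptote (13.0 / 2.80)
allowable_days <- function(do_mgl) {
  -10 * log((13.0 / do_mgl - 2.80) / 1.84)
}

print(round(chronic_do(0), 2))
print(round(allowable_days(c(3.0, 3.5, 4.0, 4.5)), 1))
```

At t = 0 the curve gives about 2.80 mg/L. As t grows, it levels off at 13.0 / 2.80 ≈ 4.64 mg/L, which matches the 2.8 to 4.64 mg/L span of the interval labels printed below.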

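DO values from the theoretical table are cut into intervals. Each interval's lower bound (DO.lb) is paired with a maximum allowable number of days (max.days).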

## [1] "[2.8,3.11]"  "(3.11,3.41]" "(3.41,3.72]" "(3.72,4.03]" "(4.03,4.33]"
## [6] "(4.33,4.64]"
## # A tibble: 6 x 2
##   DO.lb max.days
##   <fct>    <dbl>
## 1 2.8          0
## 2 3.11         3
## 3 3.41         7
## 4 3.72        10
## 5 4.03        15
## 6 4.33        23

Minimum daily averages

Daily averages of DO are calculated at each site, and the minimum across sites is returned for each date to capture the lowest daily value.

## # A tibble: 6 x 2
##   date          DO.daily.min
##   <chr>                <dbl>
## 1 "2016-07-26 "         4.12
## 2 "2016-07-27 "         3.72
## 3 "2016-07-28 "         5.16
## 4 "2016-07-29 "         4.86
## 5 "2016-07-30 "         4.50
## 6 "2016-07-31 "         5.09
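The two-step calculation above can be sketched with base R's `aggregate`. It first takes the mean per site and date, then the minimum of those means per date. The values here are illustrative only.

```r
# Sketch: daily mean DO at each site, then the lowest site-level daily mean
# on each date. Illustrative values only.
do_data <- data.frame(
  SiteNum = rep(c("404133073041801", "404146073074301"), each = 4),
  date    = as.Date(rep(c("2016-07-26", "2016-07-26",
                          "2016-07-27", "2016-07-27"), 2)),
  DO_mgL  = c(4.0, 4.4, 3.5, 3.9,  5.0, 5.4, 4.8, 5.2)
)

# Step 1: daily average per site
site_daily <- aggregate(DO_mgL ~ SiteNum + date, data = do_data, FUN = mean)
# Step 2: minimum of those averages across sites for each date
daily_min <- aggregate(DO_mgL ~ date, data = site_daily, FUN = min)
names(daily_min)[2] <- "DO.daily.min"
print(daily_min)
```

Storing date as a Date object rather than a character string also avoids the trailing space visible in the date column printed above.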

Find days where the minimum daily DO average from the DO.data dataframe is between 3.0 and 4.8 mg/L, and group consecutive days into events.

##         date DO.daily.min group
## 1 2016-07-26     4.122314     1
## 2 2016-07-27     3.722917     1
## 3 2016-07-30     4.497500    NA
## 4 2016-08-04     3.994286    NA
## 5 2016-08-07     3.153750     4
## 6 2016-08-08     3.600000     4
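Consecutive days in the chronic range can be grouped with a simple run-length rule: a new event starts whenever the gap to the previous qualifying day exceeds one day. This is a sketch with illustrative values. In this version, an isolated single day forms its own group, whereas the output above labels such days NA.

```r
# Sketch: keep days whose minimum daily average is within 3.0 to 4.8 mg/L
# and give each run of consecutive days a group id. Illustrative values.
daily_min <- data.frame(
  date = as.Date(c("2016-07-26", "2016-07-27", "2016-07-28", "2016-07-30",
                   "2016-08-07", "2016-08-08", "2016-08-09")),
  DO.daily.min = c(4.12, 3.72, 5.16, 4.50, 3.15, 3.60, 4.35)
)

chronic <- daily_min[daily_min$DO.daily.min >= 3.0 & daily_min$DO.daily.min <= 4.8, ]
chronic <- chronic[order(chronic$date), ]

# A new group starts when the gap to the previous chronic day exceeds 1 day
gap <- c(Inf, diff(as.numeric(chronic$date)))
chronic$group <- cumsum(gap > 1)

# Summarize each run: start date, mean DO and number of event days
events <- do.call(rbind, lapply(split(chronic, chronic$group), function(g) {
  data.frame(group = g$group[1], start = min(g$date),
             avg = mean(g$DO.daily.min), event.days = nrow(g))
}))
print(events)
```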

Chronic violations

## # A tibble: 5 x 6
##   group start                 avg event.days max.days `max.days - event.da~
##   <dbl> <dttm>              <dbl>      <int>    <dbl>                 <dbl>
## 1     4 2016-08-07 00:00:00  3.7           3        3                     0
## 2    10 2017-07-07 00:00:00  3.49          7        0                    -7
## 3    17 2017-08-08 00:00:00  3.56          6        0                    -6
## 4    19 2017-08-19 00:00:00  3.94          5        3                    -2
## 5    NA 2016-07-30 00:00:00  3.73         13        0                   -13

Using the provided criteria, only 5 chronic events were detected for the period of record.

4.2 Acute violations

Criteria: The DO concentration shall not fall below the acute standard of 3.0 mg/L at any time.

## # A tibble: 12 x 2
##    SiteNum         acute.fraction
##    <chr>                    <dbl>
##  1 404133073041801         0.0553
##  2 404146073074301         0.0433
##  3 404200073034301         0.0656
##  4 404211073061701         0.0682
##  5 404213073041801         0.0793
##  6 404213073070801         0.0254
##  7 404226073034201         0.165 
##  8 404240073074301         0.0157
##  9 404253073065101         0.0791
## 10 404254073041701         0.287 
## 11 404306073034301         0.0573
## 12 404320073072601         0.0417
## # A tibble: 6 x 2
##   month acute.fraction
##   <chr>          <dbl>
## 1 06            0     
## 2 07            0.0777
## 3 08            0.0967
## 4 09            0.121 
## 5 10            0.0415
## 6 11            0
## # A tibble: 2 x 2
##   watershed    acute.fraction
##   <chr>                 <dbl>
## 1 Browns River         0.121 
## 2 Greens Creek         0.0461
## # A tibble: 2 x 4
##   year  acute.fraction observations violations
##   <chr>          <dbl>        <int>      <int>
## 1 2016          0.129        154913      19956
## 2 2017          0.0889       146430      13013
## # A tibble: 2 x 2
##   tod.fac acute.fraction
##   <chr>            <dbl>
## 1 day             0.0849
## 2 night           0.0838
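Each table above reports the fraction of 6-minute observations below the 3.0 mg/L acute standard within a grouping. A minimal sketch of a reusable helper is below. The name `acute_summary` is hypothetical, and the input is assumed to have a DO_mgL column as in the time-series table.

```r
# Sketch: fraction of observations below the acute standard, by any grouping
# column (e.g. "SiteNum", "month", "watershed", "year", "tod.fac").
acute_summary <- function(data, group_col, threshold = 3.0) {
  below <- data$DO_mgL < threshold
  grp <- list(group = data[[group_col]])
  obs  <- aggregate(below, by = grp, FUN = length)  # observations per group
  viol <- aggregate(below, by = grp, FUN = sum)     # readings below threshold
  out <- data.frame(group = obs$group,
                    acute.fraction = viol$x / obs$x,
                    observations = obs$x,
                    violations = viol$x)
  names(out)[1] <- group_col
  out
}

do_data <- data.frame(
  year   = c(rep("2016", 4), rep("2017", 3)),
  DO_mgL = c(2.5, 3.2, 2.9, 4.0,  3.1, 3.5, 2.0)
)
by_year <- acute_summary(do_data, "year")
print(by_year)
```

As a consistency check on the yearly table, 19956 / 154913 ≈ 0.129, which matches the 2016 acute fraction reported above.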

These tables show that acute events are most common in September, in 2016, and in the Browns River watershed. The map below, created in ArcGIS, uses graduated symbols to show the frequency of acute violations at each site.

Map created in ArcGIS of DO sensor locations, symbolized by frequency of acute violations.

5 Conclusions

5.1 Summary

As seen from the exploratory data analysis in this project, DO concentrations are dynamic and fluctuate across time and space. The data show differences in both yearly and diurnal patterns. DO concentrations differed between 2016 and 2017, and between day and night. The day/night differences, however, were not as large as anticipated. The dataset also shows clearly defined spatial patterns, with potentially significant differences between the two watersheds of the study area. Tidal influences do not appear to be a major contributing factor. Water temperature appears to play an important role: it was moderately and negatively correlated with daily average DO in 2017 (Pearson’s r ≈ -0.45).

5.2 Expansion Opportunities

This project could be improved by acquiring additional datasets and performing additional analyses.
1. The most glaring need is for ecological data, namely a shellfish survey spanning 2016 and 2017, to correlate DO concentrations with the ecological health of Great South Bay.
2. Water temperature data for 2016 would further demonstrate the linkage between DO and temperature.
3. Data from extreme weather events, such as cyclones, could potentially explain some of the acute DO events seen in the dataset.
4. A watershed analysis should be performed on the two watersheds. It would quantify possible sources of low DO, such as nitrate and stormwater, by examining impervious surface coverage, septic versus sewered areas, and land use.

5.3 Value of Analysis

Analyses such as these are important for understanding environmental data. These data originate mainly from government-funded organizations, so it is critical to extract as much information from them as possible to maximize the value of taxpayer money. Monitoring environmental parameters is time-consuming and expensive, yet crucial for the long-term outlook of our communities. As such, these data need to contribute to improved knowledge so that regulatory agencies can make more informed decisions and apply resources where they are most needed.