Introduction

Dissolved Oxygen is an important measure of water quality that is key to a healthy ecosystem. Additionally, dissolved oxygen had a strong relationship with temperature. This explore dissolved oxygen data and show this relationship using data collected across several waterbodies by the Massachusetts Department of Environmental Protection in 2019 and 2020. The data used for this analysis can be found at the MASS DEP Watershed Planning Program website.

Data Preparation

In order to analyze the data, packages must be loaded, and the data must be read into a dataframe.

#Load Packages
library(tidyverse)
library(here)
library(skimr)
library(janitor)
library(ggbeeswarm)

wq<- read.csv(here("data", "wqual.csv"))#read in data to dataframe

For ease of use in data analysis, the names can be foramatted in a more uniform way. Additionally, several columns will not be used for this analysis only the relevant columns will be kept for the analyzed data.

cleanwq <- wq %>%
  clean_names() %>% #create uniform names
  select(waterbody, unique_id, latitude, longitude, date, time, n_depth, n_temp, n_ph, n_spcond, n_tds, n_do, n_dosat) #keep only relevant columns

The date column is currently formatted in a way that is not conductive for analysis. To fix this formatting issue, the column data is formatted as a date and then formatted to a categorical variable so that date can be used as a category to split data.

cleanwq$date <- as.Date(cleanwq$date, format="%m/%d/%Y") 
cleanwq$date <- as.factor(cleanwq$date) 

Finally, while the waterbody column contains understandable names that are useful as titles for creating understandable categories, there are several Unnamed Tributaries meaning this column is not a unique identifier for each waterbody. To create a unique identifier with the waterbody name in it, the waterbody and unique id columns were combined for analysis. A glimpse of this final dataframe is shown below.

cleanwq <-cleanwq %>% 
  unite(site, waterbody:unique_id, remove = FALSE)

glimpse(cleanwq)
## Rows: 72
## Columns: 14
## $ site      <chr> "Weasel Brook_W2944", "Poor Farm Brook_W2945", "Big Bummet B…
## $ waterbody <chr> "Weasel Brook", "Poor Farm Brook", "Big Bummet Brook", "Quin…
## $ unique_id <chr> "W2944", "W2945", "W2946", "W2188", "W1242", "W2947", "W1767…
## $ latitude  <dbl> 42.30793, 42.30647, 42.23658, 42.20837, 42.17729, 42.17549, …
## $ longitude <dbl> -71.80008, -71.76581, -71.70516, -71.69757, -71.68788, -71.7…
## $ date      <fct> 2019-11-01, 2019-11-01, 2019-11-01, 2019-11-01, 2019-11-01, …
## $ time      <chr> "8:15:05 AM", "9:15:05 AM", "10:13:34 AM", "10:57:35 AM", "1…
## $ n_depth   <dbl> 0.0, 0.0, 0.3, 0.2, 0.2, 0.1, NA, 0.3, 0.0, 0.0, NA, 0.1, 0.…
## $ n_temp    <dbl> 14.9, 13.2, 14.5, 14.5, 15.2, 13.8, 8.1, 11.6, 13.2, 12.8, 1…
## $ n_ph      <dbl> 7.4, 7.3, 7.1, 7.1, 7.6, 7.0, 7.3, 7.1, 7.1, 7.5, 7.3, 7.3, …
## $ n_spcond  <int> 484, 521, 451, 516, 426, 246, 192, 331, 961, 763, 505, 252, …
## $ n_tds     <int> 310, 333, 289, 330, 272, 157, 123, 212, 615, 488, 323, 161, …
## $ n_do      <dbl> 9.2, 9.5, 9.3, 9.1, 9.8, 9.7, 12.4, 9.6, 9.7, 10.0, 9.3, 9.8…
## $ n_dosat   <int> 94, 93, 93, 92, 100, 95, 107, 91, 95, 97, 91, 95, 99, 98, 97…

Analysis

First, we will look at the overall distribution of dissolved oxygen on each date of measurement. To do this a violin plot and a dot plot will be created to show the distribution at each of the sites on each of the days that measurements were taken. Overall, the measurements taken in the winter (when water temperatures are typically colder) generally have the highest dissolved oxygen values.

cleanwq %>%
  ggplot(aes(x=date, y=n_do))+
  geom_violin()+
  geom_point()+
  coord_flip()+
  labs(title="Dissolved Oxygen Data Distribution", caption="Figure 1. The distribution of dissolved oxygen on each date of measurement. The distributions
       on each date different but are generally clustered somewhat centrally",
     x="Date",
     y="Dissolved Oxygen")

To more thouroughly look at the distribution of data, the measurements can be separated out by each specific waterbody. As seen in Figure 2, the dissolved oxygen measurements follow very similar seasonal patterns at each of the sites.

cleanwq %>%
  ggplot(aes(x=date, y=n_do))+
  geom_point() +
  facet_wrap(~ site)+
  labs(title="Dissolved Oxygen by Site", caption="Figure 2. Dissolved oxygen measurements at each site on each date of measurement. 
       Overall, the different sites have similar patterns in the data",
     x="Date",
     y="Dissolved Oxygen")+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Figure 1 and 2 shows that there are patterns in dissolved oxygen data that are similar to when changes in temperature would likely occur seasonally. However, the dataset also measured temperature so these two variables can be compared specifically. As shown in Figure 3, there is a visible negative relationship between temperature and dissolved oxygen. This tracks with other data on the relationship between temperature and dissolved oxygen.

cleanwq %>%
  ggplot(aes(x=n_temp, y=n_do))+
  geom_point()+
  geom_smooth()+
  labs(title="Temperature v Dissolved Oxygen", caption="Figure 3. The relationship between temperature and dissolved oxygen. 
  As shown there is a negative relationship between the two variables that aligns with previous study.",
       x="Temperature",
       y="Dissolved Oxygen")

With the analysis completed plots can be saved as images.

ggsave(here("output","Figure3.png"))