Mattamuskeet water quality boxplot options

Adam Smith

2019-11-20

Here we demonstrate the various options for generating boxplots of Mattamuskeet water quality parameters.

Basic use

Once the necessary functionality and data are loaded (this is documented in other code, but omitted here for clarity), only a few lines of code are necessary to produce a wide range of boxplots.

The first step is to identify which parameters of interest we want boxplots for. We define three sets of parameters here for completeness, although the examples below use only the nitrogen and sediment sets.

## CORE WATER QUALITY PARAMETERS OF INTEREST (POI)
core_poi <- c("chla", "turbidity", "TP", "TN", "res_susp_total")

## NITROGEN SPECIES OF INTEREST
N_poi <- c("TN", "NH3", "NOx", "TKN", "NP_molar")

## SEDIMENT SPECIES OF INTEREST
sed_poi <- c("res_total", "res_diss_total", "res_susp_total", "res_susp_fixed", "res_susp_vol")
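The three sets overlap: TN appears in both the core and nitrogen sets, and res_susp_total in both the core and sediment sets. If you ever want to format every parameter at once, base R's set functions make it easy to check for overlaps and build a combined vector without duplicates. The vectors are repeated so the snippet runs on its own, and all_poi is just an illustrative name, not something the plotting functions expect.

```r
core_poi <- c("chla", "turbidity", "TP", "TN", "res_susp_total")
N_poi    <- c("TN", "NH3", "NOx", "TKN", "NP_molar")
sed_poi  <- c("res_total", "res_diss_total", "res_susp_total", "res_susp_fixed", "res_susp_vol")

# Parameters shared between pairs of sets
shared_core_N   <- intersect(core_poi, N_poi)    # "TN"
shared_core_sed <- intersect(core_poi, sed_poi)  # "res_susp_total"
shared_N_sed    <- intersect(N_poi, sed_poi)     # character(0): no overlap

# Every distinct parameter across all three sets, duplicates dropped
all_poi <- unique(c(core_poi, N_poi, sed_poi))
length(all_poi)  # 13
```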

With the parameters of interest defined, the next step is to transform our raw water quality data, which we've loaded (not shown) into the object wq, into a form suitable for boxplots using the format_boxplot_data function. It requires the raw data (data) and the parameters of interest (variables). Here we demonstrate for nitrogen species, storing the result in the bp_N object:

bp_N <- format_boxplot_data(data = wq, variables = N_poi)
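We don't show the internals of format_boxplot_data here, so the sketch below is only a guess at the kind of reshaping a boxplot typically needs: keeping the requested parameters and stacking them into long format (one row per sample and parameter), with year and month columns for annual or monthly summaries. The toy data frame wq_toy, its date and basin columns, and the helper to_long() are all hypothetical; the real function may differ in every detail.

```r
# Hypothetical toy data in a wide layout: one row per sample, one column per parameter
wq_toy <- data.frame(
  date  = as.Date(c("2018-01-15", "2018-01-15", "2018-07-10", "2018-07-10")),
  basin = c("west", "east", "west", "east"),
  TN    = c(1.2, 1.5, 2.1, 1.8),
  NH3   = c(0.05, 0.04, NA, 0.07),
  chla  = c(10, 12, 30, 25),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not format_boxplot_data itself): keep the requested
# parameters and stack them into long format with year and month columns
to_long <- function(data, variables) {
  variables <- intersect(variables, names(data))  # skip parameters absent from the data
  long <- do.call(rbind, lapply(variables, function(v) {
    data.frame(date = data$date, basin = data$basin, variable = v,
               value = data[[v]], stringsAsFactors = FALSE)
  }))
  long$year  <- as.integer(format(long$date, "%Y"))
  long$month <- as.integer(format(long$date, "%m"))
  long  # missing values are kept as NA rows
}

# NOx is not in the toy data, so only TN and NH3 are stacked (4 rows each)
bp_toy <- to_long(wq_toy, variables = c("TN", "NH3", "NOx"))
```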

All that's left is to decide how you want the data displayed. The choices are outlined below; in each case, the default option is listed first:

  1. Do you want summary boxplots by year (summary = "annual") or by month (summary = "monthly")?
  2. Do you want to display the raw data behind the boxplots (raw = TRUE) or not (raw = FALSE)?
  3. Do you want the figure to have no title (title = NULL) or a custom title (title = "Whatever title you want!")?
  4. Do you want the west and east basins boxplots side-by-side (grouped = TRUE) or in their own facets (grouped = FALSE)?
  5. Do you want the y-axis scale to be specific to each variable (fix_y_range = NULL) or the same for all variables (e.g., 0 to 2000 mg/L for sediment; fix_y_range = c(0, 2000))?
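In R, a common way to declare defaults like these is directly in the function signature, with match.arg() picking the first listed choice when none is supplied. The stand-in below, describe_options(), is hypothetical and only echoes back the settings it receives; it is not necessarily how wq_boxplots() is implemented.

```r
# Hypothetical stand-in that only declares and echoes the display options;
# wq_boxplots() may handle its arguments differently
describe_options <- function(summary = c("annual", "monthly"), raw = TRUE,
                             title = NULL, grouped = TRUE, fix_y_range = NULL) {
  summary <- match.arg(summary)  # no choice given: first listed ("annual") is used
  if (!is.null(fix_y_range)) {
    # a fixed range is a numeric vector of length two: lower and upper limits
    stopifnot(is.numeric(fix_y_range), length(fix_y_range) == 2)
  }
  list(summary = summary, raw = raw, title = title,
       grouped = grouped, fix_y_range = fix_y_range)
}

defaults <- describe_options()
custom   <- describe_options(summary = "monthly", raw = FALSE,
                             fix_y_range = c(0, 2000))
```

Because match.arg() only accepts the listed choices, a typo such as summary = "anual" fails immediately with an informative error instead of silently falling back to a default.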

Here's the default plot for nitrogen species: an annual summary with raw data, no title, west and east basin measurements for a given year placed side-by-side (west on left, east on right), and an automatically calculated y-axis scale for each nitrogen parameter.

Note that for annual summaries with raw data, the observations are colored by month with a custom cyclic color palette. This (hopefully) makes it easier to quickly assign a point to a season (winter months are pink/purple, spring months are shades of blue [April showers!], summer months are green, and fall months are shades of orange).
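The exact colors wq_boxplots() uses aren't listed here, but the idea behind a season-grouped, cyclic month palette can be sketched as a simple named lookup: December, January, and February share pinks and purples, so months at opposite ends of the calendar still get similar colors. The hex codes and the month_colors and obs_months names below are illustrative assumptions.

```r
# Illustrative month-to-color lookup grouped by season; these hex codes are
# assumptions, not the palette wq_boxplots() actually uses
month_colors <- c(
  Dec = "#C51B7D", Jan = "#DE77AE", Feb = "#762A83",  # winter: pinks/purples
  Mar = "#9ECAE1", Apr = "#4292C6", May = "#08519C",  # spring: blues
  Jun = "#A1D99B", Jul = "#41AB5D", Aug = "#006D2C",  # summer: greens
  Sep = "#FDAE6B", Oct = "#F16913", Nov = "#A63603"   # fall: oranges
)

# Reorder into calendar order (month.abb is base R's "Jan", ..., "Dec")
month_colors <- month_colors[month.abb]

# Look up a color for each observation's month number (1-12)
obs_months <- c(1, 4, 7, 10, 12)
obs_colors <- month_colors[obs_months]
```

A named vector like this can be passed to ggplot2's scale_colour_manual(values = ...) when month is mapped to colour as a factor with levels month.abb, since named values are matched to factor levels by name.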

wq_boxplots(bp_N)
#> Warning: Removed 342 rows containing missing values (geom_rect).
Default boxplot options for nitrogen.

You can get the same kind of plot, but summarized monthly, by specifying the summary = "monthly" argument to the function.

wq_boxplots(bp_N, summary = "monthly")
#> Warning: Removed 342 rows containing missing values (geom_rect).
Default boxplot options for nitrogen, but summarized monthly.

If you’d rather simplify the figure by only showing the boxplots without the raw data, pass the raw = FALSE argument:

wq_boxplots(bp_N, raw = FALSE)
#> Warning: Removed 342 rows containing missing values (geom_rect).
Annual nitrogen summary, raw data suppressed.

Nice, but we need a descriptive title. Pass whatever title you want in a character string to the title argument:

wq_boxplots(bp_N, raw = FALSE, 
            title = "Mattamuskeet NWR: Nitrogen water quality parameters")
#> Warning: Removed 342 rows containing missing values (geom_rect).
Annual nitrogen summary, raw data suppressed, now with a shiny title.

If you’d prefer to see the summaries for west and east lake basins separately, rather than side-by-side, then turn off grouping with the grouped = FALSE argument:

wq_boxplots(bp_N, raw = FALSE, grouped = FALSE,
            title = "Mattamuskeet NWR: Nitrogen water quality parameters")
#> Warning: Removed 342 rows containing missing values (geom_rect).
Annual nitrogen summary, raw data suppressed, and west and east basin series in their own facets.

Lastly, if it makes sense to fix the y-axis scale so you can more easily compare species, pass a custom y-axis range (a numeric vector giving the lower and upper limits) to the fix_y_range argument.

Let's switch over to sediment species as an example. Total solids are the sum of dissolved and suspended solids, so let's put all the species on the same scale to see which contributes most to total solids. Here, we set the y-axis range from zero up to the maximum total solids measurement in the wq data set:

# First, make the sediment boxplot data
bp_sed <- format_boxplot_data(data = wq, variables = sed_poi)

# Use the maximum recorded total solids (res_total variable in `wq` data set), with 0 as the lower limit, to set the y-axis range
y_range <- range(c(0, wq$res_total), na.rm = TRUE)
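Prepending 0 inside range() is what anchors the lower limit of the axis at zero, and na.rm = TRUE keeps a single missing reading from turning the whole range into NA. A quick check with a few made-up total solids values (res_total_toy is hypothetical, not the wq data) shows the difference:

```r
# Made-up total solids readings (mg/L), including one missing value
res_total_toy <- c(310, 455, NA, 1275, 620)

r_obs  <- range(res_total_toy, na.rm = TRUE)        # c(310, 1275): lower limit is the smallest reading
r_zero <- range(c(0, res_total_toy), na.rm = TRUE)  # c(0, 1275): axis now starts at zero
r_na   <- range(c(0, res_total_toy))                # c(NA, NA): the missing value propagates without na.rm
```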

# Now make a pretty figure with a fixed y-axis to facilitate comparisons among species
wq_boxplots(bp_sed, fix_y_range = y_range,
            title = "Mattamuskeet NWR: Sediment water quality parameters")
#> Warning: Removed 332 rows containing missing values (geom_rect).
Annual sediment species summary, with default display options except for a fixed y-axis scale.

In this case, it’s clear that dissolved solids are driving total solids.

But this makes it hard to compare fixed vs. volatile suspended solids. We can make that comparison more easily with the same figure but a y-axis scale specific to each species, which is what we get by leaving fix_y_range at its default (NULL):

wq_boxplots(bp_sed, 
            title = "Mattamuskeet NWR: Sediment water quality parameters")
#> Warning: Removed 332 rows containing missing values (geom_rect).
Annual sediment species summary, sediment species-specific y-axis ranges.

You can mix and match any of the function arguments to get whichever output you most desire.