DRAFT Diatom Index V3
Jeff Butt and Matt Shank
2021-03-08
We investigated the effect of seasonality on sampling by looking for differences in taxonomic composition of sites based on sub-summer periods of:
Only samples from the least disturbed (LD) group were used. Results of NMDS analysis indicated that collections within the summer season were not significantly different. This suggests that index development and future collection can proceed using a summer index period from June 21st to September 21st.
As a result of the previous meeting, it was determined that we needed to understand why natural variation within the LD BioType Group (BTG) seemed related to spatial patterns consistent with EPA nutrient ecoregions. Preliminary NMDS analysis suggested that nutrient ecoregions were possibly correlated with stress when these factors were plotted together, which raised concern about developing an ecoregion-based index. A parallel analysis sought to determine which taxa were driving the biological community differences within the LD condition. It found that the LD group (consisting of three BTGs) was directly influenced by the relative abundance of only two taxa (A. rivulare and A. minutissimum). The three biological community groups corresponded roughly to “high”, “medium”, and “low” relative abundance of these two taxa. To confirm this, the two taxa were removed, and the three biological community groups were replotted using NMDS. The three groups were no longer distinctly separated. Therefore, the original biological community separation within the LD group was most likely driven by the abundance of just two taxa.
A final analysis combined the two parallel tracks to determine whether one taxon was more abundant in one ecoregion than another, and whether a pattern existed with stress factors. Results suggest that the distribution of these two taxa is relatively mixed regardless of the ecoregion in which the sample was collected. Additional analyses suggested that one taxon may have a greater affinity for certain conditions than the other. To put this into perspective, our results suggest that one taxon may be suited to a pollution tolerance value of 0 and the other to a pollution tolerance value of 1.
The NMDS plot below shows LD and stressed (S) groups in each of the Appalachian Forests/S.E. Plains (AF-SEP) and Atlantic Highlands/Mixed Wood Plains (AH-MWP) ecoregions. The plot shows substantial overlap between ecoregions but little overlap between stress classes, providing little evidence that ecoregion classifications are needed.
A. rivulare and A. minutissimum, as discussed above, showed promise as indicator taxa. Their abundance was variable in each ecoregion and could indicate a pollution tolerance value (PTV) of 0 vs. 1.
CA plots are not constrained by explanatory variables. Rather, the x and y values (site scores on the first two CA axes) respond solely to taxonomic composition. The CA plot serves as the backbone of our index. Once we identify the LD BTGs, we calculate index scores from the distance of each point to the centroid of the LD group.
This is an interactive CA plot. The BTG centroids are shown as labels, and the ellipses correspond to the color groupings of each BTG. Hover over a point for more information about its CA1 and CA2 scores, site info, and index score. You can turn BTGs ‘off’ by clicking on the legend.
The current version of the Diatom Index is based on this two-dimensional CA. Index scores are the hypotenuse of the x and y distances (that is, the Euclidean distance) of each point from the centroid of the LD BTG. Those distances are then rescaled across all sites from 0 to 100. Scores are presented in further detail below.
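The score calculation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `index_scores`, the input layout, and the example coordinates are hypothetical. The direction of the rescaling (the point closest to the LD centroid scores 100) is an assumption, inferred from the fact that LD samples are expected to score above the impairment threshold.

```python
import math


def index_scores(ca_scores, ld_flags):
    """Rescaled distance of each sample from the LD centroid.

    ca_scores: list of (CA1, CA2) tuples, one per sample (hypothetical layout).
    ld_flags:  list of bools, True where the sample belongs to an LD BTG.
    """
    ld_points = [p for p, is_ld in zip(ca_scores, ld_flags) if is_ld]
    if not ld_points:
        raise ValueError("at least one LD sample is needed to locate the centroid")

    # Centroid of the LD group in the two-dimensional CA space.
    cx = sum(x for x, _ in ld_points) / len(ld_points)
    cy = sum(y for _, y in ld_points) / len(ld_points)

    # Hypotenuse of the x and y distances from the centroid.
    distances = [math.hypot(x - cx, y - cy) for x, y in ca_scores]

    # Min-max rescale across all sites to 0-100.
    # Assumption: inverted so that shorter distance means a higher score.
    d_min, d_max = min(distances), max(distances)
    span = d_max - d_min
    if span == 0:
        return [100.0 for _ in distances]
    return [100.0 * (d_max - d) / span for d in distances]


# Made-up coordinates: three LD samples near the origin, one distant sample.
example_scores = index_scores(
    [(0.0, 0.0), (0.2, 0.0), (-0.2, 0.0), (3.0, 4.0)],
    [True, True, True, False],
)
```

With these made-up inputs the sample at the centroid receives the highest score and the most distant sample receives 0.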
This diatom CCA plot shows taxonomic data constrained with explanatory water chemistry variables. As a result, the sample data becomes oriented so that the majority of variation is aligned with the primary axis (x), and the second highest amount of variation is aligned with the secondary axis (y).
Sulfate, SpC, and pH are most closely aligned with the primary axis, while iron, phosphorus, and alkalinity are most aligned with the secondary axis in our CCA. However, many variables are intermediate to the primary and secondary axes.
This step of CCA examination confirms that BTGs 10, 11, and 13 are LD due to their position with respect to stress vectors.
Below are interactive plots that demonstrate the general performance of the Diatom Index. Hover over points for more information. Zoom in by clicking and dragging.
The first plot is a histogram of index scores, symbolized by stress classification (LD, intermediate (I), and S). The dotted vertical line is the impairment threshold (5th percentile of LD), which equals an index score of 68.
The second plot shows index scores by BTG, symbolized by stress classification. Hover over boxes for statistics about the BTG, and hover over the points for additional information on each site/sample. The dotted horizontal line is the impairment threshold (5th percentile of LD), which equals an index score of 68.
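As a hedged illustration of how a threshold like this can be derived, the sketch below takes the 5th percentile of the LD index scores. The function name and example scores are hypothetical, and the percentile interpolation rule (Python's `statistics.quantiles` with `method="inclusive"`) is an assumption; the source does not state which rule was used, and different rules can give slightly different thresholds.

```python
import statistics


def impairment_threshold(scores, classes):
    """5th percentile of index scores among samples classified as LD.

    scores:  list of index scores (0-100).
    classes: list of stress classes ("LD", "I", or "S"), same order as scores.
    """
    ld_scores = [s for s, c in zip(scores, classes) if c == "LD"]
    if len(ld_scores) < 2:
        raise ValueError("need at least two LD scores to estimate a percentile")
    # n=20 splits the data into 5% steps; the first cut point is the 5th percentile.
    return statistics.quantiles(ld_scores, n=20, method="inclusive")[0]


# Made-up scores for illustration only.
example_threshold = impairment_threshold(
    [60, 70, 80, 90, 100, 35, 20],
    ["LD", "LD", "LD", "LD", "LD", "S", "S"],
)
```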
This is an interactive table of site locations, BTGs, index scores, and water chemistries. The table is sorted by index score (high to low), but users can sort by any column. Use the search function to search for keywords, streams, etc.
This map contains a layer for each BTG, listed in order of stress classification: S, I, LD. Points are symbolized by index score (light = high score; dark = low). Click a point to find additional information about that site, including score, location, and associated water chemistry.
This is an interactive boxplot generator showing concentrations of each water quality stressor, by stress classification: S, I, and LD. Select a parameter to plot in the dropdown box at the top left. The units are shown in the dropdown box. Hover over each point to see concentration and stream. Hover over each box to see statistics for each stress group.
The literature suggests many diatom taxa that may be indicative of LD and S conditions. The work of Hausmann et al. (2016), which established a diatom-based BCG for assessing impairment and developing nutrient criteria for streams in New Jersey, is especially useful for identifying candidate indicator taxa for the PA DEP Diatom Index because of its geographic proximity. For example, the sensitive and tolerant taxa categories in that work include several diatom taxa that were also found in the samples used to develop this index. These taxa were consequently designated as candidate indicator taxa and examined more closely. The sensitive and tolerant taxa lists provided in Hausmann et al. (2016) were compiled in workshops of expert diatomists organized by the Academy of Natural Sciences. [refer to Hausmann S., Charles D.F., Gerritsen J., Belton T.J. 2016. A diatom-based biological condition gradient (BCG) approach for assessing impairment and developing nutrient criteria for streams. Science of the Total Environment 562, 914-927.]
The graphic below is an interactive NMDS plot that provides more information about the diatom taxa found in the samples used to develop this index. The NMDS helps to clarify the association between taxa and particular BTGs as well as LD and S conditions. Turn layers on/off by clicking the legend. Hover over taxa (text) or sites (points) for additional information. Taxa are symbolized by their sensitivity class.
The table below lists thirty-six candidate indicator taxa present in the PA DEP Diatom Index. These candidate taxa were grouped into species complexes consisting of Highly Sensitive, Intermediate Sensitive, Tolerant, and Highly Tolerant taxa. These species complexes were tested with ANOVA models to explore relationships with LD and S classifications as well as with attainment classifications. The interactive box plot generator, shown below, may be used to explore the relationships between these sensitivity complexes and the LD/S and attainment classifications. In all cases, the ANOVAs indicated statistical significance at, or better than, the 0.05 level. These ANOVA results support the use of indicator taxa as a means of recognizing diatom community differences between attaining and non-attaining streams.
Pick a species complex in the dropdown box. All Sensitive is the sum of Highly and Intermediate Sensitive. All Tolerant is the sum of Tolerant and Highly Tolerant. Points are filled with attainment status, based on the attainment/impairment threshold of 68. Hover over the points for additional information, including relative abundance of the species complex and index score of the sample.
Diatom species complexes are an important construct for examining the biological integrity of a stream because of the diversity of diatom taxa and the niche overlap found across Pennsylvania. Consequently, no single taxon may serve well as an indicator of a particular stream condition across the geographic extent of our Commonwealth, whereas a complex of species, taken together, will offer more reliable insight into the biological integrity of a stream regardless of where in Pennsylvania the sample was collected. Further, these species complexes may form the beginning of diatom metrics specifically calibrated to the streams and rivers of Pennsylvania.
To be considered accurate, the index must be able to consistently use biological data to differentiate between good and poor sites. Index accuracy is assessed by calculating discrimination efficiency (DE) from the samples used in index development, using the following formula:
\[ DE = 100 * ( n_{S<25thLD} / n_{STotal} ) \] Where:
Recall that the diatom index development samples were assigned to LD, I, or S classifications based upon their taxonomic structure (BTGs) and their position in a CCA relative to a list of constraining pollution gradient water quality parameters. The sample index scores, by contrast, were determined without reference to this pollution gradient, according to their two-axis position relative to the LD BTG centroid in a separate CA plot. The DE for this index was determined to be 100%.
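The DE formula can be sketched as below, reading \(n_{S<25thLD}\) as the number of S samples scoring below the 25th percentile of LD scores and \(n_{STotal}\) as the total number of S samples. This reading, the function name, the example scores, and the percentile interpolation rule (`method="inclusive"`) are assumptions made for illustration.

```python
import statistics


def discrimination_efficiency(scores, classes):
    """DE = 100 * (S samples below the LD 25th percentile) / (all S samples)."""
    ld_scores = [s for s, c in zip(scores, classes) if c == "LD"]
    s_scores = [s for s, c in zip(scores, classes) if c == "S"]
    if len(ld_scores) < 2 or not s_scores:
        raise ValueError("need at least two LD samples and one S sample")

    # n=4 gives quartiles; the first cut point is the 25th percentile of LD.
    ld_q25 = statistics.quantiles(ld_scores, n=4, method="inclusive")[0]

    n_below = sum(1 for s in s_scores if s < ld_q25)
    return 100.0 * n_below / len(s_scores)


# Made-up scores: three of the four S samples fall below the LD 25th percentile (76).
example_de = discrimination_efficiency(
    [70, 76, 82, 88, 94, 10, 40, 75, 80],
    ["LD", "LD", "LD", "LD", "LD", "S", "S", "S", "S"],
)
```

A DE of 100% corresponds to every S sample scoring below the LD 25th percentile.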
Classification efficiency (CE) was calculated from 48 samples in the validation dataset, using the impairment threshold of 68 and the following formula:
\[ CE = 100 * ( (n_{ValLD > 68} + n_{ValS < 68}) / (n_{ValS} + n_{ValLD}) ) \] Where:
These validation samples were classified as LD, I, and S alongside the development dataset as described above. An index value for each validation sample was then determined using the taxonomic model established by the index development. CE was determined to be 94.3%.
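A minimal sketch of the CE calculation follows, assuming that only validation samples classified as LD or S enter the formula and that samples classified as I are ignored. The function name and the example scores are hypothetical.

```python
def classification_efficiency(scores, classes, threshold=68.0):
    """CE = 100 * (LD above threshold + S below threshold) / (LD + S)."""
    ld_scores = [s for s, c in zip(scores, classes) if c == "LD"]
    s_scores = [s for s, c in zip(scores, classes) if c == "S"]
    total = len(ld_scores) + len(s_scores)
    if total == 0:
        raise ValueError("no LD or S samples in the validation set")

    # Strict inequalities, as in the formula: a score exactly at the
    # threshold counts as correct for neither class.
    correct_ld = sum(1 for s in ld_scores if s > threshold)
    correct_s = sum(1 for s in s_scores if s < threshold)
    return 100.0 * (correct_ld + correct_s) / total


# Made-up validation scores: 3 of 4 LD and 4 of 4 S are classified correctly;
# the single I sample is not counted.
example_ce = classification_efficiency(
    [90, 80, 72, 60, 50, 30, 10, 67, 75],
    ["LD", "LD", "LD", "LD", "S", "S", "S", "S", "I"],
)
```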
The temporal precision estimate (PE-T) was calculated using pairs of samples collected within the index period from 2013 through 2017. Each pair consisted of a calibration sample and a replicate sample (a sample collected from the same site as the calibration sample, but at a different time, with the gap ranging from weeks to years).
PE-T is calculated as the 90% confidence interval around these sample pairs (incremental degradation or improvement measurements) using the following formula:
\[
CI_{90} = 1.282 * (MSE^{0.5} / n^{0.5})
\]
Where:
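As a hedged illustration of the interval formula, the sketch below assumes that MSE is the within-site mean square of the paired index scores (for pairs, the mean of \(d^2/2\), where \(d\) is the difference within a pair) and leaves \(n\) as a caller-supplied value. Both choices are assumptions, as are the function names and the example pairs.

```python
import math

Z_MULTIPLIER = 1.282  # multiplier taken directly from the CI_90 formula


def mse_from_pairs(pairs):
    """Within-site mean square for paired scores (assumed definition of MSE).

    Each pair contributes (a - b)^2 / 2 with one degree of freedom.
    """
    if not pairs:
        raise ValueError("at least one sample pair is required")
    return sum((a - b) ** 2 / 2.0 for a, b in pairs) / len(pairs)


def ci90(mse, n):
    """CI_90 = 1.282 * (MSE^0.5 / n^0.5)."""
    return Z_MULTIPLIER * math.sqrt(mse) / math.sqrt(n)


# Made-up calibration/replicate score pairs.
example_mse = mse_from_pairs([(80.0, 72.0), (50.0, 50.0)])
example_ci = ci90(example_mse, n=4)
```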
For the sake of comparison, below are PE-T values for other established biological indices, the freestone and limestone macroinvertebrate indices of biotic integrity.
The method precision estimate (PE-M), often referred to as intrasite precision, was calculated using pairs of samples collected from the same stream site on the same day. Using only pairs that are above the attainment threshold (>68), PE-M is calculated by first determining the coefficient of variation (\(CV = StandardDeviation/mean\)) for each pair and then averaging the CVs across all pairs. PE-M for this index was determined to be 13.2. Unfortunately, this PE-M determination is limited, since only four intrasite pairs in the dataset met the threshold criteria required for inclusion in the calculation. To address this limitation, future diatom sampling will be directed to capture many more intrasite sample pairs that meet the threshold criteria.
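The PE-M steps can be sketched as follows. This is an illustration under stated assumptions: a pair is treated as eligible only when both of its scores exceed the threshold, the sample standard deviation (n - 1 denominator) is used, and the result is returned as a ratio rather than a percentage. The function name and the example pairs are hypothetical.

```python
import statistics


def method_precision(pairs, threshold=68.0):
    """Mean coefficient of variation across eligible same-day sample pairs."""
    # Assumption: both samples in a pair must score above the threshold.
    eligible = [p for p in pairs if min(p) > threshold]
    if not eligible:
        raise ValueError("no pairs above the attainment threshold")

    # CV = standard deviation / mean, computed separately for each pair.
    cvs = [statistics.stdev(p) / statistics.mean(p) for p in eligible]
    return statistics.mean(cvs)


# Made-up pairs: the third pair is excluded because one score is below 68.
example_pe_m = method_precision([(80.0, 80.0), (90.0, 70.0), (60.0, 95.0)])
```

Multiplying the result by 100 expresses it as a percentage.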
The PE-M values for the freestone and limestone macroinvertebrate indices of biotic integrity are comparable:
Contour plots show the relationship of each univariate stressor (water chemistry variable) to the unconstrained taxonomic data. Some contours are uniform, while others have much more ‘topography’, where effects are less even across the gradient of water chemistry. These contour plots reveal much of what is driving differences in diatom communities, but they show only univariate influences, whereas the effects of chemistry are synergistic and/or additive across multiple parameters. This information could potentially be leveraged to add a stressor identification component to the Diatom Index tool.