R WQI Scoring – Comp Plan

Code was previously developed to complete WQI scoring in R (see: http://rpubs.com/mattshank20/R_srbWQI). Since then, Dawn and Joanna compiled and cleaned data from 2000-2018, which resulted in 19,831 records. I took the dataset and:
1. replaced all non-detects (‘ND’) with the detection limit we used in the WQI development
2. raised any concentrations below the detection limit to the detection limit we used in the WQI development
3. revised the zeroing-out schema to apply ONLY to parameters with aquatic life use thresholds (Fe (>= 1.5 mg/L) and Al (>= 0.75 mg/L))
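
The three cleaning steps above can be sketched as follows. This is a minimal illustration in Python, not the R code used for the actual scoring. The detection limits shown are placeholder values (the real ones come from the WQI development work), and the function names are hypothetical.

```python
# Placeholder detection limits (mg/L); assumed values for illustration only.
DETECTION_LIMITS = {"Fe": 0.05, "Al": 0.1}

# Aquatic life use thresholds from step 3 (mg/L).
ZERO_OUT_THRESHOLDS = {"Fe": 1.5, "Al": 0.75}


def clean_concentration(parameter, raw_value):
    """Steps 1 and 2: map 'ND' and sub-limit values to the detection limit."""
    limit = DETECTION_LIMITS[parameter]
    if isinstance(raw_value, str) and raw_value.strip().upper() == "ND":
        return limit
    return max(float(raw_value), limit)


def triggers_zero_out(parameter, concentration):
    """Step 3: only parameters with a threshold can trigger zeroing out."""
    threshold = ZERO_OUT_THRESHOLDS.get(parameter)
    return threshold is not None and concentration >= threshold
```

A parameter without an aquatic life use threshold never triggers zeroing out in this sketch, whatever its concentration.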

All signs point to success. 8,061 samples had enough data to be scored, and WQI scores ranged from 4.2 to 96.3. There were 11,944, 13,118, and 9,612 scores generated for the Metals, Nutrients, and Development categories, respectively. Metals category scores ranged from 0-100, while nutrients and development category scores ranged from 0.2-100 and 0.6-100, respectively.

After the scores at the sample level were generated, I summarized scores by unique site (StationID, NOT AliasID, which contains duplicates). This way, when we average WQI scores by HUC10, each site will have equal influence; sample size will be tracked but will not influence HUC10-level averages.
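
The difference between averaging site means and pooling all samples can be sketched as below. This is an illustrative Python example with made-up records, not the R code used for scoring; the tuple layout and function names are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample-level records: (HUC10, StationID, WQI score).
samples = [
    ("HUC_A", "S1", 80.0),
    ("HUC_A", "S2", 20.0),
    ("HUC_A", "S2", 30.0),
    ("HUC_A", "S2", 25.0),
    ("HUC_A", "S2", 25.0),
]


def site_means(records):
    """Average sample scores per site, keeping the sample size alongside."""
    by_site = defaultdict(list)
    for huc10, station_id, score in records:
        by_site[(huc10, station_id)].append(score)
    return {key: (mean(scores), len(scores)) for key, scores in by_site.items()}


def huc10_means(site_summary):
    """Average the site means per HUC10 so each site counts once."""
    by_huc = defaultdict(list)
    for (huc10, _station), (site_mean, _n) in site_summary.items():
        by_huc[huc10].append(site_mean)
    return {huc10: mean(values) for huc10, values in by_huc.items()}


sites = site_means(samples)
by_huc = huc10_means(sites)
# Pooling every sample instead lets the heavily sampled site dominate.
pooled = mean(score for _huc, _station, score in samples)
```

With these made-up numbers the HUC10 mean of site means is 52.5, while the pooled sample mean is 36.0 because site S2 contributes four of the five samples.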

After averaging WQI by StationID, there were 1,117 sites with enough data to generate an average WQI score. N ranged from 0-239 across sites. See below for the distribution of sites across our classification categories.

Classification	N	%
Excellent	33	2.95
Good	280	25.1
Fair	412	36.9
Poor	257	23.0
Very Poor	135	12.1
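
As a quick arithmetic check on the table above, the class counts can be summed and converted to percentages. This is a small Python sketch using the counts from the table; the variable names are illustrative.

```python
# Site counts per classification, copied from the table above.
counts = {"Excellent": 33, "Good": 280, "Fair": 412, "Poor": 257, "Very Poor": 135}

# The classes should account for every site with a mean WQI score.
total = sum(counts.values())

# Share of sites in each class, as a percentage rounded to two decimals.
percents = {label: round(100 * n / total, 2) for label, n in counts.items()}
```

The total comes to 1,117, matching the number of sites with an average WQI score.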

Additionally, 1,573, 1,316, and 1,281 sites had enough data to calculate metals, nutrients, and development scores, respectively. The total number of sites with enough data to calculate at least one category score was 1,777.

Figure 1. Map of 1,117 sites with enough data to generate WQI scores, symbolized by mean WQI score using symbology consistent with the report.

Figure 2. Map of 1,777 sites with enough data to calculate at least one category score. Radio buttons turn layers on and off. Layers include mean WQI and scores for each category, using color ramp symbology. Layers overlay one another.

WQI and category scores by HUC10 watershed

As per our discussion with Jeff Z., we calculated average WQI and category scores for each HUC10 watershed two ways: 1) using all sites and 2) excluding mainstem sites. Our thought was that when clean tributaries join large rivers of poorer water quality and the sites fall in the same HUC10, eclipsing could occur, where the average WQI is not representative. Both iterations were completed, and the results are summarized and visualized below.
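
The two iterations can be sketched as follows. This is an illustrative Python example with made-up site means, not the R and GIS workflow actually used; the `is_mainstem` flag and function name are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical site-level means: (HUC10, StationID, is_mainstem, mean WQI).
site_scores = [
    ("HUC_A", "TRIB1", False, 85.0),
    ("HUC_A", "TRIB2", False, 75.0),
    ("HUC_A", "MAIN1", True, 35.0),
    ("HUC_B", "MAIN2", True, 40.0),
]


def mean_wqi_by_huc10(sites, include_mainstem=True):
    """Average site means per HUC10, optionally dropping mainstem sites."""
    grouped = defaultdict(list)
    for huc10, _station, is_mainstem, wqi in sites:
        if include_mainstem or not is_mainstem:
            grouped[huc10].append(wqi)
    return {huc10: mean(values) for huc10, values in grouped.items()}


all_sites = mean_wqi_by_huc10(site_scores)
tribs_only = mean_wqi_by_huc10(site_scores, include_mainstem=False)
```

In this made-up case the HUC_A average rises from 65.0 to 80.0 once the mainstem site is dropped, and HUC_B is left with no data because its only site is on the mainstem.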

There are 170 HUC10 watersheds in the SRB. There is at least one site in each HUC10 for WQI and each category. See specific n for each in the table below.

Table 1. Sample size distribution throughout HUC10 watersheds using data from all sites (n=1,777)

N_WQI	N_Metals	N_Nutrients	N_Develop
Min. : 1.000	Min. : 1.000	Min. : 1.000	Min. : 1.000
1st Qu.: 3.000	1st Qu.: 3.000	1st Qu.: 3.000	1st Qu.: 3.000
Median : 5.000	Median : 5.000	Median : 5.000	Median : 5.000
Mean : 6.571	Mean : 9.253	Mean : 7.741	Mean : 7.535
3rd Qu.: 8.000	3rd Qu.: 9.000	3rd Qu.: 9.000	3rd Qu.: 9.000
Max. :39.000	Max. :86.000	Max. :44.000	Max. :44.000

I then went into GIS and identified all sites on mainstem rivers (Chemung, Susquehanna, West Branch, and Juniata; n=203). Removing these left a dataset of 1,574 sites used to calculate average WQI and category scores for each HUC10. This generated averages for 168 HUC10s (2 had no data). See specific n in the table below; on average, fewer sites were used to generate HUC10 averages for the WQI and every category, although the maximum for metals was unchanged at 86.

Table 2. Sample size distribution throughout HUC10 watersheds using data from all sites with mainstem sites removed (n=1,574)

N_WQI	N_Metals	N_Nutrients	N_Develop
Min. : 1.00	Min. : 1.00	Min. : 1.000	Min. : 1.000
1st Qu.: 3.00	1st Qu.: 3.00	1st Qu.: 3.000	1st Qu.: 3.000
Median : 4.00	Median : 5.00	Median : 5.000	Median : 5.000
Mean : 5.69	Mean : 8.19	Mean : 6.839	Mean : 6.595
3rd Qu.: 7.00	3rd Qu.: 8.00	3rd Qu.: 8.000	3rd Qu.: 8.000
Max. :24.00	Max. :86.00	Max. :36.000	Max. :26.000
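
The sample-size summaries in Tables 1 and 2 have the layout of R's `summary()` output. A comparable six-number summary can be sketched in Python as below, using made-up site counts. The `inclusive` quantile method should correspond to R's default quantile type, but treat that equivalence as an assumption to verify.

```python
from statistics import mean, quantiles


def six_number_summary(values):
    """Min, quartiles, mean, and max for a list of per-HUC10 site counts."""
    ordered = sorted(values)
    # 'inclusive' interpolates between order statistics of the observed data.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return {
        "Min.": ordered[0],
        "1st Qu.": q1,
        "Median": median,
        "Mean": mean(ordered),
        "3rd Qu.": q3,
        "Max.": ordered[-1],
    }


# Hypothetical number of sites in each of six HUC10s.
sites_per_huc10 = [1, 3, 5, 5, 8, 20]
summary = six_number_summary(sites_per_huc10)
```

As a consistency check on Table 1, the mean multiplied by the 170 HUC10s should recover the number of sites, for example 6.571 x 170 is approximately 1,117 for the WQI.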

Here are polygon maps of mean WQI score in each HUC10 watershed:

Mean WQI score in each HUC10, all data included.

Mean WQI score in each HUC10, with mainstem sites removed from the dataset.

With mainstem sites removed, 2 HUC10s in the upper and middle subbasins have no data. Mean WQI scores increase noticeably for some HUC10s when mainstem sites are removed. We’ll have to discuss whether the differences warrant the additional complexity needed to show mainstem vs. tributary water quality.

The next step was to create a graphic that can be included in a pop-up box when IT develops the interactive web map. After a few iterations, I landed on this one. It shows the global distribution (histogram) of the overall WQI and each category, the scoring classification in the overall WQI facet, and a site-specific score plus an error term (standard deviation). Additionally, site name and sample size are included.
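
The site-specific values shown in the pop-up (mean score, standard deviation, and sample size) could be computed along these lines. This is a hedged Python sketch with a made-up site name and scores; the function name and output layout are assumptions, and the plotting itself is not shown.

```python
from statistics import mean, stdev


def popup_summary(site_name, wqi_scores):
    """Collect the site-level numbers a pop-up graphic would display."""
    n = len(wqi_scores)
    return {
        "site": site_name,
        "n": n,
        "mean": mean(wqi_scores),
        # A sample standard deviation needs at least two observations.
        "sd": stdev(wqi_scores) if n > 1 else None,
    }


# Hypothetical site with three scored samples.
example = popup_summary("EXAMPLE 1.0", [60.0, 70.0, 80.0])
```

Returning `None` for the standard deviation of single-sample sites is one possible choice; such sites would need the error term omitted from the graphic.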

This figure is subject to change. As discussed previously, it will be updated to show the top WQI histogram and a bar chart with total mean category score.

Figure 3. Graphic of WQI and Category scores at BNTY 0.9.

If this graphic is presentable, I can try to write a loop to create it for all 1,117 sites with a WQI score.

One thing that caught my eye was the tall bars (large number of observations) of 0 scores in the metals category. This is a result of our zeroing out of category scores. See below for quantification of 0 scores for the WQI and each category; zeroing out clearly happens most often for metals. I looked into this further, and it is due to AMD samples within the dataset: the majority of sites zeroed out are on AMD-impaired waterways. No 0 scores occur for nutrients or development in this run, which is consistent with the revised zeroing-out schema applying only to Fe and Al.

WQI Type	0s	N	Percent
WQI	0	1117	0%
Metals	188	1573	11.95%
Nutrients	0	1316	0%
Development	0	1281	0%
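
A tally like the table above can be produced by counting exact 0 scores per score type. This is a minimal Python sketch with made-up scores; the function name and input layout are assumptions and this is not the R code used for the table.

```python
def zero_score_tally(scores_by_type):
    """Return (type, zeros, n, percent) rows for each score type."""
    rows = []
    for wqi_type, scores in scores_by_type.items():
        zeros = sum(1 for score in scores if score == 0)
        n = len(scores)
        percent = 100 * zeros / n if n else 0.0
        rows.append((wqi_type, zeros, n, round(percent, 2)))
    return rows


# Hypothetical scores; only the metals category contains zeroed-out values.
scores_by_type = {
    "WQI": [42.0, 55.5, 71.3, 12.8],
    "Metals": [0.0, 64.0, 0.0, 88.1],
    "Nutrients": [0.2, 45.0, 90.0],
}
tally = zero_score_tally(scores_by_type)
```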