Code was previously developed to complete WQI scoring in R (see: http://rpubs.com/mattshank20/R_srbWQI). Since then, Dawn and Joanna compiled and cleaned data from 2000-2018, which resulted in 19,831 records. I took the dataset and:
1. replaced all non-detects (‘ND’) with the detection limit we used in the WQI development
2. raised any concentration below the detection limit to the detection limit we used in the WQI development
3. revised the zeroing-out scheme to apply ONLY to parameters with aquatic life use thresholds (Fe (>= 1.5 mg/L) and Al (>= 0.75 mg/L))
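The three steps above can be sketched roughly as follows. This is a Python illustration rather than the R code used for the actual scoring; the function names, the `NO3` parameter, and all detection-limit values are hypothetical placeholders. Only the Fe and Al thresholds come from the list above.

```python
from typing import Union

# Hypothetical detection limits (mg/L); the real values are the ones
# used in the WQI development and are not reproduced here.
DETECTION_LIMITS = {"Fe": 0.05, "Al": 0.1, "NO3": 0.02}

# Aquatic life use thresholds stated in step 3 (mg/L).
ALU_THRESHOLDS = {"Fe": 1.5, "Al": 0.75}


def clean_concentration(parameter: str, raw: Union[str, float]) -> float:
    """Steps 1 and 2: use the detection limit for 'ND' and for values below it."""
    limit = DETECTION_LIMITS[parameter]
    if isinstance(raw, str) and raw.strip().upper() == "ND":
        return limit
    return max(float(raw), limit)


def zeroes_out(parameter: str, concentration: float) -> bool:
    """Step 3: only parameters with an aquatic life use threshold can zero out."""
    threshold = ALU_THRESHOLDS.get(parameter)
    return threshold is not None and concentration >= threshold
```

For example, `clean_concentration("Fe", "ND")` returns the Fe detection limit, and `zeroes_out("NO3", 100.0)` is `False` because the hypothetical `NO3` parameter has no aquatic life use threshold.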
All signs point to success. 8,061 samples had enough data to be scored. WQI scores ranged from 4.2 to 96.3. There were 11,944, 13,118, and 9,612 scores generated for the Metals, Nutrients, and Development categories, respectively. Metals category scores ranged from 0-100, while nutrients and development category scores ranged from 0.2-100 and 0.6-100, respectively.
After the scores at the sample level were generated, I summarized scores by unique site (StationID, NOT AliasID, which contains duplicates). This way, when we average WQI scores by HUC10, each site will have equal influence; sample size will be tracked but will not influence HUC10-level averages.
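A small Python sketch of this two-stage averaging, using made-up records (the station IDs, HUC10 codes, and scores are hypothetical), shows why it matters: a heavily sampled site no longer dominates the HUC10 mean.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample-level records: (StationID, HUC10, WQI score).
samples = [
    ("S1", "H1", 80.0),
    ("S1", "H1", 90.0),
    ("S1", "H1", 85.0),
    ("S1", "H1", 85.0),
    ("S2", "H1", 40.0),
]


def site_means(records):
    """Stage 1: average sample scores per site and track sample size."""
    by_site = defaultdict(list)
    for station, huc, score in records:
        by_site[(station, huc)].append(score)
    return {key: (mean(scores), len(scores)) for key, scores in by_site.items()}


def huc_means(site_summary):
    """Stage 2: average the site means per HUC10, ignoring sample size."""
    by_huc = defaultdict(list)
    for (_station, huc), (site_avg, _n) in site_summary.items():
        by_huc[huc].append(site_avg)
    return {huc: mean(avgs) for huc, avgs in by_huc.items()}


sites = site_means(samples)                       # S1 -> (85.0, 4), S2 -> (40.0, 1)
hucs = huc_means(sites)                           # H1 -> 62.5
pooled = mean(score for _, _, score in samples)   # 76.0 if samples were pooled directly
```

Here the site-weighted HUC10 mean is 62.5, while pooling all five samples would give 76.0 because S1 contributes four of them.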
After averaging WQI by site, there were 1,117 sites with enough data to generate an average WQI score. N ranged from 0-239 across sites. See below for the distribution of sites across our classification.
| Classification | N | % |
|---|---|---|
| Excellent | 33 | 2.95 |
| Good | 280 | 25.1 |
| Fair | 412 | 36.9 |
| Poor | 257 | 23.0 |
| Very Poor | 135 | 12.1 |
Additionally, 1,573, 1,316, and 1,281 sites had enough data to calculate metals, nutrients, and development scores, respectively. The total number of sites with enough data to calculate at least one category score was 1,777.
As per our discussion with Jeff Z., we calculated average WQI and category scores for each HUC10 watershed using 1) all sites and 2) all sites excluding mainstem sites. Our thought was that when clean tributaries join large rivers of poorer water quality, and both sets of sites fall in the same HUC10, eclipsing could make the average WQI unrepresentative. Both iterations were completed, and the results are visualized below.
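To make the concern concrete, here is a minimal Python sketch with invented numbers for a single HUC10 (two clean tributary sites and one poorer mainstem site; the IDs, scores, and the `huc10_mean` helper are all hypothetical):

```python
from statistics import mean

# Hypothetical site-level averages in one HUC10: (StationID, mean WQI, is_mainstem).
sites = [
    ("T1", 82.0, False),
    ("T2", 78.0, False),
    ("M1", 35.0, True),
]


def huc10_mean(site_rows, include_mainstem=True):
    """Average site-level WQI, optionally dropping mainstem sites.

    Returns None when no sites remain, which is how a HUC10 could end up
    with no data once mainstem sites are excluded.
    """
    scores = [wqi for _, wqi, mainstem in site_rows if include_mainstem or not mainstem]
    return mean(scores) if scores else None


all_sites = huc10_mean(sites)                            # 65.0
tribs_only = huc10_mean(sites, include_mainstem=False)   # 80.0
```

In this made-up case the single mainstem site pulls the HUC10 mean from 80.0 down to 65.0.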
There are 170 HUC10 watersheds in the SRB. There is at least one site in each HUC10 for WQI and each category. See specific n for each in the table below.
Table 1. Sample size distribution throughout HUC10 watersheds using data from all sites (n=1,777)
| Statistic | N_WQI | N_Metals | N_Nutrients | N_Develop |
|---|---|---|---|---|
| Min. | 1 | 1 | 1 | 1 |
| 1st Qu. | 3 | 3 | 3 | 3 |
| Median | 5 | 5 | 5 | 5 |
| Mean | 6.571 | 9.253 | 7.741 | 7.535 |
| 3rd Qu. | 8 | 9 | 9 | 9 |
| Max. | 39 | 86 | 44 | 44 |
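A summary like this can be reproduced from a list of site-to-HUC10 assignments. The sketch below is Python rather than the R `summary()` call that presumably produced the table, and the input data and function name are hypothetical.

```python
from collections import Counter
from statistics import mean, median, quantiles

# Hypothetical input: one HUC10 code per scored site (15 sites, 5 watersheds).
site_hucs = ["H1"] * 3 + ["H2"] + ["H3"] * 2 + ["H4"] * 6 + ["H5"] * 3


def sites_per_huc_summary(huc_codes):
    """Summarize the number of sites per HUC10 (min, quartiles, mean, max).

    HUC10s with no sites never appear in the input, so they are not counted.
    Needs at least two watersheds, because quantiles() requires two data points.
    """
    counts = sorted(Counter(huc_codes).values())
    # method="inclusive" uses the same linear interpolation as R's default quantile type.
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    return {
        "min": counts[0],
        "q1": q1,
        "median": median(counts),
        "mean": mean(counts),
        "q3": q3,
        "max": counts[-1],
    }


summary = sites_per_huc_summary(site_hucs)
```

Running the same function once per score type (WQI, metals, nutrients, development) would give one column of the table each.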
I then went into GIS and identified all sites on mainstem rivers (Chemung, Susquehanna, West Branch, and Juniata; n=203). Removing them left a dataset of 1,574 sites used to calculate average WQI and category scores for each HUC10. This generated averages for 168 HUC10s (2 had no data). See specific n in the table below; the result was fewer sites used to generate HUC10 averages for WQI, nutrients, and development categories.
Table 2. Sample size distribution throughout HUC10 watersheds using data from all sites with mainstem sites removed (n=1,574)
| Statistic | N_WQI | N_Metals | N_Nutrients | N_Develop |
|---|---|---|---|---|
| Min. | 1 | 1 | 1 | 1 |
| 1st Qu. | 3 | 3 | 3 | 3 |
| Median | 4 | 5 | 5 | 5 |
| Mean | 5.69 | 8.19 | 6.839 | 6.595 |
| 3rd Qu. | 7 | 8 | 8 | 8 |
| Max. | 24 | 86 | 36 | 26 |
Here are polygon maps of mean WQI score in each HUC10 watershed:
Mean WQI score in each HUC10, all data included.
Mean WQI score in each HUC10, with mainstem sites removed from the dataset.
With mainstem sites removed, 2 HUC10s in the upper and middle subbasins have no data. There are noticeable increases in mean WQI for some HUC10s when mainstem sites are removed. We’ll have to discuss whether the differences warrant the additional complexity needed to show mainstem vs. tributary water quality.
The next bit was to create a graphic that can be included in a pop-up box when IT develops the interactive web map. After a few iterations, I landed on this one. It shows the global distribution (histogram) of the overall WQI and each category, the scoring classification in the overall WQI facet, and a site-specific score plus an error term (standard deviation). Additionally, site name and sample size are included.
If this graphic is presentable, I can try to write a loop to create this graphic for all 1,117 sites with a WQI score.
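Such a loop would mostly need a per-site summary to feed each graphic. A minimal Python sketch of that step is below; it assumes the error term is the standard deviation of a site's sample-level scores, and the records and function name are hypothetical. The plotting itself is left out.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical sample-level records: (StationID, WQI score).
samples = [("S1", 70.0), ("S1", 80.0), ("S1", 90.0), ("S2", 55.0)]


def popup_summaries(records):
    """Return mean, standard deviation, and sample size for each site."""
    by_site = defaultdict(list)
    for station, score in records:
        by_site[station].append(score)
    summaries = {}
    for station, scores in by_site.items():
        summaries[station] = {
            "mean": mean(scores),
            # A standard deviation is undefined for a single sample.
            "sd": stdev(scores) if len(scores) > 1 else None,
            "n": len(scores),
        }
    return summaries


for station, stats in popup_summaries(samples).items():
    # In the real loop, this is where one graphic per site would be drawn and saved.
    print(station, stats)
```

Sites with a single sample get no error bar in this sketch, which is one choice among several for handling N = 1.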
One thing that caught my eye was the tall bars (large number of observations) of 0 scores in the metals category. This is a result of our zeroing out of category scores. See below for quantification of 0 scores for the WQI and each category; it is obvious that zeroing out happens most often for metals. I looked into this further, and it is due to AMD samples within the dataset. The majority of sites zeroed out are on AMD-impaired waterways. It happens less often for nutrients and development, but the sites where zeroing out does occur are located on impaired segments whose source cause is aligned with our WQI categories.
| WQI Type | 0s | N | Percent |
|---|---|---|---|
| WQI | 0 | 1117 | 0% |
| Metals | 188 | 1573 | 11.95% |
| Nutrients | 0 | 1316 | 0% |
| Development | 0 | 1281 | 0% |
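The counts in this table amount to a simple tally per score type. A Python sketch with invented site-level scores (the data and function name are hypothetical):

```python
# Hypothetical site-level average scores for each score type.
scores_by_type = {
    "WQI": [45.0, 60.2, 71.3],
    "Metals": [0.0, 0.0, 55.0, 80.0],
    "Nutrients": [12.5, 66.0],
}


def zero_summary(scores):
    """Return (type, number of 0 scores, N, percent of 0 scores) for each score type."""
    rows = []
    for score_type, values in scores.items():
        n = len(values)
        zeros = sum(1 for value in values if value == 0)
        percent = round(100 * zeros / n, 2) if n else None
        rows.append((score_type, zeros, n, percent))
    return rows


rows = zero_summary(scores_by_type)
```

With these made-up values, the metals row comes out as 2 zeros out of 4 sites (50.0%), and the other two rows as 0%.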