Applied the limits suggested by Kat in the Notion page 'Basic wq data cleaning'.
## river parameterName n_prefilter n_cleaned pct_cleaned
## 1: yampa Chl-a 99127 34431 35
## 2: yampa Conductivity 97267 27934 29
## 3: yampa FDOM 99139 34140 34
## 4: yampa Turbidity 93806 25792 27
## river lab_parameter n_prefilter n_cleaned pct_cleaned
## 1: poudre Phosphorus 148 9 6.1
## 2: poudre Potassium 148 9 6.1
## 3: poudre TOC 148 9 6.1
## 4: poudre Conductivity 148 9 6.1
## 5: poudre Nitrate 148 9 6.1
## 6: poudre Kjeldahl 148 9 6.1
## 7: poudre TN 148 9 6.1
## 8: poudre Turbidity 148 9 6.1
## 9: poudre Chl-a 148 9 6.1
## 10: yampa Phosphorus 268 225 84.0
## 11: yampa Potassium 268 225 84.0
## 12: yampa TOC 268 225 84.0
## 13: yampa Conductivity 268 225 84.0
## 14: yampa Nitrate 268 225 84.0
## 15: yampa Kjeldahl 268 225 84.0
## 16: yampa TN 268 225 84.0
## 17: yampa Turbidity 268 225 84.0
## 18: yampa Chl-a 268 225 84.0
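The limit values themselves live in the Notion page; as a rough illustration, a limit filter of this kind can be applied with a lookup table of parameter-specific bounds. The limits, object names, and columns below are placeholders, not the documented values.

```r
library(data.table)

# Hypothetical parameter limits (placeholders, not the values from the Notion page).
limits <- data.table(
  parameterName = c("Chl-a", "Conductivity", "FDOM", "Turbidity"),
  min_value     = c(0, 0, 0, 0),
  max_value     = c(150, 2000, 300, 1000)
)

# Keep only readings inside the allowed range for their parameter.
clean_by_limits <- function(sensor_dt, limits) {
  merged <- merge(sensor_dt, limits, by = "parameterName")
  merged[value >= min_value & value <= max_value]
}
```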
Removed spikes in the sensor data that were more than 3 standard deviations away from both the preceding and the following observations. If relatively high values persist (i.e. are observed at more than one consecutive 15-min reading), they are retained. Below is an example from two days in August of turbidity measurements before and after cleaning.
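A minimal sketch of this despiking rule, assuming a data.table `sensor_dt` ordered by time with a numeric `value` column (object and column names are assumptions):

```r
library(data.table)

flag_spikes <- function(x, n_sd = 3) {
  s    <- sd(x, na.rm = TRUE)
  prev <- shift(x, 1L, type = "lag")
  nxt  <- shift(x, 1L, type = "lead")
  # A spike deviates from BOTH neighbours by more than n_sd standard deviations,
  # so a high value that persists across consecutive 15-min readings is retained.
  abs(x - prev) > n_sd * s & abs(x - nxt) > n_sd * s
}

# sensor_dt[, spike := flag_spikes(value), by = .(river, parameterName)]
# cleaned <- sensor_dt[is.na(spike) | spike == FALSE]
```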
Do these times of sample collection make sense?
## time_of_day_sample_collected N
## 1: 06 3
## 2: 07 11
## 3: 08 10
## 4: 09 6
## 5: 10 8
## 6: 11 35
## 7: 12 45
## 8: 13 40
## 9: 14 16
## 10: 15 30
## 11: 16 18
## 12: 17 8
## 13: 18 1
## 14: 20 1
## 15: 21 2
Rolling 24-hour mean, demonstrated for turbidity from two dates in August.
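A minimal sketch of the rolling window, assuming 15-minute readings (96 per 24 hours); the synthetic series and column names are illustrative only.

```r
library(data.table)

# Synthetic 15-minute turbidity series for illustration (96 readings per day).
dt <- data.table(
  datetime = seq(as.POSIXct("2021-08-01"), by = "15 min", length.out = 96 * 2),
  value    = runif(96 * 2, 0, 50)
)

# Right-aligned 24-hour rolling mean: each reading gets the mean of the preceding 24 hours.
dt[, roll24 := frollmean(value, n = 96, align = "right", na.rm = TRUE)]
```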
Predict lab water quality parameters from the rolling 24-hour mean of the sensor values.
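The results below come from a Super Learner ensemble (mean, glm, several xgboost and knn configurations). The exact package and learner wrappers behind the printed output are not reproduced here; as a hedged sketch of that kind of stack using the SuperLearner package and simulated data:

```r
library(SuperLearner)

set.seed(1)
# Simulated stand-in for the paired lab samples and their rolling 24-hour sensor means.
n <- 200
X <- data.frame(sensor_roll24 = runif(n, 100, 600))
Y <- 0.8 * X$sensor_roll24 + rnorm(n, sd = 20)

fit <- SuperLearner(
  Y = Y, X = X, family = gaussian(),
  SL.library = c("SL.mean", "SL.glm", "SL.xgboost"),
  cvControl  = list(V = 5)
)
fit$coef  # ensemble weights, analogous to the `coefficients` column printed below
```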
## [1] "Lab parameter: Conductivity - Sensor_parameter: Conductivity"
## learner coefficients MSE se fold_sd fold_min_MSE fold_max_MSE
## 1: mean 0.000 10530 680 2369 7160 12448
## 2: glm 0.831 1127 193 543 505 1925
## 3: xgb.xgboost_1 0.000 128413 3964 46744 74985 184283
## 4: xgb.xgboost_2 0.000 128449 3963 46778 74985 184283
## 5: xgb.xgboost_3 0.000 128449 3963 46778 74985 184283
## 6: xgb.xgboost_4 0.000 54404 1855 22475 23227 78481
## 7: xgb.xgboost_5 0.000 54417 1849 22662 23230 78481
## 8: xgb.xgboost_6 0.000 54407 1849 22665 23230 78481
## 9: knn_1 0.056 1964 243 514 1270 2570
## 10: knn_2 0.056 1964 243 514 1270 2570
## 11: knn_3 0.056 1964 243 514 1270 2570
## 12: SuperLearner NA 1091 191 564 558 1909
## [1] "RMSE: 33.0308039545134"
## [1] "RMSE normalized: 0.0902492451323614"
## [1] "MAE: 24.026196364925"
## [1] "Coefficient of Variation: 13.0689618271284"
## [1] "Lab parameter: Turbidity - Sensor_parameter: Turbidity"
## learner coefficients MSE se fold_sd fold_min_MSE fold_max_MSE
## 1: mean 0.0 133 47 89 32.4 234
## 2: glm 0.7 83 24 66 6.1 173
## 3: xgb.xgboost_1 0.0 169 54 136 9.8 311
## 4: xgb.xgboost_2 0.0 169 54 136 9.7 311
## 5: xgb.xgboost_3 0.0 169 54 136 9.7 312
## 6: xgb.xgboost_4 0.0 120 43 98 7.7 200
## 7: xgb.xgboost_5 0.0 121 43 99 7.9 205
## 8: xgb.xgboost_6 0.3 120 43 98 8.6 203
## 9: knn_1 0.0 114 39 83 27.4 216
## 10: knn_2 0.0 114 39 83 27.4 216
## 11: knn_3 0.0 114 39 83 27.4 216
## 12: SuperLearner NA 74 26 71 6.1 161
## [1] "RMSE: 8.62004900148416"
## [1] "RMSE normalized: 1.11657946246983"
## [1] "MAE: 4.36896670927391"
## [1] "Coefficient of Variation: 59.2185555754875"
## [1] "Lab parameter: TOC - Sensor_parameter: FDOM"
## learner coefficients MSE se fold_sd fold_min_MSE fold_max_MSE
## 1: mean 0.000 1.88 0.136 1.29 0.44 3.56
## 2: glm 0.874 0.62 0.071 0.22 0.30 0.85
## 3: xgb.xgboost_1 0.000 22.99 0.759 5.72 15.89 28.81
## 4: xgb.xgboost_2 0.000 22.94 0.751 5.49 16.25 28.63
## 5: xgb.xgboost_3 0.000 22.94 0.751 5.49 16.25 28.63
## 6: xgb.xgboost_4 0.000 10.48 0.444 2.34 6.55 12.56
## 7: xgb.xgboost_5 0.013 10.36 0.456 1.91 7.02 11.78
## 8: xgb.xgboost_6 0.013 10.36 0.456 1.91 7.02 11.78
## 9: knn_1 0.034 1.86 0.131 1.22 0.44 3.43
## 10: knn_2 0.034 1.86 0.131 1.22 0.44 3.43
## 11: knn_3 0.034 1.86 0.131 1.22 0.44 3.43
## 12: SuperLearner NA 0.59 0.072 0.19 0.33 0.81
## [1] "RMSE: 0.770026385842371"
## [1] "RMSE normalized: 0.143845386786441"
## [1] "MAE: 0.590310180752545"
## [1] "Coefficient of Variation: 37.9637645133302"
## [1] "Lab parameter: TN - Sensor_parameter: FDOM"
## learner coefficients MSE se fold_sd fold_min_MSE fold_max_MSE
## 1: mean 0.000 0.064 0.0056 0.032 0.023 0.091
## 2: glm 0.646 0.052 0.0071 0.031 0.028 0.103
## 3: xgb.xgboost_1 0.090 0.092 0.0066 0.057 0.037 0.172
## 4: xgb.xgboost_2 0.000 0.093 0.0067 0.059 0.038 0.179
## 5: xgb.xgboost_3 0.000 0.093 0.0067 0.059 0.040 0.179
## 6: xgb.xgboost_4 0.000 0.069 0.0056 0.052 0.026 0.148
## 7: xgb.xgboost_5 0.000 0.074 0.0062 0.059 0.029 0.170
## 8: xgb.xgboost_6 0.000 0.077 0.0066 0.062 0.030 0.179
## 9: knn_1 0.088 0.062 0.0055 0.030 0.023 0.087
## 10: knn_2 0.088 0.062 0.0055 0.030 0.023 0.087
## 11: knn_3 0.088 0.062 0.0055 0.030 0.023 0.087
## 12: SuperLearner NA 0.048 0.0057 0.028 0.022 0.084
## [1] "RMSE: 0.219782205340004"
## [1] "RMSE normalized: 0.731026752749025"
## [1] "MAE: 0.170337280157402"
## [1] "Coefficient of Variation: 83.9673769708658"
## [1] "Lab parameter: Kjeldahl - Sensor_parameter: FDOM"
## learner coefficients MSE se fold_sd fold_min_MSE fold_max_MSE
## 1: mean 0.000 0.043 0.0035 0.018 0.017 0.064
## 2: glm 0.760 0.028 0.0031 0.009 0.019 0.042
## 3: xgb.xgboost_1 0.000 0.094 0.0064 0.054 0.045 0.168
## 4: xgb.xgboost_2 0.000 0.094 0.0064 0.054 0.046 0.169
## 5: xgb.xgboost_3 0.000 0.094 0.0064 0.054 0.046 0.169
## 6: xgb.xgboost_4 0.000 0.053 0.0039 0.031 0.026 0.101
## 7: xgb.xgboost_5 0.000 0.054 0.0040 0.032 0.025 0.104
## 8: xgb.xgboost_6 0.078 0.054 0.0040 0.031 0.024 0.101
## 9: knn_1 0.054 0.041 0.0035 0.017 0.017 0.062
## 10: knn_2 0.054 0.041 0.0035 0.017 0.017 0.062
## 11: knn_3 0.054 0.041 0.0035 0.017 0.017 0.062
## 12: SuperLearner NA 0.027 0.0028 0.010 0.015 0.037
## [1] "RMSE: 0.164654874689819"
## [1] "RMSE normalized: 0.665090651039663"
## [1] "MAE: 0.134745653812427"
## [1] "Coefficient of Variation: 71.0062427941777"
## [1] "Lab parameter: Nitrate - Sensor_parameter: FDOM"
## learner coefficients MSE se fold_sd fold_min_MSE fold_max_MSE
## 1: mean 0.6521 0.0087 0.0028 0.0111 0.00046 0.027
## 2: glm 0.2343 0.0108 0.0029 0.0098 0.00495 0.028
## 3: xgb.xgboost_1 0.0000 0.1932 0.0037 0.0269 0.17034 0.238
## 4: xgb.xgboost_2 0.0039 0.1921 0.0036 0.0250 0.17034 0.233
## 5: xgb.xgboost_3 0.0039 0.1921 0.0036 0.0250 0.17034 0.233
## 6: xgb.xgboost_4 0.0000 0.0969 0.0035 0.0463 0.06343 0.178
## 7: xgb.xgboost_5 0.0000 0.1025 0.0045 0.0598 0.06343 0.208
## 8: xgb.xgboost_6 0.0000 0.0999 0.0041 0.0545 0.06343 0.196
## 9: knn_1 0.0352 0.0091 0.0028 0.0109 0.00102 0.027
## 10: knn_2 0.0352 0.0091 0.0028 0.0109 0.00102 0.027
## 11: knn_3 0.0352 0.0091 0.0028 0.0109 0.00102 0.027
## 12: SuperLearner NA 0.0085 0.0028 0.0109 0.00112 0.027
## [1] "RMSE: 0.092088224013393"
## [1] "RMSE normalized: 1.80900678975075"
## [1] "MAE: 0.0593522868363946"
## [1] "Coefficient of Variation: 104.810380544096"
## [1] "Lab parameter: TN - Sensor_parameter: Chl-a"
## learner coefficients MSE se fold_sd fold_min_MSE fold_max_MSE
## 1: mean 0.000 0.064 0.0057 0.016 0.048 0.091
## 2: glm 0.865 0.045 0.0050 0.017 0.021 0.063
## 3: xgb.xgboost_1 0.000 0.094 0.0067 0.041 0.049 0.138
## 4: xgb.xgboost_2 0.000 0.094 0.0067 0.041 0.049 0.139
## 5: xgb.xgboost_3 0.013 0.094 0.0067 0.041 0.049 0.139
## 6: xgb.xgboost_4 0.000 0.072 0.0057 0.026 0.046 0.106
## 7: xgb.xgboost_5 0.000 0.073 0.0058 0.028 0.047 0.111
## 8: xgb.xgboost_6 0.000 0.074 0.0059 0.029 0.045 0.114
## 9: knn_1 0.041 0.064 0.0057 0.013 0.049 0.085
## 10: knn_2 0.041 0.064 0.0057 0.013 0.049 0.085
## 11: knn_3 0.041 0.064 0.0057 0.013 0.049 0.085
## 12: SuperLearner NA 0.044 0.0049 0.015 0.024 0.061
## [1] "RMSE: 0.210514196966526"
## [1] "RMSE normalized: 0.703354098099235"
## [1] "MAE: 0.168207705388649"
## [1] "Coefficient of Variation: 76.8406299098867"
## [1] "Lab parameter: Kjeldahl - Sensor_parameter: Chl-a"
## learner coefficients MSE se fold_sd fold_min_MSE fold_max_MSE
## 1: mean 0.000 0.041 0.0034 0.013 0.022 0.055
## 2: glm 0.875 0.030 0.0036 0.016 0.017 0.058
## 3: xgb.xgboost_1 0.000 0.096 0.0065 0.039 0.051 0.138
## 4: xgb.xgboost_2 0.000 0.096 0.0065 0.039 0.051 0.138
## 5: xgb.xgboost_3 0.000 0.096 0.0065 0.039 0.051 0.138
## 6: xgb.xgboost_4 0.000 0.060 0.0044 0.023 0.034 0.089
## 7: xgb.xgboost_5 0.000 0.060 0.0044 0.024 0.034 0.090
## 8: xgb.xgboost_6 0.000 0.060 0.0043 0.023 0.035 0.089
## 9: knn_1 0.042 0.041 0.0035 0.010 0.023 0.048
## 10: knn_2 0.042 0.041 0.0035 0.010 0.023 0.048
## 11: knn_3 0.042 0.041 0.0035 0.010 0.023 0.048
## 12: SuperLearner NA 0.030 0.0034 0.015 0.019 0.055
## [1] "RMSE: 0.172179941701376"
## [1] "RMSE normalized: 0.69861948688877"
## [1] "MAE: 0.13784452263106"
## [1] "Coefficient of Variation: 77.4371506982431"
## [1] "Lab parameter: Nitrate - Sensor_parameter: Chl-a"
## learner coefficients MSE se fold_sd fold_min_MSE fold_max_MSE
## 1: mean 0.4223 0.0095 0.0029 0.013 0.0022 0.033
## 2: glm 0.5719 0.0089 0.0028 0.012 0.0007 0.030
## 3: xgb.xgboost_1 0.0046 0.1911 0.0035 0.041 0.1388 0.226
## 4: xgb.xgboost_2 0.0000 0.1911 0.0035 0.041 0.1388 0.226
## 5: xgb.xgboost_3 0.0012 0.1911 0.0035 0.041 0.1388 0.226
## 6: xgb.xgboost_4 0.0000 0.0837 0.0026 0.027 0.0543 0.118
## 7: xgb.xgboost_5 0.0000 0.0829 0.0027 0.027 0.0543 0.117
## 8: xgb.xgboost_6 0.0000 0.0831 0.0027 0.027 0.0543 0.118
## 9: knn_1 0.0000 0.0099 0.0029 0.013 0.0027 0.034
## 10: knn_2 0.0000 0.0099 0.0029 0.013 0.0027 0.034
## 11: knn_3 0.0000 0.0099 0.0029 0.013 0.0027 0.034
## 12: SuperLearner NA 0.0082 0.0028 0.012 0.0010 0.031
## [1] "RMSE: 0.0902906125251513"
## [1] "RMSE normalized: 1.78168362030871"
## [1] "MAE: 0.0585938849504407"
## [1] "Coefficient of Variation: 101.067121804499"
Proceed with non-bias-corrected sensor data (bias correction is not necessary) and the rolling 24-hour sensor means as predictors (better performance).
For this outcome, the transfer function will use only sensor_value as the predictor.
## [1] "Lab parameter: Conductivity - Sensor_parameter: Conductivity"
## learner coefficients MSE se fold_sd fold_min_MSE fold_max_MSE
## 1: mean 0.0038 10530 680 2369 7160 12448
## 2: glm 0.9949 1115 185 604 482 1913
## 3: xgb.xgboost_1 0.0000 128710 3973 47066 74989 185101
## 4: xgb.xgboost_2 0.0000 128710 3973 47066 74989 185101
## 5: xgb.xgboost_3 0.0000 128710 3973 47066 74989 185101
## 6: xgb.xgboost_4 0.0000 53394 1861 22990 25516 79218
## 7: xgb.xgboost_5 0.0000 53406 1861 22990 25566 79264
## 8: xgb.xgboost_6 0.0013 53417 1862 23001 25566 79264
## 9: knn_1 0.0000 1386 207 958 564 2851
## 10: knn_2 0.0000 1386 207 958 564 2851
## 11: knn_3 0.0000 1386 207 958 564 2851
## 12: SuperLearner NA 1114 184 600 498 1920
## [1] "RMSE: 33.3820393016348"
## [1] "RMSE normalized: 0.091208916746324"
## [1] "MAE: 23.9534414956678"
## [1] "Coefficient of Variation: 13.3483789971074"
Demonstration of uncertainty estimates for the transfer function, based on bias-corrected bootstrap replicates of the transfer function and normal-approximation, equi-tailed, two-sided 95% confidence intervals (i.e. statistic ± 1.96 * SD). In the final product, we will need to increase the number of replicates and consider the bias-corrected and accelerated (BCa) bootstrap interval.
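A minimal sketch of that interval using the boot package and simulated data; the statistic, replicate count, and data objects are illustrative assumptions, not the project settings.

```r
library(boot)

set.seed(1)
# Simulated paired data standing in for the lab samples and rolling sensor means.
dat <- data.frame(sensor_roll24 = runif(100, 100, 600))
dat$lab_value <- 0.8 * dat$sensor_roll24 + rnorm(100, sd = 20)

# Refit the transfer function on each bootstrap resample and predict at a new sensor value.
boot_pred <- function(d, idx, newdata) {
  fit <- glm(lab_value ~ sensor_roll24, data = d[idx, ])
  predict(fit, newdata = newdata)
}

b <- boot(dat, boot_pred, R = 500, newdata = data.frame(sensor_roll24 = 300))
boot.ci(b, type = "norm")  # bias-corrected normal approximation: estimate +/- 1.96 * SD
# For the final product, increase R and also compare type = "bca".
```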
Did not apply a transfer function here, since this sensor parameter does not describe any lab-measured parameter well; its output is not an informative predictor.
## [1] "Lab parameter: TN - Sensor_parameter: Chl-a"
## learner coefficients MSE se fold_sd fold_min_MSE fold_max_MSE
## 1: mean 0.23 0.064 0.0057 0.0164 0.048 0.091
## 2: glm 0.00 0.065 0.0057 0.0166 0.049 0.093
## 3: xgb.xgboost_1 0.00 0.094 0.0066 0.0408 0.049 0.140
## 4: xgb.xgboost_2 0.00 0.094 0.0066 0.0408 0.049 0.140
## 5: xgb.xgboost_3 0.00 0.094 0.0066 0.0406 0.049 0.139
## 6: xgb.xgboost_4 0.00 0.072 0.0053 0.0237 0.044 0.101
## 7: xgb.xgboost_5 0.00 0.072 0.0053 0.0234 0.043 0.098
## 8: xgb.xgboost_6 0.15 0.071 0.0052 0.0219 0.044 0.097
## 9: knn_1 0.21 0.059 0.0051 0.0101 0.051 0.077
## 10: knn_2 0.21 0.059 0.0051 0.0101 0.051 0.077
## 11: knn_3 0.21 0.059 0.0051 0.0101 0.051 0.077
## 12: SuperLearner NA 0.058 0.0049 0.0095 0.050 0.074
## [1] "RMSE: 0.240290701506969"
## [1] "RMSE normalized: 0.802841100863809"
## [1] "MAE: 0.193874136473176"
## [1] "Coefficient of Variation: 100.11567215922"