Introduction

Transfer Function: FDOM -> TOC

With a transfer function, we can predict in-river total organic carbon (TOC) from sensor-measured fluorescent dissolved organic matter (FDOM) at times when no corresponding lab measurement exists.

Sensor parameters

  1. sensor_value_mean24hr - raw output of sensor smoothed by taking the 24-hour rolling mean
  2. hours_since_last_cleaning - number of hours since last cleaning of sensor, including first install
  3. LocationLong - longitude of sensor location
  4. LocationLat - latitude of sensor location
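
The hours_since_last_cleaning feature can be derived from a log of cleaning events. The following is a minimal sketch, assuming the install and cleaning timestamps are available as a sorted list; the function name and the example log are hypothetical and not taken from the project code.

```python
from bisect import bisect_right
from datetime import datetime


def hours_since_last_cleaning(reading_time, cleaning_times):
    """Hours between a reading and the most recent cleaning at or before it.

    cleaning_times must be sorted ascending and should include the first
    install. Returns None if the reading precedes every logged event.
    """
    idx = bisect_right(cleaning_times, reading_time)
    if idx == 0:
        return None
    elapsed = reading_time - cleaning_times[idx - 1]
    return elapsed.total_seconds() / 3600.0


# Hypothetical install and cleaning log for one sensor.
cleanings = [datetime(2023, 5, 1, 9, 0), datetime(2023, 5, 15, 10, 30)]
example_hours = hours_since_last_cleaning(datetime(2023, 5, 16, 10, 30), cleanings)
```

A reading taken exactly at a cleaning time gets a value of 0 hours under this convention.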

Demonstration of smoothed sensor output

24-hour rolling mean, demonstrated for FDOM for Chuck Lewis 1 - Sonde.
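
The smoothing step can be sketched as follows. This assumes a trailing window over 15-minute readings (96 samples per 24 hours) and averages whatever readings are available until the window fills; the report does not state whether the window is trailing or centered, or how gaps are handled, so both choices here are assumptions.

```python
from collections import deque


def rolling_mean(values, window=96):
    """Trailing rolling mean; 96 samples = 24 hours of 15-minute readings.

    Until a full window is available, the mean of the readings seen so far
    is returned (an assumption; the report does not specify this).
    """
    buf = deque()
    total = 0.0
    out = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the reading that left the window
        out.append(total / len(buf))
    return out


# Tiny illustration with a 2-sample window instead of 96.
smoothed = rolling_mean([1.0, 2.0, 3.0, 4.0], window=2)
```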

Transfer Model

Lab-measured total organic carbon (TOC) is predicted from in-river sensor-measured fluorescent dissolved organic matter (FDOM) using an ensemble machine learning model that combines linear regression, k-nearest neighbors, and tree-based methods.

Performance

There is a strong correlation (R = 0.79) between actual and predicted TOC. On average, the error between observed and predicted TOC is expected to be 0.77 mg/L.
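
For reference, the kinds of metrics quoted above can be computed as in the sketch below. The report does not say whether the 0.77 mg/L figure is a mean absolute error or a root mean squared error, so both are shown; the observed and predicted values are made-up illustration data, not project results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_absolute_error(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def root_mean_squared_error(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Illustration data only (mg/L).
observed = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 4.5, 6.5, 6.5]
r = pearson_r(observed, predicted)
mae = mean_absolute_error(observed, predicted)
rmse = root_mean_squared_error(observed, predicted)
```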

Transfer Function

With the transfer model, we can predict in-river TOC at every 15-minute sensor reading, shown below and compared against lab-measured TOC at similar times.

Preliminary Inference Model: Landscape -> TOC

Outcome

To harmonize the outcome to the features, we summarized predicted TOC (from the transfer function with sensor-measured FDOM) to an average daily value for days in the dataset, shown below as mean_daily_TOC.
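
The daily summary can be sketched as a group-by over sensor and calendar day. This is an illustration under assumptions: the tuple layout is hypothetical, and mw_id is used as the sensor identifier only because it appears as the grouping variable in the model output.

```python
from collections import defaultdict
from datetime import date, datetime


def mean_daily_toc(readings):
    """Average predicted TOC per (sensor, calendar day).

    readings: iterable of (mw_id, timestamp, predicted_toc) tuples.
    Returns a dict keyed by (mw_id, date).
    """
    by_key = defaultdict(list)
    for mw_id, ts, toc in readings:
        by_key[(mw_id, ts.date())].append(toc)
    return {key: sum(vals) / len(vals) for key, vals in by_key.items()}


# Hypothetical 15-minute predictions for two sensors.
readings = [
    ("A", datetime(2023, 6, 1, 0, 0), 5.0),
    ("A", datetime(2023, 6, 1, 0, 15), 6.0),
    ("A", datetime(2023, 6, 2, 0, 0), 4.0),
    ("B", datetime(2023, 6, 1, 0, 0), 3.0),
]
daily = mean_daily_toc(readings)
```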

Explanatory variables

Climate

  1. dischrg_cfs - streamflow (in cfs) at DWR Yampa Catamount Station on day of sensor reading
  2. precip_in - precipitation (in inches), including snow, cumulative in the 24 hours preceding 5am MST on day of sensor reading and at 4 km resolution
  3. tmean_degF - mean land surface temperature (in degrees Fahrenheit) in the 24 hours preceding 5am MST on day of sensor reading and at 4 km resolution

Vegetation Indices

  1. ndvi - Normalized Difference Vegetation Index; indicates health and density of vegetation and canopy structure; most commonly used index
  2. gcvi - Green Chlorophyll Vegetation Index; improvement over NDVI in some scenarios; less likely to saturate at high leaf biomass; may indicate nitrogen supply; has been used as a predictor of crop yield (Ulfa et al. 2022)
  3. reip - Red-Edge Inflection Point; improvement over NDVI in some scenarios; insensitive to solar elevation angle; well suited to estimation of average leaf chlorophyll content (Broge et al. 2003); more appropriate than NDVI for field crop studies and monitoring (Salvoldi et al. 2021)
  4. ndti - Normalized Difference Tillage Index; used for crop residue monitoring, plant canopy senescence, fire fuel conditions, and grazing management, but susceptible to clouds or cloud shadows, high soil moisture, and green vegetation (Liu et al. 2022); “yellowness” index (Gan et al. 2022)
  5. ndwi - Normalized Difference Water Index; sensitive to plant water content
  6. ndbi - Normalized Difference Built-up Index; indicator of built-up area or structures in land use land cover studies
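
Several of the indices above share the same normalized-difference form. The sketch below uses the commonly cited band formulations for NDVI, GCVI, NDTI, and NDBI; the report does not state which satellite product or band combinations were used, so the band arguments are assumptions. NDWI and REIP are left out because their formulations vary between sources.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b); returns None when the denominator is zero."""
    denom = a + b
    return None if denom == 0 else (a - b) / denom


def ndvi(nir, red):
    # Near-infrared versus red reflectance.
    return normalized_difference(nir, red)


def gcvi(nir, green):
    # Ratio form rather than a normalized difference.
    return None if green == 0 else nir / green - 1.0


def ndti(swir1, swir2):
    # Two shortwave-infrared bands.
    return normalized_difference(swir1, swir2)


def ndbi(swir1, nir):
    # Shortwave-infrared versus near-infrared reflectance.
    return normalized_difference(swir1, nir)


# Hypothetical surface reflectances for one vegetated pixel.
example_ndvi = ndvi(nir=0.5, red=0.1)
example_gcvi = gcvi(nir=0.5, green=0.1)
```

Normalized-difference indices are bounded between -1 and 1 for non-negative reflectances, whereas GCVI is unbounded above, which is one reason features on such different scales are usually standardized before regression.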

Geography

  1. elev_m - Elevation at sensor (in m)
  2. slope - Steepness of the ground surface, in degrees, calculated from the terrain DEM
  3. aspect - Compass direction that the slope faces, in degrees, calculated from the terrain DEM, where 0=N, 90=E, 180=S, 270=W

(Does not include GPS coordinates because latitude and longitude were used as features in the transfer function.)

Features for inference model demonstrated at the location of Chuck Lewis 1 - Sonde.

Linear Mixed Effects Model

Day-averaged predicted TOC was regressed on streamflow, geographic, climatic, and satellite-derived vegetation index features, plus days_since_start as a proxy for seasonality. A random intercept was added for each sensor (the (1 | mw_id) term), which allows the baseline TOC level to vary by location while the effects of the explanatory variables are shared across sensors.

## mean_daily_TOC ~ dischrg_cfs + elev_m + slope + aspect + precip_in + 
##     tmean_degF + ndvi + gcvi + reip + ndti + ndwi + ndbi + days_since_start + 
##     (1 | mw_id)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: mlr_frmla
##    Data: data_for_mlr_std
## 
## REML criterion at convergence: 462
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -7.380 -0.546 -0.038  0.668  2.345 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  mw_id    (Intercept) 0.453    0.673   
##  Residual             0.162    0.402   
## Number of obs: 374, groups:  mw_id, 10
## 
## Fixed effects:
##                  Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)        5.2785     0.2150   5.9977   24.55  3.0e-07 ***
## dischrg_cfs       -0.0092     0.0339 354.0350   -0.27    0.786    
## elev_m            -0.0991     0.2253   6.0095   -0.44    0.676    
## slope              0.6176     0.2220   6.0331    2.78    0.032 *  
## aspect            -0.1010     0.2439   6.0312   -0.41    0.693    
## precip_in          0.0335     0.0248 354.0395    1.35    0.179    
## tmean_degF        -0.2787     0.0592 354.2139   -4.71  3.6e-06 ***
## ndvi              -0.3690     0.1516 354.0335   -2.43    0.015 *  
## gcvi               0.1343     0.1182 354.0324    1.14    0.257    
## reip               0.0019     0.0278 354.0958    0.07    0.946    
## ndti               0.1014     0.0845 354.2034    1.20    0.231    
## ndwi               0.0247     0.1490 354.0554    0.17    0.868    
## ndbi               0.0856     0.0399 354.1128    2.14    0.033 *  
## days_since_start  -0.7820     0.0786 354.3097   -9.95  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##       R2m  R2c
## [1,] 0.46 0.86

The fixed effects (streamflow, topography, climate, vegetation indices, and a proxy for seasonality) explain 45.88% of the variance in day-averaged predicted TOC (marginal R2). When the between-sensor variation captured by the random intercept is also included, which accounts for the correlation among repeated measurements at the same sensor location, the model explains 85.74% of the variance (conditional R2). The model therefore describes TOC reasonably well at the sensor locations in the data, but extrapolating TOC to other locations in the river from these explanatory features alone is a greater challenge.
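
The two R2 values can be cross-checked against the variance components in the model output. The sketch below assumes the usual variance-partition definitions of marginal and conditional R2 for a random-intercept model; the fixed-effects variance is not printed in the output, so it is backed out from the reported marginal R2.

```python
# Values read from the model output above.
var_random = 0.453    # between-sensor (mw_id) intercept variance
var_resid = 0.162     # residual variance
r2_marginal = 0.4588  # reported share explained by fixed effects

# Marginal R2 = var_fixed / (var_fixed + var_random + var_resid),
# so the implied fixed-effects variance is:
var_fixed = r2_marginal * (var_random + var_resid) / (1.0 - r2_marginal)

# Conditional R2 adds the between-sensor variance to the numerator.
r2_conditional = (var_fixed + var_random) / (var_fixed + var_random + var_resid)

# Share of the unexplained variance that sits between sensors.
icc = var_random / (var_random + var_resid)
```

Under these assumptions the conditional R2 comes out at about 0.857, matching the reported value, and roughly 74% of the variance left after the fixed effects is between sensors rather than within them.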

Average daily TOC is estimated at 5.28 mg/L when all explanatory features are at their observed means. Holding all other features at their means, an increase of one standard deviation in NDVI (i.e., green-up) is associated with a decrease in TOC of 0.37 mg/L (95% CI: 0.07 to 0.66 mg/L).
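
As a rough check on the NDVI interval, a Wald approximation can be computed from the estimate and standard error in the fixed-effects table. This is a sketch under assumptions: it uses the normal critical value 1.96 and the rounded table values, and the report does not state how its own interval was obtained.

```python
def wald_interval(estimate, std_error, z=1.96):
    """Approximate 95% interval: estimate +/- z * standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# NDVI row of the fixed-effects table (standardized coefficient, mg/L per SD).
ndvi_low, ndvi_high = wald_interval(-0.3690, 0.1516)
```

The result is roughly -0.67 to -0.07 mg/L, close to the interval reported above; small differences can arise from rounding or from a different interval method.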

UrbanSky High-resolution (6cm) Land Classification

Watershed Runoff Routing

Supplementary Material

Performance of transfer models for other parameters: Conductivity & Turbidity