Change log

Remaining To-Do

Time-varying and satellite-derived covariates

PRISM also provides the following parameters at 4 km and daily resolution: minimum and maximum temperature, minimum and maximum vapor pressure deficit (VPD), and mean dewpoint temperature.

Precipitation

PRISM 4 km observations of daily cumulative precipitation (rain + snow). The day is defined as the preceding 24 hours from 1200 UTC (5am MST) on the day given. Data in the last 6 months (from July 1, 2023) is provisional and should be refreshed before final analysis.

Land surface temperature

PRISM 4 km mean daily temperature, computed as the average of the daily high and low. The day is defined as the preceding 24 hours from 1200 UTC (5am MST) on the day given. Data in the last 6 months (from July 1, 2023) is provisional and should be refreshed before final analysis.
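The PRISM day definition used for precipitation and temperature can be sketched as follows. This is a minimal illustration, not the code used in the analysis: the function names are hypothetical, MST is treated as a fixed UTC-7 offset (no daylight-saving adjustment), and the choice to include the 1200 UTC end point in the window is an assumption.

```python
from datetime import date, datetime, time, timedelta, timezone

# Fixed UTC-7 offset for MST, matching the "5am MST" wording in the text
# (assumption: no daylight-saving adjustment is applied).
MST = timezone(timedelta(hours=-7), name="MST")


def prism_day_window(day: date) -> tuple[datetime, datetime]:
    """Return (start, end) in UTC of the 24-hour period assigned to `day`."""
    end = datetime.combine(day, time(12, 0), tzinfo=timezone.utc)
    start = end - timedelta(hours=24)
    return start, end


def prism_day_for(reading: datetime) -> date:
    """Return the PRISM day whose window contains a timezone-aware reading.

    Assumption: the window is (start, end], so a reading at exactly
    1200 UTC belongs to that same calendar day.
    """
    utc = reading.astimezone(timezone.utc)
    cutoff = datetime.combine(utc.date(), time(12, 0), tzinfo=timezone.utc)
    return utc.date() if utc <= cutoff else utc.date() + timedelta(days=1)


start, end = prism_day_window(date(2023, 7, 1))
print(start.astimezone(MST), end.astimezone(MST))
```

A sensor reading taken after 5am MST therefore falls in the window of the following PRISM day, which matters when joining daily covariates to sensor timestamps.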

Vegetation indices

All vegetation indices are computed from scaled Sentinel-2 spectral bands.

NDVI (Normalized Difference Vegetation Index)

Indicates health and density of vegetation and canopy structure; most commonly used index

GCVI (Green Chlorophyll Vegetation Index)

Improvement over NDVI in some scenarios; less likely to saturate at high leaf biomass; may indicate nitrogen supply; has been used as a predictor of crop yield (Ulfa et al. 2022)

REIP (Red-Edge Inflection Point)

Improvement over NDVI in some scenarios; insensitive to solar elevation angle; well suited to estimation of average leaf chlorophyll content (Broge et al. 2003); more appropriate than NDVI for field crop studies and monitoring (Salvoldi et al. 2021)

NDTI (Normalized Difference Tillage Index)

Crop residue monitoring, plant canopy senescence, fire fuel conditions, grazing management; but susceptible to clouds or cloud shadows, high soil moisture, and green vegetation (Liu et al. 2022); “yellowness” index (Gan et al. 2022)

NDWI (Normalized Difference Water Index)

Sensitive to plant water content

NDBI (Normalized Difference Built-up Index)

Indicator of built-up area or structures in land use land cover studies
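The indices above can be sketched from band reflectances as shown here. The source does not state which Sentinel-2 bands or index variants were used, so every band assignment below is an assumption based on common conventions, and the function names are hypothetical. The REIP line uses the linear-interpolation form of the red-edge inflection point, which may differ from the variant used in the analysis.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b); returns NaN when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else float("nan")


def vegetation_indices(bands: dict[str, float]) -> dict[str, float]:
    """Compute the six indices from scaled Sentinel-2 reflectances.

    Assumed band roles (not confirmed by the source): B3 green, B4 red,
    B5/B6/B7 red edge, B8 NIR, B8A narrow NIR, B11 SWIR1, B12 SWIR2.
    No guard is included for a zero green band or B6 == B5.
    """
    green, red = bands["B3"], bands["B4"]
    re1, re2, re3 = bands["B5"], bands["B6"], bands["B7"]
    nir, nir_narrow = bands["B8"], bands["B8A"]
    swir1, swir2 = bands["B11"], bands["B12"]
    return {
        "ndvi": normalized_difference(nir, red),
        "gcvi": nir / green - 1.0,
        # Linear-interpolation red-edge inflection point, in nm.
        "reip": 700.0 + 40.0 * ((red + re3) / 2.0 - re1) / (re2 - re1),
        "ndti": normalized_difference(swir1, swir2),
        # Narrow NIR is assumed here; with the same NIR band as NDBI the
        # two indices would be exact negatives of each other.
        "ndwi": normalized_difference(nir_narrow, swir1),
        "ndbi": normalized_difference(swir1, nir),
    }


# Hypothetical reflectances for a single vegetated pixel.
pixel = {"B3": 0.08, "B4": 0.05, "B5": 0.12, "B6": 0.30, "B7": 0.38,
         "B8": 0.40, "B8A": 0.42, "B11": 0.20, "B12": 0.10}
indices = vegetation_indices(pixel)
print({name: round(value, 3) for name, value in indices.items()})
```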

Modeling

Summary of transfer function

  1. sensor_value_mean24hr - rolling 24-hour mean of raw output of sensor
  2. hours_since_last_cleaning - number of hours since last cleaning of sensor, including first install
  3. LocationLong - longitude of sensor location
  4. LocationLat - latitude of sensor location
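The two derived features in this list (the rolling 24-hour mean and the hours since last cleaning) could be computed along these lines. This is a minimal sketch with hypothetical function names; the window boundary, (t - 24h, t], and the handling of readings that predate the first install are assumptions rather than the analysis code.

```python
from collections import deque
from datetime import datetime, timedelta


def rolling_mean_24hr(readings):
    """Yield (timestamp, mean of values in the trailing 24 hours).

    `readings` is an iterable of (timestamp, value) sorted by timestamp.
    The window is (t - 24h, t]; the boundary choice is an assumption.
    """
    window = deque()
    total = 0.0
    for ts, value in readings:
        window.append((ts, value))
        total += value
        # Drop readings that are 24 hours old or older.
        while window[0][0] <= ts - timedelta(hours=24):
            _, old = window.popleft()
            total -= old
        yield ts, total / len(window)


def hours_since_last_cleaning(ts, cleanings):
    """Hours from the most recent cleaning (or install) at or before `ts`."""
    previous = [c for c in cleanings if c <= ts]
    if not previous:
        return None  # reading predates the first install
    return (ts - max(previous)).total_seconds() / 3600.0


# Hypothetical hourly readings whose value equals the hour index.
t0 = datetime(2023, 7, 1)
readings = [(t0 + timedelta(hours=i), float(i)) for i in range(30)]
means = list(rolling_mean_24hr(readings))
cleanings = [t0, t0 + timedelta(hours=20)]
print(means[-1], hours_since_last_cleaning(readings[-1][0], cleanings))
```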

FDOM-TOC

## [1] "Lab parameter: TOC - Sensor_parameter: FDOM"

##           learner coefficients   MSE    se fold_sd fold_min_MSE fold_max_MSE
##  1:          mean        0.000  1.88 0.136    1.29         0.44         3.56
##  2:           glm        0.874  0.62 0.071    0.22         0.30         0.85
##  3: xgb.xgboost_1        0.000 22.99 0.759    5.72        15.89        28.81
##  4: xgb.xgboost_2        0.000 22.94 0.751    5.49        16.25        28.63
##  5: xgb.xgboost_3        0.000 22.94 0.751    5.49        16.25        28.63
##  6: xgb.xgboost_4        0.000 10.48 0.444    2.34         6.55        12.56
##  7: xgb.xgboost_5        0.013 10.36 0.456    1.91         7.02        11.78
##  8: xgb.xgboost_6        0.013 10.36 0.456    1.91         7.02        11.78
##  9:         knn_1        0.034  1.86 0.131    1.22         0.44         3.43
## 10:         knn_2        0.034  1.86 0.131    1.22         0.44         3.43
## 11:         knn_3        0.034  1.86 0.131    1.22         0.44         3.43
## 12:  SuperLearner           NA  0.59 0.072    0.19         0.33         0.81
## [1] "RMSE: 0.770026385842371"
## [1] "RMSE normalized: 0.143845386786441"
## [1] "MAE: 0.590310180752545"
## [1] "Coefficient of Variation: 37.9637645133302"
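As a reading aid for the output above, the sketch below assumes the ensemble prediction is a weighted sum of the base-learner predictions, with weights taken from the coefficients column (learners with weight 0.000 are omitted). The predictions in the example are hypothetical, and the helper names are not from the original analysis.

```python
import math

# Non-zero weights copied from the "coefficients" column of the output above.
WEIGHTS = {
    "glm": 0.874,
    "xgb.xgboost_5": 0.013,
    "xgb.xgboost_6": 0.013,
    "knn_1": 0.034,
    "knn_2": 0.034,
    "knn_3": 0.034,
}


def ensemble_prediction(predictions):
    """Weighted sum of base-learner predictions for one observation."""
    return sum(weight * predictions[name] for name, weight in WEIGHTS.items())


def rmse(errors):
    """Root mean squared error of a sequence of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Mean absolute error of a sequence of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical base-learner predictions for a single observation.
example = {"glm": 4.1, "xgb.xgboost_5": 1.2, "xgb.xgboost_6": 1.2,
           "knn_1": 3.6, "knn_2": 3.6, "knn_3": 3.6}
print(ensemble_prediction(example))
print(sum(WEIGHTS.values()))  # close to 1, up to rounding in the table
print(math.sqrt(0.59))        # compare with the reported RMSE
```

The rounded SuperLearner MSE of 0.59 implies an RMSE between about 0.765 and 0.771, which is consistent with the reported 0.770. How the normalized RMSE was scaled is not stated in the output, so it is not reproduced here.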

Inference Model

Features

This section lists all available or previously considered features. Only a few selected features are used in the model.

Climate

  1. dischrg_cfs - streamflow (in cfs) at DWR Yampa Catamount Station on day of sensor reading
  2. precip_in - precipitation (in inches), including snow, cumulative in the 24 hours preceding 5am MST on day of sensor reading and at 4 km resolution
  3. tmean_degF - mean land surface temperature (in degrees Fahrenheit) in the 24 hours preceding 5am MST on day of sensor reading and at 4 km resolution

Vegetation Indices

  1. ndvi - Normalized Difference Vegetation Index; indicates health and density of vegetation and canopy structure; most commonly used index
  2. gcvi - Green Chlorophyll Vegetation Index; improvement over NDVI in some scenarios; less likely to saturate at high leaf biomass; may indicate nitrogen supply; has been used as a predictor of crop yield (Ulfa et al. 2022)
  3. reip - Red-Edge Inflection Point; improvement over NDVI in some scenarios; insensitive to solar elevation angle; well suited to estimation of average leaf chlorophyll content (Broge et al. 2003); more appropriate than NDVI for field crop studies and monitoring (Salvoldi et al. 2021)
  4. ndti - Normalized Difference Tillage Index; crop residue monitoring, plant canopy senescence, fire fuel conditions, grazing management; but susceptible to clouds or cloud shadows, high soil moisture, and green vegetation (Liu et al. 2022); “yellowness” index (Gan et al. 2022)
  5. ndwi - Normalized Difference Water Index; sensitive to plant water content
  6. ndbi - Normalized Difference Built-up Index; indicator of built-up area or structures in land use land cover studies

Geography

  1. elev_m - Elevation at sensor (in m)
  2. slope - Steepness of the ground surface, in degrees calculated from the terrain DEM
  3. aspect - Compass direction that slope faces, in degrees calculated from the terrain DEM where 0=N, 90=E, 180=S, 270=W

(Does not include GPS coordinates because latitude and longitude were used as features in the transfer function.)
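For reference, slope and aspect with the compass convention listed above (0=N, 90=E, 180=S, 270=W) can be derived from a DEM roughly as follows. This sketch uses Horn's 3x3 finite-difference method, which is an assumption: the source does not say which algorithm or GIS tool produced these features, and the function name is hypothetical.

```python
import math


def slope_aspect(window, cell_size):
    """Slope and aspect (degrees) at the centre of a 3x3 elevation window.

    `window` rows run north to south and columns west to east; elevations
    and `cell_size` share the same unit (for example metres).
    """
    (a, b, c), (d, _, f), (g, h, i) = window
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)  # eastward rise
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)  # southward rise
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None  # flat cell: aspect is undefined
    # Downslope direction as a compass bearing: 0=N, 90=E, 180=S, 270=W.
    aspect = math.degrees(math.atan2(-dz_dx, dz_dy)) % 360.0
    return slope, aspect


# Hypothetical 10 m cells on a plane that rises 1 m per cell toward the east,
# so the surface faces west.
rises_east = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
print(slope_aspect(rises_east, cell_size=10.0))
```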

SPARROW inputs

These inputs are static, non-contemporaneous, and outdated, and they may assign the same value to different sensor locations (there are only 5 unique HUC12 units for 10 sensor locations), so, again, it is unclear whether they make sense as features.

Detail provided by Garrett Cole, 24 Oct 2023, email “Machine Learning Inputs”

Downloaded from EPA EnviroAtlas

  1. pct_pasture - Percent of land managed as pasture in each subwatershed (HUC12). Pasture areas are planted for livestock grazing or the production of seed or hay crops. (2011)
  2. pct_pasture_slope3 - Percentage of land area within each subwatershed (HUC12) that is classified as pasture on areas with slopes greater than or equal to three percent. (2011)
  3. pct_pasture_slope9 - Percentage of land area within each subwatershed (HUC12) that is classified as pasture on areas with slopes greater than or equal to nine percent. (2011)
  4. pct_dev - Percentage of land area within each subwatershed (HUC12) that is classified as developed; developed land cover includes a variety of development, such as single family homes, multifamily housing units, retail, commercial, industrial sites, and associated infrastructure. Developed land cover is not confined to city limits. (2011)
  5. pct_forest - Percentage of land area within each subwatershed (HUC12) that is covered by trees and forest. (2011)
  6. pct_forest_riparian - Percentage of land within 45 meters of streams, rivers, and other hydrologically connected waterbodies within each subwatershed (HUC12) that is covered by trees and forests. (2011)
  7. downstream_ag_m - The average width in meters of buffers that are contiguous to the stream and are intersected by agricultural flow paths. This metric does not include any riparian areas that do not also contain agriculture upslope. (2006)
  8. pct_ag_floodplain - Percentage of land area in estimated floodplains in the subwatershed (HUC12) classified as agriculture. (2011)
  9. fertilizer_P_kg_ha - Application rate of inorganic phosphorus (P) fertilizer on agricultural land in kilograms P per hectare within each subwatershed (HUC12). (2012)
  10. fertilizer_N_kg_ha - Mean rate of synthetic nitrogen fertilizer application to agricultural lands within each subwatershed (HUC12) in kg N/ha/yr. (2006)
  11. ag_runoff_N_tons - Modeled estimates of the movement (flux) of nitrogen dissolved in surface runoff at the outer edges of all agricultural fields within each subwatershed (HUC12) in metric tons of N. (2002)
  12. ag_runoff_P_tons - Modeled estimates of the movement (flux) of phosphorus dissolved in surface runoff at the outer edges of all agricultural fields within each subwatershed (HUC12) in metric tons of P. (2002)
  13. drainage_density_km_km2 - Drainage density within each subwatershed (12-digit HUC). The density is equal to the total stream length in kilometers within a subwatershed divided by its total area in square kilometers. (2015-2016)

Still need to find source of data for the following parameters suggested by Garrett: surface runoff - quick flow (m/yr), percent tree cover in stream and lake buffer, number of sheep, number of cattle, sediment yield.

Linear Mixed Effects Model

Mean daily predicted TOC (from transfer function with sensor-measured FDOM) regressed on streamflow, geographic, climatic, and satellite-derived vegetation index variables. May want to think about using the transfer function to predict daily values directly. Random intercept for sensor (mw_id).
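The outcome described here, mean daily predicted TOC per sensor, amounts to grouping transfer-function predictions by sensor and calendar date and averaging them. A minimal sketch under that assumption follows; the function name, row layout, and example values are hypothetical, and which clock (local or UTC) defines the calendar date is not stated in the source.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def mean_daily_prediction(rows):
    """Average predictions per (sensor id, calendar date).

    `rows` is an iterable of (sensor_id, timestamp, predicted_toc).
    """
    groups = defaultdict(list)
    for sensor_id, ts, value in rows:
        groups[(sensor_id, ts.date())].append(value)
    return {key: fmean(values) for key, values in groups.items()}


# Hypothetical predictions for two sensors.
rows = [
    ("A", datetime(2023, 7, 1, 6), 4.0),
    ("A", datetime(2023, 7, 1, 18), 5.0),
    ("A", datetime(2023, 7, 2, 6), 6.0),
    ("B", datetime(2023, 7, 1, 6), 3.0),
]
daily = mean_daily_prediction(rows)
print(daily)
```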

## mean_prediction_value ~ dischrg_cfs + elev_m + slope + aspect + 
##     precip_in + tmean_degF + ndvi + gcvi + reip + ndti + ndwi + 
##     ndbi + days_since_start + (1 | mw_id)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: mlr_frmla
##    Data: data_for_mlr
## 
## REML criterion at convergence: 471
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -7.380 -0.546 -0.038  0.668  2.345 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  mw_id    (Intercept) 0.453    0.673   
##  Residual             0.162    0.402   
## Number of obs: 374, groups:  mw_id, 10
## 
## Fixed effects:
##                    Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        4.398427  12.963406  20.985537    0.34    0.738    
## dischrg_cfs       -0.000341   0.001258 354.035009   -0.27    0.786    
## elev_m            -0.001310   0.002979   6.009439   -0.44    0.676    
## slope              0.788561   0.283395   6.033104    2.78    0.032 *  
## aspect            -0.011309   0.027306   6.031155   -0.41    0.693    
## precip_in          0.488161   0.362186 354.039470    1.35    0.179    
## tmean_degF        -0.024535   0.005213 354.213909   -4.71  3.6e-06 ***
## ndvi              -2.375114   0.975851 354.033472   -2.43    0.015 *  
## gcvi               0.138421   0.121835 354.032425    1.14    0.257    
## reip               0.000852   0.012470 354.095233    0.07    0.946    
## ndti               2.288556   1.906234 354.203343    1.20    0.231    
## ndwi               0.207486   1.251718 354.055372    0.17    0.868    
## ndbi               0.837056   0.390656 354.112789    2.14    0.033 *  
## days_since_start  -0.021148   0.002125 354.309669   -9.95  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##       R2m  R2c
## [1,] 0.46 0.86

The fixed effects alone explain 45.88% of the variance in mean daily predicted TOC (marginal R2, R2m). The entire model, including the random intercept for sensor, explains 85.74% of the variance in the outcome (conditional R2, R2c). An example of interpretation of the coefficients: holding all other predictors constant, an increase of one degree Fahrenheit in temperature is associated with a decrease in TOC of 0.025 units, or an increase of 10 degrees with a decrease of 0.25 units.
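The two R2 values can be related to the variance components printed in the model output. The sketch below assumes the usual variance-partition definitions for a Gaussian random-intercept model (fixed-effect variance over total for R2m, fixed plus random-intercept variance over total for R2c). The fixed-effect variance is not printed in the output, so it is backed out from the reported R2c; that derived value is an illustration, not a reported result.

```python
def r2_marginal_conditional(var_fixed, var_random, var_residual):
    """Return (R2m, R2c) from the three variance components."""
    total = var_fixed + var_random + var_residual
    return var_fixed / total, (var_fixed + var_random) / total


VAR_RANDOM = 0.453     # mw_id (Intercept) variance from the output
VAR_RESIDUAL = 0.162   # residual variance from the output
R2C_REPORTED = 0.8574  # conditional R2 quoted in the text

# The fixed-effect variance is not printed; derive it from the reported R2c.
total = VAR_RESIDUAL / (1.0 - R2C_REPORTED)
var_fixed = total - VAR_RANDOM - VAR_RESIDUAL

r2m, r2c = r2_marginal_conditional(var_fixed, VAR_RANDOM, VAR_RESIDUAL)
print(round(var_fixed, 3), round(r2m, 3), round(r2c, 3))
```

Under these assumptions the recovered R2m agrees with the reported 45.88% to within the rounding of the printed variances, which suggests the two R2 values and the variance components are mutually consistent.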

Consider mean centering and standardizing predictors by their standard deviation so that they are on the same scale and numerical stability improves.
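Mean centering and scaling a predictor is a z-score transform. A minimal sketch follows, assuming the sample standard deviation (n - 1 denominator) is used; the function name and example temperatures are hypothetical.

```python
from statistics import fmean, stdev


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    mu = fmean(values)
    sd = stdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant predictor")
    return [(x - mu) / sd for x in values]


# Hypothetical daily mean temperatures in degrees Fahrenheit.
tmean_degF = [40.0, 50.0, 60.0]
z = standardize(tmean_degF)
print(z)
```

Only the predictors are transformed in this sketch; the outcome stays in its original units, so coefficients read as change in TOC per one standard deviation of the predictor.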

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: mlr_frmla
##    Data: data_for_mlr_std
## 
## REML criterion at convergence: 462
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -7.380 -0.546 -0.038  0.668  2.345 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  mw_id    (Intercept) 0.453    0.673   
##  Residual             0.162    0.402   
## Number of obs: 374, groups:  mw_id, 10
## 
## Fixed effects:
##                  Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)        5.2785     0.2150   5.9977   24.55  3.0e-07 ***
## dischrg_cfs       -0.0092     0.0339 354.0350   -0.27    0.786    
## elev_m            -0.0991     0.2253   6.0095   -0.44    0.676    
## slope              0.6176     0.2220   6.0331    2.78    0.032 *  
## aspect            -0.1010     0.2439   6.0312   -0.41    0.693    
## precip_in          0.0335     0.0248 354.0395    1.35    0.179    
## tmean_degF        -0.2787     0.0592 354.2139   -4.71  3.6e-06 ***
## ndvi              -0.3690     0.1516 354.0335   -2.43    0.015 *  
## gcvi               0.1343     0.1182 354.0324    1.14    0.257    
## reip               0.0019     0.0278 354.0958    0.07    0.946    
## ndti               0.1014     0.0845 354.2034    1.20    0.231    
## ndwi               0.0247     0.1490 354.0554    0.17    0.868    
## ndbi               0.0856     0.0399 354.1128    2.14    0.033 *  
## days_since_start  -0.7820     0.0786 354.3097   -9.95  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

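When predictors are mean centered and scaled, the interpretation of the intercept and coefficients changes. The intercept is the expected TOC when all predictors are at their means, and each coefficient is the change associated with an increase of one standard deviation in that predictor. For example, holding all other predictors constant, an increase of one standard deviation in temperature is associated with a decrease in TOC of 0.28 units.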
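Because only the predictors were rescaled, each standardized coefficient should equal the raw coefficient multiplied by that predictor's standard deviation. The sketch below uses coefficients copied from the two model outputs to back out the implied standard deviations; those standard deviations are derived for illustration and are not reported in the source.

```python
# Coefficients copied from the unstandardized and standardized model outputs.
RAW = {"tmean_degF": -0.024535, "slope": 0.788561, "days_since_start": -0.021148}
STD = {"tmean_degF": -0.2787, "slope": 0.6176, "days_since_start": -0.7820}


def implied_sd(name):
    """Standard deviation implied by beta_std = beta_raw * sd(x)."""
    return STD[name] / RAW[name]


def effect_of_change(name, delta):
    """Change in TOC for a `delta` change in the predictor's original units."""
    return RAW[name] * delta


for name in RAW:
    print(name, round(implied_sd(name), 2))

# A 10 degree F increase expressed both ways gives the same effect.
ten_deg_raw = effect_of_change("tmean_degF", 10.0)
ten_deg_std = STD["tmean_degF"] * (10.0 / implied_sd("tmean_degF"))
print(round(ten_deg_raw, 3), round(ten_deg_std, 3))
```

The identical t values and p values in the two outputs reflect the same relationship: rescaling a predictor changes the size of its coefficient and standard error together, not the strength of the evidence.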