PRISM also provides the following parameters at 4 km, daily resolution: minimum and maximum temperature, minimum and maximum vapor pressure deficit (VPD), and mean dewpoint temperature.
PRISM 4 km observations of daily cumulative precipitation (rain + snow). Each day covers the 24 hours ending at 1200 UTC (5 am MST) on the date given. Data from the last 6 months (since July 1, 2023) are provisional and should be refreshed before final analysis.
PRISM 4 km mean daily temperature (the average of the daily high and low). Each day covers the 24 hours ending at 1200 UTC (5 am MST) on the date given. Data from the last 6 months (since July 1, 2023) are provisional and should be refreshed before final analysis.
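The day convention above can be made concrete with a short sketch. It is written in Python purely for illustration (the report's own analysis is in R and does not show this step); the function name `prism_day_window` and the fixed UTC-7 offset used for MST are assumptions.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: MST as a fixed UTC-7 offset (no daylight saving adjustment).
MST = timezone(timedelta(hours=-7), "MST")

def prism_day_window(day: date):
    """Return (start, end) of the 24-hour period ending at 1200 UTC on `day`."""
    end = datetime(day.year, day.month, day.day, 12, 0, tzinfo=timezone.utc)
    start = end - timedelta(hours=24)
    return start, end

start, end = prism_day_window(date(2023, 7, 1))
print(start.isoformat(), "to", end.isoformat())
print("window ends at", end.astimezone(MST).strftime("%Y-%m-%d %H:%M %Z"))
```

If sensor readings are aggregated to daily values before being joined to PRISM, using the same window would keep the two series aligned.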
All vegetation indices are computed from scaled Sentinel-2 spectral bands.
NDVI (Normalized Difference Vegetation Index)
Indicates health and density of vegetation and canopy structure; most commonly used index
GCVI (Green Chlorophyll Vegetation Index)
Improvement over NDVI in some scenarios; less likely to saturate at high leaf biomass; may indicate nitrogen supply; has been used as a predictor of crop yield (Ulfa et al. 2022)
REIP (Red-Edge Inflection Point)
Improvement over NDVI in some scenarios; insensitive to solar elevation angle; well suited to estimation of average leaf chlorophyll content (Broge et al. 2003); more appropriate than NDVI for field crop studies and monitoring (Salvoldi et al. 2021)
NDTI (Normalized Difference Tillage Index)
Crop residue monitoring, plant canopy senescence, fire fuel conditions, grazing management; but susceptible to clouds or cloud shadows, high soil moisture, and green vegetation (Liu et al. 2022); “yellowness” index (Gan et al. 2022)
NDWI (Normalized Difference Water Index)
Sensitive to plant water content
NDBI (Normalized Difference Built-up Index)
Indicator of built-up area or structures in land use land cover studies
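The report does not state which bands or formula variants were used for these indices, so the following is only a sketch using commonly cited definitions. The Sentinel-2 band assignments (B3 green, B4 red, B5/B6/B7 red edge, B8 and B8A near infrared, B11 and B12 shortwave infrared), the NIR/SWIR form of NDWI, the function names, and the reflectance values are all assumptions, and the example is in Python for illustration only.

```python
def norm_diff(a, b):
    """Normalized difference (a - b) / (a + b) of two band reflectances."""
    return (a - b) / (a + b)

def vegetation_indices(b3, b4, b5, b6, b7, b8, b8a, b11, b12):
    """Compute the six indices from scaled reflectances (assumed band choices)."""
    return {
        "ndvi": norm_diff(b8, b4),        # NIR vs red
        "gcvi": b8 / b3 - 1.0,            # NIR / green - 1
        # Linear-interpolation red-edge inflection point, in nanometres
        "reip": 700.0 + 40.0 * (((b4 + b7) / 2.0 - b5) / (b6 - b5)),
        "ndti": norm_diff(b11, b12),      # SWIR1 vs SWIR2
        "ndwi": norm_diff(b8a, b11),      # NIR vs SWIR1 (plant water content form)
        "ndbi": norm_diff(b11, b8),       # SWIR1 vs NIR
    }

# Hypothetical reflectances for a vegetated pixel
idx = vegetation_indices(b3=0.08, b4=0.05, b5=0.12, b6=0.30, b7=0.38,
                         b8=0.40, b8a=0.42, b11=0.20, b12=0.10)
for name, value in idx.items():
    print(f"{name}: {value:.3f}")
```

Note that REIP is a wavelength (roughly 700 to 740 nm) while the other indices are unitless ratios, which is one reason the predictors in the later regression sit on very different scales.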
## [1] "Lab parameter: TOC - Sensor_parameter: FDOM"
##           learner coefficients   MSE    se fold_sd fold_min_MSE fold_max_MSE
##  1:          mean        0.000  1.88 0.136    1.29         0.44         3.56
##  2:           glm        0.874  0.62 0.071    0.22         0.30         0.85
##  3: xgb.xgboost_1        0.000 22.99 0.759    5.72        15.89        28.81
##  4: xgb.xgboost_2        0.000 22.94 0.751    5.49        16.25        28.63
##  5: xgb.xgboost_3        0.000 22.94 0.751    5.49        16.25        28.63
##  6: xgb.xgboost_4        0.000 10.48 0.444    2.34         6.55        12.56
##  7: xgb.xgboost_5        0.013 10.36 0.456    1.91         7.02        11.78
##  8: xgb.xgboost_6        0.013 10.36 0.456    1.91         7.02        11.78
##  9:         knn_1        0.034  1.86 0.131    1.22         0.44         3.43
## 10:         knn_2        0.034  1.86 0.131    1.22         0.44         3.43
## 11:         knn_3        0.034  1.86 0.131    1.22         0.44         3.43
## 12:  SuperLearner           NA  0.59 0.072    0.19         0.33         0.81
## [1] "RMSE: 0.770026385842371"
## [1] "RMSE normalized: 0.143845386786441"
## [1] "MAE: 0.590310180752545"
## [1] "Coefficient of Variation: 37.9637645133302"
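The error metrics printed above follow standard definitions, sketched here in Python for illustration. The report does not say how "RMSE normalized" was scaled, so dividing by the mean of the observed values is an assumption (dividing by the observed range is another common choice). The function names and the sample values are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def nrmse_by_mean(obs, pred):
    """RMSE divided by the observed mean (assumed normalizer, see lead-in)."""
    return rmse(obs, pred) / (sum(obs) / len(obs))

obs = [4.0, 5.0, 6.0, 7.0]    # hypothetical lab TOC values
pred = [4.5, 4.5, 6.5, 6.0]   # hypothetical model predictions

print(f"RMSE: {rmse(obs, pred):.3f}")
print(f"MAE: {mae(obs, pred):.3f}")
print(f"RMSE / mean: {nrmse_by_mean(obs, pred):.3f}")

# Consistency check on the reported numbers: RMSE squared is the MSE.
reported_rmse = 0.770026385842371
print(f"reported RMSE squared: {reported_rmse ** 2:.2f}")
```

Squaring the reported RMSE gives about 0.59, which agrees with the MSE shown for the SuperLearner row of the table.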
This section lists all available or previously considered features. Only a few selected features are used in the model.
(Does not include GPS coordinates because latitude and longitude were used as features in the transfer function.)
These features are static, non-contemporaneous, and outdated, and they may assign the same value to different sensor locations (there are only 5 unique HUC12 units for 10 sensor locations), so, again, it is unclear whether they make sense.
Detail provided by Garrett Cole, 24 Oct 2023, email "Machine Learning Inputs"
Downloaded from EPA EnviroAtlas
Still need to find source of data for the following parameters suggested by Garrett: surface runoff - quick flow (m/yr), percent tree cover in stream and lake buffer, number of sheep, number of cattle, sediment yield.
Mean daily predicted TOC (from the transfer function with sensor-measured FDOM) is regressed on streamflow, geographic, climatic, and satellite-derived vegetation index variables, with a random intercept for sensor (the (1 | mw_id) term). May want to think about using the transfer function to predict daily values directly.
## mean_prediction_value ~ dischrg_cfs + elev_m + slope + aspect +
## precip_in + tmean_degF + ndvi + gcvi + reip + ndti + ndwi +
## ndbi + days_since_start + (1 | mw_id)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: mlr_frmla
## Data: data_for_mlr
##
## REML criterion at convergence: 471
##
## Scaled residuals:
##    Min     1Q Median     3Q    Max
## -7.380 -0.546 -0.038  0.668  2.345
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  mw_id    (Intercept) 0.453    0.673
##  Residual             0.162    0.402
## Number of obs: 374, groups: mw_id, 10
##
## Fixed effects:
##                   Estimate Std. Error         df t value Pr(>|t|)
## (Intercept)       4.398427  12.963406  20.985537    0.34    0.738
## dischrg_cfs      -0.000341   0.001258 354.035009   -0.27    0.786
## elev_m           -0.001310   0.002979   6.009439   -0.44    0.676
## slope             0.788561   0.283395   6.033104    2.78    0.032 *
## aspect           -0.011309   0.027306   6.031155   -0.41    0.693
## precip_in         0.488161   0.362186 354.039470    1.35    0.179
## tmean_degF       -0.024535   0.005213 354.213909   -4.71  3.6e-06 ***
## ndvi             -2.375114   0.975851 354.033472   -2.43    0.015 *
## gcvi              0.138421   0.121835 354.032425    1.14    0.257
## reip              0.000852   0.012470 354.095233    0.07    0.946
## ndti              2.288556   1.906234 354.203343    1.20    0.231
## ndwi              0.207486   1.251718 354.055372    0.17    0.868
## ndbi              0.837056   0.390656 354.112789    2.14    0.033 *
## days_since_start -0.021148   0.002125 354.309669   -9.95  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##       R2m  R2c
## [1,] 0.46 0.86
The fixed effects alone explain 45.88% of the variance in mean daily predicted TOC (R2m). The entire model, including the random intercept for sensor, explains 85.74% of the variance in the outcome (R2c). As an example of coefficient interpretation: holding all other predictors constant, a one-degree Fahrenheit increase in mean temperature is associated with a 0.025-unit decrease in predicted TOC, so a 10-degree increase corresponds to a 0.25-unit decrease.
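The two percentages can be related to the variance components in the model output, assuming R2m and R2c are the marginal and conditional R² in the usual variance-partition sense (the report does not say how they were computed). The fixed-effect variance is not printed, so the value 0.521 below is back-calculated from the reported figures and is illustrative only. The sketch is in Python, and the function name is hypothetical.

```python
def r2_marginal_conditional(var_fixed, var_random, var_resid):
    """Share of total variance explained by fixed effects alone (marginal)
    and by fixed plus random effects together (conditional)."""
    total = var_fixed + var_random + var_resid
    return var_fixed / total, (var_fixed + var_random) / total

VAR_RANDOM = 0.453  # mw_id (Intercept) variance, from the model output
VAR_RESID = 0.162   # residual variance, from the model output
VAR_FIXED = 0.521   # assumption: back-calculated, not printed in the report

r2m, r2c = r2_marginal_conditional(VAR_FIXED, VAR_RANDOM, VAR_RESID)
print(f"R2m = {r2m:.2f}, R2c = {r2c:.2f}")  # R2m = 0.46, R2c = 0.86

# Coefficient interpretation: effect of a 10 degF increase, others held constant
TMEAN_COEF = -0.024535  # tmean_degF estimate, from the model output
print(f"change in TOC per +10 degF: {10 * TMEAN_COEF:.2f}")  # -0.25
```

The gap between the two values reflects how much of the variation in predicted TOC is between sensors rather than explained by the measured predictors.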
Consider mean-centering the predictors and scaling them by their standard deviations so that they are on a common scale, which also improves numerical stability.
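A minimal sketch of that transformation, in Python for illustration: each predictor is centered on its mean and divided by its sample standard deviation. The function name and the temperature values are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Center on the mean and divide by the sample standard deviation (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

tmean_degF = [38.0, 45.5, 52.0, 61.5, 70.0]  # hypothetical daily mean temperatures
z = standardize(tmean_degF)
print([round(v, 2) for v in z])
```

Only the predictors are transformed in this sketch; the outcome stays in its original units, which is consistent with the unchanged residual variance in the refitted model.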
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: mlr_frmla
## Data: data_for_mlr_std
##
## REML criterion at convergence: 462
##
## Scaled residuals:
##    Min     1Q Median     3Q    Max
## -7.380 -0.546 -0.038  0.668  2.345
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  mw_id    (Intercept) 0.453    0.673
##  Residual             0.162    0.402
## Number of obs: 374, groups: mw_id, 10
##
## Fixed effects:
##                  Estimate Std. Error       df t value Pr(>|t|)
## (Intercept)        5.2785     0.2150   5.9977   24.55  3.0e-07 ***
## dischrg_cfs       -0.0092     0.0339 354.0350   -0.27    0.786
## elev_m            -0.0991     0.2253   6.0095   -0.44    0.676
## slope              0.6176     0.2220   6.0331    2.78    0.032 *
## aspect            -0.1010     0.2439   6.0312   -0.41    0.693
## precip_in          0.0335     0.0248 354.0395    1.35    0.179
## tmean_degF        -0.2787     0.0592 354.2139   -4.71  3.6e-06 ***
## ndvi              -0.3690     0.1516 354.0335   -2.43    0.015 *
## gcvi               0.1343     0.1182 354.0324    1.14    0.257
## reip               0.0019     0.0278 354.0958    0.07    0.946
## ndti               0.1014     0.0845 354.2034    1.20    0.231
## ndwi               0.0247     0.1490 354.0554    0.17    0.868
## ndbi               0.0856     0.0399 354.1128    2.14    0.033 *
## days_since_start  -0.7820     0.0786 354.3097   -9.95  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
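When predictors are mean-centered and scaled, the interpretation of the intercept and coefficients changes. The intercept (5.28) is now the expected TOC when every predictor is at its mean, and each coefficient is the change in TOC associated with a one-standard-deviation increase in that predictor, holding the others constant. For example, a one-standard-deviation increase in temperature is associated with a 0.28-unit decrease in TOC.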
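Because only the predictors were rescaled, each standardized coefficient should equal the raw coefficient multiplied by that predictor's standard deviation. Dividing one by the other therefore recovers the standard deviation implied by the two tables. The sketch below (Python, for illustration) uses estimates copied from the two model outputs; the implied standard deviations are derived values, not figures reported by the authors.

```python
# Estimates copied from the unscaled and standardized model outputs
RAW = {"tmean_degF": -0.024535, "slope": 0.788561, "days_since_start": -0.021148}
STD = {"tmean_degF": -0.2787, "slope": 0.6176, "days_since_start": -0.7820}

# standardized coefficient = raw coefficient * SD  =>  SD = standardized / raw
implied_sd = {name: STD[name] / RAW[name] for name in RAW}

for name, sd in implied_sd.items():
    print(f"{name}: implied SD = {sd:.2f}")
```

Under this reading, one standard deviation of temperature is roughly 11.4 °F and one standard deviation of days_since_start is roughly 37 days. The t values and p values of the slopes are identical in both tables, as expected when predictors are only linearly rescaled.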