SOILM_GLOBAL

This RMarkdown file is compiled with code from the publicly accessible github repository github.com/stineb/soilm_global.

The starting point of this analysis is that we found that all the remote sensing-driven GPP models that we looked at have a tendency to overestimate GPP during conditions when soils are dry. Apparently, the apparent fractional reduction in light use efficiency, fLUE, which I derived in the previous paper using artificial neural networks, is strongly related - both in magnitude and timing - to the model biases. Below, I’m showing this, derive an empirical correction factor to account for soil moisture limitation on GPP, and assess implications of this mechanism for global GPP variability.

The problem: high bias in simulated GPP under dry conditions

Pooled data from all available sites

First, collect FLUXNET 2015 data, environmental forcing data and P-model output into a common dataframe. This performs a data “cleaning” of FLUXNET 2015 data (see clean_fluxnet.R), calculates additional variables (e.g. soilm_obs_mean) and writes all data as an R data frame (tibble, data/df_modobs_<simsuite>_*.Rdata).

Pooled data from soil moisture focus sites

For a subset of the FLUXNET Tier 1 sites, a clear soil moisture control on light use efficiency is evident from the parallel CO2 flux, VPD, and and soil moisture data. These sites were used in Stocker et al. (2018) New Phytologist and allow us here to investigate the bias in GPP model predictions to be related to the degree of soil moisture limitation. The latter is quantified by fLUE, the fractional of light use efficiency reduction due to soil moisture.

First, aggregate data for the subset of these 71 sites only. This also reads originally downloaded MODIS GPP files and saves data for each site as Rdata files.

source("aggregate_nn_fluxnet2015.R")

Then align data along the start of “fLUE droughts”. This also aggregates across drought events for each site and thereby quantifies the mean course of variables through drought events (not shown). Furthermore, this retains only datapoints that are either 20 days before the onset of a drought or during droughts and thereby allows us to focus on model bias associated with soil dryness. This uses only data with successcode 1, i.e. where a certain minimum number of days in the data were classified as fLUE drought in Stocker et al. (subm.).

source("execute_reshape_align_nn_fluxnet2015.R")

## Loading required package: dplyr

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## Loading required package: readr

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   mysitename = col_character(),
##   successcode = col_integer(),
##   ndays = col_integer()
## )

## See spec(...) for full column specifications.

## [1] "aligning data for all sites ..."
## [1] "reshaping for site AR-Vir"

## Loading required package: tidyr

## [1] "reshaping for site AU-Ade"
## [1] "reshaping for site AU-ASM"
## [1] "reshaping for site AU-DaP"
## [1] "reshaping for site AU-DaS"
## [1] "reshaping for site AU-Dry"
## [1] "reshaping for site AU-Fog"
## [1] "reshaping for site AU-Gin"
## [1] "reshaping for site AU-How"
## [1] "reshaping for site AU-Stp"
## [1] "reshaping for site AU-Whr"
## [1] "reshaping for site CH-Oe1"
## [1] "reshaping for site CN-Qia"
## [1] "reshaping for site DE-Kli"
## [1] "reshaping for site DE-Tha"
## [1] "reshaping for site DK-NuF"
## [1] "reshaping for site DK-Sor"
## [1] "reshaping for site FI-Hyy"
## [1] "reshaping for site FR-LBr"
## [1] "reshaping for site FR-Pue"
## [1] "reshaping for site IT-Cp2"
## [1] "reshaping for site IT-Cpz"
## [1] "reshaping for site IT-Noe"
## [1] "reshaping for site IT-PT1"
## [1] "reshaping for site IT-Ro1"
## [1] "reshaping for site IT-SR2"
## [1] "reshaping for site IT-SRo"
## [1] "reshaping for site RU-Fyo"
## [1] "reshaping for site SD-Dem"
## [1] "reshaping for site SN-Dhr"
## [1] "reshaping for site US-Me2"
## [1] "reshaping for site US-SRG"
## [1] "reshaping for site US-SRM"
## [1] "reshaping for site US-Ton"
## [1] "reshaping for site US-Var"
## [1] "reshaping for site ZM-Mon"
## [1] "... done."
## [1] "saving variables 'df_dday_agg', 'df_dday_8d_agg', and 'df_dday_aggbydday_agg' to file: data/data_aligned_agg.Rdata"

For all GPP models, this shows a clear relationship between the apparent fractional reduction in light use efficiency (fLUE) and the model bias. Here, the ratio of observed-to-modelled is shown along the y-axes, indicating by how much modelled GPP would have to be scaled in order to match observations. The straight red line is the 1:1 line and the fact that points cluster along it indicates that soil moisture effects on LUE are the dominant factor underlying model bias. Colors indicate the density of points. Small boxes with horizontal red lines show the most frequent value from a kernel density within fLUE bins and the vertical extent of the boxes the range of values where the kernel density lies above half its peak. The boxes to the right of each plot represent model bias (median, IQR) during the 20 days before fLUE drought onset.

source("plot_bias_nn.R")

Bias vs. fLUE

The solution: Fitting an empirical soil moisture stress function

As documented above, there is a systematic bias of several remote-sensing based GPP models during droughts and this bias is strongly related, both in magnitude and timing, to the apparent effects of soil moisture on light use efficiency. These apparent effects are quantified by ‘fLUE’, derived with artificial neural networks as described in Stocker et al. (2018). In addition, we found in Stocker et al. (2018) that the sensitivity of LUE to progressive soil dryness is related to the mean aridity at the site. I.e., the dryness-related decline in LUE is particularly strong at the driest sites (mostly deserts, grasslands, savannahs), whereas sites at intermediate aridity (mostly mediterranean) exhibit a smaller maximum reduction in LUE.

The functional form of fLUE (here termed \(\varphi\)) versus soil moisture (\(\theta\)) is suggestive of a quadratic function that attains 1 for soil moisture above a certain threshold \(\theta_0\) and held at 1 for soil moisture values above that (see Fig. 7 in Stocker et al., 2018), and has a y-axis intersect (\(\varphi_0\)) related to mean annual AET/PET, (see Fig. 7 in Stocker et al., 2018). The latter reflects the relationship between soil dryness sensitivity of LUE and mean aridity of the site. Hence, the soil moisture stress function (\(\varphi(\theta)\)) can be described by: \[ \varphi = \begin{cases} \beta(\theta - \theta_0)^2 + 1,& \theta \leq \theta_0\\ 1, & \theta > \theta_0 \end{cases} \] where the “curvature coefficient” \(\beta\) is defined by the maximum LUE reduction (\(\varphi_0\)) at low soil moisture (\(\theta\)=0), leading to \(\beta=(\varphi_0-1)/\theta_0^2\). \(\varphi_0\) is a linear function of the mean annual AET/PET, termed \(\alpha\): \[ \varphi_0 = a + b \alpha \]

Here, I’m using these relationships to fit an empirical function that represents direct soil moisture effects on LUE.

I tested several approaches to fit parameters \(a\) and \(b\). Best results were obtained by prescribing \(\theta_0\) to 0.9 based on visual inspection of the functional form \(\varphi(\theta)\) in Fig. 7 of Stocker et al. (2018). After countless tests of alternative approaches to define parameters \(a\) and \(b\), I show below three approaches that lead to soil moisture stress function that bracket fLUE values derived at different sites. Below is a description of the three approaches and their effect for resolving the bias of the P-model.

Approach I

Based on fLUE in lowest soil moisture bin and its relationship to mean AET/PET.

This uses the relationship between minimum fLUE and mean AET/PET, identified in Stocker et al. (2018). Parameters \(a\) and \(b\) are calculated from a linear regression between each site’s \(\varphi_0\) and \(\alpha\) (mean annual AET/PET, calculated from daily values of simulations with the SPLASH model). \(\varphi_0\) is derived for each site as the mean fLUE value of data in the lowest soil moisture quartile. This defines parameters \(a\) and \(b\), and hence \(\beta\) in above equations. This is identical to the linear regression of fLUE\(_0\) to soil moisture shown in Stocker et al. (2018) Fig. 9.

In summary, this method derives the following fitting parameters:

print( coef(linearfit1$linmod) )

## (Intercept)   meanalpha 
##   0.2617121   0.5668587

df_coef <- tibble( approach="I", apar=coef(linearfit1$linmod)["(Intercept)"], bpar=coef(linearfit1$linmod)["meanalpha"], apar_grass=NA, bpar_grass=NA )

Given the estimate of \(\varphi_0\) as a function of \(\alpha\) at each site, we can calculate the soil moisture stress function using daily varying soil moisture data and evaluate whether accounting for this resolves the bias of modelled GPP at low soil moisture. Here, I’m assessing this with GPP predictions from the P-model.

Above boxplot shows that the bias in predicted GPP is reduced but not resolved. Apparently, the predicted LUE reduction at low soil moisture is generally not strong enough.

Here, the boxplot for fLUE bins suggests that the bias is stronger than when looking at the boxplot for soil moisture bins.

Approach II

Based on fitted relationship between fLUE and soil moisture, taking its y-axis intersect and its relationship to mean AET/PET.

In contrast to the method above, I’m fitting parameters \(a\) and \(b\) not to the maximum fLUE reduction, but to the bias in modelled versus observed GPP (for the P-model), i.e. to the ratio of observed over modelled GPP as a function of soil moisture. First, a quadratic function of the form described above (Eq. XXX) is fitted to the ratio of observed versus modelled GPP for each site individually. This yields a site-specific relationship of model bias with soil moisture and defines the maximum LUE reduction (or LUE correction) at low soil moisture (\(\varphi_0\)) for each site. Second, \(\varphi_0\) is regressed against \(\alpha\) as above.

Again, we have a linear relationship between the maximum LUE reduction (y-axis intersect) and mean site aridity (AET/PET).

par(las=1)
with(linearfit2$data, plot( meanalpha, y0, pch=16 ) )
abline( linearfit2$linmod )
mtext( line=-1.5, bquote( italic(R)^2 == .(format( summary( linearfit2$linmod )$r.squared, digits = 2) ) ),  adj=0.1, cex=1 )
cf <- coef(linearfit2$linmod) %>% round( 2 )
eq <- paste0( "y = ", cf[1], ifelse(sign(cf[2])==1, " + ", " - "), abs(cf[2]), " x " )
mtext( line=-2.5, eq, adj=0.1 )

This approach (II) effectively resolves the bias in P-model GPP predictions even at very low soil moisture, as shown by the boxplots below (logarithm of observed over modelled in soil moisture bins).

In summary, this method derives the following fitting parameters:

print( coef(linearfit2$linmod) )

## (Intercept)   meanalpha 
##  0.06066509  0.50900847

This is the R code that implements this soil moisture stress function:

stress_quad_1sided_alpha <- function( x, alpha, x0, apar, bpar ){
    y0 <- apar + bpar * alpha
    beta <- (1 - y0) / x0^2
  outstress <- 1.0 - beta * ( x - x0 ) ^ 2
  outstress <- ifelse( x>x0, 1, ifelse( x<0, 0, outstress ) )
  outstress <- ifelse( outstress>1, 1, ifelse( outstress<0, 0, outstress ) )
  return( outstress )
}

Here, x0 is set to 0.9. alpha is \(\alpha\) (AET/PET, mean over daily values per site), and x is soil moisture.

Approach III

As II, but for notorious sites.

This approach (II) effectively resolves the bias in P-model GPP predictions even at very low soil moisture, as shown by the boxplots below (logarithm of observed over modelled in soil moisture bins).

Approach IV

Separate fit for grasslands and others, otherwise identical to approach II

Again, we have a linear relationship between the maximum LUE reduction (y-axis intersect) and mean site aridity (AET/PET).

par(las=1)
with( filter( linearfit6$data, classid %in% c("GRA", "CSH")), plot( meanalpha, y0, pch=16 , ylim=c(0,1), xlim=c(0,1)) )
with( filter( linearfit6$data, !(classid %in% c("GRA", "CSH"))), points( meanalpha, y0, pch=1 ) )
abline( linearfit6$linmod_grass )
abline( linearfit6$linmod_tree, lty=2 )
mtext( line=-1.5, bquote( italic(R)^2 == .(format( summary( linearfit6$linmod_grass )$r.squared, digits = 2) ) ),  adj=0.1, cex=1 )
cf <- coef(linearfit6$linmod_grass) %>% round( 2 )
eq <- paste0( "y = ", cf[1], ifelse(sign(cf[2])==1, " + ", " - "), abs(cf[2]), " x " )
mtext( line=-2.5, eq, adj=0.1 )

Linearfit V: Exponential stress function

What sort of relationships do we see between y0 and aridity or vegetation type?

Again, we have a linear relationship between the maximum LUE reduction (y-axis intersect) and mean site aridity (AET/PET).

par(las=1)
with(filter(linearfit5$data, classid!="GRA"), plot( meanalpha, y0, pch=1, ylim=c(0,1) ) )
with(filter(linearfit5$data, classid=="GRA"), points( meanalpha, y0, pch=16 ) )
abline( linearfit5$linmod_y0_tree, lty=2 )
abline( linearfit5$linmod_y0_grass )
mtext( line=-1.5, bquote( italic(R)^2 == .(format( summary( linearfit5$linmod_y0_tree )$r.squared, digits = 2) ) ),  adj=0.1, cex=1 )
cf <- coef(linearfit5$linmod_y0_tree) %>% round( 2 )
eq <- paste0( "y = ", cf[1], ifelse(sign(cf[2])==1, " + ", " - "), abs(cf[2]), " x " )
mtext( line=-2.5, eq, adj=0.1 )

And (a somewhat weaker) relationship between the mean aridity and the curvature parameter.

par(las=1)
# with( linearfit5$data, plot( y0, curve, pch=16 ) )
with( linearfit5$data, plot( meanalpha, curve, pch=16 ) )
abline( linearfit5$linmod_curve )
mtext( line=-1.5, bquote( italic(R)^2 == .(format( summary( linearfit5$linmod_curve )$r.squared, digits = 2) ) ),  adj=0.1, cex=1 )
cf <- coef(linearfit5$linmod_curve) %>% round( 2 )
eq <- paste0( "y = ", cf[1], ifelse(sign(cf[2])==1, " + ", " - "), abs(cf[2]), " x " )
mtext( line=-2.5, eq, adj=0.1 )

Let’s see how this works…

Overview

Overview of parameters determining y0 as a function of site aridity.
approach	apar	bpar	apar_grass	bpar_grass
I	0.2617121	0.5668587	NA	NA
II	0.0606651	0.5090085	NA	NA
III	0.0515108	0.1920844	NA	NA
IV	0.1785247	0.4501654	0.1007749	0.0063069
V	0.2067347	0.5543440	0.0056745	0.4116154

Site-by-site evaluations

Site-by-site evaluations of the three different soil moisture stress functions are shown below.

Stress vs. soil moisture

source("plot_fit_vs_soilmoist.R")
plot_fit_vs_soilmoist( linearfit1, linearfit6, linearfit3, ddf, nice_agg, makepdf = FALSE )

## Loading required package: lubridate

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

Stress and fLUE vs. time

source("plot_fit_fvar_vs_time.R")
load("data/nice_nn_agg_lue_obs_evi.Rdata")
plot_fit_fvar_vs_time( linearfit1, linearfit6, linearfit3, ddf, nice_agg, makepdf = TRUE )

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## [1] "plotting fLUE and fLUEest vs. time for each site into file  fig/flue_est_per_site.pdf ..."

## quartz_off_screen 
##                 2

GPP, P-model original and corrected

source("plot_fit_gpp_vs_time.R")
plot_fit_gpp_vs_time( linearfit1, linearfit6, linearfit3, ddf, nice_agg, makepdf=TRUE )

## [1] "plotting GPPobs and (GPP_Pmodel * fLUEest) vs. time for each site into file  fig/gpp_per_site.pdf ..."

## quartz_off_screen 
##                 2

Universal stress function?

Bias, by models

We’ve seen that all RS-models exhibit a bias towards overestimating GPP at low soil moisture. The regression of this bias against fLUE reveals a consistent relationship that holds for all RS-models.

## Load aligned aggregated data
load( "data/data_aligned_agg.Rdata" )

## IMPORTANT: REMOVE DUPLICATE ROWS (were introduced because one date can below to multiple drought instances (within their before-after window) )
df_dday_agg    <- df_dday_agg    %>% select( -dday, -inst ) %>% unique()
df_dday_8d_agg <- df_dday_8d_agg %>% select( -dday, -inst ) %>% unique()


df_dday_8d_agg <- filter( df_dday_8d_agg, mysitename!="US-Var")

##------------------------------------------------
## bin data
##------------------------------------------------
## define fLUE bins
nbins <- 5
max <- 1.0
binwidth <- max/nbins
bins <- seq( from=0, to=max, by=binwidth )
xvals <- bins[1:nbins]+binwidth/2

df_dday_8d_agg <- df_dday_8d_agg %>% mutate( inbin = cut( fvar, breaks = bins ) )

##------------------------------------------------
## Quick plots I:
##------------------------------------------------
## Bias for each model separately
## pmodel
par(las=1)
boxplot( bias_pmodel_diff ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "P-model", line=0.5, adj = 0, font=2 )

## MODIS
boxplot( bias_modis_diff ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "MODIS", line=0.5, adj = 0, font=2 )

## vpm
boxplot( bias_vpm_diff ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "VPM", line=0.5, adj = 0, font=2 )

## bess_v1
boxplot( bias_bess_v1_diff ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "BESS", line=0.5, adj = 0, font=2 )

Normalised bias, by models

All RS-models (except the P-model) underestimate GPP when fLUE is high. To remove this bias, data is normalised for each model separately to the ratio of observed over modelled within the highest fLUE bin.

##------------------------------------------------
## normalise ratios 
## to 1 for bin fvar = 1.0-1.1
##------------------------------------------------
df_binmedians <- df_dday_8d_agg %>%   group_by( inbin ) %>% 
                                      summarise( median_modis   = median( bias_modis,   na.rm = TRUE ),
                                                 median_vpm     = median( bias_vpm,     na.rm = TRUE ),
                                                 median_bess_v1 = median( bias_bess_v1, na.rm = TRUE ),
                                                 median_pmodel  = median( bias_pmodel,  na.rm = TRUE )
                                                )

df_dday_8d_agg <- df_dday_8d_agg %>%  mutate(  gpp_modis   = gpp_modis     / df_binmedians$median_modis[nbins],
                                               gpp_vpm     = gpp_vpm      / df_binmedians$median_vpm[nbins], 
                                               gpp_bess_v1 = gpp_bess_v1 / df_binmedians$median_bess_v1[nbins], 
                                               gpp_pmodel  = gpp_pmodel / df_binmedians$median_pmodel[nbins]  
                                               ) %>%
                                      mutate( bias_pmodel_diff  = gpp_pmodel  - gpp_obs,
                                              bias_modis_diff   = gpp_modis   - gpp_obs,
                                              bias_vpm_diff     = gpp_vpm     - gpp_obs,
                                              bias_bess_v1_diff = gpp_bess_v1 - gpp_obs
                                             )

##------------------------------------------------
## Quick plots II:
##------------------------------------------------
## Bias for each model separately after "normalisation"
## pmodel
boxplot( bias_pmodel_diff ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "P-model, normalised", line=0.5, adj = 0, font=2 )

## MODIS
boxplot( bias_modis_diff ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "MODIS, normalised", line=0.5, adj = 0, font=2 )

## vpm
boxplot( bias_vpm_diff ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "VPM, normalised", line=0.5, adj = 0, font=2 )

## bess_v1
boxplot( bias_bess_v1_diff ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "BESS, normalised", line=0.5, adj = 0, font=2 )

Corrected, by models and stress function

Now, we apply the empirical soil moisture stress functions and check whether the systematic biases are resolved for each RS-model. First, using approach II:

##------------------------------------------------
## correct ratio with estimated fLUE
##------------------------------------------------
library(tibble)
df_dday_8d_agg <- df_dday_8d_agg %>% as_tibble() %>%
                                     mutate(  
                                              bias_modis_diff_corr   = gpp_modis   * fvar - gpp_obs,
                                              bias_vpm_diff_corr     = gpp_vpm     * fvar - gpp_obs, 
                                              bias_bess_v1_diff_corr = gpp_bess_v1 * fvar - gpp_obs, 
                                              bias_pmodel_diff_corr  = gpp_pmodel  * fvar - gpp_obs,

                                              bias_modis_diff_corr_I   = gpp_modis   * flue_est_I - gpp_obs,
                                              bias_vpm_diff_corr_I     = gpp_vpm     * flue_est_I - gpp_obs, 
                                              bias_bess_v1_diff_corr_I = gpp_bess_v1 * flue_est_I - gpp_obs, 
                                              bias_pmodel_diff_corr_I  = gpp_pmodel  * flue_est_I - gpp_obs,

                                              bias_modis_diff_corr_II   = gpp_modis   * flue_est_II - gpp_obs,
                                              bias_vpm_diff_corr_II     = gpp_vpm     * flue_est_II - gpp_obs, 
                                              bias_bess_v1_diff_corr_II = gpp_bess_v1 * flue_est_II - gpp_obs, 
                                              bias_pmodel_diff_corr_II  = gpp_pmodel  * flue_est_II - gpp_obs,

                                              ## flue_est_IV is based on separate grasslands and others
                                              bias_modis_diff_corr_IV   = gpp_modis   * flue_est_IV - gpp_obs,
                                              bias_vpm_diff_corr_IV     = gpp_vpm     * flue_est_IV - gpp_obs, 
                                              bias_bess_v1_diff_corr_IV = gpp_bess_v1 * flue_est_IV - gpp_obs, 
                                              bias_pmodel_diff_corr_IV  = gpp_pmodel  * flue_est_IV - gpp_obs,

                                              bias_modis_diff_corr_III   = gpp_modis   * flue_est_III - gpp_obs,
                                              bias_vpm_diff_corr_III     = gpp_vpm     * flue_est_III - gpp_obs, 
                                              bias_bess_v1_diff_corr_III = gpp_bess_v1 * flue_est_III - gpp_obs, 
                                              bias_pmodel_diff_corr_III  = gpp_pmodel  * flue_est_III - gpp_obs,

                                              ## flue_est_V is based on exponential stress function
                                              bias_modis_diff_corr_V   = gpp_modis   * flue_est_V - gpp_obs,
                                              bias_vpm_diff_corr_V     = gpp_vpm     * flue_est_V - gpp_obs, 
                                              bias_bess_v1_diff_corr_V = gpp_bess_v1 * flue_est_V - gpp_obs, 
                                              bias_pmodel_diff_corr_V  = gpp_pmodel  * flue_est_V - gpp_obs
                                            )

##------------------------------------------------
## Quick plots III:
##------------------------------------------------
## Bias for each model separately after "normalisation"

## Corrected by fLUE

## pmodel
boxplot( bias_pmodel_diff_corr ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "P-model, norm., corr. fLUE", line=0.5, adj = 0, font=2 )

# library(LSD)
# with( df_dday_8d_agg, heatscatter( fvar, bias_pmodel_diff, ylim = c(-6,6), xlim = c(0,1.2) ) )
# abline( h=0, lty=3 )

# with( df_dday_8d_agg, heatscatter( fvar, bias_pmodel_diff_corr, ylim = c(-6,6), xlim = c(0,1.2) ) )
# abline( h=0, lty=3 )

# with( df_dday_8d_agg, heatscatter( fvar, bias_pmodel_diff_corr_III, ylim = c(-6,6), xlim = c(0,1.2) ) )
# abline( h=0, lty=3 )

## MODIS
boxplot( bias_modis_diff_corr ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "MODIS, norm., corr. fLUE", line=0.5, adj = 0, font=2 )

## vpm
boxplot( bias_vpm_diff_corr ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "VPM, norm., corr. fLUE", line=0.5, adj = 0, font=2 )

## bess_v1
boxplot( bias_bess_v1_diff_corr ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "BESS, norm., corr. fLUE", line=0.5, adj = 0, font=2 )

## Corrected by estimated fLUE (following approach II)

## pmodel
boxplot( bias_pmodel_diff_corr_II ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "P-model, norm., corr. II", line=0.5, adj = 0, font=2 )

## MODIS
boxplot( bias_modis_diff_corr_II ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "MODIS, norm., corr. II", line=0.5, adj = 0, font=2 )

## vpm
boxplot( bias_vpm_diff_corr_II ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "VPM, norm., corr. II", line=0.5, adj = 0, font=2 )

## bess_v1
boxplot( bias_bess_v1_diff_corr_II ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "BESS, norm., corr. II", line=0.5, adj = 0, font=2 )

## Corrected by estimated fLUE (following approach IV)

## pmodel
boxplot( bias_pmodel_diff_corr_IV ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "P-model, norm., corr. IV", line=0.5, adj = 0, font=2 )

## MODIS
boxplot( bias_modis_diff_corr_IV ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "MODIS, norm., corr. IV", line=0.5, adj = 0, font=2 )

## vpm
boxplot( bias_vpm_diff_corr_IV ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "VPM, norm., corr. IV", line=0.5, adj = 0, font=2 )

## bess_v1
boxplot( bias_bess_v1_diff_corr_IV ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "BESS, norm., corr. IV", line=0.5, adj = 0, font=2 )

## Corrected by estimated fLUE (following approach V)

## pmodel
boxplot( bias_pmodel_diff_corr_V ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "P-model, norm., corr. V", line=0.5, adj = 0, font=2 )

## MODIS
boxplot( bias_modis_diff_corr_V ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "MODIS, norm., corr. V", line=0.5, adj = 0, font=2 )

## vpm
boxplot( bias_vpm_diff_corr_V ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "VPM, norm., corr. V", line=0.5, adj = 0, font=2 )

## bess_v1
boxplot( bias_bess_v1_diff_corr_V ~ inbin, data=df_dday_8d_agg, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "BESS, norm., corr. V", line=0.5, adj = 0, font=2 )

Corrected, by models, pooled stress function

##------------------------------------------------
## Treat data from different stress functions as one - gather data
##------------------------------------------------
pmodel <- df_dday_8d_agg %>%  select( mysitename, date, inbin, bias_pmodel_diff_corr_I, bias_pmodel_diff_corr_IV, bias_pmodel_diff_corr_III ) %>%
                              gather( fct, pmodel, bias_pmodel_diff_corr_I, bias_pmodel_diff_corr_IV, bias_pmodel_diff_corr_III )

modis <- df_dday_8d_agg %>%  select( mysitename, date, inbin, bias_modis_diff_corr_I, bias_modis_diff_corr_IV, bias_modis_diff_corr_III ) %>%
                             gather( fct, modis, bias_modis_diff_corr_I, bias_modis_diff_corr_IV, bias_modis_diff_corr_III )

vpm <- df_dday_8d_agg %>%  select( mysitename, date, inbin, bias_vpm_diff_corr_I, bias_vpm_diff_corr_IV, bias_vpm_diff_corr_III ) %>%
                           gather( fct, vpm, bias_vpm_diff_corr_I, bias_vpm_diff_corr_IV, bias_vpm_diff_corr_III )

bess_v1 <- df_dday_8d_agg %>%  select( mysitename, date, inbin, bias_bess_v1_diff_corr_I, bias_bess_v1_diff_corr_IV, bias_bess_v1_diff_corr_III ) %>%
                               gather( fct, bess_v1, bias_bess_v1_diff_corr_I, bias_bess_v1_diff_corr_IV, bias_bess_v1_diff_corr_III )

## Corrected by estimated fLUE (following approach II)

## pmodel
boxplot( pmodel ~ inbin, data=pmodel, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "P-model, norm., corr. pooled", line=0.5, adj = 0, font=2 )

## MODIS
boxplot( modis ~ inbin, data=modis, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "MODIS, norm., corr. pooled", line=0.5, adj = 0, font=2 )

## vpm
boxplot( vpm ~ inbin, data=vpm, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "VPM, norm., corr. pooled", line=0.5, adj = 0, font=2 )

## bess_v1
boxplot( bess_v1 ~ inbin, data=bess_v1, xlab = "fLUE bin", ylab = "bias (mod.-obs.)", outline=FALSE, ylim=c(-6,6) )
abline( h=0, lty=3 )
mtext( "BESS, norm., corr. pooled", line=0.5, adj = 0, font=2 )

Corrected, by stress function, pooled models

Now, let’s pool data (ratio observed over modelled) from all RS-models and check how the different approaches (I-III) perform in resolving the systematic bias.

##------------------------------------------------
## Treat all different model's data as one - gather data
##------------------------------------------------
tmp  <- df_dday_8d_agg %>%  select( mysitename, date, fvar, inbin, bias_modis_diff, bias_vpm_diff, bias_bess_v1_diff, bias_pmodel_diff) %>% 
                            gather( model, bias_diff, bias_modis_diff, bias_vpm_diff, bias_bess_v1_diff, bias_pmodel_diff )

tmp0 <- df_dday_8d_agg %>%  select( mysitename, date, fvar, inbin, bias_modis_diff_corr, bias_vpm_diff_corr, bias_bess_v1_diff_corr, bias_pmodel_diff_corr ) %>% 
                            gather( model, bias_diff_corr, bias_modis_diff_corr, bias_vpm_diff_corr, bias_bess_v1_diff_corr, bias_pmodel_diff_corr )

tmp1 <- df_dday_8d_agg %>%  select( mysitename, date, fvar, inbin, bias_modis_diff_corr_I, bias_vpm_diff_corr_I, bias_bess_v1_diff_corr_I, bias_pmodel_diff_corr_I ) %>% 
                            gather( model, bias_diff_corr_I, bias_modis_diff_corr_I, bias_vpm_diff_corr_I, bias_bess_v1_diff_corr_I, bias_pmodel_diff_corr_I )

tmp2 <- df_dday_8d_agg %>%  select( mysitename, date, fvar, inbin, bias_modis_diff_corr_IV, bias_vpm_diff_corr_IV, bias_bess_v1_diff_corr_IV, bias_pmodel_diff_corr_IV ) %>% 
                            gather( model, bias_diff_corr_II, bias_modis_diff_corr_IV, bias_vpm_diff_corr_IV, bias_bess_v1_diff_corr_IV, bias_pmodel_diff_corr_IV )

tmp3 <- df_dday_8d_agg %>%  select( mysitename, date, fvar, inbin, bias_modis_diff_corr_III, bias_vpm_diff_corr_III, bias_bess_v1_diff_corr_III, bias_pmodel_diff_corr_III ) %>% 
                            gather( model, bias_diff_corr_III, bias_modis_diff_corr_III, bias_vpm_diff_corr_III, bias_bess_v1_diff_corr_III, bias_pmodel_diff_corr_III )

tmp4 <- df_dday_8d_agg %>%  select( mysitename, date, fvar, inbin, bias_modis_diff_corr_IV, bias_vpm_diff_corr_IV, bias_bess_v1_diff_corr_IV, bias_pmodel_diff_corr_IV ) %>% 
                            gather( model, bias_diff_corr_IV, bias_modis_diff_corr_IV, bias_vpm_diff_corr_IV, bias_bess_v1_diff_corr_IV, bias_pmodel_diff_corr_IV )

tmp5 <- df_dday_8d_agg %>%  select( mysitename, date, fvar, inbin, bias_modis_diff_corr_V, bias_vpm_diff_corr_V, bias_bess_v1_diff_corr_V, bias_pmodel_diff_corr_V ) %>% 
                            gather( model, bias_diff_corr_V, bias_modis_diff_corr_V, bias_vpm_diff_corr_V, bias_bess_v1_diff_corr_V, bias_pmodel_diff_corr_V )

##------------------------------------------------
## Quick plots IV:
##------------------------------------------------
par(las=1)
boxplot( bias_diff ~ inbin, data = tmp, outline=FALSE, ylim=c(-7,7), xlab = "fLUE bin", ylab = "ratio obs./mod." )
abline( h=0, lty=3 )
mtext( "Pooled models, normalised", line=0.5, adj = 0, font=2 )

boxplot( bias_diff_corr ~ inbin, data = tmp0, outline=FALSE, ylim=c(-7,7), xlab = "fLUE bin", ylab = "ratio obs./mod." )
abline( h=0, lty=3 )
mtext( "Pooled models, norm., corr. fLUE", line=0.5, adj = 0, font=2 )

boxplot( bias_diff_corr_I ~ inbin, data = tmp1, outline=FALSE, ylim=c(-7,7), xlab = "fLUE bin", ylab = "ratio obs./mod." )
abline( h=0, lty=3 )
mtext( "Pooled models, norm., corr. I", line=0.5, adj = 0, font=2 )

boxplot( bias_diff_corr_II ~ inbin, data = tmp2, outline=FALSE, ylim=c(-7,7), xlab = "fLUE bin", ylab = "ratio obs./mod." )
abline( h=0, lty=3 )
mtext( "Pooled models, norm., corr. II", line=0.5, adj = 0, font=2 )

boxplot( bias_diff_corr_III ~ inbin, data = tmp3, outline=FALSE, ylim=c(-7,7), xlab = "fLUE bin", ylab = "ratio obs./mod." )
abline( h=0, lty=3 )
mtext( "Pooled models, norm., corr. III", line=0.5, adj = 0, font=2 )

boxplot( bias_diff_corr_IV ~ inbin, data = tmp4, outline=FALSE, ylim=c(-7,7), xlab = "fLUE bin", ylab = "ratio obs./mod." )
abline( h=0, lty=3 )
mtext( "Pooled models, norm., corr. IV", line=0.5, adj = 0, font=2 )

boxplot( bias_diff_corr_V ~ inbin, data = tmp5, outline=FALSE, ylim=c(-7,7), xlab = "fLUE bin", ylab = "ratio obs./mod." )
abline( h=0, lty=3 )
mtext( "Pooled models, norm., corr. V", line=0.5, adj = 0, font=2 )

Corrected, pooled models and stress function

# pmodel <- df_dday_8d_agg %>%  select( mysitename, date, inbin, bias_pmodel_diff_corr_I, bias_pmodel_diff_corr_IV, bias_pmodel_diff_corr_III ) %>%
#                               gather( fct, pmodel, bias_pmodel_diff_corr_I, bias_pmodel_diff_corr_IV, bias_pmodel_diff_corr_III )

# modis <- df_dday_8d_agg %>%  select( mysitename, date, inbin, bias_modis_diff_corr_I, bias_modis_diff_corr_IV, bias_modis_diff_corr_III ) %>%
#                              gather( fct, modis, bias_modis_diff_corr_I, bias_modis_diff_corr_IV, bias_modis_diff_corr_III )

# vpm <- df_dday_8d_agg %>%  select( mysitename, date, inbin, bias_vpm_diff_corr_I, bias_vpm_diff_corr_IV, bias_vpm_diff_corr_III ) %>%
#                            gather( fct, vpm, bias_vpm_diff_corr_I, bias_vpm_diff_corr_IV, bias_vpm_diff_corr_III )

# bess_v1 <- df_dday_8d_agg %>%  select( mysitename, date, inbin, bias_bess_v1_diff_corr_I, bias_bess_v1_diff_corr_IV, bias_bess_v1_diff_corr_III ) %>%
#                                gather( fct, bess_v1, bias_bess_v1_diff_corr_I, bias_bess_v1_diff_corr_IV, bias_bess_v1_diff_corr_III )

pooled <- select( pmodel, -fct, bias=pmodel ) %>% 
          bind_rows( select( modis, -fct, bias=modis) ) %>% 
          bind_rows( select( vpm, -fct, bias=vpm) ) %>% 
          bind_rows( select( bess_v1, -fct, bias=bess_v1) )

##------------------------------------------------
## Quick plots V:
##------------------------------------------------
par(las=1)
boxplot( bias_diff ~ inbin, data = tmp, outline=FALSE, ylim=c(-7,7), xlab = "fLUE bin", ylab = "ratio obs./mod." )
abline( h=0, lty=3 )
mtext( "Pooled models, normalised", line=0.5, adj = 0, font=2 )

boxplot( bias_diff_corr ~ inbin, data = tmp0, outline=FALSE, ylim=c(-7,7), xlab = "fLUE bin", ylab = "ratio obs./mod." )
abline( h=0, lty=3 )
mtext( "Pooled models, norm., corr. fLUE", line=0.5, adj = 0, font=2 )

boxplot( bias ~ inbin, data = pooled, outline=FALSE, ylim=c(-7,7), xlab = "fLUE bin", ylab = "ratio obs./mod." )
abline( h=0, lty=3 )
mtext( "Pooled models, norm., pooled corr.", line=0.5, adj = 0, font=2 )

Additional analyses

Bias versus soil moisture, normalised to bias in the wettest soil moisture bin

–> –>

–> –> –> –>

–>

–> –> –>

–>

–> –>

–>

–> –>

–> –> –> –>

–>

–> –> –>

–>

SOILM_GLOBAL

Beni Stocker

2018-04-06

The problem: high bias in simulated GPP under dry conditions

Pooled data from all available sites

Pooled data from soil moisture focus sites

The solution: Fitting an empirical soil moisture stress function

Approach I

Approach II

Approach III

Approach IV

Linearfit V: Exponential stress function

Overview

Site-by-site evaluations

Stress vs. soil moisture

Stress and fLUE vs. time

GPP, P-model original and corrected

Universal stress function?

Bias, by models

Normalised bias, by models

Corrected, by models and stress function

Corrected, by models, pooled stress function

Corrected, by stress function, pooled models

Corrected, pooled models and stress function

Additional analyses