Article acceptance date: 10 February 2018
This R-Markdown document provides step-by-step instructions for executing the analysis and producing figures presented in the paper Quantifying soil moisture impacts on light use efficiency across biomes by Stocker et al. (2018). The starting point of the present collection of scripts is the data file ./data/modobs_fluxnet2015_s11_s12_s13_with_SWC_v3.Rdata
which contains all the data used for the neural network (NN) training and analysis. Further information about data processing and creating the data file is given below (section ‘Data processing’). Using RMarkdown and open access code available through github (https://github.com/stineb/nn_fluxnet2015) this is supposed to allow for full reproducibility of published results - from publicly accessible data files to published figures.
All code in this repository is free software. It may be redistributed and/or modified under the terms of the GNU General Public License as published by the Free Software Foundation, version 3. Present code is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License (file ./LICENSE
) for more details.
Copyright (C) 2017 Benjamin D. Stocker
We quantify the fractional reduction in light use efficiency due to soil moisture, separated from VPD and greenness effects as the ratio of actual versus potential LUE: \[ \mathrm{fLUE} = \mathrm{LUE}_{\mathrm{act}}\; /\; \mathrm{LUE}_{\mathrm{pot}} \] “Potential”" light use efficiency (LUE\(_{\mathrm{pot}}\)) is predicted using artificial neural networks (NN, see below), trained on the empirical relationship between observed LUE (LUE\(_{\mathrm{obs}}\)) and its predictors temperature, VPD, and PAR during days where soil moisture is not limiting (“moist days”). All NN training is done for each site specifically. “Actual” LUE (LUE\(_{\mathrm{act}}\)) is derived from NNs using all data and, in contrast to the NN for LUE\(_{\mathrm{pot}}\), with soil moisture as an additional predictor (see Fig. S1).
Illustration of the method for splitting data for NN training into moist and dry days data.
LUE\(_{\mathrm{obs}}\) is calculated based on daily total observed GPP (GPP\(_{\mathrm{obs}}\)), PAR, and fAPAR, measured at each site. We refer to the alternative NN models used for predicting LUE\(_{\mathrm{act}}\) and LUE\(_{\mathrm{pot}}\) as NN\(_{\mathrm{act}}\) and NN\(_{\mathrm{pot}}\), respectively. \[ \mathrm{LUE}_{\mathrm{obs}} = \frac{\mathrm{GPP}_{\mathrm{FLX}} }{ \mathrm{fAPAR}_{\mathrm{EVI}} \cdot \mathrm{PAR}_{\mathrm{FLX}} } \simeq \begin{cases} \mathrm{NN}_{\mathrm{pot}}( T_{\mathrm{FLX}}, \mathrm{VPD}_{\mathrm{FLX}}, \mathrm{PAR}_{\mathrm{FLX}} ), & ( T_{\mathrm{FLX}}, \mathrm{VPD}_{\mathrm{FLX}}, \mathrm{PAR}_{\mathrm{FLX}} ) \in \mathrm{moist\ days} \\ \mathrm{NN}_{\mathrm{act}}( T_{\mathrm{FLX}}, \mathrm{VPD}_{\mathrm{FLX}}, \mathrm{PAR}_{\mathrm{FLX}}, \theta ), & ( T_{\mathrm{FLX}}, \mathrm{VPD}_{\mathrm{FLX}}, \mathrm{PAR}_{\mathrm{FLX}} ) \in \mathrm{all\ days} \end{cases} \] LUE\(_{\mathrm{obs}}\) is used as the target variable for NN\(_{\mathrm{act}}\) and NN\(_{\mathrm{pot}}\). Predictor (input) variables are temperature (\(T\)), vapour pressure deficit (VPD), and the photon flux density (PAR). Their subscripts refer to the data source with \(\mathrm{FLX}\) referring to the FLUXNET 2015 dataset and \(\mathrm{EVI}\) to MODIS EVI. Further information about data sources and processing is given below (section Data processing).
We limit the NN training to a small number of predictors that are reflective of process understanding regarding the controls on LUE (I. C. Prentice et al. 2014) and to avoid over-fitting. The agreement between potential and actual LUE, using the two NN models’ prediction, should be good during “moist days” (high soil moisture). In contrast, the potential LUE from the “moist days” NN model is expected to overestimate LUE during days of low soil moisture (see Fig. S2 and S3). With the only difference between NN models being soil moisture as an additional predictor, the ratio fLUE thus indicates the separated effect of soil moisture on LUE. fLUE droughts are identified when fLUE falls below a site-specific threshold (see Process sequence, step 4).
Neural networks (1 hidden layer) are trained (R packages nnet
and caret
) using repeated (5 times) five-fold cross-validation where 75% of data for training in each iteration. The learning rate decay rate is set to 0.1 and the number nodes in the hidden layer is sampled from 4 to 20 (step 2). The best-performing NN (by RMSE) is selected, the same procedure is repeated five times, and the mean across repetitions is used for further analyses.
The following code snippets are used by function profile_soilm_neuralnet()
and functions called therein (see below for a full description of the sequence of commands executed to produce results).
preprocessParams <- preProcess( data, method=c("range") )
number=5
) cross-validation, repeated 5 times (argument repeats=5
), for the NN training and selecting optimal number of nodes in the NN-model (only one hidden layer in the NN). In each iteration, 75% of the data is used for training and 25% for testing (argument p=0.75
).traincotrlParams <- trainControl( method="repeatedcv",
number=5, repeats=5, verboseIter=FALSE, p=0.75
)
tune_grid <- expand.grid( .decay = c(0.1), .size = seq(4,20,2) )
nn <- train(
lue_obs_evi ~ ppdf + temp + vpd,
data = data,
method = "nnet",
linout = TRUE,
tuneGrid = tune_grid,
preProc = preprocessParams,
trControl = traincotrlParams,
trace = FALSE
)
This procedure is repeated five times and the mean across individual NN predictions is used for further analysis.
The threshold for splitting training data into “moist” and “dry” days, where “moist” days’ data is used to train NNpot and all data is used to train NNact (see Fig. S1), is determined by optimal performance in the face of the trade-off between the number of data points and including data where low soil moisture affects fluxes. The threshold selection algorithm can be described as follows:
$ratio
) of the “good days” model during bad days is highest. This is implemented in profile_soilm_neuralnet.R
as follows:n_best <- 5
diff_good_bad <- data.frame( soilm_threshold = c(), diff = c() )
for (isoilm_trh in soilm_thrsh_avl){
## add row to statistics data frame
addrow <- data.frame(
soilm_threshold = isoilm_trh,
diff = median( filter( df_bad, soilm_threshold==isoilm_trh )$ratio,
na.rm=TRUE
)
- median( filter( df_good, soilm_threshold==isoilm_trh )$ratio,
na.rm=TRUE
)
)
diff_good_bad <- rbind( diff_good_bad, addrow )
}
list_best_soilm_trh <- diff_good_bad$soilm_threshold[ order(-diff_good_bad$diff)][1:min(n_best,nrow(diff_good_bad))]
This is done for each soil moisture dataset separately. This is implemented by function profile_soilm_neuralnet()
(see Section ‘Profiling soil moisture threshold’).
The following provides a documentation and instruction to reproduce results presented in the paper. The starting point is the Rdata file modobs_fluxnet2015_s11_s12_s13_with_SWC_v3.Rdata
where original data downloaded from different sources is stored in a standardised way. Provided is the documentation of NN training, fLUE calculation, drought event detection, NN quality checking, clustering, and plotting (Steps 1-9).
Install required R packages. The execution of the scripts provided here requires the following R packages. Install them if missing (Sorry if some are missing - It’s hard to keep the overview):
list.of.packages <- c( "dplyr", "tidyr", "broom", "caret", "nnet",
"minpack.lm", "LSD", "zoo", "Metrics", "abind",
"caTools", "cgwtools", "hydroGOF", "cluster",
"lattice", "ggplot2", "stats", "knitr", "SPEI"
)
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if( length(new.packages) ) install.packages(new.packages)
Create sub-directories
Define working directory. Define the path variable used throughout multiple scripts and functions. myhome
is used only by functions where argument testprofile
is FALSE
, i.e. when evaluating all analyses for all sites (batch execution). Change myhome
as an absolute path by hand, defining your local parent directory of the present working directory, i.e. where the the repository has been pulled into:
workingdir <- getwd()
print( paste("working directory directory:", workingdir ) )
## [1] "working directory directory: /Users/benjaminstocker/alphadata01/bstocker/nn_fluxnet2015"
myhome <- "/alphadata01/bstocker/"
Get additional utility functions.
source("utils_fluxnet2015.R")
The data file ./data/modobs_fluxnet2015_s11_s12_s13_with_SWC_v3.Rdata
contains for each site all the data used for the NN training and all subsequent analyses. The scripts used to process original, downloaded data are not provided here but steps are described below.
Download and load the data file:
load( "./data/modobs_fluxnet2015_s11_s12_s13_with_SWC_v3.Rdata" )
The data is organised as a nested list of dataframes. Inside fluxnet$ddf
), monthly (mdf
), and annual data (adf
):
str(fluxnet$`AR-SLu`, max.level = 1)
## List of 6
## $ ddf :List of 7
## $ ddf_stat:List of 3
## $ mdf :List of 4
## $ mdf_stat:List of 3
## $ adf :List of 4
## $ adf_stat:List of 3
Daily data (ddf
) is used here. This contains the following set of data frames:
str(fluxnet$`AR-SLu`$ddf, max.level = 1)
## List of 7
## $ s11 :'data.frame': 1095 obs. of 10 variables:
## $ s12 :'data.frame': 1095 obs. of 10 variables:
## $ s13 :'data.frame': 1095 obs. of 10 variables:
## $ obs :'data.frame': 1095 obs. of 11 variables:
## $ inp :'data.frame': 1095 obs. of 10 variables:
## $ swc_obs :'data.frame': 1095 obs. of 5 variables:
## $ swc_by_etobs:'data.frame': 1095 obs. of 7 variables:
s11
: contains P-model and SPLASH outputs of simulation s11 (water holding capacity is 220 mm).s12
: contains P-model and SWBM outputs of simulation s12.s13
: contains P-model and SWBM outputs of simulation s13 (water holding capacity is 150 mm).obs
: contains GPP and ET data from FLUXNET 2015 dataset.inp
: climate and greenness (input) data from the FLUXNET 2015 dataset and MODIS EVI and FPAR.swc_obs
: observational soil moisture data from the FLUXNET 2015 dataset.swc_by_etobs
: soil moisture data driven by observational ET.Soil moisture data is based direct measurements, provided through the FLUXNET 2015 dataset, and five alternative bucket-type models. Measured soil moisture is provided in units of volume water per volume soil. We scaled data by range to within 0 and 1. Due to limited observational soil moisture data availability and mostly unavailable soil moisture data for deep soil layers, we also used simulated soil moisture, provided by alternative, bucket-type soil water balance models.
Soil moisture data availability
soilm_splash150
, and soilm_splash220
.soilm_swbm
.LE_F_MDS
) is cleaned first if more than 80% of the underlying half-hourly data is gap-filled, then gap-filled using neural networks (temperature, PAR, VPD, and ET simulated by the SPLASH model as predictors, using R package nnet
and caret
, single hidden layer, 20 nodes, 10-fold cross-validated). The water-holding capacity of the ET-driven buckets is set to 220 mm. Two bucket versions are used. One where no runoff is generated before the soil water holding capacity is reached (soilm_etobs
), and one where runoff is generated before as in the SWBM model (see above, termed soilm_etobs_ob
).All models are driven by observed precipitation, measured at the FLUXNET sites.
The following creates a plot showing alternative soil moisture time series (observations and models) for one site (‘US-Var’).
if (!file.exists("fig/soilm")) system("mkdir -p fig/soilm")
source("plot_soilm_fluxnet.R")
plot_soilm_fluxnet( "US-Var", fluxnet[[ "US-Var" ]]$ddf, makepdf=FALSE )
To produce respective plots for all sites (PDFs stored in ./fig/soilm/
), do:
system("mkdir -p ./fig/soilm/")
for (sitename in ls(fluxnet)){
plot_soilm_fluxnet( sitename, fluxnet[[ sitename ]]$ddf, makepdf=TRUE )
}
Daily data are used from the FLUXNET 2015 Tier 1 dataset, downloaded on 13. November, 2016. We use GPP based on the Nighttime Partitioning, and the Variable U-Star Threshold method, named GPP_NT_VUT_REF
. In the FLUXNET 2015 dataset, daily values are sums over half-hourly data. We use only daily values where less than 50% of respective half-hourly data is gap-filled. We further removed data points where the daytime and nighttime methods (GPP_DT_VUT_REF
and GPP_NT_VUT_REF
, resp.) are inconsistent. I.e., the upper and lower 2.5% quantile of the difference between each method’s GPP quantification. Finally, we removed all negative daily GPP values.
This is implemented by the following function:
clean_fluxnet_gpp <- function( gpp_nt, gpp_dt, qflag_nt, qflag_dt, cutoff=0.80 ){
##--------------------------------------------------------------------
## Cleans daily data.
## gpp_nt: based on nighttime flux decomposition ("NT")
## gpp_dt: based on daytime flux decomposition ("DT")
##--------------------------------------------------------------------
## Remove data points that are based on too much gap-filled data in the underlying half-hourly data
gpp_nt[ which(qflag_nt < cutoff) ] <- NA ## based on fraction of data based on gap-filled half-hourly
gpp_dt[ which(qflag_dt < cutoff) ] <- NA ## based on fraction of data based on gap-filled half-hourly
## Remove data points where the two flux decompositions are inconsistent,
## i.e. where the residual of their regression is above the 97.5% or below the 2.5% quantile.
res <- gpp_nt - gpp_dt
q025 <- quantile( res, probs = 0.025, na.rm=TRUE )
q975 <- quantile( res, probs = 0.975, na.rm=TRUE )
gpp_nt[ res > q975 | res < q025 ] <- NA
gpp_dt[ res > q975 | res < q025 ] <- NA
## remove negative GPP
gpp_nt[ which(gpp_nt<0) ] <- NA
gpp_dt[ which(gpp_dt<0) ] <- NA
return( list( gpp_nt=gpp_nt, gpp_dt=gpp_dt ) )
}
We use MODIS EVI (MOD13Q1, 16 days, 250 m) and MODIS FPAR (MOD15A2, 8 days, 1 km) to quantify fAPAR. Due to its higher spatial resolution, smaller scatter and smaller tendency to saturate at high values (see Fig. S1), EVI is generally preferred and all results shown in the paper are based on analyses with EVI data. Results based on FPAR and NDVI data is shown in the SI. Data was downloaded for 81 pixels surrounding the flux tower location (coordinates from FLUXNET 2015) using the MODISTools
R package. The center pixel’s information was used unless quality flags indicated cloudiness or missing data. Data was interpolated to daily values using a Savitzky-Golay smoothing filter (signal
R package) of order 3 and length 31 days. This generally maintains the full seasonal amplitude but does not fully remove noise. We used the mean seasonal cycle to extend the EVI time series back to years before MODIS EVI data is available (before 2000).
The function profile_soilm_neuralnet()
samples soil moisture thresholds (0.05-0.60 in steps of 0.05) for separating NN training data into “moist days” and “dry days” data. For each threshold and site, steps described in Section ‘Neural network training’ are executed. The best-performing threshold is selected as described in Section ‘Data splitting into moist and dry days’.
The function also derives various performance metrics and creates diagnostic plots that are saved into the following files and directories (where
./data/profile_lue_obs_nn_<sitename>.Rdata
./fig_nn_fluxnet2015/ratio_vs_threshold/
: Illustrates by how much the “good days model” overestimates LUE during bad days, depending on the chosen soil moisture threshold.plot_rsq_vs_smtrh/plot_rsq_vs_smtrh_lue_obs_evi_<soilmoisturedata>_<sitename>.pdf"
: Plots the R\(^2\) of the good days model’s prediction of LUE (during “good days”).All NN predictions and performance metrics are saved in .Rdata
files:
profile_lue_obs_evi_nn_<sitename>.Rdata
: Contains NN predictions for each soil moisture threshold and each of the 5 repetitions, as well as respective performance metrics (RMSE and R\(^2\)) for each threshold and information about which threshold performs best (see below: Selection of best-performing soil moisture threshold).profile_light_lue_obs_evi_nn_<sitename>.Rdata
: Contains only summary information (performance metrics and information about best-performing threshold) and the the nnet
object (trained NN model).Ratio of NN-modelled over observed GPP during “dry days”, evaluated for each soil moisture threshold. The grey boxes represent predictions based on NN\(_{\mathrm{pot}}\), the green boxes represent the predictions based on (NN\(_{\mathrm{act}}\)). The red box illustrates the selected soil moisture threshold (0.4 in this case).
Light use efficiency is calculated and outliers (outside 3 times the interquartile range) are removed:
remove_outliers <- function( vec, coef=1.5 ) {
## use the command boxplot.stats()$out which use the Tukey's method
## to identify the outliers ranged above and below the <coef`>*IQR.
outlier <- boxplot.stats( vec, coef=coef )$out
vec[ which( is.element( vec, outlier ) ) ] <- NA
return( vec )
}
data <- data %>% mutate( lue_obs_evi = remove_outliers( gpp_obs / ( ppfd * evi ), coef=3.0 ) )
Soil moisture data is normalised to vary between 0 and 1, where 1 is the maximum value in the time series (generally equal to the maximum water holding cacpacity parameter in the respective model, but for the observational data, we don’t know the maximum water holding capacity). This is applied to make values from different models and observations comparable on the same scale. Example for soil moisture model output data soilm_splash150
:
data <- data %>% mutate( soilm_splash150 = soilm_splash150 / max( soilm_splash150, na.rm=TRUE ) )
Tests have shown that relationships between predictors and observed LUE are noisy during winter in temperate and boreal ecosystems. Therefore we limit data used for NN training to days where average temperature is above 5 degrees Celsius. In profile_soilm_neuralnet.R
, see
df_nona <- filter( df_nona, temp > 5.0 )
The following just removes NA values from the dataframe used for NN training:
df_nona <- cleandata_nn( df_nona, nam_target )
The code below executes the profiling by soil moisture using MODIS EVI data (argument nam_target="lue_obs_evi"
). Alternatively, use MODIS FPAR data (nam_target="lue_obs_fpar"
). If the argument makepdf=TRUE
, this creates diagnostic plots in subdirectory ./fig_nn_fluxnet2015
, otherwise plots are printed to screen. To speed up things, only one NN is trained only with soil moisture data from one dataset (varnams_swc=c("soilm_swbm")
), only one repetition for averaging (nrep=2
), and reduced numbers of soil moisture thresholds sampled (soilm_threshold=seq( 0.45, 0.55, 0.05 )
). Still, this takes a while - be patient. For the published study, we sampled the a wider range of thresholds (soilm_threshold = seq( 0.1, 0.60, 0.05 )
) and all available soil moisture datasets (varnams_swc = c( "soilm_splash150", "soilm_splash220", "soilm_swbm", "soilm_etobs", "soilm_etobs_ob" )
).
source('profile_soilm_neuralnet.R')
profile_soilm_neuralnet(
"FR-Pue",
nam_target="lue_obs_evi",
outdir="./data/profile/",
soilm_threshold=seq( 0.45, 0.55, 0.05 ), ## reduced range to speed up things
varnams_swc=c("soilm_swbm"), ## only SWBM soil moisture to speed up things
packages="nnet",
nrep=2,
makepdf=FALSE,
use_weights=FALSE,
overwrite_profile=TRUE
)
The full data from the profiling is now stored in the file ./data/profile/profile_evi_nn_FR-Pue.Rdata
. For example, the best-performing soil moisture threshold (soil water contant as afraction) for dataset soilm_swbm
is:
load("./data/profile/profile_light_lue_obs_evi_nn_FR-Pue.Rdata")
print( paste( "best-performing soil moisture threshold:", profile_nn_light$`FR-Pue`$soilm_swbm$nnet$best_soilm_trh[1] ) )
## [1] "best-performing soil moisture threshold: 0.55"
The profiling function described above is executed for 135 sites out of all 166 in the FLUXNET Tier 1 dataset, where modelled soil moisture gave consistent results across different models and was consistent with observed soil moisture where available (observational soil moisture availability is shown above). This step of site exclusion was done by visual judgment based on the comparison of soil moisture time series for each site (Figures created above, see Section ‘2. Data processing, Soil moisture’ and stored in ./fig/soilm/
). The table below shows the judment results (only first 7 rows are shown. Set n=166
to see the full table).
mysitename | code |
---|---|
AR-SLu | 2 |
AR-Vir | 2 |
AT-Neu | 2 |
AU-Ade | 1 |
AU-ASM | 1 |
AU-Cpr | 3 |
AU-Cum | 3 |
Due to its high computational demand when evaluating the profiling function for all 135 sites (“batch execution”), it can be executed in parallel for each site on a High Performance Computing cluster (We used the Imperial College London, HPC, server CX1). Note, that in this case, profile files are written into (and subsequently read from) paste( myhome, "data/nn_fluxnet/profile/")
(see argument testprofile
in various functions and scripts).
To do the batch execution, first, create a text file containing the list of sites for which profiling is to be done:
do.sites <- dplyr::filter( siteinfo, code!=0 )$mysitename
fileConn <- file("sitelist_doprofile_fluxnet2015.txt")
writeLines( do.sites, fileConn )
close(fileConn)
Then, in the terminal, enter
./submit_profile_nn_ALL.sh
This creates a job submission file for each site from submit_profile_nn_XXXXX.sh
and submits it to the cluster by qsub
. Note that the job submission file may have to be adjusted for execution on other clusters than Imperial HPC CX1.
This step is implemented by function nn_fVAR_fluxnet()
. It does the following:
./data/profile/profile_light_lue_obs_evi_nn_<sitename>.Rdata
) is read (if testprofile==TRUE
) and data corresponding to the best performing soil moisture threshold is extracted from its large nested list.fvar <- apply(
profile_nn[[ sitename ]][[ isoilm_data ]]$nnet$var_nn_all /
profile_nn[[ sitename ]][[ isoilm_data ]]$nnet[[
paste( "smtrh_", as.character( isoilm_trh ), sep="" )
]]$var_nn_bad,
1, FUN=mean
)
remove_outliers_fXX <- function( vec, coef=1.5 ) {
## use the command boxplot.stats()$out which use the Tukey's method
## to identify the outliers ranged above and below the <coef>*IQR.
outlier <- boxplot.stats( vec, coef=coef )$out
outlier <- outlier[ which( outlier>1.0 | outlier<0.0 ) ]
vec[ which( is.element( vec, outlier ) ) ] <- NA
return( vec )
}
df_nona$fvar <- remove_outliers_fXX( df_nona$fvar, coef=3.0 )
nice$fvar <- apply( fvar_by_smdata, 1, FUN=mean, na.rm=TRUE )
nn_fVAR_fluxnet()
.positive <- nice$fvar[ which(nice$fvar>=1.0) ]
even <- c( positive, 1.0-(positive-1.0) )
cutoff <- quantile( even, 0.05, na.rm=TRUE )
## consider only instances where fvar falls below 0.97 as drought
cutoff <- min( 0.97, cutoff )
nice$fvar_filled <- approx( nice$year_dec, nice$fvar, xout=nice$year_dec )$y
niceify()
is defined in file niceify.R
and adds rows with NA values back to dataframes where these rows were removed before.idxs <- which( !is.na(nice$fvar_filled) )
tmp <- data.frame( year_dec=nice$year_dec[idxs], fvar_smooth=runmed( nice$fvar_filled[idxs], 5 ) )
nice$fvar_smooth <- niceify( tmp, nice )$fvar_smooth
nice$fvar_smooth_filled <- approx( nice$year_dec, nice$fvar_smooth, xout=nice$year_dec )$y
is_drought_byvar
to the dataframe we’re working with here (nice
).nice$is_drought_byvar <- ( nice$fvar_smooth_filled < cutoff )
get_consecutive()
(file get_consecutive.R
) creates the dataframe droughts
that stores, for each such drought event, the index (with respect to vector nice$is_drought_byvar
) of its start and its length.droughts <- get_consecutive(
nice$is_drought_byvar,
leng_threshold = 10,
do_merge = FALSE
)
nice$fvar_smooth
) has NA values at the end of a drought event, reduce the length of the event so that to last non-NA value.out <- prune_droughts(
droughts,
nice$is_drought_byvar,
nice$fvar_smooth,
nice$fvar_smooth_filled,
mod_pot = nice$var_nn_pot,
mod_act = nice$var_nn_act,
obs = nice[[ nam_target ]],
soilm = soilm,
soilm_mod = soilm_mod,
apar = nice$ppfd * nice[[ fapar_data ]]
)
The function nn_fVAR_fluxnet()
implements all steps outlined above and produces four figures, included below. The first shows a histogram of fLUE values (red), the hypothetical symetrical distribution (blue), and fLUE cutoff to delineate fLUE droughts as a red vertical line. The subsequent figures are modelled versus observed LUE of NN\(_{\mathrm{pot}}\) during moist days, NN\(_{\mathrm{act}}\) during all days and NN\(_{\mathrm{act}}\) versus NN\(_{\mathrm{pot}}\) during moist days.
source("nn_fVAR_fluxnet2015.R")
nn_fVAR_fluxnet( "FR-Pue", nam_target="lue_obs_evi", use_weights=FALSE, makepdf=FALSE, testprofile=TRUE, overwrite=TRUE )
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: abind
## [1] "loading profile file ./data/profile/profile_light_lue_obs_evi_nn_FR-Pue.Rdata"
## [1] "==============================================="
## [1] "train NN at lue_obs_evi for FR-Pue ..."
## [1] "Using soil moisture data source: soilm_swbm"
## [1] "Using soil moisture threshold: 0.55"
## [1] "reading from profile data..."
## [1] "... done."
## [1] "aggregate across soil moisture datasets ..."
## [1] "done ..."
## [1] "get cutoff ..."
## [1] "done ..."
## [1] "fill and smooth ..."
## [1] "done ..."
## [1] "get drought events ..."
## [1] "number of events before pruning: 16"
## [1] "number of events after trimming: 16"
## [1] "number of events after pruning by soil moisture: 16"
## Loading required package: hydroGOF
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## [1] "number of events after pruning by mod vs obs: 15"
## [1] "number of events after pruning by GPP level: 15"
## [1] "number of events after pruning: 15"
##
## Attaching package: 'Metrics'
## The following objects are masked from 'package:hydroGOF':
##
## mae, mse, rmse
## [1] "writing file with fLUE and all other data into: data/nn_fluxnet2015_FR-Pue_lue_obs_evi.Rdata"
Parallel execution on a HPC cluster of nn_fVAR_fluxnet()
for each site separately is implemented by submit_fVAR_nn_ALL.sh
, submit_fVAR_nn_XXXXXX.sh
and execute_nn_fVAR_fluxnet2015.R
. Submit all site-specific jobs (site list in file sitelist_doprofile_fluxnet2015.txt
, as created above) to cluster by entering the following command in the terminal:
./submit_fVAR_nn_ALL.sh
Sites are excluded from further analysis based on data length and performance statistics of the NN model goodness:
analyse_modobs()
(script analyse_modobs.R
) as RMSE (root mean square error) divided by the mean of observationsand multiplied by 100.Sites excluded upon visual inspection and for which no quantitative criteria are found, are: CN-Din, CZ-BK1, IT-CA1, IT-CA3, IT-Ro2, and US-Me6.
This is implemented by function nn_getfail_fluxnet()
which returns a dataframe (one row) with all the performance metrics for this site.
source("function_nn_getfail_fluxnet2015.R")
out <- nn_getfail_fluxnet( "FR-Pue", nam_target="lue_obs_evi",
use_weights=FALSE, use_fapar=FALSE,
testprofile=TRUE, makepdf=FALSE, verbose=TRUE
)
## [1] "reading file ./data/fvar/nn_fluxnet2015_FR-Pue_lue_obs_evi.Rdata"
The column successcodes
provides the outcome of the quality check as a code.
print( dplyr::select( out, mysitename, successcode ) )
## mysitename successcode
## 1 FR-Pue 1
successcode | meaning |
---|---|
1 | success, droughts identified |
2 | success, no droughts identified (fLUE drought days make up less than 2% of all days.) |
3 | failed quality check |
Up to this point, all steps are carried out for one single example site and outputs are written into and read from ./data
. In order to enable further analyses and make the study fully reproducible, we provide all site-specific “intermediate” files created by function nn_fVAR_fluxnet()
that stores data, including fLUE. This avoids having to do the batch execution of profile_soilm_neuralnet()
and nn_fVAR_fluxnet()
. Data can be downloaded and extracted as follows (uncomment these lines in file knit_nn_fluxnet2015.Rmd
):
Once data is downloaded and extracted, get the success code and performance statistics for all sites in one dataframe and write it to the file ./successcodes.csv
.
source( "nn_getfail_fluxnet2015.R" )
## [1] "getting failure info for all sites ..."
## [1] "... done"
## [1] "Force exclusion of the following sites: "
## [1] "CN-Din" "CZ-BK1" "IT-CA1" "IT-CA3" "IT-Ro2" "US-Me6"
## [1] "number of sites with code 1: 36"
## [1] "number of sites with code 2: 35"
## [1] "number of sites with code 3: 63"
## [1] "number of sites with code 0: 32"
Out of the 135 sites for which the profiling was carried out (consistent soil moisture time series), 76 sites have a success code of 1 or 2, i.e. the NN-based method to quantify fLUE and identify fLUE droughts worked acceptably. These 76 sites are used for all subsequent analyses and are finally assigned to the four clusters.
Create a table that stores all the statistics evaluated for each site and that is finally used to create plot S5 of the paper. It is created here and complemented by additional evaluations in subsequent steps.
source( "get_overview_L1.R" )
## [1] "Getting data for overview table for all sites ..."
## [1] "... done."
## [1] "Saving to file data/overview_data_fluxnet2015_L1.Rdata"
load( "data/overview_data_fluxnet2015_L1.Rdata" )
knitr::kable( head( select(overview, mysitename, lon, lat, elv, year_start, year_end, years_data, classid), n=7 ), caption = "**Table**: First seven rows and first nine columns of overview table.")
mysitename | lon | lat | elv | year_start | year_end | years_data | classid |
---|---|---|---|---|---|---|---|
AR-SLu | -66.4598 | -33.4648 | 887 | 2009 | 2011 | 3 | MF |
AR-Vir | -56.1886 | -28.2395 | 74 | 2009 | 2012 | 4 | ENF |
AT-Neu | 11.3175 | 47.1167 | 970 | 2002 | 2012 | 11 | GRA |
AU-Ade | 131.1178 | -13.0769 | 188 | 2007 | 2009 | 3 | WSA |
AU-ASM | 133.2490 | -22.2830 | 615 | 2010 | 2013 | 4 | ENF |
AU-Cpr | 140.5891 | -34.0021 | 51 | 2010 | 2014 | 5 | SAV |
AU-Cum | 150.7225 | -33.6133 | 107 | 2012 | 2014 | 3 | EBF |
Fit a quadratic function to the relationship between fLUE and soil moisture for each site. This creates figures in ./fig_nn_fluxnet2015/fvar_vs_soilm/
, and stores data for all sites in data/fvar_vs_soilm.Rdata
(medians of fLUE by soil moisture 20%-quantiles) and data/fitparams.Rdata
(fit parameters of quadratic fit function). This also complements the overview table, stored as data/overview_data_fluxnet2015_L2.Rdata
. The figures included below are three examples of the fit figures created for all sites (see Figure S2 in SI of paper).
system( "mkdir -p ./fig_nn_fluxnet2015/fvar_vs_soilm" )
source( "fit_fvar_vs_soilm_nn_fluxnet2015.R" )
## [1] "Fitting functional relationship for all sites ..."
## [1] "... done."
This aggregates site-specific data into one big dataframe holding all sites’ data (used for additional plotting) as ./data/nice_agg_lue_obs_evi.Rdata
source( "aggregate_nn_fluxnet2015.R" )
This creates the time series plot, Fig. 1 in the manuscript.
source( "plot_bysite_nn_fluxnet2015.R" )
plot_bysite_nn_fluxnet2015( "FR-Pue", nam_target="lue_obs_evi",
use_fapar=FALSE, use_weights=FALSE,
makepdf=FALSE
)
To get plots for all sites, do
system( "mkdir -p ./fig_nn_fluxnet2015/panel_potentialgpp/" )
source( "execute_plot_bysite_nn_fluxnet2015.R" )
## [1] "creating time series plots for all sites ..."
## [1] "... done."
First, for each site separately, time series of variables stored in nice
(file nn_fluxnet2015_<sitname>_lue_obs_evi.Rdata
) are “cut” and aligned by the onset of fLUE droughts. Thereby, droughts can be characterised in terms of the common (aggregated across drought events) co-evolution of different variables. This creates a new data array data_alg_dry
with dimensions (dday, var, inst), where ‘dday’ is the number of days after drought onset, var
are the variables (columns) in nice
, and inst
is the drought instance. All arrays are stored as ./data/aligned_<sitename>.Rdata
.
Second, data_alg_dry
is re-expanded to a dataframe (2D) df_dday
and data is binned with respect to dday
. This step is also done for dataframes with MODIS GPP and FLUXCOM MTE data (df_dday_modis
, df_dday_mte
) and all are saved in files (./data/aligned_<sitename>.Rdata
, ./data/aligned_mte_<sitename>.Rdata
, and ./data/aligned_modis_<sitename>.Rdata
)
Third, df_dday
is aggregated (median and upper and lower quartiles) with respect to dday
, leading to dataframe df_dday_aggbydday
and saved in data/df_dday_aggbydday_<sitename>.Rdata
.
The function reshape_align_nn_fluxnet2015()
executes above steps and returns a list containing the dataframes df_dday
and df_dday_aggbydday
:
source( "reshape_align_nn_fluxnet2015.R" )
out <- reshape_align_nn_fluxnet2015( "FR-Pue" )
## Loading required package: cgwtools
print( "content of returned list:" ); ls( out )
## [1] "content of returned list:"
## [1] "df_dday" "df_dday_aggbydday"
The script execute_reshape_align_nn_fluxnet2015()
executes this function for all sites where fLUE droughts are identified (successcode==1
) and combines dataframes into combined large dataframes containing all sites’ data, saved in data/data_aligned_agg.Rdata
:
df_dday_agg
df_dday_aggbydday_agg
source( "execute_reshape_align_nn_fluxnet2015.R" )
## [1] "aligning data for all sites ..."
## [1] "... done."
## [1] "saving to file: data/data_aligned_agg.Rdata"
Panel plots for soil moisture, VPD, fLUE and EVI, all aligned by the fLUE drought onset and aggregated across drought events (Fig. 2 in paper main text), are created by the function plot_aligned_nn_fluxnet2015()
:
source( "plot_aligned_nn_fluxnet2015.R" )
plot_aligned_nn_fluxnet2015( "FR-Pue", nam_target="lue_obs_evi",
use_fapar=FALSE, use_weights=FALSE, makepdf=FALSE
)
Create panel plots for all sites and save to PDFs (fig_nn_fluxnet2015/aligned/aligned_potentialgpp_<sitename>_nam_target_lue_obs_evi.pdf
) by doing:
source( "execute_plot_aligned_nn_fluxnet2015.R" )
## [1] "create aligned plots for all sites ..."
## [1] "... done."
In step 1, sites are grouped into clusters cLS and cNA:
This uses data from files data/fvar_vs_soilm.Rdata
and data/fitparams.Rdata
, created by script fit_fvar_vs_soilm_nn_fluxnet2015.R
and complements overview table, creating data/overview_data_fluxnet2015_L4.Rdata
.
source("cluster_step1_nn_fluxnet2015.R")
## Warning: package 'ggplot2' was built under R version 3.3.2
## [1] "Total number of sites in group 4 (no low soil moisture): 21"
## Warning in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
## factor and character vector, coercing into character vector
## [1] "Total number of sites in group 3 (low fLUE sensitivity): 23"
Remaining sites are grouped by their parallel greenness and fLUE response during droughts.
This is based on aligned and binned data, as illustrated in the figure below, where the lines represent medians of data aligned by drought events, boxes represent values binned into 20 day periods (1 before and 2 after drought onset), green dots are medians of binned EVI data, normalised with respect to the level in bin 1 (EVI/EVI\(_0\)) and red dots are medians of binned fLUE data.
if (!file.exists("fig_nn_fluxnet2015/aligned_binned")) system("mkdir fig_nn_fluxnet2015/aligned_binned")
source("get_aggresponse_binned.R")
out <- get_aggresponse_binned( "FR-Pue", makepdf = FALSE )
The clustering into clusters cDD and cGR is based on the medians of binned data (‘fapar_agg’ and ‘fvar_agg’ in code, dots in above figure for one site) and is implemented using the kmeans()
function from the ‘stats’ R package:
## Combine variables based on which clustering is done into a single array 'mega'
mega <- cbind( fapar_agg, fvar_agg )
set.seed(1)
outkmeans <- stats::kmeans( mega, 2 )
The script cluster_step2_nn_fluxnet2015.R
executes both the collecting of binned and aligned data for all sites and the clustering. It complements the overview table and creates data/overview_data_fluxnet2015_L5.Rdata
.
source("cluster_step2_nn_fluxnet2015.R")
## [1] "total number of sites left for remaining clusters: 26"
## [1] "The following figure shows the dependence of the total within-cluster sum of squared differences from the cluster mean:"
## [1] "The following figure shows a histogram of the greenness changes (median of third bin, representing days 21-40 into fLUE droughts) by cluster 1 and 2:"
At this point, all analyses are completed. Remaining steps create (remaining) figures used in the paper.
The script plot_agg_nn_fluxnet2015.R
produces the following figures:
fig_nn_fluxnet2015/boxplot_fluedrought_vs_alpha_spi_spei_1mo.pdf
: Fig. 8 in paper. Boxplots for SPI, SPEI, and AET/PET distributions during fLUE droughts and non-droughts.fig_nn_fluxnet2015/modobs/modobs_gpp_rct_ALL_FROMNICE.pdf
: Fig. 2 in paper. Modelled versus observed scatterplots, evaluating NN performance based on GPP.fig_nn_fluxnet2015/modobs/modobs_lue_rct_ALL_FROMNICE.pdf
: Fig. S3 in Supplementary Information. Modelled versus observed scatterplots, evaluating NN performance based on LUEfig_nn_fluxnet2015/fpar_vs_evi.pdf
: Fig. S1 in Supplementary Information. Scatterplot of MODIS FPAR versus EVI.source("plot_agg_nn_fluxnet2015.R")
## [1] "plotting fig_nn_fluxnet2015/boxplot_fluedrought_vs_alpha_spi_spei_1mo.pdf"
## [1] "plotting mod vs obs for GPP in nice_agg: fig_nn_fluxnet2015/modobs/modobs_gpp_rct_ALL_FROMNICE.pdf"
## [1] "plotting mod vs obs for LUE in nice_agg: fig_nn_fluxnet2015/modobs/modobs_lue_rct_ALL_FROMNICE.pdf"
## [1] "plotting fig_nn_fluxnet2015/fpar_vs_evi.pdf"
The script plot_bycluster_nn_fluxnet2015.R
produces the following figures:
./fig_nn_fluxnet2015/vpd_vs_soilm/vpd_vs_soilm_lue_obs_evi_ALL.pdf
: Fig. S6 in Supplementary Information. Scatterplot of VPD versus soil moisture and boxplot of VPD distribution during dry and moist days.fig_nn_fluxnet2015/fvar_vs_soilm/fvar_vs_soilm_agg_ALL.pdf
: Fig. 7 in the main text of the paper. Functional form of fLUE versus soil moisture, by cluster.fig_nn_fluxnet2015/aligned_cluster/aligned_cluster_3rows.pdf
: Fig. 3 in the main text of the paper. fLUE and EVI/EVI\(_0\) during fLUE droughts for clusters 1 and 2.source("plot_bycluster_nn_fluxnet2015.R")
## [1] "plotting ./fig_nn_fluxnet2015/vpd_vs_soilm/vpd_vs_soilm_lue_obs_evi_ALL.pdf"
## [1] "plotting fig_nn_fluxnet2015/fvar_vs_soilm/fvar_vs_soilm_agg_ALL.pdf"
## [1] "printing fig_nn_fluxnet2015/aligned_cluster/aligned_cluster_3rows.pdf"
Overview “heat table” (Fig. S5)
The script plot_overview_nn_fluxnet2015.R
combines the overview table (data/overview_data_fluxnet2015_L5.Rdata
) with complementary site data extracted from global datasets:
df_wtd
from ./data/wtd_fluxnet2015.Rdata
. Data prepared by get_watertable_fluxnet2015.R
. Original files downloaded on May 16, 2017 from glowasis.deltares.nl, selecting /thredds/fileServer/opendap/opendap/Equilibrium_Water_Table/Eurasia_model_wtd_v2.nc
df_wtd_degraaf
from data/wtd_degraaf_fluxnet2015.Rdata
. Data prepared by get_watertable_degraaf_fluxnet2015.R
. Original file provided by I. De Graaf (29 May 2017).df_greve
from ./data/greve_fluxnet2015.Rdata
. Data prepared by get_greve.R
. Original file provided by P. Greve.df_ai
from ./data/ai_fluxnet2015.Rdata
. Data prepared by get_aridityindex.R
based on SPLASH model outputs from simulation s13 (see above).df_alpha
from ./data/alpha_fluxnet2015.Rdata
. Data prepared by get_alpha.R
based on SPLASH model outputs from simulation s11 (see above).df_soil
from ./data/soilparams_fluxnet2015.Rdata
prepared by get_soilparams.R
. Original files downloaded from http://globalchange.bnu.edu.cn/research/soilwd.jsp (26 May 2017). Complemented overview table is saved as data/overview_data_fluxnet2015_L6.Rdata
.The script produces the following figures:
fig_nn_fluxnet2015/overview_clusterX.pdf
where X
is 1, 2, 3 and 4. This is the colored overview table of all sites, grouped by cluster. Fig. S5 in Supplementary InformationThis script also evaluates the correlation of water table depth site data from (Fan, Li, and Miguez-Macho 2013) compared with (Graaf et al. 2015).
source("plot_overview_nn_fluxnet2015.R")
## [1] "Correlation of water table depth site data from Fan & Miguez-Macho (2013) compared with De Graaf et al. (2015):"
## [1] "adjusted R2: 0.224755326360854"
## [1] "Correlation of fLUE_0 with water table depth, site data from Fan & Miguez-Macho (2013)"
## [1] "adjusted R2: -0.0196119167708804"
## [1] "Correlation of fLUE_0 with water table depth, site data from De Graaf et al. (2015)"
## [1] "adjusted R2: 0.0109571701227021"
Sites across climate space (Fig. 9)
The script plot_sites_climatespace.R
creates a panel plot, illustrating the locations of sites in Budyko space and the relationship of annual percent GPP loss and maximum LUE reduction with AET/PET and the change in greenness. The figure sites_climatespace.pdf
is used in the paper as Fig. 9
source("plot_sites_climatespace.R")
## [1] "plotting fig_nn_fluxnet2015/sites_climatespace.pdf"
Map of PET/P and sites locations (Fig. 10)
The script plot_map_greve.R
creates a global map of PET/P (=aridity index) with symbols illustrating site locations, including distributions of the aridity index and AET/PET by clusters. This is Fig. 10 in paper. Plotting the map requires loading the NetCDF file that contains global data on PET/P from (Greve et al. 2014–10AD). However, this is not made publicly available. Therefore, the script just plots a constant field with value 1 as a map. The figure map_greve.pdf
is used in the paper as Fig. 10.
source("plot_map_greve.R")
## [1] "plotting ./fig_nn_fluxnet2015/map_greve.pdf"
## [1] "MYCOLORBAR: assuming explicit margins provided"
The script evaluate_sensitivity_soilm_vpd.R
evaluates the sensitivity of NNs to VPD and the bias of their predictions of LUE under moist and dry conditions. Figure vpd_soilmoisture_test.pdf
is created and used in the paper as Fig. 3. Execution of this script requires the full profile files. Due to their large size, they are not distributed here and the script thus cannot be executed.
source("evaluate_sensitivity_soilm_vpd.R")
Davis, T. W., I. C. Prentice, B. D. Stocker, R. T. Thomas, R. J. Whitley, H. Wang, B. J. Evans, A. V. Gallego-Sala, M. T. Sykes, and W. Cramer. 2017. “Simple Process-Led Algorithms for Simulating Habitats (Splash V.1.0): Robust Indices of Radiation, Evapotranspiration and Plant-Available Moisture.” Geoscientific Model Development 10 (2): 689–708. doi:10.5194/gmd-10-689-2017.
Fan, Y., H. Li, and G. Miguez-Macho. 2013. “Global Patterns of Groundwater Table Depth.” Science 339 (6122). American Association for the Advancement of Science: 940–43. doi:10.1126/science.1229881.
Graaf, I. E. M. de, E. H. Sutanudjaja, L. P. H. van Beek, and M. F. P. Bierkens. 2015. “A High-Resolution Global-Scale Groundwater Model.” Hydrology and Earth System Sciences 19 (2): 823–37. doi:10.5194/hess-19-823-2015.
Greve, Peter, Boris Orlowsky, Brigitte Mueller, Justin Sheffield, Markus Reichstein, and Sonia I. Seneviratne. 2014–10AD. “Global Assessment of Trends in Wetting and Drying over Land.” Nature Geosci 7 (10). Nature Publishing Group: 716–21. http://dx.doi.org/10.1038/ngeo2247.
Orth, Rene, Randal D. Koster, and Sonia I. Seneviratne. 2013. “Inferring Soil Moisture Memory from Streamflow Observations Using a Simple Water Balance Model.” Journal of Hydrometeorology 14 (6): 1773–90. doi:10.1175/JHM-D-12-099.1.
Prentice, I. Colin, Ning Dong, Sean M. Gleason, Vincent Maire, and Ian J. Wright. 2014. “Balancing the Costs of Carbon Gain and Water Transport: Testing a New Theoretical Framework for Plant Functional Ecology.” Ecology Letters 17 (1): 82–91. doi:10.1111/ele.12211.
Shangguan, Wei, Yongjiu Dai, Qingyun Duan, Baoyuan Liu, and Hua Yuan. 2014. “A Global Soil Data Set for Earth System Modeling.” Journal of Advances in Modeling Earth Systems 6 (1): 249–63. doi:10.1002/2013MS000293.