The following describes the use of rsofun
for an ensemble of site-scale P-model simulations, including steps for model calibration and evaluation (benchmarking). rsofun
is designed in a modular and hierarchical fashion. This enables multiple setups within the same modelling framework. The P-model implementation in rsofun
is described in Stocker et al. (2020) Geosci. Mod. Dev..
In the P-model setup, the model requires time series of daily meteorological data as input and is calibrated against observational GPP data. This example describes simulations at a subset of FLUXNET 2015 Tier 1 sites, using GPP based on the night-time flux partitioning method as calibration target and benchmark (no worries, an out-of-sample calibration/evaluation function is available, too).
We manually define a subset of sites that are part of the FLUXNET 2015 Tier 1 set of sites:
# mysites <- c("BE-Vie", "DE-Tha", "DK-Sor", "FI-Hyy", "IT-Col", "NL-Loo", "US-MMS", "US-WCr", "US-UMB", "US-Syv", "FR-Pue")
mysites <- "FR-Pue"
A small number of meta data variables have to be specified for each site specifically to define the simulation years. This information is also used for input, calibration, and evaluation data ingestion. Required meta information is specified for each site (in rows) and a number of variables:
lat
for latitude (decimal degrees)lon
for longitude (decimal degrees) - this is only used for data ingestion but not for the P-model simulation with rsofun
.elv
for elevation (m a.s.l.)year_start
and year_end
specifying years covered by the simulationwhc
for the soil water holding capacitykoeppen_code
to group sites for evaluation by Koeppen-Geiger climate zones.This information is provided in file siteinfo_fluxnet2015.csv
. This file is created as described in (and using code from) metainfo_fluxnet2015.
siteinfo <- readr::read_csv("~/rsofun/siteinfo_fluxnet2015.csv") %>%
filter(sitename %in% mysites)
## Parsed with column specification:
## cols(
## sitename = col_character(),
## lon = col_double(),
## lat = col_double(),
## elv = col_double(),
## year_start = col_double(),
## year_end = col_double(),
## classid = col_character(),
## c4 = col_logical(),
## whc = col_double(),
## koeppen_code = col_character(),
## igbp_land_use = col_character(),
## plant_functional_type = col_character()
## )
## take only year 2007 to 2014, corresponding to subset of data for site FR-Pue provided in this package as demo
siteinfo <- siteinfo %>%
mutate(year_start = 2007, year_end = 2014)
siteinfo <- siteinfo %>%
mutate(date_start = lubridate::ymd(paste0(year_start, "-01-01"))) %>%
mutate(date_end = lubridate::ymd(paste0(year_end, "-12-31")))
Specify additional simulation parameters that are identical for all site-scale simulations.
params_siml <- list(
spinup = TRUE,
spinupyears = 10,
recycle = 1,
soilmstress = FALSE,
tempstress = FALSE,
calc_aet_fapar_vpd = FALSE,
in_ppfd = TRUE,
in_netrad = FALSE,
outdt = 1,
ltre = FALSE,
ltne = FALSE,
ltrd = FALSE,
ltnd = FALSE,
lgr3 = TRUE,
lgn3 = FALSE,
lgr4 = FALSE
)
Run prepare_setup_sofun()
to define the simulation settings that contain all the information specified by the two steps above (meta info, and simulation parameters), global simulation parameters are wrapped inside an additional column params_siml
, added to the site meta info dataframe.
siteinfo <- prepare_setup_sofun(siteinfo = siteinfo, params_siml = params_siml)
First, let’s do it by hand (calibration of parameters is shown later).
params_modl <- list(
kphio = 0.05,
soilm_par_a = 1.0,
soilm_par_b = 0.0,
vpdstress_par_a = 0.2,
vpdstress_par_b = 0.2,
vpdstress_par_m = 5
)
For now, this is implemented as an illustration. Should be made site-specific. Values entered here take no effect.
df_soiltexture <- bind_rows(
top = tibble(layer = "top", fsand = 0.4, fclay = 0.3, forg = 0.1, fgravel = 0.1),
bottom = tibble(layer = "bottom", fsand = 0.4, fclay = 0.3, forg = 0.1, fgravel = 0.1)
)
Input data, used as model forcing, is collected using the ingestr package. A brief description for how to use it for our present application is provided here. Data is collected by data source.
The following ingests meteorological data from the FLUXNET 2015 files for variables temperature, precipitation, VPD, shortwave incoming radiation, net radiation, and atmospheric pressure. Arguments that are specific for this data source are provided in the settings
list. The data object ddf_fluxnet
is organised as a nested table with rows for sites and time series nested inside the column data
. See here for how to handle nested dataframes.
library(ingestr)
ddf_fluxnet <- ingest(
siteinfo = siteinfo %>% filter(sitename %in% mysites),
source = "fluxnet",
getvars = list(temp = "TA_F_DAY", prec = "P_F", vpd = "VPD_F", ppfd = "SW_IN_F", patm = "PA_F"),
dir = paste0(path.package("rsofun"), "/extdata/"),
settings = list(dir_hh = paste0(path.package("rsofun"), "/extdata/"), dir_hr = paste0(path.package("rsofun"), "/extdata/"), getswc = FALSE),
timescale = "d"
)
Some meteo data is not available from FLUXNET. Extract it from CRU global climate files instead.
ddf_cru <- ingest(
siteinfo = siteinfo %>% filter(sitename %in% mysites),
source = "cru",
getvars = list(ccov = "cld"),
dir = "~/data/cru/ts_4.01/"
)
## Warning in .varName(nc, varname, warn = warn): varname used is: cld
## If that is not correct, you can set it to one of: cld, stn
Combine the two meteo data frames into one, containing ccov
(cloud cover) from CRU and all other variables from FLUXNET.
ddf_meteo <- ddf_fluxnet %>%
tidyr::unnest(data) %>%
left_join(
ddf_cru %>%
tidyr::unnest(data),
by = c("sitename", "date")
) %>%
group_by(sitename) %>%
tidyr::nest()
fAPAR data is prescribed in the P-model setup. The following extracts data MODIS FPAR data from Google Earth Engine (see a complete description for how to set up the GEE API and download library here).
The following example is for downloading MODIS FPAR data (MODIS/006/MCD15A3H, band Fpar).
settings_gee <- get_settings_gee(
bundle = "modis_fpar",
python_path = system("which python", intern = TRUE),
gee_path = "~/google_earth_engine_subsets/gee_subset/",
data_path = "~/data/gee_subsets/",
method_interpol = "linear",
keep = FALSE,
overwrite_raw = FALSE,
overwrite_interpol = TRUE
)
This can now be used to download the data to the directory specified by argument data_path
of function get_settings_gee()
and to read data into R.
df_gee_modis_fpar <- ingest(
siteinfo = siteinfo %>% filter(sitename %in% mysites),
source = "gee",
settings = settings_gee,
verbose = FALSE
)
## Parsed with column specification:
## cols(
## id = col_character(),
## longitude = col_double(),
## latitude = col_double(),
## date = col_date(format = ""),
## Fpar = col_double(),
## FparLai_QC = col_double(),
## product = col_character()
## )
## linear ...
## done.
## Warning: Filling values with last available data point at head
## Warning: Filling values with last available data point at tail.
Ingesting CO2 data is particularly simple. We can safely assume it’s well mixed in the atmosphere (independent of site location), and we can use a annual mean value for all days in respective years.
df_co2 <- ingest(
siteinfo = siteinfo %>% filter(sitename %in% mysites),
source = "co2",
verbose = FALSE,
settings = list(path = "~/data/co2/cCO2_rcp85_const850-1765.csv")
)
## Parsed with column specification:
## cols(
## year = col_double(),
## co2 = col_double()
## )
Finally, we can collect forcing data, simulation parameters, and site meta info into a single object that will be used to drive rsofun.
df_drivers <- collect_drivers_sofun(
siteinfo = siteinfo %>% filter(sitename %in% mysites),
meteo = ddf_meteo,
fapar = df_gee_modis_fpar,
co2 = df_co2,
df_soiltexture = df_soiltexture
)
df_drivers
## # A tibble: 1 x 5
## sitename params_siml siteinfo df_soiltexture forcing
## <chr> <list<df[,18]>> <list<df[,12]> <list> <list>
## 1 FR-Pue [1 × 18] [1 × 12] <tibble [2 × 5… <tibble [2,920 ×…
If the data ingestion failed, an example for the drivers data frame df_driver
can be
Run the model for all the sites specified in the first step.
## run for a single site
mod <- run_sofun_f_bysite(
df_drivers$sitename[1],
df_drivers$params_siml[[1]],
df_drivers$siteinfo[[1]],
df_drivers$forcing[[1]],
df_drivers$df_soiltexture[[1]],
params_modl = params_modl,
makecheck = TRUE
)
## Run for the full set of sites
ptm <- proc.time()
df_output <- runread_sofun_f(
df_drivers,
params_modl = params_modl,
makecheck = TRUE,
parallel = FALSE
)
print(ptm)
## user system elapsed
## 3.497 0.348 8.379
# microbenchmark::microbenchmark(
# runread_sofun_f(
# df_drivers,
# params_modl = params_modl,
# makecheck = TRUE,
# parallel = TRUE,
# ncores = 4
# ),
# runread_sofun_f(
# df_drivers,
# params_modl = params_modl,
# makecheck = TRUE,
# parallel = FALSE
# ),
# times = 5,
# units = 's'
# )
library(ggplot2)
df_output$data[[1]] %>%
ggplot(aes(x=date, y=gpp)) +
geom_line() +
labs(title = df_output$sitename[[1]], subtitle = "SOFUN output")
Define calibration settings.
settings_calib <- list(
method = "gensa",
targetvars = c("gpp"),
timescale = list( gpp = "d" ),
maxit = 5, # (5 for gensa) (30 for optimr) #
sitenames = "FR-Pue",
metric = "rmse",
dir_results = "./",
name = "ORG",
par = list( kphio = list( lower=0.03, upper=0.07, init=0.0496 ) )
)
Use the ingestr package once again, now for collecting calibration target data. I.e., GPP based on the nighttime flux decomposition method.
settings_ingestr_fluxnet <- list(
dir_hh = "~/data/FLUXNET-2015_Tier1/20191024/HH/",
getswc = FALSE,
filter_ntdt = TRUE,
threshold_GPP = 0.8,
remove_neg = FALSE
)
ddf_fluxnet_gpp <- ingestr::ingest(
siteinfo = filter(siteinfo, sitename == "FR-Pue"),
source = "fluxnet",
getvars = list(gpp = "GPP_NT_VUT_REF",
gpp_unc = "GPP_NT_VUT_SE"),
dir = "~/data/FLUXNET-2015_Tier1/20191024/DD/",
settings = settings_ingestr_fluxnet,
timescale = "d"
)
Calibrate the model.
set.seed(1982)
settings_calib <- calib_sofun(
df_drivers = filter(df_drivers, sitename %in% settings_calib$sitenames), # use only one site
ddf_obs = ddf_fluxnet_gpp,
settings = settings_calib
)
## [1] 0.03008955
## [1] "writing output from GenSA function to .//out_gensa_ORG.Rdata"
## [1] "writing calibrated parameters to .//params_opt_ORG.csv"
The calibrated parameters are returned by calib_sofun()
as part of the list:
print(settings_calib$par_opt)
## kphio
## 0.03008955
Run the model once again with these parameters and evaluate results.
params_modl$kphio <- settings_calib$par_opt["kphio"]
df_output <- runread_sofun_f(
df_drivers,
params_modl = params_modl,
makecheck = TRUE,
parallel = FALSE
)
Get evaluation data (benchmarking data).
## get data for idfferent time scales separately
settings_fluxnet <- list(
getswc = FALSE,
filter_ntdt = TRUE,
threshold_GPP = 0.8,
remove_neg = FALSE
)
ddf_eval <- ingest(
siteinfo = siteinfo %>% filter(sitename %in% mysites),
source = "fluxnet",
getvars = list(gpp = "GPP_NT_VUT_REF",
gpp_unc = "GPP_NT_VUT_SE"),
dir = paste0(path.package("rsofun"), "/extdata/"),
settings = settings_fluxnet,
timescale = "d"
)
mdf_eval <- ingest(
siteinfo = siteinfo %>% filter(sitename %in% mysites),
source = "fluxnet",
getvars = list(gpp = "GPP_NT_VUT_REF",
gpp_unc = "GPP_NT_VUT_SE"),
dir = paste0(path.package("rsofun"), "/extdata/"),
settings = settings_fluxnet,
timescale = "m"
)
adf_eval <- ingest(
siteinfo = siteinfo %>% filter(sitename %in% mysites),
source = "fluxnet",
getvars = list(gpp = "GPP_NT_VUT_REF",
gpp_unc = "GPP_NT_VUT_SE"),
dir = paste0(path.package("rsofun"), "/extdata/"),
settings = settings_fluxnet,
timescale = "y"
)
Use rsofun to create a standardised object used for benchmarking the model output.
settings_eval <- list(
benchmark = list( gpp = c("fluxnet2015") ),
sitenames = mysites,
agg = 8 # An integer specifying the number of days used to define the width of bins for daily data aggregated to several days
)
obs_eval <- collect_obs_eval(
siteinfo = siteinfo %>% filter(sitename %in% mysites),
settings = settings_eval,
adf = adf_eval,
mdf = mdf_eval,
ddf = ddf_eval
)
obs_eval
is now a list of data frames for different temporal resolutions. The data frames have rows for sites and time series for each site nested inside the column data
.
df_output
is the model output, also a data frame with rows for sites and time series for each site nested inside a column named data
.
And finally do the evaluation.
out_eval <- eval_sofun(
df_output,
settings_eval,
settings_sims,
obs_eval = obs_eval,
overwrite = TRUE,
light = FALSE
)
## [1] "-----------------------------------"
## [1] "gpp"
## [1] "-----------------------------------"
## [1] "Aggregating model outputs..."
## [1] "Evaluate annual values..."
## [1] "Evaluate monthly values..."
## [1] "Evaluate spatial values..."
## [1] "Evaluate interannual variability..."
## [1] "Evaluate mean seasonal cycle..."
## [1] "Evaluate mean seasonal cycle by climate zones..."
## [1] "Evaluate inter-day variability..."
## [1] "Evaluate mean seasonal cycle by X-day periods..."
## [1] "Evaluate inter-X-day variability..."
## [1] "Done with eval_sofun()."
Print some results.
out_eval$gpp$fluxnet2015$metrics$xdaily_pooled
## # A tibble: 1 x 5
## rsq rmse slope bias nvals
## <dbl> <dbl> <dbl> <dbl> <int>
## 1 0.461 1.27 0.955 -0.245 355
Get the rbeni R package for nice plotting functions that can be used with the output of eval_sofun()
.
library(rbeni)
##
## Attaching package: 'rbeni'
## The following object is masked _by_ '.GlobalEnv':
##
## add_alpha
## The following object is masked from 'package:ingestr':
##
## remove_outliers
gg <- out_eval$gpp$fluxnet2015$data$xdf %>% analyse_modobs2("mod", "obs", type = "heat")
## Loading required package: LSD
## Loading required package: ggthemes
## Loading required package: RColorBrewer
gg$gg +
labs(title = "FR-Pue: modelled vs. observed GPP",
x = expression(paste("Modelled GPP (gC m"^{-2}, "d"^{-1}, ")")),
y = expression(paste("Observed GPP (gC m"^{-2}, "d"^{-1}, ")")))
## `geom_smooth()` using formula 'y ~ x'