Example using rsofun for P-model simulations

The following describes the use of rsofun for an ensemble of site-scale P-model simulations, including steps for model calibration and evaluation (benchmarking). rsofun is designed in a modular and hierarchical fashion, which enables multiple setups within the same modelling framework. The P-model implementation in rsofun is described in Stocker et al. (2020), Geosci. Model Dev.

In the P-model setup, the model requires time series of daily meteorological data as input and is calibrated against observational GPP data. This example describes simulations at a subset of FLUXNET 2015 Tier 1 sites, using GPP based on the night-time flux partitioning method as calibration target and benchmark (no worries, an out-of-sample calibration/evaluation function is available, too).

Site selection and meta data

We manually define a subset of sites that are part of the FLUXNET 2015 Tier 1 set of sites:

# mysites <- c("BE-Vie", "DE-Tha", "DK-Sor", "FI-Hyy", "IT-Col", "NL-Loo", "US-MMS", "US-WCr", "US-UMB", "US-Syv", "FR-Pue")
mysites <- "FR-Pue"

A small number of meta data variables have to be specified for each site, in particular to define the simulation years. This information is also used for ingesting input, calibration, and evaluation data. The required meta information is specified for each site (in rows) with the following variables:

  • lat for latitude (decimal degrees)
  • lon for longitude (decimal degrees) - this is only used for data ingestion but not for the P-model simulation with rsofun.
  • elv for elevation (m a.s.l.)
  • year_start and year_end specifying years covered by the simulation
  • whc for the soil water holding capacity
  • koeppen_code to group sites for evaluation by Koeppen-Geiger climate zones.

This information is provided in file siteinfo_fluxnet2015.csv. This file is created as described in (and using code from) metainfo_fluxnet2015.

siteinfo <- readr::read_csv("~/rsofun/siteinfo_fluxnet2015.csv") %>%
  filter(sitename %in% mysites)
## Parsed with column specification:
## cols(
##   sitename = col_character(),
##   lon = col_double(),
##   lat = col_double(),
##   elv = col_double(),
##   year_start = col_double(),
##   year_end = col_double(),
##   classid = col_character(),
##   c4 = col_logical(),
##   whc = col_double(),
##   koeppen_code = col_character(),
##   igbp_land_use = col_character(),
##   plant_functional_type = col_character()
## )
## take only years 2007 to 2014, corresponding to the subset of data for site FR-Pue provided as a demo with this package
siteinfo <- siteinfo %>% 
  mutate(year_start = 2007, year_end = 2014)

siteinfo <- siteinfo %>% 
  mutate(date_start = lubridate::ymd(paste0(year_start, "-01-01"))) %>%
  mutate(date_end = lubridate::ymd(paste0(year_end, "-12-31")))
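
If the file siteinfo_fluxnet2015.csv is not at hand, a data frame with the same columns can also be assembled by hand. Below is a minimal sketch for a single site; the NA entries are placeholders that must be replaced with the actual site meta data, and the date_start/date_end columns are then added exactly as above.

siteinfo <- tibble::tibble(
  sitename     = "FR-Pue",
  lon          = NA_real_,      # longitude (decimal degrees), used for data ingestion
  lat          = NA_real_,      # latitude (decimal degrees)
  elv          = NA_real_,      # elevation (m a.s.l.)
  year_start   = 2007,
  year_end     = 2014,
  whc          = NA_real_,      # soil water holding capacity
  koeppen_code = NA_character_  # Koeppen-Geiger climate zone (used for evaluation only)
)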

Simulation settings

Specify additional simulation parameters that are identical for all site-scale simulations.

params_siml <- list(
  spinup             = TRUE,
  spinupyears        = 10,
  recycle            = 1,
  soilmstress        = FALSE,
  tempstress         = FALSE,
  calc_aet_fapar_vpd = FALSE,
  in_ppfd            = TRUE,
  in_netrad          = FALSE,
  outdt              = 1,
  ltre               = FALSE,
  ltne               = FALSE,
  ltrd               = FALSE,
  ltnd               = FALSE,
  lgr3               = TRUE,
  lgn3               = FALSE,
  lgr4               = FALSE
    )

Run prepare_setup_sofun() to define the simulation settings that contain all the information specified in the two steps above (meta info and simulation parameters). The global simulation parameters are wrapped inside an additional column params_siml, which is added to the site meta info data frame.

siteinfo <- prepare_setup_sofun(siteinfo = siteinfo, params_siml = params_siml)
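
The returned siteinfo object now carries the simulation parameters in the nested params_siml column. A quick check (nothing more than an inspection):

siteinfo %>% dplyr::select(sitename, params_siml)
siteinfo$params_siml[[1]]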

Define model parameters

First, let’s do it by hand (calibration of parameters is shown later).

params_modl <- list(
    kphio           = 0.05,
    soilm_par_a     = 1.0,
    soilm_par_b     = 0.0,
    vpdstress_par_a = 0.2,
    vpdstress_par_b = 0.2,
    vpdstress_par_m = 5
    )

Define soil parameters

For now, this is implemented for illustration only and should be made site-specific. The values entered here currently have no effect.

df_soiltexture <- bind_rows(
  top    = tibble(layer = "top",    fsand = 0.4, fclay = 0.3, forg = 0.1, fgravel = 0.1),
  bottom = tibble(layer = "bottom", fsand = 0.4, fclay = 0.3, forg = 0.1, fgravel = 0.1)
)

Get input

Input data, used as model forcing, is collected using the ingestr package. A brief description of how to use it for the present application is provided here. Data is collected separately for each data source.

Meteo data

The following ingests meteorological data from the FLUXNET 2015 files for the variables temperature, precipitation, VPD, shortwave incoming radiation, and atmospheric pressure. Arguments that are specific to this data source are provided in the settings list. The data object ddf_fluxnet is organised as a nested table with rows for sites and time series nested inside the column data. See here for how to handle nested data frames.

library(ingestr)
ddf_fluxnet <- ingest(
  siteinfo  = siteinfo %>% filter(sitename %in% mysites),
  source    = "fluxnet",
  getvars   = list(temp = "TA_F_DAY", prec = "P_F", vpd  = "VPD_F", ppfd =  "SW_IN_F", patm = "PA_F"),
  dir       = paste0(path.package("rsofun"), "/extdata/"),
  settings  = list(dir_hh = paste0(path.package("rsofun"), "/extdata/"), dir_hr = paste0(path.package("rsofun"), "/extdata/"), getswc = FALSE),
  timescale = "d"
  )
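
To get a feel for the nested structure, the daily time series of a single site can be pulled out of the data column described above, for example:

## first rows of the daily meteo time series for the first site
ddf_fluxnet$data[[1]] %>% head()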

Some meteorological data (here: cloud cover) is not available from FLUXNET. We extract it from the CRU global climate files instead.

ddf_cru <- ingest(
  siteinfo  = siteinfo %>% filter(sitename %in% mysites),
  source    = "cru",
  getvars   = list(ccov = "cld"),
  dir       = "~/data/cru/ts_4.01/"
  )
## Warning in .varName(nc, varname, warn = warn): varname used is: cld
## If that is not correct, you can set it to one of: cld, stn

Combine the two meteo data frames into one, containing ccov (cloud cover) from CRU and all other variables from FLUXNET.

ddf_meteo <- ddf_fluxnet %>% 
  tidyr::unnest(data) %>% 
  left_join(
    ddf_cru %>% 
      tidyr::unnest(data),
    by = c("sitename", "date")
  ) %>% 
  group_by(sitename) %>% 
  tidyr::nest()
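
A quick sanity check (a sketch, assuming the variable names used above) confirms that ccov from CRU now sits next to the FLUXNET variables:

ddf_meteo %>% 
  tidyr::unnest(data) %>% 
  dplyr::select(sitename, date, temp, ppfd, ccov) %>% 
  head()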

fAPAR data

fAPAR data is prescribed in the P-model setup. The following extracts MODIS FPAR data from Google Earth Engine (see a complete description of how to set up the GEE API and download library here).

The following example is for downloading MODIS FPAR data (MODIS/006/MCD15A3H, band Fpar).

settings_gee <- get_settings_gee( 
  bundle = "modis_fpar", 
  python_path = system("which python", intern = TRUE),
  gee_path = "~/google_earth_engine_subsets/gee_subset/",
  data_path = "~/data/gee_subsets/",
  method_interpol = "linear",
  keep = FALSE,
  overwrite_raw = FALSE,
  overwrite_interpol = TRUE
  )

These settings can now be used to download the data to the directory specified by the data_path argument of get_settings_gee() and to read the data into R.

df_gee_modis_fpar <- ingest(
  siteinfo  = siteinfo %>% filter(sitename %in% mysites),
  source = "gee",
  settings = settings_gee,
  verbose = FALSE
  )
## Parsed with column specification:
## cols(
##   id = col_character(),
##   longitude = col_double(),
##   latitude = col_double(),
##   date = col_date(format = ""),
##   Fpar = col_double(),
##   FparLai_QC = col_double(),
##   product = col_character()
## )
## linear ...
## done.
## Warning: Filling values with last available data point at head
## Warning: Filling values with last available data point at tail.
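
A quick plot of the interpolated time series helps to verify the interpolation. This is a sketch and assumes that the ingested fAPAR variable is returned in a column named fapar:

library(ggplot2)
df_gee_modis_fpar$data[[1]] %>% 
  ggplot(aes(x = date, y = fapar)) +
  geom_line()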

CO2

Ingesting CO2 data is particularly simple. We can safely assume that it is well mixed in the atmosphere (independent of site location) and use an annual mean value for all days in the respective years.

df_co2 <- ingest(
  siteinfo  = siteinfo %>% filter(sitename %in% mysites),
  source = "co2",
  verbose = FALSE,
  settings = list(path = "~/data/co2/cCO2_rcp85_const850-1765.csv")
  )
## Parsed with column specification:
## cols(
##   year = col_double(),
##   co2 = col_double()
## )

Collect all drivers

Finally, we can collect forcing data, simulation parameters, and site meta info into a single object that will be used to drive rsofun.

df_drivers <- collect_drivers_sofun( 
  siteinfo       = siteinfo %>% filter(sitename %in% mysites),
  meteo          = ddf_meteo, 
  fapar          = df_gee_modis_fpar,
  co2            = df_co2,
  df_soiltexture = df_soiltexture
  )
df_drivers
## # A tibble: 1 x 5
##   sitename     params_siml       siteinfo df_soiltexture  forcing          
##   <chr>    <list<df[,18]>> <list<df[,12]> <list>          <list>           
## 1 FR-Pue          [1 × 18]       [1 × 12] <tibble [2 × 5… <tibble [2,920 ×…

If the data ingestion failed, the drivers data frame df_drivers can alternatively be obtained from the demo data provided with this package.

Run the model

Run the model for all the sites specified in the first step.

## run for a single site
mod <- run_sofun_f_bysite( 
  df_drivers$sitename[1], 
  df_drivers$params_siml[[1]], 
  df_drivers$siteinfo[[1]], 
  df_drivers$forcing[[1]], 
  df_drivers$df_soiltexture[[1]], 
  params_modl = params_modl, 
  makecheck = TRUE 
  )
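
## Quick look at the first days of the single-site output returned above
## (the exact set of output columns depends on the rsofun version)
head(mod)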

## Run for the full set of sites
ptm <- proc.time()
df_output <- runread_sofun_f(
     df_drivers, 
     params_modl = params_modl, 
     makecheck = TRUE,
     parallel = FALSE
     )
print(proc.time() - ptm)
##    user  system elapsed 
##   3.497   0.348   8.379
# microbenchmark::microbenchmark(
#   runread_sofun_f(
#     df_drivers, 
#     params_modl = params_modl, 
#     makecheck = TRUE,
#     parallel = TRUE,
#     ncores = 4
#     ),
#   runread_sofun_f(
#     df_drivers, 
#     params_modl = params_modl, 
#     makecheck = TRUE,
#     parallel = FALSE
#     ),
#   times = 5,
#   units = 's'
# )

library(ggplot2)
df_output$data[[1]] %>% 
  ggplot(aes(x=date, y=gpp)) +
  geom_line() + 
  labs(title = df_output$sitename[[1]], subtitle = "SOFUN output")

Calibrate

Define calibration settings.

settings_calib <- list(
  method              = "gensa",
  targetvars          = c("gpp"),
  timescale           = list( gpp = "d" ),
  maxit               = 5, # 5 for gensa, 30 for optimr
  sitenames           = "FR-Pue",
  metric              = "rmse",
  dir_results         = "./",
  name                = "ORG",
  par                 = list( kphio = list( lower=0.03, upper=0.07, init=0.0496 ) )
 )

Use the ingestr package once again, now for collecting the calibration target data, that is, GPP based on the night-time flux partitioning method.

settings_ingestr_fluxnet <- list(
  dir_hh = "~/data/FLUXNET-2015_Tier1/20191024/HH/", 
  getswc = FALSE,
  filter_ntdt = TRUE,
  threshold_GPP = 0.8,
  remove_neg = FALSE
  )

ddf_fluxnet_gpp <- ingestr::ingest(
  siteinfo = filter(siteinfo, sitename == "FR-Pue"),
  source    = "fluxnet",
  getvars = list(gpp = "GPP_NT_VUT_REF",
                 gpp_unc = "GPP_NT_VUT_SE"),
  dir = "~/data/FLUXNET-2015_Tier1/20191024/DD/",
  settings = settings_ingestr_fluxnet,
  timescale = "d"
  )

Calibrate the model.

set.seed(1982)
settings_calib <- calib_sofun(
  df_drivers = filter(df_drivers, sitename %in% settings_calib$sitenames),  # use only one site
  ddf_obs = ddf_fluxnet_gpp,
  settings = settings_calib
  )
## [1] 0.03008955
## [1] "writing output from GenSA function to .//out_gensa_ORG.Rdata"
## [1] "writing calibrated parameters to .//params_opt_ORG.csv"

The calibrated parameters are returned by calib_sofun() as part of the list:

print(settings_calib$par_opt)
##      kphio 
## 0.03008955

Evaluate

Run the model once again with these parameters and evaluate results.

params_modl$kphio <- settings_calib$par_opt["kphio"]

df_output <- runread_sofun_f(
     df_drivers, 
     params_modl = params_modl, 
     makecheck = TRUE,
     parallel = FALSE
     )

Get evaluation data (benchmarking data).

## get data for different time scales separately
settings_fluxnet <- list(
  getswc = FALSE,
  filter_ntdt = TRUE,
  threshold_GPP = 0.8,
  remove_neg = FALSE
  )

ddf_eval <- ingest(
  siteinfo  = siteinfo %>% filter(sitename %in% mysites),
  source    = "fluxnet",
  getvars   = list(gpp = "GPP_NT_VUT_REF",
                   gpp_unc = "GPP_NT_VUT_SE"),
  dir       = paste0(path.package("rsofun"), "/extdata/"),
  settings  = settings_fluxnet,
  timescale = "d"
  )

mdf_eval <- ingest(
  siteinfo  = siteinfo %>% filter(sitename %in% mysites),
  source    = "fluxnet",
  getvars   = list(gpp = "GPP_NT_VUT_REF",
                   gpp_unc = "GPP_NT_VUT_SE"),
  dir       = paste0(path.package("rsofun"), "/extdata/"),
  settings  = settings_fluxnet,
  timescale = "m"
  )

adf_eval <- ingest(
  siteinfo  = siteinfo %>% filter(sitename %in% mysites),
  source    = "fluxnet",
  getvars   = list(gpp = "GPP_NT_VUT_REF",
                   gpp_unc = "GPP_NT_VUT_SE"),
  dir       = paste0(path.package("rsofun"), "/extdata/"),
  settings  = settings_fluxnet,
  timescale = "y"
  )

Use rsofun to create a standardised object used for benchmarking the model output.

settings_eval <- list(
  benchmark = list( gpp = c("fluxnet2015") ),
  sitenames = mysites,
  agg       = 8  # number of days defining the bin width when aggregating daily data to multi-day periods
  )
obs_eval <- collect_obs_eval( 
  siteinfo = siteinfo %>% filter(sitename %in% mysites),
  settings = settings_eval, 
  adf = adf_eval, 
  mdf = mdf_eval, 
  ddf = ddf_eval 
  )

obs_eval is now a list of data frames for different temporal resolutions. The data frames have rows for sites and time series for each site nested inside the column data.

df_output is the model output, also a data frame with rows for sites and time series for each site nested inside a column named data.
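
Before evaluating, it can be useful to list which temporal aggregations are available in obs_eval (the element names depend on the data provided; this is just a quick look):

names(obs_eval)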

And finally do the evaluation.

out_eval <- eval_sofun( 
  df_output, 
  settings_eval, 
  siteinfo,       # simulation settings, as returned by prepare_setup_sofun() above
  obs_eval = obs_eval, 
  overwrite = TRUE, 
  light = FALSE 
  )
## [1] "-----------------------------------"
## [1] "gpp"
## [1] "-----------------------------------"
## [1] "Aggregating model outputs..."
## [1] "Evaluate annual values..."
## [1] "Evaluate monthly values..."
## [1] "Evaluate spatial values..."
## [1] "Evaluate interannual variability..."
## [1] "Evaluate mean seasonal cycle..."
## [1] "Evaluate mean seasonal cycle by climate zones..."
## [1] "Evaluate inter-day variability..."
## [1] "Evaluate mean seasonal cycle by X-day periods..."
## [1] "Evaluate inter-X-day variability..."
## [1] "Done with eval_sofun()."

Print some results.

out_eval$gpp$fluxnet2015$metrics$xdaily_pooled
## # A tibble: 1 x 5
##     rsq  rmse slope   bias nvals
##   <dbl> <dbl> <dbl>  <dbl> <int>
## 1 0.461  1.27 0.955 -0.245   355
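
Further metrics tables are available in the same list; to see what is there (names depend on the evaluation settings):

names(out_eval$gpp$fluxnet2015$metrics)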

Get the rbeni R package for nice plotting functions that can be used with the output of eval_sofun().

library(rbeni)
## 
## Attaching package: 'rbeni'
## The following object is masked _by_ '.GlobalEnv':
## 
##     add_alpha
## The following object is masked from 'package:ingestr':
## 
##     remove_outliers
gg <- out_eval$gpp$fluxnet2015$data$xdf %>% analyse_modobs2("mod", "obs", type = "heat")
## Loading required package: LSD
## Loading required package: ggthemes
## Loading required package: RColorBrewer
gg$gg +
  labs(title = "FR-Pue: modelled vs. observed GPP", 
       x = expression(paste("Modelled GPP (gC m"^{-2}, "d"^{-1}, ")")), 
       y = expression(paste("Observed GPP (gC m"^{-2}, "d"^{-1}, ")")))
## `geom_smooth()` using formula 'y ~ x'