There are cases in which we may want to run matilda functions on multiple objects,
for example across different parameter sets or across different scenarios.
In this tutorial we will briefly address how to conduct this type of analysis
using the base R lapply() and Map() functions.
We will go through these processes, all using looping functions:
- Running iterate_model()
- Scoring runs with score_runs()
- Computing metrics with metric_calc()
- Computing probabilities with prob_calc()
The packages we need for this tutorial start and end with matilda and ggplot2;
the looping functions we will use are all part of base R. ggplot2 will be used for
plotting, and if you have not installed matilda yet you will have to do that first
with the following code:
# Install Matilda from Github
remotes::install_github("jgcri/matilda")
If you already have matilda
installed we can proceed
with loading the package:
# Loading packages
library(matilda)
## Loading required package: hector
library(ggplot2)
matilda is integrated with the Hector simple climate model, and therefore running
matilda closely resembles running Hector.
For this tutorial we will run matilda
with different
scenarios:
# locate the ini directory
ini_dir <- paste0(system.file("input", package = "hector"), "/")
# read in ini files into a list
ini_list <- list(ssp119 = paste0(ini_dir, "hector_ssp119.ini"),
                 ssp245 = paste0(ini_dir, "hector_ssp245.ini"),
                 ssp370 = paste0(ini_dir, "hector_ssp370.ini"),
                 ssp585 = paste0(ini_dir, "hector_ssp585.ini"))
By creating a list we can take advantage of lapply(), which loops through each
element of the list and applies the function we supply.
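As a quick, self-contained illustration of the pattern (not matilda-specific),
lapply() returns a list with one result per input element:
# a minimal lapply() illustration: square each element of a named list
squared <- lapply(list(a = 1, b = 2, c = 3), function(x) x^2)
squared$b   # returns 4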
Before running the model we will use one of the ini files to initiate a core that will be used to produce parameters for the model.
# Using generate_params() to build the perturbed parameter set
# set seed for replication
set.seed(123)
# set sample size (we will run with a small sample size for now)
n = 10
# initiate a core using the first ini file in ini_list
params_core <- newcore(ini_list[[1]])
# produce parameter set
params <- generate_params(core = params_core,
                          draws = n)
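If you would like a quick look at the perturbed parameter set before running the
model (the exact columns depend on your matilda version), you can print the first
few rows:
# inspect the first few parameter draws
head(params)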
We will use lapply() to loop across the elements of ini_list, running the model for
each element (scenario). The anonymous function will first initiate a core for each
ini file in ini_list and then run the model using the params produced above.
# using element names in ini_list, loop through each scenario_name and run the model
model_results <- lapply(names(ini_list), function(scenario_name){

  # extract scenario information from each element in the ini_list by scenario name
  scenario = ini_list[[scenario_name]]

  # initiate core for the scenario
  core = newcore(scenario, name = scenario_name)

  # run the model
  result = iterate_model(core = core,
                         params = params,
                         save_years = 1800:2100,
                         save_vars = GMST())
})
This will take a minute to run. The result is a list called model_results, where
each element is a matilda result data frame. There will be one element for each of
the scenarios in ini_list.
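If you want to verify the structure, a quick base R check (not required for the
workflow) is:
# confirm there is one element per scenario and peek at the first result data frame
length(model_results)
head(model_results[[1]])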
We can use the same method for scoring the results in
model_results
.
# loop through results in model_results to compute model weights for each
score_list <- lapply(model_results, function(df){

  # use observed temperature to compute model weights
  scores_temp <- score_runs(df,
                            criterion = criterion_gmst_obs(),
                            score_function = score_bayesian)

  # remove any NAs -- this is important to keep the alignment of results and scores correct
  scores_temp = na.omit(scores_temp)
})
Here, we are not looping through model_results using names, because (1) the names
are NULL and (2) we don't need to pass names to any piece of the anonymous function.
We only need to tell R to loop through each element in the model_results list to
compute scores.
The result is a new list of scores called score_list. Similar to before, each
element is a data frame containing the scores (model weights) for the corresponding
data frame in model_results.
At this point we can merge our model weights with our model results.
To do this we will use the Map()
function. This function is
used to apply a function across corresponding elements of multiple
lists.
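As a small, self-contained illustration (not matilda-specific), Map() takes the
i-th element of each list and passes them together to the supplied function:
# a minimal Map() illustration: add corresponding elements of two lists
Map(`+`, list(1, 2, 3), list(10, 20, 30))   # returns list(11, 22, 33)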
# merge results df in model_results with the corresponding scores in score_list
# by the run_number column
scored_result_list <- Map(merge, model_results, score_list, by = "run_number")
# bind the elements in scored_result_list to build data frame for plotting
scored_result_df <- do.call(rbind, scored_result_list)
Now we can plot the weighted runs for each scenario:
ggplot(data = scored_result_df) +
  geom_line(
    aes(x = year,
        y = value,
        group = run_number,
        color = weights)) +
  scale_color_continuous() +
  facet_wrap(~scenario)
Before we can calculate metrics, we need to establish our metric of interest.
my_metric <- new_metric(var = GMST(), years = 2090:2100, op = median)
Now we can calculate metrics for each of the data frames in
model_results
. We will follow the same
lapply()
method.
# loop through results and compute metrics for each
metric_list <- lapply(model_results, function(df){

  # remove any NAs in the current df (this is important for alignment of data frames)
  df_na.rm <- na.omit(df)

  # calculate metrics using the metric we defined in the previous step
  metric_calc(df_na.rm, my_metric)
})
# merge the metrics_list dfs with score_list
scored_metric_list <- Map(merge, metric_list, score_list, by = "run_number")
The final step above merges the metric_list elements with the score_list elements. This will make computing the probabilities much easier.
Now that we have everything we need to compute probabilities of
warming we can use prob_calc()
to compute those values for
each scenario.
# Define metric ranges
temp_ranges <- c(1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, Inf)

# compute probabilities for each element in scored_metric_list
probability_result <- lapply(scored_metric_list, function(df){

  # use prob_calc to compute probabilities
  prob_calc(metrics = df$metric_result,
            bins = temp_ranges,
            scores = df$weights)
})
We also want to add a scenario column to the data frames in probability_result.
This column does not yet exist, but it will be helpful when plotting. To complete
this step we can take advantage of Map() once again.
# scenario names
scenario_id <- c("ssp119", "ssp245", "ssp370", "ssp585")

# Use Map to add scenario names to probability_result
probability_result <- Map(function(df, scenario_id){

  # add a scenario column filled with the corresponding id from the scenario_id vector
  df$scenario <- scenario_id
  return(df)

}, probability_result, scenario_id)
Here we are defining an anonymous function to add the scenario column to each data
frame in the probability_result list. Finally, we can bind the probability_result
list into a single data frame.
# bind results to data frame
probability_result_df <- do.call(rbind, probability_result)
With the resulting data frame we can compare probabilities of warming across
scenarios or plot stacked bar graphs as presented in the matilda v1.0 paper
(Brown et al. 2023).
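As a rough sketch of such a stacked bar graph, assuming the prob_calc() output in
probability_result_df contains columns named bins and probability (check
names(probability_result_df) and adjust if yours differ):
# hypothetical stacked bar plot of warming probabilities by scenario;
# the bins and probability column names are assumptions -- verify them first
ggplot(data = probability_result_df) +
  geom_col(aes(x = scenario, y = probability, fill = bins)) +
  labs(y = "Probability", fill = "Warming range (degrees C)")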
In closing, it is important to consider that this workflow not only simplifies
running matilda functions across multiple scenarios, but can also be applied to
running matilda functions with different perturbed parameter sets, or even a
combination of both.
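For example, a minimal sketch of the parameter-set case might look like the
following (hypothetical object names; it reuses params_core and the ssp245 ini file
from above, with two parameter sets of different sizes):
# hypothetical example: loop over multiple perturbed parameter sets for one scenario
set.seed(456)
param_list <- list(draws_10 = generate_params(core = params_core, draws = 10),
                   draws_25 = generate_params(core = params_core, draws = 25))

results_by_params <- lapply(param_list, function(p){
  # fresh core for the ssp245 scenario
  core <- newcore(ini_list[["ssp245"]], name = "ssp245")
  # run the model with the current parameter set
  iterate_model(core = core,
                params = p,
                save_years = 1800:2100,
                save_vars = GMST())
})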