There are cases in which we may want to run matilda functions on multiple objects,
for example across different parameter sets or across different scenarios.
In this tutorial we will briefly address how to conduct this type of analysis
using the base R lapply() and Map() functions.
We will go through these processes, all using looping functions:
- Running iterate_model()
- Scoring runs with score_runs()
- Computing metrics with metric_calc()
- Computing probabilities with prob_calc()
The packages we need for this tutorial start and end with matilda and ggplot2;
the looping functions we will use are all part of base R. ggplot2 will be used for
plotting, and if you have not installed matilda yet you will have to do that first
with the following code:
# Install Matilda from Github
remotes::install_github("jgcri/matilda")
If you already have matilda
installed we can proceed
with loading the package:
# Loading packages
library(matilda)
## Loading required package: hector
library(ggplot2)
matilda is integrated with the Hector simple climate model, and therefore running
matilda closely resembles running Hector.
For this tutorial we will run matilda
with different
scenarios:
# locate the ini directory
ini_dir <- paste0(system.file("input", package = "hector"), "/")
# read in ini files into a list
ini_list <- list(ssp119 = paste0(ini_dir, "hector_ssp119.ini"),
                 ssp245 = paste0(ini_dir, "hector_ssp245.ini"),
                 ssp370 = paste0(ini_dir, "hector_ssp370.ini"),
                 ssp585 = paste0(ini_dir, "hector_ssp585.ini"))
By creating a list we can take advantage of lapply(), which loops through each
element of the list and applies the function we supply.
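As a quick, self-contained illustration of the pattern (not matilda-specific),
lapply() returns a list with one result per input element:
# a minimal lapply() illustration: square each element of a named list
squared <- lapply(list(a = 1, b = 2, c = 3), function(x) x^2)
squared$b   # returns 4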
Before running the model we will use one of the ini files to initiate a core that will be used to produce parameters for the model.
# Using generate_params() to build the perturbed parameter set
# set seed for replication
set.seed(123)
# set sample size (we will run with a small sample size for now)
n = 10
# initiate a core using the first ini file in ini_list
params_core <- newcore(ini_list[[1]])
# produce parameter set
params <- generate_params(core = params_core,
                          draws = n)
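If you would like a quick look at the perturbed parameter set before running the
model (the exact columns depend on your matilda version), you can print the first
few rows:
# inspect the first few parameter draws
head(params)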
We will use lapply() to loop across the elements of ini_list, running the model for
each element (scenario). The anonymous function will first initiate a core for each
ini file in ini_list and then run the model using the params produced above.
# using element names in ini_list, loop through each scenario_name and run the model
model_results <- lapply(names(ini_list), function(scenario_name){

  # extract scenario information from each element in the ini_list by scenario name
  scenario = ini_list[[scenario_name]]

  # initiate core for the scenario
  core = newcore(scenario, name = scenario_name)

  # run the model
  result = iterate_model(core = core,
                         params = params,
                         save_years = 1800:2100,
                         save_vars = GMST())
})
This will take a minute to run. The result is a list called model_results, where
each element is a matilda result data frame. There will be one element for each of
the scenarios in ini_list.
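If you want to verify the structure, a quick base R check (not required for the
workflow) is:
# confirm there is one element per scenario and peek at the first result data frame
length(model_results)
head(model_results[[1]])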
We can use the same method for scoring the results in
model_results
.
# loop through results in model_results to compute model weights for each
score_list <- lapply(model_results, function(df){

  # use observed temperature to compute model weights
  scores_temp <- score_runs(df,
                            criterion = criterion_gmst_obs(),
                            score_function = score_bayesian)

  # remove any NAs -- this is important to keep the alignment of results and scores correct
  scores_temp = na.omit(scores_temp)
})
Here, we are not looping through model_results using names, because (1) the names
are NULL and (2) we don't need to pass names to any piece of the anonymous function.
We only need to tell R to loop through each element in the model_results list to
compute scores.
The result is a new list of scores called score_list. Similar to before, each
element is a data frame containing the scores (model weights) for the corresponding
data frame in model_results.
At this point we can merge our model weights with our model results.
To do this we will use the Map()
function. This function is
used to apply a function across corresponding elements of multiple
lists.
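As a small, self-contained illustration (not matilda-specific), Map() takes the
i-th element of each list and passes them together to the supplied function:
# a minimal Map() illustration: add corresponding elements of two lists
Map(`+`, list(1, 2, 3), list(10, 20, 30))   # returns list(11, 22, 33)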
# merge results df in model_results with the corresponding scores in score_list
# by the run_number column
scored_result_list <- Map(merge, model_results, score_list, by = "run_number")
# bind the elements in scored_result_list to build data frame for plotting
scored_result_df <- do.call(rbind, scored_result_list)
Now we can plot the weighted runs for each scenario:
ggplot(data = scored_result_df) +
  geom_line(
    aes(x = year,
        y = value,
        group = run_number,
        color = weights)) +
  scale_color_continuous() +
  facet_wrap(~scenario)
Before we can calculate metrics, we need to establish our metric of interest.
my_metric <- new_metric(var = GMST(), years = 2090:2100, op = median)
Now we can calculate metrics for each of the data frames in
model_results
. We will follow the same
lapply()
method.
# loop through results and compute metrics for each
metric_list <- lapply(model_results, function(df){

  # remove any NAs in the current df (this is important for alignment of data frames)
  df_na.rm <- na.omit(df)

  # calculate metrics using the metric we defined in the previous step
  metric_calc(df_na.rm, my_metric)
})
# merge the metrics_list dfs with score_list
scored_metric_list <- Map(merge, metric_list, score_list, by = "run_number")
The final step above merges the metric_list elements with the score_list elements. This will make computing the probabilities much easier.
Now that we have everything we need to compute probabilities of
warming we can use prob_calc()
to compute those values for
each scenario.
# Define metric ranges
temp_ranges <- c(1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, Inf)

# compute probabilities for each element in scored_metric_list
probability_result <- lapply(scored_metric_list, function(df){

  # use prob_calc to compute probabilities
  prob_calc(metrics = df$metric_result,
            bins = temp_ranges,
            scores = df$weights)
})
We also want to add a scenario column to the data frames in probability_result.
This column does not yet exist, but it will be helpful when plotting. To complete
this step we can take advantage of Map() once again.
# scenario names
scenario_id <- c("ssp119", "ssp245", "ssp370", "ssp585")

# Use Map to add scenario names to probability_result
probability_result <- Map(function(df, scenario_id){

  # add a scenario column filled with the corresponding id from the scenario_id vector
  df$scenario <- scenario_id
  return(df)

}, probability_result, scenario_id)
Here we are defining an anonymous function to add the scenario column to each data
frame in the probability_result list. Finally, we can bind the probability_result
list into a single data frame.
# bind results to data frame
probability_result_df <- do.call(rbind, probability_result)
With the resulting data frame we can compare probabilities of warming across
scenarios or plot stacked bar graphs as presented in the matilda v1.0 paper
(Brown et al. 2023).
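As a rough sketch of such a stacked bar graph, assuming the prob_calc() output in
probability_result_df contains columns named bins and probability (check
names(probability_result_df) and adjust if yours differ):
# hypothetical stacked bar plot of warming probabilities by scenario;
# the bins and probability column names are assumptions -- verify them first
ggplot(data = probability_result_df) +
  geom_col(aes(x = scenario, y = probability, fill = bins)) +
  labs(y = "Probability", fill = "Warming range (degrees C)")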
In closing, it is important to consider that this workflow not only simplifies
running matilda functions across multiple scenarios, but can also be applied to
running matilda functions with different perturbed parameter sets, or even a
combination of both.
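For example, a minimal sketch of the parameter-set case might look like the
following (hypothetical object names; it reuses params_core and the ssp245 ini file
from above, with two parameter sets of different sizes):
# hypothetical example: loop over multiple perturbed parameter sets for one scenario
set.seed(456)
param_list <- list(draws_10 = generate_params(core = params_core, draws = 10),
                   draws_25 = generate_params(core = params_core, draws = 25))

results_by_params <- lapply(param_list, function(p){
  # fresh core for the ssp245 scenario
  core <- newcore(ini_list[["ssp245"]], name = "ssp245")
  # run the model with the current parameter set
  iterate_model(core = core,
                params = p,
                save_years = 1800:2100,
                save_vars = GMST())
})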