Installing Matilda

First we need to install Matilda from source the repo lives at JGCRI/matilda.

library(remotes)

# installing matilda
install_github("jgcri/matilda")
## Skipping install of 'matilda' from a github remote, the SHA1 (308b2cc4) has not changed since last install.
##   Use `force = TRUE` to force installation
library(matilda)
## Loading required package: hector

What is the purpose of Matilda

Matilda was built to do 3 critical things at its core:

  1. Produce ensembles of model parameter values based on a prior probability distribution.
  2. Use the parameter ensembles to run hector iteratively with each parameter ensemble. This results in what we call a peturbed parameter ensemble or (PPE).
  3. Analyze each model run against known (observed) climate responses to identify model results that are most likely based on a historical context.

How do we build a set of parameters?

In Matilda, model parameters exist as distirbutions based on information gathered from the mean, SD, and distirbution shape defined in the literature. Most distributions are normal or lognormal. The distributions can be found in the original documentation for Matilda, Brown et al. 2024.

Most functions to begin a Matilda analysis reqiuires a Hector core:

ini <- system.file("input/hector_ssp245.ini", package = "hector")
hcore <- newcore(ini, name = 'ssp245')

The Hector core contains the parameters and the initial values of the parameters (means). The distributions of those parameters are programmed into Matilda. We can use the generate_params() functions to create a new set or parameters:

# generate a data frame of parameters
parameter_values <- generate_params(hcore, draws = 10)
head(parameter_values)
##        BETA    Q10_RH NPP_FLUX0 AERO_SCALE DIFFUSIVITY      ECS
## 1 0.5628201 4.9457858  59.30919  0.6014690   1.1101596 2.593385
## 2 0.7126109 2.9332676  54.56850  0.6914578   1.0845902 2.934814
## 3 0.6572886 0.2481591  39.51428  1.3411496   1.0015714 3.309064
## 4 0.6495781 0.8674813  70.90611  1.2718645   1.0618324 3.202467
## 5 0.5657010 1.5865923  59.15299  1.0384514   0.9045734 2.034566
## 6 0.7668448 2.8802898  44.88327  1.3177796   0.9701541 3.552721

In this output we can see that each of the parameters in the model from row one to row six, has a slightly different configuration. This is how we implement uncertainty into Hector.

Sometimes we want to vary all the parameter in the way that we did above. This gives us a full spectrum oof uncertainty. However, this can get complex rather quickly. When we vary every single parameter and run the model with all parameter sets, we can’t tell exactly which parameter is contributing to the variation in model outputs the most, and that can be important information.

For example, sometimes we want to run what is called a sensitivity test. This is a test to determine how sensitive a model is to variations in a single parameter or to a combination of parameters. In other words, how much of spread in model outputs is related directly to the ECS parameter, or the BETA parameter, or a few parameters that interact closely, like BETA, Q_10, and NPP. In cases where we want to quantify the effect of a single parameter on the model outputs, we will want to see how the model behaves to varying a single parameter.

We can do this simply by dropping all other columns in our parameter_values dataframe:

beta_values <- parameter_values[, "BETA", drop = F]

There are several ways to do this exact same parameter isolation step, here is another simpler method:

beta_values_alt <- parameter_values["BETA"]

Do you remember what the [] are telling R to do in this type of code?

Here is another intuitive method that uses functions in the tidyverse family of packages, which is great for data organization, I encourage you to learn some of the things you can do with dplyr and tidyr:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
beta_values_alt2 <- 
  parameter_values %>% 
  select(BETA)

You should notice these all result in the exact same output!

We only need one of these subsets for this exercise, so I will take out the extras to keep the environment clean:

# use rm() to remove elements from you environment as needed.
# usually best to perform this in the console if it is not part of the analysis.
rm(beta_values_alt, beta_values_alt2)

Running Hector iterations

The function iterate_model() is used to run Hector with each of the parameter ensembles we have created in the parameter_values df. Just like when we run Hector, we will also need to supply our Hector core:

# running hector using all parameters 
model <- iterate_model(
  core = hcore, 
  params = parameter_values,
  save_years = 1850:2100, 
  save_vars = c(GMST(), CONCENTRATIONS_CO2())
)

These results can now be plotted with ggplot2:

library(ggplot2)
plot <- ggplot(data = model) +
  geom_line(
    aes(
      x = year, 
      y = value, group = run_number)) +
  facet_wrap(~variable, scales = "free_y") +
  labs(x = "Years", y = "Values") +
  theme_light()

plot

This is what the results look like if we were to vary all the parameters in the model. But what if we only wanted to vary BETA? We can do this with using iterate_model with our beta_values dataframe:

model2 <- iterate_model(hcore, 
                        params = beta_values, 
                        save_years = 1850:2100, 
                        save_vars = c(GMST(), CONCENTRATIONS_CO2())) # what this do?

Now we can plot this result. How does it compare to the output where we vary all the parameters?

plot2 <- ggplot(data = model2) +
  geom_line(
    aes(
      x = year, 
      y = value, group = run_number)) +
  facet_wrap(~variable, scales = "free_y") +
  labs(x = "Years", y = "Values") +
  theme_light()

plot2

What does this look like if we want to get a better understanding of how different BETA values produce different model results? Similar to the Hector tutorial on senstivity analysis, we can color. each model run based on the BETA value used to run the model.

First we have to edit the data a bit. We need to add a run_number column to the parameter set, then we can use it as a key to join the BETA values to the model outputs:

# Add run_number to parameter set
beta_values_rn <- beta_values %>% 
  mutate(run_number = row_number())

# Use run_number to add correct BETA value to each model run
model2_plot <- 
  model2 %>% 
  left_join(beta_values_rn, by = "run_number")

Now we can plot:

plot3 <- ggplot(data = model2_plot) +
  geom_line(
    aes(
      x = year, 
      y = value, 
      group = run_number, 
      color = BETA)) +
  scale_color_viridis_c(name = "BETA", option = "viridis") +
  facet_wrap(~variable, scales = "free_y") +
  labs(x = "Years", y = "Values") +
  theme_light()

plot3

What did we learn from above?

In the full-parameter ensemble, many parameters vary at the same time, so the model spread reflects multiple sources of uncertainty (plot 1). In the BETA-only ensemble, only the CO2 fertilization parameter varies, so the spread of model runs can be interpreted as the model response to uncertainty in the CO2 fertilization parameter.

By coloring each model run by its BETA value, we can see whether and how higher/lower CO2 fertilization values produce different atmos CO2 and GMST outcomes. This is the first step toward asking how uncertainty in a biological parameter propagates into some climate-relevant model outputs.

What next?

In the next phase, you will construct code similar to above in order to run Hector-Matilda, varying BETA values on the SSP2-4.5 climate scenario. You will pull data from 1850-2100 for the variables VEG_C (vegetative carbon stock, how much total carbon is in vegetation), SOIL_C (soil carbon stock, how much total carbon is in soil), NPP (Net primary production, rate at which new carbon is added to vegetation), NBP (net biome production; total carbon flux into or out of the terrestrial pool), RH (heterotrophic respiration; the rate at which microbes are breaking down carbon and emitting it back into the atmosphere),CONCENTRATIONS_CO2 (atmospheric CO2 concentration), and GMST (global mean surface temperature).

You will create a plot for each variable. For now, our goal is to run a small BETA-only Hector-Matilda experiment and learn how to visualize and interpret the outputs. We are not trying to draw final conclusions yet. We are building the workflow that will let us ask which model outputs are most sensitive to variations in the CO2 fertilization factor (BETA).