First we need to install Matilda from source the repo lives at
JGCRI/matilda.
library(remotes)
# installing matilda
install_github("jgcri/matilda")
## Skipping install of 'matilda' from a github remote, the SHA1 (308b2cc4) has not changed since last install.
## Use `force = TRUE` to force installation
library(matilda)
## Loading required package: hector
Matilda was built to do 3 critical things at its core:
In Matilda, model parameters exist as distirbutions based on information gathered from the mean, SD, and distirbution shape defined in the literature. Most distributions are normal or lognormal. The distributions can be found in the original documentation for Matilda, Brown et al. 2024.
Most functions to begin a Matilda analysis reqiuires a Hector core:
ini <- system.file("input/hector_ssp245.ini", package = "hector")
hcore <- newcore(ini, name = 'ssp245')
The Hector core contains the parameters and the initial values of the
parameters (means). The distributions of those parameters are programmed
into Matilda. We can use the generate_params() functions to
create a new set or parameters:
# generate a data frame of parameters
parameter_values <- generate_params(hcore, draws = 10)
head(parameter_values)
## BETA Q10_RH NPP_FLUX0 AERO_SCALE DIFFUSIVITY ECS
## 1 0.5628201 4.9457858 59.30919 0.6014690 1.1101596 2.593385
## 2 0.7126109 2.9332676 54.56850 0.6914578 1.0845902 2.934814
## 3 0.6572886 0.2481591 39.51428 1.3411496 1.0015714 3.309064
## 4 0.6495781 0.8674813 70.90611 1.2718645 1.0618324 3.202467
## 5 0.5657010 1.5865923 59.15299 1.0384514 0.9045734 2.034566
## 6 0.7668448 2.8802898 44.88327 1.3177796 0.9701541 3.552721
In this output we can see that each of the parameters in the model from row one to row six, has a slightly different configuration. This is how we implement uncertainty into Hector.
Sometimes we want to vary all the parameter in the way that we did above. This gives us a full spectrum oof uncertainty. However, this can get complex rather quickly. When we vary every single parameter and run the model with all parameter sets, we can’t tell exactly which parameter is contributing to the variation in model outputs the most, and that can be important information.
For example, sometimes we want to run what is called a sensitivity
test. This is a test to determine how sensitive a model is to variations
in a single parameter or to a combination of parameters. In other words,
how much of spread in model outputs is related directly to the
ECS parameter, or the BETA parameter, or a few
parameters that interact closely, like BETA,
Q_10, and NPP. In cases where we want to
quantify the effect of a single parameter on the model outputs, we will
want to see how the model behaves to varying a single parameter.
We can do this simply by dropping all other columns in our
parameter_values dataframe:
beta_values <- parameter_values[, "BETA", drop = F]
There are several ways to do this exact same parameter isolation step, here is another simpler method:
beta_values_alt <- parameter_values["BETA"]
Do you remember what the [] are telling R to do in
this type of code?
Here is another intuitive method that uses functions in the
tidyverse family of packages, which is great for data
organization, I encourage you to learn some of the things you can do
with dplyr and tidyr:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
beta_values_alt2 <-
parameter_values %>%
select(BETA)
You should notice these all result in the exact same output!
We only need one of these subsets for this exercise, so I will take out the extras to keep the environment clean:
# use rm() to remove elements from you environment as needed.
# usually best to perform this in the console if it is not part of the analysis.
rm(beta_values_alt, beta_values_alt2)
The function iterate_model() is used to run Hector with
each of the parameter ensembles we have created in the
parameter_values df. Just like when we run Hector, we will
also need to supply our Hector core:
# running hector using all parameters
model <- iterate_model(
core = hcore,
params = parameter_values,
save_years = 1850:2100,
save_vars = c(GMST(), CONCENTRATIONS_CO2())
)
These results can now be plotted with ggplot2:
library(ggplot2)
plot <- ggplot(data = model) +
geom_line(
aes(
x = year,
y = value, group = run_number)) +
facet_wrap(~variable, scales = "free_y") +
labs(x = "Years", y = "Values") +
theme_light()
plot
This is what the results look like if we were to vary all the parameters
in the model. But what if we only wanted to vary
BETA? We
can do this with using iterate_model with our
beta_values dataframe:
model2 <- iterate_model(hcore,
params = beta_values,
save_years = 1850:2100,
save_vars = c(GMST(), CONCENTRATIONS_CO2())) # what this do?
Now we can plot this result. How does it compare to the output where we vary all the parameters?
plot2 <- ggplot(data = model2) +
geom_line(
aes(
x = year,
y = value, group = run_number)) +
facet_wrap(~variable, scales = "free_y") +
labs(x = "Years", y = "Values") +
theme_light()
plot2
What does this look like if we want to get a better understanding of
how different BETA values produce different model results?
Similar to the Hector
tutorial on senstivity analysis, we can color. each model run based
on the BETA value used to run the model.
First we have to edit the data a bit. We need to add a
run_number column to the parameter set, then we can use it
as a key to join the BETA values to the model outputs:
# Add run_number to parameter set
beta_values_rn <- beta_values %>%
mutate(run_number = row_number())
# Use run_number to add correct BETA value to each model run
model2_plot <-
model2 %>%
left_join(beta_values_rn, by = "run_number")
Now we can plot:
plot3 <- ggplot(data = model2_plot) +
geom_line(
aes(
x = year,
y = value,
group = run_number,
color = BETA)) +
scale_color_viridis_c(name = "BETA", option = "viridis") +
facet_wrap(~variable, scales = "free_y") +
labs(x = "Years", y = "Values") +
theme_light()
plot3
In the full-parameter ensemble, many parameters vary at the same time, so the model spread reflects multiple sources of uncertainty (plot 1). In the BETA-only ensemble, only the CO2 fertilization parameter varies, so the spread of model runs can be interpreted as the model response to uncertainty in the CO2 fertilization parameter.
By coloring each model run by its BETA value, we can see whether and how higher/lower CO2 fertilization values produce different atmos CO2 and GMST outcomes. This is the first step toward asking how uncertainty in a biological parameter propagates into some climate-relevant model outputs.
In the next phase, you will construct code similar to above in order
to run Hector-Matilda, varying BETA values on the SSP2-4.5 climate
scenario. You will pull data from 1850-2100 for the variables
VEG_C (vegetative carbon stock, how much total carbon is in
vegetation), SOIL_C (soil carbon stock, how much total
carbon is in soil), NPP (Net primary production, rate at
which new carbon is added to vegetation), NBP (net biome
production; total carbon flux into or out of the terrestrial pool),
RH (heterotrophic respiration; the rate at which microbes
are breaking down carbon and emitting it back into the
atmosphere),CONCENTRATIONS_CO2 (atmospheric CO2
concentration), and GMST (global mean surface
temperature).
You will create a plot for each variable. For now, our goal is to run a small BETA-only Hector-Matilda experiment and learn how to visualize and interpret the outputs. We are not trying to draw final conclusions yet. We are building the workflow that will let us ask which model outputs are most sensitive to variations in the CO2 fertilization factor (BETA).