Background

Once you have produced a result from matilda there are a number of useful ways to visualize the data. We will cover some of those in a series of tutorials to help you plot Matilda results, including:

  1. Spaghetti plot - these are plots aimed to visualize the trajectories of all model iterations and the weights associated with each.

  2. Probability bar plot - these are stacked bar plots colored by variable ranges (e.g., temperature) used to compute probabilities in the matilda workflow.

  3. Probability with range distinction - these are similar to stacked bar plots but in the form of a probability distribution rather than a stacked bar.

  4. Median value with confidence ribbon - classic plotting technique for climate model projections.

This is Part 1 of the tutorial series and will cover the Spaghetti Plot!

library(matilda)
library(ggplot2)

Spaghetti plot

The goal of this plot is to visualize all the results and provide some context as to why certain results have higher weights than others.

Data

If you are plotting in a session different from the original analysis, ensure you have the data needed to complete the plots.

For plot type 1 (spaghetti plot) we need at a minimum the results produced after running iterate_model as well as the associated weights for each run.

The data frame should look something like this:

##    run_number scenario year variable       value units    weights
## 1           1   ssp119 1800     gmst -0.05391733  degC 0.01581314
## 2           1   ssp119 1801     gmst -0.04609289  degC 0.01581314
## 3           1   ssp119 1802     gmst -0.03992357  degC 0.01581314
## 4           1   ssp119 1803     gmst -0.03456811  degC 0.01581314
## 5           1   ssp119 1804     gmst -0.02978247  degC 0.01581314
## 6           1   ssp119 1805     gmst -0.02637711  degC 0.01581314
## 7           1   ssp119 1806     gmst -0.02310283  degC 0.01581314
## 8           1   ssp119 1807     gmst -0.02038021  degC 0.01581314
## 9           1   ssp119 1808     gmst -0.01854424  degC 0.01581314
## 10          1   ssp119 1809     gmst -0.29648851  degC 0.01581314

Plotting

All plotting will be completed using ggplot2. If you are unfamiliar with ggplot2 there are several well-written tutorials online.

We will build this plot over a series of steps. First plotting the data without an color distinction, then adding color and finally overlaying observed data.

The final plot will present data as a line graph grouped by run_number with color and alpha (transparency) level coded by the weight. This will give a continuous color an transparency gradient across model weights. Member of the ensemble of model runs that are more aligned with the observed data will appear darker and more opaque than ensembles members that diverge significantly from the observed data.

In this plot the x-axis will represent the year and the y-axis will represent the values of the variable we want to plot. In this example we will plot values for gmst or global mean surface temperature anomaly as our variable of interest. This is currently the only variable in the results data frame. If you have other variables in the results data frame you will need to subset the data frame to include the information you want to plot.

gmst_spaghetti_plot <- 
  ggplot(data = plot_1_df) + # sets the plotting environment
  geom_line( # identifies plot type
    aes(
      x = year, # identifies x-axis variable
      y = value, # identifies y-axis variable
      group = run_number), # identifies how data should be grouped 
    linewidth = 1) + # sets line width - personal preference
  facet_wrap(~ scenario) + # panels the figure by scenario name
  labs(x = "Years", y = "Temperature Anomaly (C)") + # adds labels to axes
  ggtitle(label = "Matilda Ensemble weighted by historical temperature") +
  theme_light() # sets a general theme - personal preference

gmst_spaghetti_plot

Now we can add the color distinction to represent ensemble weights:

gmst_spaghetti_plot <- 
  ggplot(data = plot_1_df) + # sets the plotting environment
  geom_line( # identifies plot type
    aes(
      x = year, # identifies x-axis variable
      y = value, # identifies y-axis variable
      group = run_number, # identifies how data should be grouped
      color = weights, # how color should be applied
      alpha = weights), # how transparency should be applied
    linewidth = 1) + # sets line width - personal preference
  scale_color_gradient(low = "skyblue", high = "dodgerblue4", name = "Weights") + # what colors should be used
  scale_alpha_continuous(range(c(0.8, 1))) + # how extreme should the transparency gradient be - this is optional
  guides(alpha = "none") + # remove alpha legend 
  facet_wrap(~ scenario) + # panels the figure by scenario name
  labs(x = "Years", y = "Temperature Anomaly (C)") + # adds labels to axes
  ggtitle(label = "Matilda Ensemble weighted by historical temperature") +
  theme_light() # sets a general theme - personal preference 

gmst_spaghetti_plot

Assigning the colors used is partially a personal preference but a spectrum of a single color works best for me. Setting low and high colors in the gives the color gradient direction of how high and low values should be colored.

Transparency (using a continuous alpha range) is an optional setting and can be adjusted or removed based on personal preference. I use this to increase the contrast of models that have high weights vs. models that have lower weights.

These plots are significantly more clear (and impressive with higher numbers of runs).

We can add a line to this plot to represent the observed data used to weight the ensemble. In this case we will add the observed temperature from 1950-2023.

# create data frame with observed data using inbuilt criterion
obs_data <- data.frame (
  year = criterion_gmst_obs()$year,
  obs_values = criterion_gmst_obs()$obs_values
)

head(obs_data)
##   year  obs_values
## 1 1950 -0.07349151
## 2 1951  0.09197631
## 3 1952  0.16848484
## 4 1953  0.23076102
## 5 1954  0.03638007
## 6 1955 -0.04417965

Add observed data to gmst_spaghetti_plot:

gmst_spghetti_with_obs <- 
  gmst_spaghetti_plot +
  geom_line(
    data = obs_data, 
    aes(
      x = year, 
      y = obs_values), 
    linewidth = 0.5, 
    color = "red")

gmst_spghetti_with_obs

You can see from the final plot that Matilda ensemble members that follow the observed data more closely are weight higher than those that diverge more significantly. This relationship becomes more apparent as the number of runs increases.