Once you have produced a result from matilda
there are a
number of useful ways to visualize the data. We will cover some of those
in a series of tutorials to help you plot Matilda results,
including:
Spaghetti plot - these are plots aimed to visualize the trajectories of all model iterations and the weights associated with each.
Probability bar plot - these are stacked bar
plots colored by variable ranges (e.g., temperature) used to compute
probabilities in the matilda
workflow.
Probability with range distinction - these are similar to stacked bar plots but in the form of a probability distribution rather than a stacked bar.
Median value with confidence ribbon - classic plotting technique for climate model projections.
This is Part 1 of the tutorial series and will cover the Spaghetti Plot!
library(matilda)
library(ggplot2)
If you are plotting in a session different from the original analysis, ensure you have the data needed to complete the plots.
For plot type 1 (spaghetti plot) we need at a minimum the results
produced after running iterate_model
as well as the
associated weights for each run.
The data frame should look something like this:
## run_number scenario year variable value units weights
## 1 1 ssp119 1800 gmst -0.05391733 degC 0.01581314
## 2 1 ssp119 1801 gmst -0.04609289 degC 0.01581314
## 3 1 ssp119 1802 gmst -0.03992357 degC 0.01581314
## 4 1 ssp119 1803 gmst -0.03456811 degC 0.01581314
## 5 1 ssp119 1804 gmst -0.02978247 degC 0.01581314
## 6 1 ssp119 1805 gmst -0.02637711 degC 0.01581314
## 7 1 ssp119 1806 gmst -0.02310283 degC 0.01581314
## 8 1 ssp119 1807 gmst -0.02038021 degC 0.01581314
## 9 1 ssp119 1808 gmst -0.01854424 degC 0.01581314
## 10 1 ssp119 1809 gmst -0.29648851 degC 0.01581314
All plotting will be completed using ggplot2
. If you are
unfamiliar with ggplot2
there are several well-written
tutorials online.
We will build this plot over a series of steps. First plotting the data without an color distinction, then adding color and finally overlaying observed data.
The final plot will present data as a line graph grouped by
run_number
with color and alpha (transparency) level coded
by the weight. This will give a continuous color an transparency
gradient across model weights. Member of the ensemble of model runs that
are more aligned with the observed data will appear darker and more
opaque than ensembles members that diverge significantly from the
observed data.
In this plot the x-axis will represent the year
and the
y-axis will represent the values of the variable we want to plot. In
this example we will plot values for gmst
or global mean
surface temperature anomaly as our variable of interest. This is
currently the only variable in the results data frame. If you have other
variables in the results data frame you will need to subset the data
frame to include the information you want to plot.
gmst_spaghetti_plot <-
ggplot(data = plot_1_df) + # sets the plotting environment
geom_line( # identifies plot type
aes(
x = year, # identifies x-axis variable
y = value, # identifies y-axis variable
group = run_number), # identifies how data should be grouped
linewidth = 1) + # sets line width - personal preference
facet_wrap(~ scenario) + # panels the figure by scenario name
labs(x = "Years", y = "Temperature Anomaly (C)") + # adds labels to axes
ggtitle(label = "Matilda Ensemble weighted by historical temperature") +
theme_light() # sets a general theme - personal preference
gmst_spaghetti_plot
Now we can add the color distinction to represent ensemble weights:
gmst_spaghetti_plot <-
ggplot(data = plot_1_df) + # sets the plotting environment
geom_line( # identifies plot type
aes(
x = year, # identifies x-axis variable
y = value, # identifies y-axis variable
group = run_number, # identifies how data should be grouped
color = weights, # how color should be applied
alpha = weights), # how transparency should be applied
linewidth = 1) + # sets line width - personal preference
scale_color_gradient(low = "skyblue", high = "dodgerblue4", name = "Weights") + # what colors should be used
scale_alpha_continuous(range(c(0.8, 1))) + # how extreme should the transparency gradient be - this is optional
guides(alpha = "none") + # remove alpha legend
facet_wrap(~ scenario) + # panels the figure by scenario name
labs(x = "Years", y = "Temperature Anomaly (C)") + # adds labels to axes
ggtitle(label = "Matilda Ensemble weighted by historical temperature") +
theme_light() # sets a general theme - personal preference
gmst_spaghetti_plot
Assigning the colors used is partially a personal preference but a
spectrum of a single color works best for me. Setting low
and high
colors in the gives the color gradient direction
of how high and low values should be colored.
Transparency (using a continuous alpha
range) is an
optional setting and can be adjusted or removed based on personal
preference. I use this to increase the contrast of models that have high
weights vs. models that have lower weights.
These plots are significantly more clear (and impressive with higher numbers of runs).
We can add a line to this plot to represent the observed data used to weight the ensemble. In this case we will add the observed temperature from 1950-2023.
# create data frame with observed data using inbuilt criterion
obs_data <- data.frame (
year = criterion_gmst_obs()$year,
obs_values = criterion_gmst_obs()$obs_values
)
head(obs_data)
## year obs_values
## 1 1950 -0.07349151
## 2 1951 0.09197631
## 3 1952 0.16848484
## 4 1953 0.23076102
## 5 1954 0.03638007
## 6 1955 -0.04417965
Add observed data to gmst_spaghetti_plot
:
gmst_spghetti_with_obs <-
gmst_spaghetti_plot +
geom_line(
data = obs_data,
aes(
x = year,
y = obs_values),
linewidth = 0.5,
color = "red")
gmst_spghetti_with_obs
You can see from the final plot that Matilda ensemble members that follow the observed data more closely are weight higher than those that diverge more significantly. This relationship becomes more apparent as the number of runs increases.