Once you have produced a result from matilda
there are a
number of useful ways to visualize the data. We will cover some of those
in a series of tutorials to help you plot Matilda results,
including:
Spaghetti plot - these are plots aimed to visualize the trajectories of all model iterations and the weights associated with each.
Probability bar plot - these are stacked bar
plots colored by variable ranges (e.g., temperature) used to compute
probabilities in the matilda
workflow.
Probability with range distinction - these are similar to stacked bar plots but in the form of a probability distribution rather than a stacked bar.
Median value with confidence ribbon - classic plotting technique for climate model projections.
This is Part 3 of the tutorial series and will cover the Probability Distribution Plot with range distinction (aka weighted probability distribution)!
library(matilda)
library(ggplot2)
If you are plotting in a session different from the original analysis, ensure you have the data needed to complete the plots.
For a probability distribution plot we need the weighted metrics for
each runs, the result after running metric_calc
and merging
with model likelihood weights. As a review, metric_calc
uses a user defined metric definition to compute and extract metrics
from the model results. For example, we can compute the median end of
century global mean surface temperature anomaly relative to a past
reference period.
After merging with the model weights and adding scenario name the data should look something like this:
## run_number metric_result scenario weights
## 1 1 0.7715469 ssp119 1.581314e-02
## 2 2 0.8429721 ssp119 1.785500e-02
## 3 3 1.2356367 ssp119 2.103415e-02
## 4 4 1.1842129 ssp119 1.923625e-02
## 5 5 0.7671621 ssp119 1.492278e-02
## 6 6 0.9800407 ssp119 5.276083e-03
## 7 7 0.9400672 ssp119 3.778613e-03
## 8 8 0.7499700 ssp119 6.111274e-03
## 9 9 0.9100293 ssp119 1.711479e-05
## 10 10 1.0911564 ssp119 6.053824e-07
A probability distribution is created by plotting continuous data on the x-axis (in our case warming values) and estimates the density at a specific point along the temperature range. The kernel density value is used to represent the estimated probability density of the data at each point along the warming (x-axis) range in our data. The kernel density is a useful tool for visualizing the shape of data distribution – this is what we are using it for!
We can use R functions to compute the kernel density values for our data, and will use this to plot the probability distribution.
First we will split our metric data frame into a list. The list will contain four data frames, one for each of our scenarios:
I do not include this here, but you may also want to check and make sure your scenario column is a set of factors.
# Split data into a list by scenario factor levels
scenario_list <- split(plot_3_df, plot_3_df$scenario)
Now that our data is stored in a list we can take advantage of
apply()
functions to compute kernel density values in each
data frame stored in scenario_list
. The result of the
density
function is itself a list where the contents x and
y are the coordinates we want to plot. Therefor the apply
loop we write will include the extraction and storage of those
variables:
# use lapply to loop across data frames in scenario_list to compute KDE values for each
kde_values <- lapply(names(scenario_list), function(df_name){
# store data based on name of the current df
# this way we can add a scenario column to the final results
df <- scenario_list[[df_name]]
# use that df to compute density estimates weight by `weights`
density_values <- density(df$metric_result, weights = df$weights)
# build data frame with coordinates we want to plot
density_estimate <- data.frame(scenario = df_name,
value = density_values$x,
density = density_values$y)
return(density_estimate)
})
## Warning in density.default(df$metric_result, weights = df$weights): Selecting
## bandwidth *not* using 'weights'
## Warning in density.default(df$metric_result, weights = df$weights): Selecting
## bandwidth *not* using 'weights'
## Warning in density.default(df$metric_result, weights = df$weights): Selecting
## bandwidth *not* using 'weights'
## Warning in density.default(df$metric_result, weights = df$weights): Selecting
## bandwidth *not* using 'weights'
# bind the rows of kde_values to create a data frame
kde_df <- do.call(rbind, kde_values)
With our data prepped, we can move on to plotting.
All plotting will be completed using ggplot2
. If you are
unfamiliar with ggplot2
there are several well-written
tutorials online.
The final plot from this tutorial will be probability distribution where each curve band represents the temperature probabilities for each SSP scenario we used to run our model. There are a number of freedoms that can be taken with respect to design. Here, we use a palette that pairs with 9 temperature ranges from 0 - >4.5. In this figure, the probability density curve will be divided into temperature bands colored with this palette.
Color palette:
library(scales)
temp_cols <- c("navyblue","#2166AC","#4393C3","#D1E5f0","#FDDBC7","#F4A582","#D6604D","#B2182B","#67001F")
show_col(temp_cols)
As previously mentioned, the x-axis in these plots will represent the range of possible warming outcomes and the y-axis will be density values.
# Define breaks and colors to correspond
breaks <- c(0, 1, 1.5, 2, 2.5, 3.0, 3.5, 4.0, 4.5, Inf) # define breaks for values in kde_df (kde_df$value)
# what colors we will assign to each break
colors <- c(
"navyblue",
"#2166AC",
"#4393C3",
"#D1E5f0",
"#FDDBC7",
"#F4A582",
"#D6604D",
"#B2182B",
"#67001F")
# names of the color breaks -- will be used to label the legend in the plot
legend_labels <- c("0 to 1.0 C",
"1.0 to 1.5 C",
"1.5 to 2.0 C",
"2.0 to 2.5 C",
"2.5 to 3.0 C",
"3.0 to 3.5 C",
"3.5 to 4.0 C",
"4.0 to 4.5 C",
"4.5+ C")
# Create intervals and assign colors
kde_df$interval <- cut(kde_df$value, # which column to cut with the breaks
breaks = breaks, # the breaks object
labels = colors, # use the colors object here
include.lowest = TRUE) # should the lowest value in the data be included?
# Plot using ggplot
warming_probability <- # name the plot
ggplot(kde_df, # specify the data frame containing the data
aes(x = value, # x-axis variable
y = density, # y-axis variable
fill = interval)) + # how to color fill the plot
geom_area() + # specifying the plot type - will fill area with 'fill'
scale_fill_manual(values = colors, # manually assign fill colors to match the palette we want
labels = legend_labels, # labels to use in the legend
name = "Warming") + # name of the legend
scale_x_continuous(breaks = seq(0, 6, by = 0.5), # set the scale and labels of the x-axis
limits = c(0,6), # set x-axis limits
expand = c(0, 0.1)) + # limit white space
labs(title = "Estimated probability of warming",
x = "Temperature (C)",
y = "Density") +
theme_light() + # set theme - personal preference
facet_wrap(~scenario) # facet by scenario name
warming_probability
## Warning: Removed 93 rows containing non-finite outside the scale range
## (`stat_align()`).