library(tidyverse)
library(matilda)
# Read in Global Carbon Project (GCP) data
gcp_data <- read.csv("data/gcp_data.csv", stringsAsFactors = FALSE)

# Build an observed ocean carbon uptake data frame (1800 onward)
ocean_uptake_df <- gcp_data %>% 
  select(Year, ocean_sink) %>% 
  rename(ocean_uptake = ocean_sink,
         year = Year) %>%
  filter(year > 1799) %>% 
  na.omit()

# Create a scoring criterion from the observed ocean carbon uptake values
criterion_ocean_uptake <- new_criterion(var = OCEAN_UPTAKE(), 
                                        years = ocean_uptake_df$year, 
                                        obs_values = ocean_uptake_df$ocean_uptake)

Background

When scoring models using historical data, it may be important to understand the effect each criterion has on the final weights of model iterations, because those weights will ultimately affect the final distribution of the variable of interest.

One way to assess the effect of different criteria on model weights is a sensitivity test, whereby we use different scoring mechanisms and compare the final results among them.

In matilda we can build such a sensitivity test using multi_criteria_weighting. This function takes a list of scores produced from different criteria (e.g., criterion_co2_obs() and criterion_gmst_obs()) and produces a combined score. The primary utility of this function is that we can use multiple lines of evidence to weight models. But secondarily, we can use it to conduct sensitivity tests. If we think of our separate criteria as dials, we can change the volume of each criterion as we compute the combined multi-criterion score. By default, all the dials are balanced. But if we want temperature to be louder in the final multi-criterion scores, we can turn it up and turn CO2 down.
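For example, turning the temperature dial up and the CO2 dial down might look like the sketch below (temp_scores and co2_scores are scored data frames of the kind we build later in this tutorial):

# Hypothetical dial settings: 80% temperature, 20% CO2
multi_criteria_weighting(scores_list = list(temp_scores, co2_scores),
                         criterion_weights = c(0.8, 0.2))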

Data

The data for this tutorial should be an unweighted model result. Here, we are using a data frame with only one scenario. However, if you are analyzing multiple scenarios (a list of unweighted model results), this tutorial transfers directly to an lapply workflow (see the sketch below).
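A minimal sketch, assuming model_result_list is a hypothetical named list of unweighted model result data frames (one per scenario):

# Score each scenario's result against observed temperature
scored_by_scenario <- lapply(model_result_list, function(result) {
  score_runs(result, criterion_gmst_obs(), score_bayesian)
})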

We will weight this result with different criteria and show how scores change as we adjust the “criterion dials” with multi-criteria weighting.

The data we are using looks like this:

##   scenario year variable       value units run_number
## 1   ssp245 1800     gmst -0.05391733  degC          1
## 2   ssp245 1801     gmst -0.04609289  degC          1
## 3   ssp245 1802     gmst -0.03992357  degC          1
## 4   ssp245 1803     gmst -0.03456811  degC          1
## 5   ssp245 1804     gmst -0.02978247  degC          1
## 6   ssp245 1805     gmst -0.02637711  degC          1

Scoring

If we score this result using observed temperature (criterion_gmst_obs()), like this:

# Compute model scores based on observed temperature
temp_scores <-
  score_runs(model_result_df, # model result as a data frame
             criterion_gmst_obs(), # criterion used in scoring
             score_bayesian) # score function we want to use

We get this result:

# Print the first six lines of the data frame
head(temp_scores)
##        weights run_number
## 1 1.578293e-02          1
## 2 5.270561e-03          2
## 3 6.127983e-06          3
## 4 1.027169e-06          4
## 5 2.881162e-02          5
## 6 2.153625e-02          6

If we score the result using observed CO2 concentrations (criterion_co2_obs()), like this:

# Compute model scores based on observed CO2 concentrations
co2_scores <- score_runs(model_result_df, 
                         criterion_co2_obs(),
                         score_bayesian)

We get this result:

# Print the first six lines of the data frame
head(co2_scores)
##        weights run_number
## 1 1.326210e-03          1
## 2 1.268191e-02          2
## 3 4.241526e-03          3
## 4 1.289947e-06          4
## 5 1.845426e-03          5
## 6 4.090668e-03          6

Using multi_criteria_weighting

Now that we have scores, we can play with the criterion dials.

The multi_criteria_weighting function takes two arguments:

  1. scores_list: a list of scored data frames. Here, these are the two data frames we created previously by scoring against observed temperature and CO2. There is no limit to the length of this list.

  2. criterion_weights: a vector of weights applied to each scored data frame. These weights should sum to 1.0 and can be thought of as proportions of a total. The vector corresponds to the order of the scores list; in other words, the first data frame in scores_list is weighted by the first value in criterion_weights.

We can test the logic of this function.

Based on how the function works, if we turn the “temperature dial” up to 100% and the CO2 dial down to 0%, the scores produced should equal what we already calculated using observed temperature.

# list of scored data frames 
scores <- list(temp_scores, co2_scores)

# running multi_criteria_weighting function
mc_weights <- multi_criteria_weighting(scores_list = scores, criterion_weights = c(1.0, 0.0))
# View the first six lines of the mc_weights result 
head(mc_weights)
##   run_number    mc_weight
## 1          1 1.578293e-02
## 2          2 5.270561e-03
## 3          3 6.127983e-06
## 4          4 1.027169e-06
## 5          5 2.881162e-02
## 6          6 2.153625e-02
# View the first six lines of the temp_scores result 
head(temp_scores)
##        weights run_number
## 1 1.578293e-02          1
## 2 5.270561e-03          2
## 3 6.127983e-06          3
## 4 1.027169e-06          4
## 5 2.881162e-02          5
## 6 2.153625e-02          6

We could do the same thing by turning the “CO2 dial” up and temperature down.
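Using the same scores list, that call would look like this, and the resulting weights should match co2_scores:

# All of the weight on CO2
multi_criteria_weighting(scores_list = scores, criterion_weights = c(0.0, 1.0))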

If we compute multi-criteria weights with balanced dials (the default), we should get a compromise score between the two criteria.
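For example, omitting criterion_weights uses the balanced default:

# Balanced dials (the default): each criterion contributes equally
balanced_weights <- multi_criteria_weighting(scores_list = scores)
head(balanced_weights)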

Writing a Function

In this example I add a new scoring criterion - ocean carbon uptake (the criterion_ocean_uptake object built at the top of this tutorial).

We can build our own function to complete sensitivity tests:

score_sensitivity <- function(result_df, criterion_weights = NULL, mc_weights_name = "mc_weights"){
  
  # Scoring with temperature
  scores_temp = score_runs(result_df,
                           criterion_gmst_obs(),
                           score_bayesian)
  
  # Scoring with co2 
  scores_co2 = score_runs(result_df,
                          criterion_co2_obs(), 
                          score_bayesian)
  
  # Scoring with ocean_c_uptake
  scores_ocean_c = score_runs(result_df,
                              criterion_ocean_uptake,
                              score_bayesian)
  
  # creating score_list
  score_list = list(scores_temp, scores_co2, scores_ocean_c)
  
  # computing multi-criteria weights (balanced default if no weights are supplied)
  if (is.null(criterion_weights)) {
    mc_weights = multi_criteria_weighting(score_list)
  } else {
    mc_weights = multi_criteria_weighting(score_list, criterion_weights = criterion_weights)
  }

  # merge with data frame
  result_scored_df <- merge(result_df, scores_temp, by = "run_number")
  result_scored_df <- merge(result_scored_df, scores_co2, by = "run_number")
  result_scored_df <- merge(result_scored_df, scores_ocean_c, by = "run_number") 
  result_scored_df <- merge(result_scored_df, mc_weights, by = "run_number")
  
  # rename mc_weights column 
  names(result_scored_df)[names(result_scored_df) == "mc_weight"] <- mc_weights_name
  names(result_scored_df)[names(result_scored_df) == "weights.x"] <- "temp_score"
  names(result_scored_df)[names(result_scored_df) == "weights.y"] <- "co2_score"
  names(result_scored_df)[names(result_scored_df) == "weights"] <- "ocean_c_score"
  
  return(result_scored_df)        
}
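For instance, calling the function without criterion_weights falls back to the balanced default (a usage sketch; model_result_df is the unweighted result introduced above):

# Balanced (default) dial settings across all three criteria
result_scored_default <- score_sensitivity(model_result_df)
head(result_scored_default)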

With this function we can quickly swap in different “dial settings” to emphasize different criteria when weighting our model results.

These scores can then be added to our result data frame and used when computing metrics and probabilities, for example:

# computing a result scored with emphasis on ocean carbon uptake and temperature
result_scored <- score_sensitivity(model_result_df, criterion_weights = c(0.4, 0.0, 0.6), mc_weights_name = "oc_temp")

We can check how the scoring worked with a simple visualization. However, a spaghetti plot only goes so far; to really see the possible impact, completing the probabilistic workflow to produce a stacked bar plot of probabilities will be the most informative.

gmst_spaghetti_plot <- 
  ggplot(data = result_scored) + # sets the plotting environment
  geom_line( # identifies plot type
    aes(
      x = year, # identifies x-axis variable
      y = value, # identifies y-axis variable
      group = run_number, # identifies how data should be grouped
      color = temp_score, # how color should be applied
      alpha = temp_score), # how transparency should be applied
    linewidth = 0.5) + # sets line width - personal preference
  scale_color_gradient(low = "skyblue", high = "dodgerblue4", name = "Weights") + # what colors should be used
  scale_alpha_continuous(range = c(0.8, 1)) + # how extreme the transparency gradient should be - this is optional
  guides(alpha = "none") + # remove alpha legend 
  facet_wrap(~ variable, scales = "free_y") + # panels the figure by variable
  labs(x = "Year", y = "Global Mean Surface Temperature (degC)") + # adds labels to axes
  ggtitle(label = "Matilda Ensemble weighted by historical temperature") +
  theme_light() # sets a general theme - personal preference 

gmst_spaghetti_plot
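As noted above, the most informative check is the probabilistic workflow. Below is a minimal sketch of one way to do this with matilda's new_metric, metric_calc, and prob_calc; the metric definition, year range, and temperature bins are illustrative assumptions, so check the matilda documentation for exact argument names.

# Define a metric of interest; here, mean gmst over 2081-2100 (illustrative choice)
warming_metric <- new_metric(var = "gmst", years = 2081:2100, op = mean)

# Compute the metric for each model run
metric_values <- metric_calc(model_result_df, warming_metric)

# Attach one multi-criteria weight per run (result_scored has one row per year per run)
run_weights <- unique(result_scored[, c("run_number", "oc_temp")])
metric_scored <- merge(metric_values, run_weights, by = "run_number")

# Compute weighted probabilities of landing in each warming bin (bins are assumptions)
warming_probs <- prob_calc(metric_scored$metric_result,
                           bins = c(0, 1.5, 2, 3, Inf),
                           scores = metric_scored$oc_temp)
warming_probs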