R verification example

Present document contains a computational verification example using R and its verification package. The example includes some data processing steps. These are done using the reshape2, lubridate, and dplyr packages. These processing steps will not be explained in detail. Plotting is done using the ggplot2 package.

Preliminaries

First, we empty the R environment of any existing variables and we load all required libraries.

rm(list=objects())
library('verification')
library('reshape2')
library('lubridate')
library('dplyr')
library('ggplot2')
library('gganimate')

Read data

First, we read the data from file. Here, the file contains readily paired forecast and observation data. Upon reading the file, we assign names to the columns, add various metadata columns, convert the date-time vector to POSIXct format and compute the reference time of the forecasts. Finally, we tidy up the data to allow for convenient data processing and plotting.

pairs <- read.csv('H-MS-SINT_Q-fas_cosmo-leps_pairs_raw.asc', sep=' ', header=F, na.strings='-999.0')
names(pairs) <- c('validtime','leadtime','obs',paste0('member',sprintf('%02.0f',seq(1,16))))
pairs$location <- 'H-MS-SINT'
pairs$parameter <- 'Q.fas'
pairs$scenario <- 'cosmo-leps'
pairs$validtime <- ymd_hm(pairs$validtime)
pairs$reftime <- pairs$validtime - hours(pairs$leadtime)
pairs <- melt(pairs, measure.vars = paste0('member',sprintf('%02.0f',seq(1,16))), 
              variable.name = 'member', value.name = 'fcst')

Plot raw data

First, we plot observed values. For that, we create a new object obs:

obs <- pairs %>% select(validtime, obs) %>% distinct()
ggplot(data=obs, aes(x=validtime,y=obs)) + geom_line() + scale_x_datetime()

Then we plot hydrographs that combine forecast and observations. Note that here, we only plot the June 19, 12Z forecast:

my_pairs <- pairs %>% filter(reftime == ymd_hm('200706191200')) %>% na.omit()
ggplot(data=my_pairs, aes(x=validtime, y=fcst, group=interaction(reftime,member))) +
  geom_line() + geom_line(aes(x=validtime, y=obs), col='blue')

Finally, we plot a scatter diagram.

ggplot(data=pairs, aes(x=fcst, y=obs, col = leadtime)) + geom_point() +
  transition_states(leadtime, transition_length = 2, state_length = 1) + enter_fade() + exit_shrink() + ease_aes('sine-in-out')

Verification

We verify two sets of forecasts. Deterministic forecasts are created by taking the ensemble mean at each valid time. Probabilistic forecasts are created by computing the probability of the ensemble to exceed a value. Here, that value is the 90th percentile of the observations.

p90 <- as.numeric(quantile(obs$obs, probs = c(0.9), na.rm = T))

pairs_det <-
  pairs %>% group_by(validtime, leadtime) %>%mutate(fcst = mean(fcst)) %>% select(-member) %>%
  distinct() %>% filter(leadtime == 120)

pairs_prob <-
  pairs %>% group_by(validtime, leadtime) %>% mutate(fcst = length(fcst[fcst > p90])/(length(fcst)+1)) %>%
  mutate(obs = obs>p90) %>% select(-member) %>% distinct() %>% na.omit() %>% filter(leadtime == 120)

For each set of forecasts, we create a ‘verification object’. This contains some summary verification metrics and can be used to create various technical diagrams. First, we do so for the set of deterministic forecasts:

my_verification_object <- verify(pairs_det$obs, pairs_det$fcst, frcst.type='cont', obs.type='cont')

We first explore the contents of the object:

names(my_verification_object)

##  [1] "baseline.tf"  "MAE"          "MSE"          "ME"           "MSE.baseline"
##  [6] "MSE.pers"     "SS.baseline"  "obs"          "pred"         "baseline"

And then we show the values of the mean absolute error, mean squared error and mean error, respectively:

my_verification_object$MAE

## [1] 92.07429

my_verification_object$MSE

## [1] 28085.24

my_verification_object$ME

## [1] 9.879896

Then we do the same for the probabilistic forecasts:

my_verification_object <- verify(pairs_prob$obs, pairs_prob$fcst, frcst.type='prob', obs.type='binary')

## If baseline is not included, baseline values  will be calculated from the  sample obs.

names(my_verification_object)

##  [1] "baseline.tf"    "bs"             "bs.baseline"    "ss"            
##  [5] "bs.reliability" "bs.resol"       "bs.uncert"      "y.i"           
##  [9] "obar.i"         "prob.y"         "obar"           "thres"         
## [13] "check"          "bins"           "obs"            "pred"

From the ‘probabilistic’ verification object, we can extract the value of various summary metrics including the Brier score and its decomposition:

my_verification_object$bs

## [1] 0.06754244

my_verification_object$bs.reliability

## [1] 0.01091352

my_verification_object$bs.resol

## [1] 0.03350687

my_verification_object$bs.uncert

## [1] 0.09013579

The ‘probabilistic’ verification object allows us to easily plot various technical diagrams…

… the reliability diagram:

reliability.plot(my_verification_object)

… the attribute diagram:

attribute(my_verification_object)

## NULL

… the relative operating characteristic:

roc.plot(my_verification_object)

R verification example

Jan Verkade

September 11, 2022

Preliminaries

Read data

Plot raw data

Verification