Present document contains a computational verification example using
R and its verification package. The example includes some
data processing steps. These are done using the reshape2,
lubridate, and dplyr packages. These
processing steps will not be explained in detail. Plotting is done using
the ggplot2 package.
First, we empty the R environment of any existing variables and we load all required libraries.
rm(list=objects())
library('verification')
library('reshape2')
library('lubridate')
library('dplyr')
library('ggplot2')
library('gganimate')
First, we read the data from file. Here, the file contains readily paired forecast and observation data. Upon reading the file, we assign names to the columns, add various metadata columns, convert the date-time vector to POSIXct format and compute the reference time of the forecasts. Finally, we tidy up the data to allow for convenient data processing and plotting.
pairs <- read.csv('H-MS-SINT_Q-fas_cosmo-leps_pairs_raw.asc', sep=' ', header=F, na.strings='-999.0')
names(pairs) <- c('validtime','leadtime','obs',paste0('member',sprintf('%02.0f',seq(1,16))))
pairs$location <- 'H-MS-SINT'
pairs$parameter <- 'Q.fas'
pairs$scenario <- 'cosmo-leps'
pairs$validtime <- ymd_hm(pairs$validtime)
pairs$reftime <- pairs$validtime - hours(pairs$leadtime)
pairs <- melt(pairs, measure.vars = paste0('member',sprintf('%02.0f',seq(1,16))),
variable.name = 'member', value.name = 'fcst')
First, we plot observed values. For that, we create a new object
obs:
obs <- pairs %>% select(validtime, obs) %>% distinct()
ggplot(data=obs, aes(x=validtime,y=obs)) + geom_line() + scale_x_datetime()
Then we plot hydrographs that combine forecast and observations. Note that here, we only plot the June 19, 12Z forecast:
my_pairs <- pairs %>% filter(reftime == ymd_hm('200706191200')) %>% na.omit()
ggplot(data=my_pairs, aes(x=validtime, y=fcst, group=interaction(reftime,member))) +
geom_line() + geom_line(aes(x=validtime, y=obs), col='blue')
Finally, we plot a scatter diagram.
ggplot(data=pairs, aes(x=fcst, y=obs, col = leadtime)) + geom_point() +
transition_states(leadtime, transition_length = 2, state_length = 1) + enter_fade() + exit_shrink() + ease_aes('sine-in-out')
We verify two sets of forecasts. Deterministic forecasts are created by taking the ensemble mean at each valid time. Probabilistic forecasts are created by computing the probability of the ensemble to exceed a value. Here, that value is the 90th percentile of the observations.
p90 <- as.numeric(quantile(obs$obs, probs = c(0.9), na.rm = T))
pairs_det <-
pairs %>% group_by(validtime, leadtime) %>%mutate(fcst = mean(fcst)) %>% select(-member) %>%
distinct() %>% filter(leadtime == 120)
pairs_prob <-
pairs %>% group_by(validtime, leadtime) %>% mutate(fcst = length(fcst[fcst > p90])/(length(fcst)+1)) %>%
mutate(obs = obs>p90) %>% select(-member) %>% distinct() %>% na.omit() %>% filter(leadtime == 120)
For each set of forecasts, we create a ‘verification object’. This contains some summary verification metrics and can be used to create various technical diagrams. First, we do so for the set of deterministic forecasts:
my_verification_object <- verify(pairs_det$obs, pairs_det$fcst, frcst.type='cont', obs.type='cont')
We first explore the contents of the object:
names(my_verification_object)
## [1] "baseline.tf" "MAE" "MSE" "ME" "MSE.baseline"
## [6] "MSE.pers" "SS.baseline" "obs" "pred" "baseline"
And then we show the values of the mean absolute error, mean squared error and mean error, respectively:
my_verification_object$MAE
## [1] 92.07429
my_verification_object$MSE
## [1] 28085.24
my_verification_object$ME
## [1] 9.879896
Then we do the same for the probabilistic forecasts:
my_verification_object <- verify(pairs_prob$obs, pairs_prob$fcst, frcst.type='prob', obs.type='binary')
## If baseline is not included, baseline values will be calculated from the sample obs.
names(my_verification_object)
## [1] "baseline.tf" "bs" "bs.baseline" "ss"
## [5] "bs.reliability" "bs.resol" "bs.uncert" "y.i"
## [9] "obar.i" "prob.y" "obar" "thres"
## [13] "check" "bins" "obs" "pred"
From the ‘probabilistic’ verification object, we can extract the value of various summary metrics including the Brier score and its decomposition:
my_verification_object$bs
## [1] 0.06754244
my_verification_object$bs.reliability
## [1] 0.01091352
my_verification_object$bs.resol
## [1] 0.03350687
my_verification_object$bs.uncert
## [1] 0.09013579
The ‘probabilistic’ verification object allows us to easily plot various technical diagrams…
… the reliability diagram:
reliability.plot(my_verification_object)
… the attribute diagram:
attribute(my_verification_object)
## NULL
… the relative operating characteristic:
roc.plot(my_verification_object)