Cross-Scale Aggregation (CSA) framework

Spatiotemporal applications over Holland

Introduction

In this introductory presentation, the CSA methodology is applied for the comparison of different data sets. This includes observational and model simulation data products from various sources, i.e., the satellite GPM IMERG data product, KNMI station and radar data sets, as well as, CNRM simulation and NCEP/NCAR reanalysis. Two examples are given, one in temporal and one in spatial domain. In the first case, the aggregation curves of the dataset are estimated and then plotted together. In the second application, the two-dimensional aggregation curves of six precipitation events which have similar mean over the whole domain are compared.

The full description of the methodology and the examples can be found in the journal of Environmental Modelling and Software:

Markonis et al., Cross-scale analysis framework for model/data comparison and integration, Environmental Modelling and Software, under review.


Functions

The csa package can be downloaded from github with package devtools:

devtools::install_github("imarkonis/csa")

csa Computes (and by default plots) the aggregation curve of a given statistic in a single dimension, e.g., time. The statistics, that can be estimated across the cross-scale continuum, are: variance (“var”; default), standard deviation (“sd”), skewness (“skew”), kurtosis (“kurt”), L-scale (“l2”), coefficient of L-variation (“t2”), L-skewness (“t3”), L-kurtosis (“t4”).

csas As above but in two dimensions, i.e., space. Input variable is in brick format (raster), while output is in tidy format (data.table), with each layer of the initial brick coresponding to a different variable number. This allows for further extending CSA plots to both space and time.

csa.plot Plots a data.table in the CSA format (one column for scale number and one for statistic value). If wn argument is true, then the aggregation curve of white noise process is also plotted.

csa.multiplot As above for a group of aggregation curves. In the input data.table, a column describing the data set is added, while the argument smooth allows for loess smoothing of the aggregation curves. If there are more than 10 lines then no legend is ploted and the lines are transparent.

csa.rescale Auxiliary function for matching and comparing aggregation curves estimated at different scales.

dt.to.brick Auxiliary function for transforming data table objects to brick format, for integration in csas function.


More can be found in the accompanying help files of each function, as well as in two interactive applications about aggregation and the aggregation curves of some well-known stochastic models.


Data

Satellite data (GPM)

  • Dataset: GPM IMERG Final Precipitation
  • Temporal Resolution: 24 hours
  • Spatial Resolution: 0.1 x 0.1 deg
  • Date Range: 2014-03-12 to 2018-05-15
  • Spatial Region: latitude: 50.75, 53.55, longitude: 3.45, 7.15
  • Format: netCDF 4
  • Variables: gpm_cells (coordinates) & gpm_prcp (values) in observ_raw.Rdata
  • IDs: gpm_1
  • Size: 1681652 total values, multiplied by 1102 grid cells
  • Downloaded from: https://climexp.knmi.nl/select.cgi?id=312456c83e660703df1bfea9ba4fba50&field=imerg_daily

Station data (KNMI)

  • Dataset: 240 homogenised stations 1951-now
  • Temporal Resolution: 24 hours
  • Spatial Resolution: -
  • Date Range: 1950-12-31 to 2018-04-29
  • Spatial Region: latitude: 50.78, 53.48, longitude: 3.4, 7.11
  • Format: Text file
  • Variables: knmi_stations (coordinates) & knmi_prcp (values) in observ_raw.Rdata
  • IDs: 010
  • Size: 5902080 total values, multiplied by 240 stations
  • Downloaded from: https://climexp.knmi.nl/PhomNL.cgi?id=312456c83e660703df1bfea9ba4fba50

Radar data (KNMI)

  • Dataset: RAD_NL25_RAC_MFBS_24H_NC
  • Temporal Resolution: 24 hours
  • Spatial Resolution: 1 x 1 km
  • Date Range: 2014-03-11 to 2018-03-30
  • Spatial Region: latitude: 50.76, 53.56, longitude: 3.37, 7.22
  • Format: netCDF 4
  • Variables: rdr_cells (coordinates) & rdr_prcp (values) in observ_raw.Rdata
  • IDs: rdr_1
  • Size: 50284992 total values, multiplied by 30816 grid cells
  • Downloaded from: https://climexp.knmi.nl/select.cgi?id=312456c83e660703df1bfea9ba4fba50&field=knmi_radar_daily

Reanalysis data (NCEP/NCAR)

Simulation data (CNRM)


Code structure

The code needed to reproduce the importing and preparation of the datasets provided by the csa package consists of two R scripts. These are:

data_import.R Imports observed and modeled data for the Netherlands. Observational data are composed of two data tables, one for values and one for cell/station coordinates and then saved in observ_raw.Rdata. Model simulation data represent a single grid cell (coarser resolution) and thus a single data table is stored in model_raw.Rdata.

data_prep.R Prepares datasets for comparison. This script reads the files prepared in data_import.R and makes the necessary changes for the application of csa in time and space. For the temporal domain, all observational datasets are aggregated (average) over the Netherlands and then saved in temporal.Rdata. For the spatial domain, 6 precipitation events (daily) with mean precipitation between 10 and 11 mm, are randomly picked and then stored in spatial.Rdata.

Although here, we will use directly the data from the csa package, we have included the links for reasons of reproducibility.

data(gpm_nl, knmi_nl, rdr_nl, ncep_nl, cnrm_nl, gpm_events)

The libraries needed are:

library(data.table); library(reshape2); library(csa); library(raster)

The examples presented below can also be found in:

temporal_agg.R A demonstration of csa function in temporal domain for variance, as well as csa.multiplot for the 5 datasets.

spatial_agg.R Same as above, but for GPM dataset in spatial domain.


Results

Temporal Aggregation

To estimate and plot the aggregation curve of variance (function default), the csa function is used with a time series in vector format as input:

csa(gpm_nl$prcp)

To compare the aggregation curves of different data sets, the csa.multiplot function can be applied. Below three different utilizations of the function are presented (file temporal_agg.R). In the first case, the common arguments for the CSA plot are used, while in the second only the logarithmic scales are used (argument fast = T), which could be handy if speeding up the calculations is needed. The third plot, uses the csa.multiplot argument smooth to produce a loess smoothed line for each aggregation curve, as well as the wn argument to compare with the white noise process.

set_1 <- data.frame(csa(gpm_nl$prcp, plot = F), dataset = "gpm")
set_2 <- data.frame(csa(rdr_nl$prcp, plot = F), dataset = "radar")
set_3 <- data.frame(csa(knmi_nl$prcp, plot = F), dataset = "station")
set_4 <- data.frame(csa(ncep_nl$prcp, plot = F), dataset = "ncep")
set_5 <- data.frame(csa(cnrm_nl$prcp, plot = F), dataset = "cnrm")

g1 <- csa.multiplot(rbind(set_1, set_2, set_3, set_4, set_5))
print(g1)

set_1 <- data.frame(csa(gpm_nl$prcp, plot = F, fast = T), dataset = "gpm")
set_2 <- data.frame(csa(rdr_nl$prcp, plot = F, fast = T), dataset = "radar")
set_3 <- data.frame(csa(knmi_nl$prcp, plot = F, fast = T), dataset = "station")
set_4 <- data.frame(csa(ncep_nl$prcp, plot = F, fast = T), dataset = "ncep")
set_5 <- data.frame(csa(cnrm_nl$prcp, plot = F, fast = T), dataset = "cnrm")

g2 <- csa.multiplot(rbind(set_1, set_2, set_3, set_4, set_5))
print(g2)

g3 <- csa.multiplot(rbind(set_1, set_2, set_3, set_4, set_5), smooth = T, wn = T)
print(g3)

We can also use the csa.multiplot to represent the variance ratio between one dataset (here knmi stations) with the others.

set_1_comp <- set_3[set_3$scale %in% set_1$scale, ]
set_1_comp$dataset <- set_1$dataset
set_1_comp$var <- set_1_comp$var/set_1$var

set_2_comp <- set_3[set_3$scale %in% set_2$scale, ]
set_2_comp$dataset <- set_2$dataset
set_2_comp$var  <- set_2_comp$var/set_2$var

set_4_comp <- set_3[set_3$scale %in% set_4$scale, ]
set_4_comp$dataset <- set_4$dataset
set_4_comp$var <- set_4_comp$var/set_4$var

set_5_comp <- set_3[set_3$scale %in% set_5$scale, ]
set_5_comp$dataset <- set_5$dataset
set_5_comp$var <- set_5_comp$var/set_5$var

set_comp <- rbind(set_1_comp, set_2_comp, set_4_comp, set_5_comp)
csa.multiplot(set_comp, log_x = T, log_y = F)

We can see that the model results are closer to station until the 10-day scale and then begin to overestimate the variability, whereas the opposite behaviour is observed from scale 2, in the remote sensing data (satellite and radar).


Spatial Aggregation

For the application in the spatial domain, six daily precipitation events that share average precipitation over the Netherlands (between 10 and 11 mm) are selected:

event_dates <- format(gpm_events[, unique(time)], "%d-%m-%Y") 
gpm_events_brick <- dt.to.brick(gpm_events, var_name = "prcp")
plot(gpm_events_brick, col = rev(colorspace::sequential_hcl(40)),
     main = event_dates)

The application of csas is quite similar to csa:

g4 <-csas(gpm_events_brick)
g4$plot

gpm_sp_scale <- csas(gpm_events_brick, plot = F)
gpm_sp_scale[, variable := factor(variable, labels = event_dates)]
csa.multiplot(gpm_sp_scale, smooth = T, log_x = F, log_y = F) 

Here, we see how the variance in space changes as the scale gets coarser (to a threshold of 30 grid cells as a final scale).


Further Reading

Interested readers might find useful