Scalegram

Applications in time & space

Introduction

This is a presentation of the scalegram methodology, applied for the comparison of different data sets. Here, we apply the methodology over Netherlands for observational and model simulation data products from various sources, i.e., the satellite GPM IMERG data product, KNMI station and radar data sets, as well as, CNRM simulation and NCEP/NCAR reanalysis. Two examples are given, one in temporal and one in spatial domain. In the first case, the aggregation curve for each data set are estimated over Netherlands and then plotted together. In the second application, the two-dimensional aggregation curves of 6 similar, in terms of mean, precipitation events are compared.

Functions

The scalegram package can be downloaded from github with package devtools:

devtools::install_github("imarkonis/scalegram")

scalegram Computes (and by default plots) the aggregation curve of a given statistic in a single dimension, e.g., time. The statistics, that can be estimated across the cross-scale continuum, are: variance (“var”; default), standard deviation (“sd”), skewness (“skew”), kurtosis (“kurt”), L-scale (“l2”), coefficient of L-variation (“t2”), L-skewness (“t3”), L-kurtosis (“t4”).

scalegram_space As above but in two dimensions, i.e., space. Input variable is in brick format (raster), while output is in tidy format (data.table), with each layer of the initial brick coresponding to a different variable number. This allows for further extending scalegram to both space and time.

scalegram_plot Plots a data.table in the scalegram format (one column for scale number and one for statistic value). If wn argument is true, then the aggregation curve of white noise process is also plotted.

scalegram_multiplot As above for a group of scalegrams. In the input data.table, a column describing the data set is added, while the argument smooth allows for loess smoothing of the aggregation curves. If there are more than 10 lines then no legend is ploted and the lines are transparent.

scalegram_rescale Auxiliary function for matching and comparing scalegrams estimated at different scales.

dt_to_brick Auxiliary function for transforming data table objects to brick format, for integration in scalegram_space function.

More can be found at the help file of each function.

Data

Satellite data (GPM)

Dataset: GPM IMERG Final Precipitation
Temporal Resolution: 24 hours
Spatial Resolution: 0.1 x 0.1 deg
Date Range: 2014-03-12 to 2018-05-15
Spatial Region: latitude: 50.75, 53.55, longitude: 3.45, 7.15
Format: netCDF 4
Variables: gpm_cells (coordinates) & gpm_prcp (values) in observ_raw.Rdata
IDs: gpm_1
Size: 1681652 total values, multiplied by 1102 grid cells
Downloaded from: https://climexp.knmi.nl/select.cgi?id=312456c83e660703df1bfea9ba4fba50&field=imerg_daily

Station data (KNMI)

Dataset: 240 homogenised stations 1951-now
Temporal Resolution: 24 hours
Spatial Resolution: -
Date Range: 1950-12-31 to 2018-04-29
Spatial Region: latitude: 50.78, 53.48, longitude: 3.4, 7.11
Format: Text file
Variables: knmi_stations (coordinates) & knmi_prcp (values) in observ_raw.Rdata
IDs: 010
Size: 5902080 total values, multiplied by 240 stations
Downloaded from: https://climexp.knmi.nl/PhomNL.cgi?id=312456c83e660703df1bfea9ba4fba50

Radar data (KNMI)

Dataset: RAD_NL25_RAC_MFBS_24H_NC
Temporal Resolution: 24 hours
Spatial Resolution: 1 x 1 km
Date Range: 2014-03-11 to 2018-03-30
Spatial Region: latitude: 50.76, 53.56, longitude: 3.37, 7.22
Format: netCDF 4
Variables: rdr_cells (coordinates) & rdr_prcp (values) in observ_raw.Rdata
IDs: rdr_1
Size: 50284992 total values, multiplied by 30816 grid cells
Downloaded from: https://climexp.knmi.nl/select.cgi?id=312456c83e660703df1bfea9ba4fba50&field=knmi_radar_daily

Reanalysis data (NCEP/NCAR)

Dataset: NMC reanalysis
Temporal Resolution: 24 hours
Spatial Resolution: 2.5 x 2.5 deg
Date Range: 1948-01-01 to 2018-06-05
Spatial Region: 1 grid cell at 52.38N & 5.625E
Format: netCDF 4
Variables: ncep_prcp (values) in model_raw.Rdata
Size: 25601 total values in 1 grid cell
Downloaded from: https://climexp.knmi.nl/select.cgi?id=312456c83e660703df1bfea9ba4fba50&field=nprate_daily

Simulation data (CNRM)

Dataset: cnrm cm3 20c3m pr
Temporal Resolution: 24 hours
Spatial Resolution: 2.8 x 2.8 deg
Date Range: 1961-01-01 to 2000-12-31
Spatial Region: 1 grid cell at 51.625N & 5.625E
Format: netCDF 4
Variables: cnrm_prcp (values) in model_raw.Rdata
Size: 14610 total values in 1 grid cell
Downloaded from: https://climexp.knmi.nl/select.cgi?id=312456c83e660703df1bfea9ba4fba50&field=pr_A2_cnrm_cm3_20c3m

Code structure

The code needed to reproduce the importing and preparation of the datasets provided by the scalegram package consists of two R scripts. These are:

data_import.R Imports observed and modeled data for the Netherlands. Observational data are composed of two data tables, one for values and one for cell/station coordinates and then saved in observ_raw.Rdata. Model simulation data represent a single grid cell (coarser resolution) and thus a single data table is stored in model_raw.Rdata.

data_prep.R Prepares datasets for comparison. This script reads the files prepared in data_import.R and makes the necessary changes for the application of scalegram in time and space. For the temporal domain, all observational datasets are aggregated (average) over the Netherlands and then saved in temporal.Rdata. For the spatial domain, 6 precipitation events (daily) with mean precipitation between 10 and 11 mm, are randomly picked and then stored in spatial.Rdata.

Although here, we will use directly the data from the scalegram package, we have included the links for reasons of reproducibility.

data(gpm_nl, knmi_nl, rdr_nl, ncep_nl, cnrm_nl, gpm_events)

The libraries needed are:

library(data.table); library(reshape2); library(scalegram); library(raster)

The examples presented below can also be found in:

temporal_agg.R A demonstration of scalegram function in temporal domain for variance, as well as scalegram_multiplot for the 5 datasets.

spatial_agg.R Same as above, but for GPM dataset in spatial domain.

Results

Temporal Aggregation

To estimate and plot the aggregation curve of variance (function default), the scalegram function is used with a time series in vector format as input:

scalegram(gpm_nl$prcp)

To compare the aggregation curves of different data sets, the scalegram_multiplot function can be applied. Below three different utilizations of the function are presented (file temporal_agg.R). In the first case, the common arguments for the scalegram are used, while in the second only the logarithmic scales are used (argument fast = T), which could be handy if speeding up the calculations is needed. The third plot, uses the scalegram_multiplot argument smooth to produce a loess smoothed line for each aggregation curve, as well as the wn argument to compare with the white noise process.

set_1 <- data.frame(scalegram(gpm_nl$prcp, plot = F), dataset = "gpm")
set_2 <- data.frame(scalegram(rdr_nl$prcp, plot = F), dataset = "radar")
set_3 <- data.frame(scalegram(knmi_nl$prcp, plot = F), dataset = "station")
set_4 <- data.frame(scalegram(ncep_nl$prcp, plot = F), dataset = "ncep")
set_5 <- data.frame(scalegram(cnrm_nl$prcp, plot = F), dataset = "cnrm")

g1 <- scalegram_multiplot(rbind(set_1, set_2, set_3, set_4, set_5))

set_1 <- data.frame(scalegram(gpm_nl$prcp, plot = F, fast = T), dataset = "gpm")
set_2 <- data.frame(scalegram(rdr_nl$prcp, plot = F, fast = T), dataset = "radar")
set_3 <- data.frame(scalegram(knmi_nl$prcp, plot = F, fast = T), dataset = "station")
set_4 <- data.frame(scalegram(ncep_nl$prcp, plot = F, fast = T), dataset = "ncep")
set_5 <- data.frame(scalegram(cnrm_nl$prcp, plot = F, fast = T), dataset = "cnrm")

g2 <- scalegram_multiplot(rbind(set_1, set_2, set_3, set_4, set_5))

g3 <- scalegram_multiplot(rbind(set_1, set_2, set_3, set_4, set_5), smooth = T, wn = T)

We can also use the scalegram_multiplot to represent the variance ratio between one dataset (here knmi stations) with the others.

set_1_comp <- set_3[set_3$scale %in% set_1$scale, ]
set_1_comp$dataset <- set_1$dataset
set_1_comp$var <- set_1_comp$var/set_1$var

set_2_comp <- set_3[set_3$scale %in% set_2$scale, ]
set_2_comp$dataset <- set_2$dataset
set_2_comp$var  <- set_2_comp$var/set_2$var

set_4_comp <- set_3[set_3$scale %in% set_4$scale, ]
set_4_comp$dataset <- set_4$dataset
set_4_comp$var <- set_4_comp$var/set_4$var

set_5_comp <- set_3[set_3$scale %in% set_5$scale, ]
set_5_comp$dataset <- set_5$dataset
set_5_comp$var <- set_5_comp$var/set_5$var

set_comp <- rbind(set_1_comp, set_2_comp, set_4_comp, set_5_comp)
scalegram_multiplot(set_comp, log_x = T, log_y = F)

We can see that the model results are closer to station until the 10-day scale and then begin to overestimate the variability, whereas the opposite behaviour is observed from scale 2, in the remote sensing data (satellite and radar).

Spatial Aggregation

For the application in the spatial domain, six daily precipitation events that share average precipitation over the Netherlands (between 10 and 11 mm) are selected:

event_dates <- format(gpm_events[, unique(time)], "%d-%m-%Y") 
gpm_events_brick <- dt_to_brick(gpm_events, var_name = "prcp")
plot(gpm_events_brick, col = rev(colorspace::sequential_hcl(40)),
     main = event_dates)

The application of scalegram_space is quite similar to scalegram:

gpm_sp_scale <- scalegram_space(gpm_events_brick, thres = 10)
gpm_sp_scale[, variable := factor(variable, labels = event_dates)]
scalegram_multiplot(gpm_sp_scale, smooth = T, log_x = F, log_y = F)

Here, we see how the variance in space changes as the scale gets coarser (to a threshold of 10 grid cells as a final scale).