This is a presentation of the scalegram methodology, applied for the comparison of different data sets. Here, we apply the methodology over Netherlands for observational and model simulation data products from various sources, i.e., the satellite GPM IMERG data product, KNMI station and radar data sets, as well as, CNRM simulation and NCEP/NCAR reanalysis. Two examples are given, one in temporal and one in spatial domain. In the first case, the aggregation curve for each data set are estimated over Netherlands and then plotted together. In the second application, the two-dimensional aggregation curves of 6 similar, in terms of mean, precipitation events are compared.
The scalegram package can be downloaded from github with package devtools:
devtools::install_github("imarkonis/scalegram")
scalegram Computes (and by default plots) the aggregation curve of a given statistic in a single dimension, e.g., time. The statistics, that can be estimated across the cross-scale continuum, are: variance (“var”; default), standard deviation (“sd”), skewness (“skew”), kurtosis (“kurt”), L-scale (“l2”), coefficient of L-variation (“t2”), L-skewness (“t3”), L-kurtosis (“t4”).
scalegram_space As above but in two dimensions, i.e., space. Input variable is in brick format (raster), while output is in tidy format (data.table), with each layer of the initial brick coresponding to a different variable number. This allows for further extending scalegram to both space and time.
scalegram_plot Plots a data.table in the scalegram format (one column for scale number and one for statistic value). If wn argument is true, then the aggregation curve of white noise process is also plotted.
scalegram_multiplot As above for a group of scalegrams. In the input data.table, a column describing the data set is added, while the argument smooth allows for loess smoothing of the aggregation curves. If there are more than 10 lines then no legend is ploted and the lines are transparent.
scalegram_rescale Auxiliary function for matching and comparing scalegrams estimated at different scales.
dt_to_brick Auxiliary function for transforming data table objects to brick format, for integration in scalegram_space function.
More can be found at the help file of each function.
Satellite data (GPM)
Station data (KNMI)
Radar data (KNMI)
Reanalysis data (NCEP/NCAR)
Simulation data (CNRM)
The code needed to reproduce the importing and preparation of the datasets provided by the scalegram package consists of two R scripts. These are:
data_import.R Imports observed and modeled data for the Netherlands. Observational data are composed of two data tables, one for values and one for cell/station coordinates and then saved in observ_raw.Rdata. Model simulation data represent a single grid cell (coarser resolution) and thus a single data table is stored in model_raw.Rdata.
data_prep.R Prepares datasets for comparison. This script reads the files prepared in data_import.R and makes the necessary changes for the application of scalegram in time and space. For the temporal domain, all observational datasets are aggregated (average) over the Netherlands and then saved in temporal.Rdata. For the spatial domain, 6 precipitation events (daily) with mean precipitation between 10 and 11 mm, are randomly picked and then stored in spatial.Rdata.
Although here, we will use directly the data from the scalegram package, we have included the links for reasons of reproducibility.
data(gpm_nl, knmi_nl, rdr_nl, ncep_nl, cnrm_nl, gpm_events)
The libraries needed are:
library(data.table); library(reshape2); library(scalegram); library(raster)
The examples presented below can also be found in:
temporal_agg.R A demonstration of scalegram function in temporal domain for variance, as well as scalegram_multiplot for the 5 datasets.
spatial_agg.R Same as above, but for GPM dataset in spatial domain.
To estimate and plot the aggregation curve of variance (function default), the scalegram function is used with a time series in vector format as input:
scalegram(gpm_nl$prcp)
To compare the aggregation curves of different data sets, the scalegram_multiplot function can be applied. Below three different utilizations of the function are presented (file temporal_agg.R). In the first case, the common arguments for the scalegram are used, while in the second only the logarithmic scales are used (argument fast = T), which could be handy if speeding up the calculations is needed. The third plot, uses the scalegram_multiplot argument smooth to produce a loess smoothed line for each aggregation curve, as well as the wn argument to compare with the white noise process.
set_1 <- data.frame(scalegram(gpm_nl$prcp, plot = F), dataset = "gpm")
set_2 <- data.frame(scalegram(rdr_nl$prcp, plot = F), dataset = "radar")
set_3 <- data.frame(scalegram(knmi_nl$prcp, plot = F), dataset = "station")
set_4 <- data.frame(scalegram(ncep_nl$prcp, plot = F), dataset = "ncep")
set_5 <- data.frame(scalegram(cnrm_nl$prcp, plot = F), dataset = "cnrm")
g1 <- scalegram_multiplot(rbind(set_1, set_2, set_3, set_4, set_5))
set_1 <- data.frame(scalegram(gpm_nl$prcp, plot = F, fast = T), dataset = "gpm")
set_2 <- data.frame(scalegram(rdr_nl$prcp, plot = F, fast = T), dataset = "radar")
set_3 <- data.frame(scalegram(knmi_nl$prcp, plot = F, fast = T), dataset = "station")
set_4 <- data.frame(scalegram(ncep_nl$prcp, plot = F, fast = T), dataset = "ncep")
set_5 <- data.frame(scalegram(cnrm_nl$prcp, plot = F, fast = T), dataset = "cnrm")
g2 <- scalegram_multiplot(rbind(set_1, set_2, set_3, set_4, set_5))
g3 <- scalegram_multiplot(rbind(set_1, set_2, set_3, set_4, set_5), smooth = T, wn = T)
We can also use the scalegram_multiplot to represent the variance ratio between one dataset (here knmi stations) with the others.
set_1_comp <- set_3[set_3$scale %in% set_1$scale, ]
set_1_comp$dataset <- set_1$dataset
set_1_comp$var <- set_1_comp$var/set_1$var
set_2_comp <- set_3[set_3$scale %in% set_2$scale, ]
set_2_comp$dataset <- set_2$dataset
set_2_comp$var <- set_2_comp$var/set_2$var
set_4_comp <- set_3[set_3$scale %in% set_4$scale, ]
set_4_comp$dataset <- set_4$dataset
set_4_comp$var <- set_4_comp$var/set_4$var
set_5_comp <- set_3[set_3$scale %in% set_5$scale, ]
set_5_comp$dataset <- set_5$dataset
set_5_comp$var <- set_5_comp$var/set_5$var
set_comp <- rbind(set_1_comp, set_2_comp, set_4_comp, set_5_comp)
scalegram_multiplot(set_comp, log_x = T, log_y = F)
We can see that the model results are closer to station until the 10-day scale and then begin to overestimate the variability, whereas the opposite behaviour is observed from scale 2, in the remote sensing data (satellite and radar).
For the application in the spatial domain, six daily precipitation events that share average precipitation over the Netherlands (between 10 and 11 mm) are selected:
event_dates <- format(gpm_events[, unique(time)], "%d-%m-%Y")
gpm_events_brick <- dt_to_brick(gpm_events, var_name = "prcp")
plot(gpm_events_brick, col = rev(colorspace::sequential_hcl(40)),
main = event_dates)
The application of scalegram_space is quite similar to scalegram:
gpm_sp_scale <- scalegram_space(gpm_events_brick, thres = 10)
gpm_sp_scale[, variable := factor(variable, labels = event_dates)]
scalegram_multiplot(gpm_sp_scale, smooth = T, log_x = F, log_y = F)
Here, we see how the variance in space changes as the scale gets coarser (to a threshold of 10 grid cells as a final scale).