Partly based on this source.
Scales all numeric variables to the range [0, 1]. One possible formula is given below:
\[ x_{new} = \frac{x - x_{min}}{x_{max} - x_{min}} \]
normalize <- function(x) {
  # Rescale so that the minimum maps to 0 and the maximum to 1
  min <- raster::minValue(x)
  max <- raster::maxValue(x)
  return((x - min) / (max - min))
}

Similar to normalization, but each cell value (minus the minimum) is divided by the sum of these shifted values over all cells. Values do not span the full range [0, 1].
\[ x_{new} = \frac{x - x_{min}}{\sum (x - x_{min})} \]
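As a short check on what this rescaling does, summing the transformed values over all cells gives:
\[ \sum x_{new} = \frac{\sum (x - x_{min})}{\sum (x - x_{min})} = 1 \]
Each transformed value is therefore a non-negative share of the total. The minimum still maps to 0, but the maximum is no longer mapped to 1, and with many cells every individual value becomes very small.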
ol_normalize <- function(x) {
  min <- raster::minValue(x)
  return((x - min) / raster::cellStats(x - min, "sum"))
}

# Same calculation with base R functions, for plain numeric vectors
ol_normalize_ns <- function(x) {
  return((x - min(x)) / sum(x - min(x)))
}

On the other hand, you can standardize your data set. This transforms it to have zero mean and unit variance, for example using the equation below:
\[ x_{new} = \frac{x - \mu}{\sigma} \]
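For reference, the two quantities can be written out as follows, assuming \(N\) non-missing cells indexed by \(i\) (this index notation is introduced here for illustration only):
\[ \mu = \frac{1}{N} \sum_{i=1}^{N} x_{i}, \qquad \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_{i} - \mu)^{2}} \]
The \(1/N\) denominator is the population form of the standard deviation, which is what asSample = FALSE requests from raster::cellStats() in the function below. The sample form would divide by \(N - 1\) instead.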
standardize <- function(x) {
  mean <- raster::cellStats(x, "mean")
  # asSample = FALSE gives the population standard deviation (denominator n)
  sd <- raster::cellStats(x, "sd", asSample = FALSE)
  return((x - mean) / sd)
}

All of these techniques have their drawbacks. If you have outliers in your data set, normalizing will squeeze the “normal” data into a very small interval, and most data sets do have outliers. With standardization, the new data aren’t bounded (unlike normalization).
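To make the outlier effect concrete, here is a minimal base-R sketch on a toy numeric vector. The data and the helper normalize_vec() are made up for illustration and are not part of the raster workflow above.

```r
# Hypothetical helper: min-max normalization for a plain numeric vector.
normalize_vec <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

values <- c(8, 9, 10, 11, 12)      # toy "normal" observations
with_outlier <- c(values, 1000)    # the same data plus one extreme value

# Without the outlier the data span the full [0, 1] interval.
range(normalize_vec(values))

# With the outlier, the five normal values are squeezed below 0.005,
# while the outlier itself maps to 1.
scaled <- normalize_vec(with_outlier)
max(scaled[seq_along(values)])
```

A single extreme cell is enough to push nearly all of the remaining information into a tiny part of the [0, 1] range.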
Robust alternatives include subtracting the median and dividing by the IQR:
\[ x_{new} = \frac{x - M_{x}}{IQR_{x}} \]
IQRize <- function(x) {
  x_values <- raster::getValues(x)
  # Ignore NA cells in both statistics
  med <- median(x_values, na.rm = TRUE)
  iqr <- IQR(x_values, na.rm = TRUE)
  return((x - med) / iqr)
}

Another option is to scale linearly so that the 5th and 95th percentiles map to some standard range.
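The percentile-based scaling is not implemented in this document, so the following is only a sketch under stated assumptions: it works on a plain numeric vector, takes [0, 1] as the standard range, and uses a hypothetical helper percentile_scale() whose probs and clamp arguments are illustrative.

```r
# Hypothetical helper: map the lower and upper percentiles linearly to 0 and 1.
percentile_scale <- function(x, probs = c(0.05, 0.95), clamp = FALSE) {
  q <- stats::quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  scaled <- (x - q[1]) / (q[2] - q[1])
  if (clamp) {
    # Optionally pull values outside the percentile band onto the bounds.
    scaled <- pmin(pmax(scaled, 0), 1)
  }
  scaled
}

x <- c(1:99, 1000)                   # toy data with one outlier
s <- percentile_scale(x)

# The 5th and 95th percentiles of the scaled data land on 0 and 1.
stats::quantile(s, c(0.05, 0.95))
```

Values outside the chosen percentiles fall outside [0, 1] unless clamp = TRUE is used, which trades boundedness for a loss of detail in the tails. For a raster layer, the cell values could first be extracted with raster::getValues(), as IQRize() does.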
Use the function gaussian_field() from the package protectr to generate 9 simulated features.
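# Packages assumed by the plotting and simulation calls below
library(protectr)   # gaussian_field()
library(rasterVis)  # levelplot() and histogram() for Raster objects
library(viridis)    # viridis colour palette
library(gridExtra)  # grid.arrange()

set.seed(42)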
e <- raster::extent(0, 100, 0, 100)
r <- raster::raster(e, nrows = 100, ncols = 100, vals = 1)
features <- gaussian_field(r, range = 20, n = 9, mean = 10, variance = 3)
levelplot(features, margin = FALSE, col.regions = viridis, layout = c(3, 3))

gf_stack <- raster::stack(features[[1]],
                          normalize(features[[1]]),
                          ol_normalize(features[[1]]),
                          standardize(features[[1]]),
                          IQRize(features[[1]]))
names(gf_stack) <- c("Original", "Normalized", "OL_normalized", "Standardized", "IQRized")
grid.arrange(
  levelplot(gf_stack[[1]], main = names(gf_stack[[1]]), margin = FALSE, col.regions = viridis),
  levelplot(gf_stack[[2]], main = names(gf_stack[[2]]), margin = FALSE, col.regions = viridis),
  levelplot(gf_stack[[3]], main = names(gf_stack[[3]]), margin = FALSE, col.regions = viridis),
  rasterVis::histogram(gf_stack[[1]]),
  rasterVis::histogram(gf_stack[[2]]),
  rasterVis::histogram(gf_stack[[3]]),
  ncol = 3, nrow = 2)

grid.arrange(
  levelplot(gf_stack[[1]], main = names(gf_stack[[1]]), margin = FALSE, col.regions = viridis),
  levelplot(gf_stack[[4]], main = names(gf_stack[[4]]), margin = FALSE, col.regions = viridis),
  levelplot(gf_stack[[5]], main = names(gf_stack[[5]]), margin = FALSE, col.regions = viridis),
  rasterVis::histogram(gf_stack[[1]]),
  rasterVis::histogram(gf_stack[[4]]),
  rasterVis::histogram(gf_stack[[5]]),
  ncol = 3, nrow = 2)

The maps and histograms below are for the feature carbon_sequestration.
es_stack <- raster::stack(raw_rasters[[1]],
                          normalize(raw_rasters[[1]]),
                          ol_normalize(raw_rasters[[1]]))
names(es_stack) <- c("Original", "Normalized", "OL_normalized")
grid.arrange(
  levelplot(es_stack[[1]], main = names(es_stack[[1]]), margin = FALSE, col.regions = viridis),
  levelplot(es_stack[[2]], main = names(es_stack[[2]]), margin = FALSE, col.regions = viridis),
  levelplot(es_stack[[3]], main = names(es_stack[[3]]), margin = FALSE, col.regions = viridis),
  rasterVis::histogram(es_stack[[1]]),
  rasterVis::histogram(es_stack[[2]]),
  rasterVis::histogram(es_stack[[3]]),
  ncol = 3, nrow = 2)