This notebook illustrates how to calculate uniband and multiband image statistics using the raster library.
I will use a spatial and spectral subset of a Landsat 8 scene collected in 2013. The subset is a seven-band image which covers the area known as Montes de Maria, in the Colombian Caribbean region. I explained in a previous notebook how to find, download, and subset Landsat images.
Let’s use the raster library to create a raster object from a Landsat multispectral TIF file obtained in a previous notebook.
library(raster)
## Loading required package: sp
landsat13 <- brick("./landsat/montes-landsat.TIF")
Let’s check what is landsat13:
landsat13
## class : RasterBrick
## dimensions : 3392, 3409, 11563328, 7 (nrow, ncol, ncell, nlayers)
## resolution : 30, 30 (x, y)
## extent : 422745, 525015, 1019775, 1121535 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## source : /Users/ivanlizarazo/Documents/ivan/PR/datos/landsat/montes-landsat.TIF
## names : ultra.blue, blue, green, red, NIR, SWIR1, SWIR2
## min values : 0, 0, 0, 0, 0, 0, 0
## max values : 20788, 21783, 23294, 27555, 43270, 65535, 65535
We can see the spatial resolution, extent, number of layers, coordinate reference system and more.
Note that digital values range from 0 to 65535. In order to obtain apparent reflectance values it is necessary to review the metadata and look for the conversion factors:
As the conversion factors are equal for bands 1 to 7, we can convert all bands in a single operation:
toa13 <- -0.1 + landsat13*2.0E-5
Let’s check the output:
toa13
## class : RasterBrick
## dimensions : 3392, 3409, 11563328, 7 (nrow, ncol, ncell, nlayers)
## resolution : 30, 30 (x, y)
## extent : 422745, 525015, 1019775, 1121535 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## source : /private/var/folders/l_/1nv5l0dx78g_d3k3_t85lrjr0000gn/T/RtmpDYCDt2/raster/r_tmp_2021-03-05_204931_10069_15267.grd
## names : ultra.blue, blue, green, red, NIR, SWIR1, SWIR2
## min values : -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1
## max values : 0.31576, 0.33566, 0.36588, 0.45110, 0.76540, 1.21070, 1.21070
In a previous notebook, we saw how to use terra for computing uniband stats. Let’s do it now with raster:
Mean value for all bands:
toa13_mean <- cellStats(toa13, 'mean')
toa13_mean
## ultra.blue blue green red NIR SWIR1 SWIR2
## 0.07200951 0.05691908 0.04736394 0.03298331 0.18072476 0.11212926 0.04841935
Standard deviation value for all bands:
toa13_sd <- cellStats(toa13, 'sd')
toa13_sd
## ultra.blue blue green red NIR SWIR1 SWIR2
## 0.05569004 0.05104050 0.04863306 0.04600959 0.12010867 0.09146298 0.05948233
First, image visualization:
par(mfrow=c(1, 2))
tmp <- stretch(toa13[[1]], minv=0, maxv=255, minq=0, maxq=1)
plot(tmp, col = gray(0:100 / 100), main='L8 2013 B1 - linear stretch', legend=FALSE, axes=FALSE)
tmp <- stretch(toa13[[2]], minv=0, maxv=255, minq=0, maxq=1)
plot(tmp, col = gray(0:100 / 100), main='L8 2013 B2 - linear stretch', legend=FALSE, axes=FALSE)
Now, the corresponding histograms:
hist(toa13, c(1,2), maxpixels=1000000, plot=TRUE, main="Histograms of B1 and B2", xlab= "TOA reflectance", ylab="Frequency", col="wheat3")
par(mfrow=c(1, 2))
tmp <- stretch(toa13[[3]], minv=0, maxv=255, minq=0, maxq=1)
plot(tmp, col = gray(0:100 / 100), main='L8 2013 B3 - linear stretch', legend=FALSE, axes=FALSE)
tmp <- stretch(toa13[[4]], minv=0, maxv=255, minq=0, maxq=1)
plot(tmp, col = gray(0:100 / 100), main='L8 2013 B4 - linear stretch', legend=FALSE, axes=FALSE)
hist(toa13, c(3,4), maxpixels=1000000, plot=TRUE, main="Histograms of B3 and B4", xlab= "TOA reflectance", ylab="Frequency", col="darkgreen")
par(mfrow=c(1, 2))
tmp <- stretch(toa13[[5]], minv=0, maxv=255, minq=0, maxq=1)
plot(tmp, col = gray(0:100 / 100), main='L8 2013 B5 - linear stretch', legend=FALSE, axes=FALSE)
tmp <- stretch(toa13[[6]], minv=0, maxv=255, minq=0, maxq=1)
plot(tmp, col = gray(0:100 / 100), main='L8 2013 B6 - linear stretch', legend=FALSE, axes=FALSE)
hist(toa13, c(5,6), maxpixels=1000000, plot=TRUE, main="Histograms of B5 and B6", xlab= "TOA reflectance", ylab="Frequency", col="springgreen")
The next tasks are for you:
How to compute band skewness, that is, the asymmetry of the probability distribution of the band digital values about its mean?
How to compute band kurtosis, that is, the measure of outliers present in the distribution?
For a conceptual explanation of these two measures see this link.
par(mfrow=c(1,2))
# Create an RGB image from the raster stack
plotRGB(toa13, r = 4, g = 3, b = 2,main='L8 2013 RGB432', stretch='lin', colNA='white')
plotRGB(toa13, r = 5, g = 6, b = 4,main='L8 2013 RGB564', stretch='lin', colNA='white')
Multiband statistics include the covariance matrix, the correlation matrix, and pairwise scatterplots.
A pairwise scatterplot is usually accompanied by the corresponding correlation coefficient.
In remote sensing, the most common used correlation coefficient is the Pearson correlation (r), which measures a linear dependence between two bands (x and y). It is meaningful only when x and y are characterized by a normal distribution. The plot of y = f(x) is named the linear regression curve.
Let’s obtain pairwise plots and correlation coefficients:
pairs(toa13[[1:3]], hist=TRUE, cor=TRUE, use="pairwise.complete.obs", maxpixels=100000)
pairs(toa13[[4:6]], hist=TRUE, cor=TRUE, use="pairwise.complete.obs", maxpixels=100000)
Covariance is a measure of the relationship between two random variables. The metric evaluates how much – to what extent – the variables change together. In other words, it is essentially a measure of the variance between two variables. However, the metric does not assess the dependency between variables.
Unlike the correlation coefficient, covariance is measured in units. The units are computed by multiplying the units of the two variables. The variance can take any positive or negative values. The values are interpreted as follows:
The covariance formula is similar to the formula for correlation and deals with the calculation of data points from the average value in a dataset. For example, the covariance between two random variables X and Y can be calculated using the following formula (for population):
# Small fig.width
include_graphics("./pngs/Covariance.png")
Now, let’s create the covariance matrix for our Landsat image:
covmat <- cov(as.matrix(toa13))
# Note the values rescaling
round(covmat*1000,2)
## ultra.blue blue green red NIR SWIR1 SWIR2
## ultra.blue 3.10 2.84 2.67 2.44 4.87 3.86 2.73
## blue 2.84 2.61 2.46 2.26 4.43 3.60 2.55
## green 2.67 2.46 2.37 2.20 4.57 3.71 2.57
## red 2.44 2.26 2.20 2.12 4.09 3.69 2.56
## NIR 4.87 4.43 4.57 4.09 14.43 9.29 5.59
## SWIR1 3.86 3.60 3.71 3.69 9.29 8.37 5.34
## SWIR2 2.73 2.55 2.57 2.56 5.59 5.34 3.54
Covariance and correlation both primarily assess the relationship between variables. The closest analogy to the relationship between them is the relationship between the variance and standard deviation.
Covariance measures the total variation of two random variables from their expected values. Using covariance, we can only gauge the direction of the relationship (whether the variables tend to move in tandem or show an inverse relationship). However, it does not indicate the strength of the relationship, nor the dependency between the variables.
On the other hand, correlation measures the strength of the relationship between variables. Correlation is the scaled measure of covariance. It is dimensionless. In other words, the correlation coefficient is always a pure value and not measured in any units.
The relationship between the two concepts can be expressed using the formula below:
# Small fig.width
include_graphics("./pngs/Correlation.png")
Now, let’s create the correlation matrix for our Landsat image:
#boxplot(toa13[[4:6]], maxpixels=100000)
cmat <- cor(as.matrix(toa13))
round(cmat, 2)
## ultra.blue blue green red NIR SWIR1 SWIR2
## ultra.blue 1.00 1.00 0.99 0.95 0.73 0.76 0.82
## blue 1.00 1.00 0.99 0.96 0.72 0.77 0.84
## green 0.99 0.99 1.00 0.98 0.78 0.83 0.89
## red 0.95 0.96 0.98 1.00 0.74 0.88 0.94
## NIR 0.73 0.72 0.78 0.74 1.00 0.85 0.78
## SWIR1 0.76 0.77 0.83 0.88 0.85 1.00 0.98
## SWIR2 0.82 0.84 0.89 0.94 0.78 0.98 1.00
It would be nice to get the p-values for every correlation coefficient, isn’t?
Let’s try with a pair of bands:
cor.test(x=as.matrix(toa13[[4]]), y=as.matrix(toa13[[5]]), method="pearson")
##
## Pearson's product-moment correlation
##
## data: as.matrix(toa13[[4]]) and as.matrix(toa13[[5]])
## t = 3747.6, df = 11563326, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7403116 0.7408321
## sample estimates:
## cor
## 0.740572
I explored here how to obtain and display bands statistics. However, I used the R old-style way for visualization. There may be much modern ways to make graphs and charts more informative, clear, and attractive. You can make many changes.
For example, we can use the corrplot package to create a graphical display of a correlation matrix, highlighting the most correlated variables in a data table.
In this plot, correlation coefficients are colored according to the value. Correlation matrix can be also reordered according to the degree of association between variables.
library(corrplot)
## corrplot 0.84 loaded
ncmat <- round(cmat, 2)
corrplot(ncmat, type = "upper", order = "hclust",
tl.col = "black", tl.srt = 45)
The function chart.Correlation()[ in the package PerformanceAnalytics], can be used to display a chart of a correlation matrix.
tmp1 <- as.matrix(sample(toa13,500000))
library(PerformanceAnalytics)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
chart.Correlation(tmp1, histogram=TRUE, pch=19)
The rasterVis package and/or the leaflet package may help for such purpose.
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] PerformanceAnalytics_2.0.4 xts_0.12.1
## [3] zoo_1.8-8 corrplot_0.84
## [5] png_0.1-7 ggplot2_3.3.3
## [7] knitr_1.31 raster_3.4-5
## [9] sp_1.4-5
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 bslib_0.2.4 compiler_4.0.4 pillar_1.4.7
## [5] jquerylib_0.1.3 highr_0.8 tools_4.0.4 digest_0.6.27
## [9] jsonlite_1.7.2 evaluate_0.14 lifecycle_1.0.0 tibble_3.0.6
## [13] gtable_0.3.0 lattice_0.20-41 pkgconfig_2.0.3 rlang_0.4.10
## [17] DBI_1.1.1 yaml_2.2.1 rgdal_1.5-23 xfun_0.21
## [21] withr_2.4.1 dplyr_1.0.4 stringr_1.4.0 generics_0.1.0
## [25] sass_0.3.1 vctrs_0.3.6 tidyselect_1.1.0 grid_4.0.4
## [29] glue_1.4.2 R6_2.5.0 rmarkdown_2.7 purrr_0.3.4
## [33] magrittr_2.0.1 scales_1.1.1 codetools_0.2-18 htmltools_0.5.1.1
## [37] ellipsis_0.3.1 assertthat_0.2.1 colorspace_2.0-0 quadprog_1.5-8
## [41] stringi_1.5.3 munsell_0.5.0 crayon_1.4.1