Nearest Centroid Sampling Example

It’s always easier to look at data than to talk about it, so here’s what I was trying to explain in my last email. I’ll create a two-band raster with random values and run sample_nc() on it with different parameters to see how they change the sampling scheme. Keep in mind that nSamp sets the number of k-means centers and k sets how many samples are taken nearest each center, so both runs below yield 60 samples.

First, I’ll run:

nSamp = 10
k = 6

library(sgsR)
library(terra)

## terra 1.7.39

library(sf)

## Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE

# create two-band raster with random values
r1 <- rast(nrows = 100, ncols = 100)
r2 <- r1
values(r1) <- rnorm(10000)
values(r2) <- rnorm(10000)
r <- c(r1,r2)
names(r) <- c("r1", "r2")

# run sample_nc with nSamp = 10 and k = 6
samp <- sample_nc(mraster = r,
                  nSamp = 10,
                  k = 6)

## K-means being performed on 2 layers with 10 centers.

# plot out the pixel values of the resulting samples
plot(r2 ~ r1, data = samp, type = "n",
     xlab = "r1", ylab = "r2")
text(x = samp$r1,
     y = samp$r2,
     labels = samp$kcenter)

The numbers are a bit hard to read, but each point is labeled with its cluster ID. As you can see, the variable space as a whole is not well sampled: you only capture 10 “chunks” of variability within the broader dataset, and within each chunk you wind up with 6 very similar plots.
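To put a rough number on that "similar plots within a chunk" effect, here is a minimal base-R sketch. It does not call sgsR: nc_sketch() is a hypothetical helper that only approximates the idea (run k-means, then keep the k pixels closest to each center), and the argument name n_centers, the seed, and the kmeans() settings are all my own assumptions rather than what sample_nc() does internally.

```r
# Hypothetical helper: a base-R approximation of nearest-centroid sampling.
# It is NOT sgsR's implementation; it only mimics the idea described above.
nc_sketch <- function(x, n_centers, k) {
  km <- kmeans(x, centers = n_centers, iter.max = 100)
  picks <- lapply(seq_len(n_centers), function(i) {
    members <- which(km$cluster == i)
    # squared distance from each cluster member to its cluster center
    d <- rowSums(sweep(x[members, , drop = FALSE], 2, km$centers[i, ])^2)
    # keep the k members closest to the center
    idx <- members[head(order(d), k)]
    data.frame(r1 = x[idx, 1], r2 = x[idx, 2], kcenter = i)
  })
  do.call(rbind, picks)
}

set.seed(1)  # arbitrary seed so the sketch is repeatable
x <- cbind(r1 = rnorm(10000), r2 = rnorm(10000))
s <- nc_sketch(x, n_centers = 10, k = 6)

# mean distance of each sample to the mean of its own chunk
chunk_mean <- cbind(ave(s$r1, s$kcenter), ave(s$r2, s$kcenter))
within_chunk <- mean(sqrt(rowSums((cbind(s$r1, s$r2) - chunk_mean)^2)))

# mean distance of each sample to the mean of all samples
overall <- mean(sqrt((s$r1 - mean(s$r1))^2 + (s$r2 - mean(s$r2))^2))

c(within_chunk = within_chunk, overall = overall)
```

If the within-chunk distance comes out far smaller than the overall distance, that is the same pattern as in the plot: the 6 samples around each center are close to being repeats of one another.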

Now I’ll try it with:

nSamp = 30
k = 2

# run sample_nc with nSamp = 30 and k = 2
samp <- sample_nc(mraster = r,
                  nSamp = 30,
                  k = 2)

## K-means being performed on 2 layers with 30 centers.

# plot out the pixel values of the resulting samples
plot(r2 ~ r1, data = samp, type = "n",
     xlab = "r1", ylab = "r2")
text(x = samp$r1,
     y = samp$r2,
     labels = samp$kcenter)

With the same total of 60 samples, this clearly captures a much wider range of variability.
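If you want something more objective than eyeballing the two plots, one option is the area of the convex hull of the sampled (r1, r2) values. This is only a sketch: hull_area() is a hypothetical helper of mine, not part of sgsR, and hull area is just one possible measure of coverage. It is sensitive to outliers, so treat it as a rough indicator.

```r
# Hypothetical helper: area of the convex hull of a set of 2-D points,
# using grDevices::chull() and the shoelace formula.
hull_area <- function(x, y) {
  h <- chull(x, y)               # indices of the hull vertices, in order
  hx <- x[h]
  hy <- y[h]
  j <- c(seq_along(h)[-1], 1)    # index of the next vertex, wrapping around
  abs(sum(hx * hy[j] - hx[j] * hy)) / 2
}

# Toy check: a unit square plus one interior point has hull area 1
hull_area(c(0, 1, 1, 0, 0.5), c(0, 0, 1, 1, 0.5))

# After each sample_nc() run, the same call could be applied to the samples:
# hull_area(samp$r1, samp$r2)
```

A larger hull area for the nSamp = 30, k = 2 run would back up what the plot shows.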

Michael Campbell

2023-07-05