Interpolation in Spatial Analysis

Interpolation is essential for filling in gaps in data across geographic regions, especially useful when we have measurements at specific points but need predictions over a broader area. In this example, we focus on interpolating climate and air quality data across California using terra in R. These interpolation techniques help in creating predictive maps valuable for environmental monitoring, urban planning, agriculture, and public health.

Purpose: - Interpolation for spatial prediction. - Climate & air quality data mapping.

Applications: environmental monitoring, urban planning, public health.

Temperature in California

Load Packages

if (!require("rspat")) remotes::install_github('rspatial/rspat') 

Load Data

library(rspat) 
d <- spat_data('precipitation') 
head(d) 

Calculate Annual Precipitation

  • Sum monthly data
  • plot distribution.
mnts <- toupper(month.abb) ### month
mnts
##  [1] "JAN" "FEB" "MAR" "APR" "MAY" "JUN" "JUL" "AUG" "SEP" "OCT" "NOV" "DEC"
d$prec <- rowSums(d[, mnts]) ### add month
plot(sort(d$prec), ylab="Annual precipitation (mm)", las=1, xlab="Stations") 

Add up monthly precipitation values for each station to get total annual precipitation. Plotting this helps us see how precipitation is spread across stations.

Map Data Points

  • Create SpatVector,
  • define California map boundarie
dsp <- vect(d, c("LONG", "LAT"), crs="+proj=longlat +datum=NAD83") 
head(dsp)
CA <- spat_data("counties") 
head(CA)

Set Color Ranges

  • Define precipitation level groups, color palette.
### define groups for mapping 
cuts <- c(0,200,300,500,1000,3000) 
### set up a palette of interpolated colors 
blues <- colorRampPalette(c('yellow', 'orange', 'blue', 'darkblue'))

Define precipitation levels with color groups (from yellow to dark blue) so we can easily see variations in precipitation on the map.

Plot California Map

  • Overlay precipitation data on California map
plot(CA, col="light gray", lwd=4, border="dark gray") 
plot(dsp, "prec", type="interval", col=blues(10), legend=TRUE,cex=2,breaks=cuts, add=TRUE, plg=list(x=-117.27, y=41.54)) 
lines(CA) 

Draw the California map with precipitation data overlaid. This visualizes precipitation differences across the state.

Transform Projection (Teale Albers)

  • Use Teale Albers for accurate CA mapping.
TA <- "+proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0 +y_0=-4000000 +datum=WGS84 +units=m"
dta <- project(dsp, TA) 
cata <- project(CA, TA) 

Change the map projection to Teale Albers. This reduces map distortion and makes it more accurate for local analysis.

NULL model

Root Mean Square Error (RMSE)

  • Define RMSE function,
  • calculate Null model RMSE.
RMSE <- function(observed, predicted) { 
  sqrt(mean((predicted - observed)^2, na.rm=TRUE)) }

null <- RMSE(mean(dsp$prec), dsp$prec) 
null 
## [1] 435.3217

Define the RMSE function, which calculates the average error between predicted and actual values. The “Null model” is a baseline measurement to compare with our later models.

proximity polygons

Proximity polygon / nearest neighbour

  • Create Voronoi Polygons for spatial interpolation.
v <- voronoi(dta) 
plot(v) 
points(dta) 

Build polygons around each station point so each region represents the closest measurement. This is a simple way to estimate precipitation across areas between stations.

Crop & Rasterize

  • Clip polygons to California, convert to raster
vca <- crop(v, cata) 
plot(vca, "prec") 

Clip out areas outside California from the polygons

rasterizethe result

r <- rast(vca, res=10000) 
vr <- rasterize(vca, r, "prec") 
plot(vr) 

convert them to a grid (raster). This grid is easier to analyze and plot.

5-Fold Cross-Validation

  • 5-fold cross-validation to evaluate this model
set.seed(5132015) 
kf <- sample(1:5, nrow(dta), replace=TRUE) 
rmse <- rep(NA, 5) 
for (k in 1:5) { 
  test <- dta[kf == k, ] 
  train <- dta[kf != k, ] 
  v <- voronoi(train) 
  p <- extract(v, test) 
  rmse[k] <- RMSE(test$prec, p$prec) } 

rmse 
## [1] 192.0568 203.1304 183.5556 177.5523 205.6921
mean(rmse)
## [1] 192.3974

Relative model performance

perf <- 1 - (mean(rmse) / null) 
round(perf, 3) 
## [1] 0.558

Divide the data into five parts and test the model’s accuracy on each part. This helps check how well the model predicts precipitation at stations not used in training.

Nearest neighbour interpolation

Data

library(gstat) 
d <- data.frame(geom(dta)[,c("x", "y")], as.data.frame(dta)) 
head(d) 

Interpolate

gs <- gstat(formula=prec~1, locations=~x+y, data=d, nmax=5, set=list(idp = 0)) 
nn <- interpolate(r, gs, debug.level=0) 
nnmsk <- mask(nn, vr) 
plot(nnmsk, 1) 

Use a nearest-neighbor method to estimate precipitation in locations without data. It looks at nearby points to make predictions, useful when data points are sparse.

Cross-Validate Nearest Neighbor Model

rmsenn <- rep(NA, 5) 
for (k in 1:5) { 
  test <- d[kf == k, ] 
  train <- d[kf != k, ] 
  gscv <- gstat(formula=prec~1, locations=~x+y, data=train, nmax=5, set=list(idp = 0)) 
  p <- predict(gscv, test, debug.level=0)$var1.pred 
  rmsenn[k] <- RMSE(test$prec, p) } 
rmsenn 
## [1] 215.0993 209.5838 197.0604 177.1946 189.8130
mean(rmsenn) 
## [1] 197.7502
1 - (mean(rmsenn) / null) 
## [1] 0.5457377

Test the accuracy of the nearest-neighbor method using cross-validation. This step shows how accurate this model is compared to others.

Calfornia Air Pollution data

Load, transform, and visualize air quality data for California.

x <- rspat::spat_data("airqual") 
x$OZDLYAV <- x$OZDLYAV * 1000 
x <- vect(x, c("LONGITUDE", "LATITUDE"), crs="+proj=longlat +datum=WGS84") 

Download air quality data for California. We transform it to a compatible format and scale it for analysis. This is useful to analyze the effect of air quality on public health.

SpatVector and transform to Teale Albers

TAkm <- "+proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0 +y_0=-4000000 +datum=WGS84 +units=km" 
aq <- project(x, TAkm) 

SpatRaster to interpolate

ca <- project(CA, TAkm) 
r <- rast(ca) 
res(r) <- 10

Fit a variogram

p <- data.frame(geom(aq)[, c("x", "y")], as.data.frame(aq)) 
gs <- gstat(formula=OZDLYAV~1, locations=~x+y, data=p) 
v <- variogram(gs, width=20) 
v 

Plot

plot(v) 

fit a model variogram

fve <- fit.variogram(v, vgm(85, "Exp", 75, 20)) 
fve 

Plot Data

plot(variogramLine(fve, 400), type='l', ylim=c(0,120))
points(v[,2:3], pch=20, col='red') 

Ordinary kriging

k <- gstat(formula=OZDLYAV~1, locations=~x+y, data=p, model=fve) ### predicted values 
kp <- interpolate(r, k, debug.level=0) 
ok <- mask(kp, ca) 
names(ok) <- c('prediction', 'variance') 
plot(ok) 

Compare with other methods

idm <- gstat(formula=OZDLYAV~1, locations=~x+y, data=p) 
idp <- interpolate(r, idm, debug.level=0) 
idp <- mask(idp, ca) 
plot(idp, 1) 

Fit a variogram model to understand how data points are related over distance. Then, apply kriging, a sophisticated interpolation method that predicts values in unsampled areas. This step is especially useful for accurate predictions in environmental studies.

Applications:

Environment: Climate, drought, resource management. Public Health: Air quality, health impact zones. Urban Planning: Infrastructure, risk assessments.