Interpolation is essential for filling in gaps in data across geographic regions, especially useful when we have measurements at specific points but need predictions over a broader area. In this example, we focus on interpolating climate and air quality data across California using terra in R. These interpolation techniques help in creating predictive maps valuable for environmental monitoring, urban planning, agriculture, and public health.
Purpose: - Interpolation for spatial prediction. - Climate & air quality data mapping.
Applications: environmental monitoring, urban planning, public health.
if (!require("rspat")) remotes::install_github('rspatial/rspat')
library(rspat)
d <- spat_data('precipitation')
head(d)
mnts <- toupper(month.abb) ### month
mnts
## [1] "JAN" "FEB" "MAR" "APR" "MAY" "JUN" "JUL" "AUG" "SEP" "OCT" "NOV" "DEC"
d$prec <- rowSums(d[, mnts]) ### add month
plot(sort(d$prec), ylab="Annual precipitation (mm)", las=1, xlab="Stations")
Add up monthly precipitation values for each station to get total annual
precipitation. Plotting this helps us see how precipitation is spread
across stations.
dsp <- vect(d, c("LONG", "LAT"), crs="+proj=longlat +datum=NAD83")
head(dsp)
CA <- spat_data("counties")
head(CA)
### define groups for mapping
cuts <- c(0,200,300,500,1000,3000)
### set up a palette of interpolated colors
blues <- colorRampPalette(c('yellow', 'orange', 'blue', 'darkblue'))
Define precipitation levels with color groups (from yellow to dark blue) so we can easily see variations in precipitation on the map.
plot(CA, col="light gray", lwd=4, border="dark gray")
plot(dsp, "prec", type="interval", col=blues(10), legend=TRUE,cex=2,breaks=cuts, add=TRUE, plg=list(x=-117.27, y=41.54))
lines(CA)
Draw the California map with precipitation data overlaid. This
visualizes precipitation differences across the state.
TA <- "+proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0 +y_0=-4000000 +datum=WGS84 +units=m"
dta <- project(dsp, TA)
cata <- project(CA, TA)
Change the map projection to Teale Albers. This reduces map distortion and makes it more accurate for local analysis.
RMSE <- function(observed, predicted) {
sqrt(mean((predicted - observed)^2, na.rm=TRUE)) }
null <- RMSE(mean(dsp$prec), dsp$prec)
null
## [1] 435.3217
Define the RMSE function, which calculates the average error between predicted and actual values. The “Null model” is a baseline measurement to compare with our later models.
v <- voronoi(dta)
plot(v)
points(dta)
Build polygons around each station point so each region represents the
closest measurement. This is a simple way to estimate precipitation
across areas between stations.
vca <- crop(v, cata)
plot(vca, "prec")
Clip out areas outside California from the polygons
r <- rast(vca, res=10000)
vr <- rasterize(vca, r, "prec")
plot(vr)
convert them to a grid (raster). This grid is easier to analyze and
plot.
set.seed(5132015)
kf <- sample(1:5, nrow(dta), replace=TRUE)
rmse <- rep(NA, 5)
for (k in 1:5) {
test <- dta[kf == k, ]
train <- dta[kf != k, ]
v <- voronoi(train)
p <- extract(v, test)
rmse[k] <- RMSE(test$prec, p$prec) }
rmse
## [1] 192.0568 203.1304 183.5556 177.5523 205.6921
mean(rmse)
## [1] 192.3974
perf <- 1 - (mean(rmse) / null)
round(perf, 3)
## [1] 0.558
Divide the data into five parts and test the model’s accuracy on each part. This helps check how well the model predicts precipitation at stations not used in training.
library(gstat)
d <- data.frame(geom(dta)[,c("x", "y")], as.data.frame(dta))
head(d)
gs <- gstat(formula=prec~1, locations=~x+y, data=d, nmax=5, set=list(idp = 0))
nn <- interpolate(r, gs, debug.level=0)
nnmsk <- mask(nn, vr)
plot(nnmsk, 1)
Use a nearest-neighbor method to estimate precipitation in locations
without data. It looks at nearby points to make predictions, useful when
data points are sparse.
rmsenn <- rep(NA, 5)
for (k in 1:5) {
test <- d[kf == k, ]
train <- d[kf != k, ]
gscv <- gstat(formula=prec~1, locations=~x+y, data=train, nmax=5, set=list(idp = 0))
p <- predict(gscv, test, debug.level=0)$var1.pred
rmsenn[k] <- RMSE(test$prec, p) }
rmsenn
## [1] 215.0993 209.5838 197.0604 177.1946 189.8130
mean(rmsenn)
## [1] 197.7502
1 - (mean(rmsenn) / null)
## [1] 0.5457377
Test the accuracy of the nearest-neighbor method using cross-validation. This step shows how accurate this model is compared to others.
x <- rspat::spat_data("airqual")
x$OZDLYAV <- x$OZDLYAV * 1000
x <- vect(x, c("LONGITUDE", "LATITUDE"), crs="+proj=longlat +datum=WGS84")
Download air quality data for California. We transform it to a compatible format and scale it for analysis. This is useful to analyze the effect of air quality on public health.
TAkm <- "+proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0 +y_0=-4000000 +datum=WGS84 +units=km"
aq <- project(x, TAkm)
ca <- project(CA, TAkm)
r <- rast(ca)
res(r) <- 10
p <- data.frame(geom(aq)[, c("x", "y")], as.data.frame(aq))
gs <- gstat(formula=OZDLYAV~1, locations=~x+y, data=p)
v <- variogram(gs, width=20)
v
plot(v)
fve <- fit.variogram(v, vgm(85, "Exp", 75, 20))
fve
plot(variogramLine(fve, 400), type='l', ylim=c(0,120))
points(v[,2:3], pch=20, col='red')
k <- gstat(formula=OZDLYAV~1, locations=~x+y, data=p, model=fve) ### predicted values
kp <- interpolate(r, k, debug.level=0)
ok <- mask(kp, ca)
names(ok) <- c('prediction', 'variance')
plot(ok)
idm <- gstat(formula=OZDLYAV~1, locations=~x+y, data=p)
idp <- interpolate(r, idm, debug.level=0)
idp <- mask(idp, ca)
plot(idp, 1)
Fit a variogram model to understand how data points are related over
distance. Then, apply kriging, a sophisticated interpolation method that
predicts values in unsampled areas. This step is especially useful for
accurate predictions in environmental studies.
Environment: Climate, drought, resource management. Public Health: Air quality, health impact zones. Urban Planning: Infrastructure, risk assessments.