This markdown file is adapted from Applied Spatial Analysis in Public Health online course available from https://hughst.github.io/. We would like to thank the contributors of the course who have provided the datasets and course in open domain.
Spatial interpolation is done to estimate spatial variation in risk of continous spatial variables when the data has been collected from sample point locations. For example, temperature is a continous spatial variable, however, data from meterological stations is available as sample point locations.
One of the common methods employed for spatial interpolation is based on Inverse Distance Weighing (IDW). The present document focuses on its application using R.
We will be illustrating the process using datasets made available from course github repository. The first dataset (dat) is a csv file containing location details of schools and respective falciparum rates.
dat <- read.csv("https://raw.githubusercontent.com/HughSt/HughSt.github.io/master/course_materials/week1/Lab_files/Data/mal_data_eth_2009_no_dups.csv",header=T)
The second dataset (ethopia_adm) is geographic data from GADM for the study area.
ethopia_adm <- raster::getData("GADM", country="ETH", level=1)
names(dat)
## [1] "country" "country_id" "continent_id" "site_id" "site_name"
## [6] "latitude" "longitude" "rural_urban" "year_start" "lower_age"
## [11] "upper_age" "examined" "pf_pos" "pf_pr" "method"
## [16] "title1" "citation1"
options(tibble.width = Inf)
head(dat)
To undertake IDW interpolation, an observation window (owin) is to be specified as first step. Since the present data used in the course module was from Oromia in Ethopia, an owin for the same is created.
suppressMessages(library(spatstat))
oromia <- ethopia_adm[ethopia_adm$NAME_1=="Oromia",]
obs_window <- owin(oromia@bbox[1,], oromia@bbox[2,])
Once owin is defined, the next step includes creation of point pattern object (ppp) within the specified owin. Marks are added as atttribute value to each point (falciparum rate).
ppp_malaria<-ppp(dat$longitude,dat$latitude,
marks=dat$pf_pr,window=obs_window)
IDW object is created using idw function from spatstat library in R. It is important to understand that the interpolation using IDW is determined critically by the power value and “at” argument. We shall be using a power value of 0.05 and “at” argument pixels for illustration.
idw_malaria <- idw(ppp_malaria, power=0.05, at="pixels")
plot(idw_malaria,
col=heat.colors(20),
main="Interpolated spatial variation in risk of malaria based on IDW method \n (Power = 0.05)")
To demonstrate the change in visualisation when “at” argument is used in idw function, we shall replicate the above code with changes only in “at” argument.
idw_points <- idw(ppp_malaria, power = 0.05, at = "points")
plot(idw_points,
col=heat.colors(64))
Mean squared error (MSE) is a measure of accuracy of the interpolation method.
Metrics::mse(ppp_malaria$marks, idw_points)
## [1] 0.0001646708
Determination of value of power while undertaking interpolation is of utmost importance. As the power is reduced, smoothness of interpolated spatial distribution of risk is increased. However, with reduced power chance of missing a true high risk region increases. On the other hand, choosing a very high power value may lead to detection of regions with false high risk estimates. To demonstrate, lets look at plots with varying power values.
par(mfrow=c(2,2))
plot(idw(ppp_malaria, power=0.001, at="pixels"),col=heat.colors(20), main="power = 0.001")
plot(idw(ppp_malaria, power=0.01, at="pixels"),col=heat.colors(20), main="power = 0.01")
plot(idw(ppp_malaria, power=0.1, at="pixels"),col=heat.colors(20), main="power = 0.1")
plot(idw(ppp_malaria, power=10, at="pixels"),col=heat.colors(20), main="power = 10")
To determine optimal value of power for undertaking interpolation using IDW, we can undertake cross validation using leave one out method for point data.
powers <- seq(0.001, 10, 0.01)
mse_result <- NULL
for(power in powers){
CV_idw <- idw(ppp_malaria, power=power, at="points")
mse_result <- c(mse_result,
Metrics::mse(ppp_malaria$marks,CV_idw))
}
optimal_power <- powers[which.min(mse_result)]
optimal_power
## [1] 1.411
plot(powers, mse_result)
suppressMessages(library(raster))
suppressMessages(library(leaflet))
# Convert to a raster
idw_raster <- raster(idw(ppp_malaria, power=optimal_power, at="pixels"),
crs= crs(ethopia_adm))
# Define a color palette
colPal <- colorNumeric(wesanderson::wes_palette("Zissou1")[1:5], idw_raster[], n = 5)
# Plot
leaflet() %>% addTiles() %>% addRasterImage(idw_raster, col = colPal, opacity=0.7) %>%
addLegend(pal = colPal, values = idw_raster[])
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## ellps WGS 84 in CRS definition: +proj=merc +a=6378137 +b=6378137 +lat_ts=0
## +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum WGS_1984 in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## ellps WGS 84 in CRS definition: +proj=merc +a=6378137 +b=6378137 +lat_ts=0
## +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum WGS_1984 in CRS definition