1 Introduction.

This markdown file is adapted from Applied Spatial Analysis in Public Health online course available from https://hughst.github.io/. We would like to thank the contributors of the course who have provided the datasets and course in open domain.

Spatial interpolation is done to estimate spatial variation in risk of continous spatial variables when the data has been collected from sample point locations. For example, temperature is a continous spatial variable, however, data from meterological stations is available as sample point locations.

One of the common methods employed for spatial interpolation is based on Inverse Distance Weighing (IDW). The present document focuses on its application using R.

1.1 Loading data.

We will be illustrating the process using datasets made available from course github repository. The first dataset (dat) is a csv file containing location details of schools and respective falciparum rates.

dat <- read.csv("https://raw.githubusercontent.com/HughSt/HughSt.github.io/master/course_materials/week1/Lab_files/Data/mal_data_eth_2009_no_dups.csv",header=T)

The second dataset (ethopia_adm) is geographic data from GADM for the study area.

ethopia_adm <- raster::getData("GADM", country="ETH", level=1)

1.2 Understanding data.

names(dat)
##  [1] "country"      "country_id"   "continent_id" "site_id"      "site_name"   
##  [6] "latitude"     "longitude"    "rural_urban"  "year_start"   "lower_age"   
## [11] "upper_age"    "examined"     "pf_pos"       "pf_pr"        "method"      
## [16] "title1"       "citation1"
options(tibble.width = Inf)
head(dat)

2 Inverse Distance Weighing in R

2.1 Creation of observation window.

To undertake IDW interpolation, an observation window (owin) is to be specified as first step. Since the present data used in the course module was from Oromia in Ethopia, an owin for the same is created.

suppressMessages(library(spatstat))
oromia <- ethopia_adm[ethopia_adm$NAME_1=="Oromia",]
obs_window <- owin(oromia@bbox[1,], oromia@bbox[2,])

2.2 Creation of point pattern object.

Once owin is defined, the next step includes creation of point pattern object (ppp) within the specified owin. Marks are added as atttribute value to each point (falciparum rate).

ppp_malaria<-ppp(dat$longitude,dat$latitude,
                         marks=dat$pf_pr,window=obs_window)

2.3 Creating idw object.

IDW object is created using idw function from spatstat library in R. It is important to understand that the interpolation using IDW is determined critically by the power value and “at” argument. We shall be using a power value of 0.05 and “at” argument pixels for illustration.

idw_malaria <- idw(ppp_malaria, power=0.05, at="pixels")

2.4 Visualisation of interpolated results.

plot(idw_malaria,
     col=heat.colors(20), 
     main="Interpolated spatial variation in risk of malaria based on IDW method \n (Power = 0.05)") 

2.5 Interpolation of spatial risk based on points.

To demonstrate the change in visualisation when “at” argument is used in idw function, we shall replicate the above code with changes only in “at” argument.

idw_points <- idw(ppp_malaria, power = 0.05, at = "points")
plot(idw_points, 
     col=heat.colors(64))

2.6 Calculation of mean squared error.

Mean squared error (MSE) is a measure of accuracy of the interpolation method.

Metrics::mse(ppp_malaria$marks, idw_points)
## [1] 0.0001646708

3 Determination of optimal power for idw by cross validation.

3.1 Understanding the need for optimal power

Determination of value of power while undertaking interpolation is of utmost importance. As the power is reduced, smoothness of interpolated spatial distribution of risk is increased. However, with reduced power chance of missing a true high risk region increases. On the other hand, choosing a very high power value may lead to detection of regions with false high risk estimates. To demonstrate, lets look at plots with varying power values.

par(mfrow=c(2,2))
plot(idw(ppp_malaria, power=0.001, at="pixels"),col=heat.colors(20), main="power = 0.001") 
plot(idw(ppp_malaria, power=0.01, at="pixels"),col=heat.colors(20), main="power = 0.01")
plot(idw(ppp_malaria, power=0.1, at="pixels"),col=heat.colors(20), main="power = 0.1")
plot(idw(ppp_malaria, power=10, at="pixels"),col=heat.colors(20), main="power = 10") 

3.2 Cross validating results to obtain lowest error.

To determine optimal value of power for undertaking interpolation using IDW, we can undertake cross validation using leave one out method for point data.

powers <- seq(0.001, 10, 0.01)
mse_result <- NULL
for(power in powers){
  CV_idw <- idw(ppp_malaria, power=power, at="points")
  mse_result <- c(mse_result,
                  Metrics::mse(ppp_malaria$marks,CV_idw))
}
optimal_power <- powers[which.min(mse_result)]
optimal_power
## [1] 1.411
plot(powers, mse_result)

4 Visualisation of results.

suppressMessages(library(raster))
suppressMessages(library(leaflet))
# Convert to a raster
idw_raster <- raster(idw(ppp_malaria, power=optimal_power, at="pixels"),
                                     crs= crs(ethopia_adm))

# Define a color palette
colPal <- colorNumeric(wesanderson::wes_palette("Zissou1")[1:5], idw_raster[], n = 5)

# Plot
leaflet() %>% addTiles() %>% addRasterImage(idw_raster, col = colPal, opacity=0.7) %>%
      addLegend(pal = colPal, values = idw_raster[])
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## ellps WGS 84 in CRS definition: +proj=merc +a=6378137 +b=6378137 +lat_ts=0
## +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum WGS_1984 in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## ellps WGS 84 in CRS definition: +proj=merc +a=6378137 +b=6378137 +lat_ts=0
## +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum WGS_1984 in CRS definition

  1. Ph.D. Scholar, AMCHSS, SCTIMST

  2. Professor, AMCHSS, SCTIMST