1 Introduction.

Spatial variation in risk estimation is based on type of dataset under consideration.
For point spatial data (e.g. case occurence), smoothing methods are performed whereas for continous spatial data (e.g. temperature) interpolation methods are applied. We will be discussing on smoothing method in this code. The present file has taken enormous inputs from week 03 of Applied Spatial Analysis in Public Health online course available from https://hughst.github.io/. We would like to thank the contributors of the course who have provided the datasets and course in open domain.

2 Data pre-processing.

2.1 Loading data sets

We will be loading datasets as provided in github repository of the course.

dat<-read.csv("https://raw.githubusercontent.com/HughSt/HughSt.github.io/master/course_materials/week3/Lab_files/CaseControl.csv")

The first dataset (dat) is a csv file and includes details regarding 233 malaria occurence related locations in Namibia.

The second dataset (shp) is shapefile for Namibia.

2.2 Understanding csv dataset.

The csv data has location details for 92 cases and 141 controls. The type of variable for cases and controls, labelled as 0 and 1 in the data was integer. The same is thus converted as a factor/ categorical variable in the dataset.

dat$case <- factor(dat$case,
                   levels = c(0,1),
                   labels = c("Controls", "Cases"))

2.3 Sub setting case and control data.

For further analysis, we need to separate cases and control data set. It is required for this illustration code but may not be required in many spatial analyses in epidemiology.

suppressMessages(library(tidyverse))
cases <- dat %>% 
  filter(case=="Cases")
controls <- dat %>% 
  filter(case=="Controls")
table(dat$case)
## 
## Controls    Cases 
##      141       92

2.4 Converting csv to sf object.

library(sf)
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
dat_map <- st_as_sf(dat,
                coords = c("long", "lat"))

3 Data visualisation.

3.1 Visualisation of cases and control locations.

3.1.1 Visualisation using ggplot:

ggplot()+ 
  geom_sf(data = dat_map,
                 aes(color = case))+
    labs(title = "Location of malaria cases and controls in Namibia",
       color = "Test result")+
  xlab("Longitude")+
  ylab("Latitude")+
  ggspatial::annotation_north_arrow(location = "br")+
  ggspatial::annotation_scale(location = "bl") 
## Using plotunit = 'm'

3.1.2 Visualisation using leaflet package:

library(leaflet)
leaflet() %>% addTiles() %>% 
  addCircleMarkers(data=dat_map,
                   color = c("#000099", "#009900"),
                   radius = 1) %>% 
  addLegend(colors = c("#000099", "#009900"),
            labels = c("Cases", "Controls"),
            title = "Malaria positivity") %>% 
  addScaleBar("bottomleft",
              options = scaleBarOptions())

4 Use of kernel density method in estimation of spatial variation of case occurence.

4.1 Creation of an observation window.

To estimate spatial variation, as a first step, an observation window object (owin) needs to be created. This is done using owin function from spatstat library. The shape of the observation window can be rectangular, polygonal, binary image or mathematically defined. We will be creating a rectangular owin object.

suppressMessages(library(spatstat))
obs_window <- owin(xrange = range(dat$long), yrange = range(dat$lat))

4.2 Visualisation of observation window.

plot(obs_window, main = "Observation window")

4.3 Creation of a point pattern object.

Kernel density method is applied on a point pattern (ppp) object. The ppp object is created using ppp function from spatstat library. Since we are looking at case data, we will use dataset containing location detail of cases only. Alternatively, we could have used the subset from the complete dataset. It is important to understand that the ppp object is based on x and y cartesian coordinates and is created within the limits as specified in owin object.

ppp_cases <- ppp(x = cases$long, 
                 y = cases$lat, 
                 window = obs_window,
                 check = T)

4.4 Visualisation of ppp object.

plot(ppp_cases, main = "Visualisation of ppp object in observation window")

4.5 Creation of kernel smoothened density of ppp object.

kernel_density <- density(ppp_cases,
                          edge = T,
                          kernel = "gaussian")

4.6 Visualisation of the density estimates.

plot(kernel_density, main = "Spatial variation in case occurence based on kernel density estimates")

4.7 Adjusting bandwith of kernel density estimates and visualisation of results.

We can adjust the bandwidth of the smoothened kernel density estimate. The visualisation of the estimate becomes smoother with increasing bandwidth. Since bandwith is decided apriori, it is important to understand that increasing the bandwidth may lead to missed risk area identification. On the other hand, smaller values may result in detection of spurious regions with increased risk. Hence, the smoothing bandwidth sigma should be a single numeric value, giving the standard deviation of the isotropic Gaussian kernel. Further, bandwidth is selected using cross-validation functions such as “bw.ppl”. To highlight the issue through a simple illustration, lets look with bandwith with arbitary selected values of 0.01,1 and “bw.ppl”:

plot(density(ppp_cases,0.01), main = "Spatial variation in kernel density estimates for bandwidth of 0.01")

And with bandwith of 1:

plot(density(ppp_cases, 1), main = "Spatial variation in kernel density estimates for bandwith of 1")

plot(density(ppp_cases,bw.ppl), main = "Spatial variation in kernel density estimates using bw.ppl method")

5 Visualisation of kernel density estimates on map.

5.1 Raster creation from kernel density estimates.

To plot the kernel density estimates, the kernel density object is required to be converted into a raster.

suppressMessages(library(raster))
map_data <- getData('GADM',country='NAM',level=0)
kernel_raster <- raster(kernel_density, crs = crs(map_data))

5.2 Visualisation using web map.

leaflet() %>% 
  addTiles() %>%
  addRasterImage(kernel_raster,
                 opacity = 0.5)
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## ellps WGS 84 in CRS definition: +proj=merc +a=6378137 +b=6378137 +lat_ts=0
## +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum WGS_1984 in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## ellps WGS 84 in CRS definition: +proj=merc +a=6378137 +b=6378137 +lat_ts=0
## +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum WGS_1984 in CRS definition

6 Kelsall and Diggle method for kernel estimation of relative risk.

It is important to note that when making inferences in epidemiology, it is very important to understand the population at risk. In other words, looking at the maps above, we may infer that the risk is high in northern part of the study area. However, the visualisation may be because of higher number of people living in that area rather than the actual increase in risk. Therefore, when population density across the study area is heterogenous, it is better to calculate rates. Kelsall and Diggle method is a cross validation method which enables calculation of kernel estimates for relative risk.

6.1 Creation of owin object.

We shall use the same owin object (obs_window) as created above.

6.2 Adding marks to the point data.

Addition of marks enables differentiation between cases and controls in the ppp object.

ppp_all <- ppp(dat$long,
               dat$lat,
               window = obs_window,
               marks = as.factor(dat$case))

6.3 Calculation of probability of being a case.

The relrisk function in the spatstat library estimates spatially varying probability of point data attributes.
Given a multitype point pattern, this function estimates the spatially-varying probability of each type of point, or the ratios of such probabilities, using kernel smoothing. The default smoothing bandwidth is selected by cross-validation. bw.relrisk function is applied as cross validation method to estimate bandwidth based on default 32 trial values of smoothing bandwidth and likelihood based CV method (alternative methods include leastsquares and weighted least squares).

prob_case <- relrisk(ppp_all)

6.4 Visualisation of spatial variation in probability of being a case.

plot(prob_case, main = "Spatial variation in probability of being a case")

6.5 Calculation of spatial variation in relative risk.

The relative argument in relrisk is a logical. If FALSE (the default) the algorithm computes the probabilities of each type of point. If TRUE, it computes the relative risk, the ratio of probabilities of each type relative to the probability of a control.

rr <- relrisk(ppp_all, relative = T)

6.6 Visualisation of spatial variation in relative risk.

plot(rr, main = "Spatial variation in relative risk")

6.7 Visualisation of spatial variation in relative risk on map.

rr_raster <- raster(rr, crs =  crs(map_data))
custom_palette <- colorNumeric(palette=topo.colors(64), domain=values(rr_raster), na.color = NA)
leaflet() %>% addTiles() %>% 
      addRasterImage(rr_raster, opacity=0.6, col = custom_palette)
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## ellps WGS 84 in CRS definition: +proj=merc +a=6378137 +b=6378137 +lat_ts=0
## +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum WGS_1984 in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## ellps WGS 84 in CRS definition: +proj=merc +a=6378137 +b=6378137 +lat_ts=0
## +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum WGS_1984 in CRS definition
## Warning in colors(.): Some values were outside the color scale and will be
## treated as NA

  1. Ph.D. Scholar, AMCHSS, SCTIMST

  2. Professor, AMCHSS, SCTIMST