# Load essential packages here to avoid clutter later
library(sf)
## Linking to GEOS 3.11.2, GDAL 3.8.2, PROJ 9.3.1; sf_use_s2() is TRUE
library(sp)
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(mapview)
library(tmap)
## Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
## remotes::install_github('r-tmap/tmap')
library(geosphere)
library(spData)
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
library(terra)
## terra 1.7.71
library(spatstat)
## Loading required package: spatstat.data
## Loading required package: spatstat.geom
## spatstat.geom 3.2-9
##
## Attaching package: 'spatstat.geom'
## The following objects are masked from 'package:terra':
##
## area, delaunay, is.empty, rescale, rotate, shift, where.max,
## where.min
## The following object is masked from 'package:geosphere':
##
## perimeter
## Loading required package: spatstat.random
## spatstat.random 3.2-3
## Loading required package: spatstat.explore
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
## collapse
## spatstat.explore 3.2-7
## Loading required package: spatstat.model
## Loading required package: rpart
## spatstat.model 3.2-11
## Loading required package: spatstat.linnet
## spatstat.linnet 3.1-5
##
## spatstat 3.0-8
## For an introduction to spatstat, type 'beginner'
library(SpatialEpi)
library(rgdal)
## Please note that rgdal will be retired during October 2023,
## plan transition to sf/stars/terra functions using GDAL and PROJ
## at your earliest convenience.
## See https://r-spatial.org/r/2023/05/15/evolution4.html and https://github.com/r-spatial/evolution
## rgdal: version: 1.6-7, (SVN revision 1203)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.8.2, released 2023/16/12
## Path to GDAL shared files: C:/Users/Admin/AppData/Local/R/win-library/4.3/rgdal/gdal
## GDAL does not use iconv for recoding strings.
## GDAL binary built with GEOS: TRUE
## Loaded PROJ runtime: Rel. 9.3.1, December 1st, 2023, [PJ_VERSION: 931]
## Path to PROJ shared files: C:/Users/Admin/AppData/Local/R/win-library/4.3/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:2.1-3
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
##
## Attaching package: 'rgdal'
## The following object is masked from 'package:terra':
##
## project
Welcome to Module I of “Spatial Analysis and Disease Mapping.” This module lays the foundation for understanding spatial data, spatial concepts, and how we can leverage these to investigate health-related phenomena. We will introduce the principles of spatial analysis, explore different types of spatial data, and demonstrate how these data types are relevant to disease mapping and related public health investigations. Throughout the module, we will use R as our primary tool for data manipulation, visualization, and analysis.
Upon completion of this module, students will be able to:
Spatial analysis is a powerful set of analytical techniques used to study spatial phenomena. It involves the application of various methods to examine spatial data, looking for patterns, relationships, and variations that may be linked to underlying spatial processes. In essence, spatial analysis goes beyond traditional statistical analyses by explicitly considering the spatial context of data. This is particularly important in disease mapping where geographic location is a crucial determinant of disease occurrence and spread.
Before proceeding with data types, let us understand some of the key concepts that underpins spatial analysis:
Spatial data comes in different forms, each with its own characteristics and analytical applications. We will cover three major types relevant to health mapping:
Areal data represents spatial information aggregated within predefined areas. These areas can be administrative boundaries (e.g., districts, counties, regions), census tracts, or any other defined spatial units. The data associated with these areas often represents summaries of events, characteristics, or attributes (e.g., counts, rates, averages).
Let us generate a map of prevalence of women reporting an unmet need for family planning, from the most recent DHS data for Kenya in 2022.
# Load the DHS data. You would need to download the spatial data from the DHS website
# and prepare it as a sf object for use.
# Here, I will be using a fake dataset to simulate the real situation.
kenya_map <- st_read("path/to/your/kenya_admin_shapefile.shp", quiet=TRUE)
# Create fake data to simulate DHS indicators (you can replace this with actual data)
set.seed(123)
kenya_data <- data.frame(
district = unique(kenya_map$admin_name),
unmet_need_percentage = sample(10:40, length(unique(kenya_map$admin_name)), replace=TRUE),
anc_coverage = sample(20:90, length(unique(kenya_map$admin_name)), replace = TRUE)
)
# Merge the map data with our simulated data based on the common field (e.g., district or county name)
kenya_map_merged <- left_join(kenya_map, kenya_data, by = c("admin_name" = "district"))
# Interactive visualization of the map
mapview(kenya_map_merged, zcol="unmet_need_percentage", main = "Unmet Need for Family Planning in Kenya",
map.types = "CartoDB.Positron")
# Basic static plot of the map
ggplot(kenya_map_merged) +
geom_sf(aes(fill = unmet_need_percentage)) +
labs(title = "Unmet Need for Family Planning in Kenya",
fill = "Unmet need (%)") +
theme_minimal()
Point pattern data represent individual locations or events as points in space. Each point represents a specific occurrence or observation, without reference to administrative units. It’s essential to understand the locations and spatial arrangement of these points rather than aggregated data across areas.
# Simulate point pattern of disease cases
set.seed(456)
n_cases <- 150
x_coords <- runif(n_cases, min = 0, max = 100)
y_coords <- runif(n_cases, min = 0, max = 100)
disease_points <- data.frame(x = x_coords, y = y_coords)
# Convert to spatial points object
coordinates(disease_points) <- ~ x + y
proj4string(disease_points) <- CRS("+proj=utm +zone=37 +datum=WGS84")
# Generate a base map
plot(disease_points, pch=16, main="Simulated Disease Cases", xlab="Longitude", ylab="Latitude")
# Let's generate an intensity map for the data
# Generate a spatial window from the point data
window <- as.owin(disease_points)
# convert to point pattern object
points_pp <- as.ppp(disease_points, W=window)
plot(density(points_pp), main="Kernel Density of Simulated Disease Cases")
plot(points_pp, add=TRUE, pch=16, cex=0.4)
* **Disease Clustering**: Detecting spatial clusters of disease cases to identify potential sources of outbreaks or areas requiring targeted interventions.
* **Accessibility**: Mapping locations of healthcare facilities relative to the population to analyze spatial accessibility and plan health service delivery.
* **Spatial epidemiology**: Investigating environmental risk factors by examining spatial association between health outcomes and environmental exposure
Geostatistical data consists of values of a variable measured or estimated at specific locations across a continuous spatial field. The data is considered continuous in space, allowing for interpolation and prediction of values at unobserved locations based on known samples.
# Simulate geostatistical data
set.seed(789)
n_stations <- 50
x_coords <- runif(n_stations, min = 0, max = 100)
y_coords <- runif(n_stations, min = 0, max = 100)
# Simulate temperature data using a Gaussian process model
temp <- 10 + 0.01 * (x_coords^2 + y_coords^2) + rnorm(n_stations, 0, 3)
temp_data <- data.frame(x = x_coords, y = y_coords, temperature = temp)
# Convert to spatial points dataframe
coordinates(temp_data) <- ~ x + y
gridded(temp_data) <- TRUE #convert the data to a grid
proj4string(temp_data) <- CRS("+proj=utm +zone=37 +datum=WGS84")
# Generate raster map
# Create a raster object from the geostatistical data
raster_temp <- raster(temp_data, layer="temperature")
plot(raster_temp, main = "Simulated Temperature Data")
# Plot the points
temp_points <- st_as_sf(as.data.frame(temp_data), coords = c("x", "y"))
plot(temp_points, add=TRUE, pch =16, col="black", cex = 0.5)
In this module, we introduced the basic principles of spatial analysis and distinguished between different types of spatial data (areal, point patterns, and geostatistical). You learned to visualize these types of data in R. Understanding spatial data types is crucial in our disease mapping and spatial analysis journey, as different data types require unique analytical approaches. We will build upon this foundation in subsequent modules, moving into more advanced methods of spatial analysis and their application to disease mapping and public health.
In the next module, we will learn about data wrangling and manipulation and how we can join and integrate different spatial data types to be able to model using more complex datasets.