In modern data science, the ability to programmatically access, download, and integrate data is a critical skill. It ensures that our analyses are reproducible, scalable, and can easily incorporate the latest available information. The R ecosystem contains a powerful suite of packages that act as clients for major open data repositories, transforming R into a self-contained environment for the entire spatial analysis workflow—from data acquisition to analysis and visualization.
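This module introduces several key R packages for downloading spatial and spatio-temporal data, together with the socio-economic and health statistics that are often joined to them. We will cover administrative boundaries (rnaturalearth), climate data (geodata), rainfall time series (chirps), elevation (elevatr), OpenStreetMap features (osmdata), World Bank development indicators (wbstats), and Demographic and Health Surveys (rdhs).
Throughout this module, we will use practical examples from global, regional (East Africa), and local (Somalia and Somaliland) contexts, building on the skills you learned in Modules II and III.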
2. Administrative Boundaries (rnaturalearth)
A base map of administrative boundaries (countries, states, regions) is the foundation for most spatial analysis and visualization. The rnaturalearth package provides an easy way to download high-quality global vector data from Natural Earth.
Example: Download and plot the boundaries of Somaliland
The ne_countries() function can fetch specific
countries, while ne_states() can retrieve first-level
administrative divisions (regions or states).
# ne_states() needs the high-resolution data package rnaturalearthhires.
# If it is not installed, install it first, e.g.:
# devtools::install_github("ropensci/rnaturalearthhires")
library(rnaturalearth)
library(sf)
library(ggplot2)

# Download the country boundary for Somaliland
# Note: Natural Earth treats Somaliland as a separate (de facto) country unit
somaliland_country <- ne_countries(country = "Somaliland",
                                   scale = "medium",
                                   returnclass = "sf")

# Download the first-level administrative regions (states) for Somaliland
somaliland_regions <- ne_states(country = "Somaliland", returnclass = "sf")

# Plot the regions, filling each polygon by its region name
ggplot(data = somaliland_regions) +
  geom_sf(aes(fill = name)) +
  ggtitle("Administrative Regions of Somaliland") +
  labs(fill = "Region Name") +
  theme_bw()
Administrative Regions of Somaliland, downloaded using the rnaturalearth package.
Interpretation: Having access to these administrative polygons is crucial. We can use them as base maps, for joining statistical data (as seen in Module II), or for clipping and masking raster data to our specific study area (as seen in Module III).
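This sketch shows what "clipping and masking raster data to our study area" involves. It uses terra's crop() and mask() on a small synthetic raster and a hypothetical triangular polygon, so it runs without downloading anything. With real data, the polygon could instead be vect(somaliland_country), which is an assumption about how you would wire it up.

```r
library(terra)

# Synthetic raster: a 10 x 10 grid of 1-degree cells over a hypothetical extent,
# with values 1..100 standing in for a climate or elevation layer
r <- rast(nrows = 10, ncols = 10,
          xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "EPSG:4326")
values(r) <- 1:100

# Hypothetical study area: a triangle standing in for an administrative polygon
aoi <- vect("POLYGON ((2 2, 8 2, 2 8, 2 2))", crs = "EPSG:4326")

# crop() trims the raster to the polygon's bounding box ...
r_crop <- crop(r, aoi)
# ... and mask() sets cells lying outside the polygon itself to NA
r_mask <- mask(r_crop, aoi)

n_inside <- sum(!is.na(values(r_mask)))
cat("Cells after crop:", ncell(r_crop), "| cells kept by mask:", n_inside, "\n")
```

Cropping first keeps the masking step cheap, because mask() then only visits cells inside the bounding box.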
3. Climate Data (geodata)
The geodata package is the modern successor to the raster::getData() function and provides access to a wealth of geographic data, including climate (from WorldClim), elevation, and land cover.
Example: Download average minimum temperature for Somalia
The worldclim_country() function downloads monthly
climate data for a specified country and variable. The result is a
SpatRaster object from the terra package, with
12 layers representing each month.
library(geodata)
library(terra)

# Download monthly minimum temperature data for Somalia
# var = "tmin" for minimum temperature, "tmax" for maximum, "prec" for precipitation
# path = tempdir() saves the files to a temporary directory for this R session
som_tmin <- worldclim_country(country = "Somalia", var = "tmin", path = tempdir())

# The output is a SpatRaster with 12 layers (one per month);
# mean() computes the cell-wise average across the layers
som_avg_tmin <- mean(som_tmin)

# Plot the result (WorldClim 2.1 temperatures are in °C)
plot(som_avg_tmin,
     main = "Average Annual Minimum Temperature in Somalia (°C)",
     plg = list(title = "Temp (°C)"))
Average annual minimum temperature in Somalia, downloaded using the geodata package.
Interpretation: The map shows clear spatial variation, with cooler minimum temperatures in the northern highlands and warmer temperatures along the coast and in the southern regions. This data is vital for agricultural planning, ecological modeling, and understanding climate patterns. Note: geodata downloads WorldClim version 2.1, which stores temperature directly in °C. The older WorldClim 1.4 data returned by raster::getData() stored temperature as integers in °C × 10, so values from that source must be divided by 10.
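The 12-layer structure of the WorldClim output supports other summaries beyond the annual mean. The sketch below uses a tiny synthetic 12-layer raster in which layer k holds the value k; the numbers are not real data. It computes the annual mean, the index of the warmest month, and the seasonal range, then reads the values at one point with terra's extract(). With the real download, the same extract() call could be applied to som_avg_tmin at Hargeisa's coordinates, for example cbind(44.07, 9.56).

```r
library(terra)

# Synthetic 12-layer raster (2 x 2 cells) mimicking monthly WorldClim output:
# every cell in layer k holds the value k (not real temperatures)
monthly <- rast(nrows = 2, ncols = 2, nlyrs = 12,
                xmin = 0, xmax = 2, ymin = 0, ymax = 2)
values(monthly) <- rep(1:12, each = ncell(monthly))  # filled layer by layer
names(monthly) <- month.abb

annual_mean  <- mean(monthly)                 # cell-wise mean of the 12 layers
warmest_mon  <- which.max(monthly)            # layer index (1-12) of the maximum
seasonal_rng <- max(monthly) - min(monthly)   # cell-wise seasonal range

summary_stack <- c(annual_mean, warmest_mon, seasonal_rng)
names(summary_stack) <- c("annual_mean", "warmest_month", "range")

# Extract the summaries at a single (x, y) location
pt <- cbind(0.5, 0.5)
res <- extract(summary_stack, pt)
print(res)
```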
4. Rainfall Data (chirps)
For finer-grained temporal analysis, such as studying daily or monthly rainfall patterns, the chirps package is invaluable. It provides access to the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS). CHIRPS is a high-resolution (0.05°, roughly 5.5 km) quasi-global dataset covering 50°S to 50°N, available from 1981 to the near-present.
Example: Get daily precipitation for Hargeisa
We can query precipitation for a specific point location over a defined time period.
library(chirps)
library(ggplot2)

# Define the location for Hargeisa, Somaliland (longitude, latitude)
hargeisa_loc <- data.frame(lon = 44.0697, lat = 9.5625)

# Get daily precipitation data for 2020-2022
# The "ClimateSERV" server is often faster for point-based queries
precip_hargeisa <- get_chirps(hargeisa_loc,
                              dates = c("2020-01-01", "2022-12-31"),
                              server = "ClimateSERV")

# Plot the time series (the 'chirps' column holds daily precipitation in mm)
ggplot(precip_hargeisa, aes(x = date, y = chirps)) +
  geom_line(color = "dodgerblue") +
  labs(title = "Daily Precipitation in Hargeisa (2020-2022)",
       y = "Precipitation (mm)",
       x = "Date") +
  theme_minimal()
Time series of daily precipitation in Hargeisa from the CHIRPS dataset.
Interpretation: This time-series plot allows us to
identify the rainy seasons (Gu and Deyr), dry
spells (Jilaal), and extreme rainfall events. This type of
data is fundamental for drought monitoring, flood risk assessment, and
food security analysis.
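Identifying seasons usually means aggregating the daily series. This is a minimal sketch using dplyr on a synthetic data frame with the date and chirps columns used in the plot above; the rainfall values are randomly generated. It assigns each day to a season with a simplified four-season calendar: Jilaal Jan-Mar, Gu Apr-Jun, Xagaa Jul-Sep, Deyr Oct-Dec. That calendar is an assumption, since the exact timing varies by region. The sketch then totals rainfall and counts wet days, using a 1 mm threshold, which is another assumption.

```r
library(dplyr)

# Synthetic stand-in for get_chirps() output: one row per day with a
# 'date' column and a 'chirps' column in mm (values are randomly generated)
set.seed(42)
precip <- data.frame(date = seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day"))
precip$chirps <- round(rgamma(nrow(precip), shape = 0.3, scale = 5), 1)

# Simplified Somali seasonal calendar (assumed month ranges)
seasonal <- precip %>%
  mutate(month  = as.integer(format(date, "%m")),
         season = case_when(month %in% 1:3 ~ "Jilaal",
                            month %in% 4:6 ~ "Gu",
                            month %in% 7:9 ~ "Xagaa",
                            TRUE           ~ "Deyr")) %>%
  group_by(season) %>%
  summarise(total_mm = sum(chirps),
            wet_days = sum(chirps >= 1),   # days with at least 1 mm (assumed threshold)
            .groups = "drop")

print(seasonal)
```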
5. Elevation Data (elevatr)
Topography is a key driver of many environmental processes. The elevatr package provides an easy interface for downloading digital elevation models (DEMs) from various sources, including Amazon Web Services (AWS) Terrain Tiles.
Example: Get an elevation raster for Somaliland
We can provide an sf object (like the country boundary
we downloaded earlier) to define the area of interest.
library(elevatr)
library(terra)

# We use the somaliland_country sf object from Section 2
# z sets the zoom level (and thus the resolution) of the data; higher z gives finer cells
# clip = "locations" clips the raster to the exact boundary of our sf object
somaliland_elev <- get_elev_raster(locations = somaliland_country, z = 8, clip = "locations")

# The result is a RasterLayer; convert it to a terra SpatRaster and plot
plot(rast(somaliland_elev),
     main = "Elevation in Somaliland",
     plg = list(title = "Elevation (m)"))
Digital Elevation Model (DEM) for Somaliland from the elevatr package.
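Choosing z is a trade-off between detail and download size. This sketch uses the standard formula for the ground resolution of a 256 x 256 pixel Web Mercator tile at zoom z and a given latitude. The assumption is that elevatr's tiles at a given z are of roughly this resolution; check the elevatr documentation for the exact figures it reports.

```r
# Approximate ground resolution (metres per pixel) of a 256 x 256 Web Mercator
# tile at zoom level z and latitude lat_deg (assumed to approximate elevatr's z)
tile_resolution_m <- function(z, lat_deg) {
  earth_circumference_m <- 2 * pi * 6378137   # WGS 84 equatorial circumference
  earth_circumference_m * cos(lat_deg * pi / 180) / (256 * 2^z)
}

# Compare zoom levels at roughly the latitude of Hargeisa (9.56 N)
zooms <- 5:10
res_table <- data.frame(z = zooms, res_m = round(tile_resolution_m(zooms, 9.56)))
print(res_table)
```

Each step up in z halves the cell size and roughly quadruples the number of tiles to download.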
Interpretation: The elevation map shows the rugged highlands in the central and northern parts of Somaliland. These highlands coincide with the cooler minimum temperatures seen in the geodata example, because air temperature generally decreases with altitude. Elevated terrain also forces moist air to rise and cool (orographic lift), which tends to increase rainfall over the highlands.
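"Rugged" terrain can be quantified by deriving slope from a DEM. This is a minimal sketch with terra's terrain() on a synthetic 5 x 5 DEM of 100 m cells that rises 10 m per cell eastwards. The grid uses a projected CRS, UTM zone 38N, so cell sizes are in metres; the coordinates themselves are arbitrary. Because the surface is a plane, every interior cell should have a slope of atan(10/100), about 5.71 degrees. The same call could be applied to the real DEM once it is in a projected CRS.

```r
library(terra)

# Synthetic DEM: 5 x 5 grid of 100 m cells in UTM zone 38N,
# rising 10 m per cell from west to east
dem <- rast(nrows = 5, ncols = 5,
            xmin = 0, xmax = 500, ymin = 0, ymax = 500,
            crs = "EPSG:32638")
values(dem) <- rep(seq(0, 40, by = 10), times = 5)  # cells fill row by row

# terrain() derives slope from neighbouring cells; edge cells lack a full
# neighbourhood and are returned as NA
slope <- terrain(dem, v = "slope", unit = "degrees")

slope_vals <- values(slope)
slope_vals <- slope_vals[!is.na(slope_vals)]
print(unique(round(slope_vals, 3)))
```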
6. OpenStreetMap Features (osmdata)
OpenStreetMap is a global, collaborative project to create a free, editable map of the world. The osmdata package allows you to query this massive database for features like roads, rivers, buildings, hospitals, schools, and more.
The workflow involves:
1. Defining a bounding box (getbb()).
2. Building a query (opq()).
3. Adding the desired feature (add_osm_feature()).
4. Downloading the data as sf objects (osmdata_sf()).
Example: Find hospitals in Mogadishu
library(osmdata)
library(sf)

# 1. Get the bounding box for Mogadishu
mog_bb <- getbb("Mogadishu")

# 2-4. Build the query and download features tagged amenity = hospital
hospitals_mog <- opq(bbox = mog_bb) %>%
  add_osm_feature(key = "amenity", value = "hospital") %>%
  osmdata_sf()

# The result is an osmdata object holding several sf objects
# (such as $osm_points, $osm_lines and $osm_polygons).
# $osm_points contains standalone hospital nodes as well as the vertices of
# hospitals mapped as building outlines, which is why many rows have no name.
print(hospitals_mog$osm_points)
## Simple feature collection with 170 features and 9 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 45.29562 ymin: 2.013057 xmax: 45.38787 ymax: 2.091325
## Geodetic CRS: WGS 84
## First 10 features:
## osm_id name addr:street amenity barrier fixme healthcare name:en
## 1223371658 1223371658 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 1388586669 1388586669 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 1388586696 1388586696 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 1388586755 1388586755 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 1388586844 1388586844 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 1388586850 1388586850 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 1391087964 1391087964 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 1391087967 1391087967 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 1391087971 1391087971 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 1391087976 1391087976 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## phone geometry
## 1223371658 <NA> POINT (45.29665 2.029179)
## 1388586669 <NA> POINT (45.33368 2.039844)
## 1388586696 <NA> POINT (45.3334 2.040118)
## 1388586755 <NA> POINT (45.33312 2.040742)
## 1388586844 <NA> POINT (45.33337 2.041466)
## 1388586850 <NA> POINT (45.3336 2.041726)
## 1391087964 <NA> POINT (45.30667 2.045059)
## 1391087967 <NA> POINT (45.30843 2.045142)
## 1391087971 <NA> POINT (45.30918 2.045181)
## 1391087976 <NA> POINT (45.30665 2.045619)
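Interpretation: This data provides the locations of critical health infrastructure. Because $osm_points mixes standalone hospital nodes with the vertices of hospital buildings, a more reliable list of facilities combines the named points with the centroids of $osm_polygons; osmdata's unique_osmdata() helps by dropping points that belong to lines or polygons. For a health data scientist, this is invaluable for calculating travel times to care, assessing service coverage, or planning resource allocation during a public health emergency.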
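A first step toward assessing access to care is finding each household's nearest facility. This is a minimal sketch with sf's st_nearest_feature() and st_distance(). The hospital and household coordinates below are hypothetical points near Mogadishu, not the OSM results. The sketch measures straight-line (great-circle) distance, which is a simplification: real travel-time analysis would use a road network.

```r
library(sf)

# Hypothetical hospital locations (lon/lat, WGS 84); in practice these could
# come from hospitals_mog$osm_points or from polygon centroids
hospitals <- st_as_sf(
  data.frame(hospital = c("H1", "H2", "H3"),
             lon = c(45.300, 45.330, 45.360),
             lat = c(2.030, 2.040, 2.060)),
  coords = c("lon", "lat"), crs = 4326)

# Hypothetical household locations
households <- st_as_sf(
  data.frame(household = c("A", "B"),
             lon = c(45.310, 45.355),
             lat = c(2.035, 2.055)),
  coords = c("lon", "lat"), crs = 4326)

# Index of the nearest hospital for each household
nearest <- st_nearest_feature(households, hospitals)

# Straight-line distance to that hospital, in metres
households$nearest_hospital <- hospitals$hospital[nearest]
households$dist_m <- as.numeric(
  st_distance(households, hospitals[nearest, ], by_element = TRUE))

print(st_drop_geometry(households))
```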
7. Development Indicators (wbstats)
The World Bank is a primary source of global and national-level development indicators. The wbstats package provides a direct interface to the World Bank’s API, allowing you to search for and download thousands of indicators.
Example: Compare the female-to-male labor force participation ratio in East Africa
First, we can search for relevant indicators, then download the data for a specific set of countries and years.
library(wbstats)
library(tidyverse)

# Search for indicators related to labor force participation
# head() shows only the first six results for brevity
head(wb_search(pattern = "labor force participation rate"))
## # A tibble: 6 × 3
## indicator_id indicator indicator_desc
## <chr> <chr> <chr>
## 1 9.0.Labor.All Labor Force Participation Rate (%) Share of the …
## 2 9.0.Labor.B40 Labor Force Participation Rate (%)-Bottom 40 Per… Share of the …
## 3 9.0.Labor.T60 Labor Force Participation Rate (%)-Top 60 Percent Share of the …
## 4 9.1.Labor.All Labor Force Participation Rate (%), Male Share of the …
## 5 9.1.Labor.B40 Labor Force Participation Rate (%)-Bottom 40 Per… Share of the …
## 6 9.1.Labor.T60 Labor Force Participation Rate (%)-Top 60 Percen… Share of the …
# The search returns many indicators. Here we use the World Development Indicators
# series "SL.TLF.CACT.FM.ZS": Ratio of female to male labor force participation
# rate (%) (modeled ILO estimate). It is not among the six results shown above.
# Download data for five East African countries (ISO3 codes)
east_africa_lfp <- wb_data(country = c("SOM", "ETH", "KEN", "TZA", "DJI"),
                           indicator = "SL.TLF.CACT.FM.ZS",
                           start_date = 1990, end_date = 2021)

# Plot the trends over time; wb_data() returns a column named after the indicator ID
ggplot(east_africa_lfp, aes(x = date, y = SL.TLF.CACT.FM.ZS, color = country)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(title = "Female to Male Labor Force Participation Ratio in East Africa",
       subtitle = "Modeled ILO Estimate (%)",
       y = "Ratio (Female/Male %)",
       x = "Year",
       color = "Country") +
  theme_minimal()
Time series of the female-to-male labor force participation ratio in East Africa.
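Interpretation: This chart reveals distinct trends in gender parity in the labor market across the region. A ratio of 100 would mean that women participate in the labor force at the same rate as men; values below 100 indicate lower female participation. Such data is essential for research in economics, development studies, and public policy, helping to track progress towards gender equality goals.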
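To turn the chart into numbers, one option is to compute each country's change between its first and last available year. This is a sketch with dplyr on a synthetic data frame that has the country, date, and SL.TLF.CACT.FM.ZS columns used above. The country names and values are hypothetical, and missing years are dropped before taking first and last values.

```r
library(dplyr)

# Synthetic stand-in for wb_data() output (hypothetical countries and values)
lfp <- data.frame(
  country = rep(c("Country A", "Country B"), each = 3),
  date    = rep(c(1990, 2005, 2021), times = 2),
  SL.TLF.CACT.FM.ZS = c(80, NA, 88, 30, 35, 40)
)

# Change in the ratio between each country's first and last available year
lfp_change <- lfp %>%
  filter(!is.na(SL.TLF.CACT.FM.ZS)) %>%
  arrange(country, date) %>%
  group_by(country) %>%
  summarise(first_year = first(date),
            last_year  = last(date),
            change_pts = last(SL.TLF.CACT.FM.ZS) - first(SL.TLF.CACT.FM.ZS),
            .groups = "drop")

print(lfp_change)
```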
8. Demographic and Health Surveys (rdhs)
The Demographic and Health Surveys (DHS) Program provides some of the most important and widely used datasets in global health. The rdhs package allows users to query the DHS API, find surveys, and download datasets directly into R.
Important Note: Downloading DHS datasets requires you to register for an account on the DHS Program website and have your project approved. You then use your credentials to authenticate within R. Querying survey metadata through the DHS API, as in the example below, does not require an account.
Since DHS data for Somalia is not currently available through the API, we will demonstrate how to find surveys for neighboring Kenya and Ethiopia.
Example: Find available DHS surveys for Kenya and Ethiopia
# NOTE: Metadata queries such as dhs_surveys() work without logging in.
# To download datasets later with get_datasets(), first register on the DHS
# website and run set_rdhs_config() with your credentials.
library(rdhs)

# Authenticate with your credentials (needed only for downloading datasets)
# set_rdhs_config(email = "your_email@example.com", project = "Your Project Title")

# Find available surveys for Kenya and Ethiopia (DHS two-letter country codes)
surveys <- dhs_surveys(countryIds = c("KE", "ET"))

# View the key information about the surveys found
print(surveys[, c("CountryName", "SurveyYear", "SurveyType", "DHS_CountryCode")])
Below is the example output you would see after running the code above:
CountryName SurveyYear SurveyType DHS_CountryCode
1 Ethiopia 2019 MIS ET
2 Ethiopia 2016 DHS ET
3 Ethiopia 2011 DHS ET
4 Ethiopia 2005 DHS ET
5 Ethiopia 2000 DHS ET
6 Kenya 2022 DHS KE
7 Kenya 2015 MIS KE
8 Kenya 2014 DHS KE
9 Kenya 2010 AIS KE
10 Kenya 2008-09 DHS KE
11 Kenya 2007 MIS KE
12 Kenya 2003 DHS KE
Interpretation: This output shows us all the surveys
(DHS, Malaria Indicator Survey - MIS, etc.) available for these
countries. From here, a researcher could use other rdhs
functions like dhs_datasets() to find specific data files
(e.g., the women’s recode, household recode) and
get_datasets() to download them for analysis. This provides
a reproducible pathway to accessing rich microdata on fertility, child
mortality, nutrition, and many other key health topics.
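Before calling dhs_datasets(), it is often useful to narrow the survey list. This is a sketch with dplyr that keeps only standard DHS surveys and picks the most recent one per country. It runs on a small hand-built data frame mirroring a few rows of the example output above, and assumes SurveyYear is numeric. The same pipeline could be applied to the surveys object returned by dhs_surveys().

```r
library(dplyr)

# Hand-built data frame mirroring a few rows of the example dhs_surveys() output
surveys_example <- data.frame(
  CountryName = c("Ethiopia", "Ethiopia", "Ethiopia", "Kenya", "Kenya", "Kenya"),
  SurveyYear  = c(2019, 2016, 2011, 2022, 2015, 2014),
  SurveyType  = c("MIS", "DHS", "DHS", "DHS", "MIS", "DHS")
)

# Keep only standard DHS surveys, then the most recent one per country
latest_dhs <- surveys_example %>%
  filter(SurveyType == "DHS") %>%
  group_by(CountryName) %>%
  slice_max(SurveyYear, n = 1) %>%
  ungroup()

print(latest_dhs)
```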
This module has demonstrated the immense power and convenience of
accessing open spatial data directly within the R environment. By
leveraging packages like rnaturalearth,
geodata, chirps, elevatr,
osmdata, wbstats, and rdhs, we
can build complex, multi-layered datasets for sophisticated analysis
without ever leaving our R script.
This programmatic approach is the cornerstone of modern, reproducible research. It allows us to seamlessly integrate administrative, environmental, socio-economic, and health data to answer pressing questions in data science, public health, and beyond. As you move forward, we encourage you to explore the detailed documentation for each of these packages to unlock their full potential.