1. Introduction: The Power of Data Access in R

In modern data science, the ability to programmatically access, download, and integrate data is a critical skill. It ensures that our analyses are reproducible, scalable, and can easily incorporate the latest available information. The R ecosystem contains a powerful suite of packages that act as clients for major open data repositories, transforming R into a self-contained environment for the entire spatial analysis workflow—from data acquisition to analysis and visualization.

This module will introduce several key R packages for downloading different types of spatial and spatio-temporal data. We will cover:

  • Administrative Boundaries: Country and sub-national polygons.
  • Environmental Data: Climate, precipitation, and elevation rasters.
  • Infrastructure & Points of Interest: Roads, buildings, and amenities from OpenStreetMap.
  • Socio-Economic Indicators: National statistics from the World Bank.
  • Health & Demographics: Survey data from the Demographic and Health Surveys (DHS) Program.

Throughout this module, we will use practical examples from a global, regional (East Africa), and local (Somalia and Somaliland) context, building on the skills you learned in Modules II and III.

2. Administrative Boundaries (rnaturalearth)

A base map of administrative boundaries (countries, states, regions) is the foundation for most spatial analysis and visualization. The rnaturalearth package provides an easy way to download high-quality, global vector data from Natural Earth.

Example: Download and plot the boundaries of Somaliland

The ne_countries() function can fetch specific countries, while ne_states() can retrieve first-level administrative divisions (regions or states).

# ne_states() relies on the high-resolution data package; install it once if missing
# devtools::install_github("ropensci/rnaturalearthhires")

library(rnaturalearth)
library(sf)
library(ggplot2)

# Download the country boundary for Somaliland
# Note: Somaliland is treated as a country in the rnaturalearth dataset
somaliland_country <- ne_countries(country = "Somaliland", 
                                   scale = "medium", 
                                   returnclass = "sf")

# Download the administrative regions (states) for Somaliland
somaliland_regions <- ne_states(country = "Somaliland", returnclass = "sf")

# Plot the regions
ggplot(data = somaliland_regions) +
  geom_sf(aes(fill = name)) +
  ggtitle("Administrative Regions of Somaliland") +
  labs(fill = "Region Name") +
  theme_bw()
Administrative Regions of Somaliland, downloaded using the rnaturalearth package.

Interpretation: Having access to these administrative polygons is crucial. We can use them as base maps, for joining statistical data (as seen in Module II), or for clipping and masking raster data to our specific study area (as seen in Module III).
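
Joining statistical data to boundaries depends on the key column matching exactly, and region names are a common source of silent mismatches. The sketch below uses plain data frames as stand-ins: the region names and the clinics counts are invented for illustration and are not taken from rnaturalearth or any real survey. The same check applies to the name column of an sf object before joining.

```r
# Stand-in for the attribute table of somaliland_regions (names are illustrative)
regions <- data.frame(
  name = c("Awdal", "Togdheer", "Sanaag", "Sool"),
  stringsAsFactors = FALSE
)

# Hypothetical indicator table; the counts are invented for this example
stats <- data.frame(
  region  = c("Awdal", "Togdheer", "Sanag", "Sool"),
  clinics = c(12, 20, 9, 7),
  stringsAsFactors = FALSE
)

# Names in the statistics table that have no partner in the boundary table
unmatched <- setdiff(stats$region, regions$name)
print(unmatched)

# Correct the spelling, then left-join so that every region is kept
stats$region[stats$region == "Sanag"] <- "Sanaag"
joined <- merge(regions, stats, by.x = "name", by.y = "region", all.x = TRUE)
print(joined)
```

Running setdiff() in both directions before the join is a cheap way to catch spelling variants that would otherwise turn into NA values on the map.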

3. Climatic and Geographic Data (geodata)

The geodata package is the modern successor to the raster::getData() function and provides access to a wealth of geographic data, including climate (from WorldClim), elevation, and land cover.

Example: Download average minimum temperature for Somalia

The worldclim_country() function downloads monthly climate data for a specified country and variable. The result is a SpatRaster object from the terra package, with 12 layers representing each month.

library(geodata)
library(terra)

# Download monthly minimum temperature data for Somalia
# var = "tmin" for minimum temp, "tmax" for max, "prec" for precipitation
# path = tempdir() saves the files to a temporary directory for this R session
som_tmin <- worldclim_country(country = "Somalia", var = "tmin", path = tempdir())

# The output is a SpatRaster with 12 layers. Let's calculate the annual mean.
som_avg_tmin <- mean(som_tmin)

# Plot the result
plot(som_avg_tmin, 
     main = "Average Annual Minimum Temperature in Somalia (°C)",
     plg = list(title = "Temp (°C)"))
Average annual minimum temperature in Somalia, downloaded using the geodata package.

Interpretation: The map shows clear spatial variation, with cooler minimum temperatures in the northern highlands and warmer temperatures along the coast and in the southern regions. This data is vital for agricultural planning, ecological modeling, and understanding climate patterns. Note: geodata downloads WorldClim version 2.1, which stores temperature directly in °C. The older version 1.4 files served by raster::getData() stored integer values multiplied by 10, so check the units before comparing results from the two sources.
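
To make explicit what mean() does to a multi-layer raster, the sketch below repeats the cell-wise averaging on a plain base R array standing in for the 12 monthly layers. The grid values are randomly generated, not real climate data, and the final rescaling step only applies to sources that store tenths of a degree as integers.

```r
# A 2 x 2 grid with 12 monthly layers; values are invented, in degrees C
set.seed(42)
tmin <- array(round(rnorm(2 * 2 * 12, mean = 22, sd = 3), 1),
              dim = c(2, 2, 12))

# Cell-wise annual mean: average over the third (month) dimension
annual_mean <- apply(tmin, c(1, 2), mean)
print(annual_mean)

# If a source stores tenths of a degree, rescale after averaging
tmin_x10 <- tmin * 10
annual_mean_from_scaled <- apply(tmin_x10, c(1, 2), mean) / 10
print(all.equal(annual_mean, annual_mean_from_scaled))
```

Because averaging is linear, it makes no difference whether the rescaling happens before or after the mean, but it must happen exactly once.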

4. High-Resolution Precipitation Data (chirps)

For more fine-grained temporal analysis, such as studying daily or monthly rainfall patterns, the chirps package is invaluable. It provides access to the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), a high-resolution (0.05 degrees) dataset available from 1981 to the near-present.

Example: Get daily precipitation for Hargeisa

We can query precipitation for a specific point location over a defined time period.

library(chirps)
library(ggplot2)

# Define the location for Hargeisa, Somaliland
hargeisa_loc <- data.frame(lon = 44.0697, lat = 9.5625)

# Get daily precipitation data for 2020 through 2022
# The "ClimateSERV" server is often faster for queries on a few points
precip_hargeisa <- get_chirps(hargeisa_loc, 
                              dates = c("2020-01-01", "2022-12-31"),
                              server = "ClimateSERV")

# Plot the time series
ggplot(precip_hargeisa, aes(x = date, y = chirps)) +
  geom_line(color = "dodgerblue") +
  labs(title = "Daily Precipitation in Hargeisa (2020-2022)",
       y = "Precipitation (mm)",
       x = "Date") +
  theme_minimal()
Time series of daily precipitation in Hargeisa from the CHIRPS dataset.

Interpretation: This time-series plot allows us to identify the rainy seasons (Gu and Deyr), the main dry season (Jilaal), and extreme rainfall events. This type of data is fundamental for drought monitoring, flood risk assessment, and food security analysis.
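
Seasonal patterns are often easier to read from monthly totals than from a daily series. The sketch below aggregates a data frame with the same date and chirps columns used in the plot above. The rainfall values are randomly generated placeholders, not CHIRPS data, so only the aggregation steps carry over.

```r
# Synthetic daily series with the same column names as the get_chirps() result
set.seed(1)
dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
precip <- data.frame(
  date   = dates,
  chirps = rgamma(length(dates), shape = 0.3, scale = 5)  # invented values (mm)
)

# Label each day with its year-month, then sum rainfall within each month
precip$month <- format(precip$date, "%Y-%m")
monthly <- aggregate(chirps ~ month, data = precip, FUN = sum)
print(monthly)
```

Summing is the natural choice for precipitation, while a variable such as temperature would normally be averaged instead.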

5. Elevation Data (elevatr)

Topography is a key driver of many environmental processes. The elevatr package provides an easy interface to download digital elevation models (DEMs) from various sources, including Amazon Web Services (AWS) Terrain Tiles.

Example: Get an elevation raster for Somaliland

We can provide an sf object (like the country boundary we downloaded earlier) to define the area of interest.

library(elevatr)
library(terra)

# We use the somaliland_country sf object from Section 2
# z determines the zoom level (and thus resolution) of the data
# clip = "locations" will clip the raster to the exact boundary of our sf object
somaliland_elev <- get_elev_raster(locations = somaliland_country, z = 8, clip = "locations")

# The result is a RasterLayer; convert it to a SpatRaster and plot with terra
plot(rast(somaliland_elev), 
     main = "Elevation in Somaliland",
     plg = list(title = "Elevation (m)"))
Digital Elevation Model (DEM) for Somaliland from the elevatr package.

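Interpretation: The elevation map clearly shows the rugged highlands in the central and northern parts of Somaliland. This topography helps explain the cooler minimum temperatures in the northern highlands seen in Section 3, since air temperature falls with altitude. Highlands also tend to receive more rainfall than the surrounding lowlands because moist air cools as it is forced upward, a process known as orographic lift.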
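
The z argument used above controls resolution because each zoom level halves the pixel size of the underlying map tiles. The sketch below applies the standard Web Mercator relation between zoom, latitude, and ground resolution. The helper name tile_resolution_m() is hypothetical, and the 256-pixel tile size is an assumption, so treat the result as a rough guide rather than the exact cell size elevatr returns.

```r
# Approximate ground resolution (metres per pixel) of a 256-pixel web map tile
# at a given zoom level and latitude, using the WGS 84 equatorial radius
tile_resolution_m <- function(zoom, latitude = 0) {
  earth_circumference <- 2 * pi * 6378137
  cos(latitude * pi / 180) * earth_circumference / (256 * 2^zoom)
}

# Zoom 8 at the equator is about 611 m per pixel
res_equator <- tile_resolution_m(8, 0)
print(round(res_equator, 1))

# Resolution at roughly the latitude of Hargeisa, for zoom levels 6 to 10
zooms <- 6:10
res_table <- data.frame(z = zooms,
                        metres_per_pixel = tile_resolution_m(zooms, 9.56))
print(res_table)
```

Higher zoom levels give finer detail but multiply the number of tiles to download by about four per level, so it is worth starting coarse for a country-sized area.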

6. OpenStreetMap (OSM) Data (osmdata)

OpenStreetMap is a global, collaborative project to create a free, editable map of the world. The osmdata package allows you to query this massive database for features like roads, rivers, buildings, hospitals, schools, and more.

The workflow involves four steps:

  1. Define a bounding box (getbb()).
  2. Build a query (opq()).
  3. Add the desired feature (add_osm_feature()).
  4. Download the data as an sf object (osmdata_sf()).

Example: Find hospitals in Mogadishu

library(osmdata)
library(sf)

# 1. Get the bounding box for Mogadishu
mog_bb <- getbb("Mogadishu")

# 2-4. Build query and download data for amenities tagged as 'hospital'
hospitals_mog <- opq(bbox = mog_bb) %>%
  add_osm_feature(key = "amenity", value = "hospital") %>%
  osmdata_sf()

# The result is a list of sf objects (points, lines, polygons, etc.)
# $osm_points holds every node returned, including the vertices of hospital
# outlines, which is why most rows below have NA in the attribute columns
print(hospitals_mog$osm_points)
## Simple feature collection with 170 features and 9 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 45.29562 ymin: 2.013057 xmax: 45.38787 ymax: 2.091325
## Geodetic CRS:  WGS 84
## First 10 features:
##                osm_id name addr:street amenity barrier fixme healthcare name:en
## 1223371658 1223371658 <NA>        <NA>    <NA>    <NA>  <NA>       <NA>    <NA>
## 1388586669 1388586669 <NA>        <NA>    <NA>    <NA>  <NA>       <NA>    <NA>
## 1388586696 1388586696 <NA>        <NA>    <NA>    <NA>  <NA>       <NA>    <NA>
## 1388586755 1388586755 <NA>        <NA>    <NA>    <NA>  <NA>       <NA>    <NA>
## 1388586844 1388586844 <NA>        <NA>    <NA>    <NA>  <NA>       <NA>    <NA>
## 1388586850 1388586850 <NA>        <NA>    <NA>    <NA>  <NA>       <NA>    <NA>
## 1391087964 1391087964 <NA>        <NA>    <NA>    <NA>  <NA>       <NA>    <NA>
## 1391087967 1391087967 <NA>        <NA>    <NA>    <NA>  <NA>       <NA>    <NA>
## 1391087971 1391087971 <NA>        <NA>    <NA>    <NA>  <NA>       <NA>    <NA>
## 1391087976 1391087976 <NA>        <NA>    <NA>    <NA>  <NA>       <NA>    <NA>
##            phone                  geometry
## 1223371658  <NA> POINT (45.29665 2.029179)
## 1388586669  <NA> POINT (45.33368 2.039844)
## 1388586696  <NA>  POINT (45.3334 2.040118)
## 1388586755  <NA> POINT (45.33312 2.040742)
## 1388586844  <NA> POINT (45.33337 2.041466)
## 1388586850  <NA>  POINT (45.3336 2.041726)
## 1391087964  <NA> POINT (45.30667 2.045059)
## 1391087967  <NA> POINT (45.30843 2.045142)
## 1391087971  <NA> POINT (45.30918 2.045181)
## 1391087976  <NA> POINT (45.30665 2.045619)

Interpretation: Once filtered to the features actually tagged as hospitals, this data provides the locations of critical health infrastructure. For a health data scientist, this is invaluable for calculating travel times to care, assessing service coverage, or planning resource allocation during a public health emergency.
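
Most rows in the printed output carry no attributes, so a filtering step is usually needed before counting or mapping facilities. The sketch below shows that step on a mock attribute table with the same name and amenity columns. The ids and names are invented, and the same row indexing should work on the sf object in hospitals_mog$osm_points.

```r
# Mock of an $osm_points attribute table; ids and names are invented
pts <- data.frame(
  osm_id  = c("1", "2", "3", "4", "5"),
  name    = c(NA, "Example Hospital A", NA, NA, "Example Hospital B"),
  amenity = c(NA, "hospital", NA, NA, "hospital"),
  stringsAsFactors = FALSE
)

# Keep only nodes that are themselves tagged as hospitals;
# which() drops the NA comparisons instead of returning all-NA rows
tagged <- pts[which(pts$amenity == "hospital"), ]
print(tagged)
```

Hospitals mapped as building outlines rather than single nodes appear in the $osm_polygons element, so a full inventory generally combines both.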

7. Socio-Economic Data from the World Bank (wbstats)

The World Bank is a primary source for global and national-level development indicators. The wbstats package provides a direct interface to the World Bank’s API, allowing you to search for and download thousands of indicators.

Example: Compare the female-to-male labor force participation ratio in East Africa

First, we can search for relevant indicators, then download the data for a specific set of countries and years.

library(wbstats)
library(tidyverse)

# Search for indicators related to labor force
# We will only show the first few results for brevity
head(wb_search(pattern = "labor force participation rate"))
## # A tibble: 6 × 3
##   indicator_id  indicator                                         indicator_desc
##   <chr>         <chr>                                             <chr>         
## 1 9.0.Labor.All Labor Force Participation Rate (%)                Share of the …
## 2 9.0.Labor.B40 Labor Force Participation Rate (%)-Bottom 40 Per… Share of the …
## 3 9.0.Labor.T60 Labor Force Participation Rate (%)-Top 60 Percent Share of the …
## 4 9.1.Labor.All Labor Force Participation Rate (%), Male          Share of the …
## 5 9.1.Labor.B40 Labor Force Participation Rate (%)-Bottom 40 Per… Share of the …
## 6 9.1.Labor.T60 Labor Force Participation Rate (%)-Top 60 Percen… Share of the …
# We will use "SL.TLF.CACT.FM.ZS" - Ratio of female to male labor force participation rate
# Download data for East African countries
east_africa_lfp <- wb_data(country = c("SOM", "ETH", "KEN", "TZA", "DJI"), 
                           indicator = "SL.TLF.CACT.FM.ZS", 
                           start_date = 1990, end_date = 2021)

# Plot the trends over time
ggplot(east_africa_lfp, aes(x = date, y = SL.TLF.CACT.FM.ZS, color = country)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(title = "Female to Male Labor Force Participation Ratio in East Africa",
       subtitle = "Modeled ILO Estimate (%)",
       y = "Ratio (Female/Male %)",
       x = "Year",
       color = "Country") +
  theme_minimal()
Time series of the female-to-male labor force participation ratio in East Africa.

Interpretation: This chart reveals distinct trends in gender parity in the labor market across the region. Such data is essential for research in economics, development studies, and public policy, helping to track progress towards gender equality goals.
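
API-backed downloads like this one change as the provider revises its estimates, so saving a local copy helps keep an analysis reproducible. The sketch below is one minimal way to do that. The helper cache_rds() is hypothetical, and fake_fetch() stands in for a real call such as wb_data() so that the example runs offline.

```r
# Return the cached object if the file exists; otherwise fetch, save, and return
cache_rds <- function(file, fetch) {
  if (file.exists(file)) {
    return(readRDS(file))
  }
  result <- fetch()
  dir.create(dirname(file), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, file)
  result
}

# Stand-in for a slow API call; counts how many times it actually runs
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(country = c("Kenya", "Ethiopia"), value = c(1, 2))
}

cache_file <- file.path(tempdir(), "cache_demo", "lfp.rds")
unlink(cache_file)  # start from a clean state for the demonstration

first  <- cache_rds(cache_file, fake_fetch)  # runs fake_fetch() and saves
second <- cache_rds(cache_file, fake_fetch)  # reads the saved copy
print(calls)
```

In a real project the cache file would live in a project folder rather than tempdir(), and deleting it forces a fresh download. This pattern suits data frames and sf objects; terra rasters are better written to disk with the package's own writers.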

8. Health and Demographic Data (rdhs)

The Demographic and Health Surveys (DHS) Program provides some of the most important and widely used datasets in global health. The rdhs package allows users to query the DHS API, find surveys, and download datasets directly into R.

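Important Note: Downloading DHS datasets requires you to register for an account on the DHS Program website and get your project approved. You will use your credentials to authenticate within R. Querying the DHS API for survey metadata, as in the example below, does not require an account.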

Since DHS data for Somalia is not currently available through the API, we will demonstrate how to find surveys for neighboring Kenya and Ethiopia.

Example: Find available DHS surveys for Kenya and Ethiopia

# NOTE: dhs_surveys() only queries the public DHS API for survey metadata,
# so it runs without credentials. Downloading datasets with get_datasets()
# requires registering on the DHS website and running set_rdhs_config().

library(rdhs)

# Before downloading datasets, authenticate with your credentials (run this once per session)
# set_rdhs_config(email = "your_email@example.com", project = "Your Project Title")

# Find available surveys for Kenya and Ethiopia
surveys <- dhs_surveys(countryIds = c("KE", "ET"))

# View the key information about the surveys found
print(surveys[, c("CountryName", "SurveyYear", "SurveyType", "DHS_CountryCode")])

Below is the example output you would see after running the code above:

     CountryName SurveyYear SurveyType DHS_CountryCode
1       Ethiopia       2019        MIS              ET
2       Ethiopia       2016        DHS              ET
3       Ethiopia       2011        DHS              ET
4       Ethiopia       2005        DHS              ET
5       Ethiopia       2000        DHS              ET
6          Kenya       2022        DHS              KE
7          Kenya       2015        MIS              KE
8          Kenya       2014        DHS              KE
9          Kenya       2010        AIS              KE
10         Kenya    2008-09        DHS              KE
11         Kenya       2007        MIS              KE
12         Kenya       2003        DHS              KE

Interpretation: This output shows us all the surveys (DHS, Malaria Indicator Survey - MIS, etc.) available for these countries. From here, a researcher could use other rdhs functions like dhs_datasets() to find specific data files (e.g., the women’s recode, household recode) and get_datasets() to download them for analysis. This provides a reproducible pathway to accessing rich microdata on fertility, child mortality, nutrition, and many other key health topics.
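
A common next step is to narrow the survey list to the most recent standard DHS per country before looking for datasets. The sketch below does this on a few rows transcribed from the example output above, typed in by hand so that it runs without an API call. SurveyYear is treated as text here because of labels such as "2008-09"; the type returned by the live API may differ.

```r
# Subset of the example output above, entered by hand
surveys <- data.frame(
  CountryName = c("Ethiopia", "Ethiopia", "Ethiopia",
                  "Kenya", "Kenya", "Kenya", "Kenya"),
  SurveyYear  = c("2019", "2016", "2011", "2022", "2015", "2014", "2008-09"),
  SurveyType  = c("MIS", "DHS", "DHS", "DHS", "MIS", "DHS", "DHS"),
  stringsAsFactors = FALSE
)

# Keep standard DHS surveys and extract a numeric start year
dhs_only <- surveys[surveys$SurveyType == "DHS", ]
dhs_only$year <- as.integer(substr(dhs_only$SurveyYear, 1, 4))

# Sort newest first within each country, then keep the first row per country
dhs_only <- dhs_only[order(dhs_only$CountryName, -dhs_only$year), ]
latest <- dhs_only[!duplicated(dhs_only$CountryName), ]
print(latest)
```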

9. Conclusion

This module has demonstrated the immense power and convenience of accessing open spatial data directly within the R environment. By leveraging packages like rnaturalearth, geodata, chirps, elevatr, osmdata, wbstats, and rdhs, we can build complex, multi-layered datasets for sophisticated analysis without ever leaving our R script.

This programmatic approach is the cornerstone of modern, reproducible research. It allows us to seamlessly integrate administrative, environmental, socio-economic, and health data to answer pressing questions in data science, public health, and beyond. As you move forward, we encourage you to explore the detailed documentation for each of these packages to unlock their full potential.