Introduction

This document presents a spatial analysis of iron supplement usage among pregnant women in Somalia. Utilizing data from the Demographic and Health Survey (DHS), we investigate the spatial distribution of iron supplementation and examine patterns of spatial autocorrelation. The analysis employs various spatial techniques, including spatial mapping, as well as spatial autocorrelation statistics such as Global and Local Moran’s I, and the Getis-Ord Gi* statistic. Understanding the geographic distribution of iron supplement usage and the associated spatial factors is critical for the development of targeted public health interventions to improve maternal health.

Data and Methods

Libraries

First, We Load the Required R Libraries for Data Manipulation, Spatial Analysis, and Visualization.

# sf: For handling spatial vector data (shape-files).
library(sf)
## Warning: package 'sf' was built under R version 4.3.2
## Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
# ggplot2: For creating statistic and publication-quality graphics.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.3.3
# spdep: For spatial analysis and spatial autocorrelation.
library(spdep)
## Warning: package 'spdep' was built under R version 4.3.3
## Loading required package: spData
## Warning: package 'spData' was built under R version 4.3.3
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
# `tmap`: For creating thematic maps.
library(tmap)
## Warning: package 'tmap' was built under R version 4.3.3
## Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
## remotes::install_github('r-tmap/tmap')
# haven: For reading data files from various statistical packages (including .dta).
library(haven)
## Warning: package 'haven' was built under R version 4.3.2
# survey: For complex survey analysis
library(survey)
## Warning: package 'survey' was built under R version 4.3.3
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
## 
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
## 
##     dotchart
# `RColorBrewer`: Provides color schemes for maps and charts.
library(RColorBrewer)

Loading Data

We Loaded the Somalia Shape-file and the DHS Data, Followed by Data Cleaning and Pre-processing.

# Loading shape-file
# Specifying the file path to the shape-file representing the administrative boundaries of Somalia.
filepath<-"~/gadm41_SOM_1.shp"
# Read the shape-file using st_read() from the sf package.
somalia<-st_read(filepath)
## Reading layer `gadm41_SOM_1' from data source 
##   `C:\Users\Dr. Abdalla\Documents\gadm41_SOM_1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 18 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 40.9785 ymin: -1.647082 xmax: 51.4157 ymax: 11.98931
## Geodetic CRS:  WGS 84
# Printing the names of the first-level administrative units (regions) in Somalia.
somalia$NAME_1
##  [1] "Awdal"             "Bakool"            "Banaadir"         
##  [4] "Bari"              "Bay"               "Galguduud"        
##  [7] "Gedo"              "Hiiraan"           "Jubbada Dhexe"    
## [10] "Jubbada Hoose"     "Mudug"             "Nugaal"           
## [13] "Sanaag"            "Shabeellaha Dhexe" "Shabeellaha Hoose"
## [16] "Sool"              "Togdheer"          "Woqooyi Galbeed"
# Loading the  DHS data
# Specifying the file path to the DHS data set (.dta file).
ari<- read_dta("~/iron_supplement.dta")
# Print the structure of the 'V024' column, which likely contains region codes.
table(ari$Region,ari$dv)
##     
##        0   1
##   11 364  14
##   12 543  26
##   13 542  58
##   14 614  51
##   15 600  67
##   16 405  30
##   17 404  23
##   18 409  13
##   19 359  35
##   20 364  21
##   21 381  21
##   22 765  78
##   24 135   7
##   25 465  24
##   26 459  14
##   28 432  25

Calculating Iron Supplement Usage Proportion

We Calculated the Proportion of Acute Respiratory Infection Proportion by Region, Using Survey Weights from the DHS Data.

# Calculating the proportion iron supplement usage by region
# Applying the function to subsets of the survey data using svyby
# Creating a survey design object
options(survey.lonely.psu = "remove")
design <- svydesign(id = ~V001, strata = ~strata, weights = ~V005, data = ari,nest=TRUE)
result <- svyby(~dv, ~Region, FUN = svyciprop, design = design, vartype = NULL)
# Printing the resulting data frame showing the ARI proportion per region.
print(result)
##    Region         dv se.as.numeric(dv)
## 11     11 0.03707391       0.007723611
## 12     12 0.04945374       0.020752710
## 13     13 0.07526237       0.013620723
## 14     14 0.07644767       0.012353170
## 15     15 0.11146332       0.014702889
## 16     16 0.03996272       0.010319417
## 17     17 0.04087708       0.013633493
## 18     18 0.03068000       0.007578060
## 19     19 0.06642998       0.014663044
## 20     20 0.04702224       0.010831259
## 21     21 0.06429351       0.017160979
## 22     22 0.09279603       0.015700370
## 24     24 0.04929577       0.016290078
## 25     25 0.05589389       0.010792092
## 26     26 0.03062573       0.009791850
## 28     28 0.07993757       0.020571733
# Adding region names to the result data frame
# Defining a character vector with the names of the regions, ordered to match the codes in the result.
labels <- c("Awdal", "Woqooyi Galbeed", "Togdheer", "Sool", "Sanaag", "Bari", "Nugaal", "Mudug",
            "Galguduud", "Hiiraan", "Shabeellaha Dhexe", "Banaadir",  "Bay",
            "Bakool", "Gedo",  "Jubbada Hoose")
# Adding a 'Region' column to the result data frame and mapping region names to region codes.
result$Region <- labels
# Sorting the result by the 'Region' column
result <- result[order(result$Region), ]
# Printing the resulting data frame that is now labeled and sorted.
print(result)
##               Region         dv se.as.numeric(dv)
## 11             Awdal 0.03707391       0.007723611
## 25            Bakool 0.05589389       0.010792092
## 22          Banaadir 0.09279603       0.015700370
## 16              Bari 0.03996272       0.010319417
## 24               Bay 0.04929577       0.016290078
## 19         Galguduud 0.06642998       0.014663044
## 26              Gedo 0.03062573       0.009791850
## 20           Hiiraan 0.04702224       0.010831259
## 28     Jubbada Hoose 0.07993757       0.020571733
## 18             Mudug 0.03068000       0.007578060
## 17            Nugaal 0.04087708       0.013633493
## 15            Sanaag 0.11146332       0.014702889
## 21 Shabeellaha Dhexe 0.06429351       0.017160979
## 14              Sool 0.07644767       0.012353170
## 13          Togdheer 0.07526237       0.013620723
## 12   Woqooyi Galbeed 0.04945374       0.020752710

Merging Data

The Shape-file and the Calculated Iron Supplement Usage Proportions are Merged Using a Common Identifier, Region Name. We also Calculated the Centroids of Each Region, which will be used for Placing Labels on the Spatial Map.

# Merging the shape-file and result data frame
# Merging the shape-file data (`somalia`) with the calculated mean ARI data (`result`).
merged_data <- merge(somalia, result, by.x = "NAME_1", by.y = "Region")

# Calculating centroids for label placement
# Calculating the centroids (geometric center points) of each polygon in the merged data set.
centroid <- st_centroid(merged_data)
## Warning: st_centroid assumes attributes are constant over geometries
# Extracting x-coordinates of centroids and adding them as column named 'x_position'
merged_data$x_position <- centroid[, 1]
# Extracting y-coordinates of centroids and adding them as column named 'y_position'
merged_data$y_position <- centroid[, 2]

# Converting the Spatial-polygons data frame to sf object
# Converting merged_data which is a Spatial-polygons data frame into an sf object for use in ggplot
merged_data_sf <- st_as_sf(merged_data)

Spatial Autocorrelation Analysis

Spatial Weights Matrix

We Construct a Spatial Weights Matrix Based on Queen Contiguity to Delineate the Neighborhood Structure of the Spatial Units Under Study. We Excluded Records with Missing Values for the Iron Supplement usage, as They do not Contribute to the Analysis.

# Exploratory Spatial Data Analysis
# Spatial Autocorrelation Analysis

# 1. Creating spatial weights matrix: Queen Contiguity
# Removing rows with NA values in the `dv` column (mean fuel use)
merged_data <- merged_data[!is.na(merged_data$dv), ]
# Creating a neighborhood object based on queen contiguity between regions.
nb <- poly2nb(merged_data, queen = TRUE)
# Createing a spatial weights list from the neighborhood object using the 'W' style (row-standardized weights).
nbw <- nb2listw(nb, style = "W")

Global Moran’s I

We Calculated the Global Moran’s I Statistic to Assess Overall Spatial Autocorrelation. This Step Allows us to Determine Whether the Data is Randomly Distributed or Spatially Clustered. A Statistically Significant Moran’s I Indicates the Presence of Significant Spatial Clustering.

# 2. Calculating Global Moran's I
set.seed(123)
# Calculating Global Moran's I statistic for the ARI data.
gmoran <- moran.test(merged_data$dv, nbw, alternative = "greater")
# Printing the output of the Moran's I test.
print(gmoran)
## 
##  Moran I test under randomisation
## 
## data:  merged_data$dv  
## weights: nbw    
## 
## Moran I statistic standard deviate = 0.43003, p-value = 0.3336
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.02142375       -0.06666667        0.04196241
# Extracting and print the Moran's I statistic.
print(gmoran[["estimate"]][["Moran I statistic"]])
## [1] 0.02142375
# Extracting and print the z-score of the test.
print(gmoran[["statistic"]])
## Moran I statistic standard deviate 
##                          0.4300297
# Extracting and print the p-value of the test.
print(gmoran[["p.value"]])
## [1] 0.333587
# Interpretation:
# - If p-value < 0.05 (significance level), reject the null hypothesis of no spatial autocorrelation.
# - Conclusion: There is evidence of spatial autocorrelation in ARI.

# Moran's for MC
# Conducting a Monte Carlo simulation for Moran's I to assess significance.
gmoranMC <- moran.mc(merged_data$dv, nbw, nsim = 999)
# Printing the output of the Monte Carlo simulation
print(gmoranMC)
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  merged_data$dv 
## weights: nbw  
## number of simulations + 1: 1000 
## 
## statistic = 0.021424, observed rank = 691, p-value = 0.309
## alternative hypothesis: greater
# Open a new graphics window.
windows()
# Creating a histogram of simulated Moran's I values.
hist(gmoranMC$res)
# Adding a vertical line at the observed Moran's I statistic.
abline(v = gmoranMC$statistic, col = "red")

Local Moran’s I

Next, We Calculated Local Moran’s I to Identify Specific Regions with Statistically Significant Local Clusters. This Statistic Helps us Pinpoint Both “Hot Spots” and “Cold Spots” within the Data.

# 4. Calculating Local Moran's I
# Calculating Local Moran's I for each region.
lmoran <- localmoran(merged_data$dv, nbw,
                     alternative = "greater")
# Printing the first few rows of the local Moran's I results
print(head(lmoran))
##           Ii         E.Ii      Var.Ii       Z.Ii Pr(z > E(Ii))
## 1  0.4271349 -0.064572790 0.966450315  0.5001695     0.3084779
## 2  0.1110391 -0.001456349 0.006647899  1.3797254     0.0838356
## 3  0.3364516 -0.148433630 2.022417400  0.3409601     0.3665668
## 4 -0.6481736 -0.048825139 0.212302831 -1.3007731     0.9033319
## 5  0.3128040 -0.012966868 0.095076266  1.0565163     0.1453662
## 6 -0.1692819 -0.006844923 0.031076892 -0.9214378     0.8215891
# 5. Adding Local Moran's results to merged_data
# Adding the local Moran's I statistic to the merged data set.
merged_data$lmI <- lmoran[, "Ii"]
# Adding the z-scores of local Moran's I to the merged data set.
merged_data$lmZ <- lmoran[, "Z.Ii"]
# Adding p-values of local Moran's I to the merged data set.
merged_data$lmp <- lmoran[, "Pr(z > E(Ii))"]

Getis-Ord Gi*

We computed the Getis-Ord Gi* statistic to identify spatial clusters of both high and low values within the data set.

# 6. Calculating Getis-Ord Gi*
# Calculating the Getis-Ord Gi* statistic for each region.
local_gi <- localG(merged_data$dv, nbw)
# Adding Getis-Ord Gi* statistic to the merged data set
merged_data$gi <- local_gi

# Converting the Getis-Ord Gi* results into z-score to use for plotting
# Calculating z-scores for the Getis-Ord Gi* statistic.
z_gi <- (local_gi - mean(local_gi)) / sd(local_gi)
# Adding the z-scores of Gi* to the merged data set.
merged_data$zgi <- z_gi

# Calculating P-values for Getis-Ord statistic:
# The original localG function doesn't directly give p-values, we need to calculate them based on the z-scores:

merged_data$p_gi <- pnorm(abs(merged_data$zgi), lower.tail = FALSE)*2

Spatial Visualization

We visualized the spatial data using tmap to display the spatial patterns of UBA and the associated statistics.

# Creating maps using tmap
# Set tmap mode to "view" for interactive map viewing.
tmap_mode("view")
## tmap mode set to interactive viewing

Figure 1: Spatial Distribution of Iron Supplement Usage and Spatial Lag

The first visualization illustrates the spatial distribution of Iron supplement usage, emphasizing the spatial variations in the proportion of ARI across Somalia. Additionally, the spatial lag map will highlight the influence of neighboring regions on the ARI distribution.

#################################################
# Figure 1 : Spatial Map of dv and its Spatial Lag Map
# (Visualizing Data & Spatial Context)
#################################################


# Map 1: Iron Supplement Usage by region
p1 <- tm_shape(merged_data) +
  # Plotting polygons filled based on `dv` (Iron Supplement Usage) using quantile breaks.
  tm_polygons(col = "dv", title = "Iron Supplement Usage", style = "quantile") +
# Adjusting layout so that the legend is outside of the plot.
  tm_layout(legend.outside = TRUE)

# Calculate spatial lag variable
merged_data$dv_lag <- lag.listw(nbw, merged_data$dv)
# Creating the spatial lag map using tmap
p2 <- tm_shape(merged_data) +
  tm_polygons("dv_lag", title = "Spatial Lag of Iron Supplement Usage", style = "quantile") +
  tm_layout(legend.outside = TRUE)

tmap_arrange(p1, p2, ncol = 2)

Figure 2: Global Moran’s I

The second visualization is the Normal Distribution Curve Plot, which displays the significance level of the global Moran’s I value. This plot helps the reader assess the likelihood that the spatial distribution of the data is random.

###################################################
# Figure 2 Global Moran's I (Overall Spatial Dependence)
###################################################

# Creating the Normal Distribution Curve Plot
# Generating data for the normal distribution curve
set.seed(123)
x <- seq(-4, 4, length = 200)
y <- dnorm(x)
df <- data.frame(x, y)

# Defining the critical z-scores
critical_z <- c(-2.58, -1.96, -1.65, 1.65, 1.96, 2.58)

# Defining colors for the legend
legend_colors <- c("blue", "lightblue", "yellow", "lightcoral", "red")
legend_labels <- c("< -2.58", "-2.58 - -1.96", "-1.96 - -1.65", "1.65 - 1.96", "1.96 - 2.58", "> 2.58")

# Creating the plot
p3 <- ggplot(df, aes(x, y)) +
  geom_line(color = "black") +
  # Add shaded areas for different significance levels
  geom_area(aes(x = ifelse(x < critical_z[1], x, NA)), fill = "blue", alpha = 0.5) +
  geom_area(aes(x = ifelse(x > critical_z[1] & x < critical_z[2], x, NA)), fill = "lightblue", alpha = 0.5) +
  geom_area(aes(x = ifelse(x > critical_z[2] & x < critical_z[3], x, NA)), fill = "yellow", alpha = 0.5) +
  geom_area(aes(x = ifelse(x > critical_z[3] & x < critical_z[4], x, NA)), fill = "yellow", alpha = 0.5) +
  geom_area(aes(x = ifelse(x > critical_z[4] & x < critical_z[5], x, NA)), fill = "lightcoral", alpha = 0.5) +
  geom_area(aes(x = ifelse(x > critical_z[5], x, NA)), fill = "red", alpha = 0.5) +
  geom_vline(xintercept = gmoran[["statistic"]], color = "red", linetype = "dashed", linewidth = 1) +
  # Add critical value lines and labels
  geom_vline(xintercept = critical_z, color = "gray", linetype = "dotted") +
  annotate("text", x = critical_z, y = 0, label = critical_z, vjust = 1.5, size = 3, color = "black") +
  # Add text annotations for z-score and p-value
  annotate("text", x = gmoran[["statistic"]], y = max(y)/2,
           label = paste("z-score:", round(gmoran[["statistic"]], 2)),
           size = 3, color = "red", hjust = 0, vjust = -1) +
  annotate("text", x = 0, y = max(y) * 0.3,
           label = "Random",
           size = 3, color = "black", hjust = 0.5, vjust = -1) +
  annotate("text", x = -3, y = max(y) * 0.3,
           label = "Significant",
           size = 3, color = "black", hjust = 0.5, vjust = -1) +
  annotate("text", x = 3, y = max(y) * 0.3,
           label = "Significant",
           size = 3, color = "black", hjust = 0.5, vjust = -1) +
  # Add text annotations for Moran's I, z-score, and p-value
  annotate("text", x = -3.5, y = max(y) * 0.8,
           label = paste("Moran's Index:", round(gmoran[["estimate"]][["Moran I statistic"]], 5)),
           size = 2.5, color = "black", hjust = 0) +
  annotate("text", x = -3.5, y = max(y) * 0.7,
           label = paste("z-score:", round(gmoran[["statistic"]], 5)),
           size = 2.5, color = "black", hjust = 0) +
  annotate("text", x = -3.5, y = max(y) * 0.6,
           label = paste("p-value:", format(gmoran[["p.value"]], scientific = FALSE, digits = 5)),
           size = 2.5, color = "black", hjust = 0) +
  labs(title = "Normal Distribution Curve", x = "Z-score", y = "Density") +
  theme_minimal() +
  theme(plot.title = element_text(size=9,face = "bold", color = "blue"),
        legend.position = "right",
        panel.spacing = unit(0.05, "lines"),
        plot.margin = margin(t = 0.2, r = 0.2, b = 0.2, l = 0.2, "cm")
  ) +
  # Adding the legend using annotate
  annotate("rect", xmin = 4.1, xmax = 4.3, ymin = seq(0.05, 0.45, length.out = length(legend_colors)),
           ymax = seq(0.1, 0.5, length.out = length(legend_colors)), fill = legend_colors, alpha = 0.5) +
  annotate("text", x = 4.4, y = seq(0.075, 0.475, length.out = length(legend_labels)),
           label = legend_labels, hjust = 0, size = 2.5) +
  annotate("text", x = 4.2, y = 0.55, label = "Critical Value (z-score)", hjust = 0.5, size = 2.5, fontface = "bold")+
  coord_cartesian(xlim = c(-4,5), ylim = c(0,0.6))


# Displaying the plot
windows()
print(p3)
## Warning: Removed 164 rows containing non-finite outside the scale range
## (`stat_align()`).
## Warning: Removed 185 rows containing non-finite outside the scale range
## (`stat_align()`).
## Warning: Removed 192 rows containing non-finite outside the scale range
## (`stat_align()`).
## Warning: Removed 118 rows containing non-finite outside the scale range
## (`stat_align()`).
## Warning: Removed 192 rows containing non-finite outside the scale range
## (`stat_align()`).
## Warning: Removed 149 rows containing non-finite outside the scale range
## (`stat_align()`).

Figure 3: Local Moran’s I and Significance

The third visualization presents the Local Moran’s I and its p-values, identifying regions with statistically significant clusters of high and low iron supplement usage

#################################################
# Figure 3 Local Moran's I and its p-values
# (Pinpointing Local Clusters)
#################################################


# Map 4: Local Moran's I statistic
p4 <- tm_shape(merged_data) +
  # Plotting the polygons filled based on the `lmI` variable using quantile breaks.
  tm_polygons(col = "lmI", title = "Local Moran's I",
              style = "quantile") +
  # Adjusting the  layout so that the legend is outside of the plot.
  tm_layout(legend.outside = TRUE)



# Map 5: P-value for Local Moran's I
p5 <- tm_shape(merged_data) +
  # Plotting polygons filled based on `lmp` (p-values) with specified breaks
  tm_polygons(col = "lmp", title = "p-value",
              breaks = c(-Inf, 0.05, Inf)) +
  # Adjusting layout so that the legend is outside of the plot.
  tm_layout(legend.outside = TRUE)

tmap_arrange(p4,p5, ncol = 2)
## Variable(s) "lmI" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Figure 4: Getis-Ord Gi* and Significance

The fourth visualization displays the Getis-Ord Gi* statistic and its p-values, providing an alternative method to evaluate clustering by identifying statistically significant hot spots and cold spots.

#####################################################
# Figure 4: Getis-Ord Gi* and its p-values
# (Hot and Cold Spots, and Significance)
#####################################################

# Map 5: Getis-Ord Gi* statistic
p6 <- tm_shape(merged_data) +
  # Plotting polygons filled based on `gi` (Getis-Ord Gi*) using quantile breaks
  tm_polygons(col = "gi", title = "Getis-Ord Gi*",
              style = "quantile") +
  # Adjusting the layout so that the legend is outside of the plot.
  tm_layout(legend.outside = TRUE)


p7 <- tm_shape(merged_data) +
  # Plotting polygons filled based on `p_gi` (p-values of Getis-Ord Gi*) with specified breaks
  tm_polygons(col = "p_gi", title = "P-value of Getis-Ord Gi*",
              breaks = c(-Inf, 0.05, Inf))+
  # Adjusting the layout so that the legend is outside of the plot.
  tm_layout(legend.outside = TRUE)

tmap_arrange(p6,p7, ncol = 2)
## Variable(s) "gi" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Conclusion

This document presents a comprehensive spatial analysis of iron supplement usage among pregnant women in Somalia. By employing spatial mapping, the spatial lag variable, the Global Moran’s I statistic, Local Moran’s I statistics, and the Getis-Ord Gi* statistic, significant patterns of spatial dependence and clusters of high and low iron supplement usage are identified. The results of this spatial analysis can inform public health interventions aimed at improving maternal health and access to iron supplements in Somalia.

Future analyses will involve multi-level modeling to explore both individual and community-level determinants of iron supplement usage, while accounting for the spatial autocorrelation observed in this study.

References

```