Introduction

This document presents a spatial analysis of Acute Respiratory Infection (ARI) in Somali children under five, using data from the Demographic and Health Survey (DHS). The study investigates ARI spatial distribution and examines spatial autocorrelation patterns through techniques such as Global, Local Moran’s I, and the Getis-Ord Gi* statistic. Understanding the geographic distribution and associated spatial factors is vital for developing targeted public health interventions.

Data and Methods

Libraries

First, we loaded the necessary R libraries for data manipulation, spatial analysis, and visualization.

# `sf`: For handling spatial vector data (shape-files).
library(sf)
# `ggplot2`: For creating static and publication-quality graphics.
library(ggplot2)
# `spdep`: For spatial analysis and spatial autocorrelation.
library(spdep)
# `tmap`: For creating thematic maps.
library(tmap)
# `haven`: For reading data files from various statistical packages (including .dta).
library(haven)
# `survey`: For complex survey analysis
library(survey)
# `RColorBrewer`: Provides color schemes for maps and charts.
library(RColorBrewer)

Loading Data

We loaded the Somalia shape-file and the DHS data, followed by data cleaning and data pre-processing.

# Loading the  shape-file
# Specifying the file path to the shape-file representing the administrative boundaries of Somalia.
filepath<-"~/gadm41_SOM_1.shp"
# Reading the shape-file using st_read() from the sf package.
somalia<-st_read(filepath)
## Reading layer `gadm41_SOM_1' from data source 
##   `C:\Users\Dr. Abdalla\Documents\gadm41_SOM_1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 18 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 40.9785 ymin: -1.647082 xmax: 51.4157 ymax: 11.98931
## Geodetic CRS:  WGS 84
# Printing the names of the first-level administrative units (regions) in Somalia.
somalia$NAME_1
##  [1] "Awdal"             "Bakool"            "Banaadir"         
##  [4] "Bari"              "Bay"               "Galguduud"        
##  [7] "Gedo"              "Hiiraan"           "Jubbada Dhexe"    
## [10] "Jubbada Hoose"     "Mudug"             "Nugaal"           
## [13] "Sanaag"            "Shabeellaha Dhexe" "Shabeellaha Hoose"
## [16] "Sool"              "Togdheer"          "Woqooyi Galbeed"
# Loading DHS data
# Specifying the file path to the DHS data-set (.dta file).
ari<- read_dta("~/ARI.dta")
# Printing the structure of the 'V024' column, which likely contains region codes.
table(ari$V024,ari$dv)
##     
##         0    1
##   11  571   30
##   12  861   45
##   13  836   59
##   14 1003   34
##   15 1074   50
##   16  762   48
##   17  769   33
##   18  789   27
##   19  679   27
##   20  622   32
##   21  667   26
##   22 1532   57
##   24  303   10
##   25  817   87
##   26  762   75
##   28  755   92

Calculating Acute Respiratory Infection Proportion

We calculated the proportion of Acute Respiratory Infection (ARI) by region, using survey weights from the DHS data.

# Calculating the proportion unimproved water source by region
# Applying the function to subsets of the survey data using svyby
# Creating a survey design object
options(survey.lonely.psu = "remove")
design <- svydesign(id = ~V001, strata = ~STRU_CODE, weights = ~V005, data = ari,nest=TRUE)
result <- svyby(~dv, ~V024, FUN = svyciprop, design = design, vartype = NULL)
# Printing the resulting data frame showing the ARI proportion per region.
print(result)
##    V024         dv se.as.numeric(dv)
## 11   11 0.04789906       0.010975159
## 12   12 0.03426205       0.007752170
## 13   13 0.05113859       0.007048281
## 14   14 0.03271930       0.005243034
## 15   15 0.04674556       0.007837674
## 16   16 0.04888963       0.006853024
## 17   17 0.04148807       0.012457335
## 18   18 0.03273003       0.008366219
## 19   19 0.03636179       0.009772543
## 20   20 0.06228777       0.012308087
## 21   21 0.05400930       0.011452274
## 22   22 0.03587162       0.006094142
## 24   24 0.03194888       0.006712641
## 25   25 0.11739602       0.018216502
## 26   26 0.08291289       0.022771427
## 28   28 0.12414902       0.051361348
# Adding region names to the result data frame
# Defining a character vector with the names of the regions, ordered to match the codes in the result.
labels <- c("Awdal", "Woqooyi Galbeed", "Togdheer", "Sool", "Sanaag", "Bari", "Nugaal", "Mudug",
            "Galguduud", "Hiiraan", "Shabeellaha Dhexe", "Banaadir",  "Bay",
            "Bakool", "Gedo",  "Jubbada Hoose")
# Adding a 'Region' column to the result data frame, mapping region names to region codes.
result$Region <- labels
# Sorting the result by the 'Region' column
result <- result[order(result$Region), ]
# Printing the resulting data-frame that is now labeled and sorted.
print(result)
##    V024         dv se.as.numeric(dv)            Region
## 11   11 0.04789906       0.010975159             Awdal
## 25   25 0.11739602       0.018216502            Bakool
## 22   22 0.03587162       0.006094142          Banaadir
## 16   16 0.04888963       0.006853024              Bari
## 24   24 0.03194888       0.006712641               Bay
## 19   19 0.03636179       0.009772543         Galguduud
## 26   26 0.08291289       0.022771427              Gedo
## 20   20 0.06228777       0.012308087           Hiiraan
## 28   28 0.12414902       0.051361348     Jubbada Hoose
## 18   18 0.03273003       0.008366219             Mudug
## 17   17 0.04148807       0.012457335            Nugaal
## 15   15 0.04674556       0.007837674            Sanaag
## 21   21 0.05400930       0.011452274 Shabeellaha Dhexe
## 14   14 0.03271930       0.005243034              Sool
## 13   13 0.05113859       0.007048281          Togdheer
## 12   12 0.03426205       0.007752170   Woqooyi Galbeed

Merging Data

The shape-file and the calculated ARI proportions were merged using a common identifier, the region name. Additionally, we calculated the centroids of each region, which will be used to place labels on the spatial map.

# Merging the shape-file and result data frame
# Merging the shape-file data (`somalia`) with the calculated mean fuel use data (`result`).
merged_data <- merge(somalia, result, by.x = "NAME_1", by.y = "Region")

# Calculating centroids for label placement
# Calculating the centroids (geometric center points) of each polygon in the merged data-set.
centroid <- st_centroid(merged_data)
# Extracting x-coordinates of centroids and add them as column named 'x_position'
merged_data$x_position <- centroid[, 1]
# Extracting y-coordinates of centroids and add them as column named 'y_position'
merged_data$y_position <- centroid[, 2]

# Converting Spatial-PolygonsData-Frame to sf object
# Converting merged_data which is a Spatial-PolygonsData-Frame into an sf object for use in ggplot
merged_data_sf <- st_as_sf(merged_data)
# Calculate centroids for label placement
# Calculate the centroids (geometric center points) of each polygon in the merged data-set.
centroid <- st_centroid(merged_data_sf)
# Extracting x-coordinates of centroids and add them as column named 'x_position'
merged_data_sf$x_position <- st_coordinates(centroid)[, 1]
# Extracting y-coordinates of centroids and add them as column named 'y_position'
merged_data_sf$y_position <- st_coordinates(centroid)[, 2]

Spatial Autocorrelation Analysis

Spatial Weights Matrix

We constructed a spatial weights matrix based on queen contiguity to delineate the neighborhood structure of the spatial units under study. Records with missing values for ARI were excluded, as they do not contribute to the analysis.

# Exploratory Spatial Data Analysis
# Spatial Autocorrelation Analysis

# 1. Creating spatial weights matrix: Queen contiguity
# Removing rows with NA values in the `dv` column (mean fuel use)
merged_data <- merged_data[!is.na(merged_data$dv), ]
# Create a neighborhood object based on queen contiguity between regions.
nb <- poly2nb(merged_data, queen = TRUE)
# Creating a spatial weights list from the neighborhood object using the 'W' style (row-standardized weights).
nbw <- nb2listw(nb, style = "W")

Global Moran’s I

We calculated the Global Moran’s I statistic to assess overall spatial autocorrelation. This step allows us to determine whether the data is randomly distributed or spatially clustered. A statistically significant Moran’s I indicates the presence of significant spatial clustering.

# 2. Calculating Global Moran's I
set.seed(123)
# Calculating Global Moran's I statistic for the ARI data.
gmoran <- moran.test(merged_data$dv, nbw, alternative = "greater")
# Printing the output of the Moran's I test.
print(gmoran)
## 
##  Moran I test under randomisation
## 
## data:  merged_data$dv  
## weights: nbw    
## 
## Moran I statistic standard deviate = 1.9371, p-value = 0.02636
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.30582220       -0.06666667        0.03697450
# Extracting and printing the Moran's I statistic.
print(gmoran[["estimate"]][["Moran I statistic"]])
## [1] 0.3058222
# Extracting and printing the z-score of the test.
print(gmoran[["statistic"]])
## Moran I statistic standard deviate 
##                           1.937145
# Extract and print the p-value of the test.
print(gmoran[["p.value"]])
## [1] 0.0263638
# Interpretation:
# - If p-value < 0.05 (significance level), reject the null hypothesis of no spatial autocorrelation.
# - Conclusion: There is evidence of spatial autocorrelation in ARI.

# Moran's for MC
# Conducting a Monte Carlo simulation for Moran's I to assess significance.
gmoranMC <- moran.mc(merged_data$dv, nbw, nsim = 999)
# Printing the output of the Monte Carlo simulation
print(gmoranMC)
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  merged_data$dv 
## weights: nbw  
## number of simulations + 1: 1000 
## 
## statistic = 0.30582, observed rank = 945, p-value = 0.055
## alternative hypothesis: greater
# Open a new graphics window.
windows()
# Creating a histogram of simulated Moran's I values.
hist(gmoranMC$res)
# Adding a vertical line at the observed Moran's I statistic.
abline(v = gmoranMC$statistic, col = "red")

Local Moran’s I

Next, we calculated Local Moran’s I to identify specific regions with statistically significant local clusters. This statistic helps us pinpoint both “hot spots” and “cold spots” within the data.

# 4. Calculating Local Moran's I
# Calculating Local Moran's I for each region.
lmoran <- localmoran(merged_data$dv, nbw,
                     alternative = "greater")
# Printing the first few rows of the local Moran's I results
print(head(lmoran))
##            Ii         E.Ii     Var.Ii        Z.Ii Pr(z > E(Ii))
## 1  0.18986503 -0.004354405 0.06936711  0.73742169     0.2304330
## 2  0.31842293 -0.330932402 1.01218810  0.64543394     0.2593230
## 3  0.02550477 -0.031317080 0.48538112  0.08155939     0.4674985
## 4  0.11592047 -0.003231677 0.01472564  0.98189442     0.1630759
## 5 -1.33069759 -0.045437946 0.32220195 -2.26426259     0.9882210
## 6  0.12828425 -0.029736750 0.13189703  0.43510822     0.3317419
# 5. Adding Local Moran's results to merged_data
# Adding the local Moran's I statistic to the merged data-set.
merged_data$lmI <- lmoran[, "Ii"]
# Adding the z-scores of local Moran's I to the merged data-set.
merged_data$lmZ <- lmoran[, "Z.Ii"]
# Adding p-values of local Moran's I to the merged data-set.
merged_data$lmp <- lmoran[, "Pr(z > E(Ii))"]

Getis-Ord Gi*

We computed the Getis-Ord Gi* statistic to identify spatial clusters of both high and low values within the data-set.

# 6. Calculating Getis-Ord Gi*
# Calculating Getis-Ord Gi* statistic for each region.
local_gi <- localG(merged_data$dv, nbw)
# Adding Getis-Ord Gi* statistic to the merged data-set
merged_data$gi <- local_gi

# Converting Getis-Ord Gi* results into z-score to use for plotting
# Calculating z-scores for the Getis-Ord Gi* statistic.
z_gi <- (local_gi - mean(local_gi)) / sd(local_gi)
# Adding the z-scores of Gi* to the merged data-set.
merged_data$zgi <- z_gi

# Calculating P-values for Getis-Ord statistic:
# The original localG function doesn't directly give p-values, we need to calculate them based on the z-scores:

merged_data$p_gi <- pnorm(abs(merged_data$zgi), lower.tail = FALSE)*2

Spatial Visualization

We visualized the spatial data using the ‘tmap’ package to display the spatial patterns of ARI and the associated statistics.

# Create maps using tmap
# Set tmap mode to "view" for interactive map viewing.
tmap_mode("view")

Figure 1: Spatial Distribution of ARI and Spatial Lag

The first visualization shows the spatial distribution of ARI, highlighting the spatial differences in the proportion of ARI across Somalia. Along with that, the spatial lag map will reveal the influence of the neighbours for the ARI distribution.

#################################################
# Figure 1 : Spatial Map of ARI
# (Visualizing Data)
#################################################

# Map 1: ARI by region

p1 <- ggplot(merged_data_sf) +
  geom_sf(aes(fill = dv)) +
  geom_text(aes(x = x_position, y = y_position, label = NAME_1), size = 3) + # Added region name labels
  scale_fill_gradientn(
    colours = brewer.pal(n = 9, name = "YlGnBu"),  # Use a palette for fill
      breaks = quantile(merged_data_sf$dv, probs = seq(0,1, by = 0.2)), # Quantile breaks for fill
    labels =  round(quantile(merged_data_sf$dv, probs = seq(0,1, by = 0.2)),2), #Quantile labels for fill
    name = "ARI" #Title
  ) +
  labs(title = "Spatial Patterns of ARI among Under-five Children in Somalia") +
  theme_void() + # remove grid lines
  theme(legend.position="right")

print(p1)

Figure 2: Global Moran’s I

The second visualization is the normal distribution curve plot, which displays the significance level of the global Moran’s I value. This plot allows the reader to assess the likelihood that the spatial distribution of the data is random.

###################################################
# Figure 2 Global Moran's I (Overall Spatial Dependence)
###################################################

# Creating the Normal Distribution Curve Plot
# Generating data for the normal distribution curve
set.seed(123)
x <- seq(-4, 4, length = 200)
y <- dnorm(x)
df <- data.frame(x, y)

# Defining critical z-scores
critical_z <- c(-2.58, -1.96, -1.65, 1.65, 1.96, 2.58)

# Defining colors for the legend
legend_colors <- c("blue", "lightblue", "yellow", "lightcoral", "red")
legend_labels <- c("< -2.58", "-2.58 - -1.96", "-1.96 - -1.65", "1.65 - 1.96", "1.96 - 2.58", "> 2.58")

# Creating the plot
p3 <- ggplot(df, aes(x, y)) +
  geom_line(color = "black") +
  # Add shaded areas for different significance levels
  geom_area(aes(x = ifelse(x < critical_z[1], x, NA)), fill = "blue", alpha = 0.5) +
  geom_area(aes(x = ifelse(x > critical_z[1] & x < critical_z[2], x, NA)), fill = "lightblue", alpha = 0.5) +
  geom_area(aes(x = ifelse(x > critical_z[2] & x < critical_z[3], x, NA)), fill = "yellow", alpha = 0.5) +
  geom_area(aes(x = ifelse(x > critical_z[3] & x < critical_z[4], x, NA)), fill = "yellow", alpha = 0.5) +
  geom_area(aes(x = ifelse(x > critical_z[4] & x < critical_z[5], x, NA)), fill = "lightcoral", alpha = 0.5) +
  geom_area(aes(x = ifelse(x > critical_z[5], x, NA)), fill = "red", alpha = 0.5) +
  geom_vline(xintercept = gmoran[["statistic"]], color = "red", linetype = "dashed", linewidth = 1) +
  # Add critical value lines and labels
  geom_vline(xintercept = critical_z, color = "gray", linetype = "dotted") +
  annotate("text", x = critical_z, y = 0, label = critical_z, vjust = 1.5, size = 3, color = "black") +
  # Add text annotations for z-score and p-value
  annotate("text", x = gmoran[["statistic"]], y = max(y)/2,
           label = paste("z-score:", round(gmoran[["statistic"]], 2)),
           size = 3, color = "red", hjust = 0, vjust = -1) +
  annotate("text", x = 0, y = max(y) * 0.3,
           label = "Random",
           size = 3, color = "black", hjust = 0.5, vjust = -1) +
  annotate("text", x = -3, y = max(y) * 0.3,
           label = "Significant",
           size = 3, color = "black", hjust = 0.5, vjust = -1) +
  annotate("text", x = 3, y = max(y) * 0.3,
           label = "Significant",
           size = 3, color = "black", hjust = 0.5, vjust = -1) +
  # Add text annotations for Moran's I, z-score, and p-value
  annotate("text", x = -3.5, y = max(y) * 0.8,
           label = paste("Moran's Index:", round(gmoran[["estimate"]][["Moran I statistic"]], 5)),
           size = 2.5, color = "black", hjust = 0) +
  annotate("text", x = -3.5, y = max(y) * 0.7,
           label = paste("z-score:", round(gmoran[["statistic"]], 5)),
           size = 2.5, color = "black", hjust = 0) +
  annotate("text", x = -3.5, y = max(y) * 0.6,
           label = paste("p-value:", format(gmoran[["p.value"]], scientific = FALSE, digits = 5)),
           size = 2.5, color = "black", hjust = 0) +
  labs(title = "Normal Distribution Curve", x = "Z-score", y = "Density") +
  theme_minimal() +
  theme(plot.title = element_text(size=9,face = "bold", color = "blue"),
        legend.position = "right",
        panel.spacing = unit(0.05, "lines"),
        plot.margin = margin(t = 0.2, r = 0.2, b = 0.2, l = 0.2, "cm")
  ) +
  # Add legend using annotate
  annotate("rect", xmin = 4.1, xmax = 4.3, ymin = seq(0.05, 0.45, length.out = length(legend_colors)),
           ymax = seq(0.1, 0.5, length.out = length(legend_colors)), fill = legend_colors, alpha = 0.5) +
  annotate("text", x = 4.4, y = seq(0.075, 0.475, length.out = length(legend_labels)),
           label = legend_labels, hjust = 0, size = 2.5) +
  annotate("text", x = 4.2, y = 0.55, label = "Critical Value (z-score)", hjust = 0.5, size = 2.5, fontface = "bold")+
  coord_cartesian(xlim = c(-4,5), ylim = c(0,0.6))


# Display the plot
#windows()
print(p3)

Figure 3: Local Moran’s I and Significance

he third visualization presents Local Moran’s I and its p-values, identifying regions with statistically significant clusters of both high and low Acute Respiratory Infection (ARI).

#################################################
# Figure 3 Local Moran's I and its p-values
# (Pinpointing Local Clusters)
#################################################


# Map 4: Local Moran's I statistic
p4 <- tm_shape(merged_data) +
  # Plot polygons filled based on the `lmI` variable using quantile breaks.
  tm_polygons(col = "lmI", title = "Local Moran's I",
              style = "quantile") +
  # Adjust layout so that the legend is outside of the plot.
  tm_layout(legend.outside = TRUE)



# Map 5: P-value for Local Moran's I
p5 <- tm_shape(merged_data) +
  # Plot polygons filled based on `lmp` (p-values) with specified breaks
  tm_polygons(col = "lmp", title = "p-value",
              breaks = c(-Inf, 0.05, Inf)) +
  # Adjust layout so that the legend is outside of the plot.
  tm_layout(legend.outside = TRUE)

tmap_arrange(p4,p5, ncol = 2)

Figure 4: Getis-Ord Gi* and Significance

ThThe fourth visualization displays the Getis-Ord Gi* statistic and its p-values, offering an alternative method to evaluate clustering by identifying statistically significant hot spots and cold spots.

#####################################################
# Figure 4: Getis-Ord Gi* and its p-values
# (Hot and Cold Spots, and Significance)
#####################################################

# Map 5: Getis-Ord Gi* statistic
p6 <- tm_shape(merged_data) +
  # Plot polygons filled based on `gi` (Getis-Ord Gi*) using quantile breaks
  tm_polygons(col = "gi", title = "Getis-Ord Gi*",
              style = "quantile") +
  # Adjust layout so that the legend is outside of the plot.
  tm_layout(legend.outside = TRUE)




p7 <- tm_shape(merged_data) +
  # Plot polygons filled based on `p_gi` (p-values of Getis-Ord Gi*) with specified breaks
  tm_polygons(col = "p_gi", title = "P-value of Getis-Ord Gi*",
              breaks = c(-Inf, 0.05, Inf))+
  # Adjust layout so that the legend is outside of the plot.
  tm_layout(legend.outside = TRUE)

tmap_arrange(p6,p7, ncol = 2)

Conclusion

This document presents a comprehensive spatial analysis of acute respiratory infections (ARI) in children under five in Somalia. By employing spatial mapping, the spatial lag variable, the Global Moran’s I statistic, Local Moran’s I statistics, and the Getis-Ord Gi* statistic, significant patterns of spatial dependence and clusters of high and low ARI are identified. The results of this spatial analysis can inform public health interventions aimed at reducing the incidence of ARI in Somalia.

Future analyses will involve Multi-level modeling to explore both individual and community-level determinants of ARI, while accounting for the spatial autocorrelation observed in this study.y.

References

```