This document presents a spatial analysis of Acute Respiratory Infection (ARI) in Somali children under five, using data from the Demographic and Health Survey (DHS). The study investigates ARI spatial distribution and examines spatial autocorrelation patterns through techniques such as Global, Local Moran’s I, and the Getis-Ord Gi* statistic. Understanding the geographic distribution and associated spatial factors is vital for developing targeted public health interventions.
First, we loaded the necessary R libraries for data manipulation, spatial analysis, and visualization.
# `sf`: For handling spatial vector data (shape-files).
library(sf)
# `ggplot2`: For creating static and publication-quality graphics.
library(ggplot2)
# `spdep`: For spatial analysis and spatial autocorrelation.
library(spdep)
# `tmap`: For creating thematic maps.
library(tmap)
# `haven`: For reading data files from various statistical packages (including .dta).
library(haven)
# `survey`: For complex survey analysis
library(survey)
# `RColorBrewer`: Provides color schemes for maps and charts.
library(RColorBrewer)
We loaded the Somalia shape-file and the DHS data, followed by data cleaning and data pre-processing.
# Loading the shape-file
# Specifying the file path to the shape-file representing the administrative boundaries of Somalia.
filepath<-"~/gadm41_SOM_1.shp"
# Reading the shape-file using st_read() from the sf package.
somalia<-st_read(filepath)
## Reading layer `gadm41_SOM_1' from data source
## `C:\Users\Dr. Abdalla\Documents\gadm41_SOM_1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 18 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 40.9785 ymin: -1.647082 xmax: 51.4157 ymax: 11.98931
## Geodetic CRS: WGS 84
# Printing the names of the first-level administrative units (regions) in Somalia.
somalia$NAME_1
## [1] "Awdal" "Bakool" "Banaadir"
## [4] "Bari" "Bay" "Galguduud"
## [7] "Gedo" "Hiiraan" "Jubbada Dhexe"
## [10] "Jubbada Hoose" "Mudug" "Nugaal"
## [13] "Sanaag" "Shabeellaha Dhexe" "Shabeellaha Hoose"
## [16] "Sool" "Togdheer" "Woqooyi Galbeed"
# Loading DHS data
# Specifying the file path to the DHS data-set (.dta file).
ari<- read_dta("~/ARI.dta")
# Printing the structure of the 'V024' column, which likely contains region codes.
table(ari$V024,ari$dv)
##
## 0 1
## 11 571 30
## 12 861 45
## 13 836 59
## 14 1003 34
## 15 1074 50
## 16 762 48
## 17 769 33
## 18 789 27
## 19 679 27
## 20 622 32
## 21 667 26
## 22 1532 57
## 24 303 10
## 25 817 87
## 26 762 75
## 28 755 92
We calculated the proportion of Acute Respiratory Infection (ARI) by region, using survey weights from the DHS data.
# Calculating the proportion unimproved water source by region
# Applying the function to subsets of the survey data using svyby
# Creating a survey design object
options(survey.lonely.psu = "remove")
design <- svydesign(id = ~V001, strata = ~STRU_CODE, weights = ~V005, data = ari,nest=TRUE)
result <- svyby(~dv, ~V024, FUN = svyciprop, design = design, vartype = NULL)
# Printing the resulting data frame showing the ARI proportion per region.
print(result)
## V024 dv se.as.numeric(dv)
## 11 11 0.04789906 0.010975159
## 12 12 0.03426205 0.007752170
## 13 13 0.05113859 0.007048281
## 14 14 0.03271930 0.005243034
## 15 15 0.04674556 0.007837674
## 16 16 0.04888963 0.006853024
## 17 17 0.04148807 0.012457335
## 18 18 0.03273003 0.008366219
## 19 19 0.03636179 0.009772543
## 20 20 0.06228777 0.012308087
## 21 21 0.05400930 0.011452274
## 22 22 0.03587162 0.006094142
## 24 24 0.03194888 0.006712641
## 25 25 0.11739602 0.018216502
## 26 26 0.08291289 0.022771427
## 28 28 0.12414902 0.051361348
# Adding region names to the result data frame
# Defining a character vector with the names of the regions, ordered to match the codes in the result.
labels <- c("Awdal", "Woqooyi Galbeed", "Togdheer", "Sool", "Sanaag", "Bari", "Nugaal", "Mudug",
"Galguduud", "Hiiraan", "Shabeellaha Dhexe", "Banaadir", "Bay",
"Bakool", "Gedo", "Jubbada Hoose")
# Adding a 'Region' column to the result data frame, mapping region names to region codes.
result$Region <- labels
# Sorting the result by the 'Region' column
result <- result[order(result$Region), ]
# Printing the resulting data-frame that is now labeled and sorted.
print(result)
## V024 dv se.as.numeric(dv) Region
## 11 11 0.04789906 0.010975159 Awdal
## 25 25 0.11739602 0.018216502 Bakool
## 22 22 0.03587162 0.006094142 Banaadir
## 16 16 0.04888963 0.006853024 Bari
## 24 24 0.03194888 0.006712641 Bay
## 19 19 0.03636179 0.009772543 Galguduud
## 26 26 0.08291289 0.022771427 Gedo
## 20 20 0.06228777 0.012308087 Hiiraan
## 28 28 0.12414902 0.051361348 Jubbada Hoose
## 18 18 0.03273003 0.008366219 Mudug
## 17 17 0.04148807 0.012457335 Nugaal
## 15 15 0.04674556 0.007837674 Sanaag
## 21 21 0.05400930 0.011452274 Shabeellaha Dhexe
## 14 14 0.03271930 0.005243034 Sool
## 13 13 0.05113859 0.007048281 Togdheer
## 12 12 0.03426205 0.007752170 Woqooyi Galbeed
The shape-file and the calculated ARI proportions were merged using a common identifier, the region name. Additionally, we calculated the centroids of each region, which will be used to place labels on the spatial map.
# Merging the shape-file and result data frame
# Merging the shape-file data (`somalia`) with the calculated mean fuel use data (`result`).
merged_data <- merge(somalia, result, by.x = "NAME_1", by.y = "Region")
# Calculating centroids for label placement
# Calculating the centroids (geometric center points) of each polygon in the merged data-set.
centroid <- st_centroid(merged_data)
# Extracting x-coordinates of centroids and add them as column named 'x_position'
merged_data$x_position <- centroid[, 1]
# Extracting y-coordinates of centroids and add them as column named 'y_position'
merged_data$y_position <- centroid[, 2]
# Converting Spatial-PolygonsData-Frame to sf object
# Converting merged_data which is a Spatial-PolygonsData-Frame into an sf object for use in ggplot
merged_data_sf <- st_as_sf(merged_data)
# Calculate centroids for label placement
# Calculate the centroids (geometric center points) of each polygon in the merged data-set.
centroid <- st_centroid(merged_data_sf)
# Extracting x-coordinates of centroids and add them as column named 'x_position'
merged_data_sf$x_position <- st_coordinates(centroid)[, 1]
# Extracting y-coordinates of centroids and add them as column named 'y_position'
merged_data_sf$y_position <- st_coordinates(centroid)[, 2]
We constructed a spatial weights matrix based on queen contiguity to delineate the neighborhood structure of the spatial units under study. Records with missing values for ARI were excluded, as they do not contribute to the analysis.
# Exploratory Spatial Data Analysis
# Spatial Autocorrelation Analysis
# 1. Creating spatial weights matrix: Queen contiguity
# Removing rows with NA values in the `dv` column (mean fuel use)
merged_data <- merged_data[!is.na(merged_data$dv), ]
# Create a neighborhood object based on queen contiguity between regions.
nb <- poly2nb(merged_data, queen = TRUE)
# Creating a spatial weights list from the neighborhood object using the 'W' style (row-standardized weights).
nbw <- nb2listw(nb, style = "W")
We calculated the Global Moran’s I statistic to assess overall spatial autocorrelation. This step allows us to determine whether the data is randomly distributed or spatially clustered. A statistically significant Moran’s I indicates the presence of significant spatial clustering.
# 2. Calculating Global Moran's I
set.seed(123)
# Calculating Global Moran's I statistic for the ARI data.
gmoran <- moran.test(merged_data$dv, nbw, alternative = "greater")
# Printing the output of the Moran's I test.
print(gmoran)
##
## Moran I test under randomisation
##
## data: merged_data$dv
## weights: nbw
##
## Moran I statistic standard deviate = 1.9371, p-value = 0.02636
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.30582220 -0.06666667 0.03697450
# Extracting and printing the Moran's I statistic.
print(gmoran[["estimate"]][["Moran I statistic"]])
## [1] 0.3058222
# Extracting and printing the z-score of the test.
print(gmoran[["statistic"]])
## Moran I statistic standard deviate
## 1.937145
# Extract and print the p-value of the test.
print(gmoran[["p.value"]])
## [1] 0.0263638
# Interpretation:
# - If p-value < 0.05 (significance level), reject the null hypothesis of no spatial autocorrelation.
# - Conclusion: There is evidence of spatial autocorrelation in ARI.
# Moran's for MC
# Conducting a Monte Carlo simulation for Moran's I to assess significance.
gmoranMC <- moran.mc(merged_data$dv, nbw, nsim = 999)
# Printing the output of the Monte Carlo simulation
print(gmoranMC)
##
## Monte-Carlo simulation of Moran I
##
## data: merged_data$dv
## weights: nbw
## number of simulations + 1: 1000
##
## statistic = 0.30582, observed rank = 945, p-value = 0.055
## alternative hypothesis: greater
# Open a new graphics window.
windows()
# Creating a histogram of simulated Moran's I values.
hist(gmoranMC$res)
# Adding a vertical line at the observed Moran's I statistic.
abline(v = gmoranMC$statistic, col = "red")
Next, we calculated Local Moran’s I to identify specific regions with statistically significant local clusters. This statistic helps us pinpoint both “hot spots” and “cold spots” within the data.
# 4. Calculating Local Moran's I
# Calculating Local Moran's I for each region.
lmoran <- localmoran(merged_data$dv, nbw,
alternative = "greater")
# Printing the first few rows of the local Moran's I results
print(head(lmoran))
## Ii E.Ii Var.Ii Z.Ii Pr(z > E(Ii))
## 1 0.18986503 -0.004354405 0.06936711 0.73742169 0.2304330
## 2 0.31842293 -0.330932402 1.01218810 0.64543394 0.2593230
## 3 0.02550477 -0.031317080 0.48538112 0.08155939 0.4674985
## 4 0.11592047 -0.003231677 0.01472564 0.98189442 0.1630759
## 5 -1.33069759 -0.045437946 0.32220195 -2.26426259 0.9882210
## 6 0.12828425 -0.029736750 0.13189703 0.43510822 0.3317419
# 5. Adding Local Moran's results to merged_data
# Adding the local Moran's I statistic to the merged data-set.
merged_data$lmI <- lmoran[, "Ii"]
# Adding the z-scores of local Moran's I to the merged data-set.
merged_data$lmZ <- lmoran[, "Z.Ii"]
# Adding p-values of local Moran's I to the merged data-set.
merged_data$lmp <- lmoran[, "Pr(z > E(Ii))"]
We computed the Getis-Ord Gi* statistic to identify spatial clusters of both high and low values within the data-set.
# 6. Calculating Getis-Ord Gi*
# Calculating Getis-Ord Gi* statistic for each region.
local_gi <- localG(merged_data$dv, nbw)
# Adding Getis-Ord Gi* statistic to the merged data-set
merged_data$gi <- local_gi
# Converting Getis-Ord Gi* results into z-score to use for plotting
# Calculating z-scores for the Getis-Ord Gi* statistic.
z_gi <- (local_gi - mean(local_gi)) / sd(local_gi)
# Adding the z-scores of Gi* to the merged data-set.
merged_data$zgi <- z_gi
# Calculating P-values for Getis-Ord statistic:
# The original localG function doesn't directly give p-values, we need to calculate them based on the z-scores:
merged_data$p_gi <- pnorm(abs(merged_data$zgi), lower.tail = FALSE)*2
We visualized the spatial data using the ‘tmap’ package to display the spatial patterns of ARI and the associated statistics.
# Create maps using tmap
# Set tmap mode to "view" for interactive map viewing.
tmap_mode("view")
The first visualization shows the spatial distribution of ARI, highlighting the spatial differences in the proportion of ARI across Somalia. Along with that, the spatial lag map will reveal the influence of the neighbours for the ARI distribution.
#################################################
# Figure 1 : Spatial Map of ARI
# (Visualizing Data)
#################################################
# Map 1: ARI by region
p1 <- ggplot(merged_data_sf) +
geom_sf(aes(fill = dv)) +
geom_text(aes(x = x_position, y = y_position, label = NAME_1), size = 3) + # Added region name labels
scale_fill_gradientn(
colours = brewer.pal(n = 9, name = "YlGnBu"), # Use a palette for fill
breaks = quantile(merged_data_sf$dv, probs = seq(0,1, by = 0.2)), # Quantile breaks for fill
labels = round(quantile(merged_data_sf$dv, probs = seq(0,1, by = 0.2)),2), #Quantile labels for fill
name = "ARI" #Title
) +
labs(title = "Spatial Patterns of ARI among Under-five Children in Somalia") +
theme_void() + # remove grid lines
theme(legend.position="right")
print(p1)
The second visualization is the normal distribution curve plot, which displays the significance level of the global Moran’s I value. This plot allows the reader to assess the likelihood that the spatial distribution of the data is random.
###################################################
# Figure 2 Global Moran's I (Overall Spatial Dependence)
###################################################
# Creating the Normal Distribution Curve Plot
# Generating data for the normal distribution curve
set.seed(123)
x <- seq(-4, 4, length = 200)
y <- dnorm(x)
df <- data.frame(x, y)
# Defining critical z-scores
critical_z <- c(-2.58, -1.96, -1.65, 1.65, 1.96, 2.58)
# Defining colors for the legend
legend_colors <- c("blue", "lightblue", "yellow", "lightcoral", "red")
legend_labels <- c("< -2.58", "-2.58 - -1.96", "-1.96 - -1.65", "1.65 - 1.96", "1.96 - 2.58", "> 2.58")
# Creating the plot
p3 <- ggplot(df, aes(x, y)) +
geom_line(color = "black") +
# Add shaded areas for different significance levels
geom_area(aes(x = ifelse(x < critical_z[1], x, NA)), fill = "blue", alpha = 0.5) +
geom_area(aes(x = ifelse(x > critical_z[1] & x < critical_z[2], x, NA)), fill = "lightblue", alpha = 0.5) +
geom_area(aes(x = ifelse(x > critical_z[2] & x < critical_z[3], x, NA)), fill = "yellow", alpha = 0.5) +
geom_area(aes(x = ifelse(x > critical_z[3] & x < critical_z[4], x, NA)), fill = "yellow", alpha = 0.5) +
geom_area(aes(x = ifelse(x > critical_z[4] & x < critical_z[5], x, NA)), fill = "lightcoral", alpha = 0.5) +
geom_area(aes(x = ifelse(x > critical_z[5], x, NA)), fill = "red", alpha = 0.5) +
geom_vline(xintercept = gmoran[["statistic"]], color = "red", linetype = "dashed", linewidth = 1) +
# Add critical value lines and labels
geom_vline(xintercept = critical_z, color = "gray", linetype = "dotted") +
annotate("text", x = critical_z, y = 0, label = critical_z, vjust = 1.5, size = 3, color = "black") +
# Add text annotations for z-score and p-value
annotate("text", x = gmoran[["statistic"]], y = max(y)/2,
label = paste("z-score:", round(gmoran[["statistic"]], 2)),
size = 3, color = "red", hjust = 0, vjust = -1) +
annotate("text", x = 0, y = max(y) * 0.3,
label = "Random",
size = 3, color = "black", hjust = 0.5, vjust = -1) +
annotate("text", x = -3, y = max(y) * 0.3,
label = "Significant",
size = 3, color = "black", hjust = 0.5, vjust = -1) +
annotate("text", x = 3, y = max(y) * 0.3,
label = "Significant",
size = 3, color = "black", hjust = 0.5, vjust = -1) +
# Add text annotations for Moran's I, z-score, and p-value
annotate("text", x = -3.5, y = max(y) * 0.8,
label = paste("Moran's Index:", round(gmoran[["estimate"]][["Moran I statistic"]], 5)),
size = 2.5, color = "black", hjust = 0) +
annotate("text", x = -3.5, y = max(y) * 0.7,
label = paste("z-score:", round(gmoran[["statistic"]], 5)),
size = 2.5, color = "black", hjust = 0) +
annotate("text", x = -3.5, y = max(y) * 0.6,
label = paste("p-value:", format(gmoran[["p.value"]], scientific = FALSE, digits = 5)),
size = 2.5, color = "black", hjust = 0) +
labs(title = "Normal Distribution Curve", x = "Z-score", y = "Density") +
theme_minimal() +
theme(plot.title = element_text(size=9,face = "bold", color = "blue"),
legend.position = "right",
panel.spacing = unit(0.05, "lines"),
plot.margin = margin(t = 0.2, r = 0.2, b = 0.2, l = 0.2, "cm")
) +
# Add legend using annotate
annotate("rect", xmin = 4.1, xmax = 4.3, ymin = seq(0.05, 0.45, length.out = length(legend_colors)),
ymax = seq(0.1, 0.5, length.out = length(legend_colors)), fill = legend_colors, alpha = 0.5) +
annotate("text", x = 4.4, y = seq(0.075, 0.475, length.out = length(legend_labels)),
label = legend_labels, hjust = 0, size = 2.5) +
annotate("text", x = 4.2, y = 0.55, label = "Critical Value (z-score)", hjust = 0.5, size = 2.5, fontface = "bold")+
coord_cartesian(xlim = c(-4,5), ylim = c(0,0.6))
# Display the plot
#windows()
print(p3)
he third visualization presents Local Moran’s I and its p-values, identifying regions with statistically significant clusters of both high and low Acute Respiratory Infection (ARI).
#################################################
# Figure 3 Local Moran's I and its p-values
# (Pinpointing Local Clusters)
#################################################
# Map 4: Local Moran's I statistic
p4 <- tm_shape(merged_data) +
# Plot polygons filled based on the `lmI` variable using quantile breaks.
tm_polygons(col = "lmI", title = "Local Moran's I",
style = "quantile") +
# Adjust layout so that the legend is outside of the plot.
tm_layout(legend.outside = TRUE)
# Map 5: P-value for Local Moran's I
p5 <- tm_shape(merged_data) +
# Plot polygons filled based on `lmp` (p-values) with specified breaks
tm_polygons(col = "lmp", title = "p-value",
breaks = c(-Inf, 0.05, Inf)) +
# Adjust layout so that the legend is outside of the plot.
tm_layout(legend.outside = TRUE)
tmap_arrange(p4,p5, ncol = 2)
ThThe fourth visualization displays the Getis-Ord Gi* statistic and its p-values, offering an alternative method to evaluate clustering by identifying statistically significant hot spots and cold spots.
#####################################################
# Figure 4: Getis-Ord Gi* and its p-values
# (Hot and Cold Spots, and Significance)
#####################################################
# Map 5: Getis-Ord Gi* statistic
p6 <- tm_shape(merged_data) +
# Plot polygons filled based on `gi` (Getis-Ord Gi*) using quantile breaks
tm_polygons(col = "gi", title = "Getis-Ord Gi*",
style = "quantile") +
# Adjust layout so that the legend is outside of the plot.
tm_layout(legend.outside = TRUE)
p7 <- tm_shape(merged_data) +
# Plot polygons filled based on `p_gi` (p-values of Getis-Ord Gi*) with specified breaks
tm_polygons(col = "p_gi", title = "P-value of Getis-Ord Gi*",
breaks = c(-Inf, 0.05, Inf))+
# Adjust layout so that the legend is outside of the plot.
tm_layout(legend.outside = TRUE)
tmap_arrange(p6,p7, ncol = 2)
This document presents a comprehensive spatial analysis of acute respiratory infections (ARI) in children under five in Somalia. By employing spatial mapping, the spatial lag variable, the Global Moran’s I statistic, Local Moran’s I statistics, and the Getis-Ord Gi* statistic, significant patterns of spatial dependence and clusters of high and low ARI are identified. The results of this spatial analysis can inform public health interventions aimed at reducing the incidence of ARI in Somalia.
Future analyses will involve Multi-level modeling to explore both individual and community-level determinants of ARI, while accounting for the spatial autocorrelation observed in this study.y.
```