Geospatial Analysis of Affected Adolescents in DRC

Author

Mulumba Kalonji Alain

Published

June 10, 2025

```{r message=FALSE}
setwd("C:/Users/Alain/Documents")

options(repos = c(CRAN = "https://cloud.r-project.org/"))

# rnaturalearth provides ne_states(), which is used in the code appendix
requiredPackages <- c("sf", "RColorBrewer", "spatstat.geom", "spatstat.utils", "ggplot2",
                      "spatstat", "httr", "ggspatial", "viridis", "dplyr", "readxl",
                      "forcats", "tidyr", "stringr", "rnaturalearth")

# Install any missing package, then attach it
for (pkg in requiredPackages) {
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg)
    library(pkg, character.only = TRUE)
  }
}
```

1 Geospatial Analysis of Affected Adolescents in the DRC

1.1 Introduction and Literature Review

Adolescent health disparities remain a major concern in sub-Saharan Africa. In the Democratic Republic of the Congo (DRC), limited healthcare infrastructure and uneven service delivery have contributed to spatially heterogeneous outcomes among youth. Recent studies emphasize the importance of spatial analytics for understanding and addressing geographic inequality in health services. This study builds upon such literature by mapping and modeling the spatial distribution of adolescents affected by health conditions in the DRC, with a special focus on gender disparities and regional intensity.

1.2 Description of the Data

1.2.1 Data Sources

  • Geospatial layers: Administrative boundaries (provinces, communes, and districts), railway infrastructure, and protected areas were sourced from geoBoundaries, Natural Earth, and OpenStreetMap datasets.

  • Health data: A dataset from “ADO & JEUNES 2023” provided counts of affected adolescents disaggregated by gender (male/female) and province. Estimated population sizes by province were used to standardize reported values.

1.2.2 Data Preprocessing

Data preprocessing involved renaming variables, converting counts to numeric types, dropping rows with missing values, and reshaping the data into long format for gender-disaggregated analysis. Each province was assigned a unique code, and simulated point coordinates (drawn at random within a longitude/latitude bounding box rather than taken from true centroids) were generated for visualization and point pattern analysis.
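The preprocessing steps can be sketched on a small table. This is only an illustration under stated assumptions: the `raw` table and its third row ("Exemple") are hypothetical, while the column names `Femme`, `Homme`, and `valeur` follow the code appendix.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw table: counts arrive as text and one row is incomplete
raw <- tibble(
  Province = c("Bas-Uele", "Kinshasa", "Exemple"),
  Femme    = c("13993", "207106", NA),
  Homme    = c("10896", "233067", "12")
)

long <- raw %>%
  mutate(across(c(Femme, Homme), as.numeric)) %>%  # text -> numeric
  drop_na(Femme, Homme) %>%                        # drop incomplete rows
  pivot_longer(c(Femme, Homme),
               names_to = "Gender", values_to = "valeur") %>%
  mutate(Gender = ifelse(Gender == "Femme", "Female", "Male"))

print(long)
```

Each remaining province contributes two rows, one per gender, which is the shape the faceted and bivariate maps expect.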

1.3 Research Problem and Objectives

1.3.1 Problem Statement

Health service delivery for adolescents in the DRC is marked by disparities in access and outcome, influenced by geography and gender. Yet, few studies have comprehensively mapped this inequality.

1.3.2 Objectives

  • Map the absolute and standardized burden of adolescent health cases by province.

  • Identify provinces with significant gender disparities.

  • Evaluate spatial distribution patterns (random, clustered, or dispersed).

  • Determine the intensity of health needs by combining population ratios and gender proportions.

1.4 Estimation and Model Diagnostics

1.4.1 Spatial Aggregation and Mapping

Administrative boundaries were transformed to a consistent CRS (EPSG:3857) and joined with the health dataset. Key visualizations included:

  • Choropleths showing total affected adolescents per province.

  • Faceted maps to compare male vs. female distributions.

  • Bivariate maps combining intensity and gender proportion.
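The reproject-then-join step described above can be sketched with toy geometries. The two square "provinces" and the `cases` table are hypothetical stand-ins for the real boundary layer and health dataset; only `st_transform()` to EPSG:3857 and the join on a province name mirror the text.

```r
library(sf)

# Helper building a 1 x 1 degree square polygon (hypothetical geometry)
square <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1),
                        c(x0, y0 + 1), c(x0, y0))))
}

# Two toy provinces in longitude/latitude (EPSG:4326)
prov <- st_sf(
  Province = c("A", "B"),
  geometry = st_sfc(square(15, -5), square(16, -5), crs = 4326)
)

# Reproject so that every layer shares one CRS
prov_3857 <- st_transform(prov, 3857)

# Attach a hypothetical attribute table by province name
cases <- data.frame(Province = c("A", "B"), total = c(120, 45))
prov_cases <- merge(prov_3857, cases, by = "Province")

print(prov_cases)
```

The joined object keeps its geometry column, so it can be passed directly to `geom_sf(aes(fill = total))` for a choropleth.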

1.4.2 Point Pattern Analysis

Simulated coordinates allowed for:

  • Construction of unmarked and marked point patterns using the spatstat package.

  • Nearest-neighbor distance (NND) and pairwise distance analysis to assess spatial clustering.
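As a rough illustration of the two distance measures, the sketch below builds a small unmarked pattern and checks that the nearest-neighbor distance equals the smallest off-diagonal entry in each row of the pairwise distance matrix. The five points and the 10 x 10 window are hypothetical; `owin()`, `ppp()`, `pairdist()`, and `nndist()` are from `spatstat.geom`.

```r
library(spatstat.geom)

# Five hypothetical points (in km) inside a 10 x 10 km window
W <- owin(xrange = c(0, 10), yrange = c(0, 10))
X <- ppp(x = c(1, 2, 2, 8, 9), y = c(1, 1, 2, 8, 9), window = W)

D   <- pairdist(X)  # full matrix of pairwise distances
nnd <- nndist(X)    # distance from each point to its nearest neighbor

# Second-smallest value per row skips the zero on the diagonal
nnd_from_pairs <- apply(D, 1, function(r) sort(r)[2])

print(cbind(nnd, nnd_from_pairs))
```

Short nearest-neighbor distances relative to the window size hint at clustering, although a formal test would be needed before drawing that conclusion.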

1.4.3 Ratio Calculations

Ratios of affected adolescents per 1,000 inhabitants and female proportions per province were computed. These allowed for standardized comparisons and composite metrics for policy prioritization.
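A minimal sketch of the two metrics, using the counts for Bas-Uele and Kinshasa printed in the appendix and the matching entries of `pop_total_Rdc`. The data frame `df` and its column names are assumptions made for this example.

```r
# Counts and populations copied from the appendix tables
df <- data.frame(
  Province   = c("Bas-Uele", "Kinshasa"),
  Female     = c(13993, 207106),
  Male       = c(10896, 233067),
  population = c(1419000, 17071000)
)

df$total          <- df$Female + df$Male
df$ratio_per_1000 <- df$total / df$population * 1000  # cases per 1,000 inhabitants
df$prop_female    <- df$Female / df$total             # share of female cases

print(df)
```

A female proportion below 0.5, as in Kinshasa here, indicates that male cases outnumber female cases in that province.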

1.5 Conclusions

This study reveals marked spatial disparities in adolescent health conditions across the DRC. Provinces such as Kinshasa, Kasaï-Oriental, and Nord-Kivu show both high absolute numbers and significant gender imbalances. Spatial point analysis indicates non-random clustering of affected cases, suggesting underlying regional factors such as urban density, health infrastructure, and sociopolitical context.

The findings underscore the necessity for spatially targeted interventions, especially in provinces with both high intensity and low female representation. Future work should incorporate temporal dynamics and more granular demographic data (e.g., age groups, urban/rural segmentation).

Keywords: DRC, adolescents, health disparities, spatial analysis, gender, choropleth, point pattern, spatstat

2 Code Appendix

```{r}
banderies <- st_read("C:/Users/Alain/Downloads/province26/provinces26/Province26.shp")
banderies_nad83 <- st_transform(banderies, crs = "+proj=longlat +datum=NAD83")

library(rnaturalearth)
rdc <- ne_states(country = "Democratic Republic of the Congo", returnclass = "sf")
rdc_83<- st_transform(rdc, crs = 3857)

ggplot() + 
  geom_sf(rdc_83, mapping = aes(geometry=geometry))
```
Reading layer `Province26' from data source 
  `C:\Users\Alain\Downloads\province26\provinces26\Province26.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 26 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 1358728 ymin: -1501941 xmax: 3481773 ymax: 596454.8
Projected CRS: World_Mercator

```{r}
provinces <- st_read("C:/Users/Alain/Downloads/geoBoundaries-COD-ADM2-all/geoBoundaries-COD-ADM2.shp")
provinces_nad83 <- st_transform(provinces, crs = "+proj=longlat +datum=NAD83")

ggplot() + 
  geom_sf(provinces_nad83, mapping = aes(geometry=geometry))



crds <- st_centroid(st_make_valid(provinces_nad83)) 
head(crds)

plot(st_geometry(provinces_nad83))  # plot the boundary outlines
plot(st_geometry(crds), pch=21, bg='red', add=TRUE)
```
Reading layer `geoBoundaries-COD-ADM2' from data source 
  `C:\Users\Alain\Downloads\geoBoundaries-COD-ADM2-all\geoBoundaries-COD-ADM2.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 189 features and 5 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 12.20566 ymin: -13.456 xmax: 31.30522 ymax: 5.386098
Geodetic CRS:  WGS 84
Simple feature collection with 6 features and 5 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 23.7547 ymin: 0.7354574 xmax: 30.70698 ymax: 4.471862
Geodetic CRS:  +proj=longlat +datum=NAD83
   shapeName shapeISO                 shapeID shapeGroup shapeType
1        Aba     <NA> 63176286B20411552692002        COD      ADM2
2      Aketi     <NA> 63176286B77771623677962        COD      ADM2
3       Ango     <NA> 63176286B24678106401017        COD      ADM2
4    Ariwara     <NA> 63176286B55306571020797        COD      ADM2
5        Aru     <NA> 63176286B14719092717616        COD      ADM2
6 Bafwasende     <NA> 63176286B50174012793828        COD      ADM2
                    geometry
1  POINT (30.23818 3.858422)
2    POINT (23.7547 2.98004)
3  POINT (26.07273 4.471862)
4  POINT (30.70698 3.136862)
5  POINT (30.53241 3.065347)
6 POINT (26.99695 0.7354574)

```{r}
library(rnaturalearth)
rdc <- ne_states(country = "Democratic Republic of the Congo", returnclass = "sf")
rdc_83<- st_transform(rdc, crs = 3857)
```

3 The province hosting the most extensive kimberlite diamond fields

```{r}
unique(rdc$name)
# Case-insensitive match on the province name
Kasai_Oriental <- rdc[grepl("Kasaï-Oriental", rdc$name, ignore.case = TRUE), ]
nrow(Kasai_Oriental)


Kasai_Oriental_proj <- st_transform(Kasai_Oriental, crs = 32733)
Kasai_Oriental_geom <- st_geometry(Kasai_Oriental_proj)[[1]]  
Kasai_Oriental_win <- as.owin(Kasai_Oriental_geom)
```
 [1] "Équateur"         "Bandundu"         "Kinshasa City"    "Bas-Congo"       
 [5] "Orientale"        "Sud-Kivu"         "Katanga"          "Nord-Kivu"       
 [9] "Kasaï-Occidental" "Kasaï-Oriental"   "Maniema"         
[1] 1

4 Plot to confirm

```{r}
plot(Kasai_Oriental_win, main = "Observation window - Kasaï-Oriental")
ggplot() + 
  geom_sf(Kasai_Oriental_proj, mapping = aes(geometry=geometry))

Kasai_Oriental_proj <- st_transform(Kasai_Oriental, crs = 32733)
st_bbox(Kasai_Oriental_proj)
```
   xmin    ymin    xmax    ymax 
1268838 9113480 1757329 9804526 


5 Contour map with centroids (sf and data.frame classes)

```{r}
crds <- st_centroid(st_make_valid(Kasai_Oriental_proj)) 
head(crds)

plot(st_geometry(Kasai_Oriental_proj))  
plot(st_geometry(crds), pch=21, bg='red', add=TRUE)
```
Simple feature collection with 1 feature and 121 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 1496407 ymin: 9491290 xmax: 1496407 ymax: 9491290
Projected CRS: WGS 84 / UTM zone 33S
                         featurecla scalerank adm1_code diss_me iso_3166_2
3789 Admin-1 states provinces lakes         5  COD-1896    1896      CD-KE
     wikipedia iso_a2 adm0_sr           name              name_alt name_local
3789      <NA>     CD       1 Kasaï-Oriental East Kasai|Kasai East       <NA>
         type  type_en code_local code_hasc note hasc_maybe region region_cod
3789 Province Province       <NA>     CD.KR <NA>       <NA>   <NA>       <NA>
     provnum_ne gadm_level check_me datarank abbrev postal area_sqkm sameascity
3789          9          1        0        3   <NA>     KR         0        -99
     labelrank name_len mapcolor9 mapcolor13 fips fips_alt  woe_id
3789         6       14         4          7 CG04     <NA> 2344980
                                            woe_label       woe_name latitude
3789 Kasai-Oriental, CD, Democratic Republic of Congo Kasaï-Oriental -4.40511
     longitude sov_a3 adm0_a3 adm0_label                            admin
3789   24.0842    COD     COD          2 Democratic Republic of the Congo
                             geonunit gu_a3  gn_id                    gn_name
3789 Democratic Republic of the Congo   COD 214138 Province du Kasai-Oriental
       gns_id       gns_name gn_level gn_region gn_a1_code region_sub sub_code
3789 -2046899 Kasai-Oriental        1      <NA>      CD.04       <NA>     <NA>
     gns_level gns_lang gns_adm1 gns_region min_label max_label min_zoom
3789         1      fra     CG04       <NA>         6        11        6
     wikidataid       name_ar         name_bn        name_de        name_en
3789     Q80953 كاساي الشرقية কাসাই-ওরিয়েন্টাল Kasaï-Oriental Kasaï-Oriental
            name_es        name_fr        name_el          name_hi     name_hu
3789 Kasai Oriental Kasaï-Oriental Κασάι-Οριεντάλ कासाइ-पूर्वी प्रान्त Kelet-Kasai
            name_id         name_it    name_ja        name_ko    name_nl
3789 Kasai-Oriental Kasai Orientale 東カサイ州 카사이오리앙탈 Oost-Kasaï
             name_pl        name_pt         name_ru        name_sv    name_tr
3789 Kasai Wschodnie Kasaï-Oriental Восточное Касаи Kasaï-Oriental Doğu Kasai
            name_vi  name_zh      ne_id        name_he      name_uk
3789 Kasai-Oriental 东开赛省 1159311343 קאסאי-אוריינטל Східне Касаї
          name_ur         name_fa name_zht FCLASS_ISO FCLASS_US FCLASS_FR
3789 کاسائی-مشرقی کاسای- اورینتال 东开赛省       <NA>      <NA>      <NA>
     FCLASS_RU FCLASS_ES FCLASS_CN FCLASS_TW FCLASS_IN FCLASS_NP FCLASS_PK
3789      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
     FCLASS_DE FCLASS_GB FCLASS_BR FCLASS_IL FCLASS_PS FCLASS_SA FCLASS_EG
3789      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
     FCLASS_MA FCLASS_PT FCLASS_AR FCLASS_JP FCLASS_KO FCLASS_VN FCLASS_TR
3789      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
     FCLASS_ID FCLASS_PL FCLASS_GR FCLASS_IT FCLASS_NL FCLASS_SE FCLASS_BD
3789      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
     FCLASS_UA FCLASS_TLC                geometry
3789      <NA>       <NA> POINT (1496407 9491290)

6 The Congolese railway network: an underdeveloped colonial legacy with limited modern connectivity

```{r}
railways <- st_read("C:/Users/Alain/Downloads/congo-democratic-republic-latest-free.shp/gis_osm_railways_free_1.shp")
railways_nad83 <- st_transform(railways, crs = "+proj=longlat +datum=NAD83")
plot(railways$geometry)

district <- st_read("C:/Users/Alain/Downloads/district/District.shp")
district_nad83 <- st_transform(district, crs = "+proj=longlat +datum=NAD83")

plot(district$geometry)
```
Reading layer `gis_osm_railways_free_1' from data source 
  `C:\Users\Alain\Downloads\congo-democratic-republic-latest-free.shp\gis_osm_railways_free_1.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 1900 features and 7 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 13.43202 ymin: -13.49397 xmax: 30.71915 ymax: 3.36139
Geodetic CRS:  WGS 84
Reading layer `District' from data source 
  `C:\Users\Alain\Downloads\district\District.shp' using driver `ESRI Shapefile'
Simple feature collection with 48 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 1358728 ymin: -1501941 xmax: 3481773 ymax: 596454.8
Projected CRS: World_Mercator

7 DRC’s parks harbor some of Earth’s rarest biodiversity, including endemic species like the Okapi, Bonobo, and Congo Peacock.

```{r}
parc <- st_read("C:/Users/Alain/Downloads/parc/Parc.shp")
parc_nad83 <- st_transform(parc, crs = "+proj=longlat +datum=NAD83")
table(parc$NOM)
plot(parc$geometry)
```
Reading layer `Parc' from data source `C:\Users\Alain\Downloads\parc\Parc.shp' using driver `ESRI Shapefile'
Simple feature collection with 44 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 12.35354 ymin: -11.19897 xmax: 30.20467 ymax: 5.388336
Geodetic CRS:  WGS 84

       Domaine de chasse de Bili-Uere     Domaine de Chasse de Bombo Lumene 
                                    1                                     1 
            Domaine de Chasse de Bomu        Domaine de Chasse de Bushimaie 
                                    1                                     1 
Domaine de Chasse de Gangala-na Bodio    Domaine de Chasse de Luama-Katanga 
                                    1                                     1 
      Domaine de chasse de Luama-Kivu    Domaine de chasse de Lubudi Sampwe 
                                    1                                     1 
     Domaine de chasse de Maika-Penge           Domaine de chasse de Mangai 
                                    1                                     1 
       Domaine de chasse de Rubi-Tele         Domaine de chasse de Rutshuru 
                                    1                                     1 
      Domaine de Chasse de Swa-Kibula      Domaine de chasse de Tshangalele 
                                    1                                     1 
                     Massif d'Itombwe                     Parc de la N'Sele 
                                    1                                     1 
             Parc Marin des Mangroves         Parc National de Kahuzi-Biega 
                                    1                                     1 
          Parc National de Kundelungu             Parc National de l'Upemba 
                                    1                                     1 
          Parc National de la Garamba             Parc National de la Maiko 
                                    1                                     1 
          Parc National de la Salonga             Parc National des Virunga 
                                    2                                     1 
                Réserve de Abumonbazi     Réserve de biosphère de la Lufira 
                                    1                                     1 
      Réserve de biosphere de la Luki      Réserve de biosphère de Yangambi 
                                    1                                     1 
                      Réserve de Bomu                        Réserve de Epi 
                                    1                                     1 
            Réserve de faune à okapis             Réserve de Lomami-Lualaba 
                                    1                                     1 
                 Réserve de Mai Mpili                    Réserve de Maniema 
                                    1                                     1 
            Réserve de Shaba Elephant          Réserve du Lac Tumba-Lediima 
                                    1                                     1 
               Réserve du Mont Kabobo                 Réserve du Sud Masisi 
                                    1                                     1 
      Réserve du triangle de la Ngiri Réserve forestière de Lomako-Yokokala 
                                    1                                     1 
   Reserve Naturelle de Kisimba Ikobo            Reserve Naturelle de Tayna 
                                    1                                     1 
          Réserve Scientifique de Luo 
                                    1 

8 Geospatial profile of Kinshasa, Democratic Republic of Congo’s capital

```{r}
plot(st_geometry(st_transform(district, crs = st_crs(provinces_nad83))))
plot(st_geometry(provinces_nad83[provinces$shapeName == "Kinshasa", ]), 
     add = TRUE, 
     col = "red")
```

9 A stark urban divide: Kinshasa’s core areas (Lukunga/Tshangu) operate at full capacity (90-100%) versus Maluku’s sub-30% occupancy in eastern outskirts.

```{r}
b0 <- st_read("C:/Users/Alain/Downloads/CD-KN-b7bd0eae-20250601-fr-gpkg/data/boundary-polygon.gpkg")
b2 <- st_read("C:/Users/Alain/Downloads/CD-KN-b7bd0eae-20250601-fr-gpkg/data/boundary-polygon-lvl2.gpkg")
b4 <- st_read("C:/Users/Alain/Downloads/CD-KN-b7bd0eae-20250601-fr-gpkg/data/boundary-polygon-lvl4.gpkg")
b7 <- st_read("C:/Users/Alain/Downloads/CD-KN-b7bd0eae-20250601-fr-gpkg/data/boundary-polygon-lvl7.gpkg")
b8 <- st_read("C:/Users/Alain/Downloads/CD-KN-b7bd0eae-20250601-fr-gpkg/data/boundary-polygon-lvl8.gpkg")
b9 <- st_read("C:/Users/Alain/Downloads/CD-KN-b7bd0eae-20250601-fr-gpkg/data/boundary-polygon-lvl9.gpkg")

target_crs <- 3857  

b0 <- st_transform(b0, crs = target_crs)
b2 <- st_transform(b2, crs = target_crs)
b4 <- st_transform(b4, crs = target_crs)
b7 <- st_transform(b7, crs = target_crs)
b8 <- st_transform(b8, crs = target_crs)
b9 <- st_transform(b9, crs = target_crs)

boundaries <- rbind(b0, b2, b4, b7, b8, b9)
nb_col <- length(unique(boundaries$NAME))
palette <- colorRampPalette(brewer.pal(9, "Set3"))(nb_col)
```
Reading layer `boundary-polygon' from data source 
  `C:\Users\Alain\Downloads\CD-KN-b7bd0eae-20250601-fr-gpkg\data\boundary-polygon.gpkg' 
  using driver `GPKG'
Simple feature collection with 234 features and 25 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 15.12706 ymin: -5.032536 xmax: 16.53412 ymax: -3.927611
Geodetic CRS:  WGS 84
Reading layer `boundary-polygon-lvl2' from data source 
  `C:\Users\Alain\Downloads\CD-KN-b7bd0eae-20250601-fr-gpkg\data\boundary-polygon-lvl2.gpkg' 
  using driver `GPKG'
Simple feature collection with 1 feature and 25 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 15.12706 ymin: -5.032536 xmax: 16.53412 ymax: -3.927611
Geodetic CRS:  WGS 84
Reading layer `boundary-polygon-lvl4' from data source 
  `C:\Users\Alain\Downloads\CD-KN-b7bd0eae-20250601-fr-gpkg\data\boundary-polygon-lvl4.gpkg' 
  using driver `GPKG'
Simple feature collection with 1 feature and 25 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 15.12706 ymin: -5.032536 xmax: 16.53412 ymax: -3.927611
Geodetic CRS:  WGS 84
Reading layer `boundary-polygon-lvl7' from data source 
  `C:\Users\Alain\Downloads\CD-KN-b7bd0eae-20250601-fr-gpkg\data\boundary-polygon-lvl7.gpkg' 
  using driver `GPKG'
Simple feature collection with 24 features and 25 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 15.12982 ymin: -5.032536 xmax: 16.53412 ymax: -3.927611
Geodetic CRS:  WGS 84
Reading layer `boundary-polygon-lvl8' from data source 
  `C:\Users\Alain\Downloads\CD-KN-b7bd0eae-20250601-fr-gpkg\data\boundary-polygon-lvl8.gpkg' 
  using driver `GPKG'
Simple feature collection with 197 features and 25 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 15.20606 ymin: -4.440094 xmax: 15.42763 ymax: -4.296267
Geodetic CRS:  WGS 84
Reading layer `boundary-polygon-lvl9' from data source 
  `C:\Users\Alain\Downloads\CD-KN-b7bd0eae-20250601-fr-gpkg\data\boundary-polygon-lvl9.gpkg' 
  using driver `GPKG'
Simple feature collection with 11 features and 25 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 15.21345 ymin: -4.398089 xmax: 15.37933 ymax: -4.320823
Geodetic CRS:  WGS 84
```{r}
ggplot(boundaries) +
  geom_sf(aes(fill = NAME), color = "black", size = 0.2, show.legend = FALSE) +
  scale_fill_manual(values = palette) +
  labs(title = "Communes, neighbourhoods, and districts of Kinshasa",
       subtitle = "Each entity is shown in a distinct colour") +
  annotation_scale(location = "bl") +
  annotation_north_arrow(location = "tl", which_north = "true",
                         style = north_arrow_fancy_orienteering()) +
  theme_minimal() +
  
  # 🏷️ Display the entity names
  geom_sf_text(aes(label = NAME), size = 2.5, color = "black", check_overlap = TRUE)
```

```{r}
commune_zoom <- boundaries %>%
  filter(NAME %in% c("Gombe", "Limete", "Kintambo", "Bandalungwa"))

# Create a manual color palette for the 4 targeted communes
palette_zoom <- setNames(c("red", "blue", "green", "orange"),
                         c("Gombe", "Limete", "Kintambo", "Bandalungwa"))

ggplot(commune_zoom) +
  geom_sf(aes(fill = NAME), color = "black", size = 0.2, show.legend = FALSE) +
  scale_fill_manual(values = palette_zoom) +
  labs(
    title = "Zoom on Selected Communes of Kinshasa",
    subtitle = "Communes: Gombe, Limete, Kintambo, Bandalungwa"
  ) +
  annotation_scale(location = "bl") +
  annotation_north_arrow(location = "tl", which_north = "true",
                         style = north_arrow_fancy_orienteering()) +
  theme_minimal() +
  geom_sf_text(aes(label = NAME), size = 2.5, color = "black", check_overlap = TRUE)
```

```{r}
# 📥 Load the data on adolescents affected by infections
ado <- read_excel("C:/Users/Alain/Downloads/ADO&JEUNES 2023.xlsx")
str(ado)
```
tibble [28 × 4] (S3: tbl_df/tbl/data.frame)
 $ Indicateur: chr [1:28] NA "Somme de Valeur" "Étiquettes de lignes" "Bas-Uele" ...
 $ (Tous)    : chr [1:28] NA "Étiquettes de colonnes" "Féminin" "13993" ...
 $ ...3      : chr [1:28] NA NA "Masculin" "10896" ...
 $ ...4      : chr [1:28] NA NA "Total général" "24889" ...
```{r}
ado_large <- ado %>%
  rename(
  Province = Indicateur,
    Femme = `(Tous)`,
    Homme = `...3`,
    Total_general = `...4`
  ) %>%
  filter(!is.na(Province)) %>%
  mutate(
    Femme = as.numeric(Femme),
    Homme = as.numeric(Homme),
    Total_general = as.numeric(Total_general)
  ) %>%
  drop_na(Femme, Homme, Total_general) %>%  # ✅ Drop rows with at least one NA
  distinct() %>%
  arrange(Province)

# Final display
print(ado_large)
glimpse(ado_large)

# Reshape to long format, handling duplicates
ado_long <- ado_large %>%
  distinct() %>%  # Remove fully duplicated rows
  pivot_longer(
    cols = c(Femme, Homme),
    names_to = "Gender",
    values_to = "valeur"
  ) %>%
  mutate(
    Gender = as_factor(Gender) %>%  # Convert to factor
      fct_relevel("Femme", "Homme") %>%  # Order the levels
      fct_recode("Female" = "Femme", "Male" = "Homme"),  # Rename in English
    .after = Province  # Place the Gender column after Province
  ) %>%
  distinct(Province, Gender, .keep_all = TRUE) %>%  # Avoid duplicates per Province-Gender combination
  select(Province, Gender, valeur, Total_general)  # Select the columns

# 👀 Display the result
glimpse(ado_long)
View(ado_long)

ado_clean <- ado_long[complete.cases(ado_long), ]
```
# A tibble: 25 × 4
   Province         Femme   Homme Total_general
   <chr>            <dbl>   <dbl>         <dbl>
 1 Bas-Uele         13993   10896         24889
 2 Haut-Katanga   2435551 2124191       4559742
 3 Haut-Lomami     111958   35086        147044
 4 Haut-Uele        80677   54107        134784
 5 Ituri            40843   36467         77310
 6 Kasaï           139262  112649        251911
 7 Kasaï-Central    78996   66261        145257
 8 Kasaï-Oriental  520251  315834        836085
 9 Kinshasa        207106  233067        440173
10 Kongo-Central    86683   77415        164098
# ℹ 15 more rows
Rows: 25
Columns: 4
$ Province      <chr> "Bas-Uele", "Haut-Katanga", "Haut-Lomami", "Haut-Uele", …
$ Femme         <dbl> 13993, 2435551, 111958, 80677, 40843, 139262, 78996, 520…
$ Homme         <dbl> 10896, 2124191, 35086, 54107, 36467, 112649, 66261, 3158…
$ Total_general <dbl> 24889, 4559742, 147044, 134784, 77310, 251911, 145257, 8…
Rows: 50
Columns: 4
$ Province      <chr> "Bas-Uele", "Bas-Uele", "Haut-Katanga", "Haut-Katanga", …
$ Gender        <fct> Female, Male, Female, Male, Female, Male, Female, Male, …
$ valeur        <dbl> 13993, 10896, 2435551, 2124191, 111958, 35086, 80677, 54…
$ Total_general <dbl> 24889, 24889, 4559742, 4559742, 147044, 147044, 134784, …
```{r}
pop_total_Rdc <- c(
  "Bas-Uele" = 1419000,
  "Équateur" = 1856000,
  "Haut-Katanga" = 5378000,
  "Haut-Lomami" = 2842000,
  "Haut-Uele" = 2614000,
  "Ituri" = 4392000,
  "Kasaï" = 3199000,
  "Kasaï-Central" = 3743000,
  "Kasaï-Oriental" = 3145000,
  "Kinshasa" = 17071000,
  "Kongo-Central" = 6365000,
  "Kwango" = 2416000,
  "Kwilu" = 6149000,
  "Lomami" = 2842000,
  "Lualaba" = 3138000,
  "Mai-Ndombe" = 2482000,
  "Mongala" = 2358000,
  "Nord-Kivu" = 8103000,
  "Nord-Ubangi" = 1482000,
  "Sankuru" = 2478000,
  "Sud-Kivu" = 6565000,
  "Sud-Ubangi" = 2614000,
  "Tanganyika" = 3561000,
  "Tshopo" = 3113000,
  "Tshuapa" = 1887000
)

# 1. Converting Vector to Data Frame in R
pop_data <- data.frame(
  Province = names(pop_total_Rdc),
  pop_total_Rdc = pop_total_Rdc,
  row.names = NULL
)

View(pop_data)
# 2. Data Merging with ado_clean
# Adding the "Adolescents per 1,000 Inhabitants" Column
```

10 Cleaning province names to avoid formatting differences

```{r}
pop_data <- pop_data %>%
  mutate(Province = str_trim(str_to_title(Province)))  # Standardizing Names in the Dataset

ado_clean <- ado_clean %>%
  mutate(Province = str_trim(str_to_title(Province)))  
# Name Harmonization

#Merging the Two Data Frames on the "Province" Column
merged_data <- full_join(pop_data, ado_clean, by = "Province") %>%
  distinct()  # Remove duplicates

# Data Structure Verification After Merging
glimpse(merged_data)


# Displaying a Data Preview
head(merged_data)
View(merged_data)
```
Rows: 53
Columns: 5
$ Province      <chr> "Bas-Uele", "Bas-Uele", "Équateur", "Équateur", "Haut-Ka…
$ pop_total_Rdc <dbl> 1419000, 1419000, 1856000, 1856000, 5378000, 5378000, 28…
$ Gender        <fct> Female, Male, Female, Male, Female, Male, Female, Male, …
$ valeur        <dbl> 13993, 10896, 12622, 16048, 2435551, 2124191, 111958, 35…
$ Total_general <dbl> 24889, 24889, 28670, 28670, 4559742, 4559742, 147044, 14…
      Province pop_total_Rdc Gender  valeur Total_general
1     Bas-Uele       1419000 Female   13993         24889
2     Bas-Uele       1419000   Male   10896         24889
3     Équateur       1856000 Female   12622         28670
4     Équateur       1856000   Male   16048         28670
5 Haut-Katanga       5378000 Female 2435551       4559742
6 Haut-Katanga       5378000   Male 2124191       4559742
```{r}
province_code <- c(
  "Kinshasa" = 10, "Kongo-Central" = 20, "Kwango" = 302, "Kwilu" = 303, "Mai-Ndombe" = 305,
  "Tshuapa" = 402, "Mongala" = 403, "Nord-Ubangi" = 405, "Sud-Ubangi" = 406, "Équateur" = 407,
  "Tshopo" = 502, "Haut-Uele" = 503, "Bas-Uele" = 504, "Ituri" = 505, "Haut-Lomami" = 61,
  "Lomami" = 62, "Kasaï" = 63, "Kasaï-Central" = 704, "Kasaï-Oriental" = 705, "Sankuru" = 706,
  "Maniema" = 707, "Tanganyika" = 802, "Haut-Katanga" = 803, "Lualaba" = 804, "Sud-Kivu" = 904, 
  "Nord-Kivu" = 903
)

# Updating merged_data with Province Codes
#Adding the "code" Column Using cbind()
merged_data <- cbind(merged_data, code = province_code[merged_data$Province])

#Removing Rows with NA Values to Prevent Errors
merged_data <- merged_data[complete.cases(merged_data), ]
```
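Because `complete.cases()` silently drops every row that failed to match a population or a code, it can help to list the unmatched province names first. The sketch below uses two hypothetical name vectors in place of `pop_data$Province` and `ado_clean$Province`.

```r
# Hypothetical name vectors standing in for the two tables
pop_names  <- c("Bas-Uele", "Kinshasa", "Tshopo")
case_names <- c("Bas-Uele", "Kinshasa", "Maniema")

# Names present in one table but missing from the other
only_in_pop   <- setdiff(pop_names, case_names)
only_in_cases <- setdiff(case_names, pop_names)

print(only_in_pop)
print(only_in_cases)
```

Any name returned here would produce `NA` values after a full join and would therefore be removed by the row filter.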

11 Ratio calculation on unique Province-Gender combinations

```{r}
ado_ratios <- merged_data %>%
  # Step 1: keep only the first row of each Province-Gender combination
  group_by(Province, Gender) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  # Step 2: cases per 1,000 inhabitants, using the provincial population
  mutate(
    ratio_per_1000 = (valeur / pop_total_Rdc) * 1000
  )

stopifnot(
  "There are still duplicates" = !any(duplicated(ado_ratios[, c("Province", "Gender")]))
)
```

13 Data preparation (calculation of the proportion of females)

```{r}
carte_bivariee <- ado_ratios %>%
  group_by(code) %>%
  summarise(
    prop_female = sum(valeur[Gender == "Female"]) / sum(valeur),
    valeur_tot = sum(valeur)
  ) %>%
  left_join(banderies_nad83, by = c("code" = "CODE_INS")) %>%
  st_as_sf()


ggplot() +
  geom_sf(
    data = carte_bivariee,
    aes(fill = prop_female, alpha = valeur_tot),
    color = "white", size = 0.3
  ) +
  

  scale_fill_gradient2(
    low = "#E41A1C", mid = "#F7F7F7", high = "#377EB8",
    midpoint = 0.5, name = "Percentage of Females"
  ) +
  scale_alpha_continuous(range = c(0.3, 0.9), name = "Total Intensity") +
  

  theme_void() +
  labs(title = "Intensity and Gender Distribution by Province")
```

```{r}
# 1. 📥 Load and clean the adolescent health data
ado <- read_excel("C:/Users/Alain/Downloads/ADO&JEUNES 2023.xlsx") %>%
  rename(
    Province = Indicateur,
    Female = `(Tous)`,
    Male = `...3`,
    Total_general = `...4`
  ) %>%
  filter(!is.na(Province)) %>%
  mutate(
    Female = as.numeric(Female),
    Male = as.numeric(Male),
    Total_general = as.numeric(Total_general)
  ) %>%
  drop_na(Female, Male, Total_general)

# 2. 🌍 Add simulated coordinates for provinces (uniform random points in a
#    longitude/latitude box, not true centroids)
set.seed(123)
ado$lon <- runif(nrow(ado), min = 21, max = 31)  
ado$lat <- runif(nrow(ado), min = -13, max = 5)  

# 3. 🎯 Select a 20% random sample of provinces
set.seed(456)
ado$u <- runif(nrow(ado))
ado_small <- ado %>% filter(u <= 0.2)

# 4. 🔄 Convert to sf (WGS 84), then project to Web Mercator (EPSG:3857)
ado_small.sf <- st_as_sf(ado_small, coords = c("lon", "lat"), crs = 4326)
ado_small.sf <- st_transform(ado_small.sf, crs = 3857)

# 5. 🗺️ Define the spatial window (based on observed coordinates)
W <- as.owin(st_bbox(ado_small.sf))
rsc <- 1000  # rescaling factor (1 unit = 1 km)

# 6. 🔘 Create spatial patterns
# Unmarked pattern
pattern.um <- ppp(
  x = st_coordinates(ado_small.sf)[,1],
  y = st_coordinates(ado_small.sf)[,2],
  window = W
)
pattern.um <- rescale(pattern.um, rsc, "km")
plot(pattern.um, main = "Affected adolescents (unmarked pattern)")

# Marked pattern: pivot_longer to split by gender
ado_long <- ado_small %>%
  pivot_longer(cols = c(Female, Male), names_to = "Gender", values_to = "value")

ado_long.sf <- st_as_sf(ado_long, coords = c("lon", "lat"), crs = 4326) %>%
  st_transform(crs = 3857)

pattern.m <- ppp(
  x = st_coordinates(ado_long.sf)[,1],
  y = st_coordinates(ado_long.sf)[,2],
  window = W,
  marks = as.factor(ado_long.sf$Gender)
)
pattern.m <- rescale(pattern.m, rsc, "km")
plot(pattern.m, main = "Affected adolescents by gender (marked pattern)")

# 7. 📏 Distance analysis
between_dist <- pairdist(pattern.um)
between_dist_df <- as.data.frame(between_dist)
between_dist_df$min_d <- apply(between_dist_df, 1, function(x) sort(x)[2])

first_neib <- nndist(pattern.um)

# 8. 🗺️ Display map of DRC with case intensity
rdc <- ne_states(country = "Democratic Republic of the Congo", returnclass = "sf")
rdc_83 <- st_transform(rdc, crs = 3857)

# Join total cases to map
carte_rdc <- left_join(rdc_83, ado, by = c("name" = "Province"))

# 9. 🖼️ Choropleth visualization
ggplot() +
  geom_sf(data = carte_rdc, aes(fill = Total_general), color = "white") +
  scale_fill_viridis_c(name = "Reported cases") +
  labs(
    title = "Distribution of affected adolescents in DRC",
    subtitle = "Total number of cases per province",
    caption = "Source: ADO & JEUNES 2023"
  ) +
  theme_minimal()
```