This is a Markdown document for assigning the UN’s regional classification to DHS countries (or any group of countries). In multi-country studies, regional classification is often an important variable. Classification schemes differ by purpose, but a widely used standard is the UN Statistics Division’s (UNSD) “standard country or area codes for statistical use (M49 standard)”. The M49 standard classification is used as the reference in this document. The markdown file is available at GitHub.

This document has three sections: 1. getting the DHS country list, 2. getting the UNSD list, and 3. merging the two.

1. DHS country list

See the DHS website for the countries that have conducted DHS surveys (https://www.dhsprogram.com/Where-We-Work/).

Figure 1. Countries where DHS surveys have been conducted.

Access the list of countries from the DHS API (http://api.dhsprogram.com/rest/dhs/countries?f=html). The API data also include regional classification variables (“region” and “sub-region”) that are used for the survey program’s own purposes and largely match the UN classification.

url<-("http://api.dhsprogram.com/rest/dhs/countries?f=json")

suppressPackageStartupMessages(library(jsonlite)) # for fromJSON
suppressPackageStartupMessages(library(data.table)) # for data.table
suppressPackageStartupMessages(library(dplyr))

# read DHS API country list 
jsondata<-fromJSON(url) 
# create data frame with countries 
ctry_DHS<-data.table(jsondata$Data)
# tidy up
ctry_DHS<-ctry_DHS %>%
    rename (country =   CountryName) %>% 
    rename (DHSregion1  =   RegionName) %>%
    rename (DHSregion2  =   SubregionName) %>%
    select (country, DHSregion1, DHSregion2)
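
The three `rename()` calls followed by `select()` can also be written as a single `select()` that renames while it selects. The sketch below runs on a toy data frame rather than the live API: the rows and the extra column `OtherField` are made up for illustration, and only the three column names are taken from the code above.

```r
library(dplyr)

# Toy stand-in for jsondata$Data (hypothetical rows, not real API output)
toy_api <- data.frame(
    CountryName   = c("Kenya", "Nepal"),
    RegionName    = c("Sub-Saharan Africa", "South & Southeast Asia"),
    SubregionName = c("Eastern Africa", "South Asia"),
    OtherField    = c(1, 2),   # hypothetical column that is dropped
    stringsAsFactors = FALSE
)

# select() accepts new_name = old_name, so renaming and subsetting happen together
toy_DHS <- toy_api %>%
    select(country = CountryName,
           DHSregion1 = RegionName,
           DHSregion2 = SubregionName)

names(toy_DHS)
```
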

2. UNSD country list

Access the UNSD’s list and classification (https://unstats.un.org/unsd/methodology/m49/), then scrape the web table. See more information about this scraping in the third section of this RPubs.

url<-("https://unstats.un.org/unsd/methodology/m49/")

suppressPackageStartupMessages(library(rvest))
suppressPackageStartupMessages(library(dplyr))

# data table scraping from the web 
ctry_UNSD<-read_html(url) %>% 
    html_nodes("table") %>%
    .[[7]] %>% 
    html_table(header = TRUE)     
# tidy up
ctry_UNSD<-ctry_UNSD %>%
    rename (country =   "Country or Area") %>% 
    rename (M49 =   "M49 code") %>% 
    rename (ISOalpha3   =   "ISO-alpha3 code") %>% 
    select(country, M49, ISOalpha3) 

Because the web table presents region names and country names in different rows (see Figure 2), assign the lowest-level classification to each individual country. From there, higher-level aggregation/grouping can be made as needed. Finally, replace country names as needed so that they match the DHS country names for merging.

Figure 2. Geographic regions table on the UNSD website (snapshot)

# Assign sub-region names
head(ctry_UNSD, 20)
str(ctry_UNSD)
ctry_UNSD$country<-as.character(ctry_UNSD$country)
ctry_UNSD<-ctry_UNSD %>% 
    mutate(
    UNSDsubregion=country,
    UNSDsubregion=ifelse(ISOalpha3!="", "", UNSDsubregion)
    )
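# Fill empty cells downward from the nearest sub-region row above
# (start at row 2, because row 1 has no row above it)
for (i in 2:nrow(ctry_UNSD)){
    if (ctry_UNSD$UNSDsubregion[i]==""){
    ctry_UNSD$UNSDsubregion[i]<-ctry_UNSD$UNSDsubregion[i-1]
    }}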
# Keep only country rows and replace country names as needed
ctry_UNSD<-ctry_UNSD %>% 
    filter(ISOalpha3!="") %>% 
    select(country, UNSDsubregion) %>% 
    mutate(
    country = ifelse(country == "Bolivia (Plurinational State of)", "Bolivia", country) ,
    country = ifelse(country == "Cabo Verde", "Cape Verde", country) , 
    country = ifelse(country == "Democratic Republic of the Congo", "Congo Democratic Republic", country) ,
    country = ifelse(country == "Côte d'Ivoire", "Cote d'Ivoire", country) ,
    country = ifelse(country == "Kyrgyzstan", "Kyrgyz Republic", country) , 
    country = ifelse(country == "Republic of Moldova", "Moldova", country) , 
    country = ifelse(country == "United Republic of Tanzania", "Tanzania", country) ,
    country = ifelse(country == "Viet Nam", "Vietnam", country) 
    )
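
The fill-down step above can be hard to follow on the full table, so here is a minimal sketch of the same idea on a toy data frame. The rows are made up to mimic the layout in Figure 2 (a sub-region row with an empty ISO code, followed by its countries). It uses a vectorized `cumsum()` index instead of a loop, and it assumes the first row is a sub-region row.

```r
# Toy table in the same layout as the scraped one (hypothetical subset)
toy <- data.frame(
    country   = c("Eastern Africa", "Kenya", "Uganda", "Western Africa", "Ghana"),
    ISOalpha3 = c("", "KEN", "UGA", "", "GHA"),
    stringsAsFactors = FALSE
)

# Rows without an ISO code are sub-region headers
is_header <- toy$ISOalpha3 == ""

# cumsum() counts how many headers have been seen so far, which gives
# the position of the most recent header for every row
toy$UNSDsubregion <- toy$country[is_header][cumsum(is_header)]

# Keep only the country rows
toy <- toy[!is_header, ]
toy
```

As with the loop, each country takes the header row closest above it, which is the lowest-level grouping when headers are nested.
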

suppressPackageStartupMessages(library(Hmisc))
label(ctry_UNSD$UNSDsubregion) <- "Sub-region, UNSD Methodology 49"
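
The chain of `ifelse()` calls used for renaming can also be kept as a lookup table, which is easier to extend when more names need harmonizing. This is only a sketch: `name_map` holds a subset of the pairs listed above, and `recode_names()` is a hypothetical helper written for this example, not part of any package.

```r
# Names are the UNSD spellings, values are the DHS spellings
name_map <- c(
    "Bolivia (Plurinational State of)" = "Bolivia",
    "Cabo Verde"                       = "Cape Verde",
    "Viet Nam"                         = "Vietnam"
)

# Hypothetical helper: replace every name found in the map, leave the rest alone
recode_names <- function(x, map) {
    hit <- x %in% names(map)
    x[hit] <- unname(map[x[hit]])
    x
}

toy_names <- c("Viet Nam", "Kenya", "Cabo Verde")
recode_names(toy_names, name_map)
```
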

3. Merge DHS and UNSD M49 lists

Assess the two lists/data.

dim(ctry_DHS)
dim(ctry_UNSD)
names(ctry_DHS)
names(ctry_UNSD)
obsDHS<-nrow(ctry_DHS)
obsUNSD<-nrow(ctry_UNSD)

There are 91 “DHS countries”, and they need to be merged with the UNSD’s list of 248 countries.

ctry<-left_join(ctry_DHS, ctry_UNSD, by = "country")
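
A quick way to see which DHS names found no partner in the UNSD list is `anti_join()` from dplyr, which returns the rows of the first table that have no match in the second. The sketch below uses two small made-up tables in place of `ctry_DHS` and `ctry_UNSD`.

```r
library(dplyr)

# Toy stand-ins for the two lists (hypothetical subsets)
toy_DHS <- data.frame(
    country = c("Kenya", "Cote d'Ivoire", "Vietnam"),
    stringsAsFactors = FALSE
)
toy_UNSD <- data.frame(
    country       = c("Kenya", "Vietnam", "France"),
    UNSDsubregion = c("Eastern Africa", "South-eastern Asia", "Western Europe"),
    stringsAsFactors = FALSE
)

# Rows of toy_DHS whose country name has no match in toy_UNSD
unmatched <- anti_join(toy_DHS, toy_UNSD, by = "country")
unmatched$country
```

Running the same call on the real tables lists every name that still needs harmonizing before the merge.
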

Make sure the merged data have 91 countries and no missing “UNSDsubregion”. In particular, check that ‘Cote d’Ivoire’ (which often fails to merge on the country name) has a UNSD subregion value. If not, assign it.

nrow(ctry)
[1] 91
table(ctry$UNSDsubregion, exclude = NULL)#there should be no <NA>

         Caribbean    Central America       Central Asia 
                 3                  5                  5 
    Eastern Africa     Eastern Europe          Melanesia 
                13                  2                  1 
     Middle Africa    Northern Africa          Polynesia 
                 9                  4                  1 
South-eastern Asia      South America    Southern Africa 
                 8                  7                  5 
     Southern Asia    Southern Europe     Western Africa 
                 7                  1                 14 
      Western Asia               <NA> 
                 5                  1 
test<-filter(ctry, is.na(UNSDsubregion))
table(test$country)#there should be none! Why does this problem persist?? 

Cote d'Ivoire 
            1 
# replace UNSDsubregion 
ctry<- mutate(ctry, 
    UNSDsubregion=ifelse(country=="Cote d'Ivoire", 
                         "Western Africa", 
                         UNSDsubregion) )
table(ctry$UNSDsubregion, exclude = NULL)#there should be no <NA>

         Caribbean    Central America       Central Asia 
                 3                  5                  5 
    Eastern Africa     Eastern Europe          Melanesia 
                13                  2                  1 
     Middle Africa    Northern Africa          Polynesia 
                 9                  4                  1 
South-eastern Asia      South America    Southern Africa 
                 8                  7                  5 
     Southern Asia    Southern Europe     Western Africa 
                 7                  1                 15 
      Western Asia 
                 5 
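
One possible reason the Cote d'Ivoire recode in section 2 has no effect, not verified against the live page, is that the scraped name contains a typographic apostrophe (U+2019) instead of the straight one typed in the code, so the string comparison never matches. The sketch below assumes that spelling for illustration. `normalize_name()` is a hypothetical helper that handles only the two characters relevant to this name.

```r
# Assumed spelling of the scraped name: o-circumflex plus a curly apostrophe
scraped <- "C\u00f4te d\u2019Ivoire"

# The comparison used for recoding has a straight apostrophe, so it fails
scraped == "C\u00f4te d'Ivoire"

# Inspecting the code points shows which characters are actually present
utf8ToInt(scraped)

# Hypothetical helper: straighten the apostrophe and drop the circumflex
normalize_name <- function(x) {
    x <- gsub("\u2019", "'", x, fixed = TRUE)
    x <- gsub("\u00f4", "o", x, fixed = TRUE)
    x
}

normalize_name(scraped) == "Cote d'Ivoire"
```

If the code points on the real data show a different character, the same approach applies with that character substituted in `gsub()`.
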

Finally, a higher-level classification can be created depending on study purposes. In this example, countries are grouped into three categories.

ctry<-ctry %>%
    mutate(
    studyregion="", 
    studyregion=ifelse(UNSDsubregion=="Middle Africa" | 
                          UNSDsubregion=="Western Africa", 
                     "Central and Western Africa", studyregion), 
    studyregion=ifelse(UNSDsubregion=="Eastern Africa" | 
                          UNSDsubregion=="Southern Africa", 
                     "Southern and Eastern Africa", studyregion), 
    studyregion=ifelse(studyregion=="", 
                     "Other Regions", studyregion)
    )
addmargins(table(ctry$UNSDsubregion, ctry$studyregion))    
                    
                     Central and Western Africa Other Regions
  Caribbean                                   0             3
  Central America                             0             5
  Central Asia                                0             5
  Eastern Africa                              0             0
  Eastern Europe                              0             2
  Melanesia                                   0             1
  Middle Africa                               9             0
  Northern Africa                             0             4
  Polynesia                                   0             1
  South-eastern Asia                          0             8
  South America                               0             7
  Southern Africa                             0             0
  Southern Asia                               0             7
  Southern Europe                             0             1
  Western Africa                             15             0
  Western Asia                                0             5
  Sum                                        24            49
                    
                     Southern and Eastern Africa Sum
  Caribbean                                    0   3
  Central America                              0   5
  Central Asia                                 0   5
  Eastern Africa                              13  13
  Eastern Europe                               0   2
  Melanesia                                    0   1
  Middle Africa                                0   9
  Northern Africa                              0   4
  Polynesia                                    0   1
  South-eastern Asia                           0   8
  South America                                0   7
  Southern Africa                              5   5
  Southern Asia                                0   7
  Southern Europe                              0   1
  Western Africa                               0  15
  Western Asia                                 0   5
  Sum                                         18  91
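
The same three-way grouping can be written with `case_when()` and `%in%`, which some readers find easier to extend than nested `ifelse()` calls. This is a sketch on a toy data frame with one row per sub-region of interest. The category labels are chosen for the example.

```r
library(dplyr)

# Toy data: a few sub-regions (hypothetical subset)
toy <- data.frame(
    UNSDsubregion = c("Middle Africa", "Western Africa", "Eastern Africa",
                      "Southern Africa", "Caribbean"),
    stringsAsFactors = FALSE
)

# Conditions are checked in order; TRUE catches everything left over
toy <- toy %>%
    mutate(studyregion = case_when(
        UNSDsubregion %in% c("Middle Africa", "Western Africa") ~ "Central and Western Africa",
        UNSDsubregion %in% c("Eastern Africa", "Southern Africa") ~ "Southern and Eastern Africa",
        TRUE ~ "Other Regions"
    ))

table(toy$studyregion)
```

One difference from the `ifelse()` version: a missing `UNSDsubregion` falls into "Other Regions" here, whereas comparisons with `==` would propagate the `NA`. That is another reason to resolve unmatched countries before grouping.
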

Acknowledgement: Trevor Croft at ICF International made helpful suggestions for the UNSD source and the web scraping code.