This Markdown document shows how to assign the UN’s regional classification to DHS countries (or any group of countries). In multi-country studies, regional classification is often an important variable. Different classification schemes are used for different purposes, but a widely used standard is the UN Statistics Division’s (UNSD) “standard country or area codes for statistical use (M49 standard)”. The M49 standard classification is used as the reference in this document. The markdown file is available at GitHub.
There are three sections in this document: 1. getting the DHS country list, 2. getting the UNSD list, and 3. merging the two.
See the DHS website for countries that have conducted DHS surveys: https://www.dhsprogram.com/Where-We-Work/
Figure 1. Countries where DHS surveys have been conducted.
Access the list of countries from the DHS API (http://api.dhsprogram.com/rest/dhs/countries?f=html). The API data also include regional classification variables (“region” and “sub-region”) that the survey program uses for its own purposes and that largely match the UN classification.
url<-("http://api.dhsprogram.com/rest/dhs/countries?f=json")
suppressPackageStartupMessages(library(jsonlite)) # for fromJSON
suppressPackageStartupMessages(library(data.table)) # for data.table
suppressPackageStartupMessages(library(dplyr))
# read DHS API country list
jsondata<-fromJSON(url)
# create data frame with countries
ctry_DHS<-data.table(jsondata$Data)
# tidy up
ctry_DHS<-ctry_DHS %>%
rename (country = CountryName) %>%
rename (DHSregion1 = RegionName) %>%
rename (DHSregion2 = SubregionName) %>%
select (country, DHSregion1, DHSregion2)
Access the UNSD list and classification at https://unstats.un.org/unsd/methodology/m49/ and scrape the table from the web page. More information about this scraping is in the third section of this RPubs.
url<-("https://unstats.un.org/unsd/methodology/m49/")
suppressPackageStartupMessages(library(rvest))
suppressPackageStartupMessages(library(dplyr))
# data table scraping from the web
ctry_UNSD<-read_html(url) %>%
html_nodes("table") %>%
.[[7]] %>%
html_table(header = TRUE)
# tidy up
ctry_UNSD<-ctry_UNSD %>%
rename (country = "Country or Area") %>%
rename (M49 = "M49 code") %>%
rename (ISOalpha3 = "ISO-alpha3 code") %>%
select(country, M49, ISOalpha3)
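The hard-coded index `[[7]]` depends on the current layout of the page and would pick the wrong table if that layout changed. Below is a minimal sketch of a more defensive selection that looks for the expected column headers instead. It assumes the scraped tables have already been converted to a list of data frames; the toy `tables` list and the helper name `pick_table` are hypothetical and only for illustration.

```r
# Hypothetical helper: return the first data frame whose column names
# include all of the required headers.
pick_table <- function(tables, required) {
  has_all <- vapply(tables, function(t) all(required %in% names(t)), logical(1))
  if (!any(has_all)) stop("No table contains the required columns")
  tables[[which(has_all)[1]]]
}

# Toy stand-in for the list of tables scraped from a page (made-up data)
tables <- list(
  data.frame(x = 1:2),
  data.frame(`Country or Area` = c("Africa", "Algeria"),
             `M49 code` = c("002", "012"),
             `ISO-alpha3 code` = c("", "DZA"),
             check.names = FALSE, stringsAsFactors = FALSE)
)

required <- c("Country or Area", "M49 code", "ISO-alpha3 code")
picked <- pick_table(tables, required)
names(picked)
```

Selecting by header names makes the step fail with an explicit error when the expected table is missing, rather than silently returning a different table.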
Because the web table presents the region and country names in different rows (see Figure 2), assign the lowest-level classification to individual countries. From there, higher-level aggregation/grouping can be made as needed. Finally, replace country names as needed to merge with DHS country names.
Figure 2. Geographic regions table on the UNSD website (snapshot)
# Assign sub-region names
head(ctry_UNSD, 20)
str(ctry_UNSD)
ctry_UNSD$country<-as.character(ctry_UNSD$country)
ctry_UNSD<-ctry_UNSD %>%
mutate(
UNSDsubregion=country,
UNSDsubregion=ifelse(ISOalpha3!="", "", UNSDsubregion)
)
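# carry the nearest preceding region name down to each country row
# (start at row 2: row 1 has no preceding row to copy from)
for (i in seq_len(nrow(ctry_UNSD))[-1]){
if (ctry_UNSD$UNSDsubregion[i]==""){
ctry_UNSD$UNSDsubregion[i]<-ctry_UNSD$UNSDsubregion[i-1]
}}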
# Keep only country rows and replace country names as needed
ctry_UNSD<-ctry_UNSD %>%
filter(ISOalpha3!="") %>%
select(country, UNSDsubregion) %>%
mutate(
country = ifelse(country == "Bolivia (Plurinational State of)", "Bolivia", country) ,
country = ifelse(country == "Cabo Verde", "Cape Verde", country) ,
country = ifelse(country == "Democratic Republic of the Congo", "Congo Democratic Republic", country) ,
country = ifelse(country == "Côte d'Ivoire", "Cote d'Ivoire", country) ,
country = ifelse(country == "Kyrgyzstan", "Kyrgyz Republic", country) ,
country = ifelse(country == "Republic of Moldova", "Moldova", country) ,
country = ifelse(country == "United Republic of Tanzania", "Tanzania", country) ,
country = ifelse(country == "Viet Nam", "Vietnam", country)
)
suppressPackageStartupMessages(library(Hmisc))
label(ctry_UNSD$UNSDsubregion) <- "Sub-region, UNSD Methodology 49"
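The fill-down step above (copying the nearest preceding region name onto each country row) can also be written without a loop. Below is a minimal base-R sketch on a made-up toy table, not the real UNSD data. It assumes, as in the scraped table, that region rows have an empty ISO code and that the first row is a region header.

```r
# Toy table mimicking the UNSD layout (made-up subset): region rows have an
# empty ISO code, country rows have one.
toy <- data.frame(
  country   = c("Northern Africa", "Algeria", "Egypt", "Eastern Africa", "Kenya"),
  ISOalpha3 = c("", "DZA", "EGY", "", "KEN"),
  stringsAsFactors = FALSE
)

is_region <- toy$ISOalpha3 == ""
# cumsum() numbers each block of rows by the region header that opens it;
# indexing the header names by that block number carries them down.
block <- cumsum(is_region)
toy$UNSDsubregion <- toy$country[is_region][block]

# Keep only the country rows
toy[!is_region, c("country", "UNSDsubregion")]
```

Because every header row starts a new block, each country receives the closest header above it, which is the lowest-level classification when headers are nested.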
Assess the two lists/data.
dim(ctry_DHS)
dim(ctry_UNSD)
names(ctry_DHS)
names(ctry_UNSD)
obsDHS<-nrow(ctry_DHS)
obsUNSD<-nrow(ctry_UNSD)
There are 91 “DHS countries”, which need to be merged with the UNSD list of 248 countries.
ctry<-left_join(ctry_DHS, ctry_UNSD, by = "country")
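Because the join matches on exact country names, it can help to list the DHS names that have no counterpart in the UNSD list. Below is a minimal base-R sketch using made-up name vectors rather than the real lists.

```r
# Made-up example vectors (not the real lists)
dhs_names  <- c("Bolivia", "Kenya", "Vietnam")
unsd_names <- c("Bolivia (Plurinational State of)", "Kenya", "Viet Nam")

# Names present in the DHS list but absent from the UNSD list
unmatched <- setdiff(dhs_names, unsd_names)
unmatched
```

With the objects in this document, the equivalent call would be `setdiff(ctry_DHS$country, ctry_UNSD$country)`; any name it returns needs a recode before it can match.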
Make sure the merged data have 91 countries and no missing “UNSDsubregion”. In particular, check that ‘Cote d’Ivoire’, which often fails to merge on the country name, has a UNSD subregion value. If not, assign it.
nrow(ctry)
[1] 91
table(ctry$UNSDsubregion, exclude = NULL)#there should be no <NA>
Caribbean Central America Central Asia
3 5 5
Eastern Africa Eastern Europe Melanesia
13 2 1
Middle Africa Northern Africa Polynesia
9 4 1
South-eastern Asia South America Southern Africa
8 7 5
Southern Asia Southern Europe Western Africa
7 1 14
Western Asia <NA>
5 1
test<-filter(ctry, is.na(UNSDsubregion))
table(test$country)#there should be none! Why does this problem persist??
Cote d'Ivoire
1
# replace UNSDsubregion
ctry<- mutate(ctry,
UNSDsubregion=ifelse(country=="Cote d'Ivoire",
"Western Africa",
UNSDsubregion) )
table(ctry$UNSDsubregion, exclude = NULL)#there should be no <NA>
Caribbean Central America Central Asia
3 5 5
Eastern Africa Eastern Europe Melanesia
13 2 1
Middle Africa Northern Africa Polynesia
9 4 1
South-eastern Asia South America Southern Africa
8 7 5
Southern Asia Southern Europe Western Africa
7 1 15
Western Asia
5
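The source does not establish why the Côte d'Ivoire recode fails to match. One possible cause, stated here as an assumption and not verified against the UNSD page, is that the scraped name contains a typographic apostrophe (U+2019) rather than the ASCII one, so the string comparison in the recode never succeeds. Below is a sketch of how such a name could be normalized before matching; the helper `normalize_name` and the example string are hypothetical.

```r
# Hypothetical helper: map the typographic apostrophe and "o with circumflex"
# to their ASCII counterparts.
normalize_name <- function(x) {
  x <- gsub("\u2019", "'", x, fixed = TRUE)  # right single quotation mark -> '
  x <- gsub("\u00f4", "o", x, fixed = TRUE)  # o with circumflex -> o
  x
}

# Assumed form of the scraped name (typographic apostrophe)
scraped <- "C\u00f4te d\u2019Ivoire"

# Comparison against the name with an ASCII apostrophe fails ...
scraped == "C\u00f4te d'Ivoire"
# ... but succeeds after normalization
normalize_name(scraped) == "Cote d'Ivoire"
```

Running `utf8ToInt()` on the scraped country name is a quick way to check which apostrophe character it actually contains.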
Finally, a higher-level classification can be created depending on the study purpose. In this example, a three-category grouping is created.
ctry<-ctry %>%
mutate(
studyregion="",
studyregion=ifelse(UNSDsubregion=="Middle Africa" |
UNSDsubregion=="Western Africa",
"Central and Western Africa", studyregion),
studyregion=ifelse(UNSDsubregion=="Eastern Africa" |
UNSDsubregion=="Southern Africa",
"Southern and Eastern Africa", studyregion),
studyregion=ifelse(studyregion=="",
"Other Regions", studyregion)
)
addmargins(table(ctry$UNSDsubregion, ctry$studyregion))
Central and Western Africa Other Regions
Caribbean 0 3
Central America 0 5
Central Asia 0 5
Eastern Africa 0 0
Eastern Europe 0 2
Melanesia 0 1
Middle Africa 9 0
Northern Africa 0 4
Polynesia 0 1
South-eastern Asia 0 8
South America 0 7
Southern Africa 0 0
Southern Asia 0 7
Southern Europe 0 1
Western Africa 15 0
Western Asia 0 5
Sum 24 49
Southern and Eastern Africa Sum
Caribbean 0 3
Central America 0 5
Central Asia 0 5
Eastern Africa 13 13
Eastern Europe 0 2
Melanesia 0 1
Middle Africa 0 9
Northern Africa 0 4
Polynesia 0 1
South-eastern Asia 0 8
South America 0 7
Southern Africa 5 5
Southern Asia 0 7
Southern Europe 0 1
Western Africa 0 15
Western Asia 0 5
Sum 18 91
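The same grouping can be expressed as a lookup table, which may be easier to extend than chained `ifelse()` calls when more categories are needed. Below is a minimal base-R sketch; the `lookup` vector and the short `subregion` example vector are illustrative and not part of the original code.

```r
# Named vector: sub-region -> study region. Sub-regions not listed here
# fall back to "Other Regions".
lookup <- c(
  "Middle Africa"   = "Central and Western Africa",
  "Western Africa"  = "Central and Western Africa",
  "Eastern Africa"  = "Southern and Eastern Africa",
  "Southern Africa" = "Southern and Eastern Africa"
)

# Made-up example input
subregion <- c("Western Africa", "Southern Asia", "Eastern Africa", "Caribbean")

# Indexing by name returns NA for sub-regions missing from the lookup
studyregion <- unname(lookup[subregion])
studyregion[is.na(studyregion)] <- "Other Regions"
studyregion
```

Changing the grouping then only requires editing the `lookup` vector, while the assignment code stays the same.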
Acknowledgement: Trevor Croft at ICF International made helpful suggestions for the UNSD source and the web scraping code.