(Last updated on 2024-09-30)

Analysis flow

The objective of this file is to describe how various folders and files work to create data visualization outputs for the portfolio reviews. All outputs are fully reproducible, and automated as much as possible.

A. Folder structure

The main folder, Analysis_MCHN_PR, has 4 first-level subfolders. This markdown file, PCMD_portfolio_review_data_analysis_viz_note.Rmd (in blue circle above), is located in the main folder. The subfolders are:

1_MCHN_PR_guide

It contains an Excel file, MCHN Portfolio Review Indicators & Use Cases.xlsx, downloaded from the Google spreadsheet that Vanessa developed (transferred to and owned by Sue as of September 26, 2024) (https://docs.google.com/spreadsheets/d/1h4PvflAa4xFhkKU03PmBMDFGvHAWmgv0xI2z07PW9nQ/edit?gid=68868151#gid=68868151). The file is downloaded to a local machine, because accessing the Google sheet remotely can be error-prone under the security settings. Data visualization is guided by and aligned with the spreadsheet, specifically the “Draft 2-DEVELOP use cases” tab. In addition, two specific columns are used in this markdown to create the title and text of each “slide”.

2_MCHN_PR_data

  • 1_data_source: It has source data that cannot be accessed via an API (or similar) and/or data that are updated only annually (thus, it is easy to download and save them once a year).
    NOTE: For sources that are revised annually (e.g., IGME, WPP, MMEIG estimates), keep the latest version in the top directory and archive older versions. Annex shows version/date of access for each of them.
  • 2_code_to_create_data_for_viz: It has markdown files that (1) access source data (see Annex for more notes on data sources) and/or (2) create analysis data that are ready for visualization. All markdown files in this subfolder can be run by this markdown file (see below “C. Running prep markdown files”) - i.e., no need to run individual files separately.
  • 3_data_for_viz: It has analysis data files created by various markdown files in 2_code_to_create_data_for_viz. Only these data files are used to create output slides.
  • 4_references: Finally, this folder has select references about quality of care, which is use case 5. Since there are few or no publicly available or standardized data, we do not have as much data visualization as in use cases 1-4. Therefore, for use case 5, we include research study findings - which may or may not come from the country of review - and available tools to measure the metrics. Screenshots of various studies and tools are saved in this subfolder.

3_MCHN_PR_code
This is a folder where code files (R markdown files) are saved (in red circle above):

  • PCMD_portfolio_review_PPT_[number].Rmd: These generate slides, our main output. Each data use case has its own slide set and markdown file. The [number] represents the data use case number in the Google spreadsheet.
  • PCMD_portfolio_review_X_file_organization.Rmd: This moves the main output files from where they are generated (i.e., the same folder where the above markdown files are located) to where we want to save them (see below). It also renames output files with the country name. NOTE: It is strongly recommended to use one code for all countries and just define/change the country name in the code.
  • Finally, it also has a USAID PPT template to use if we produce PPT with the markdown files. NOTE: Unfortunately, further formatting is still needed, as R has limited formatting ability when producing PPT. We should edit a slide master rather than individual slides. The template is currently not used, since using it appears to require more changes in the slide master than not using it.

NOTE: You can adapt these markdown files to recreate the outputs and scale up to more countries. See Section B.

4_MCHN_PR_output

  • 1_figures: It is designed to have figure image files that are included in the slides. Of note, exporting image files takes substantially longer (see Section F, Step 10).
    NOTE: Nevertheless, this is useful if one wants to automatically pull images directly to other documents/outputs. Figure names are saved in MCHN Portfolio Review Indicators & Use Cases.xlsx, the Google spreadsheet.
  • 2_slides: Slidesets are saved here by country.

B. Adapting markdown file

You can download the main folder to your computer, and reproduce all files in:
* 2_MCHN_PR_data/3_data_for_viz/,
* 4_MCHN_PR_ouput/1_figures/,
* 4_MCHN_PR_ouput/2_slides/.
BUT, YOU MUST MAKE THE FOLLOWING EDITS in all markdown files - especially those in 3_MCHN_PR_code folder.

EDIT 1: REPLACE DIRECTORY FOR THE MAIN FOLDER PATHS

##### MAIN FOLDER IN YOUR COMPUTER #####
maindir<-c("~/Dropbox/0iSquared/iSquared_GHTAMS/GHTAMS_MCH/Analysis_MCHN_PR/")
# DO NOT EDIT ANYTHING HERE - AS LONG AS THE SUBFOLDER STRUCTURE IS SAME
sourcedatadir<-c(paste0(maindir,"2_MCHN_PR_data/1_data_source/"))
outdatadir<-c(paste0(maindir, "2_MCHN_PR_data/3_data_for_viz/"))
outfigdir<-c(paste0(maindir,"4_MCHN_PR_ouput/1_figures/"))
outslidedir<-c(paste0(maindir,"4_MCHN_PR_ouput/2_slides/"))
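
Before running anything else, it can help to confirm that the subfolders exist under the main folder. The following is a minimal sketch, not part of the original markdown files: `check_subfolders()` is a hypothetical helper, and the temporary folder only stands in for your real main folder. The subfolder names are copied from the chunk above.

```r
# Hypothetical helper: return the expected subfolders that are missing
# under a given main folder.
check_subfolders <- function(maindir) {
    expected <- c(
        "2_MCHN_PR_data/1_data_source",
        "2_MCHN_PR_data/3_data_for_viz",
        "4_MCHN_PR_ouput/1_figures",
        "4_MCHN_PR_ouput/2_slides"
    )
    # Expand "~" and drop any trailing slash before building full paths
    root <- sub("/+$", "", path.expand(maindir))
    expected[!dir.exists(file.path(root, expected))]
}

# Demonstration: a temporary folder with only one of the four subfolders
demo_maindir <- file.path(tempdir(), "Analysis_MCHN_PR_demo")
dir.create(file.path(demo_maindir, "2_MCHN_PR_data", "1_data_source"),
           recursive = TRUE, showWarnings = FALSE)
missing_dirs <- check_subfolders(demo_maindir)
print(missing_dirs)
```

An empty result means the folder structure matches what the code expects.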

EDIT 2: DEFINE THE COUNTRY

##### THE REVIEW COUNTRY ##### 
countryname<-c("Ghana")
# countryname<-c("Kenya")

##### NAME OF THE LEVEL-1 ADMINISTRATIVE UNIT IN LOWER CASE #####
admin1unit<-c("region") #Ghana
admin1unitpl<-c("regions") #Ghana
# admin1unit<-c("county") #Kenya
# admin1unitpl<-c("counties") #Kenya
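
As more countries are added, commenting and uncommenting lines can become error prone. One possible alternative, sketched below under the assumption that a small lookup table is acceptable, is to derive the administrative unit names from the country name. The table `admin1_lookup` is hypothetical; its Ghana and Kenya values are taken from the chunk above.

```r
# Hypothetical lookup table of level-1 administrative unit names by country
admin1_lookup <- data.frame(
    country      = c("Ghana", "Kenya"),
    admin1unit   = c("region", "county"),
    admin1unitpl = c("regions", "counties"),
    stringsAsFactors = FALSE
)

countryname <- "Ghana"

# Fail early if the country name is misspelled or not yet in the table
stopifnot(countryname %in% admin1_lookup$country)

admin1unit   <- admin1_lookup$admin1unit[admin1_lookup$country == countryname]
admin1unitpl <- admin1_lookup$admin1unitpl[admin1_lookup$country == countryname]
print(c(admin1unit, admin1unitpl))
```

With this approach, only `countryname` needs to change between runs.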

C. Running prep markdown files

The following code chunk runs each of the markdown files in 2_MCHN_PR_data/2_code_to_create_data_for_viz/ to create data that will be used for visualization. All markdown files are annotated and structured similarly (see corresponding HTML files). All resulting data files are saved in 2_MCHN_PR_data/3_data_for_viz/. NOTE: Markdown files will be run in alphabetical order, and certain files are named with a specific prefix to ensure they run first or last among all. Do not change markdown file names in the folder.

# NOTE: RUN THIS CHUNK PERIODICALLY TO REFRESH "3_data_for_viz" FOLDER

##### 1. Run "prep" markdown files #####
library(rmarkdown) # provides render()

rmd_files<-list.files(path = "./2_MCHN_PR_data/2_code_to_create_data_for_viz",
                      pattern="\\.Rmd$") # regular expression, not a glob
rmd_files

# Render/run all RMD files (seq_along() is safe even if no file is found)
for (i in seq_along(rmd_files)) {
    render(paste0("./2_MCHN_PR_data/2_code_to_create_data_for_viz/", rmd_files[i]))
}

##### 2. Check all CSV datasets and date/time of creation ##### 
data_files<-list.files(path = "./2_MCHN_PR_data/3_data_for_viz",
                       pattern="\\.csv$") # regular expression, not a glob
data_files

# Check ctime (creation time on Windows; last status change on Unix-alikes)
for (i in seq_along(data_files)) {
    print(file.info(paste0("./2_MCHN_PR_data/3_data_for_viz/", data_files[i]))$ctime)
}

This chunk is suppressed for now, but run it periodically to refresh the analysis data in 2_MCHN_PR_data/3_data_for_viz. Currently, it uses data that were last created/updated on 2024-09-06 20:06:36.
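
To judge whether a refresh is due, one option is to list each CSV file with its last-modified time and age in days. This is a sketch only: `csv_freshness()` is a hypothetical helper, and the temporary folder stands in for 3_data_for_viz.

```r
# Hypothetical helper: list CSV files in a folder with their last-modified
# time and age in days, so that stale files are easy to spot.
csv_freshness <- function(datadir) {
    files <- list.files(datadir, pattern = "\\.csv$", full.names = TRUE)
    mtime <- file.info(files)$mtime
    data.frame(
        file     = basename(files),
        modified = mtime,
        age_days = as.numeric(difftime(Sys.time(), mtime, units = "days")),
        stringsAsFactors = FALSE
    )
}

# Demonstration with a temporary folder and one freshly written file
demo_dir <- file.path(tempdir(), "data_for_viz_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1), file.path(demo_dir, "dta_demo.csv"),
          row.names = FALSE)
freshness <- csv_freshness(demo_dir)
print(freshness)
```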

(NOTE: working on duplicate code chunk name issues, which may or may not be worth fixing…)
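
If the issue is knitr's "duplicate label" error (an assumption, since the exact error is not shown here), knitr documents an option that allows repeated chunk labels. A minimal sketch, to be set before rendering:

```r
# Assumption: the problem is knitr's "duplicate label" error.
# This knitr option allows the same chunk label to appear more than once.
options(knitr.duplicate.label = "allow")
```

Renaming chunks so that labels are unique remains the cleaner fix if time allows.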

D. Define colors and functions for figures

Define USAID colors and more colors as well as commonly used functions for figures. Apply them for each figure, as needed.

Each markdown in 3_MCHN_PR_code folder has this section. No edit is needed.

USAID colors and their variations

#<color-palette name="USAID Colors" type="regular">
ucblue<-c("#002F6C")  #<!-- USAID Blue -->
ucmediumblue<-c("#0067B9")  #<!-- Medium Blue -->
uclightblue<-c("#A7C6ED")  #<!-- Light Blue -->
    
ucred<-c("#BA0C2F")  #<!-- USAID Red -->
ucdarkred<-c("#651D32")  #<!-- Dark Red -->
uclightred<-c("#9D6E7C") # Two tints down of ucdarkred
ucdarkgreen<-c("#1D6532") # GRB variation of USAID Dark Red
# ucdarkgreen<-c("#32651D") # BRG variation of USAID Dark Red
uclightgreen<-c("#8BB196") # Two tints down of ucdarkgreen https://www.htmlcsscolor.com/hex/1D6532

ucrichblack<-c("#212721")  #<!-- Rich Black -->
ucdarkgray<-c("#6C6463") #<!-- Dark Gray -->
ucmediumgray<-c("#8C8985")  #<!-- Medium Gray -->
uclightgray<-c("#CFCDC9")  #<!-- Light Gray -->

More colors from https://colorbrewer2.org

##### 1. Define color list #####
greycolors <- brewer.pal(7,"Greys")
bluecolors <- brewer.pal(7,"Blues")
greencolors <- brewer.pal(7,"Greens")
orangecolors <- brewer.pal(7,"Oranges")
redcolors <- brewer.pal(7,"Reds")
purplecolors <- brewer.pal(7,"Purples")
divcolors<-brewer.pal(9,"RdYlBu")
qualcolors<-brewer.pal(9,"Paired")

Basic options/functions for figures

##### 2. Define functions for plot #####
hline <- function(y = 0, color = uclightgray) {
    list(
        type = "line",
        x0 = 0,
        x1 = 1,
        xref = "paper",
        y0 = y,
        y1 = y,
        line = list(color = color)
    )
}

vline <- function(x = 0, color = uclightgray) {
    list(
        type = "line",
        x0 = x,
        x1 = x,
        y0 = 0,
        y1 = 1,
        yref = "paper",
        line = list(color = color)
    )
}
##### 3. Define options for plot #####
marginlist <- list(l = 10, r = 10, b = 100, t = 100, pad = 0) 
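
The helper functions above return plotly shape definitions. As a usage sketch (the demo data are made up, and drawing lines at the SDG targets is only one possible use), a list of shapes can be built and passed to the `shapes` argument of `layout()`:

```r
uclightgray <- "#CFCDC9"  # Light Gray, as defined in the color chunk

# Same helper as above, repeated so that this sketch is self-contained
hline <- function(y = 0, color = uclightgray) {
    list(type = "line", x0 = 0, x1 = 1, xref = "paper",
         y0 = y, y1 = y, line = list(color = color))
}

# Horizontal reference lines at the SDG targets (25 for U5MR, 12 for NMR)
sdg_lines <- list(hline(25), hline(12))

# Attach the lines to a figure only if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
    demo <- data.frame(year = 2020:2022, value = c(30, 28, 26))  # made-up data
    fig_demo <- plotly::plot_ly(demo, x = ~year, y = ~value,
                                type = "scatter", mode = "lines")
    fig_demo <- plotly::layout(fig_demo, shapes = sdg_lines)
}
```

Because `xref = "paper"`, each line spans the full plot width regardless of the x-axis range.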

E. Importing text for slides

In order to automate/standardize text to the extent possible, we are using specific columns in the Google spreadsheet, MCHN Portfolio Review Indicators & Use Cases.xlsx in 1_MCHN_PR_guide.

NOTE: If the Google sheet is public (i.e., “Anyone on the internet with the link can view”), it can be imported directly into R without going through a local computer. Currently, the access is limited, and, thus, we have to download the Google sheet to this local folder.

Each markdown in 3_MCHN_PR_code folder has this section. No edit is needed.

dtatext<-read_excel(paste0(maindir,
                           "1_MCHN_PR_guide/MCHN Portfolio Review Indicators & Use Cases.xlsx"),
                    sheet = "Use Case Template")
colnames(dtatext)<-tolower(colnames(dtatext))
dtatext<-dtatext%>%clean_names()

colnames(dtatext)

dtatext<-dtatext%>%
    # define columns that will be used for slides
    rename(
        title = data_search_questions,
        text = standard_text
    )%>%
    # select the country-specific context interpretation 
    mutate(    
        country = countryname, 
        context = "To be drafted",
        context = ifelse(country=="Ghana", ghana_context, context),
        context = ifelse(country=="Kenya", kenya_context, context)
    )%>%
    # replace the COUNTRYNAME placeholder with the country name, assigned above
    mutate(
        title = gsub("COUNTRYNAME", countryname, title),
        text  = gsub("COUNTRYNAME", countryname, text)
    )%>%
    select(title, text, context, contains(c("number", "image_name", "context")))
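
The placeholder replacement can be illustrated on a toy table. This is a sketch with made-up wording, not the actual spreadsheet content; `toy_text` is a hypothetical stand-in for dtatext.

```r
countryname <- "Ghana"

# Toy stand-in for dtatext; the wording is illustrative only
toy_text <- data.frame(
    title = "What is the gap to the SDG target in COUNTRYNAME?",
    text  = "COUNTRYNAME data are shown below.",
    stringsAsFactors = FALSE
)

# Replace the literal placeholder with the country name in both columns
toy_text$title <- gsub("COUNTRYNAME", countryname, toy_text$title, fixed = TRUE)
toy_text$text  <- gsub("COUNTRYNAME", countryname, toy_text$text, fixed = TRUE)
print(toy_text)
```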

F. Mechanics of how each slide/figure is created

Now, it is time to produce each slide! Again, markdown files in 3_MCHN_PR_code create slides.

For all figures, the basic idea is the same - presented step-by-step below, using an example from PCMD_portfolio_review_PPT_1.Rmd. Output text is shown in yellow boxes.

Step 1

Find the image_name in dtatext, which is the wrangled guide Excel file - again, MCHN Portfolio Review Indicators & Use Cases.xlsx, in the 1_MCHN_PR_guide folder. Keep only the row for the image, which includes the title, standard text, etc. The result is called temptext.
NOTE: The image name is case sensitive. So, to reduce errors, do not mix upper and lower case in the name. Currently, only lower case is used.

# IMAGE NAME IN THE SPREADSHEET 
imagename<-c("cm_nmr_u5mr_trend_projection") 

# FILTER 
temptext<-dtatext%>%filter(image_name==imagename)
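
Because a mistyped or mis-cased image name silently returns zero rows, a guard right after the filter can save debugging time later. The following base R sketch uses a toy table (the second image name and title are hypothetical):

```r
# Toy stand-in for dtatext; the second row is hypothetical
toy_dtatext <- data.frame(
    image_name = c("cm_nmr_u5mr_trend_projection", "cm_other_figure"),
    title = c("What is the gap to the SDG target for U5MR and NMR?",
              "Another question"),
    stringsAsFactors = FALSE
)

imagename <- "cm_nmr_u5mr_trend_projection"
temptext <- toy_dtatext[toy_dtatext$image_name == imagename, ]

# Stop with a clear message unless exactly one row matches
if (nrow(temptext) != 1) {
    stop("Expected exactly one row for image_name '", imagename,
         "' but found ", nrow(temptext))
}

# The comparison is case sensitive, so an upper-case name matches nothing
n_wrong_case <- sum(toy_dtatext$image_name == toupper(imagename))
print(n_wrong_case)
```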

Step 2

Print the title for the image, temptext$title

What is the gap to the SDG target for U5MR and NMR?

Step 3

Print any standard introduction text for the image, temptext$text. Not every slide has this text.

[STANDARD TEXT]
SDG aims to reduce U5MR to 25 and NMR to 12 per 1000 by 2030.

Step 4

Import the appropriate data for the image from 2_MCHN_PR_data/3_data_for_viz/ and filter only the country’s data. It is now called dta.

NOTE: dta is valid/used throughout a markdown file - as long as the file uses only one source dataset. Compare it with dtafig below.

Also see Annex for detailed information on source data and data management for each of them.

# IMPORT THE SOURCE DATA    
dta<-read.csv(paste0(outdatadir, 'dta_igme_mort_all_cause.csv'), header = TRUE)

# FILTER 
dta<-dta%>%
    filter(country %in% countryname)

# DEFINE ANY VALUES THAT MAY BE USED FOR PROGRAMMING       
igme_lastest_year<-dta%>%filter(is.na(u5mr)==FALSE)%>%select(year)%>%max()
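
The last line above finds the most recent year with a non-missing U5MR estimate. A base R sketch of the same idea, using made-up numbers, may make the logic easier to follow:

```r
# Made-up data: the 2023 estimate is missing
toy_dta <- data.frame(year = 2020:2023, u5mr = c(45, 44, 42, NA))

# Keep years with a non-missing estimate, then take the latest one
latest_year <- max(toy_dta$year[!is.na(toy_dta$u5mr)])
print(latest_year)
```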

Step 5

As needed, further manage dta to create a figure. In this example, we add SDG targets for child mortality. It is now called dtafig. (Note that for some figures it is simply the same as dta.)

NOTE: Unlike dta, dtafig is specific to a particular figure. It comes from dta but processed/prepared specifically (often differently) for each figure and used in Steps 6 and 9.

dtafig<-dta

dtafig<-dtafig%>%
    group_by(country)%>%
    group_modify(~ add_row(.x, .before=0))%>%
    ungroup()%>%
    mutate(
        # SDG target by 2030
        year=ifelse(is.na(year)==TRUE, 2030, year), 
        target_u5mr=NA,
        target_u5mr=ifelse(year==2030, 25, target_u5mr), 
        target_nmr=NA,
        target_nmr=ifelse(year==2030, 12, target_nmr)
    )

Step 6

Define values that will be used in automated interpretation text.

u5mr_latest<-round(dtafig%>%
                     filter(year==igme_lastest_year)%>%
                     select(u5mr), 0)

nmr_latest<-round(dtafig%>%
                    filter(year==igme_lastest_year)%>%
                    select(nmr), 0)

u5mr_2030<-round(dtafig%>%
                     filter(year==2030)%>%
                     filter(is.na(u5mr_projection)==FALSE)%>%
                     select(u5mr_projection), 0)

nmr_2030<-round(dtafig%>%
                    filter(year==2030)%>%
                    filter(is.na(nmr_projection)==FALSE)%>%
                    select(nmr_projection), 0)

Step 7

Print the automated text.

[AUTOMATED TEXT]
In 2022, U5MR was 42 and NMR was 21 per 1000 live births. In 2030, it is projected that U5MR will be 31 and NMR will be 17 per 1000 live births
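
The sentence above can be assembled with `paste0()`. The sketch below hard-codes the values from the example output as plain numbers; in the actual markdown they come from Step 6, and the exact code used there may differ.

```r
# Values copied from the example output above, as plain numbers
latest_year <- 2022
u5mr_latest <- 42
nmr_latest  <- 21
u5mr_2030   <- 31
nmr_2030    <- 17

autotext <- paste0(
    "In ", latest_year, ", U5MR was ", u5mr_latest,
    " and NMR was ", nmr_latest, " per 1000 live births. ",
    "In 2030, it is projected that U5MR will be ", u5mr_2030,
    " and NMR will be ", nmr_2030, " per 1000 live births"
)
print(autotext)
```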

Step 8

Add potential text to discuss the country specific findings. Note that this must be reviewed and, as needed, manually edited once the PPT is created in each country. The current text is based on a sample country, which may or may not be relevant for other countries for the Portfolio Review.

[REVIEW/EDIT COUNTRY SPECIFIC FINDINGS]
Ghana is off-track to meet both the U5MR and NMR SDG targets by 2030

Step 9

Write the code for the figure.

fig <- dtafig%>%
    ##### U5MR ####
    plot_ly(
        x = ~year, 
        type="scatter", mode = 'lines+markers', 
        y = ~u5mr_projection, name="U5MR, projection", 
        marker= list(color = uclightblue, size=1), 
        line= list(color = uclightblue))%>%
    add_lines(
        y = ~u5mr, name="U5MR",  
        marker= list(color = ucmediumblue, size=5), 
        line= list(color = ucmediumblue))%>%
    add_trace(
        y = ~target_u5mr, name="U5MR SDG target by 2030",  
        marker= list(color = ucmediumblue, size=8), 
        line= list(color = ucmediumblue))%>%
        
    ##### NMR ####    
    add_lines(
        y = ~nmr_projection, name="NMR, projection", 
        marker= list(color = uclightgreen, size=1), 
        line= list(color = uclightgreen))%>%
    add_lines(
        y = ~nmr, name="NMR", 
        marker= list(color = ucdarkgreen, size=5), 
        line= list(color = ucdarkgreen))%>%
    add_trace(
        y = ~target_nmr, name="NMR SDG target by 2030",  
        marker= list(color = ucdarkgreen, size=8), 
        line= list(color = ucdarkgreen))%>%        

    ##### Footnote ####        
    add_annotations(
        x= 0, y= -0.1, xref = "paper", yref = "paper", xanchor = 'left',
        text = "(Source: IGME)",
        font = list(size = 10),
        showarrow = F)%>%
    
    ##### layout ####        
    layout(
        margin = marginlist, #DEFINED ABOVE OPTIONS
        title = paste0("Trend of U5MR and NMR"), 
        font = list(size = 10), 
        legend = list(x = 0.6, y = 0.95),
        yaxis = list(title = "Per 1000 live births",   
                     rangemode = 'tozero', 
                     showgrid = FALSE, 
                     showticklabels = TRUE),
        xaxis = list(title = " ", 
                     showgrid = FALSE  
        )
    )

Step 10

Print the figure and, as needed, save it as a png file.

Exporting png files takes substantially longer. Currently, no images are exported. But if needed, simply replace all “# export(fig,” with “export(fig,” and run the code.

fig
# export(fig, file=paste0(outfigdir, countryname, "/", imagename, "_", countryname, ".png"))
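
The export path above includes a country subfolder, which must exist before a file can be written into it. A minimal sketch for creating it when missing, assuming a temporary folder as a stand-in for outfigdir:

```r
# Stand-in for outfigdir; replace with the real figure folder
outfigdir   <- file.path(tempdir(), "1_figures_demo")
countryname <- "Ghana"
imagename   <- "cm_nmr_u5mr_trend_projection"

# Create the country subfolder if it does not exist yet
country_figdir <- file.path(outfigdir, countryname)
if (!dir.exists(country_figdir)) {
    dir.create(country_figdir, recursive = TRUE)
}

# Full path of the png file, following the naming used above
png_path <- file.path(country_figdir,
                      paste0(imagename, "_", countryname, ".png"))
print(basename(png_path))
```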

G. Summary steps to reproduce slides

To reproduce outputs

  1. Download the main folder to your computer.
  2. In all markdown files in 3_MCHN_PR_code folder:
     • Edit the main directory. Look for a code chunk “edit_directory”.
     • Edit/select the country name - either Ghana or Kenya. Look for a code chunk “edit_country”.
  3. Knit/run each of PCMD_portfolio_review_PPT_[number].Rmd. Tip: change the output style to HTML, which is much faster to run.
  4. Check 3_MCHN_PR_code. You should see five PPT files.
  5. Knit/run PCMD_portfolio_review_X_file_organization.Rmd. Make sure the country name is the same as the one in PCMD_portfolio_review_PPT_[number].Rmd.
  6. Check the folder, 4_MCHN_PR_ouput/2_slides. You should see five PPT files - with the country name and date. There should be no more PPT in 3_MCHN_PR_code.
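
The move-and-rename step can be pictured with the following sketch. It is not the content of PCMD_portfolio_review_X_file_organization.Rmd: the temporary folders and the exact naming pattern (file name, country, date) are assumptions for illustration.

```r
# Temporary stand-ins for 3_MCHN_PR_code and the slides folder
code_dir  <- file.path(tempdir(), "code_demo")
slide_dir <- file.path(tempdir(), "slides_demo")
dir.create(code_dir, showWarnings = FALSE)
dir.create(slide_dir, showWarnings = FALSE)
file.create(file.path(code_dir, "PCMD_portfolio_review_PPT_1.pptx"))

countryname <- "Ghana"

# Find the PPT files and build new names with the country name and date
ppt_files <- list.files(code_dir, pattern = "\\.pptx$")
new_names <- paste0(tools::file_path_sans_ext(ppt_files),
                    "_", countryname, "_", Sys.Date(), ".pptx")

# Move and rename in one step; returns TRUE for each file that moved
moved <- file.rename(file.path(code_dir, ppt_files),
                     file.path(slide_dir, new_names))
print(list.files(slide_dir))
```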

To revise the dataset used for data viz

Annex provides detailed information on source data and data management for each of them. But, to create certain additional figures, you may need to edit the dataset itself (i.e., dta in Section F, Step 4, or csv files in 2_MCHN_PR_data/3_data_for_viz folder).

  1. Go to 2_MCHN_PR_data/2_code_to_create_data_for_viz and open the corresponding markdown. For example, if you have to revise WHO HIDR data, open PCMD_prep_WHO_HIDR.Rmd.
  2. Look for a code chunk “edit_directory” and edit the main directory.
  3. Revise the markdown as needed and run.

Tips for troubleshooting or revision

The following explains potential sources of error when producing the code for additional countries.

  • The section most likely to produce errors is the one creating automated text (see Step 6 in Section F). The general idea is to filter dtafig down to one row/observation and one column (i.e., one cell). But it may cause an error if there are multiple observations (for example, if two regions have an identical value that is the highest or lowest among all subnational estimates). Of note, the error will appear in Step 7, but Step 6 needs to be revised.
  • It is also possible that specific data do not exist for a country - e.g., U5MR sub-national estimates for Mozambique. Unless there are alternative sources, suppress code chunks for the particular slides.
  • Also, in heatmaps, the same region may appear multiple times under different names, since we currently use WHO HIDR. This requires country-by-country solutions. See Section A.4.1.
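
For the tie problem in the first bullet, one possible revision of Step 6 is to keep a single value but join the names of all tied regions. This is a sketch with made-up regions and values, not the code used in the markdown files:

```r
# Made-up data: two regions share the highest value
toy_fig <- data.frame(
    region = c("Region A", "Region B", "Region C"),
    u5mr   = c(60, 60, 45),
    stringsAsFactors = FALSE
)

# which() drops NA comparisons, so rows with missing values are ignored
top <- toy_fig[which(toy_fig$u5mr == max(toy_fig$u5mr, na.rm = TRUE)), ]

highest_value <- top$u5mr[1]                         # one value, even with ties
highest_names <- paste(top$region, collapse = " and ")  # all tied names
print(paste0(highest_names, ": ", highest_value))
```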

Annex

A.1. Data sources

Data come from various sources. They are organized and saved in 7 subfolders in 1_data_source. Of note, DHS API data (See A.1.6) are called/imported directly via API, and, thus, do not have a separate subfolder. This section briefly describes all data sources that are used or considered for the review.

Importantly, there is Data_source_note.xlsx. It summarizes key information in data management, for example:

  • coverage indicator names used in analysis/visualization that are harmonized across different sources, which allows switching sources if desired, and
  • availability of sub-national data.

A.1.1. Child mortality estimates data files downloaded from UN IGME site:

https://childmortality.org/all-cause-mortality/data
https://childmortality.org/causes-of-death/data

  • UN_IGME_2023.csv

  • UNICEF-CME_CAUSE_OF_DEATH.csv

  • CME-Info_codebook_for_downloads.xlsx

  • UN-IGME-2023-Subnational-U5MR-and-NMR-estimates.xlsx

  • UNIGME-2023-Scenario-Based-Projections-2023-2030.xlsx

  • The current version is the latest one available as of (and was downloaded) on 2024-07-11.

A.1.2. Maternal mortality estimates files downloaded from UN MMEIG site:

https://www.who.int/publications/i/item/9789240068759

  • estimates_arr.csv

  • estimates.csv

  • The current version is the latest one available as of (and was downloaded) on 2024-07-11.

A.1.3. UNICEF

https://data.unicef.org/resources/data_explorer/unicef_f/

NOTE: For our purposes, this dataset does not add much value, since it is only at the national level, and DHS and WHO HIDR cover most (if not all) of the UNICEF data. We do not use this for visualization.

A.1.4. WHO Health Inequality Data Repository (HIDR)

https://www.who.int/data/inequality-monitor/data/hidr-api

WHO HIDR has a number of “datasets” (see hidr_datasetid.csv), and only select datasets were accessed via API (see “PCMD_prep_WHO_HIDR.Rmd” in 2_MCHN_PR_data/2_code_to_create_data_for_viz). Then, each dataset was saved in its own subfolder.

  • The current version is the latest one available as of (and was downloaded) on 2024-09-02.

A.1.5. Population and births estimates and U5MR projection file downloaded from World Population Prospects by UN Pop Division

https://population.un.org/wpp/Download/Standard/MostUsed/

  • WPP2024_GEN_F01_DEMOGRAPHIC_INDICATORS_COMPACT.xlsx

  • The current version is the latest one available as of (and was downloaded) on 2024-09-02.

A.1.6. Service coverage and select quality of care data from DHS API

https://api.dhsprogram.com/
https://api.dhsprogram.com/rest/dhs/indicators?f=html

All DHS data are accessed via API. There is no “raw” source data saved in the 1_data_source folder.

  • Last accessed on 2024-09-18.

A.1.7. Country code from ISO

Using standard country code makes it much easier to integrate/merge data across sources.

A.1.8. Shapefiles

Shapefiles from:

  • GADM: subnational boundaries (level 1 and, if available, level 2) for the MCH priority countries [accessed on August 9, 2024].
  • ArcGIS: national boundaries for the world [accessed on August 7, 2024].
  • DHS: national boundaries and subnational boundaries that correspond with estimates calculated in select surveys in the MCH priority countries. Estimates of select common indicators are also included in the dataset [accessed on August 9, 2024].

NOTE: Subnational units and names are not necessarily harmonized across various data sources (shapefiles as well as MCH data) - see Annex A.4 for examples. The current output does not include any maps, as it would require a substantial level of effort (LOE) to create a standard flow/code.

A.2. Description of markdown files to produce CSV datasets for visualization:

In the subfolder 2_code_to_create_data_for_viz, there are 13 Rmd files. All markdown files are annotated and structured similarly (see corresponding HTML files). All resulting data files are saved in 2_MCHN_PR_data/3_data_for_viz/ and used for visualization/PPT output. The following is the list:
1. PCMD_prep_A_UNSD_region.Rmd
2. PCMD_prep_DHS_coverage.Rmd
3. PCMD_prep_DHS_denominator_region.Rmd
4. PCMD_prep_DHS_mortality_region.Rmd
5. PCMD_prep_IGME_cause.Rmd
6. PCMD_prep_IGME_region.Rmd
7. PCMD_prep_IGME_wealth.Rmd
8. PCMD_prep_IGME.Rmd
9. PCMD_prep_UNICEF_coverage.Rmd
10. PCMD_prep_WHO_HIDR.Rmd
11. PCMD_prep_WPP.Rmd
12. PCMD_prep_Z_shapefiles.Rmd
13. subnational_data_investigation.Rmd: This is an important file that takes care of different subnational unit names between DHS and IGME, which is required to create the number of deaths at the subnational level, since IGME data do not include the number by region. This must be run after DHS and IGME files are ready. See details here: https://rpubs.com/YJ_Choi/PCMD_PR_subnational_data_investigation. Also see Annex A.4.

A.3. Description of CSV datasets for visualization:

In the subfolder 3_data_for_viz, there are 17 CSV files. The following is the list of csv files and the corresponding Rmd file (in 2_code_to_create_data_for_viz).

Of note,
* Some Rmd files create multiple CSV files.
* Wide format indicates that individual indicators are columns/variables. A majority of files are in a wide format.
* Long format indicates that there is a column/variable called “indicator”.
* Some files have only MCH priority countries because of the file/data size - thus, either 24 or 25 countries, depending on the source.
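
The difference between the two formats can be seen in a small base R sketch. The numbers are made up and the column name `value` is an assumption; only the “indicator” column is named in the notes above.

```r
# Wide format: each indicator is its own column (made-up numbers)
wide <- data.frame(
    country = c("Ghana", "Kenya"),
    year    = c(2022, 2022),
    u5mr    = c(42, 41),
    nmr     = c(21, 20),
    stringsAsFactors = FALSE
)

# Long format: one row per country, year and indicator
long <- rbind(
    data.frame(wide[c("country", "year")], indicator = "u5mr",
               value = wide$u5mr, stringsAsFactors = FALSE),
    data.frame(wide[c("country", "year")], indicator = "nmr",
               value = wide$nmr, stringsAsFactors = FALSE)
)
print(long)
```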

  1. dta_dhs_coverage_region.csv created by PCMD_prep_DHS_coverage.Rmd
    – 2446 rows/observations from 24 countries
    – Wide-format
    – Unique ID: combination of country, year and grouplabel

  2. dta_dhs_coverage_wealth.csv created by PCMD_prep_DHS_coverage.Rmd
    – 162 rows/observations from 24 countries
    – Wide-format
    – Unique ID: combination of country and year

  3. dta_dhs_coverage.csv created by PCMD_prep_DHS_coverage.Rmd
    – 162 rows/observations from 24 countries
    – Wide-format
    – Unique ID: combination of country and year

  4. dta_dhs_denominator_by_region.csv created by PCMD_prep_DHS_denominator_region.Rmd
    – 4599 rows/observations from 84 countries
    – Wide-format
    – Unique ID: combination of country, year, type and grouplabel

  5. dta_dhs_mortality_by_region.csv created by PCMD_prep_DHS_mortality_region.Rmd
    – 4320 rows/observations from 87 countries
    – Wide-format
    – Unique ID: combination of country, year and grouplabel

  6. dta_igme_mort_all_cause_by_wealth.csv created by PCMD_prep_IGME_wealth.Rmd
    – 7755 rows/observations from 235 countries
    – Wide-format
    – Unique ID: combination of country and year

  7. dta_igme_mort_all_cause_region.csv created by two markdown files: PCMD_prep_IGME_region.Rmd and subnational_data_investigation.Rmd
    – 47520 rows/observations from 31 countries
    – Wide-format
    – Unique ID: combination of country, year, admin_level and area_name

  8. dta_igme_mort_all_cause.csv created by PCMD_prep_IGME.Rmd
    – 8135 rows/observations from 199 countries
    – Wide-format
    – Unique ID: combination of country and year

  9. dta_igme_mort_by_cause.csv created by PCMD_prep_IGME_cause.Rmd
    – 69168 rows/observations from 25 countries
    – Long-format
    – Unique ID: combination of country, year, age_group, indicator and cause

  10. dta_unicef_coverage.csv created by PCMD_prep_UNICEF_coverage.Rmd
    – 26293 rows/observations from 196 countries
    – Long-format
    – Unique ID: combination of country, year and indicator

  11. dta_unsd_region.csv created by PCMD_prep_A_UNSD_region.Rmd
    – 248 rows/observations from 248 countries
    – Wide-format
    – Unique ID: country

  12. dta_who_hidr_dtst_dptdrop_quintile.csv created by PCMD_prep_WHO_HIDR.Rmd
    – 797 rows/observations from 25 countries
    – Wide-format
    – Unique ID: combination of country, year and grouplabel

  13. dta_who_hidr_long_detail.csv created by PCMD_prep_WHO_HIDR.Rmd
    – 135686 rows/observations from 25 countries
    – Long-format
    – Unique ID: combination of dataset_id, country, year, indicator, grouplabel and source

  14. dta_who_hidr_region.csv created by PCMD_prep_WHO_HIDR.Rmd
    – 14890 rows/observations from 25 countries
    – Wide-format
    – Unique ID: combination of country, year and grouplabel

  15. dta_who_hidr_residence.csv created by PCMD_prep_WHO_HIDR.Rmd
    – 1092 rows/observations from 25 countries
    – Wide-format
    – Unique ID: combination of country, year and grouplabel

  16. dta_who_hidr_wealth.csv created by PCMD_prep_WHO_HIDR.Rmd
    – 1672 rows/observations from 25 countries
    – Wide-format
    – Unique ID: combination of country, year and grouplabel

  17. dta_world_population_prospects.csv created by PCMD_prep_WPP.Rmd
    – 9717 rows/observations from 237 countries
    – Wide-format
    – Unique ID: combination of country and year
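
The unique ID listed for each file can be verified after a dataset is revised. A minimal sketch with a hypothetical helper, `is_unique_id()`, and toy data:

```r
# Hypothetical helper: TRUE if the given columns uniquely identify each row
is_unique_id <- function(data, id_cols) {
    anyDuplicated(data[, id_cols, drop = FALSE]) == 0
}

# Toy data: country and year together are unique, country alone is not
toy <- data.frame(
    country = c("Ghana", "Ghana", "Kenya"),
    year    = c(2021, 2022, 2022),
    stringsAsFactors = FALSE
)

unique_by_country_year <- is_unique_id(toy, c("country", "year"))
unique_by_country      <- is_unique_id(toy, "country")
print(c(unique_by_country_year, unique_by_country))
```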

A.4. How to work with un-harmonized subnational unit names

There is no standardized code for subnational units - like ISO codes for countries or FIPS codes for US counties.

A.4.1. Within a source: WHO HIDR

WHO HIDR is comprised of a number of “datasets” (see hidr_datasetid.csv in 2_MCHN_PR_data/1_data_source/WHO_HIDR), but the datasets are not internally harmonized. For example, one region in Ghana is called Eastern, East, or Eastern region - depending on the indicators/datasets. This is a problem when assessing sub-national variation across indicators.

Unfortunately, the solution is to harmonize the region names on a country-by-country basis. See Section 1.3.A of PCMD_prep_WHO_HIDR.Rmd/html in 2_MCHN_PR_data/2_code_to_create_data_for_viz folder.
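
One common way to harmonize names is a lookup vector that maps each variant to a single standard name. The sketch below uses the Eastern example from above; it is an illustration, not the code in PCMD_prep_WHO_HIDR.Rmd, and `region_map` is a hypothetical name.

```r
# Map each known variant to the standard name
region_map <- c("East" = "Eastern", "Eastern region" = "Eastern")

raw_names <- c("Eastern", "East", "Eastern region", "Volta")

# Replace known variants; leave all other names unchanged
harmonized <- ifelse(raw_names %in% names(region_map),
                     region_map[raw_names],
                     raw_names)
harmonized <- unname(harmonized)
print(harmonized)
```

A separate map is needed for each country, since the variants differ by country.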

A.4.2. Across sources: IGME and DHS

For certain metrics, we have to use data from different sources - e.g., the number of deaths at the subnational level. This requires checking and ensuring that the same name is used for each region across sources.

Again, unfortunately, the solution is to harmonize the subnational unit names on a country-by-country basis, which is done in subnational_data_investigation.Rmd (in 2_MCHN_PR_data/2_code_to_create_data_for_viz folder). See details here: https://rpubs.com/YJ_Choi/PCMD_PR_subnational_data_investigation.
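
A quick way to find names that need harmonizing is to compare the name lists from the two sources. The sketch below uses made-up spelling differences; it is not taken from subnational_data_investigation.Rmd.

```r
# Made-up region names from two sources for one country
igme_regions <- c("Eastern", "Greater Accra", "Volta")
dhs_regions  <- c("Eastern", "Greater Accra Region", "Volta")

# Names that appear in one source but not in the other
only_igme <- setdiff(igme_regions, dhs_regions)
only_dhs  <- setdiff(dhs_regions, igme_regions)
print(list(only_igme = only_igme, only_dhs = only_dhs))
```

Both results being empty suggests that the two sources can be merged by region name without further edits.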