(Last updated on 2024-09-30)
The objective of this file is to describe how the various folders and files work together to create data visualization outputs for the portfolio reviews. All outputs are fully reproducible and automated as much as possible.
The main folder, `Analysis_MCHN_PR`, has four first-level subfolders. This markdown file, `PCMD_portfolio_review_data_analysis_viz_note.Rmd` (in the blue circle above), is located in the main folder. The subfolders are:
* `1_MCHN_PR_guide`: It contains an Excel file, `MCHN Portfolio Review Indicators & Use Cases.xlsx`, which is a local copy of the Google spreadsheet that Vanessa developed (ownership was transferred to Sue as of September 26, 2024) (https://docs.google.com/spreadsheets/d/1h4PvflAa4xFhkKU03PmBMDFGvHAWmgv0xI2z07PW9nQ/edit?gid=68868151#gid=68868151). The spreadsheet is downloaded to a local machine because accessing a Google Sheet remotely can be error-prone under its security settings. Data visualization is guided by and aligned with the spreadsheet, specifically the “Draft 2-DEVELOP use cases” tab. In addition, two specific columns (`data_search_questions` and `standard_text`, see below) are used in this markdown to create the title and text of each “slide”.
* `2_MCHN_PR_data`
    * `1_data_source`: It contains source data that cannot be accessed via an API (or similar) and/or data that are updated only annually (thus, it is easy to download and save them once a year).
    * `2_code_to_create_data_for_viz`: It contains markdown files that (1) access source data (see the Annex for more notes on data sources) and/or (2) create analysis data that are ready for visualization. All markdown files in this subfolder can be run by this markdown file (see “C. Running prep markdown files” below), i.e., there is no need to run individual files separately.
    * `3_data_for_viz`: It contains analysis data files created by the various markdown files in `2_code_to_create_data_for_viz`. Only these data files are used to create output slides.
    * `4_references`: Finally, this folder contains select references about quality of care, which is use case 5. Since there are few or no publicly available or standardized data, use case 5 does not have as much data visualization as use cases 1-4. Therefore, for use case 5, we include research study findings, which may or may not come from the review country, and available tools to measure the metrics. Screenshots of various studies and tools are saved in this subfolder.
* `3_MCHN_PR_code`: This is the folder where code files (R markdown files) are saved (in the red circle above).

NOTE: You can (and will) adapt these markdown files to recreate the outputs and scale up to more countries. See Section B.
* `4_MCHN_PR_output`
    * `1_figures`: It is designed to hold the figure image files that are included in the slides. Of note, exporting image files takes substantially longer (see Section F, Step 10).
    * `2_slides`: Slide sets are saved here by country.

You can download the main folder to your computer and reproduce all files in:

* `2_MCHN_PR_data/3_data_for_viz/`
* `4_MCHN_PR_ouput/1_figures/`
* `4_MCHN_PR_ouput/2_slides/`
BUT, YOU MUST MAKE THE FOLLOWING EDITS in all markdown files, especially those in the `3_MCHN_PR_code` folder.
EDIT 1: REPLACE DIRECTORY FOR THE MAIN FOLDER PATHS
```r
##### MAIN FOLDER ON YOUR COMPUTER #####
maindir <- "~/Dropbox/0iSquared/iSquared_GHTAMS/GHTAMS_MCH/Analysis_MCHN_PR/"

# DO NOT EDIT ANYTHING HERE - AS LONG AS THE SUBFOLDER STRUCTURE IS THE SAME
sourcedatadir <- paste0(maindir, "2_MCHN_PR_data/1_data_source/")
outdatadir <- paste0(maindir, "2_MCHN_PR_data/3_data_for_viz/")
outfigdir <- paste0(maindir, "4_MCHN_PR_ouput/1_figures/")
outslidedir <- paste0(maindir, "4_MCHN_PR_ouput/2_slides/")
```
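Before running anything, it can help to confirm that `maindir` points at a folder with the expected layout. The sketch below is a hypothetical helper (`check_folders()` is not part of the original workflow); the subfolder names are taken from the paths above, including the `4_MCHN_PR_ouput` spelling used in the code.

```r
# Hypothetical helper (not part of the original workflow): return the expected
# subfolders that do NOT exist under the main folder (character(0) if all exist).
check_folders <- function(maindir) {
  expected <- c(
    "1_MCHN_PR_guide",
    "2_MCHN_PR_data/1_data_source",
    "2_MCHN_PR_data/2_code_to_create_data_for_viz",
    "2_MCHN_PR_data/3_data_for_viz",
    "3_MCHN_PR_code",
    "4_MCHN_PR_ouput/1_figures",
    "4_MCHN_PR_ouput/2_slides"
  )
  expected[!dir.exists(file.path(maindir, expected))]
}

# Usage: an empty result means the folder structure matches
# check_folders(maindir)
```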
EDIT 2: DEFINE THE COUNTRY
```r
##### THE REVIEW COUNTRY #####
countryname <- "Ghana"
# countryname <- "Kenya"

##### NAME OF THE LEVEL-1 ADMINISTRATIVE UNIT IN LOWER CASE #####
admin1unit <- "region"       # Ghana
admin1unitpl <- "regions"    # Ghana
# admin1unit <- "county"     # Kenya
# admin1unitpl <- "counties" # Kenya
```
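Commenting lines in and out works for two countries. As more countries are added, one possible alternative (a sketch, not the original approach) is a lookup list keyed by country name. `country_settings` is a hypothetical name; its values are the Ghana and Kenya settings shown above.

```r
# Hypothetical lookup of per-country settings (values from the EDIT 2 chunk above)
country_settings <- list(
  Ghana = list(admin1unit = "region", admin1unitpl = "regions"),
  Kenya = list(admin1unit = "county", admin1unitpl = "counties")
)

countryname <- "Kenya"

# Fail early if the review country has no settings yet
if (!countryname %in% names(country_settings)) {
  stop("No settings defined for ", countryname)
}
admin1unit <- country_settings[[countryname]]$admin1unit
admin1unitpl <- country_settings[[countryname]]$admin1unitpl
```

Adding a country then means adding one entry to the list rather than a new pair of commented lines.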
The following code chunk runs each markdown file in `2_MCHN_PR_data/2_code_to_create_data_for_viz/` to create the data used for visualization. All markdown files are annotated and structured similarly (see the corresponding HTML files). All resulting data files are saved in `2_MCHN_PR_data/3_data_for_viz/`. NOTE: Markdown files are run in alphabetical order, and certain files are named with a specific prefix (e.g., the `A_` and `Z_` in `PCMD_prep_A_UNSD_region.Rmd` and `PCMD_prep_Z_shapefiles.Rmd`) to ensure they run first or last. Do not change the markdown file names in the folder.
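Because run order depends only on file names, a quick check can confirm that `subnational_data_investigation.Rmd`, which must run after the DHS and IGME files, really does sort after them. This is a hypothetical guard, not part of the original chunk; it sorts alphabetically, as `list.files()` does, and the file names in the usage line are a subset of the prep files listed later in this note.

```r
# Hypothetical guard: TRUE if every DHS/IGME prep file sorts before
# subnational_data_investigation.Rmd (which depends on their outputs).
runs_after_dependencies <- function(rmd_files) {
  rmd_files <- sort(rmd_files)  # alphabetical, like the names list.files() returns
  pos_sub <- match("subnational_data_investigation.Rmd", rmd_files)
  pos_dep <- grep("DHS|IGME", rmd_files)
  !is.na(pos_sub) && all(pos_dep < pos_sub)
}

rmd_files <- c("PCMD_prep_IGME_region.Rmd", "subnational_data_investigation.Rmd",
               "PCMD_prep_A_UNSD_region.Rmd", "PCMD_prep_DHS_mortality_region.Rmd")
runs_after_dependencies(rmd_files)
```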
```r
# NOTE: RUN THIS CHUNK PERIODICALLY TO REFRESH THE "3_data_for_viz" FOLDER
library(rmarkdown)

##### 1. Run "prep" markdown files #####
prepdir <- "./2_MCHN_PR_data/2_code_to_create_data_for_viz"
# "\\.Rmd$" is a regular expression matching names that end in .Rmd
rmd_files <- list.files(path = prepdir, pattern = "\\.Rmd$")
rmd_files

# Render/run all Rmd files, in the (alphabetical) order returned above
for (f in rmd_files) {
  render(file.path(prepdir, f))
}

##### 2. Check all CSV datasets and their last modification date/time #####
vizdir <- "./2_MCHN_PR_data/3_data_for_viz"
data_files <- list.files(path = vizdir, pattern = "\\.csv$")
data_files

# mtime is the last modification time (ctime is not creation time on Unix-like systems)
file.info(file.path(vizdir, data_files))[, "mtime", drop = FALSE]
```
This chunk is suppressed (not evaluated) for now, but run it periodically to refresh the analysis data in `2_MCHN_PR_data/3_data_for_viz`. Currently, the outputs use data that were last created/updated on 2024-09-06 20:06:36.
(NOTE: We are working on duplicate code chunk name issues, which may or may not be worth fixing.)
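To decide when the suppressed chunk is worth re-running, one option is to flag analysis files that have not been modified recently. A minimal sketch, assuming a 30-day threshold (an arbitrary, hypothetical choice) and using the file modification time (`mtime`); `stale_csvs()` is not part of the original files.

```r
# Hypothetical helper: names of CSV files in `dir` last modified more than
# `max_age_days` days before `now`.
stale_csvs <- function(dir, max_age_days = 30, now = Sys.time()) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  age_days <- as.numeric(difftime(now, file.info(files)$mtime, units = "days"))
  basename(files[age_days > max_age_days])
}

# Usage:
# stale_csvs("./2_MCHN_PR_data/3_data_for_viz")
```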
Define the USAID colors and additional colors, as well as functions commonly used for figures. Apply them to each figure, as needed. Each markdown file in the `3_MCHN_PR_code` folder has this section. No edit is needed.
USAID colors and their variations
```r
# USAID Colors (regular palette)
ucblue <- "#002F6C"         # USAID Blue
ucmediumblue <- "#0067B9"   # Medium Blue
uclightblue <- "#A7C6ED"    # Light Blue
ucred <- "#BA0C2F"          # USAID Red
ucdarkred <- "#651D32"      # Dark Red
uclightred <- "#9D6E7C"     # Two tints down of ucdarkred
ucdarkgreen <- "#1D6532"    # GRB variation of USAID Dark Red
# ucdarkgreen <- "#32651D"  # BRG variation of USAID Dark Red
uclightgreen <- "#8BB196"   # Two tints down of ucdarkgreen https://www.htmlcsscolor.com/hex/1D6532
ucrichblack <- "#212721"    # Rich Black
ucdarkgray <- "#6C6463"     # Dark Gray
ucmediumgray <- "#8C8985"   # Medium Gray
uclightgray <- "#CFCDC9"    # Light Gray
```
More colors from https://colorbrewer2.org
```r
library(RColorBrewer)

##### 1. Define color list #####
greycolors <- brewer.pal(7, "Greys")
bluecolors <- brewer.pal(7, "Blues")
greencolors <- brewer.pal(7, "Greens")
orangecolors <- brewer.pal(7, "Oranges")
redcolors <- brewer.pal(7, "Reds")
purplecolors <- brewer.pal(7, "Purples")
divcolors <- brewer.pal(9, "RdYlBu")
qualcolors <- brewer.pal(9, "Paired")
```
Basic options/functions for figures:

```r
##### 2. Define functions for plot #####
# plotly shape: horizontal line across the full plot width (x in paper coordinates) at y
hline <- function(y = 0, color = uclightgray) {
  list(
    type = "line",
    x0 = 0, x1 = 1, xref = "paper",
    y0 = y, y1 = y,
    line = list(color = color)
  )
}
# plotly shape: vertical line across the full plot height (y in paper coordinates) at x
vline <- function(x = 0, color = uclightgray) {
  list(
    type = "line",
    x0 = x, x1 = x,
    y0 = 0, y1 = 1, yref = "paper",
    line = list(color = color)
  )
}

##### 3. Define options for plot #####
marginlist <- list(l = 10, r = 10, b = 100, t = 100, pad = 0)
```
In order to automate/standardize text to the extent possible, we use specific columns in the Google spreadsheet, `MCHN Portfolio Review Indicators & Use Cases.xlsx`, in `1_MCHN_PR_guide`.
NOTE: If the Google Sheet were public (i.e., “Anyone on the internet with the link can view”), it could be imported directly into R without downloading it to a local computer. Currently, access is limited, and thus we have to download the Google Sheet to this local folder.
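If the sheet is ever made public, one common way to read a single tab is Google's CSV export URL. The sketch below only builds that URL; `gsheet_csv_url()` is a hypothetical helper, and the tab's `gid` (the number after `gid=` in the browser address bar) must be looked up for the tab you need. It will not work while access is restricted.

```r
# Hypothetical helper: CSV-export URL for one tab of a (public) Google Sheet
gsheet_csv_url <- function(sheet_id, gid) {
  sprintf("https://docs.google.com/spreadsheets/d/%s/export?format=csv&gid=%s",
          sheet_id, gid)
}

# Usage, only once the sheet is shared with "Anyone on the internet with the link can view":
# dtatext <- read.csv(gsheet_csv_url("<SHEET_ID>", "<TAB_GID>"), check.names = FALSE)
```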
Each markdown file in the `3_MCHN_PR_code` folder has this section. No edit is needed.
```r
library(readxl)
library(dplyr)
library(janitor)

dtatext <- read_excel(paste0(maindir,
                             "1_MCHN_PR_guide/MCHN Portfolio Review Indicators & Use Cases.xlsx"),
                      sheet = "Use Case Template")
colnames(dtatext) <- tolower(colnames(dtatext))
dtatext <- dtatext %>% clean_names()
colnames(dtatext)

dtatext <- dtatext %>%
  # define columns that will be used for slides
  rename(
    title = data_search_questions,
    text = standard_text
  ) %>%
  # select the country-specific context interpretation
  mutate(
    country = countryname,
    context = "To be drafted",
    context = ifelse(country == "Ghana", ghana_context, context),
    context = ifelse(country == "Kenya", kenya_context, context)
  ) %>%
  # replace the COUNTRYNAME placeholder with the country name assigned above
  mutate(
    title = gsub("COUNTRYNAME", countryname, title),
    text = gsub("COUNTRYNAME", countryname, text)
  ) %>%
  select(title, text, context, contains(c("number", "image_name", "context")))
```
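The `ifelse()` chain above needs one line per country. As a sketch of how this could scale (not the original code), the context column can be picked by name, assuming every country's column follows the `<country>_context` pattern seen in `ghana_context` and `kenya_context`. `build_slide_text()` is hypothetical, and the toy rows are made up; only the column names mirror the cleaned spreadsheet.

```r
# Hypothetical helper: build slide title/text/context for one country
build_slide_text <- function(dtatext, countryname) {
  # e.g., "Ghana" -> "ghana_context" (assumes janitor-style snake_case names)
  context_col <- paste0(gsub(" ", "_", tolower(countryname)), "_context")
  context <- if (context_col %in% names(dtatext)) dtatext[[context_col]] else "To be drafted"
  data.frame(
    image_name = dtatext$image_name,
    title = gsub("COUNTRYNAME", countryname, dtatext$data_search_questions, fixed = TRUE),
    # gsub() keeps NA as NA, so slides without standard text stay empty
    text = gsub("COUNTRYNAME", countryname, dtatext$standard_text, fixed = TRUE),
    context = context,
    stringsAsFactors = FALSE
  )
}

# Toy input (made-up rows)
toy <- data.frame(
  image_name = c("img_a", "img_b"),
  data_search_questions = c("What is the U5MR trend in COUNTRYNAME?", "Second question"),
  standard_text = c("COUNTRYNAME aims to ...", NA),
  ghana_context = c("Ghana note", NA),
  kenya_context = c("Kenya note", NA),
  stringsAsFactors = FALSE
)
build_slide_text(toy, "Kenya")
```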
Now it is time to produce each slide! Again, the markdown files in `3_MCHN_PR_code` create the slides.
For all figures, the basic idea is the same. It is presented step by step below, using an example from `PCMD_portfolio_review_PPT_1.Rmd`. Output text is shown in yellow boxes.
Step 1. Find the `image_name` in `dtatext`. This is the wrangled guide Excel file (again, `MCHN Portfolio Review Indicators & Use Cases.xlsx` in the `1_MCHN_PR_guide` folder). Filter only the row for the image, which includes the title, standard text, etc. It is now called `temptext`.
NOTE: The image name is case sensitive. To reduce errors, do not mix upper and lower case in the name. Currently, only lower case is used.
```r
# IMAGE NAME IN THE SPREADSHEET
imagename <- "cm_nmr_u5mr_trend_projection"
# FILTER
temptext <- dtatext %>% filter(image_name == imagename)
```
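Because the match is case sensitive and a typo simply yields zero rows, it may help to fail loudly when the filter does not return exactly one row. A hypothetical base-R guard (`find_image_row()` is not in the original files), shown with made-up rows:

```r
# Hypothetical guard: return the single row whose image_name matches exactly,
# or stop with a message that lists near-misses (case-insensitive matches).
find_image_row <- function(dtatext, imagename) {
  hit <- !is.na(dtatext$image_name) & dtatext$image_name == imagename
  if (sum(hit) != 1) {
    near <- dtatext$image_name[!is.na(dtatext$image_name) &
                                 tolower(dtatext$image_name) == tolower(imagename)]
    stop(sprintf("Found %d rows for image_name '%s'. Case-insensitive matches: %s",
                 sum(hit), imagename, paste(near, collapse = ", ")))
  }
  dtatext[hit, , drop = FALSE]
}

toy <- data.frame(image_name = c("cm_nmr_u5mr_trend_projection", "Other_Image", NA),
                  title = c("Q1", "Q2", "Q3"), stringsAsFactors = FALSE)
find_image_row(toy, "cm_nmr_u5mr_trend_projection")
```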
Step 2. Print the title for the image, `temptext$title`.
What is the gap to the SDG target for U5MR and NMR?
Step 3. Print any standard introduction text for the image, `temptext$text`. Not every slide has this text.
[STANDARD TEXT]
SDG aims to reduce U5MR to 25 and NMR to 12 per 1000 by 2030.
Step 4. Import the appropriate data for the image from `2_MCHN_PR_data/3_data_for_viz/` and filter only the country’s data. It is now called `dta`.
NOTE: `dta` is valid and used throughout a markdown file, as long as the file uses only one source dataset. Compare it with `dtafig` below.

See also the Annex for detailed information on each source dataset and its data management.
```r
# IMPORT THE SOURCE DATA
dta <- read.csv(paste0(outdatadir, "dta_igme_mort_all_cause.csv"), header = TRUE)
# FILTER
dta <- dta %>%
  filter(country %in% countryname)
# DEFINE ANY VALUES THAT MAY BE USED FOR PROGRAMMING
# (latest year with a non-missing U5MR estimate)
igme_lastest_year <- dta %>% filter(!is.na(u5mr)) %>% select(year) %>% max()
```
Step 5. As needed, further manage `dta` to create a figure. In this example, we add the SDG targets for child mortality. The result is called `dtafig`. (Note that for some figures it is simply the same as `dta`.)
NOTE: Unlike `dta`, `dtafig` is specific to a particular figure. It comes from `dta` but is processed/prepared specifically (often differently) for each figure, and it is used in Steps 6 and 9.
```r
dtafig <- dta %>%
  group_by(country) %>%
  # add one empty row per country; it becomes the 2030 SDG target row
  group_modify(~ add_row(.x, .before = 0)) %>%
  ungroup() %>%
  mutate(
    # SDG target by 2030 (per 1000 live births)
    year = ifelse(is.na(year), 2030, year),
    target_u5mr = ifelse(year == 2030, 25, NA),
    target_nmr = ifelse(year == 2030, 12, NA)
  )
```
Step 6. Define the values that will be used in the automated interpretation text.
```r
# Each filter is expected to match exactly one row; pull() returns a plain number
u5mr_latest <- dtafig %>%
  filter(year == igme_lastest_year) %>%
  pull(u5mr) %>%
  round(0)
nmr_latest <- dtafig %>%
  filter(year == igme_lastest_year) %>%
  pull(nmr) %>%
  round(0)
u5mr_2030 <- dtafig %>%
  filter(year == 2030, !is.na(u5mr_projection)) %>%
  pull(u5mr_projection) %>%
  round(0)
nmr_2030 <- dtafig %>%
  filter(year == 2030, !is.na(nmr_projection)) %>%
  pull(nmr_projection) %>%
  round(0)
```
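Each value above is meant to be a single number. As the troubleshooting notes later in this file point out, a filter that matches several rows (for example, two regions tied for the highest value) breaks the automated text. A hypothetical sketch of two defensive options, with made-up regional values: fail early when a value is not unique, or collapse tied names into one phrase.

```r
# Option 1 (hypothetical helper): return exactly one non-missing value or stop
pull_one <- function(x, label = "value") {
  x <- x[!is.na(x)]
  if (length(x) != 1) stop(sprintf("Expected 1 %s, found %d", label, length(x)))
  x
}

# Option 2: name every region tied at the maximum instead of assuming one
regions <- data.frame(area_name = c("Eastern", "Volta", "Ashanti"),
                      u5mr = c(50, 50, 30))  # made-up values
top_regions <- regions$area_name[regions$u5mr == max(regions$u5mr)]
highest_text <- paste(top_regions, collapse = " and ")
highest_text  # "Eastern and Volta"
```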
Step 7. Print the automated text.
[AUTOMATED TEXT]
In 2022, U5MR was 42 and NMR was 21 per 1000 live births. In 2030, it is projected that U5MR will be 31 and NMR will be 17 per 1000 live births.
Step 8. Add draft text discussing the country-specific findings. Note that this text must be reviewed and, as needed, manually edited once the PPT is created for each country. The current text is based on a sample country and may or may not be relevant to other countries in the Portfolio Review.
[REVIEW/EDIT COUNTRY SPECIFIC FINDINGS]
Ghana is off-track to meet both the U5MR and NMR SDG targets by 2030
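The on-track/off-track statement is reviewed manually, but the underlying comparison is simple: a target is met if the 2030 projection is at or below the SDG target (lower mortality is better). A sketch using the targets stated above (25 and 12 per 1000) and the projections from the automated text (31 and 17); `track_status()` is a hypothetical helper, not part of the slide code.

```r
# Hypothetical helper: compare 2030 projections with SDG targets (lower is better)
track_status <- function(projection_2030, target) {
  ifelse(projection_2030 <= target, "on-track", "off-track")
}

projection_2030 <- c(U5MR = 31, NMR = 17)  # from the automated text above
sdg_target <- c(U5MR = 25, NMR = 12)       # SDG targets per 1000 live births
gap <- projection_2030 - sdg_target        # U5MR: 6, NMR: 5
track_status(projection_2030, sdg_target)
```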
Step 9. Write the code for the figure.
```r
library(plotly)

fig <- dtafig %>%
  ##### U5MR #####
  plot_ly(
    x = ~year,
    type = "scatter", mode = "lines+markers",
    y = ~u5mr_projection, name = "U5MR, projection",
    marker = list(color = uclightblue, size = 1),
    line = list(color = uclightblue)) %>%
  add_lines(
    y = ~u5mr, name = "U5MR",
    marker = list(color = ucmediumblue, size = 5),
    line = list(color = ucmediumblue)) %>%
  add_trace(
    y = ~target_u5mr, name = "U5MR SDG target by 2030",
    marker = list(color = ucmediumblue, size = 8),
    line = list(color = ucmediumblue)) %>%
  ##### NMR #####
  add_lines(
    y = ~nmr_projection, name = "NMR, projection",
    marker = list(color = uclightgreen, size = 1),
    line = list(color = uclightgreen)) %>%
  add_lines(
    y = ~nmr, name = "NMR",
    marker = list(color = ucdarkgreen, size = 5),
    line = list(color = ucdarkgreen)) %>%
  add_trace(
    y = ~target_nmr, name = "NMR SDG target by 2030",
    marker = list(color = ucdarkgreen, size = 8),
    line = list(color = ucdarkgreen)) %>%
  ##### Footnote #####
  add_annotations(
    x = 0, y = -0.1, xref = "paper", yref = "paper", xanchor = "left",
    text = "(Source: IGME)",
    font = list(size = 10),
    showarrow = FALSE) %>%
  ##### Layout #####
  layout(
    margin = marginlist, # DEFINED IN THE OPTIONS ABOVE
    title = "Trend of U5MR and NMR",
    font = list(size = 10),
    legend = list(x = 0.6, y = 0.95),
    yaxis = list(title = "Per 1000 live births",
                 rangemode = "tozero",
                 showgrid = FALSE,
                 showticklabels = TRUE),
    xaxis = list(title = " ",
                 showgrid = FALSE)
  )
```
Step 10. Print the figure and, as needed, save it as a PNG file.
Exporting PNG files takes substantially longer. Currently, no images are exported. If needed, simply replace all “# export(fig,” with “export(fig,” and run the code.
```r
fig
# export(fig, file = paste0(outfigdir, countryname, "/", imagename, "_", countryname, ".png"))
```
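The export line writes into a country subfolder of `outfigdir`, which must already exist. A hypothetical helper that builds the same file name pattern and creates the subfolder if needed; it only prepares the path, and the export call itself is unchanged.

```r
# Hypothetical helper: build <outfigdir>/<country>/<imagename>_<country>.png
# and create the country subfolder if it does not exist yet.
figure_path <- function(outfigdir, countryname, imagename) {
  country_dir <- file.path(outfigdir, countryname)
  dir.create(country_dir, recursive = TRUE, showWarnings = FALSE)
  file.path(country_dir, paste0(imagename, "_", countryname, ".png"))
}

# Usage (same call as above, with the path built by the helper):
# export(fig, file = figure_path(outfigdir, countryname, imagename))
```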
`3_MCHN_PR_code` folder:
`3_MCHN_PR_code`. You should see five PPT files.
`4_MCHN_PR_ouput/2_slides`. You should see five PPT files, with the country name and date. There should be no more PPT files in `3_MCHN_PR_code`.

The Annex provides detailed information on each source dataset and its data management. However, to create certain additional figures, you may need to edit the dataset itself (i.e., `dta` in Section F, Step 4, or the CSV files in the `2_MCHN_PR_data/3_data_for_viz` folder).
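To confirm the slide check described above (five PPT files per country in `4_MCHN_PR_ouput/2_slides` and none left in `3_MCHN_PR_code`), a small sketch that counts PowerPoint files, assuming they are saved as `.pptx` with the country name in the file name. `count_pptx()` is hypothetical; it searches subfolders too, in case slides are stored in country subfolders.

```r
# Hypothetical helper: number of .pptx files under `dir`, optionally only those
# whose file name contains `countryname`.
count_pptx <- function(dir, countryname = NULL) {
  files <- list.files(dir, pattern = "\\.pptx$", ignore.case = TRUE, recursive = TRUE)
  if (!is.null(countryname)) {
    files <- files[grepl(countryname, basename(files), fixed = TRUE)]
  }
  length(files)
}

# Usage (run from the main folder):
# count_pptx("./4_MCHN_PR_ouput/2_slides", "Ghana")
# count_pptx("./3_MCHN_PR_code")
```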
`2_MCHN_PR_data/2_code_to_create_data_for_viz` and open the corresponding markdown. For example, if you have to revise the WHO HIDR data, open `PCMD_prep_WHO_HIDR.Rmd`.

The following explains potential sources of error when producing the code for additional countries.
`dtafig`. However, it may cause an error if there are multiple observations (for example, if two regions have an identical value that is the highest or lowest among all subnational estimates). Of note, the error will appear in Step 7, but Step 6 needs to be revised.

Data come from various sources. They are organized and saved in seven subfolders in `1_data_source`. Of note, DHS API data (see A.1.6) are imported directly via the API and thus do not have a separate subfolder. This section briefly describes all data sources that are used or considered for the review.
Importantly, there is `Data_source_note.xlsx`. It summarizes key information on data management, for example:
https://childmortality.org/all-cause-mortality/data
https://childmortality.org/causes-of-death/data
UN_IGME_2023.csv
UNICEF-CME_CAUSE_OF_DEATH.csv
CME-Info_codebook_for_downloads.xlsx
UN-IGME-2023-Subnational-U5MR-and-NMR-estimates.xlsx
UNIGME-2023-Scenario-Based-Projections-2023-2030.xlsx
The current version is the latest available as of 2024-07-11, when it was downloaded.
https://www.who.int/publications/i/item/9789240068759
estimates_arr.csv
estimates.csv
The current version is the latest available as of 2024-07-11, when it was downloaded.
https://data.unicef.org/resources/data_explorer/unicef_f/
fusion_GLOBAL_DATAFLOW_UNICEF_1.0_all.csv
The current version is the latest available as of 2024-08-15, when it was downloaded. However, it can also be accessed via the SDMX API: https://data.unicef.org/sdmx-api-documentation/
NOTE: For our purposes, this dataset does not add much value, since it covers only the national level, and DHS and WHO HIDR cover most (if not all) of the UNICEF data. We do not use it for visualization.
https://www.who.int/data/inequality-monitor/data/hidr-api
WHO HIDR has a number of “datasets” (see `hidr_datasetid.csv`), and only select datasets were accessed via the API (see `PCMD_prep_WHO_HIDR.Rmd` in `2_MCHN_PR_data/2_code_to_create_data_for_viz`). Each dataset was then saved in its own subfolder.
https://population.un.org/wpp/Download/Standard/MostUsed/
WPP2024_GEN_F01_DEMOGRAPHIC_INDICATORS_COMPACT.xlsx
The current version is the latest available as of 2024-09-02, when it was downloaded.
https://api.dhsprogram.com/
https://api.dhsprogram.com/rest/dhs/indicators?f=html
All DHS data are accessed via the API. There are no “raw” source data saved in the `1_data_source` folder.
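For reference, a DHS API data request is a plain URL. The sketch below only assembles one; it assumes the `rest/dhs/data` endpoint and the `countryIds`, `indicatorIds`, `breakdown` and `f` query parameters described at https://api.dhsprogram.com/, and the IDs in the usage line are placeholders. Check the parameter names and indicator IDs against the API documentation and the DHS prep markdown files before relying on it; `dhs_data_url()` is hypothetical.

```r
# Hypothetical helper: assemble a DHS API "data" request URL
dhs_data_url <- function(country_ids, indicator_ids, breakdown = "national", format = "json") {
  paste0(
    "https://api.dhsprogram.com/rest/dhs/data",
    "?countryIds=", paste(country_ids, collapse = ","),
    "&indicatorIds=", paste(indicator_ids, collapse = ","),
    "&breakdown=", breakdown,
    "&f=", format
  )
}

# Usage (placeholder IDs):
# url <- dhs_data_url("<DHS_COUNTRY_CODE>", "<INDICATOR_ID>", breakdown = "subnational")
```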
Using a standard country code makes it much easier to integrate/merge data across sources.
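A small illustration of why a shared country code helps: the two made-up tables below name Tanzania differently, so a merge on the name drops it, while a merge on a code column keeps it. The column name `iso3` and all values are assumptions for illustration; use whatever code column the prep files actually create.

```r
# Two sources that spell a country name differently (made-up values)
mortality <- data.frame(iso3 = c("GHA", "TZA"),
                        country = c("Ghana", "United Republic of Tanzania"),
                        u5mr = c(42, 43))
population <- data.frame(iso3 = c("GHA", "TZA"),
                         country = c("Ghana", "Tanzania"),
                         pop = c(1, 2))

by_name <- merge(mortality, population, by = "country")  # Tanzania is lost
by_code <- merge(mortality, population, by = "iso3")     # both countries kept
nrow(by_name)
nrow(by_code)
```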
### A.1.8. Shapefiles
Shapefiles from:
NOTE: Subnational units and their names are not necessarily harmonized across data sources (shapefiles as well as MCH data); see Annex 4 for examples. The current output does not include any maps, as creating a standard flow/code for them would require an intensive level of effort (LOE).
In the subfolder `2_code_to_create_data_for_viz`, there are 13 Rmd files. All markdown files are annotated and structured similarly (see the corresponding HTML files). All resulting data files are saved in `2_MCHN_PR_data/3_data_for_viz/` and used for visualization/PPT output. The following is the list:
1. PCMD_prep_A_UNSD_region.Rmd
2. PCMD_prep_DHS_coverage.Rmd
3. PCMD_prep_DHS_denominator_region.Rmd
4. PCMD_prep_DHS_mortality_region.Rmd
5. PCMD_prep_IGME_cause.Rmd
6. PCMD_prep_IGME_region.Rmd
7. PCMD_prep_IGME_wealth.Rmd
8. PCMD_prep_IGME.Rmd
9. PCMD_prep_UNICEF_coverage.Rmd
10. PCMD_prep_WHO_HIDR.Rmd
11. PCMD_prep_WPP.Rmd
12. PCMD_prep_Z_shapefiles.Rmd
13. subnational_data_investigation.Rmd
This is an important file that handles the different subnational unit names in DHS and IGME. Reconciling them is required to create the number of deaths at the subnational level, since IGME data do not include the number of deaths by region. This file must be run after the DHS and IGME files are ready. See details here: https://rpubs.com/YJ_Choi/PCMD_PR_subnational_data_investigation.
Also see Annex A.4.
In the subfolder `3_data_for_viz`, there are 17 CSV files. The following is the list of CSV files and the corresponding Rmd file (in `2_code_to_create_data_for_viz`).
Of note:

* Some Rmd files create multiple CSV files.
* Wide format indicates that individual indicators are columns/variables. A majority of files are in wide format.
* Long format indicates that there is a column/variable called “indicator”.
* Some files have only MCH priority countries because of the file/data size, thus either 24 or 25 countries, depending on the source.
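To make the wide/long distinction concrete, here is a made-up long table with an `indicator` column reshaped to wide format with base R's `reshape()`. The prep files may well use other tools (for example `tidyr`); this only illustrates the two layouts.

```r
# Long format: one row per country-year-indicator (illustrative values)
long <- data.frame(
  country = "Ghana",
  year = c(2022, 2022),
  indicator = c("u5mr", "nmr"),
  value = c(42, 21)
)

# Wide format: one row per country-year, one column per indicator
wide <- reshape(long, idvar = c("country", "year"), timevar = "indicator",
                v.names = "value", direction = "wide")
# reshape() names the new columns "value.u5mr", "value.nmr"; drop the prefix
names(wide) <- sub("^value\\.", "", names(wide))
wide
```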
* `dta_dhs_coverage_region.csv`, created by `PCMD_prep_DHS_coverage.Rmd`
    * 2446 rows/observations from 24 countries
    * Wide format
    * Unique ID: combination of country, year and grouplabel
* `dta_dhs_coverage_wealth.csv`, created by `PCMD_prep_DHS_coverage.Rmd`
    * 162 rows/observations from 24 countries
    * Wide format
    * Unique ID: combination of country and year
* `dta_dhs_coverage.csv`, created by `PCMD_prep_DHS_coverage.Rmd`
    * 162 rows/observations from 24 countries
    * Wide format
    * Unique ID: combination of country and year
* `dta_dhs_denominator_by_region.csv`, created by `PCMD_prep_DHS_denominator_region.Rmd`
    * 4599 rows/observations from 84 countries
    * Wide format
    * Unique ID: combination of country, year, type and grouplabel
* `dta_dhs_mortality_by_region.csv`, created by `PCMD_prep_DHS_mortality_region.Rmd`
    * 4320 rows/observations from 87 countries
    * Wide format
    * Unique ID: combination of country, year and grouplabel
* `dta_igme_mort_all_cause_by_wealth.csv`, created by `PCMD_prep_IGME_wealth.Rmd`
    * 7755 rows/observations from 235 countries
    * Wide format
    * Unique ID: combination of country and year
* `dta_igme_mort_all_cause_region.csv`, created by two markdown files: `PCMD_prep_IGME_region.Rmd` and `subnational_data_investigation.Rmd`
    * 47520 rows/observations from 31 countries
    * Wide format
    * Unique ID: combination of country, year, admin_level and area_name
* `dta_igme_mort_all_cause.csv`, created by `PCMD_prep_IGME.Rmd`
    * 8135 rows/observations from 199 countries
    * Wide format
    * Unique ID: combination of country and year
* `dta_igme_mort_by_cause.csv`, created by `PCMD_prep_IGME_cause.Rmd`
    * 69168 rows/observations from 25 countries
    * Long format
    * Unique ID: combination of country, year, age_group, indicator and cause
* `dta_unicef_coverage.csv`, created by `PCMD_prep_UNICEF_coverage.Rmd`
    * 26293 rows/observations from 196 countries
    * Long format
    * Unique ID: combination of country, year and indicator
* `dta_unsd_region.csv`, created by `PCMD_prep_A_UNSD_region.Rmd`
    * 248 rows/observations from 248 countries
    * Wide format
    * Unique ID: country
* `dta_who_hidr_dtst_dptdrop_quintile.csv`, created by `PCMD_prep_WHO_HIDR.Rmd`
    * 797 rows/observations from 25 countries
    * Wide format
    * Unique ID: combination of country, year and grouplabel
* `dta_who_hidr_long_detail.csv`, created by `PCMD_prep_WHO_HIDR.Rmd`
    * 135686 rows/observations from 25 countries
    * Long format
    * Unique ID: combination of dataset_id, country, year, indicator, grouplabel and source
* `dta_who_hidr_region.csv`, created by `PCMD_prep_WHO_HIDR.Rmd`
    * 14890 rows/observations from 25 countries
    * Wide format
    * Unique ID: combination of country, year and grouplabel
* `dta_who_hidr_residence.csv`, created by `PCMD_prep_WHO_HIDR.Rmd`
    * 1092 rows/observations from 25 countries
    * Wide format
    * Unique ID: combination of country, year and grouplabel
* `dta_who_hidr_wealth.csv`, created by `PCMD_prep_WHO_HIDR.Rmd`
    * 1672 rows/observations from 25 countries
    * Wide format
    * Unique ID: combination of country, year and grouplabel
* `dta_world_population_prospects.csv`, created by `PCMD_prep_WPP.Rmd`
    * 9717 rows/observations from 237 countries
    * Wide format
    * Unique ID: combination of country and year
There is no standardized code for subnational units comparable to ISO codes for countries or FIPS codes for US counties.
WHO HIDR consists of a number of “datasets” (see `hidr_datasetid.csv` in `2_MCHN_PR_data/1_data_source/WHO_HIDR`), but the datasets are not internally harmonized. For example, one region in Ghana is called Eastern, East, or Eastern region, depending on the indicator/dataset. This is a problem when assessing subnational variation across indicators.
Unfortunately, a solution is to harmonize the region names on a country-by-country basis. See Section 1.3.A of `PCMD_prep_WHO_HIDR.Rmd`/html in the `2_MCHN_PR_data/2_code_to_create_data_for_viz` folder.
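One way to express such a country-specific harmonization is a small crosswalk from each variant to one canonical name. The sketch below uses only the Ghana example above (Eastern, East, Eastern region); the mapping and `harmonize_region()` are illustrative and are not the contents of Section 1.3.A.

```r
# Illustrative crosswalk: variant name -> canonical name (Ghana example above)
region_crosswalk <- c("East" = "Eastern", "Eastern region" = "Eastern")

# Replace known variants; leave all other names unchanged
harmonize_region <- function(x, crosswalk) {
  x <- trimws(x)
  mapped <- unname(crosswalk[x])  # NA for names not in the crosswalk
  ifelse(is.na(mapped), x, mapped)
}

harmonize_region(c("East", "Eastern region", "Eastern", "Volta"), region_crosswalk)
```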
For certain metrics, such as the number of deaths at the subnational level, we have to use data from different sources. This requires checking and ensuring that the same name is used for each region across sources.
Again, unfortunately, a solution is to harmonize the subnational unit names on a country-by-country basis, which is done in `subnational_data_investigation.Rmd` (in the `2_MCHN_PR_data/2_code_to_create_data_for_viz` folder). See details here: https://rpubs.com/YJ_Choi/PCMD_PR_subnational_data_investigation.
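Before merging, a quick way to see which names still disagree between two sources is a set difference in each direction. A made-up example; the name variants are hypothetical, not taken from the actual DHS or IGME files, and `name_mismatches()` is not part of the original code.

```r
# Hypothetical helper: names present in one source but not the other
name_mismatches <- function(a, b) {
  list(only_in_a = setdiff(a, b), only_in_b = setdiff(b, a))
}

dhs_names <- c("Ashanti", "Eastern", "Greater Accra")
igme_names <- c("Ashanti", "Eastern Region", "Greater Accra")
name_mismatches(dhs_names, igme_names)
# Empty results in both directions mean the names line up
```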