• Author: YJ Choi
  • Date: 2019-06-10

Introduction

This is a markdown file to reproduce analysis and results for a published paper: Monitoring Progress in Equality for the Sustainable Development Goals: A Case Study of Meeting Demand for Family Planning (Global Health Science and Practice 2018 Jun 29;6(2):390-401). The abstract, link to the complete paper, and full publication information is available here.

There are three sections in this document for: accessing public data used for the study; preparing the analysis data file (i.e., data processing/manipulation); and conducting analysis. Analysis section contains code that is only relevant to reproduce results presented in the paper. The paper was prepared using Stata, and a Stata do file to reproduce the results is available at GitHub.

"R code is shown in a gray box"
Output is in white box. (Not all reults are shown in the output document. To see results, simply remove the argument "results=FALSE" from a code chunk)

1. Data Access

Data came from the Demographic Health Surveys (DHS) in 55 countries that have conducted at least 2 surveys since 1990. DHS data are available for the public via various formats, including indicator data via DHS API, available at (http://api.dhsprogram.com/). A total of 213 surveys were available as of September 2017 and used for the study.

1.1. Define relevant indicators for the study

First, get relevant indicator data from DHS API. More information on indicators available at DHS API is here. For this study, the following indicators were used.

#Percent of women using any contraceptive methods
FP_CUSA_W_ANY FP_CUSM_W_ANY FP_CUSU_W_ANY
#Percent of women using modern contraceptive methods
FP_CUSA_W_MOD FP_CUSM_W_MOD FP_CUSU_W_MOD
#Percent of women with unmet need
FP_NADA_W_UNT FP_NADM_W_UNT FP_NADU_W_UNT
#Education indicators (for Figure 5)
ED_EDAT_W_NED ED_EDAT_W_SPR ED_EDAT_W_CPR ED_EDAT_W_SSC         
    ED_EDAT_W_CSC ED_EDAT_W_HGH ED_EDAT_W_DKM

1.2. Call API data for each indicator

Note 1: See here for more information on calling DHS API data: http://rpubs.com/YJ_Choi/HowTo_DHSAPI_Introduction

Note 2. Though relatively rare, indicator estimates in DHS API (which shares the same base data with DHS STATcompiler) can be updated over time. For example, there was a notable update for a few family planning indicators in Mozambique DHS 2003 since the time of original analysis (September 2017). Thus, the reproduced dataset may not be identical with the original analysis dataset. However, impact of any updates should be minimal for overall study findings, interpretation, and conclusion.

suppressPackageStartupMessages(library(jsonlite)) 
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(dplyr))

#call and save individual indicator data here
#an example: FP_CUSM_W_ANY
url<-("http://api.dhsprogram.com/rest/dhs/data?f=json&indicatorIds=FP_CUSM_W_ANY&breakdown=all&perpage=1000&pagenum=1")
jsondata<-fromJSON(url) 
dta<-data.table(jsondata$Data)
names(dta)
dta<-select(dta, DHS_CountryCode, CountryName, RegionId, 
            SurveyId, SurveyYearLabel, SurveyType, SurveyYear, 
            Value, CharacteristicCategory, CharacteristicLabel)    
API_FP_CUSM_W_ANY<- dta %>% rename(FP_CUSM_W_ANY=Value)
dim(API_FP_CUSM_W_ANY)

1.3. Merge the individual indicator datasets

Then, merge all sixteen datasets into one dataset.

#merge all 16 indicator datasets here

Currently, looping code is being explored in order to complete this process more efficiently. Until it’s resolved, use a dataset created, using Stata following steps described above. See the GitHub repository for the Stata do file.

suppressPackageStartupMessages(library (haven)) 

setwd("C:/Users/YoonJoung Choi/Dropbox/0 Project/ReproResearch_Inequality_MetDemand")

dtaapi<-data.frame(read_dta("DHSAPI_inequality_metdemand.dta"))

As of April 5, 2019, the API calls generated 16921 observations - i.e., subgroup-specific estimates - from 294 surveys. Sort the data by surveyid, group, and grouplabel to explore the structure further.

2. Data Management

2.1. Tidy the dataset

Unit of current data set, dta, is a survey-subgroup specific estimate. In DHS, (most) family planning indicators (MCPR, CPR, and unmet need, in this study) are estimated for different denominators among: all women, married women, and sexually active unmarried women.

#colnames(dta)<-tolower(names(dta))
suppressPackageStartupMessages(library(dplyr)) 
suppressPackageStartupMessages(library(Hmisc))

#rename var
dta<- dtaapi %>% rename (cpr_all    =   FP_CUSA_W_ANY)
dta<- dta %>% rename (cpr_married   =   FP_CUSM_W_ANY)
dta<- dta %>% rename (cpr_unmarried =   FP_CUSU_W_ANY)
dta<- dta %>% rename (mcpr_all  =   FP_CUSA_W_MOD)
dta<- dta %>% rename (mcpr_married  =   FP_CUSM_W_MOD)
dta<- dta %>% rename (mcpr_unmarried    =   FP_CUSU_W_MOD)
dta<- dta %>% rename (unmet_all =   FP_NADA_W_UNT)
dta<- dta %>% rename (unmet_married =   FP_NADM_W_UNT)
dta<- dta %>% rename (unmet_unmarried   =   FP_NADU_W_UNT)
dta<- dta %>% rename (edu_none  =   ED_EDAT_W_NED)
dta<- dta %>% rename (edu_somepri   =   ED_EDAT_W_SPR)
dta<- dta %>% rename (edu_comppri   =   ED_EDAT_W_CPR)
dta<- dta %>% rename (edu_somesec   =   ED_EDAT_W_SSC)
dta<- dta %>% rename (edu_compsec   =   ED_EDAT_W_CSC)
dta<- dta %>% rename (edu_high  =   ED_EDAT_W_HGH)
dta<- dta %>% rename (edu_dkmiss    =   ED_EDAT_W_DKM)

#gen var
dta<-mutate(dta,
    #met demand 
    demandmet_m_all =  100* mcpr_all / (cpr_all + unmet_all),
    demandmet_m_married = 100* mcpr_married / (cpr_married + unmet_married), 
    demandmet_m_unmarried = 100* mcpr_unmarried / (cpr_unmarried + unmet_unmarried), 
    #survey characteristics
    year=as.numeric(substr(surveyid,3,6)),
    type=substr(surveyid,7,9),
    #recode to tidy data further
    group = ifelse(group == "Total 15-49", "Total", group),     
    grouplabel = ifelse(grouplabel == "Total 15-49", "Total", grouplabel)
)

label(dta$year) <- "year of survey" 
label(dta$type) <- "type of survey" 

2.2. Create regional variables

Create regional variables, following UN Statistics Division’s classification: the M49 standard (https://unstats.un.org/unsd/methodology/m49/). Here, we scrape the webpage to get the data table of countries by geographic regions (circled with green in Figure 1) and tidy up the scraped data table.

Figure 1. Target elements to scrape from the UN Statistics Division’s webpage Alt text

suppressPackageStartupMessages(library(rvest))
# data table scraping from the web
ctry_UNSD<-read_html("https://unstats.un.org/unsd/methodology/m49/") %>% 
    html_nodes("table") %>%
    .[[7]] %>% 
    html_table(header = TRUE)     
# tidy up
ctry_UNSD<-ctry_UNSD %>%
    rename (country =   "Country or Area") %>% 
    rename (M49 =   "M49 code") %>% 
    rename (ISOalpha3   =   "ISO-alpha3 code") %>% 
    select(country, M49, ISOalpha3) 
ctry_UNSD$country<-as.character(ctry_UNSD$country)
ctry_UNSD<-ctry_UNSD %>% 
    mutate(
    UNSDsubregion=country,
    UNSDsubregion=ifelse(ISOalpha3!="", "", UNSDsubregion)
    )
for (i in 1:nrow(ctry_UNSD)){
    if (ctry_UNSD[i,4]==""){
    ctry_UNSD[i,4]=ctry_UNSD[i-1,4]
    }}
# Keep only country rows and replace country names as needed for merge
ctry_UNSD<-ctry_UNSD %>% 
    filter(ISOalpha3!="") %>% 
    select(country, UNSDsubregion) %>% 
    mutate(
    country = ifelse(country == "Bolivia (Plurinational State of)", "Bolivia", country) ,
    country = ifelse(country == "Cabo Verde", "Cape Verde", country) , 
    country = ifelse(country == "Democratic Republic of the Congo", "Congo Democratic Republic", country) ,
    country = ifelse(country == "Côte d'Ivoire", "Cote d'Ivoire", country) ,
    country = ifelse(country == "Kyrgyzstan", "Kyrgyz Republic", country) , 
    country = ifelse(country == "Republic of Moldova", "Moldova", country) , 
    country = ifelse(country == "United Republic of Tanzania", "Tanzania", country) ,
    country = ifelse(country == "Viet Nam", "Vietnam", country) 
    )

label(ctry_UNSD$UNSDsubregion) <- "Sub-region, UNSD Methodology 49"

Then, merge the two data frames: ctry_UNSD & dta (cleaned DHS API data).

# inspect and confirm "country" variable
length(unique(dta$country)) #number of unique countries
length(unique(ctry_UNSD$country)) #number of unique countries

# merge UNSD country list with the anaysis data 
dim(dta)
dim(ctry_UNSD)
dta<-left_join(dta, ctry_UNSD, by = "country")
dim(dta)
length(unique(dta$surveyid)) #number of unique surveys
length(unique(dta$country)) #number of unique countries

# check all countries now have UNSDsubregion, 
# especially Cote d'Ivoire, which is still not merging correctly... why??? 
table(dta$UNSDsubregion, exclude = NULL)
    test<-filter(dta, UNSDsubregion=="Western Africa")
    table(test$country)
dta<- mutate(dta, 
    UNSDsubregion=ifelse(country=="Cote d'Ivoire", "Western Africa", UNSDsubregion) )      

2.3. Create lowest vs. highest variables in each group

# min & max: by SES order 
dta<- mutate(dta, 
    min_demandmet_m_married_edu= 
        case_when(group=="Education" & grouplabel=="No education" ~ 
               demandmet_m_married, 
               TRUE~NA_real_) ,
    max_demandmet_m_married_edu=
        case_when(group=="Education (2 groups)" & grouplabel=="Secondary or higher" ~
               demandmet_m_married, 
               TRUE~NA_real_) ,
    min_demandmet_m_married_hhw= 
        case_when(group=="Wealth quintile" & grouplabel=="Lowest" ~
               demandmet_m_married, 
               TRUE~NA_real_) ,
    max_demandmet_m_married_hhw= 
        case_when(group=="Wealth quintile" & grouplabel=="Highest" ~
               demandmet_m_married, 
               TRUE~NA_real_) ,    
    min_demandmet_m_married_rurb= 
        case_when(group=="Residence" & grouplabel=="Rural" ~
               demandmet_m_married, 
               TRUE~NA_real_) ,
    max_demandmet_m_married_rurb= 
        case_when(group=="Residence" & grouplabel=="Urban" ~
               demandmet_m_married, 
               TRUE~NA_real_) 
)

# min & max: without order (only preparation here, to be completed in the next chunk) 
dta<- mutate(dta, 
    min2_demandmet_m_married_age=
        case_when(group=="Age (5-year groups)" ~ 
               demandmet_m_married, 
               TRUE~NA_real_) ,
    max2_demandmet_m_married_age=
        case_when(group=="Age (5-year groups)" ~ 
               demandmet_m_married, 
               TRUE~NA_real_) ,
    min2_demandmet_m_married_reg=
        case_when(group=="Region" ~ 
               demandmet_m_married, 
               TRUE~NA_real_) ,
    max2_demandmet_m_married_reg=
        case_when(group=="Region" ~ 
               demandmet_m_married, 
               TRUE~NA_real_)     
)

2.4. Collapse to the survey level data and create disparity measures

# prep before collapse
dtasum<- mutate(dta, 
    demandmet_m_all=ifelse(group != "Total", NA, demandmet_m_all),  
    demandmet_m_married=ifelse(group != "Total", NA, demandmet_m_married),  
    demandmet_m_unmarried=ifelse(group != "Total", NA, demandmet_m_unmarried)   
    )
dim(dtasum)

# collapse
dtasum1<- dtasum %>% 
    select(surveyid, starts_with("demandmet_m_"), starts_with("min_"),  starts_with("max_"), starts_with("edu_")) %>%
    group_by(surveyid) %>%     
    summarize_all(funs(mean), na.rm=TRUE)
dim(dtasum1)
names(dtasum1)

dtasum2<- dtasum %>% 
    select(surveyid, starts_with("min2_")) %>%
    group_by(surveyid) %>%     
    summarize_all(funs(min), na.rm=TRUE)
dim(dtasum2)
names(dtasum2)

dtasum3<- dtasum %>% 
    select(surveyid, starts_with("max2_")) %>%
    group_by(surveyid) %>%     
    summarize_all(funs(max), na.rm=TRUE)
dim(dtasum3)
names(dtasum3)

dtasum<-left_join(dtasum1, dtasum2, by = "surveyid")   
dtasum<-left_join(dtasum, dtasum3, by = "surveyid")   
#investgate join_all and/or reduce 

dim(dtasum)
names(dtasum)

# disparity measures: absolute % point difference 
dtasum<- mutate(dtasum, 
    gap_demandmet_m_married_edu     = max_demandmet_m_married_edu - min_demandmet_m_married_edu ,   
    gap_demandmet_m_married_hhw     = max_demandmet_m_married_hhw - min_demandmet_m_married_hhw ,
    gap_demandmet_m_married_rurb    = max_demandmet_m_married_rurb - min_demandmet_m_married_rurb ,
    gap2_demandmet_m_married_age    = max2_demandmet_m_married_age - min2_demandmet_m_married_age ,
    gap2_demandmet_m_married_reg    = max2_demandmet_m_married_reg - min2_demandmet_m_married_reg, 
    gap_demandmet_m_mar     =                demandmet_m_married - demandmet_m_unmarried,
    gap2_demandmet_m_mar    =abs(gap_demandmet_m_mar)   
) 

# merge back country name, region, etc. 
dtasumlist<- dta %>% 
    select(surveyid, country, UNSDsubregion, group, year, type) %>% 
    filter(group=="Total") %>% 
    select( -starts_with("group"))

    test<-filter(dta, UNSDsubregion=="Western Africa")
    table(test$country)
dtasum<-full_join(dtasumlist, dtasum, by = "surveyid")
obs<-nrow(dtasum)

2.5. Keep analysis sample

Now the dataset, dtasum, has 294 observations, at the survey level. But, the study was conducted in a subset - i.e., DHS surveys in countries where 2 or more DHS surveys have been conducted, 213 surveys from 55 countries (as of September, 2017).

First, keep only DHS surveys (e.g., not AIS) since 1990.

dtasum<-dtasum %>% 
        filter(year>=1990) %>% 
        filter(type=="DHS") %>% 
        filter( is.na(demandmet_m_married)==FALSE)
nrow(dtasum) #number of unique surveys
length(unique(dtasum$country)) #number of unique countries

Also, for the purpose of reproducing the results, drop newly released surveys since September 2017 to reproduce the results. They are likely surveys conducted in 2014, 2015 or later. See Supplement 1 for the list of surveys included in the study (http://www.ghspjournal.org/content/suppl/2018/06/29/GHSP-D-18-00012.DCSupplemental).

dtasum<-mutate(dtasum, 
        newsurvey=
            year>=2017  |
            (year==2016 & (surveyid!="AM2016DHS" & surveyid!="ET2016DHS"))|
            (year==2015 & (surveyid!="CO2015DHS" & surveyid!="GU2015DHS" 
                            & surveyid!="MW2015DHS" & surveyid!="RW2015DHS" 
                            & surveyid!="TZ2015DHS" & surveyid!="ZW2015DHS"))|
            (year==2014 & (surveyid!="BD2014DHS" & surveyid!="EG2014DHS" 
                            & surveyid!="GH2014DHS" & surveyid!="KE2014DHS" 
                            & surveyid!="KH2014DHS" & surveyid!="LS2014DHS" 
                            & surveyid!="SN2014DHS" & surveyid!="TD2014DHS"))
    )
dtasum<-dtasum %>% filter(newsurvey==0)
nrow(dtasum) #number of unique surveys
length(unique(dtasum$country)) #number of unique countries

Then, finally, keep only surveys from countries that had 2 or more surveys

dtasum<-dtasum %>% group_by(country) %>% mutate(count = n()) %>% filter(count>1)
label(dtasum$count)<- "number of surveys per country" 

obs<-nrow(dtasum)
countries<-length(unique(dtasum$country))
obs
[1] 213
countries
[1] 55

Now the analysis dataset, dtasum, has 213 observations (i.e., DHS surveys) from 55 countries.

2.6. Create further summary measure variables

dtasum<-dtasum %>% group_by(surveyid) %>% mutate(countregion = n())
label(dtasum$countregion)<- "number of regions per survey"

dtasum<-dtasum %>% group_by(country) %>% 
        mutate(
            latest=year==max(year),
            yearmin=min(year), 
            yearmax=max(year))
label(dtasum$latest)<- "latest survey included in this study (per country)"
label(dtasum$yearmin)<- "first survey included in this study (per country)"
label(dtasum$yearmax)<- "latest survey included in this study (per country)"

3. Data Analysis

There are five figures and one table in results. The following presents code for the figures and the table in the order of their appearance in the paper.

3.1. Figure 1

Figure 1 shows “Boxplot Distribution of Within-Country Disparity in Family Planning Demand Met With Modern Methods (Percentage Point) Between the Most- and Least-Advantaged Subgroups, by Background Characteristic”. It uses only the latest survey in each country.

suppressPackageStartupMessages(library(tidyr))

varlist<-c( "gap_demandmet_m_married_edu",  "gap_demandmet_m_married_hhw", 
            "gap_demandmet_m_married_rurb", "gap_demandmet_m_mar", 
            "gap2_demandmet_m_married_age", "gap2_demandmet_m_married_reg")

dtafig<-dtasum %>% filter(latest==1) %>% select(country, varlist) 

dtafiglong<-gather(dtafig, group, gap, starts_with("gap"))
dtafiglong<-dtafiglong %>% 
    mutate(
    group = ifelse(group == "gap_demandmet_m_married_edu", "1.Education", group),   
    group = ifelse(group == "gap_demandmet_m_married_hhw", "2.Wealth", group),  
    group = ifelse(group == "gap_demandmet_m_married_rurb", "3.Residence", group),  
    group = ifelse(group == "gap_demandmet_m_mar", "4.Marital status", group),  
    group = ifelse(group == "gap2_demandmet_m_married_age", "5.Age*", group),   
    group = ifelse(group == "gap2_demandmet_m_married_reg", "6.Region*", group)
    )
suppressPackageStartupMessages(library(ggplot2))

ggplot(dtafiglong, aes(group, gap))+
    geom_boxplot()+
    geom_hline(yintercept = 0, color="red")+
    labs(y="% Point Difference in Met Demand")

3.2. Table

The table presents “Estimated Annual Changes in Disparity in Family Planning Demand Met With Modern Methods Between the Most- and Least-Advantaged Subgroupsa by Background Characteristic (Percentage-Point Change per Year)”.

WORK-IN_PROGRESS in r... 
    egen countryid=group(country)
    #delimit;       
    global varlist "gap_demandmet_m_married_edu 
                    gap_demandmet_m_married_hhw 
                    gap_demandmet_m_married_rurb 
                    gap_demandmet_m_mar
                    gap2_demandmet_m_married_age 
                    gap2_demandmet_m_married_reg";
    #delimit cr 
    
    foreach x of varlist $varlist { 
        xtreg `x' year, i(countryid)
        xtreg `x' year demandmet_m_married, i(countryid)    
        }

3.3. Figure 2

Figure 2 shows “Trends in National-Level Family Planning Demand With Modern Methods (%) and Disparity by Wealth (Percentage Point), by Region and Country”.

dtafig<-dtasum %>% 
    select(country, surveyid, year, demandmet_m_married,
           gap_demandmet_m_married_hhw, UNSDsubregion)
dtafig$region3<-"Other Regions"
dtafig<-dtafig %>% mutate(
        region3= ifelse(
                (UNSDsubregion=="Middle Africa" | UNSDsubregion=="Western Africa" ),
                "Central and Western Africa", 
                region3),       
        region3= ifelse(
                (UNSDsubregion=="Eastern Africa" | UNSDsubregion=="Southern Africa"),
                "Southern and Eastern Africa", 
                region3)
    )

table(dtafig$UNSDsubregion,dtafig$region3,exclude=NULL)

dtafig1 <- dtafig%>% filter(region3=="Central and Western Africa")
dtafig2 <- dtafig%>% filter(region3=="Southern and Eastern Africa")
dtafig3 <- dtafig%>% filter(region3=="Other Regions")
ggplot(dtafig1, aes(demandmet_m_married, gap_demandmet_m_married_hhw)) +             
    geom_point()+
    geom_line() +
    geom_text(aes(label=year), size=2, position = position_stack(0.9) ) +
    facet_wrap(~country, ncol=6) +
    geom_hline(yintercept = 0, color="red", linetype="dashed")+
    geom_vline(xintercept = 75, color="red", linetype="dashed")+
    labs(y= "Difference Between Wealth Quintiles \n(percentage point)",
         x= "National Average (%)") +
    ggtitle("Central and Western Africa") +
    theme(strip.text = element_text(size=rel(0.7)),
          legend.text = element_text(size=rel(0.7)),
          axis.text.x = element_text(size=rel(0.7)),
          axis.text.y = element_text(size=rel(0.7)))
Warning: Removed 2 rows containing missing values (position_stack).
Warning: Removed 2 rows containing missing values (geom_point).

ggplot(dtafig2, aes(demandmet_m_married, gap_demandmet_m_married_hhw)) +             
    geom_point()+
    geom_line() +
    geom_text(aes(label=year), size=2, position = position_stack(0.9) ) +
    facet_wrap(~country, ncol=6) +
    geom_hline(yintercept = 0, color="red", linetype="dashed")+
    geom_vline(xintercept = 75, color="red", linetype="dashed")+
    labs(y= "Difference Between Wealth Quintiles \n(percentage point)",
         x= "National Average (%)") +
    ggtitle("Southern and Eastern Africa") +
    theme(strip.text = element_text(size=rel(0.7)),
          legend.text = element_text(size=rel(0.7)),
          axis.text.x = element_text(size=rel(0.7)),
          axis.text.y = element_text(size=rel(0.7)))
Warning: Removed 3 rows containing missing values (position_stack).
Warning: Removed 3 rows containing missing values (geom_point).

ggplot(dtafig3, aes(demandmet_m_married, gap_demandmet_m_married_hhw)) +             
    geom_point()+
    geom_line() +
    geom_text(aes(label=year), size=2, position = position_stack(0.9) ) +
    facet_wrap(~country, ncol=6) +
    geom_hline(yintercept = 0, color="red", linetype="dashed")+
    geom_vline(xintercept = 75, color="red", linetype="dashed")+    
    labs(y= "Difference Between Wealth Quintiles \n(percentage point)",
         x= "National Average (%)") +
    ggtitle("Other Regions") +
    theme(strip.text = element_text(size=rel(0.7)),
          legend.text = element_text(size=rel(0.7)),
          axis.text.x = element_text(size=rel(0.7)),
          axis.text.y = element_text(size=rel(0.7)))
Warning: Removed 5 rows containing missing values (position_stack).
Warning: Removed 5 rows containing missing values (geom_point).

3.4. Figure 3

Figure 3 presents “Illustrative Examples of Varying Trends of Family Planning Demand Met With Modern Methods (%), Nationally and by Household Wealth Quintile”.

dtafig<-dtasum %>% filter(country=="Cameroon" | country=="Ethiopia" 
                          | country=="Madagascar" | country=="Nigeria")

dtafig$country <- as.factor(dtafig$country)
dtafig$country <- factor(dtafig$country, 
                         levels = c("Madagascar", "Ethiopia", "Cameroon", "Nigeria"))

dtafig<- dtafig %>% rename(national=demandmet_m_married)
dtafig<- dtafig %>% rename(poorest=min_demandmet_m_married_hhw )
dtafig<- dtafig %>% rename(richest=max_demandmet_m_married_hhw )

dim(dtafig)
dtafiglong<-gather(dtafig, group, demandmet_m_married,
                   c(national, richest, poorest ))
dtafiglong$group <- factor(dtafiglong$group, 
                         levels = c("richest", "national", "poorest"))
dim(dtafiglong)
ggplot(dtafiglong, aes(year, demandmet_m_married))+ 
    geom_line(aes(color=group))+
    facet_grid(. ~ country)+
    labs(y="Met Demand (%)")

3.5. Figure 4

Figure 4 is “Correlation Matrix Among Various Disparities in Family Planning Demand Met With Modern Methods, With Correlation Coefficient”. Only the 55 latest surveys are included.

varlist<-c( "gap_demandmet_m_married_edu",  "gap_demandmet_m_married_hhw", 
            "gap_demandmet_m_married_rurb", "gap_demandmet_m_mar", 
            "gap2_demandmet_m_married_age", "gap2_demandmet_m_married_reg")

dtafig<-dtasum %>% 
        filter(latest==1)%>%
        select(country, varlist) 
suppressPackageStartupMessages(library(GGally))
ggpairs(dtafig, columns=varlist,
    diag=list(continuous="density", discrete="bar"), axisLabels="show")

3.6. Figure 5

Figure 5 shows “Changes in the Distribution of Educational Attainment Among Female Population 6 Years and Older”. Graph area varies by the number of surveys and intervals between them.

suppressPackageStartupMessages(library(tidyr))
dtafig<-dtasum %>% 
        select(country, year, starts_with("edu"))%>% 
        select(-edu_dkmiss)
    
dtafiglong<-gather(dtafig, level, percent,
                   c(edu_none, edu_somepri, edu_comppri, 
                     edu_somesec, edu_compsec, edu_high))
dtafiglong$country <- as.factor(dtafiglong$country)
dtafiglong<-dtafiglong %>% 
    mutate(
    level = ifelse(level == "edu_none", "none", level),     
    level = ifelse(level == "edu_somepri", "some_primary", level),  
    level = ifelse(level == "edu_comppri", "complete_primary", level),  
    level = ifelse(level == "edu_somesec", "some_secondary", level),    
    level = ifelse(level == "edu_compsec", "complete_secondary", level),    
    level = ifelse(level == "edu_high", "higher", level)
    )
dtafiglong$level <- factor(dtafiglong$level, 
                         levels = c("higher", "complete_secondary",
                                    "some_secondary", "complete_primary",
                                    "some_primary", "none"))
ggplot(dtafiglong, aes(year, percent, fill=level)) + 
    geom_area()+
    scale_fill_manual(values=c("#006837", "#31a354", "#78c679", 
                               "#08519c", "#3182bd", "#b30000"))+
    facet_wrap(~country, ncol=8)+
    labs(y="Cumulative Distribution of Female Population (%)")+
    theme(strip.text = element_text(size=rel(0.7)),
          legend.text = element_text(size=rel(0.7))
          )

Note: In Figure 5 of the paper, data came from any countries that have two or more surveys. That includes a country like Burundi, which is not a study country for the paper.