Yoonjoung Choi
(Initially published on 2019-09-26. Updated on 2021-04-13)
This working paper is submitted for the PAA 2020 Annual Meeting. It is also available at R Pubs (http://rpubs.com/YJ_Choi/PAA2020) for easy access to the open source data visualization application (https://isquared.shinyapps.io/DHS_ContraceptiveDiscontinuation/) as well as other resources.
Throughout the life course, women choose to use, switch, or discontinue contraceptives. Understanding and communicating these contraceptive use dynamics data is essential for family planning programs. This paper aims to explore an approach to visualizing contraceptive use dynamics data and to create an open source, interactive data visualization application using data from 60 countries. Data are estimated twelve-month discontinuation rates, obtained from the DHS API because of its wide availability and easy accessibility. Sankey diagrams are used, given their relative simplicity, especially compared to chord diagrams. Color and order of methods are determined based on method effectiveness. The tool provides a Sankey diagram and summary statistics for each selected country. The diagrams highlight discontinuation 'while in need' and discontinuation due to method failure across countries. The tool is designed to initiate and facilitate discussion about contraceptive dynamics and, thus, to improve family planning programs.
The dynamic nature of contraceptive method use is a unique and essential aspect of family planning. Throughout the life course, women choose to use, switch, or discontinue contraceptives based on their changing fertility intention and fecundity. A critical additional factor that shapes these decisions is women's access to contraceptives (Choi, Fabic, and Adetunji 2016). Successful family planning programs address barriers to access and improve women's contraceptive choice throughout their reproductive years.
This means understanding and communicating the dynamic use data, in addition to cross-sectional metrics, is essential for family planning programs. The "leaking bucket" (Jain 2014) is an example that clearly illustrates: how individual women may belong to one category based on demand for family planning and contraceptive use at a given time; and how they - even those whose demand is met currently - may move to another category at different times. Panel data are required to study changing demand for family planning and contraceptive use over time at the individual-level: i.e., demand met (specifically contracepting), demand not met (a.k.a. unmet need), or no demand for family planning. To date, no study design has been widely used or standardized to measure these metrics prospectively across settings.
Cross-sectional surveys, nevertheless, have provided part of the information based on contraceptive calendar data (Ali, Cleland, and Shah 2012; Bradley, Winfrey, and Croft 2015), although it is fair and important to note limitations in using the calendar data. First, the data are only about contraceptive use dynamics, and it is impossible to know whether the dynamics correspond with changing demand for family planning over time. Second, advanced skills are needed to manage and analyze individual-level calendar data, though more resources have become available in recent years (The DHS Program 2018). Finally, the quality of the calendar data varies widely across surveys (Bradley, Winfrey, and Croft 2015). Despite these limitations, twelve-month discontinuation rates are calculated using the calendar data and are relatively widely available and accessible, giving us opportunities to communicate and mainstream contraceptive use dynamics into family planning programs. In addition, the metric was recently added as a core monitoring indicator for FP2020.
The Demographic and Health Surveys (DHS) Program has published the discontinuation rates routinely in its final reports. In this example of Tanzania DHS 2015-2016 (Table 1), contraceptive use continued in most episodes overall (see the bottom row of "All methods"), but about a quarter of episodes were discontinued for one of the seven reasons listed in the 2nd through 8th columns (see the "Any reason" column). The rate varies by method, ranging from 10% for implants to 34% for pills. In addition, six percent of episodes involved a switch to another method (see the second last column). While the table is routinely available for all surveys, interpretation of the information is not necessarily straightforward. Further, no figure visualizing this information is widely used among family planning program practitioners, although recently launched tools are welcome and promising to facilitate and advance our understanding of this topic (Duke University; Patierno and Woodin 2019).
Table 1. Twelve-month discontinuation rates reported in Tanzania 2015-2016 DHS Final Report (Source: Tanzania DHS 2015-2016)
The objectives of this paper are to explore an approach to visualizing contraceptive use dynamics data (specifically, twelve-month discontinuation rates), and to introduce an open source, interactive data visualization application using data from 60 countries. The tool can be a resource to better communicate and understand contraceptive use dynamics data and, thus, to improve family planning programs.
Data are twelve-month discontinuation rates, available from the DHS Program's Indicator Data API (The DHS Program). The rate is the percentage of episodes where the specific method is discontinued within 12 months after beginning its use, among women who ever used contraception in the five years before a survey. The rates are calculated using the contraceptive calendar data, which also include reasons for discontinuation. Further information about the calendar data is available elsewhere (Bradley, Winfrey, and Croft 2015; The DHS Program 2018). The rates are estimated among all episodes and also disaggregated by method.
The estimated rates available from the Indicator Data API are identical to those available at STATcompiler, but may include reason or method categories beyond those presented in a final report. As an example, data in the above Table 1 can be called via the API, shown in the html format here.
Currently, all standard DHS surveys collect the contraceptive calendar data, though its inclusion varied in the past (Bradley, Winfrey, and Croft 2015). As of September 2019, 60 countries have conducted at least one standard DHS survey that included calendar data. For demonstration purposes, Tanzania DHS 2015-2016 data are used. For the data visualization application, data from all 60 countries are included.
One approach to visualizing discontinuation rates is to present flows from the baseline to the endline after 12 months. Each episode begins with a contraceptive method and ends either in the same method, a different method (i.e., switching), or no method (i.e., discontinuation). The matrix from baseline to endline is structurally similar to a matrix of migration data: flows from origin to destination. Thus two data visualization approaches for migration data are reviewed: chord and Sankey diagrams.
A chord diagram is "a graphical method of displaying the inter-relationships between data in a matrix" (https://en.wikipedia.org/wiki/Chord_diagram). It has been used in global migration studies, presenting complex flow data across global regions or even countries (Abel and Sander 2014; Sander et al. 2014). The diagram has become popular in data visualization for various topics, thanks to its ability to portray very complex matrix data and perhaps also its aesthetic strength. However, it is a complex figure, and the in and out flows in each circular segment are not necessarily quick to understand - although it is easier to walk through when the diagram is interactive, as shown in this example (Sander, Abel, and Bauer). Another option is a Sankey diagram, named after an engineer from a century ago. It displays flows with arrows or lines whose width is proportional to the flow quantity (https://en.wikipedia.org/wiki/Sankey_diagram). The lines can be combined or split along their paths at each stage of a process.
Both diagrams can be used to visualize migration flows (from Data to Viz b) - for example, migration across ten regions (Abel 2018) (presented in Table 2). In the matrix, rows are origins, and columns represent destination regions. The chord diagram has 10 segments in the circle, representing the 10 regions (Figure 1). Each segment shows migration to and from the region, distinguished by the base and direction of chords. The Sankey diagram on the right side shows flows relatively more clearly, because it separates origin and destination into essentially two stacked bars - though it would have been even clearer, had consistent region colors been used for both origin and destination.
Table 2. Migration flow quantity by origin and destination between 1960 and 2015 (Source: Abel 2018; https://www.data-to-viz.com/story/AdjacencyMatrix.html)
Figure 1. Same data, different visualization: Chord and Sankey diagrams based on same migration data in Table 2 (Source: https://www.data-to-viz.com/graph/chord.html and https://www.data-to-viz.com/graph/sankey.html)
Based on its relative visual simplicity, the Sankey diagram is utilized to present the baseline and endline data from the discontinuation rates.
To create the Sankey diagram, contraceptive use episode data need to be structured in a matrix, with rows representing a baseline method and columns representing a method at the endline, including no method - i.e., discontinuation. Each cell in the matrix contains the number of episodes, calculated based on the number of method-specific episodes (i.e., the last column in Table 1) and the discontinuation or switching rate. Thus, the sum of the numbers across all cells is the total number of use episodes analyzed.
Discontinuation in this paper refers to not using any method at the endline and, thus, excludes switching. Discontinuation is further divided into three categories, based on the reasons: discontinuation while 'not in need' for family planning, discontinuation while 'in need', and discontinuation due to pregnancy (i.e., method failure). 'Not in need' includes reasons indicating changes in a woman's fertility intention or perceived fecundity, specifically: desire to become pregnant; infrequent sex/husband away; difficult to get pregnant/menopausal; and marital dissolution/separation. The remaining discontinuation (i.e., Any-reason discontinuation - switching - method failure - discontinuation due to 'not in need' reasons) is considered discontinuation while 'in need'.
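The categorization above can be illustrated with a short worked example. This is a minimal sketch only: the rates and the number of episodes below are hypothetical values chosen for easy arithmetic, not estimates from any survey, and the variable names are illustrative.

```r
# Hypothetical 12-month rates (percent of episodes) for a single method.
# These values are illustrative only and are not taken from any survey.
rates <- c(
  any_reason  = 30,  # discontinued for any reason, including switching
  switched    = 5,   # switched to another method
  failure     = 4,   # became pregnant while using (method failure)
  not_in_need = 9    # desire to become pregnant + other fertility-related reasons
)
episodes <- 1000     # hypothetical number of episodes for this method

# Discontinuation as defined in this paper excludes switching
discontinued <- rates[["any_reason"]] - rates[["switched"]]

# Discontinuation while 'in need' is the remainder
in_need <- discontinued - rates[["failure"]] - rates[["not_in_need"]]

# Convert percentages into numbers of episodes
counts <- c(
  failure     = rates[["failure"]],
  in_need     = in_need,
  not_in_need = rates[["not_in_need"]]
) * episodes / 100
print(counts)
```

The three counts add up to the number of discontinued episodes (any-reason discontinuation minus switching), which is the property checked in Appendix code with the `confirm` variable.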
The number of switching episodes was calculated as the total number of method-specific episodes (i.e., the last column in Table 1) multiplied by the percent switched to another method (i.e., second last column in Table 1). Then the total switching episodes were allocated evenly across all other methods, with a crude assumption that women chose a new method randomly. This assumption made it possible to use available rates from the DHS API and avoid additional analysis of the calendar data, which is in keeping with the main purpose of the paper and diagram. Although the assumption is unlikely to hold, the number of switching episodes is relatively small, and its magnitude is difficult to differentiate visually in the diagram.
In the example of Tanzania DHS 2015-2016, the baseline-endline contraceptive dynamics matrix includes 9 rows (methods at baseline) and 12 columns (9 methods + 3 discontinuation categories at endline) (Table 3). The sum of all cells is 6,825, as shown in Table 1.
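The even-allocation assumption can be sketched as follows. The four methods, episode counts, and switching percentages are hypothetical and serve only to show the mechanics; the object names (`per_target`, `switch_matrix`) are illustrative and do not appear in the paper's code.

```r
# Hypothetical inputs: four methods, their episode counts, and percent switched
methods      <- c("Implants", "Injectables", "Pills", "Condom")
episodes     <- c(Implants = 400, Injectables = 900, Pills = 600, Condom = 300)
pct_switched <- c(Implants = 3,   Injectables = 6,   Pills = 9,   Condom = 6)

# Number of switching episodes per baseline method
switched <- episodes * pct_switched / 100

# Crude assumption: switchers are spread evenly over all OTHER methods
per_target <- switched / (length(methods) - 1)

# Rows are baseline methods, columns are endline methods.
# The matrix is filled column by column, so each row carries its own value.
switch_matrix <- matrix(
  rep(per_target, times = length(methods)),
  nrow = length(methods),
  dimnames = list(baseline = methods, endline = methods)
)

# A method cannot be "switched" to itself
diag(switch_matrix) <- 0
print(switch_matrix)
```

Each row of `switch_matrix` sums to the number of switching episodes for that baseline method, so the allocation does not change the total number of episodes.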
Table 3. Contraceptive flow matrix: number of contraceptive use episodes by baseline and endline method (Note: One difference between the published table and API data is that the "Other" row in the table is split into two in API: other modern methods, and other traditional methods. So, there are 9 methods/rows, instead of 8 in the above Table 1.)
R version 3.5.2 was used for data access, management, and analysis. Code is available in Appendices A-B.
R library networkD3 (Gandrud et al. 2017) was used to create a diagram. Baseline methods are presented on the left, with endline on the right side. Contraceptive methods are listed according to the order of method effectiveness. Modern and traditional methods are presented in blue and green shades, respectively. Discontinuation at the endline is presented in red, orange, and yellow.
The code below was heavily adapted from an example of a migration Sankey diagram (from Data to Viz c).
# Libraries
library(tidyverse)
library(networkD3)
# Reshape the matrix data to long format
data_long <- matrix %>%
  rownames_to_column %>%
  gather(key = 'key', value = 'value', -rowname) %>%
  filter(value > 0)
colnames(data_long) <- c("source", "target", "value")
# Append a space to target names so endline nodes are distinct from baseline nodes
data_long$target <- paste(data_long$target, " ", sep="")
# Create a node data frame, listing every entity involved in the flow
nodes <- data.frame(name=c(as.character(data_long$source),
                           as.character(data_long$target)) %>%
                      unique()
)
# With networkD3, connection must be provided using id, not using real name like in the links dataframe. So we need to reformat it.
data_long$IDsource=match(data_long$source, nodes$name)-1
data_long$IDtarget=match(data_long$target, nodes$name)-1
# Prepare colour scale (6-class Blues, 3-class Greens, & YlOrRd)
ColourScal ='d3.scaleOrdinal() .range(["#084594","#2171b5","#4292c6","#6baed6","#9ecae1","#c6dbef","#238b45","#41ab5d","#74c476","#b10026", "#fc4e2a","#ffffb2"])'
# Make the Network
# set "iterations=0" to avoid automatic assignment of the box order
sankeyNetwork(Links = data_long, Nodes = nodes,
Source = "IDsource", Target = "IDtarget",
Value = "value", NodeID = "name",
sinksRight=FALSE, colourScale=ColourScal,
nodeWidth=40, fontSize=13, nodePadding=20,
iterations=0
)
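The node and ID preparation in the code above can be seen more easily on a toy example. The sketch below uses base R only and three hypothetical flows; it is meant to show why a trailing space is appended to target names and why 1 is subtracted from the result of `match()` (networkD3 expects zero-based node indices).

```r
# Three hypothetical flows; target names carry a trailing space so that
# "Pills" at baseline and "Pills " at endline become separate nodes
links <- data.frame(
  source = c("Pills", "Pills", "Condom"),
  target = c("Pills ", "Discontinuation ", "Condom "),
  value  = c(70, 30, 50),
  stringsAsFactors = FALSE
)

# One row per distinct node, sources first, then targets
nodes <- data.frame(
  name = unique(c(links$source, links$target)),
  stringsAsFactors = FALSE
)

# match() returns 1-based positions; subtract 1 for zero-based IDs
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1

print(nodes)
print(links)
```

Without the trailing space, "Pills" would be a single node that is both a source and a target, and the diagram would no longer separate baseline from endline.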
Using RStudio Shiny, an application was developed that applies the above approach to all available countries. Users select one country, and the Sankey diagram and summary statistics are then populated. Its open source code is available on GitHub.
Figure 2. Contraceptive continuation, switching and discontinuation based on 12-month discontinuation rates in Tanzania DHS 2015-2016
The diagram shows how contraceptive use by each method on the left changed (or remained unchanged) 12 months after starting its use. Overall, most episodes continued for 12 months after the beginning of use. But:
Discontinuation:
- About 20% of episodes discontinued. (Note: discontinuation in this diagram does not include switching, and, thus the percentage of discontinuation is different from estimates in the report, Table 1 above)
- Of those discontinued episodes, about 35% were discontinued because there was no longer a need for family planning (yellow).
- The rest, however, were discontinued while there was need for family planning (orange: 48% of total discontinuation, or 10% of total use episodes) or because of method failure (red: 17% of total discontinuation, or 3% of total use episodes).
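The percentages of total use episodes quoted above follow from simple arithmetic on the rounded figures in the text. The short sketch below reproduces that arithmetic; it uses only the rounded percentages reported here, so results are approximate.

```r
# Rounded figures quoted in the text for Tanzania DHS 2015-2016
pct_discontinued <- 20                      # percent of all episodes discontinued
share <- c(not_in_need = 35,                # percent of discontinued episodes
           in_need     = 48,
           failure     = 17)

# Express each category as a percentage of all use episodes
pct_of_total <- pct_discontinued * share / 100
print(round(pct_of_total, 1))
```

The 'in need' and failure values (9.6 and 3.4) round to the 10% and 3% of total use episodes reported above.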
Switching:
- Overall, there are not many switching cases, only 6.4% of total use episodes.
- Because the diagram visualizes the total number of episodes, it is difficult to distinguish differences in the magnitude of switching, at least in this example of Tanzania.
Note about the diagram:
- To understand the diagram clearly, it is best to hover over each method on the left side and see the flows from the method to the right side.
- Height of each box on the left (i.e., baseline) represents the sum of each row in the matrix (Figure 2).
- Height of each box on the right (i.e., endline) is the sum of each column in the matrix (Figure 2). The right side looks longer only because it has more categories (12 at the endline vs. 9 at the baseline); the total height is the same at baseline and endline.
- Again, the flow thickness is nearly impossible to differentiate, when the volume is low.
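The relationship between box heights and the flow matrix can be checked with row and column sums. The matrix below is a hypothetical two-method example, not the Tanzania data, and the object name `flows` is illustrative.

```r
# Hypothetical flow matrix: rows are baseline methods, columns are endline states
flows <- matrix(
  c(80,  5, 15,
    10, 60, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    baseline = c("Pills", "Condom"),
    endline  = c("Pills", "Condom", "No method")
  )
)

left_heights  <- rowSums(flows)  # height of each baseline box
right_heights <- colSums(flows)  # height of each endline box

print(left_heights)
print(right_heights)

# Both sides account for the same total number of episodes
print(sum(left_heights) == sum(right_heights))
```

The endline side has more boxes, so it takes up more vertical space once padding between boxes is added, but the summed heights are identical.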
The application is available at: https://isquared.shinyapps.io/DHS_ContraceptiveDiscontinuation/ (Figure 3). It provides a Sankey diagram, summary statistics, and brief programmatic implications for each selected country.
Figure 3. Interactive Visualization of Contraceptive Dynamics
This paper explores graphic presentation approaches for contraceptive discontinuation data. Accessible open API data are used, to maximize use of existing resources. Sankey diagrams are chosen, given their relative simplicity compared to chord diagrams. The interactive diagrams highlight discontinuation 'while in need' and discontinuation due to method failure. Such discontinuation can be reduced by addressing the reasons - including ensuring women's ability to switch to other methods effectively, when desired.
The interactive visualization is a tool to facilitate discussion about contraceptive dynamics - and eventually and ideally family planning dynamics, incorporating data on demand for family planning in the future. Data visualization tools have different purposes and target audiences, and their value and impact depend partly on whether a tool is used by the intended users for the intended purposes. Recently launched tools on contraceptive dynamics reflect increasing interest in this topic and should be promoted for appropriate use. Any data tool, however, is meant to initiate and facilitate discussion and does not automatically produce recommendations for programs or policies.
Nevertheless, there are specific improvements that are planned to be made in the interactive application, at the latest prior to the PAA 2020 meeting:
- First, addition of a method-specific zoom-in function. While the current figure provides overall messages, the number and varying thickness of flows prevent users from exploring more detailed information. This zoom-in or selection function will address some of the problem. In addition, this function will be applied only when the total number of episodes exceeds a threshold (TBD), in order to avoid any discussion or conclusion based on a small sample size.
- Addition of value labels (the number of episodes and percentage), in the zoom-in phase described above. No value label will be used for switching flows, as their calculation was based on an assumption.
- Improving program implications. Based on broad groupings of results across countries, automating more tailored messages - or discussion guides - will be explored.
In the longer term, analysis of the calendar data, instead of using rates from the DHS API, will be considered. While time and labor intensive, such an approach will give flexibility in terms of selecting data (e.g., 0-3 years before the survey vs. 0-5 years, per data quality concerns) and calculating switching rates specific to baseline and endline methods.
Abel GJ. Estimates of Global Bilateral Migration Flows by Gender between 1960 and 2015. International Migration Review. 2018 Aug;52(3): https://doi.org/10.1111/imre.12327
Abel GJ, Sander N. Quantifying global international migration flows. Science. 2014 Mar;343(6178):1520-2
Bradley S, Winfrey W, Croft T. Contraceptive Use and Perinatal Mortality in the DHS: An Assessment of the Quality and Consistency of Calendars and Histories. 2015. DHS Methodological Reports. No. 17. Rockville, Maryland, USA: ICF International
Choi Y, Fabic M, Adetunji J. Measuring Access to Family Planning Services in Demographic and Health Surveys: Lessons and Challenges. Studies in Family Planning. 2016 Jun;47(2):145-61
Duke University. Big Data for Reproductive Health. https://sites.google.com/view/bd4rh/home
From Data to Viz a. Chord Diagram. Available at: https://www.data-to-viz.com/graph/chord.html
From Data to Viz b. Researchers Network and Migration Flow. Available at: https://www.data-to-viz.com/story/AdjacencyMatrix.html
From Data to Viz c. Sankey Diagram. Available at: https://www.data-to-viz.com/graph/sankey.html
Gandrud C, Allaire J, Russell K, and Yetman CJ. networkD3: D3 JavaScript Network Graphs from R. 2017-03-18. Available at: http://christophergandrud.github.io/networkD3/
Gu Z. Circlize implements and enhances circular visualization in R. Bioinformatics. 2014 Oct;30(19):2811-2
Jain A. The leaking bucket phenomenon in family planning. 2014. Available at: https://champions4choice.org/2014/09/the-leaking-bucket-phenomenon-in-family-planning/
Patierno K, and Woodin J. Choices and Challenges: Dynamics of Contraceptive Use. September 25, 2019. Available at: https://www.prb.org/use-dynamics/
R Studio. D3 JavaScript Network Graphs from R. Version 3.0. https://cran.rstudio.com/web/packages/networkD3/README.html
Sander N, Abel GJ, and Bauer R. The Global Flow of People. Available at: http://download.gsb.bund.de/BIB/global_flow/
Sander N, Abel GJ, Bauer R, and Schmidt J. Visualising Migration Flow Data with Circular Plots. Vienna Institute of Demography Working Papers. February 2014 Available at: https://www.oeaw.ac.at/fileadmin/subsites/Institute/VID/PDF/Publications/Working_Papers/WP2014_02.pdf
The DHS Program. DHS Contraceptive Calendar Tutorial (Version 2). 2018. Available at: https://www.dhsprogram.com/data/calendar-tutorial/
The DHS Program. Indicator Data API. Available at: http://api.dhsprogram.com/#/index.html
# Get required functions
library(jsonlite)
library(data.table)
library(dplyr)
# Call API data for 10 indicators (9 columns + the denominator)
# Save each dataframe
url<-("http://api.dhsprogram.com/rest/dhs/data?f=json&indicatorIds=FP_DISR_W_PRG&surveyids=TZ2015DHS&breakdown=all&perpage=1000")
jsondata<-fromJSON(url)
dta<-data.table(jsondata$Data)
dta<-select(dta, CountryName, SurveyId, Value,
CharacteristicCategory, CharacteristicLabel)
FP_DISR_W_PRG<- dta %>% rename(FP_DISR_W_PRG=Value)
url<-("http://api.dhsprogram.com/rest/dhs/data?f=json&indicatorIds=FP_DISR_W_DES&surveyids=TZ2015DHS&breakdown=all&perpage=1000")
jsondata<-fromJSON(url)
dta<-data.table(jsondata$Data)
dta<-select(dta, CountryName, SurveyId, Value,
CharacteristicCategory, CharacteristicLabel)
FP_DISR_W_DES<- dta %>% rename(FP_DISR_W_DES=Value)
url<-("http://api.dhsprogram.com/rest/dhs/data?f=json&indicatorIds=FP_DISR_W_FRT&surveyids=TZ2015DHS&breakdown=all&perpage=1000")
jsondata<-fromJSON(url)
dta<-data.table(jsondata$Data)
dta<-select(dta, CountryName, SurveyId, Value,
CharacteristicCategory, CharacteristicLabel)
FP_DISR_W_FRT<- dta %>% rename(FP_DISR_W_FRT=Value)
url<-("http://api.dhsprogram.com/rest/dhs/data?f=json&indicatorIds=FP_DISR_W_SID&surveyids=TZ2015DHS&breakdown=all&perpage=1000")
jsondata<-fromJSON(url)
dta<-data.table(jsondata$Data)
dta<-select(dta, CountryName, SurveyId, Value,
CharacteristicCategory, CharacteristicLabel)
FP_DISR_W_SID<- dta %>% rename(FP_DISR_W_SID=Value)
url<-("http://api.dhsprogram.com/rest/dhs/data?f=json&indicatorIds=FP_DISR_W_WME&surveyids=TZ2015DHS&breakdown=all&perpage=1000")
jsondata<-fromJSON(url)
dta<-data.table(jsondata$Data)
dta<-select(dta, CountryName, SurveyId, Value,
CharacteristicCategory, CharacteristicLabel)
FP_DISR_W_WME<- dta %>% rename(FP_DISR_W_WME=Value)
url<-("http://api.dhsprogram.com/rest/dhs/data?f=json&indicatorIds=FP_DISR_W_MET&surveyids=TZ2015DHS&breakdown=all&perpage=1000")
jsondata<-fromJSON(url)
dta<-data.table(jsondata$Data)
dta<-select(dta, CountryName, SurveyId, Value,
CharacteristicCategory, CharacteristicLabel)
FP_DISR_W_MET<- dta %>% rename(FP_DISR_W_MET=Value)
url<-("http://api.dhsprogram.com/rest/dhs/data?f=json&indicatorIds=FP_DISR_W_OTH&surveyids=TZ2015DHS&breakdown=all&perpage=1000")
jsondata<-fromJSON(url)
dta<-data.table(jsondata$Data)
dta<-select(dta, CountryName, SurveyId, Value,
CharacteristicCategory, CharacteristicLabel)
FP_DISR_W_OTH<- dta %>% rename(FP_DISR_W_OTH=Value)
url<-("http://api.dhsprogram.com/rest/dhs/data?f=json&indicatorIds=FP_DISR_W_ANY&surveyids=TZ2015DHS&breakdown=all&perpage=1000")
jsondata<-fromJSON(url)
dta<-data.table(jsondata$Data)
dta<-select(dta, CountryName, SurveyId, Value,
CharacteristicCategory, CharacteristicLabel)
FP_DISR_W_ANY<- dta %>% rename(FP_DISR_W_ANY=Value)
url<-("http://api.dhsprogram.com/rest/dhs/data?f=json&indicatorIds=FP_DISR_W_SWH&surveyids=TZ2015DHS&breakdown=all&perpage=1000")
jsondata<-fromJSON(url)
dta<-data.table(jsondata$Data)
dta<-select(dta, CountryName, SurveyId, Value,
CharacteristicCategory, CharacteristicLabel)
FP_DISR_W_SWH<- dta %>% rename(FP_DISR_W_SWH=Value)
url<-("http://api.dhsprogram.com/rest/dhs/data?f=json&indicatorIds=FP_DISR_W_NUM&surveyids=TZ2015DHS&breakdown=all&perpage=1000")
jsondata<-fromJSON(url)
dta<-data.table(jsondata$Data)
dta<-select(dta, CountryName, SurveyId, Value,
CharacteristicCategory, CharacteristicLabel)
FP_DISR_W_NUM<- dta %>% rename(FP_DISR_W_NUM=Value)
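The ten calls above differ only in the indicator ID, so the request URLs could also be generated from a vector of IDs. The sketch below is one possible way to do this and is not the code used for the paper; `build_dhs_url` is a hypothetical helper name, and the block only constructs the URLs without calling the API.

```r
# Indicator IDs used in the calls above (9 rate columns + the denominator)
indicators <- c("FP_DISR_W_PRG", "FP_DISR_W_DES", "FP_DISR_W_FRT",
                "FP_DISR_W_SID", "FP_DISR_W_WME", "FP_DISR_W_MET",
                "FP_DISR_W_OTH", "FP_DISR_W_ANY", "FP_DISR_W_SWH",
                "FP_DISR_W_NUM")

# Hypothetical helper: build the request URL for one indicator and survey
build_dhs_url <- function(indicator, survey = "TZ2015DHS") {
  paste0("http://api.dhsprogram.com/rest/dhs/data?f=json",
         "&indicatorIds=", indicator,
         "&surveyids=", survey,
         "&breakdown=all&perpage=1000")
}

# Named character vector of URLs, one per indicator
urls <- vapply(indicators, build_dhs_url, character(1))
print(urls[["FP_DISR_W_PRG"]])
```

Each element of `urls` could then be passed to `fromJSON()` in the same way as in the calls above, for example inside `lapply()`, with the resulting `Value` column renamed to the indicator ID.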
Clean and manage the data frame.
library(dplyr)
# Merge 10 dataframes
idvars<-c("CountryName", "SurveyId", "CharacteristicCategory", "CharacteristicLabel")
dtaapi<-FP_DISR_W_PRG %>%
full_join(FP_DISR_W_DES, by =idvars) %>%
full_join(FP_DISR_W_FRT, by =idvars) %>%
full_join(FP_DISR_W_SID, by =idvars) %>%
full_join(FP_DISR_W_WME, by =idvars) %>%
full_join(FP_DISR_W_MET, by =idvars) %>%
full_join(FP_DISR_W_OTH, by =idvars) %>%
full_join(FP_DISR_W_ANY, by =idvars) %>%
full_join(FP_DISR_W_SWH, by =idvars) %>%
full_join(FP_DISR_W_NUM, by =idvars)
# Tidy data: rename and clean variable names
dta<-dtaapi %>%
rename ( xprg= FP_DISR_W_PRG) %>%
rename ( xdes= FP_DISR_W_DES) %>%
rename ( xfrt= FP_DISR_W_FRT) %>%
rename ( xsid= FP_DISR_W_SID) %>%
rename ( xwme= FP_DISR_W_WME) %>%
rename ( xmet= FP_DISR_W_MET) %>%
rename ( xoth= FP_DISR_W_OTH) %>%
rename ( xany= FP_DISR_W_ANY) %>%
rename ( xswh= FP_DISR_W_SWH) %>%
rename ( denom= FP_DISR_W_NUM) %>%
rename (country = CountryName) %>%
rename (group = CharacteristicCategory) %>%
rename (grouplabel = CharacteristicLabel)
colnames(dta)<-tolower(names(dta))
# Keep only estimates by contraceptive methods (i.e., drop the total row)
table(dta$group)
dta<-dta %>% filter(group=="Contraceptive method")
# Create effectiveness order and sort by it
dta<-dta %>% mutate(
order=0,
order= ifelse(grouplabel == "Female sterilization",1, order),
order= ifelse(grouplabel == "IUD",2, order),
order= ifelse(grouplabel == "Implants",3, order),
order= ifelse(grouplabel == "Injectables",4, order),
order= ifelse(grouplabel == "Pill",5, order),
order= ifelse(grouplabel == "Condom",6, order),
order= ifelse(grouplabel == "Other modern methods",7, order),
order= ifelse(grouplabel == "Periodic abstinence",8, order),
order= ifelse(grouplabel == "Withdrawal",9, order),
order= ifelse(grouplabel == "Other traditional methods",10, order))
dta<-arrange(dta, order)
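The effectiveness order assigned above with a chain of `ifelse()` calls could alternatively be derived from a single lookup vector. This is a sketch of that alternative in base R, using a small hypothetical data frame (`toy`) rather than the API data. One difference to note: labels that are not in the lookup vector receive `NA` here, rather than 0 as in the code above.

```r
# Method labels in order of effectiveness, as in the code above
effectiveness_order <- c("Female sterilization", "IUD", "Implants",
                         "Injectables", "Pill", "Condom",
                         "Other modern methods", "Periodic abstinence",
                         "Withdrawal", "Other traditional methods")

# Hypothetical data frame with methods in arbitrary order
toy <- data.frame(
  grouplabel = c("Pill", "Withdrawal", "Implants", "Condom"),
  stringsAsFactors = FALSE
)

# Position of each label in the lookup vector gives its rank
toy$order <- match(toy$grouplabel, effectiveness_order)

# Sort rows by effectiveness rank
toy <- toy[order(toy$order), ]
print(toy)
```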
Calculate the number of episodes for each cell in the matrix.
# Analysis
dta<-dta %>%
mutate (
# Calculate number of episodes discontinued
Discontinuation=denom*(xany-xswh)/100,
DiscontinuationNotInNeed=denom*(xdes+xfrt)/100,
DiscontinuationFailure=denom*(xprg)/100,
DiscontinuationInNeed=Discontinuation - DiscontinuationNotInNeed - DiscontinuationFailure,
# Calculate number of episodes switched to each method
FemaleSterilization=((denom*xswh/100) / (nrow(dta)-1) ),
Implants=((denom*xswh/100) / (nrow(dta)-1) ),
Injectables=((denom*xswh/100) / (nrow(dta)-1) ),
Pills=((denom*xswh/100) / (nrow(dta)-1) ),
Condom=((denom*xswh/100) / (nrow(dta)-1) ),
OtherModern=((denom*xswh/100) / (nrow(dta)-1) ),
Rhythm=((denom*xswh/100) / (nrow(dta)-1) ),
Withdrawal=((denom*xswh/100) / (nrow(dta)-1) ),
OtherTraditional=((denom*xswh/100) / (nrow(dta)-1) ),
# Calculate number of episodes continued
continue=denom*(100-xany)/100,
FemaleSterilization= ifelse(grouplabel == "Female sterilization",
continue, FemaleSterilization),
Implants= ifelse(grouplabel == "Implants",
continue, Implants),
Injectables= ifelse(grouplabel == "Injectables",
continue, Injectables),
Pills= ifelse(grouplabel == "Pill",
continue, Pills),
Condom= ifelse(grouplabel == "Condom",
continue, Condom),
OtherModern= ifelse(grouplabel == "Other modern methods",
continue, OtherModern),
Rhythm= ifelse(grouplabel == "Periodic abstinence",
continue, Rhythm),
Withdrawal= ifelse(grouplabel == "Withdrawal",
continue, Withdrawal),
OtherTraditional= ifelse(grouplabel == "Other traditional methods",
continue, OtherTraditional),
# check if test==denom
test=DiscontinuationNotInNeed+DiscontinuationInNeed+DiscontinuationFailure+FemaleSterilization+Implants+Injectables+Pills+Condom+OtherModern+Rhythm+Withdrawal+OtherTraditional,
confirm=round(test-denom, 1)
)
table(dta$confirm) # this should be 0 (or reasonably close to 0)
Prepare a matrix ready for the diagram.
# Select only relevant variables/columns
matrix<-dta %>%
select(grouplabel,
FemaleSterilization, Implants, Injectables,
Pills, Condom, OtherModern,
Rhythm, Withdrawal, OtherTraditional,
DiscontinuationFailure, DiscontinuationInNeed,
DiscontinuationNotInNeed) %>%
mutate_if(is.numeric, round, 1)
# Label rows and check against the grouplabel
rownames(matrix) <- c("FemaleSterilization","Implants","Injectables","Pills","Condom","OtherModern","Rhythm","Withdrawal","OtherTraditional")
# Final matrix
matrix<-matrix %>% select(-grouplabel)
View(matrix)
One additional adjustment:
It is important to maintain the same order of methods, determined by method effectiveness, on both sides of the diagram. The order on the left side is determined by the first target on the right side; if any cell is 0 in the first column of the matrix (which is the first target/destination), the data frame loses its order on the left side of the Sankey diagram. Thus, if there is any 0 in the first column cells, assign a small artificial number - at least for now.
# Recode sterilization 0=0.00001
matrix<-matrix %>% mutate(
FemaleSterilization= ifelse(FemaleSterilization == 0,
0.00001,
FemaleSterilization))
head(matrix[,1], 10)
# Again, label rows and check against the grouplabel
rownames(matrix) <- c("FemaleSterilization","Implants","Injectables","Pills","Condom","OtherModern","Rhythm","Withdrawal","OtherTraditional")
View(matrix)