Author: YJ Choi
Last updated: 2019-06-23

Family planning dynamics: Chord or Sankey?

This post demonstrates and compares two approaches to visualizing family planning dynamics data, specifically discontinuation and switching data from DHS.

1. Visualizing dynamic data: chord and Sankey diagrams

A chord diagram is “a graphical method of displaying the inter-relationships between data in a matrix.” This paper on global migration flow shows a great example of how the diagram can effectively present complex “flow data” on migration across global regions (Figure 1). An even more detailed version with country-level migration is also possible.

Figure 1. Classic example of a chord diagram visualizing inter-related flow data (Abel and Sander, 2014)
(Source: Abel GJ, Sander N. Quantifying global international migration flows. Science. 2014 Mar 28;343(6178):1520-2)

The diagram has become popular in data visualization for various topics, thanks to its ability to portray very complex matrix data and perhaps also its aesthetic strength. However, it is indeed a complex figure and is generally hard to understand quickly, although it is much easier when the diagram is interactive: click here for an amazing interactive version of the migration figure.

There are interesting and insightful discussions about the complexity of chord diagrams, like this and this. Some explore and use other diagrams to visualize “flow data” equally or even more effectively, depending of course on the specific objectives of the data analysis (and thus the communication of the results). One example is the Sankey diagram, named after an engineer and captain from a century ago.

The figure below shows how the same migration data from this paper can be visualized in chord and Sankey diagrams. Figure 3 is the flow data matrix. (NOTE: the two diagrams were extracted from these excellent sites on data visualization: here and here.)

Figure 2. Same data, different visualization: Chord and Sankey diagrams based on the same migration data (Source: https://www.data-to-viz.com/graph/chord.html and https://www.data-to-viz.com/graph/sankey.html)

Figure 3. Matrix of migration

In this example of migration across regions between two time points, the “flow” is relatively easier to understand in the Sankey diagram on the right, because it separates the beginning and end of the period into essentially two stacked bars. Data users do not need to understand chord color and base arc length, as they do in the chord diagram on the left. Nevertheless, in my opinion, a consistent box color for each region on both sides might be even easier to understand. The Sankey diagram can also be used to visualize multi-period flow data like here, beyond just two states.

2. Family planning use and demand dynamics

As clearly illustrated in the “leaking bucket” figure, women’s contraceptive use and demand for family planning change over time. Without panel data, it is difficult to obtain flow information on family planning use and demand: i.e., movement among demand met (specifically, contracepting), demand not met (a.k.a. “unmet need”), and no demand for family planning.

Figure 4. Leaking Bucket (Source: Jain A. The leaking bucket phenomenon in family planning. 2014. https://champions4choice.org/2014/09/the-leaking-bucket-phenomenon-in-family-planning/)

But cross-sectional surveys, including the most well-known Demographic and Health Surveys, have provided flow information among women who used contraception at some point in the five years before a survey: i.e., whether they continued using a method, switched to another method, or discontinued use. More about the survey’s “calendar data” is here.

It is often measured in 12-month “rates” (i.e., percent of episodes discontinued/switched within the time frame), and the table below shows how the data are typically presented. In this example, overall (see the bottom row, “All methods”), contraceptive use continued in most episodes, but about a quarter of episodes were discontinued (for one of the 7 reasons listed in the columns). Six percent of the episodes, which are also counted in the ‘any reason’ discontinuation column, involved a switch to another method. The rate of discontinuation varies by method, ranging from 10% for implants to 34% for pills (of course excluding the 0% discontinuation among female sterilization cases). It is highly recommended to read the footnote of this table very carefully.

Figure 5. 12-month discontinuation data from Tanzania 2015-2016 DHS (Source: Tanzania DHS 2015-2016)

Given the dynamic nature of the data, a flow diagram would be a good visualization approach. The next section presents reproducible R code to generate the two diagrams using the above data from Tanzania 2015-2016 DHS.

But, wait! There is missing information to create the diagrams: detailed switching information. For example, among those pill use episodes that switched to other methods (second last column), we need the percent distribution by the new method. That information is actually available in the “calendar data” but typically not tabulated routinely. Since my purpose here is to demonstrate and compare visualization approaches, no individual-level data are analyzed to calculate that information (which by itself is a significant and complex data analysis activity!). Rather, the tabulated data are used with a simple, though unrealistic, assumption: the new method is chosen randomly (clearly not the case in the real world), and, thus, switching to each of the other methods has the same probability.
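The equal-probability assumption can be sketched in a few lines of base R. This is only an illustration, not the code used for this post: the helper allocate_switching is hypothetical, only four methods are shown instead of nine, and the numbers of switched episodes are made up.

```r
# Illustrative subset of methods (hypothetical; the real matrix has 9)
methods <- c("Implants", "Injectables", "Pill", "Male condom")

# Made-up number of episodes per source method that switched within 12 months
switched <- c(Implants = 12, Injectables = 60, Pill = 45, `Male condom` = 30)

# Spread each method's switched episodes equally over all OTHER methods
allocate_switching <- function(switched, methods) {
  k <- length(methods)
  m <- matrix(0, nrow = k, ncol = k, dimnames = list(methods, methods))
  for (src in methods) {
    others <- setdiff(methods, src)
    m[src, others] <- switched[[src]] / length(others)
  }
  m
}

switch_matrix <- allocate_switching(switched, methods)
switch_matrix
```

Each row sums to the number of switched episodes for that method, and the diagonal stays at zero because an episode cannot switch to its own method.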

3. Diagrams to visualize family planning use and demand dynamics

3.1. Get the discontinuation and switching tabulation data via DHS API

Data in the above table can be called via the DHS API. See here what the data look like (in html format). To learn more about how to call DHS API data, see this “How to get DHS API data: introduction”. To see the R code for data access and wrangling in this section, see the markdown file in GitHub.
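For orientation, a DHS API query boils down to a URL that names a survey and a list of indicators. Below is a minimal sketch in base R, not the code used for this post (that is in the markdown file). The helper build_dhs_url is hypothetical, the indicator IDs are placeholders, and the survey ID "TZ2015DHS" is my assumption for the Tanzania 2015-2016 DHS; please check all of them against the API documentation.

```r
# Build a DHS API data query URL (hypothetical helper, base R only)
build_dhs_url <- function(survey_id, indicator_ids, per_page = 1000) {
  paste0(
    "https://api.dhsprogram.com/rest/dhs/data",
    "?surveyIds=", survey_id,
    "&indicatorIds=", paste(indicator_ids, collapse = ","),
    "&f=json",
    "&perpage=", per_page
  )
}

# Placeholder indicator IDs: replace with the real discontinuation indicators
url <- build_dhs_url("TZ2015DHS", c("INDICATOR_ID_1", "INDICATOR_ID_2"))
print(url)

# With internet access and the jsonlite package, the records could then be read:
# dta <- jsonlite::fromJSON(url)$Data
```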

One difference between the published table and API data: the “Other” row in the table is split into two: other modern methods, and other traditional methods. So, there are 9 methods/rows, instead of 8 in the above table.

3.2. Complete the flow matrix

Now the final analysis data should be a 9 (methods) by 12 (9 methods + 3 discontinuation categories) data frame.

Important footnote:
1. Each cell has the number of episodes that start from one method (the row) and end in one of the 12 columns by the end of the 12-month period. The numbers in each row add up to the total episodes for that method.
2. Unlike the table above, in this analysis and visualization, discontinuation refers to not using any method and thus excludes switching episodes.
3. As mentioned, a simple assumption is made that the new method after switching is selected randomly.
4. Discontinuation is split into three groups: Discontinuation while NOT in need vs. discontinuation while in need vs. discontinuation due to method failure. “Not in need” includes two reasons: desire to become pregnant, and other fertility related reasons. It is assumed, for this exercise, that all who discontinued because they wanted more effective methods (5th column in the above table) indeed switched to another method.
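The regrouping in footnote 4 can be sketched for a single method as follows. This is a rough illustration in base R under stated assumptions, not the processing code from the markdown file: the rates and the number of episodes are made up, the rates are assumed to be mutually exclusive, and all reasons not named in the footnote are pooled into one hypothetical entry that, by implication, counts as discontinuation while in need.

```r
# Made-up inputs for ONE method: total episodes and 12-month rates (%)
n_episodes <- 1000
rates <- c(
  "Method failure"                  = 2,
  "Desire to become pregnant"       = 5,
  "Other fertility related reasons" = 3,
  "Wanted more effective method"    = 2,
  "All remaining reasons (pooled)"  = 15
)

# Map each reason to a flow category, following footnote 4
group <- c(
  "Method failure"                  = "Discontinued: method failure",
  "Desire to become pregnant"       = "Discontinued: not in need",
  "Other fertility related reasons" = "Discontinued: not in need",
  "Wanted more effective method"    = "Switched",
  "All remaining reasons (pooled)"  = "Discontinued: in need"
)

# Sum the rates within each category and convert percentages to episodes
grouped_rates <- tapply(rates, group[names(rates)], sum)
episodes <- grouped_rates / 100 * n_episodes

# Whatever is left continued with the same method
flow_row <- c(Continued = n_episodes - sum(episodes), episodes)
flow_row
```

The "Switched" episodes would then be spread over the other methods, and the row total still equals the number of episodes for the method, as footnote 1 requires.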

To see R code for further data processing, see the markdown file in GitHub.

The analysis data frame, a 9 by 12 matrix, should look like this: for each method, most episodes end up in the same-method continuation cells (i.e., diagonal cells), followed by the discontinuation cells (i.e., the last three columns). Rows represent sources and columns represent targets (i.e., episodes move from a source to a target in the diagram).

Figure 6. Matrix of contraceptive use dynamics

Argh, but, one more! When the first column (i.e., first target) has a cell with “0” (like the 6th row in the figure above), the source for that cell (like “Other Modern” in the figure) is pushed back in the source order in the diagram. Since the rows are in the order of effectiveness, we would like to keep that order. One not so smart way to fix this problem (until I find a smart solution) is to artificially assign a small number in that case. Here, 0.0001 is used, although no episode of Other Modern moved to Female Sterilization.
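Why does a zero change the order? The reshaping code in section 3.3 stacks the matrix column by column and then drops rows with value 0, so a source with 0 in the first column first appears further down the long data, and the node list inherits that order. Here is a small base R sketch of the problem and the workaround, with a made-up 3 by 4 matrix and a hypothetical helper to_long that mimics the reshaping step.

```r
# Made-up flow matrix: rows = source, columns = target
flow <- matrix(
  c(50,  0,  0, 0,
     0, 40,  3, 5,
     2,  2, 30, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C", "Discontinued"))
)

# Mimic the long reshaping: stack column by column, then drop zero flows
to_long <- function(m) {
  long <- data.frame(
    source = rep(rownames(m), times = ncol(m)),
    target = rep(colnames(m), each = nrow(m)),
    value  = as.vector(m),
    stringsAsFactors = FALSE
  )
  long[long$value > 0, ]
}

# Source B has 0 in the first column, so it is pushed behind C
order_before <- unique(to_long(flow)$source)

# Workaround: assign a tiny value wherever the first column is 0
epsilon <- 0.0001
flow_fixed <- flow
flow_fixed[flow_fixed[, 1] == 0, 1] <- epsilon
order_after <- unique(to_long(flow_fixed)$source)

print(order_before)
print(order_after)
```

The price of the workaround is an artificial flow in the diagram, so the value should be small enough not to distort any totals.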

3.3. Sankey diagram

Let’s start with a Sankey diagram, which I think is more appropriate for this topic of discontinuation and switching. The code below is heavily borrowed from an example, thanks to “from data to viz”!

# Libraries
library(tidyverse)
#library(patchwork)
#library(hrbrthemes)
#library(circlize)
library(networkD3)

# Define the "data" for the diagram as "matrix" from above
data <- matrix

# Reshape data to long format 
data_long <- data %>%
    rownames_to_column %>%
    gather(key = 'key', value = 'value', -rowname) %>%
    filter(value > 0)
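colnames(data_long) <- c("source", "target", "value")
# Add a trailing space to target names so that each target is a separate node from the source with the same name
data_long$target <- paste(data_long$target, " ", sep="")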

# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(name=c(as.character(data_long$source),
                           as.character(data_long$target)) %>%
                        unique()
                    )

# With networkD3, connections must be provided using zero-based ids, not the real names used in the links data frame. So we need to reformat it.
data_long$IDsource=match(data_long$source, nodes$name)-1 
data_long$IDtarget=match(data_long$target, nodes$name)-1

# Prepare colour scale (7-class Blues, 7-class Greens, 7-class YlOrRd)
ColourScal ='d3.scaleOrdinal() .range(["#084594","#2171b5","#4292c6","#6baed6","#9ecae1","#c6dbef","#238b45","#41ab5d","#74c476","#b10026", "#fc4e2a","#ffffb2"])'

# Make the Network
##### set "iterations=0" to avoid automatic assignment of the box order
sankeyNetwork(Links = data_long, Nodes = nodes,
            Source = "IDsource", Target = "IDtarget",
            Value = "value", NodeID = "name", 
            sinksRight=FALSE, colourScale=ColourScal, 
            nodeWidth=40, fontSize=13, nodePadding=20,
            iterations=0
            )

Great, the diagram shows how contraceptive use episodes for each method on the left changed (or did not change) 12 months after starting the method, based on the calendar data in Tanzania DHS 2015-2016. Overall, most episodes continued for 12 months from the beginning of use. But:

Discontinuation:
- About 20% of episodes discontinued.
- Of those discontinued episodes, about 35% were discontinued because there was no more need for family planning (yellow).
- The rest, however, were discontinued while there was need for family planning (orange: 48% of total discontinuation, or 10% of total use episodes) or because of method failure (red: 17% of total discontinuation, or 3% of total use episodes).
- By individual method, discontinuation is proportionally higher among short-acting methods, which makes sense.
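These percentages are simple row and column sums of the flow matrix. The sketch below shows the arithmetic in base R with a made-up 2 by 5 matrix; the counts are not the Tanzania data and were chosen only so that the overall shares match the rounded percentages quoted above.

```r
# Made-up flow matrix: 2 methods, 3 discontinuation categories
flow <- matrix(
  c(450,  20, 30, 40, 10,
     30, 300, 40, 56, 24),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Method A", "Method B"),
    c("Method A", "Method B", "Not in need", "In need", "Failure")
  )
)

disc_cols <- c("Not in need", "In need", "Failure")
disc_by_reason <- colSums(flow[, disc_cols])
total_episodes <- sum(flow)

# Percent of all episodes that were discontinued
pct_discontinued <- 100 * sum(disc_by_reason) / total_episodes

# Each category as a share of total discontinuation, and of total episodes
share_of_disc  <- 100 * disc_by_reason / sum(disc_by_reason)
share_of_total <- 100 * disc_by_reason / total_episodes

# Percent discontinued by source method
pct_by_method <- 100 * rowSums(flow[, disc_cols]) / rowSums(flow)

print(pct_discontinued)
print(round(share_of_disc))
print(round(share_of_total, 1))
print(round(pct_by_method, 1))
```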

Switching:
- There are not many switching cases.
- Again, individual switching flows are calculated with a crude assumption (i.e., random selection of the next/other method) for the purpose of this exercise. But, given the relatively low level of switching (in the example of Tanzania, at least), this is what a typical Sankey diagram would look like for contraceptive discontinuation and switching.

NOTE about the diagram:
- To understand the diagram clearly, it is best to HOVER OVER EACH METHOD on the left side and see the flows from the method to the right side.
- Height of each box on the left (i.e., source) is the sum of each row in the Matrix, Figure 6.
- Height of each box on the right (i.e., target) is the sum of each column in the Matrix, Figure 6.
- The right side looks longer only because it has more categories (and thus more gaps between boxes); the total height of the boxes is the same for source and target.
- And a caution: when the volume of a flow is low, its thickness is nearly impossible to differentiate. For example, the artificially assigned 0.0001 episodes from Other Modern to sterilization (if you remember from above) look as clear/prominent as other flows to sterilization.

3.4. Chord diagram

# Libraries
library(tidyverse)
library(viridis)
#library(patchwork)
library(hrbrthemes)
library(circlize)
#library(chorddiag)  #devtools::install_github("mattflor/chorddiag")

# Define the "data" for the diagram as "matrix" from above
data <- matrix

# Reshape data to long format
data_long <- data %>%
  rownames_to_column %>%
  gather(key = 'key', value = 'value', -rowname)

# parameters
circos.clear()
circos.par(start.degree = 90, gap.degree = 4, 
           track.margin = c(-0.1, 0.1), 
           points.overflow.warning = FALSE)
par(mar = rep(0, 4))

# color palette
#mycolor <- viridis(10, alpha = 1, begin = 0, end = 1, option = "D")
#mycolor <- mycolor[sample(1:10)]
mycolor <-c("#084594","#2171b5","#4292c6","#6baed6","#9ecae1","#c6dbef","#238b45","#41ab5d","#74c476","#b10026", "#fc4e2a","#ffffb2")

# Base plot
chordDiagram(
    x = data_long, 
    grid.col = mycolor,
    transparency = 0.25,
    directional = 1,
    direction.type = c("arrows", "diffHeight"), 
    diffHeight  = -0.04,
    annotationTrack = "grid", 
    annotationTrackHeight = c(0.05, 0.1),
    link.arr.type = "big.arrow", 
    link.sort = TRUE, 
    link.largest.ontop = TRUE)

# Add text and axis
circos.trackPlotRegion(
    track.index = 1, 
    bg.border = NA, 
    panel.fun = function(x, y) {
    
        xlim = get.cell.meta.data("xlim")
        sector.index = get.cell.meta.data("sector.index")
    
        # Add names to the sector. 
        circos.text(
            x = mean(xlim), 
            y = 3.2, 
            labels = sector.index, 
            facing = "bending", 
            cex = 0.8
            )
        # Add graduation on axis
        circos.axis(
            h = "top", 
            major.at = seq(from = 0, to = xlim[2], 
                     by = ifelse(test = xlim[2]>10, yes = 2, no = 1)), 
            minor.ticks = 1, 
            major.tick.percentage = 0.5,
            labels.niceFacing = FALSE
            )
        }
    )

Hmm, the chord diagram seems pretty ineffective in my opinion. Not even motivated to fix the aesthetic problems.