Introduction

This code through explores how to make a Sankey Diagram. A Sankey Diagram is a flow diagram in which the width of each arrow is proportional to the size of the flow it represents.

Sankey Diagrams depict how resources, materials, or costs flow through a system, making it easier to see where they originate and where they end up.



Packages Required to Make a Sankey Diagram

These are the packages required to generate a Sankey Diagram.

  • ggsankey

  • ggplot2

  • dplyr
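A minimal setup sketch, assuming ggsankey is installed from its GitHub repository (davidsjoberg/ggsankey) with the remotes package, while ggplot2 and dplyr come from CRAN. Each package is installed only if it is missing, so the lines are safe to re-run.

```r
# Install the CRAN packages only when they are not already available.
for (pkg in c("ggplot2", "dplyr", "remotes")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

# Assumption: ggsankey is installed from GitHub rather than CRAN.
if (!requireNamespace("ggsankey", quietly = TRUE)) {
  remotes::install_github("davidsjoberg/ggsankey")
}
```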


Uses of Sankey Diagram

Sankey Diagrams allow you to visually depict complex processes while highlighting a specific aspect or resource.

A Sankey Diagram presents detailed information in a single view, giving readers both a high-level overview and the specifics.

Sankey Diagrams highlight the largest contributors to a flow and help your audience compare relative magnitudes and spot the areas with the greatest potential.


Content Overview

I will discuss and illustrate how to build a Sankey Diagram from a very small data set. The purpose of this code through is to introduce you to creating Sankey Diagrams and to explain how to read the information a Sankey Diagram displays.


Why You Should Care

This topic matters because research findings are only useful if people outside your field of study can understand them when you present your work.

For instance, economists work with extensive data sets that people unfamiliar with economics may find hard to interpret. A Sankey Diagram can help by visualizing a large data set in a form that is simpler and easier to comprehend.


Learning Objectives

You will specifically learn:

  • Uses of Sankey Diagram

  • Why you should care about Sankey Diagram

  • Building your Sankey Diagram

  • Creating sample data

  • Adding titles and legends to a Sankey Diagram

  • Labeling the nodes on the Sankey Diagram

  • Specifying your desired colors for the Sankey Diagram



Building your Sankey Diagram

This section demonstrates how to generate sample data and construct a basic Sankey Diagram. For this example, we will create hypothetical data on people’s relationship status, gender, and whether or not they are happy.

Creating Sample Data

It should be noted that this is a very simplified data set.

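library(ggsankey)  # Provides make_long(), geom_sankey(), and geom_sankey_label()
library(ggplot2)
library(dplyr)     # Provides %>%, group_by(), and tally()

set.seed(123)  # Arbitrary seed so the random sample is reproducible

s1 <- sample(x = c("Single",
                   "Married",
                   "Married with kids",
                   "Married Without kids"),
             size = 100,
             replace = TRUE)

s2 <- sample(x = c("Male",
                   "Female"),
             size = 100,
             replace = TRUE)

s3 <- sample(x = c("Happy",
                   "Not Happy"),
             size = 100,
             replace = TRUE)

# Combine the three samples into one data frame with descriptive column names
d <- data.frame(Relationship = s1,
                Gender = s2,
                Outcome = s3)

# Reshape the data into the long format that geom_sankey() expects
df <- d %>%
  make_long(Relationship,
            Gender,
            Outcome)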
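To make the reshaping step less of a black box, here is a rough base R imitation of the shape that the long data takes: one row per observation per stage, with the columns x, node, next_x, and next_node. This is only a sketch on a hypothetical two-row data set (toy), not the way ggsankey implements make_long().

```r
# Hypothetical two-row data set with the same three columns as d.
toy <- data.frame(Relationship = c("Single", "Married"),
                  Gender = c("Male", "Female"),
                  Outcome = c("Happy", "Not Happy"))

stages <- names(toy)

# For each stage, pair every value with the value in the following stage.
# The last stage has no successor, so next_x and next_node are NA there.
long <- do.call(rbind, lapply(seq_along(stages), function(i) {
  has_next <- i < length(stages)
  data.frame(
    x = stages[i],
    node = toy[[stages[i]]],
    next_x = if (has_next) stages[i + 1] else NA_character_,
    next_node = if (has_next) toy[[stages[i + 1]]] else NA_character_
  )
}))

print(long)
```

Each original row contributes one row per stage, so two observations across three stages give six rows in long.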


Creating a Simple Chart

This simple example demonstrates how to generate a basic Sankey Diagram with no labels or styling.

pl <- ggplot(df, aes(x = x,                        
                     next_x = next_x,                                     
                     node = node,
                     next_node = next_node,        
                     fill = factor(node)))
                     
pl <- pl + geom_sankey(flow.alpha = 0.5,        # Transparency of the flows between nodes
                       node.color = "black",    # Outline color of each node
                       show.legend = TRUE)      # Whether to display the legend

pl


Adding a Title, Subtitle, Caption, and Labels

The first chart renders correctly, but nothing on it tells the reader what it shows. Next, we add a label to each node along with a title, subtitle, and caption.

pl <- ggplot(df, aes(x = x,                        
                     next_x = next_x,                                     
                     node = node,
                     next_node = next_node,        
                     fill = factor(node),
                     label = node))             # This Creates a label for each node
                     
pl <- pl + geom_sankey(flow.alpha = 0.5,        # Transparency of the flows between nodes
                       node.color = "black",    # Outline color of each node
                       show.legend = TRUE)      # Whether to display the legend

pl <- pl + geom_sankey_label(size = 3,          # Text size of the labels
                             color = "black",
                             fill = "white")    # Label format for each node


pl <- pl + theme_bw()
pl <- pl + theme(legend.position = 'none')
pl <- pl + theme(axis.title = element_blank(),
                 axis.text.y = element_blank(),
                 axis.ticks = element_blank(),
                 panel.grid = element_blank())

pl <- pl + scale_fill_viridis_d(option = "inferno")
pl <- pl + labs(title = "Creating a Sankey Diagram")
pl <- pl + labs(subtitle = "Using simplified fictitious data")
pl <- pl + labs(caption ="Opeyemi Omiwale" )
pl <- pl + labs(fill = 'Nodes')
pl



Adding counts on the nodes

The first and second charts look good, but they do not show how many observations fall into each node. Next, we add that information to the node labels.

First, we create the total counts

To do this, we build a new data set by grouping the long data by node and counting the rows in each group.

reagg <- df %>%
  dplyr::group_by(node) %>%  # Group the long data by node
  dplyr::tally()             # Count the rows in each group; the count is stored in column n

Next, we merge the counts back into the long data set

df2 <- merge(df,
             reagg, 
             by.x = 'node', 
             by.y = 'node', 
             all.x = TRUE)

Next, we combine all of the code

reagg <- df %>%
  dplyr::group_by(node) %>%  # Group the long data by node
  dplyr::tally()             # Count the rows in each group; the count is stored in column n

df2 <- merge(df, 
             reagg, 
             by.x = 'node', 
             by.y = 'node', 
             all.x = TRUE)

pl <- ggplot(df2, aes(x = x,                        
                     next_x = next_x,                                     
                     node = node,
                     next_node = next_node,        
                     fill = factor(node),
                     label = paste0(node, " = ", n)))   # Label each node with its name and count

pl <- pl + geom_sankey(flow.alpha = 0.5,        # Transparency of the flows between nodes
                       node.color = "black",    # Outline color of each node
                       show.legend = TRUE)      # Whether to display the legend

pl <- pl + geom_sankey_label(size = 3,          # Text size of the labels
                             color = "black",
                             fill = "white")    # Label format for each node


pl <- pl + theme_bw()
pl <- pl + theme(legend.position = 'none')
pl <- pl + theme(axis.title = element_blank(),
                 axis.text.y = element_blank(),
                 axis.ticks = element_blank(),
                 panel.grid = element_blank())


pl <- pl + scale_fill_viridis_d(option = "inferno")
pl <- pl + labs(title = "Creating a Sankey Diagram")
pl <- pl + labs(subtitle = "Using simplified fictitious data")
pl <- pl + labs(caption ="Opeyemi Omiwale" )
pl <- pl + labs(fill = 'Nodes')
pl
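The labels in this section show raw counts. If you would rather show percentages, one possible approach is to divide each node's count by the total for its stage. The sketch below uses a small hypothetical long-format data set (toy_long) with the same x and node columns as df; the names toy_long, node_pct, and pct are made up for illustration.

```r
library(dplyr)

# Hypothetical long-format data: four observations in two stages.
toy_long <- data.frame(
  x    = c(rep("Relationship", 4), rep("Outcome", 4)),
  node = c("Single", "Single", "Married", "Single",
           "Happy", "Not Happy", "Happy", "Happy")
)

node_pct <- toy_long %>%
  count(x, node) %>%                          # Rows per node within each stage
  group_by(x) %>%
  mutate(pct = round(100 * n / sum(n), 1)) %>%  # Share of the stage total
  ungroup()

print(node_pct)
```

The pct column could then be merged into the long data in the same way as the counts and used in the label, for example with paste0(node, " = ", pct, "%"). Merging by node alone assumes that node names are unique across stages, as they are in the sample data.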


Using Desired Color for Diagram

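So far, the colors have been assigned to each node automatically by the viridis scale. Now we’ll look at how to set the color of each node ourselves. Because the plot already has a fill scale, ggplot2 prints a message saying that the new scale replaces the existing one; this is expected.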

pl <- pl + scale_fill_manual(values = c('Happy' = "aquamarine2",
                                        'Not Happy' = "pink",
                                        'Single' = "green",
                                        'Married' = "orange",
                                        'Married with kids' = "blue",
                                        'Married Without kids' = "chocolate4",
                                        'Male' = "chartreuse1",
                                        'Female' = "aquamarine4"))


pl
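When colors are supplied as a named vector, each name has to match a node name exactly; a stray space or a difference in capitalization leaves that node without a matching color, and it is drawn in the scale's fallback color instead. A quick sanity check like the sketch below can catch this before plotting. The variable names node_colors, nodes, missing_colors, and unused_colors are hypothetical.

```r
# Named color vector, one entry per node.
node_colors <- c("Happy" = "aquamarine2",
                 "Not Happy" = "pink",
                 "Single" = "green",
                 "Married" = "orange",
                 "Married with kids" = "blue",
                 "Married Without kids" = "chocolate4",
                 "Male" = "chartreuse1",
                 "Female" = "aquamarine4")

# Node names as they appear in the sample data.
# With the real data, unique(df$node) would supply this vector.
nodes <- c("Single", "Married", "Married with kids", "Married Without kids",
           "Male", "Female", "Happy", "Not Happy")

missing_colors <- setdiff(nodes, names(node_colors))  # Nodes with no color
unused_colors  <- setdiff(names(node_colors), nodes)  # Colors with no node

if (length(missing_colors) > 0) {
  warning("No color defined for: ", paste(missing_colors, collapse = ", "))
}
```

The checked vector can then be passed directly to the scale, as in scale_fill_manual(values = node_colors).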


Conclusion

The Sankey Diagram is extremely versatile; it is easily modifiable, and it can depict complicated information. I hope this code walkthrough was useful, and that you are now able to design far more sophisticated Sankey diagrams.
