This code through explores how to make a Sankey Diagram. Sankey Diagrams are like flow diagrams in which the arrow width varies with the flow rate.
Sankey Diagrams depict the flow of resources and aid in understanding the origins and applications of the resources, materials, or expenses portrayed in a streamlined manner.
These are the packages required to generate a Sankey Diagram.
ggsankey
ggplot2
dplyr
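Before running the examples, it can help to confirm that these packages are installed. The following is a minimal sketch, not part of the original code through. The variable names are illustrative, and the commented install commands assume that ggsankey is installed from its GitHub repository with the remotes package.

```r
# Packages this code through relies on.
required <- c("ggsankey", "ggplot2", "dplyr")

# requireNamespace() returns FALSE (instead of raising an error) when a package is absent.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # ggplot2 and dplyr are on CRAN:
  #   install.packages(c("ggplot2", "dplyr"))
  # ggsankey is hosted on GitHub (assumption: the remotes package is available):
  #   remotes::install_github("davidsjoberg/ggsankey")
} else {
  library(ggsankey)
  library(ggplot2)
  library(dplyr)
}
```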
Sankey Diagrams allow you to visually depict complex processes while highlighting a specific aspect or resource.
A Sankey Diagram presents detailed information in a single view, giving users both a high-level overview and specific details.
Sankey Diagrams highlight the largest contributors and help your audience compare relative magnitudes and spot the areas with the most significant potential.
I will discuss and illustrate how to build a Sankey Diagram with very little data. The purpose of this code through is to introduce you to creating a Sankey Diagram and to explain the kind of information it can display.
This topic is significant because, when you present research findings, people outside your field of study need to be able to follow both the method and the topic.
For instance, economists use extensive data sets that people unfamiliar with economics might find hard to comprehend. A Sankey Diagram can help visualize such a large data set and make it less complicated and easier to understand.
You will specifically learn:
Uses of Sankey Diagram
Why you should care about Sankey Diagram
Building your Sankey Diagram
Creating sample data
Adding titles and legends to a Sankey Diagram
Labeling your axis on the Sankey Diagram
Specifying your desired colors for the Sankey Diagram
This section demonstrates how to generate sample data and construct a basic Sankey Diagram. For this example, we create hypothetical data that examines people’s relationship status and whether or not they are happy.
Note that this is a very simplified data set.
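The sample data in this section are drawn with sample(), so the counts change on every run. Here is a minimal sketch of making a draw repeatable. It assumes an arbitrary seed of 123, which is not part of the original code through, and the names draw_a and draw_b are illustrative.

```r
set.seed(123)  # hypothetical seed; any fixed integer works
draw_a <- sample(x = c("Happy", "Not Happy"), size = 100, replace = TRUE)

set.seed(123)  # resetting the seed reproduces the same draw
draw_b <- sample(x = c("Happy", "Not Happy"), size = 100, replace = TRUE)

identical(draw_a, draw_b)  # TRUE: both draws used the same seed
table(draw_a)              # counts per outcome for this seed
```

Calling set.seed() once before the sample() calls of the main example would have the same effect on the full data set.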
s1 <- sample(x = c("Single",
                   "Married",
                   "Married with kids",
                   "Married Without kids"),
             size = 100,
             replace = TRUE)
s2 <- sample(x = c("Male",
                   "Female"),
             size = 100,
             replace = TRUE)
s3 <- sample(x = c("Happy",
                   "Not Happy"),
             size = 100,
             replace = TRUE)
d <- data.frame(cbind(s1, s2, s3))
names(d) <- c('Relationship',
              'Gender',
              'Outcome')
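The next step passes these three columns to make_long(), which reshapes the table so that each row describes one node and the node it flows into. The sketch below approximates that shape in base R on a two-row toy table. It is a rough illustration under stated assumptions, not ggsankey's implementation: the names toy and long_sketch are hypothetical, the columns are plain character vectors, and the row order may differ from what make_long() returns. The column names x, node, next_x, and next_node are the ones used in the aes() calls of this code through.

```r
# Toy wide table: one row per person, one column per stage.
toy <- data.frame(Relationship = c("Single", "Married"),
                  Gender       = c("Male", "Female"),
                  Outcome      = c("Happy", "Not Happy"),
                  stringsAsFactors = FALSE)

stages <- names(toy)

# For every stage, record the node and the node it flows into at the next stage.
long_sketch <- do.call(rbind, lapply(seq_along(stages), function(i) {
  has_next <- i < length(stages)
  data.frame(
    x         = stages[i],
    node      = toy[[stages[i]]],
    next_x    = if (has_next) stages[i + 1] else NA_character_,
    next_node = if (has_next) toy[[stages[i + 1]]] else NA_character_,
    stringsAsFactors = FALSE
  )
}))

long_sketch
```

Each person contributes one row per stage, so two people and three stages give six rows. The rows for the last stage have no next node, which is why their next_x and next_node are NA.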
df <- d %>%
  make_long(Relationship,
            Gender,
            Outcome)
With the data in long format, this simple example demonstrates how to generate the basic Sankey Diagram.
pl <- ggplot(df, aes(x = x,
                     next_x = next_x,
                     node = node,
                     next_node = next_node,
                     fill = factor(node)))
pl <- pl + geom_sankey(flow.alpha = 0.5,      # Transparency of the flows between nodes
                       node.color = "black",  # Outline color of the nodes
                       show.legend = TRUE)    # Whether to show the legend
pl
The first chart renders correctly, but nothing on it tells us what we are looking at. Next, we add more information, such as a title, subtitle, and caption.
pl <- ggplot(df, aes(x = x,
                     next_x = next_x,
                     node = node,
                     next_node = next_node,
                     fill = factor(node),
                     label = node))           # Label each node with its name
pl <- pl + geom_sankey(flow.alpha = 0.5,      # Transparency of the flows between nodes
                       node.color = "black",  # Outline color of the nodes
                       show.legend = TRUE)    # Whether to show the legend
pl <- pl + geom_sankey_label(size = 3,
                             color = "black",
                             fill = "white")  # Label format for each node
pl <- pl + theme_bw()
pl <- pl + theme(legend.position = 'none')
pl <- pl + theme(axis.title = element_blank(),
                 axis.text.y = element_blank(),
                 axis.ticks = element_blank(),
                 panel.grid = element_blank())
pl <- pl + scale_fill_viridis_d(option = "inferno")
pl <- pl + labs(title = "Creating a Sankey Diagram")
pl <- pl + labs(subtitle = "Using simplified fictitious data")
pl <- pl + labs(caption = "Opeyemi Omiwale")
pl <- pl + labs(fill = 'Nodes')
pl
The first and second charts look good, but they still carry little information, so we don’t know exactly what we’re looking at. Next, we add extra detail to help us understand the chart.
To do this, we would first build a new data set and then group it by nodes. The frequency would then be calculated.
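As a small illustration of this count-then-merge step, the sketch below works on a six-row toy table. It uses base R's table() as a stand-in for the dplyr group_by() and tally() calls used in this code through, and the names toy_long, counts, and toy_counted are hypothetical.

```r
# Toy long-format data: one row per node occurrence.
toy_long <- data.frame(node = c("Single", "Married", "Single",
                                "Male", "Female", "Male"),
                       stringsAsFactors = FALSE)

# Frequency of each node, stored in a column named n.
counts <- as.data.frame(table(node = toy_long$node),
                        responseName = "n",
                        stringsAsFactors = FALSE)

# Attach the frequency to every row, keeping all rows of the long data.
toy_counted <- merge(toy_long, counts, by = "node", all.x = TRUE)

# The same kind of label text that the plot builds with paste0().
toy_counted$label <- paste0(toy_counted$node, " = ", toy_counted$n)
toy_counted
```

After the merge, every row carries the total count for its node, so the label for each node reads, for example, "Single = 2".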
reagg <- df %>%
  dplyr::group_by(node) %>%  # Group the data by node, then count the rows in each group
  tally()
df2 <- merge(df,
             reagg,
             by.x = 'node',
             by.y = 'node',
             all.x = TRUE)
pl <- ggplot(df2, aes(x = x,
                      next_x = next_x,
                      node = node,
                      next_node = next_node,
                      fill = factor(node),
                      label = paste0(node, " = ", n)))  # Label each node with its name and count
pl <- pl + geom_sankey(flow.alpha = 0.5,      # Transparency of the flows between nodes
                       node.color = "black",  # Outline color of the nodes
                       show.legend = TRUE)    # Whether to show the legend
pl <- pl + geom_sankey_label(size = 3,
                             color = "black",
                             fill = "white")  # Label format for each node
pl <- pl + theme_bw()
pl <- pl + theme(legend.position = 'none')
pl <- pl + theme(axis.title = element_blank(),
                 axis.text.y = element_blank(),
                 axis.ticks = element_blank(),
                 panel.grid = element_blank())
pl <- pl + scale_fill_viridis_d(option = "inferno")
pl <- pl + labs(title = "Creating a Sankey Diagram")
pl <- pl + labs(subtitle = "Using simplified fictitious data")
pl <- pl + labs(caption = "Opeyemi Omiwale")
pl <- pl + labs(fill = 'Nodes')
pl
So far, the colors have been assigned to each node automatically. Now we’ll look at how to set the color of each node.
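When colors are assigned by name, ggplot2 matches the names of the manual palette against the values mapped to fill, so a stray space or a difference in capitalization can leave a node without its intended color. Here is a minimal sketch of a check to run before plotting. The names node_labels and palette are hypothetical, and the palette deliberately contains one name with a trailing space.

```r
# Node labels as they appear in the data (illustrative subset).
node_labels <- c("Happy", "Not Happy", "Single", "Male")

# Named palette; "Not Happy " has a trailing space on purpose.
palette <- c("Happy"      = "aquamarine2",
             "Not Happy " = "pink",
             "Single"     = "green",
             "Male"       = "chartreuse1")

# Palette names that match no node, and nodes that have no color.
unmatched_names <- setdiff(names(palette), node_labels)
uncolored_nodes <- setdiff(node_labels, names(palette))

# Trimming whitespace from the names repairs this particular mismatch.
names(palette) <- trimws(names(palette))
still_uncolored <- setdiff(node_labels, names(palette))
```

If still_uncolored is empty, every node in the data has an entry in the palette.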
pl <- pl + scale_fill_manual(values = c('Happy' = "aquamarine2",
                                        'Not Happy' = "pink",
                                        'Single' = "green",
                                        'Married' = "orange",
                                        'Married with kids' = "blue",
                                        'Married Without kids' = "chocolate4",
                                        'Male' = "chartreuse1",
                                        'Female' = "aquamarine4"))
pl
The Sankey Diagram is extremely versatile: it is easy to modify, and it can depict complicated information. I hope this code through was useful and that you are now able to design far more sophisticated Sankey Diagrams.
Learn more about creating more complex Sankey Diagrams with the following resources:
Resource I: Check the R Graph Gallery
Resource II: Learn how to create complex Sankey Diagram with R
Resource III: A more In-depth code through for creating Sankey Diagram for beginners
This code through references and cites the following sources:
Yan Holtz. (2018). Source I. The R Graph Gallery
TechAnswers88 (2021). Source II. Sankey chart using your dataframe in GGPLOT
Yan Holtz. (2018). Source III. The Most basic Sankey Diagram