https://www.youtube.com/c/TechAnswers88
YouTube video with explanations for these examples: https://youtu.be/XRu_Nb8hfIA
Easiest way to create a Sankey diagram from your own data in ggplot.
Use the ggsankey package to create your own data-driven Sankey chart. Full customisation is available because the plot is a ggplot object, so you can control the look and feel as you want.
When you need a Sankey diagram for a publication, an MS Word document or a PowerPoint presentation, this is a practical and easy approach to use.
Create data labels with numbers and percentages at each node.
Just define the columns you want to use, customise the colours using the themes, and then save the plot as an image file on your desktop.
This example uses the ggsankey package, which I think is a great package that does its job effectively and easily.
As this package is not on CRAN, you have to install it from the author's GitHub.
Install the remotes package first. Then use the install_github command to install the package.
#install.packages("remotes")
#remotes::install_github("davidsjoberg/ggsankey")
library(ggsankey)
library(ggplot2)
library(dplyr)
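If you rerun the script often, you may not want to reinstall the package every time. The sketch below is one possible way to install from GitHub only when the package is missing. The helper name install_github_if_missing is hypothetical and not part of remotes or ggsankey; the call at the end is left commented out, like the install lines above.

```r
# Hypothetical helper: install a GitHub package only if it is missing.
# Returns TRUE (invisibly) when an installation was attempted, FALSE otherwise.
install_github_if_missing <- function(pkg, repo) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))
  }
  # remotes itself comes from CRAN
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github(repo)
  invisible(TRUE)
}

# Uncomment to run; the repository is the one named in the install lines above
# install_github_if_missing("ggsankey", "davidsjoberg/ggsankey")
```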
Note that this example data is already summarised. Each record is one answer to the question together with its number of responses, so there is no need to do any aggregation, counting or summing up.
#'How many pizzas you eat in a month'
d <- data.frame(Question = c('How many pizzas'
,'How many pizzas'
,'How many pizzas')
, Answer = c('1 Pizza','2 Pizzas','3 Pizzas')
, Responses = c(200,300,400))
d
##          Question   Answer Responses
## 1 How many pizzas  1 Pizza       200
## 2 How many pizzas 2 Pizzas       300
## 3 How many pizzas 3 Pizzas       400
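The introduction mentions data labels with numbers and percentages. One possible way to prepare such labels for this small table is sketched below in base R. It assumes the percentage should be each answer's share of all responses; the column names Percent and AnswerLabel are made up for this illustration.

```r
# Same example data as in the tutorial
d <- data.frame(Question = rep('How many pizzas', 3)
              , Answer = c('1 Pizza','2 Pizzas','3 Pizzas')
              , Responses = c(200,300,400))

# Share of all responses for each answer, in percent
d$Percent <- 100 * d$Responses / sum(d$Responses)

# Hypothetical label column: answer, count and percentage in one string
d$AnswerLabel <- sprintf('%s: %.0f (%.1f%%)', d$Answer, d$Responses, d$Percent)
d$AnswerLabel
```

If you want these strings to appear on the nodes, you could pass AnswerLabel instead of Answer to make_long, since the node text is taken from the column values.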
All you need to do is call the make_long function (from the ggsankey package) and provide the data columns which you would like to see in your Sankey chart.
# Step 1
df <- d %>%
make_long(Question,Answer,Responses)
df
## # A tibble: 9 x 4
##   x         node            next_x    next_node
##   <fct>     <chr>           <fct>     <chr>
## 1 Question  How many pizzas Answer    1 Pizza
## 2 Answer    1 Pizza         Responses 200
## 3 Responses 200             <NA>      <NA>
## 4 Question  How many pizzas Answer    2 Pizzas
## 5 Answer    2 Pizzas        Responses 300
## 6 Responses 300             <NA>      <NA>
## 7 Question  How many pizzas Answer    3 Pizzas
## 8 Answer    3 Pizzas        Responses 400
## 9 Responses 400             <NA>      <NA>
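To see where the nine rows come from, it can help to rebuild the same shape by hand. The base R sketch below is only an illustration of the idea, not how ggsankey implements make_long: every input row produces one output row per column, and each output row points to the next column in the sequence. Here x and next_x are plain character columns rather than factors, and the variable names stages and long are made up.

```r
# Same example data as in the tutorial
d <- data.frame(Question = rep('How many pizzas', 3)
              , Answer = c('1 Pizza','2 Pizzas','3 Pizzas')
              , Responses = c(200,300,400))

# Columns in the order they should appear from left to right
stages <- c('Question', 'Answer', 'Responses')

rows <- list()
for (i in seq_len(nrow(d))) {
  for (j in seq_along(stages)) {
    has_next <- j < length(stages)
    rows[[length(rows) + 1]] <- data.frame(
        x = stages[j]
      , node = as.character(d[i, stages[j]])
      # The last stage has nothing to flow into, so its next_* values are NA
      , next_x = if (has_next) stages[j + 1] else NA_character_
      , next_node = if (has_next) as.character(d[i, stages[j + 1]]) else NA_character_
    )
  }
}
long <- do.call(rbind, rows)
long
```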
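# Step 2
# Map the columns created by make_long to the Sankey aesthetics
pl <- ggplot(df, aes(x = x
, next_x = next_x
, node = node
, next_node = next_node
, fill = factor(node)
, label = node)
)
# Draw the nodes and the flows between them
pl <- pl + geom_sankey(flow.alpha = 0.5
, node.color = "black"
, show.legend = FALSE)
# Add a text label next to each node
pl <- pl + geom_sankey_label(size = 3, color = "black", fill = "white", hjust = -0.5)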
pl <- pl + theme_bw()
pl <- pl + theme(legend.position = "none")
pl <- pl + theme(axis.title = element_blank()
, axis.text.y = element_blank()
, axis.ticks = element_blank()
, panel.grid = element_blank())
pl <- pl + scale_fill_viridis_d(option = "inferno")
pl <- pl + labs(title = "Sankey diagram using ggplot")
pl <- pl + labs(subtitle = "Showing the responses to a multiple choice question")
pl <- pl + labs(caption = "@techanswers88")
pl <- pl + labs(fill = 'Nodes')
pl
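To use the chart in a Word or PowerPoint document, the plot can be written to an image file with ggsave from ggplot2. The sketch below uses a small stand-in plot so that it runs on its own; in this tutorial you would pass pl instead. The file name, size and resolution are assumptions, so adjust them to your document.

```r
library(ggplot2)

# Stand-in plot so the sketch runs on its own; in the tutorial, use pl instead
p <- ggplot(data.frame(x = 1:3, y = c(200, 300, 400)), aes(x, y)) + geom_col()

# Hypothetical file name and size; replace tempdir() with your own folder
out_file <- file.path(tempdir(), "sankey.png")
ggsave(out_file, plot = p, width = 8, height = 5, dpi = 300)

# TRUE if the image was written
file.exists(out_file)
```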