0.libraries

library(visNetwork)
library(ggraph)

## Loading required package: ggplot2

library(networkD3)
library(tidyverse)

## ── Attaching packages ──────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ tibble  2.1.3     ✔ purrr   0.3.2
## ✔ tidyr   1.0.0     ✔ dplyr   0.8.3
## ✔ readr   1.3.1     ✔ stringr 1.4.0
## ✔ tibble  2.1.3     ✔ forcats 0.4.0

## ── Conflicts ─────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

1.Introduction

quote: “If you can’t find what you want, then write it and add it to the stack”

I was searching for clear instructions on how to visualize a matrix as a network. The first step was to look for packages, especially one based on ggplot2, and I found several. The question then became which one to choose.

To choose one, I had to know the capabilities of each with regard to my needs. Ideally, I wanted an interactive network in which the size and color of the nodes, and the thickness and transparency of the directed edges, could be used as visual channels.

The following is the result of my efforts to work with three packages: 1. visNetwork 2. networkD3 3. ggraph

Publishing this report on my RPubs page keeps these efforts safely documented, and may help other people in the same situation pick the proper tool more quickly.

2.About the DataSet

The goal and result of this report do not depend on the dataset: any dataset that can be represented as a network can be used with the R code presented here. You can therefore safely skip the rest of this section after the next paragraph.

The data cover the 47 prefectures of Japan. Prefectures, similar to states or provinces, can be seen as systems with inputs and outputs. The efficiency of a system is evaluated as outputs over inputs, i.e. how good is the system at converting inputs into outputs? This dataset contains such an evaluation. Each prefecture has evaluated all prefectures, including itself, and assigned each of them an efficiency score from 0 to 100 (stored as fractions between 0 and 1 in the files below). For instance, from the point of view of Tokyo, Kanagawa is 50% efficient, while Aichi is 100% efficient. From the point of view of Kanagawa, Tokyo is 80% efficient, and Aichi is 100% efficient. Because the score one prefecture gives another can differ from the score it receives back, the network edges should be “directed”. For further reading about this efficiency analysis method, see its Wikipedia article [https://en.wikipedia.org/wiki/Data_envelopment_analysis] or my own working paper [https://www.researchgate.net/publication/337261828_Visualization_of_Cross-Efficiency_Matrix_Using_Multidimensional_Unfolding].
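To make the directed structure concrete, here is a minimal sketch of how a small cross-efficiency matrix can be turned into an edge list. The Tokyo and Kanagawa off-diagonal scores follow the example above; the remaining numbers, and the convention that rows are the evaluating prefecture, are assumptions made only for illustration.

```r
# Toy 3 x 3 cross-efficiency matrix. The Tokyo and Kanagawa off-diagonal
# scores follow the example in the text; everything else is made up.
# Assumption: rows are the evaluating prefecture, columns the evaluated one.
prefs <- c("Tokyo", "Kanagawa", "Aichi")
cem <- matrix(c(1.0, 0.5, 1.0,
                0.8, 1.0, 1.0,
                0.6, 0.7, 1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(from = prefs, to = prefs))

# One row per ordered (from, to) pair, i.e. a directed edge list.
toy_edges <- as.data.frame(as.table(cem),
                           responseName = "value",
                           stringsAsFactors = FALSE)

# The two directions of the same pair carry different scores.
subset(toy_edges, from == "Tokyo" & to == "Kanagawa")$value
subset(toy_edges, from == "Kanagawa" & to == "Tokyo")$value
```

The result has the same three columns (from, to, value) as the edges table loaded in the next section, except that the toy version uses names instead of numeric ids.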

3.Dataset format

nodes <- read_csv("/home/shaahin/Downloads/nodes.csv")

## Parsed with column specification:
## cols(
##   id = col_double(),
##   label = col_character(),
##   value = col_double(),
##   title = col_character(),
##   self_eff = col_double(),
##   binary_self_eff = col_double()
## )

edges <- read_csv("/home/shaahin/Downloads/edges.csv")

## Parsed with column specification:
## cols(
##   from = col_double(),
##   to = col_double(),
##   value = col_double()
## )

cem_df <- read_csv("/home/shaahin/Downloads/cem_df.csv")

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   DMU = col_character()
## )

## See spec(...) for full column specifications.

long_cem <- read_csv("/home/shaahin/Downloads/long_cem.csv")

## Parsed with column specification:
## cols(
##   DMU = col_character(),
##   to = col_character(),
##   cross_efficiency = col_double()
## )

chosen_edges <- edges %>% filter(value == 1) 


head(nodes)

## # A tibble: 6 x 6
##      id label    value title                       self_eff binary_self_eff
##   <dbl> <chr>    <dbl> <chr>                          <dbl>           <dbl>
## 1     1 Hokkaido 0.678 <p>Hokkaido<br>0.677531456…    1.                  1
## 2     2 Aomori   0.634 <p>Aomori<br>0.63432961135…    0.786               0
## 3     3 Iwate    0.712 <p>Iwate<br>0.712421963333…    0.913               0
## 4     4 Miyagi   0.596 <p>Miyagi<br>0.59641994702…    0.732               0
## 5     5 Akita    0.611 <p>Akita<br>0.610503654212…    0.783               0
## 6     6 Yamagata 0.769 <p>Yamagata<br>0.768798203…    0.933               0

head(edges)

## # A tibble: 6 x 3
##    from    to value
##   <dbl> <dbl> <dbl>
## 1     1     1 1    
## 2     2     1 1    
## 3     3     1 1    
## 4     4     1 1    
## 5     5     1 0.822
## 6     6     1 0.785

4.visNetwork

I feed in the nodes dataset and only the chosen edges (those with a value of 1). By default the graph is a full network, and visualizing a full network with more than a few nodes would be useless, as the nodes would be floating in a sea of edges. It would defeat the purpose, since nothing can be discovered in such a plot.

By default, visNetwork() uses the value column of nodes as the size of the nodes. Here the value is the average cross-efficiency.
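To give a sense of scale, the sketch below counts the edges of a full directed network on 47 nodes and then applies the same kind of filter to a toy table. The toy scores are made up, and the tolerance-based variant is only a hypothetical alternative in case the scores are not stored as exactly 1; this report itself uses the exact comparison.

```r
# A complete directed network on 47 nodes, self-links included,
# has one edge per ordered pair of nodes.
n_nodes <- 47
full_edges <- expand.grid(from = seq_len(n_nodes), to = seq_len(n_nodes))
nrow(full_edges)  # 47 * 47 = 2209

# Toy scores (made up) to illustrate the filter used in this report.
toy <- data.frame(from  = c(1, 2, 3, 1),
                  to    = c(1, 1, 1, 2),
                  value = c(1, 1, 0.822, 0.5))
kept_exact <- toy[toy$value == 1, ]

# Hypothetical variant: tolerate floating-point noise around 1.
kept_near <- toy[abs(toy$value - 1) < 1e-8, ]
```

Keeping only the perfect scores leaves the links that say "this prefecture considers that one fully efficient", which is a much sparser and more readable graph.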

visNetwork(nodes = nodes,edges =  chosen_edges )  %>%
        visEdges(arrows = 'to' , smooth = TRUE) %>% 
        visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)

While it is very cool to have an interactive map, I could not figure out how to control the transparency of the edges, and in a publication the interactive plot loses all its magic.

Let’s have a look at the same graph where the size of each node is proportional to its number of incoming links.

## node size as a function of input arrows 
excellence_df <- chosen_edges %>% count(to)

excellence_nodes <- nodes %>%
        left_join(excellence_df, by = c("id"="to")) %>%
        mutate(avg_creff = value , value = n) %>%
        mutate(value = ifelse(is.na(value),0,value)) %>% 
        mutate(title = paste0("<p>", cem_df$DMU,"<br>", value, "</p>"))

visNetwork( excellence_nodes , chosen_edges )  %>%
        visEdges(arrows = 'to' , smooth = TRUE) %>% 
        visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)

An amazing interactive graph, with many points to highlight, such as the important nodes (Aichi, Tokyo, …) as well as Kanagawa, a small node with a self-link.

Still, this is not something I can rely on, because I lack control over the transparency of the links and the interactivity magic is lost in a static version.
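For readers who want to experiment further with edge transparency: the visEdges() documentation describes color as a string or a named list, and lists an opacity entry in that list. The sketch below is an untested idea on toy data, not something verified against the prefecture graph; the toy nodes and edges are made up.

```r
library(visNetwork)

# Toy data (made up), using the id/label and from/to columns visNetwork expects.
toy_nodes <- data.frame(id = 1:3, label = c("A", "B", "C"))
toy_edges <- data.frame(from = c(2, 3, 3), to = c(1, 1, 3))

# Assumption: the `opacity` entry of the `color` list sets edge transparency,
# as described in the visEdges() documentation.
toy_net <- visEdges(visNetwork(nodes = toy_nodes, edges = toy_edges),
                    arrows = "to",
                    color = list(color = "grey", opacity = 0.3))

# Print `toy_net` in an interactive session to render the widget.
```

Even if this works as documented, it does not solve the second problem: the static export still loses the interactivity.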

5.networkD3

n3d_edges <- data.frame(from = rep(1:nrow(cem_df),nrow(cem_df)),
                   to = rep(1:nrow(cem_df), each = nrow(cem_df) ),
                   value = round(long_cem$cross_efficiency,6))


head(n3d_edges)

##   from to    value
## 1    1  1 1.000000
## 2    2  1 1.000000
## 3    3  1 1.000000
## 4    4  1 1.000000
## 5    5  1 0.821718
## 6    6  1 0.784529
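The from and to columns above are generated purely from position, so they are only correct if the rows of long_cem are ordered the same way. That ordering is an assumption on my side (the head of n3d_edges matching the head of edges suggests it holds). A small sketch of the index pattern that rep() produces:

```r
n <- 3  # toy size; the report uses nrow(cem_df)

toy_index <- data.frame(from = rep(1:n, n),
                        to   = rep(1:n, each = n))
toy_index

# `from` cycles 1, 2, 3, 1, 2, 3, ... while `to` stays fixed for n rows,
# so a value vector lines up only if it is sorted by `to` first and by
# `from` within each `to`. expand.grid() produces the same ordering.
same_as_grid <- all(toy_index == expand.grid(from = 1:n, to = 1:n))
```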

n3d_chosen_edges <- edges %>% filter(value == 1) 

# count incoming links per node, using the networkD3 copy of the edges
n3d_excellence_df <- n3d_chosen_edges %>% count(to)

n3d_excellence_nodes <- nodes %>%
        left_join(n3d_excellence_df, by = c("id"="to")) %>%
        mutate(avg_creff = value , size = n) %>%
        # nodes without incoming links get NA from the join; use size 0 instead
        mutate(size = ifelse(is.na(size),0,size)) %>% 
        mutate(title = paste0("<p>", cem_df$DMU,"<br>", value, "</p>")) %>% 
        mutate(group = "1")

forceNetwork(Links = n3d_chosen_edges %>% mutate(from = from - 1 , to = to -1 ),
              Nodes = n3d_excellence_nodes,
             Source = "from", Target = "to",
             Value = "value", NodeID = "label",
             Group = "group", Nodesize = "size" ,
             fontSize = 20,opacity = 0.8, arrows = TRUE, opacityNoHover = 0.1)

## Links is a tbl_df. Converting to a plain data frame.

## Nodes is a tbl_df. Converting to a plain data frame.
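The forceNetwork() call above subtracts 1 from from and to because networkD3 expects the link indices to be zero-based positions in the node table. A minimal sketch of that conversion on toy tables (made up), with a sanity check that is my own addition:

```r
# Toy tables (made up). Node ids start at 1, as in nodes.csv.
toy_nodes <- data.frame(id = 1:3, label = c("A", "B", "C"))
toy_links <- data.frame(from = c(2, 3, 3), to = c(1, 1, 3), value = 1)

# Shift to zero-based positions: id 1 becomes row 0 of the node table.
toy_links0 <- transform(toy_links, from = from - 1, to = to - 1)

# Sanity check: every index must point at an existing node row.
valid <- all(c(toy_links0$from, toy_links0$to) %in%
                     (seq_len(nrow(toy_nodes)) - 1))
```

Note that the simple "minus one" shift only works when the node table is sorted by id and the ids run from 1 to the number of nodes without gaps, which appears to be the case here.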

The outcome of networkD3 is beautiful, the interactivity that almost hides the non-selected nodes and edges is useful, and in general it is revealing. However, I could not get the text labels the way I wanted, and it seemed that the transparency of edges and nodes is controlled by one single argument.

Thus, I ventured on to try ggraph.

6.ggraph

Here I present several perspectives on the network visualization of the prefecture data. I played with mapping different variables to different visual channels.

In the first plot, the size of the nodes is proportional to average cross-efficiency, and the color is a binary indicator of simple efficiency: whether it is perfect or not.

cem_graph_df <- chosen_edges

cem_graph <-igraph::graph_from_data_frame(cem_graph_df , directed = TRUE ,
                                          vertices = nodes  )

ggraph(cem_graph) + 
        geom_edge_link(arrow = arrow(length = unit(4,"mm")),
                       end_cap = circle(3,"mm"),
                       start_cap = circle(3,"mm") , 
                       alpha = 0.5) + 
        geom_node_point(
                        mapping =  aes(colour = factor(binary_self_eff) ,
                                       size = value))

## Using `stress` as default layout
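The plot above can refer to binary_self_eff and value inside aes() because graph_from_data_frame() stores the extra columns of the vertices table as vertex attributes. A small sketch on toy data (made up) that shows which attributes end up on the graph:

```r
library(igraph)

toy_nodes <- data.frame(id = 1:3,
                        label = c("A", "B", "C"),
                        binary_self_eff = c(1, 0, 1))
toy_edges <- data.frame(from = c(2, 3, 3), to = c(1, 1, 3), value = 1)

toy_graph <- graph_from_data_frame(toy_edges, directed = TRUE,
                                   vertices = toy_nodes)

# The first column of `vertices` becomes the vertex name; the remaining
# columns become vertex attributes that ggraph can map inside aes().
vertex_attr_names(toy_graph)

# Extra columns of the edge table become edge attributes in the same way.
edge_attr_names(toy_graph)
```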

In the second perspective, the number of incoming edges is mapped to the size of the nodes.

# perspective 2 
nodes_with_link_count <- nodes %>%
        left_join(chosen_edges %>% count(to) , by = c("id"="to")) %>% 
        ungroup() %>% 
        mutate(incoming_links = ifelse(test = is.na(n),yes = 0,no = n ))

cem_graph_df <- chosen_edges

cem_graph <-igraph::graph_from_data_frame(cem_graph_df , directed = TRUE ,
                                          vertices = nodes_with_link_count  )

ggraph(cem_graph) + 
        geom_edge_link(arrow = arrow(length = unit(4,"mm")),
                       end_cap = circle(3,"mm"),
                       start_cap = circle(3,"mm") , 
                       alpha = 0.5) + 
        geom_node_point(
                mapping =  aes(colour = factor(binary_self_eff) ,
                               size = incoming_links))

## Using `stress` as default layout
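The incoming-link count above is built with count() and left_join(), followed by replacing NA with 0. As a cross-check, the same quantity can be read directly from the graph with igraph's degree(). This is only a sketch on toy data (made up), not the approach used in this report:

```r
library(igraph)

toy_nodes <- data.frame(id = 1:3)
toy_edges <- data.frame(from = c(2, 3, 3), to = c(1, 1, 3))

toy_graph <- graph_from_data_frame(toy_edges, directed = TRUE,
                                   vertices = toy_nodes)

# In-degree per vertex. Vertices without incoming links get 0 directly,
# so no NA handling is needed; a self-link counts as one incoming link.
in_links <- degree(toy_graph, mode = "in")
in_links
```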

In the third perspective, being content with the number of incoming edges as node size, I added labels to the map; their size corresponds to the number of incoming edges, and their transparency to the simple efficiency.

# perspective 3 
nodes_with_link_count <- nodes %>%
        left_join(chosen_edges %>% count(to) , by = c("id"="to")) %>% 
        ungroup() %>% 
        mutate(incoming_links = ifelse(test = is.na(n),yes = 0,no = n ))

cem_graph_df <- chosen_edges

cem_graph <-igraph::graph_from_data_frame(cem_graph_df , directed = TRUE ,
                                          vertices = nodes_with_link_count  )

ggraph(cem_graph) + 
        geom_edge_link(arrow = arrow(length = unit(4,"mm")),
                       end_cap = circle(3,"mm"),
                       start_cap = circle(3,"mm") , 
                       alpha = 0.5) + 
        geom_node_point(
                mapping =  aes(colour = factor(binary_self_eff) ,
                               size = incoming_links)) + 
        geom_node_text(aes(label = label , alpha = binary_self_eff , size = incoming_links))

## Using `stress` as default layout

The fourth perspective is a failed effort to use curved edges.

# perspective 4 
nodes_with_link_count <- nodes %>%
        left_join(chosen_edges %>% count(to) , by = c("id"="to")) %>% 
        ungroup() %>% 
        mutate(incoming_links = ifelse(test = is.na(n),yes = 0,no = n ))

cem_graph_df <- chosen_edges

cem_graph <-igraph::graph_from_data_frame(cem_graph_df , directed = TRUE ,
                                          vertices = nodes_with_link_count  )

ggraph(cem_graph) + 
        geom_edge_arc(arrow = arrow(length = unit(4,"mm")),
                       end_cap = circle(3,"mm"),
                       start_cap = circle(3,"mm") , 
                       alpha = 0.5) + 
        geom_node_point(
                mapping =  aes(colour = factor(binary_self_eff) ,
                               size = incoming_links)) + 
        geom_node_text(aes(label = label , alpha = binary_self_eff , size = incoming_links))

## Using `stress` as default layout

The fifth perspective changes the layout from stress to dh. The result is very satisfying. The shape of the nodes is now also a function of simple efficiency.

# perspective 5 
nodes_with_link_count <- nodes %>%
        left_join(chosen_edges %>% count(to) , by = c("id"="to")) %>% 
        ungroup() %>% 
        mutate(incoming_links = ifelse(test = is.na(n),yes = 0,no = n ))

cem_graph_df <- chosen_edges

cem_graph <-igraph::graph_from_data_frame(cem_graph_df , directed = TRUE ,
                                          vertices = nodes_with_link_count  )


set.seed(7)
ggraph(cem_graph, layout = "dh") + 
        geom_edge_link(arrow = arrow(length = unit(4,"mm")),
                       end_cap = circle(3,"mm"),
                       start_cap = circle(3,"mm") , 
                       alpha = 0.3) + 
        geom_node_point(
                mapping =  aes(colour = factor(binary_self_eff) ,
                               size = incoming_links, 
                               shape = factor(binary_self_eff))) + 
        geom_node_text(aes(label = label ,
                           alpha = binary_self_eff ,
                           size = incoming_links), 
                       color = "darkblue") + theme(legend.position = "none")
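The set.seed(7) call matters because the dh layout starts from random positions, so the picture changes from run to run unless the seed is fixed. The sketch below illustrates this with igraph's layout_with_dh() on a toy ring graph; it assumes that the "dh" layout in ggraph is backed by this igraph function and that igraph draws from R's random number generator.

```r
library(igraph)

toy_graph <- make_ring(10, directed = TRUE)

# Same seed, same starting positions, same final coordinates.
set.seed(7)
coords_a <- layout_with_dh(toy_graph)
set.seed(7)
coords_b <- layout_with_dh(toy_graph)

# TRUE if the seeded runs are reproducible.
identical(coords_a, coords_b)
```

The value 7 has no special meaning; any fixed seed makes the figure reproducible, and trying a few seeds is a cheap way to pick a pleasing arrangement.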

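Finally, in the sixth perspective, I recode the binary efficiency flag as a labelled factor, set the shapes and colors manually, and map the transparency of the labels to the average cross-efficiency.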

# perspective 6 
nodes_with_link_count <- nodes %>%
        left_join(chosen_edges %>% count(to) , by = c("id"="to")) %>% 
        ungroup() %>% 
        mutate(incoming_links = ifelse(test = is.na(n),yes = 0,no = n )) %>% 
    mutate(simple_efficiency = ifelse(binary_self_eff==1,
                                      "efficient","inefficient")) %>% 
    mutate(simple_efficiency = factor(simple_efficiency)) %>% 
    dplyr::rename(simple_efficiency_score = self_eff)

cem_graph_df <- chosen_edges

cem_graph <-igraph::graph_from_data_frame(cem_graph_df , directed = TRUE ,
                                          vertices = nodes_with_link_count  )


set.seed(7)
cem_network <- 
ggraph(cem_graph, layout = "dh") + 
        geom_edge_link(arrow = arrow(length = unit(4,"mm")),
                       end_cap = circle(3,"mm"),
                       start_cap = circle(3,"mm") , 
                       alpha = 0.3) + 
        geom_node_point(
                mapping =  aes(colour =simple_efficiency ,
                               size = incoming_links, 
                               shape = simple_efficiency)) + 
        geom_node_text(aes(label = label ,
                           alpha = value ,
                           size = incoming_links), 
                       color = "darkblue") +
        theme(panel.background = element_rect(fill=NA),
              panel.border = element_rect(fill = NA) ) + 
    scale_shape_manual(values = c(19,1)) + 
    scale_color_manual(values = c("green","red"))+
    guides(alpha = "none")

cem_network

#ggsave(cem_network,filename = "cem_network.png", device = "png" , dpi = 320)

I removed the background, kept the frame, and added a legend for colors and shapes.
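For the static, publication version, the commented-out ggsave() line above can be extended with an explicit canvas size so the output does not depend on the current graphics window. A minimal sketch with a stand-in plot; the 20 x 15 cm size and the file name are assumptions, not values used for this report:

```r
library(ggplot2)

# Stand-in plot (made up); in the report this would be cem_network.
toy_plot <- ggplot(data.frame(x = 1:3, y = c(2, 1, 3)), aes(x, y)) +
        geom_point()

# Assumption: a 20 x 15 cm canvas; pick whatever suits the target layout.
out_file <- file.path(tempdir(), "toy_network.png")
ggsave(filename = out_file, plot = toy_plot,
       width = 20, height = 15, units = "cm", dpi = 320)
```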

Visualization of Cross-Efficiency Matrix as a Network

Shahin Ashkiani

11/29/2019