Original code:
First, let's load the packages:
packages = c('igraph', 'tidygraph', 'ggraph', 'visNetwork', 'lubridate', 'tidyverse', 'ggplot2','deldir')
for (p in packages) {
  # Install the package only if it is not already available
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
Prepare the available data:
GAStech_nodes <- read_csv("GAStech_email_node.csv")
## Parsed with column specification:
## cols(
## id = col_double(),
## label = col_character(),
## Department = col_character(),
## Title = col_character()
## )
GAStech_edges <- read_csv("GAStech_email_edge-v2.csv")
## Parsed with column specification:
## cols(
## source = col_double(),
## target = col_double(),
## SentDate = col_character(),
## SentTime = col_time(format = ""),
## Subject = col_character(),
## MainSubject = col_character(),
## sourceLabel = col_character(),
## targetLabel = col_character()
## )
GAStech_edges$SentDate = dmy(GAStech_edges$SentDate)
GAStech_edges$Weekday = wday(GAStech_edges$SentDate, label = TRUE, abbr = FALSE)
GAStech_edges_aggregated <- GAStech_edges %>%
  filter(MainSubject == "Work related") %>%
  group_by(source, target, Weekday) %>%
  summarise(Weight = n()) %>%
  filter(source != target) %>%
  filter(Weight > 1) %>%
  ungroup()
GAStech_edges_aggregated
## # A tibble: 1,456 x 4
## source target Weekday Weight
## <dbl> <dbl> <ord> <int>
## 1 1 2 Monday 4
## 2 1 2 Tuesday 3
## 3 1 2 Wednesday 5
## 4 1 2 Friday 8
## 5 1 3 Monday 4
## 6 1 3 Tuesday 3
## 7 1 3 Wednesday 5
## 8 1 3 Friday 8
## 9 1 4 Monday 4
## 10 1 4 Tuesday 3
## # … with 1,446 more rows
GAStech_graph <- tbl_graph(nodes = GAStech_nodes, edges = GAStech_edges_aggregated, directed = TRUE)
GAStech_graph %>%
  activate(edges) %>%
  arrange(desc(Weight))
## # A tbl_graph: 54 nodes and 1456 edges
## #
## # A directed multigraph with 1 component
## #
## # Edge Data: 1,456 x 4 (active)
## from to Weekday Weight
## <int> <int> <ord> <int>
## 1 40 41 Tuesday 23
## 2 40 43 Tuesday 19
## 3 41 43 Tuesday 15
## 4 41 40 Tuesday 14
## 5 42 41 Tuesday 13
## 6 42 40 Tuesday 12
## # … with 1,450 more rows
## #
## # Node Data: 54 x 4
## id label Department Title
## <dbl> <chr> <chr> <chr>
## 1 1 Mat.Bramar Administration Assistant to CEO
## 2 2 Anda.Ribera Administration Assistant to CFO
## 3 3 Rachel.Pantanal Administration Assistant to CIO
## # … with 51 more rows
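To see what the aggregation step does, the same pipeline can be run on a handful of made-up emails. This is only a minimal sketch: toy_nodes and toy_edges are hypothetical data, not rows from the GAStech files, and the weekday labels assume an English locale.

```r
library(dplyr)
library(lubridate)
library(tidygraph)

# Hypothetical toy data (not taken from the GAStech files)
toy_nodes <- tibble(id = 1:3, label = c("A", "B", "C"))
toy_edges <- tibble(
  source   = c(1, 1, 1, 2, 2, 3),
  target   = c(2, 2, 3, 1, 2, 1),
  SentDate = c("6/1/2014", "6/1/2014", "6/1/2014",
               "7/1/2014", "7/1/2014", "7/1/2014")
)

toy_aggregated <- toy_edges %>%
  # Parse day/month/year strings, then derive the weekday name
  mutate(SentDate = dmy(SentDate),
         Weekday  = wday(SentDate, label = TRUE, abbr = FALSE)) %>%
  # Count emails per sender, receiver and weekday
  group_by(source, target, Weekday) %>%
  summarise(Weight = n(), .groups = "drop") %>%
  # Drop self-emails and pairs that exchanged only one email
  filter(source != target, Weight > 1)

# The first two edge columns are used as the from/to node indices
toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_aggregated, directed = TRUE)
toy_aggregated
```

Only the pair 1 to 2 survives, because it is the only non-self pair with more than one email on the same weekday.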
This is the original graph:
g <- GAStech_graph %>%
  mutate(betweenness_centrality = centrality_betweenness()) %>%
  mutate(closeness_centrality = centrality_closeness()) %>%
  ggraph(layout = "nicely") +
  geom_edge_link(aes()) +
  geom_node_point(aes(colour = closeness_centrality, size = betweenness_centrality))
g + theme_graph()
Solution to Qns1: We no longer need mutate() to precompute centrality_betweenness() and centrality_closeness(). With the newer version of ggraph, we can pass the existing GAStech_graph object directly as the graph argument and call centrality_closeness() and centrality_betweenness() inside aes(), mapped to colour and size respectively.
g <- ggraph(GAStech_graph, layout = "nicely") +
  geom_edge_link(aes()) +
  geom_node_point(aes(colour = centrality_closeness(), size = centrality_betweenness()))
g + theme_graph()
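The two styles can be compared side by side on a small built-in graph. The sketch below uses Zachary's karate club network from create_notable() purely as a stand-in for GAStech_graph; the object names toy_graph, p_old and p_new are illustrative.

```r
library(tidygraph)
library(ggraph)

# Small undirected example graph (34 nodes), a stand-in for GAStech_graph
toy_graph <- create_notable("zachary")

# Old style: store the centralities as node columns first
p_old <- toy_graph %>%
  mutate(betweenness = centrality_betweenness(),
         closeness   = centrality_closeness()) %>%
  ggraph(layout = "nicely") +
  geom_edge_link() +
  geom_node_point(aes(colour = closeness, size = betweenness))

# New style: call the tidygraph functions directly inside aes()
p_new <- ggraph(toy_graph, layout = "nicely") +
  geom_edge_link() +
  geom_node_point(aes(colour = centrality_closeness(),
                      size   = centrality_betweenness()))
```

Both plots map the same values; the second simply skips the intermediate columns, which is convenient when the centralities are only needed for plotting.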
First Problem: The bold black edges between all the nodes make the graph hard to read and obscure the relationships between the nodes. Every edge drawn by geom_edge_link has the same width regardless of its weight, and the edges are too thick. It is not clear which direction the links run or on which weekday the transaction occurred.
First Solution: Use the colour of the links to represent the weekday of the transaction. The weight is shown by the boldness of the links, and their colours show the weekday on which the transaction occurred.
Second Problem: The nodes are not labelled in any way. A reader would have no idea what each circle (node) represents or which department it belongs to. The original monotone blue colour also did not stand out against the black edge links.
Second Solution: Display more information about the nodes, such as the department, name or title of the person each node represents. Use a colour gradient that distinguishes itself from the background colours (if there are any).
Third Problem: The current arrangement/layout of the nodes and edges is very messy. It is hard to develop insights from it.
Third Solution: Find a better layout that can best display all the information.
g <- ggraph(GAStech_graph, layout = "linear", circular = TRUE) +
  # Shade the area around each node by department
  geom_node_voronoi(aes(fill = Department),
                    max.radius = 0.1,
                    colour = 'white', alpha = 0.5) +
  # Edge transparency shows the weight, edge colour shows the weekday
  geom_edge_arc(aes(alpha = Weight,
                    color = Weekday),
                width = 0.5) +
  # Constant label size, set outside aes() so it does not join the
  # betweenness size scale
  geom_node_text(aes(label = label), size = 3,
                 repel = TRUE) +
  scale_color_gradient(low = "white",
                       high = "black") +
  geom_node_point(aes(colour = centrality_closeness(),
                      size = centrality_betweenness())) +
  labs(title = "GAStech Network Graph",
       colour = "Closeness Centrality",
       size = "Betweenness Centrality")
g
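The first problem also mentions edge direction, which can be shown with arrowheads. The following is a hedged sketch on hypothetical toy data (toy_graph and its Weekday and Weight values are made up) that maps edge width to weight and edge colour to weekday, and adds arrows with the arrow and end_cap arguments of geom_edge_arc.

```r
library(tidygraph)
library(ggraph)

# Hypothetical toy graph: three people and four weighted, dated links
toy_graph <- tbl_graph(
  nodes = data.frame(label = c("A", "B", "C")),
  edges = data.frame(from    = c(1, 1, 2, 3),
                     to      = c(2, 3, 3, 1),
                     Weekday = c("Monday", "Tuesday", "Monday", "Friday"),
                     Weight  = c(4, 2, 8, 3)),
  directed = TRUE
)

p_edges <- ggraph(toy_graph, layout = "linear", circular = TRUE) +
  # Width encodes the weight, colour encodes the weekday,
  # and the arrowhead shows the direction of each link
  geom_edge_arc(aes(edge_width = Weight, edge_colour = Weekday),
                arrow   = arrow(length = unit(3, "mm")),
                end_cap = circle(3, "mm")) +
  # Keep the widest edge thin enough to avoid the original clutter
  scale_edge_width(range = c(0.3, 2)) +
  geom_node_point(size = 4) +
  geom_node_text(aes(label = label), repel = TRUE)
```

The end_cap leaves a small gap before each target node so that the arrowheads are not hidden under the node points.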
Data preparation:
GAStech_edges_aggregated <- GAStech_edges %>%
  left_join(GAStech_nodes, by = c("sourceLabel" = "label")) %>%
  rename(from = id) %>%
  left_join(GAStech_nodes, by = c("targetLabel" = "label")) %>%
  rename(to = id) %>%
  filter(MainSubject == "Work related") %>%
  group_by(from, to) %>%
  summarise(weight = n()) %>%
  filter(from != to) %>%
  filter(weight > 1) %>%
  ungroup()
GAStech_nodes <- GAStech_nodes %>%
  rename(group = Department)
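The renames above follow the column names that visNetwork looks for: id (plus optional label and group) in the nodes data frame, and from and to in the edges data frame. A minimal sketch of the same preparation on hypothetical toy data (toy_nodes and toy_emails are made up for illustration):

```r
library(dplyr)
library(visNetwork)

# Hypothetical toy data (not taken from the GAStech files)
toy_nodes <- tibble(id = 1:3,
                    label = c("A", "B", "C"),
                    Department = c("X", "X", "Y"))
toy_emails <- tibble(sourceLabel = c("A", "A", "A", "B", "C", "C"),
                     targetLabel = c("B", "B", "C", "B", "A", "A"))

# Look up the numeric id of each sender and receiver by label
ids <- select(toy_nodes, id, label)
toy_edges <- toy_emails %>%
  left_join(ids, by = c("sourceLabel" = "label")) %>%
  rename(from = id) %>%
  left_join(ids, by = c("targetLabel" = "label")) %>%
  rename(to = id) %>%
  # Count emails per pair, then drop self-emails and single emails
  count(from, to, name = "weight") %>%
  filter(from != to, weight > 1)

# visNetwork colours nodes by the `group` column
toy_widget <- visNetwork(rename(toy_nodes, group = Department), toy_edges)
```

Here only the pairs A to B and C to A remain, each with a weight of 2.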
A. When a name is selected from the drop-down list, the corresponding node will be both highlighted and labelled. All nodes linked to the selected node will be labelled too.
B. When a node of the interactive graph is selected, the node will be both highlighted and labelled. All nodes linked to the selected node will be labelled as well.
Original code:
visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
New code: Based on the requirements, we extended the original call in two places: visOptions now takes highlightNearest as a list so that the selected node and its adjacent nodes are highlighted, and visInteraction and visNodes are added to enable hover and enlarge the labels. Reference: https://www.rdocumentation.org/packages/visNetwork/versions/2.0.8/topics/visOptions
visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = list(enabled = TRUE,
                                     labelOnly = FALSE,
                                     algorithm = "hierarchical"),
             nodesIdSelection = TRUE,
             autoResize = TRUE,
             collapse = TRUE) %>%
  visInteraction(hover = TRUE) %>%
  visNodes(mass = 50,
           font = list(size = 30))
First Problem: All the nodes and edges are colour coded, but there is no legend to explain what the colours mean.
First Solution: Create a legend to show the groups/departments of the employees.
Second Problem: There is a drop-down filter for selecting individual employees, but none for the departments. A user who wants to understand the relationships within a department is unable to do so.
Second Solution: Add a department drop-down filter.
Third Problem: Nodes are labelled, but the labels cannot be read easily. They sit outside the circles and are hidden once the graph is zoomed out beyond a certain extent. They also tend to clump together when zoomed out.
Third Solution: Change the node shape from a circle to a box and put the label inside. Try to keep the labels from overlapping and keep them visible when zoomed out to a certain extent.
Extras: Edges now have arrows to show the direction (from and to) between the nodes.
Reference: https://www.rdocumentation.org/packages/visNetwork/versions/2.0.8/topics/visNetwork
g <- visNetwork(GAStech_nodes, GAStech_edges_aggregated, main = "GAStech Network Graph") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = list(enabled = TRUE,
                                     labelOnly = FALSE,
                                     algorithm = "hierarchical"),
             autoResize = TRUE,
             collapse = TRUE,
             nodesIdSelection = list(enabled = TRUE,
                                     values = unique(GAStech_nodes$id)),
             selectedBy = list(variable = "group")) %>%
  visNodes(shape = "box") %>%
  visEdges(arrows = "to") %>%
  visInteraction(hover = TRUE) %>%
  visEdges(smooth = FALSE) %>%
  visPhysics(stabilization = FALSE) %>%
  visNodes(mass = 50,
           font = list(size = 30))
visLegend(g)
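The three solutions can be tried in isolation on a tiny network. This is a minimal sketch on hypothetical data (toy_nodes, toy_edges and the group names are made up): selectedBy adds the group drop-down, visLegend() builds a legend from the group column, and the box shape places each label inside its node.

```r
library(visNetwork)

# Hypothetical toy data: four people in two departments
toy_nodes <- data.frame(id    = 1:4,
                        label = c("A", "B", "C", "D"),
                        group = c("Admin", "Admin", "IT", "IT"))
toy_edges <- data.frame(from = c(1, 2, 3),
                        to   = c(2, 3, 4))

net <- visNetwork(toy_nodes, toy_edges, main = "Toy network") %>%
  visNodes(shape = "box") %>%      # label is drawn inside the node
  visEdges(arrows = "to") %>%      # arrowhead at the receiving node
  visOptions(nodesIdSelection = TRUE,   # drop-down by node id
             selectedBy = "group") %>%  # drop-down by department
  visLegend()                      # legend built from the group column
net
```

Printing net in an interactive session or an R Markdown document renders the widget with both drop-downs and the legend.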