With reference to the organisation network graph in Section 6.1 of Hands-on Exercise 10, you are required to complete the following tasks:
Before starting on the task, we first have to load the packages and prepare the datasets. The data preparation steps are the same as in Hands-on Exercise 10: Visualising and Analysing Network Data with R.
The following packages are installed (if not already present) and loaded to create the visualisations required for this Dataviz Makeover.
packages = c('igraph', 'tidygraph', 'ggraph', 'visNetwork', 'lubridate', 'tidyverse')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
GAStech_email_node.csv and GAStech_email_edge-v2.csv will be used for this assignment.
GAStech_nodes <- read_csv("data/GAStech_email_node.csv")
## Parsed with column specification:
## cols(
## id = col_double(),
## label = col_character(),
## Department = col_character(),
## Title = col_character()
## )
GAStech_edges <- read_csv("data/GAStech_email_edge-v2.csv")
## Parsed with column specification:
## cols(
## source = col_double(),
## target = col_double(),
## SentDate = col_character(),
## SentTime = col_time(format = ""),
## Subject = col_character(),
## MainSubject = col_character(),
## sourceLabel = col_character(),
## targetLabel = col_character()
## )
As the SentDate attribute is read in as a “chr” (character) data type, the following code converts it to a Date data type with dmy, then derives a new Weekday field from it with wday.
GAStech_edges$SentDate = dmy(GAStech_edges$SentDate)
GAStech_edges$Weekday = wday(GAStech_edges$SentDate, label = TRUE, abbr = FALSE)
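The conversion can be tried out on a couple of values first. The following is a minimal sketch, assuming the dates are stored as day/month/year strings; the sample values are made up for illustration and are not taken from the dataset.

```r
library(lubridate)

# Made-up sample values in day/month/year order (not taken from the dataset)
sent_date_chr <- c("6/1/2014", "17/1/2014")

# dmy() parses day-month-year strings into Date objects
sent_date <- dmy(sent_date_chr)
class(sent_date)   # "Date"

# Without label = TRUE, wday() returns a number,
# where 1 = Sunday under the default week start setting
weekday_num <- wday(sent_date)

# With label = TRUE and abbr = FALSE, it returns an ordered factor of full day names
weekday <- wday(sent_date, label = TRUE, abbr = FALSE)
is.ordered(weekday)
```

Because Weekday is an ordered factor, later tables and plots keep the days in calendar order rather than alphabetical order.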
The data is then aggregated: only records whose main subject is work related are kept, and these are grouped by sender (source), receiver (target) and day of the week. A new field, Weight, holds the number of records in each group. Self-addressed emails and groups with a Weight of 1 are then dropped.
GAStech_edges_aggregated <- GAStech_edges %>%
filter(MainSubject == "Work related") %>%
group_by(source, target, Weekday) %>%
summarise(Weight = n()) %>%
filter(source!=target) %>%
filter(Weight > 1) %>%
ungroup()
GAStech_edges_aggregated
## # A tibble: 1,456 x 4
## source target Weekday Weight
## <dbl> <dbl> <ord> <int>
## 1 1 2 Monday 4
## 2 1 2 Tuesday 3
## 3 1 2 Wednesday 5
## 4 1 2 Friday 8
## 5 1 3 Monday 4
## 6 1 3 Tuesday 3
## 7 1 3 Wednesday 5
## 8 1 3 Friday 8
## 9 1 4 Monday 4
## 10 1 4 Tuesday 3
## # ... with 1,446 more rows
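To see what each step of the aggregation does, the same pipeline can be run on a handful of rows. This is only a sketch on hypothetical toy records (toy_edges and the "Non-work related" value are made up); it also assumes dplyr 1.0 or later, where summarise accepts the .groups argument in place of a separate ungroup call.

```r
library(dplyr)

# Hypothetical toy records: one row per email (not the real dataset)
toy_edges <- tibble(
  source      = c(1, 1, 1, 2, 2, 3),
  target      = c(2, 2, 2, 1, 2, 1),
  Weekday     = c("Monday", "Monday", "Tuesday", "Monday", "Monday", "Friday"),
  MainSubject = c("Work related", "Work related", "Work related",
                  "Non-work related", "Work related", "Work related")
)

toy_aggregated <- toy_edges %>%
  filter(MainSubject == "Work related") %>%        # keep work-related emails only
  group_by(source, target, Weekday) %>%
  summarise(Weight = n(), .groups = "drop") %>%    # count emails per group
  filter(source != target) %>%                     # drop self-addressed emails
  filter(Weight > 1)                               # keep groups with more than one email

toy_aggregated
```

Only the sender 1 to receiver 2 pair on Monday survives, as it is the only group with more than one work-related email between two different people.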
To create a network graph object, the tbl_graph function is used to build a network object from the node and edge data. The activate function makes the edges tibble active so that the edges can be sorted with the arrange function, highest Weight first. As the sorted result is only printed and not assigned, GAStech_graph itself is unchanged, as the second printout shows.
GAStech_graph <- tbl_graph(nodes = GAStech_nodes, edges = GAStech_edges_aggregated, directed = TRUE)
GAStech_graph %>%
activate(edges) %>%
arrange(desc(Weight))
## # A tbl_graph: 54 nodes and 1456 edges
## #
## # A directed multigraph with 1 component
## #
## # Edge Data: 1,456 x 4 (active)
## from to Weekday Weight
## <int> <int> <ord> <int>
## 1 40 41 Tuesday 23
## 2 40 43 Tuesday 19
## 3 41 43 Tuesday 15
## 4 41 40 Tuesday 14
## 5 42 41 Tuesday 13
## 6 42 40 Tuesday 12
## # ... with 1,450 more rows
## #
## # Node Data: 54 x 4
## id label Department Title
## <dbl> <chr> <chr> <chr>
## 1 1 Mat.Bramar Administration Assistant to CEO
## 2 2 Anda.Ribera Administration Assistant to CFO
## 3 3 Rachel.Pantanal Administration Assistant to CIO
## # ... with 51 more rows
GAStech_graph
## # A tbl_graph: 54 nodes and 1456 edges
## #
## # A directed multigraph with 1 component
## #
## # Node Data: 54 x 4 (active)
## id label Department Title
## <dbl> <chr> <chr> <chr>
## 1 1 Mat.Bramar Administrati~ Assistant to CEO
## 2 2 Anda.Ribera Administrati~ Assistant to CFO
## 3 3 Rachel.Pantanal Administrati~ Assistant to CIO
## 4 4 Linda.Lagos Administrati~ Assistant to COO
## 5 5 Ruscella.Mies.Hab~ Administrati~ Assistant to Engineering Group Ma~
## 6 6 Carla.Forluniau Administrati~ Assistant to IT Group Manager
## # ... with 48 more rows
## #
## # Edge Data: 1,456 x 4
## from to Weekday Weight
## <int> <int> <ord> <int>
## 1 1 2 Monday 4
## 2 1 2 Tuesday 3
## 3 1 2 Wednesday 5
## # ... with 1,453 more rows
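The behaviour of activate and arrange can be checked on a small graph. The sketch below uses hypothetical toy data (toy_nodes and toy_edges are made up) and shows that sorting the active edges returns a sorted copy while the stored graph keeps its original edge order.

```r
library(dplyr)
library(tidygraph)

# Hypothetical toy data: three nodes and three weighted edges
toy_nodes <- data.frame(id = 1:3, label = c("A", "B", "C"))
toy_edges <- data.frame(source = c(1, 1, 2),
                        target = c(2, 3, 3),
                        Weight = c(2, 5, 3))

# The first two edge columns are taken as the start and end node of each edge
toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

# Sorting without assigning back only affects what is returned, not toy_graph
sorted_edges <- toy_graph %>%
  activate(edges) %>%
  arrange(desc(Weight)) %>%
  as_tibble()

# Edge tibble of the stored graph, for comparison
original_edges <- toy_graph %>%
  activate(edges) %>%
  as_tibble()

sorted_edges$Weight      # sorted, highest Weight first
original_edges$Weight    # original order is kept
```

To keep the sorted order, the result of the pipeline would have to be assigned back to the graph object.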
After these steps, we are able to move on to create the appropriate network graph for the task given at the start of the document.
The original code provided in the exercise is written in the style of the previous version of ggraph, which was built around igraph and dendrogram objects, rather than tidygraph, on which the latest version, ggraph 2.0, is based.
The original code provided in the exercise is as follows:
g <- GAStech_graph %>%
mutate(betweenness_centrality = centrality_betweenness()) %>%
mutate(closeness_centrality = centrality_closeness()) %>%
ggraph(layout = "nicely") +
geom_edge_link(aes()) +
geom_node_point(aes(colour = closeness_centrality, size=betweenness_centrality))
g + theme_graph()
In ggraph 2.0, the internals have been rewritten to be based only on tidygraph. With ggraph 2.0, there is no need to first create the new variables betweenness_centrality (using the tidygraph function centrality_betweenness) and closeness_centrality (using the tidygraph function centrality_closeness); tidygraph functions can be used directly as inputs to the layout specification and aesthetic mappings.
The improved code using ggraph ver 2.0:
g <-
ggraph(GAStech_graph, layout = "nicely") +
geom_edge_link(aes()) +
geom_node_point(aes(colour = centrality_closeness(), size=centrality_betweenness()))
g + theme_graph()
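To get a feel for the values that the two centrality functions map to node size and colour, they can be computed on a graph small enough to check by hand. This is a minimal sketch on a hypothetical three-node path graph, not on the GAStech data.

```r
library(tidygraph)

# Hypothetical toy graph: an undirected path 1 - 2 - 3
toy_path <- create_path(3, directed = FALSE)

# Store both measures as node variables, as in the original exercise code
toy_centrality <- toy_path %>%
  mutate(betweenness = centrality_betweenness(),
         closeness   = centrality_closeness()) %>%
  as_tibble()

toy_centrality
```

The middle node lies on the only shortest path between the two end nodes, so it has the highest betweenness, and it is also the closest to all other nodes. In the plots above, such a node would be drawn largest and at the high end of the colour scale.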
Colour of the node and edge: The node colour is too dark. When centrality_closeness is low, the dark node blends into the black edges, which makes the node hard to spot in the graph.
Layout of the network diagram for a clearer view: As the graph has many node-edge relationships, many lines connect the different nodes, which makes it very challenging for the user to read the graph clearly.
Lack of a title and node labels: There is no title to give the user context on what the graph is about, and the nodes are not labelled.
Alternative Design:
g <-
ggraph(GAStech_graph, layout = "nicely") +
geom_edge_link(edge_colour = "gray77", alpha = 0.5) +
geom_node_point(aes(colour = centrality_closeness(), size=centrality_betweenness())) +
theme_graph() + scale_colour_gradient(low = "#00008B", high = "#63B8FF") +
geom_node_text(aes(label = label), size = 3, repel = TRUE) +
ggtitle("GAStech - centrality indices")
g
With reference to the organisation network graph in Section 7.4 of Hands-on Exercise 10, you are required to complete the following tasks:
Similar to task 1, the data needs to be prepared so that it can be used with the visNetwork function.
The following has been provided in the exercise:
GAStech_edges_aggregated <- GAStech_edges %>%
left_join(GAStech_nodes, by = c("sourceLabel" = "label")) %>%
rename(from = id) %>%
left_join(GAStech_nodes, by = c("targetLabel" = "label")) %>%
rename(to = id) %>%
filter(MainSubject == "Work related") %>%
group_by(from, to) %>%
summarise(weight = n()) %>%
filter(from!=to) %>%
filter(weight > 1) %>%
ungroup()
GAStech_nodes <- GAStech_nodes %>%
rename(group = Department)
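visNetwork expects the edge data to carry from and to columns that match the id column of the node data, and it colours nodes by a column named group, which is why the joins and renames above are needed. The lookup of ids from labels can be sketched on hypothetical toy data (the names and labels below are made up):

```r
library(dplyr)

# Hypothetical toy data in the same shape as the node and edge files
toy_nodes <- tibble(id = 1:3, label = c("A.One", "B.Two", "C.Three"))
toy_edges <- tibble(sourceLabel = c("A.One", "A.One", "B.Two"),
                    targetLabel = c("B.Two", "C.Three", "C.Three"))

toy_from_to <- toy_edges %>%
  left_join(toy_nodes, by = c("sourceLabel" = "label")) %>%
  rename(from = id) %>%     # id of the sender
  left_join(toy_nodes, by = c("targetLabel" = "label")) %>%
  rename(to = id)           # id of the receiver

toy_from_to
```

Each join looks up the node id for one end of the edge, and the rename immediately afterwards frees the name id for the second join.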
Once the data has been prepared, we can move on to improve the interactive chart based on the requirements stated in task 2.
The 2 requirements to improve the design of the graph are:
1. When a name is selected from the drop-down list, the corresponding node will not only be highlighted but will also be labelled. Furthermore, all the linked nodes of the selected node will be labelled too.
2. When a node of the interactive graph is selected, the node will not only be highlighted but will also be labelled. Furthermore, all the linked nodes of the selected node will be labelled as well.
To meet the requirements, the code has been modified.
visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = list(enabled = TRUE,degree = 1, labelOnly = FALSE, hover = FALSE), nodesIdSelection = TRUE)
In the above interactive network chart, when a node is clicked on or selected via the drop-down list, only the labels of the corresponding node and its linked nodes will appear and be highlighted. Due to the size of the graph, the graph needs to be zoomed in before the node labels can be read.
The labels cannot be seen clearly because they are obscured by the edges.
There is no legend to indicate what the colours represent in the network graph: Although the nodes are coloured by the department they belong to, there is no legend to specify which colour indicates which department.
The edges give no indication of whether a relationship between nodes is in-degree or out-degree: As the purpose of the graph is to show the flow and direction of emails sent between different personnel, it is more appropriate to show the direction of each email, specifying who is the sender and who is the receiver.
In the graph, the following improvements have been made:
1. Box nodes have been used to show the label inside the node. This helps to improve the visibility of the label.
2. A legend has been added to show each colour and its respective department.
3. The direction of the email has been included in the edges. Arrows have been set to “to”, whereby the sender of the email has an edge with an arrow pointing to the receiver (Sender -> Receiver).
g<- visNetwork(GAStech_nodes, GAStech_edges_aggregated, main = "GAStech email network graph", width = "100%") %>%
visIgraphLayout(layout = "layout_with_fr") %>%
visNodes(shape = "box") %>%
visEdges(arrows = 'to') %>%
visOptions(highlightNearest = list(enabled = TRUE, degree = 1, labelOnly = FALSE, hover = FALSE), nodesIdSelection = TRUE)
visLegend(g)
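The same combination of options can be tried on a very small network before applying it to the full dataset. The following is a self-contained sketch on hypothetical toy data (the names, departments and title are made up); it leaves out visIgraphLayout so that only the visNetwork package is needed.

```r
library(visNetwork)

# Hypothetical toy nodes: id, label and group are the columns visNetwork reads
toy_nodes <- data.frame(id = 1:3,
                        label = c("A.One", "B.Two", "C.Three"),
                        group = c("Dept X", "Dept X", "Dept Y"))

# Hypothetical toy edges: from = sender id, to = receiver id
toy_edges <- data.frame(from = c(1, 1, 2), to = c(2, 3, 3))

toy_widget <- visNetwork(toy_nodes, toy_edges, main = "Toy email network") %>%
  visNodes(shape = "box") %>%        # label is drawn inside the node
  visEdges(arrows = "to") %>%        # arrow points at the receiver
  visOptions(highlightNearest = list(enabled = TRUE, degree = 1),
             nodesIdSelection = TRUE) %>%
  visLegend()                        # one legend entry per group
```

Printing toy_widget in an interactive R session or an R Markdown document renders the chart, with the legend listing one entry per group.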