With reference to the organisation network graph in Section 6.1 of Hands-on Exercise 10, you are required to complete the following tasks:
Install packages and Data Wrangling
knitr::opts_chunk$set(echo = TRUE)
packages = c('igraph', 'tidygraph', 'ggraph', 'visNetwork', 'lubridate', 'tidyverse')
for (p in packages) {
  # Install the package only if it is not already available, then load it
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
GAStech_nodes <- read_csv("data/GAStech_email_node.csv")
GAStech_edges <- read_csv("data/GAStech_email_edge-v2.csv")
GAStech_edges$SentDate = dmy(GAStech_edges$SentDate)
GAStech_edges$Weekday = wday(GAStech_edges$SentDate, label = TRUE, abbr = FALSE)
GAStech_edges_aggregated <- GAStech_edges %>%
filter(MainSubject == "Work related") %>%
group_by(source, target, Weekday) %>%
summarise(Weight = n()) %>%
filter(source!=target) %>%
filter(Weight > 1) %>%
ungroup()
GAStech_edges_aggregated
## # A tibble: 1,456 x 4
## source target Weekday Weight
## <dbl> <dbl> <ord> <int>
## 1 1 2 Monday 4
## 2 1 2 Tuesday 3
## 3 1 2 Wednesday 5
## 4 1 2 Friday 8
## 5 1 3 Monday 4
## 6 1 3 Tuesday 3
## 7 1 3 Wednesday 5
## 8 1 3 Friday 8
## 9 1 4 Monday 4
## 10 1 4 Tuesday 3
## # ... with 1,446 more rows
Creating network objects using tidygraph
GAStech_graph <- tbl_graph(nodes = GAStech_nodes, edges = GAStech_edges_aggregated, directed = TRUE)
GAStech_graph %>%
activate(edges) %>%
arrange(desc(Weight))
## # A tbl_graph: 54 nodes and 1456 edges
## #
## # A directed multigraph with 1 component
## #
## # Edge Data: 1,456 x 4 (active)
## from to Weekday Weight
## <int> <int> <ord> <int>
## 1 40 41 Tuesday 23
## 2 40 43 Tuesday 19
## 3 41 43 Tuesday 15
## 4 41 40 Tuesday 14
## 5 42 41 Tuesday 13
## 6 42 40 Tuesday 12
## # ... with 1,450 more rows
## #
## # Node Data: 54 x 4
## id label Department Title
## <dbl> <chr> <chr> <chr>
## 1 1 Mat.Bramar Administration Assistant to CEO
## 2 2 Anda.Ribera Administration Assistant to CFO
## 3 3 Rachel.Pantanal Administration Assistant to CIO
## # ... with 51 more rows
g <- GAStech_graph %>%
mutate(betweenness_centrality = centrality_betweenness()) %>%
mutate(closeness_centrality = centrality_closeness()) %>%
ggraph(layout = "nicely") +
geom_edge_link(aes()) +
geom_node_point(aes(colour = closeness_centrality, size=betweenness_centrality))
g + theme_graph()
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font
## family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
Identify 3 aspects of the graph visualisation that can be improved.
Overall, the chart lacks clarity: it is cluttered, which makes it difficult to analyse or to draw insights from.
Using size to represent betweenness centrality adds little value, since the many crossing lines are distracting and users have to refer to the legend to find out which range a node falls into.
The nodes overlap one another, and it is unclear what each node represents because the employee name is not shown.
The direction of the edges is not indicated. Showing it would allow us to better understand the relationship between each pair of employees.
The chart lacks important information such as the department name. Even if a node has a high closeness centrality, we cannot tell which department it belongs to, so the information is of limited use.
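To make the two measures in the critique above more concrete, here is a minimal sketch on a toy graph. The three-node path A-B-C is purely hypothetical and undirected (the GAStech graph is directed), and the calls are plain `igraph` functions rather than the `tidygraph` wrappers used in the exercise. Betweenness counts how many shortest paths pass through a node, while closeness (in its unnormalised form) is the reciprocal of the summed distances to all other nodes.

```r
library(igraph)

# Hypothetical toy graph: a simple undirected path A - B - C
toy_edges <- data.frame(from = c("A", "B"), to = c("B", "C"))
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# B lies on the only shortest path between A and C, so it scores 1;
# A and C lie on no shortest path between other nodes, so they score 0
toy_betweenness <- betweenness(toy_graph)

# Unnormalised closeness = 1 / (sum of distances to the other nodes):
# A and C: 1 / (1 + 2), B: 1 / (1 + 1)
toy_closeness <- closeness(toy_graph)

print(toy_betweenness)
print(toy_closeness)
```

In this toy case the middle node has both the highest betweenness and the highest closeness, which is the kind of node that would appear largest and most strongly coloured in the chart.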
I have incorporated improvements #1 to #3 into the chart below.
The edges are now grey instead of black, and the nodes no longer overlap one another. The direction of the edges is also indicated, although the arrows are not very visible. Most importantly, the name of each employee can be seen clearly, which helps the user interpret the chart easily.
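One possible way to make the arrowheads easier to see is to stop each edge slightly short of its target node, so that the arrow is not hidden underneath the node point. The sketch below is not part of the submitted chart; it uses a hypothetical three-node graph and the `end_cap` argument of `geom_edge_link()` together with `circle()` from `ggraph`. The cap size, arrow size and colours are arbitrary choices.

```r
library(tidygraph)
library(ggraph)
library(grid)  # arrow() and unit()

# Hypothetical toy data: three nodes connected in a directed cycle
toy_nodes <- data.frame(label = c("A", "B", "C"))
toy_edges <- data.frame(from = c(1, 2, 3), to = c(2, 3, 1))
toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

p <- ggraph(toy_graph, layout = "circle") +
  # end_cap leaves a 4 mm gap before the target node so the arrowhead stays visible
  geom_edge_link(colour = "grey40",
                 arrow = arrow(length = unit(3, "mm"), type = "closed"),
                 end_cap = circle(4, "mm")) +
  geom_node_point(size = 6, colour = "steelblue") +
  geom_node_text(aes(label = label), colour = "white", size = 3) +
  theme_void()

p
```

On a dense graph such as the GAStech network, a smaller cap and arrow would probably be needed to avoid adding clutter.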
GAStech_nodes <- read_csv("data/GAStech_email_node.csv")
## Parsed with column specification:
## cols(
## id = col_double(),
## label = col_character(),
## Department = col_character(),
## Title = col_character()
## )
GAStech_edges <- read_csv("data/GAStech_email_edge-v2.csv")
## Parsed with column specification:
## cols(
## source = col_double(),
## target = col_double(),
## SentDate = col_character(),
## SentTime = col_time(format = ""),
## Subject = col_character(),
## MainSubject = col_character(),
## sourceLabel = col_character(),
## targetLabel = col_character()
## )
GAStech_edges$SentDate = dmy(GAStech_edges$SentDate)
GAStech_edges$Weekday = wday(GAStech_edges$SentDate, label = TRUE, abbr = FALSE)
GAStech_edges_aggregated <- GAStech_edges %>%
filter(MainSubject == "Work related") %>%
group_by(source, target, Weekday) %>%
summarise(Weight = n()) %>%
filter(source!=target) %>%
filter(Weight > 1) %>%
ungroup()
GAStech_graph <- tbl_graph(nodes = GAStech_nodes, edges = GAStech_edges_aggregated, directed = TRUE)
library(viridis)
## Warning: package 'viridis' was built under R version 3.5.3
## Loading required package: viridisLite
## Warning: package 'viridisLite' was built under R version 3.5.3
g <- GAStech_graph %>%
  mutate(betweenness_centrality = centrality_betweenness()) %>%
  mutate(closeness_centrality = centrality_closeness()) %>%
  ggraph(layout = "fr") +
  # Grey edges with arrowheads to show the direction of each email flow
  geom_edge_link(colour = "grey", arrow = arrow(length = unit(6, "pt"))) +
  # Reuse the centrality columns computed above instead of recomputing them
  geom_node_point(aes(size = betweenness_centrality, colour = closeness_centrality)) +
  geom_node_text(aes(label = label), nudge_x = 0.03, repel = TRUE, size = 2.5, colour = "black") +
  # Use a single colour scale; a second one would only replace the first
  scale_color_viridis(name = "Centrality Closeness", direction = -1, option = "D") +
  ggtitle("Network Visualization of Gas Tech") +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(colour = "Centrality Closeness", size = "Centrality Betweenness")
g
With reference to the organisation network graph in Section 7.4 of Hands-on Exercise 10, you are required to complete the following tasks:
o When a name is selected from the drop-down list, the corresponding node will be both highlighted and labelled. All nodes linked to the selected node will also be labelled.
o When a node of the interactive graph is selected, the node will be both highlighted and labelled. All nodes linked to the selected node will also be labelled.
GAStech_edges_aggregated <- GAStech_edges %>%
left_join(GAStech_nodes, by = c("sourceLabel" = "label")) %>%
rename(from = id) %>%
left_join(GAStech_nodes, by = c("targetLabel" = "label")) %>%
rename(to = id) %>%
filter(MainSubject == "Work related") %>%
group_by(from, to) %>%
summarise(weight = n()) %>%
filter(from!=to) %>%
filter(weight > 1) %>%
ungroup()
GAStech_nodes <- GAStech_nodes %>%
rename(group = Department)
visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
When a name is selected from the drop-down list, the corresponding node will be both highlighted and labelled, and all nodes linked to it will also be labelled. The same applies when a node of the interactive graph is selected directly: the node is highlighted and labelled, and so are all of its linked nodes.
visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = list(enabled = TRUE,degree = 1, labelOnly = F, hover = F),nodesIdSelection = TRUE)
Overall, the chart lacks clarity due to the overlapping names and lines. The areas for improvement are listed below:
It is difficult to identify the names as they overlap one another. Even when the user zooms into the chart, the crossing lines still make it hard to read the name of each node.
To view the employee name, the user has to zoom into the chart.
There is no legend or other indication of which department an employee belongs to, so the user cannot tell what each colour represents.
There are no arrows to indicate the direction of the edge.
There is no chart title.
Improvements made to the graph:
The user can filter by group (department), which shows the interactions within and outside of the respective departments. ‘Select By Id’ is renamed to ‘Select Name’.
A title is added to the top to complement the chart and provide the user with an idea of what the chart is about.
Navigation buttons are added to the bottom for easier viewing and to facilitate interaction.
When the user selects a name in the drop-down list or selects the node in the chart directly, the arrows will indicate the direction and the colour will vary by department. Nodes or lines that are not connected to the selected node will be greyed out to increase clarity. This also allows the user to focus on the nodes that are connected to the particular employee.
The shape of the node is set to ‘Box’ and the employee name is displayed inside the box. This helps to increase clarity since the labels don’t overlap one another.
The user can view the relevant details (Department, Job Title and Name) of each employee by hovering over the node.
Other Considerations:
Add a legend so that the user can easily tell which department an employee belongs to. However, I noticed a mismatch between the names on the legend and the respective department colours. Therefore, the legend was not implemented in the chart, since it would be misleading and create confusion for the user.
Allow the user to hover over each node instead of using the drop-down list or selecting the node. However, I noticed that if hover was set to TRUE, it would only highlight other nodes in the same department as the employee. It would therefore show only the internal interactions of the employee, and we would not see the connections made to employees in other departments. This was not implemented since it is better to showcase both internal and external interactions.
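For the legend consideration above, one approach that might avoid the mismatch is to assign each department an explicit colour with `visGroups()` before calling `visLegend()`, so that nodes and legend entries draw on the same definitions. The sketch below was not tested against the GAStech data and may not address the cause of the mismatch observed there. The node table, department names and colours are all hypothetical.

```r
library(visNetwork)

# Hypothetical toy data: three employees in two made-up departments
toy_nodes <- data.frame(
  id    = 1:3,
  label = c("Ann", "Bob", "Cy"),
  group = c("Dept A", "Dept A", "Dept B"),
  stringsAsFactors = FALSE
)
toy_edges <- data.frame(from = c(1, 2), to = c(2, 3))

# Hypothetical colour choice, one entry per department
dept_colours <- c("Dept A" = "#1b9e77", "Dept B" = "#d95f02")

net <- visNetwork(toy_nodes, toy_edges)

# Register every department with its colour so that the nodes and the
# legend are built from the same group definitions
for (dept in names(dept_colours)) {
  net <- visGroups(net, groupname = dept, color = dept_colours[[dept]])
}

net <- visLegend(net, main = "Department")
net
```

If the colours still disagree after this, the legend should stay out of the chart, as argued above.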
# Build an HTML tooltip (shown on hover) with each employee's department, job title and name
GAStech_nodes$title <- paste0(
  "<p><b>Department: </b>", GAStech_nodes$group, "</p>",
  "<p><b>Job Title: </b>", GAStech_nodes$Title, "</p>",
  "<p><b>Name: </b>", GAStech_nodes$label, "</p>"
)
visNetwork(GAStech_nodes, GAStech_edges_aggregated, main = "Improved Interactive Network Graph") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(
    selectedBy = "group",
    highlightNearest = list(enabled = TRUE, degree = list(from = 1, to = 1), hover = FALSE, labelOnly = FALSE, algorithm = "hierarchical"),
    nodesIdSelection = list(enabled = TRUE, main = "Select Name")
  ) %>%
  visInteraction(navigationButtons = TRUE, multiselect = TRUE) %>%
  visNodes(shape = "box", labelHighlightBold = TRUE) %>%
  visEdges(arrows = list(to = list(enabled = TRUE), from = list(enabled = TRUE))) %>%
  visPhysics(stabilization = FALSE) %>%
  visLayout(randomSeed = 12)