Author: Goh Jia Xian (Jarrett)
Date: 22-Nov-2019
packages <- c('igraph', 'tidygraph', 'ggraph', 'visNetwork', 'lubridate', 'tidyverse')
# Install any missing package, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
GAStech_nodes <- read_csv("data/GAStech_email_node.csv")
GAStech_edges <- read_csv("data/GAStech_email_edge-v2.csv")
# Wrangling Time
GAStech_edges$SentDate <- dmy(GAStech_edges$SentDate)
GAStech_edges$Weekday <- wday(GAStech_edges$SentDate, label = TRUE, abbr = FALSE)
# Wrangling Attributes
GAStech_edges_aggregated <- GAStech_edges %>%
  filter(MainSubject == "Work related") %>%
  group_by(source, target, Weekday) %>%
  summarise(Weight = n()) %>%
  filter(source != target) %>%
  filter(Weight > 1) %>%
  ungroup()
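As a rough illustration of what the aggregation above produces, the sketch below runs the same filter, group and count steps on a small made-up edge list. The `toy_edges` data frame and its values are hypothetical and only mirror the column names used above.

```r
library(dplyr)

# Hypothetical edge list: one row per email (not the GAStech data)
toy_edges <- tibble(
  source      = c(1, 1, 1, 2, 3, 3),
  target      = c(2, 2, 3, 2, 1, 1),
  Weekday     = c("Monday", "Monday", "Monday", "Monday", "Friday", "Friday"),
  MainSubject = rep("Work related", 6)
)

toy_aggregated <- toy_edges %>%
  filter(MainSubject == "Work related") %>%
  group_by(source, target, Weekday) %>%
  summarise(Weight = n()) %>%      # emails per sender, receiver and weekday
  filter(source != target) %>%     # drop emails sent to oneself
  filter(Weight > 1) %>%           # keep pairs with more than one email
  ungroup()

toy_aggregated
```

Only the pairs that exchanged more than one email on the same weekday survive, so Weight can be read as the frequency of emails for that pair and weekday.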
Qns: Improve the code chunk used to create the organisation network graph by using the latest functions provided in ggraph 2.0.
GAStech_graph <- tbl_graph(nodes = GAStech_nodes, edges = GAStech_edges_aggregated, directed = TRUE)
g <- GAStech_graph %>%
  mutate(betweenness_centrality = centrality_betweenness()) %>%
  mutate(closeness_centrality = centrality_closeness()) %>%
  ggraph(layout = "nicely") +
  geom_edge_link(aes()) +
  geom_node_point(aes(colour = closeness_centrality,
                      size = betweenness_centrality))
g + theme_graph()
Changes made:
1. It is not necessary to create a tbl_graph object explicitly, because ggraph 2.0 is built on tidygraph internally. Supported inputs, such as an edge data frame, are automatically converted into a tbl_graph object.
2. The mutate() calls are not necessary, because the centrality functions can be called directly inside aes() for 'colour' and 'size'.
ggraph(GAStech_edges_aggregated, layout = 'nicely') +
  geom_edge_link() +
  geom_node_point(aes(colour = centrality_closeness(),
                      size = centrality_betweenness())) +
  theme_graph()
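The automatic conversion mentioned in change 1 can be checked on a small example. The minimal sketch below converts a plain edge data frame with `as_tbl_graph()`, which is assumed here to be the same kind of coercion ggraph applies to inputs that are not already a tbl_graph. `toy_edges` is a hypothetical data frame, not the GAStech data.

```r
library(tidygraph)

# Hypothetical edge list; the first two columns are read as from/to
toy_edges <- data.frame(
  source = c("A", "A", "B"),
  target = c("B", "C", "C"),
  Weight = c(2, 3, 5)
)

toy_graph <- as_tbl_graph(toy_edges, directed = TRUE)

class(toy_graph)             # includes "tbl_graph"
igraph::gorder(toy_graph)    # number of nodes
igraph::gsize(toy_graph)     # number of edges
```

Note that a graph built from the edge list alone carries no node attributes such as Department, so any design that colours nodes by department still needs the node table.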
Qns: Identify three aspects of the graph visualisation in Section 6.1 that can be improved.
Based on the graph plotted in part 1, it is difficult to retrieve any meaningful insights due to poor design in the following aspects:
1. Network Layout
Problem: The graph looks disorganised and unnecessarily complicated, which makes it difficult
for readers to study the links between the nodes.
Solution: Use a different layout to reduce the crossing and overlapping of edges.
2. Nodes
Problem: It is not possible to identify what each node represents, as there are no labels indicating
either name or group. Also, some of the nodes cannot be seen as they share the same colour as their edges.
Solution: Use labels to show nodes with high Betweenness Centrality and Closeness Centrality.
The nodes should be coloured according to their department.
3. Edges
Problem: The frequency of emails sent in the network cannot be derived, as all edges are drawn with
the same width.
Solution: Set the width of the edges according to the frequency of emails sent between nodes.
Qns: Provide the sketch of your alternative design.
Qns: Using appropriate ggraph functions, plot the alternative design
GAStech_graph <- tbl_graph(nodes = GAStech_nodes, edges = GAStech_edges_aggregated, directed = TRUE)
# Inspect the edges sorted by weight (printed only, not stored)
GAStech_graph %>%
  activate(edges) %>%
  arrange(desc(Weight))
# Compute node centralities and bin closeness into two classes
GAStech_graph <- GAStech_graph %>%
  mutate(BetweennessCentrality = centrality_betweenness()) %>%
  mutate(ClosenessCentrality = centrality_closeness()) %>%
  mutate(ClosenessCentrality = ifelse(ClosenessCentrality >= 0.015, 'High (>= 0.015)', 'Low (< 0.015)'))
# Plotting Graph
ggraph(GAStech_graph, layout = 'linear') +
  # Edge width encodes the number of emails
  geom_edge_arc(aes(width = Weight),
                alpha = 0.15,
                strength = 0.5) +
  scale_edge_width(range = c(0.3, 5)) +
  # Colour/fill by department, size by betweenness, shape by closeness class
  geom_node_point(aes(colour = Department,
                      size = BetweennessCentrality,
                      shape = ClosenessCentrality,
                      fill = Department)) +
  scale_shape_manual(values = c(23, 21)) +
  # Label only the nodes with high betweenness or high closeness
  geom_node_label(aes(label = ifelse(BetweennessCentrality > 300 | ClosenessCentrality == 'High (>= 0.015)',
                                     label,
                                     NA)),
                  repel = TRUE,
                  alpha = 0.5,
                  size = 4) +
  theme_graph()
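The centrality measures that drive node size and shape in the plot above can be checked by hand on a tiny graph. The sketch below uses a hypothetical three-node undirected path, where the middle node lies on the only shortest path between the two ends. The 0.4 cut-off is arbitrary and chosen only for this toy example.

```r
library(tidygraph)
library(dplyr)

# Hypothetical three-node path 1 - 2 - 3 (undirected)
toy_path <- create_path(3, directed = FALSE) %>%
  mutate(BetweennessCentrality = centrality_betweenness(),
         ClosenessCentrality   = centrality_closeness()) %>%
  # Bin closeness into two classes, similar to the plot above
  mutate(ClosenessClass = ifelse(ClosenessCentrality >= 0.4, "High", "Low"))

# Extract the node table to inspect the computed values
toy_nodes <- toy_path %>%
  activate(nodes) %>%
  as_tibble()

toy_nodes
```

The middle node should receive the highest betweenness and the "High" closeness class, which is the kind of node the alternative design singles out with a label.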
# Rebuild the edge list with the from/to id columns that visNetwork expects
GAStech_edges_aggregated <- GAStech_edges %>%
  left_join(GAStech_nodes,
            by = c("sourceLabel" = "label")) %>%
  rename(from = id) %>%
  left_join(GAStech_nodes,
            by = c("targetLabel" = "label")) %>%
  rename(to = id) %>%
  filter(MainSubject == "Work related") %>%
  group_by(from, to) %>%
  summarise(weight = n()) %>%
  filter(from != to) %>%
  filter(weight > 1) %>%
  ungroup()
# visNetwork colours nodes by the 'group' column
GAStech_nodes <- GAStech_nodes %>%
  rename(group = Department)
Qns: Improve the design of the graph by incorporating the following interactivity:
1. When a name is selected from the drop-down list, the corresponding node will not only be highlighted
but also labelled. Furthermore, all the nodes linked to the selected node will be labelled
too.
2. When a node of the interactive graph is selected, the node will not only be highlighted but also
labelled. Furthermore, all the nodes linked to the selected node will be labelled as well.
visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = list(enabled = TRUE,
                                     labelOnly = TRUE),
             nodesIdSelection = TRUE)
visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = list(enabled = TRUE,
                                     labelOnly = FALSE),
             nodesIdSelection = TRUE)
Qns: Identify three aspects of the graph visualisation in Section 7.4 that can be improved.
Based on the graph plotted in part 1, the following aspects should be improved:
1. Edges
Problem 1: While a node is highlighted, edges that are not highlighted remain on the chart
and mask the highlighted edges. This makes it difficult to study the chart.
Solution: Emphasise the edges that are highlighted or grey out the edges that are not
highlighted.
Problem 2: The frequency of emails sent between nodes cannot be derived, as all edges share the
same width.
Solution: Add weight to all edges or include the frequency on the edges.
Problem 3: The direction of the flow of emails cannot be determined.
Solution: Include arrows on edges to signify the flow of emails between two nodes.
2. Nodes
Problem 4: It is difficult to read the node labels as they are masked by the edges.
Solution: Change the shape of the nodes and place the names inside them.
Problem 5: The job title of each node is not available.
Solution: Create a tooltip that shows the job title on hover.
3. General
Problem 6: It is not possible to gather information on the frequency of correspondence between departments.
Solution: Allow users to view all edges of a department.
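One possible way to realise the edge and tooltip solutions listed above in visNetwork is sketched below. It relies on visNetwork's convention that a `value` column in the edge data frame scales the edge width and that a `title` column in the node data frame supplies the hover text. The toy nodes and edges are hypothetical, not the GAStech data.

```r
library(visNetwork)

# Hypothetical nodes and edges (not the GAStech data)
toy_nodes <- data.frame(
  id    = 1:3,
  label = c("Ann", "Bob", "Cy"),
  group = c("IT", "IT", "Security"),
  title = c("Title: Engineer", "Title: Analyst", "Title: Guard")  # hover tooltip
)
toy_edges <- data.frame(
  from   = c(1, 1, 2),
  to     = c(2, 3, 3),
  weight = c(2, 5, 9)
)
toy_edges$value <- toy_edges$weight   # `value` scales the edge width

toy_net <- visNetwork(toy_nodes, toy_edges) %>%
  visEdges(arrows = "to") %>%              # arrows show the direction of emails
  visOptions(highlightNearest = TRUE,      # emphasise the selected node and its neighbours
             selectedBy = "group")         # drop-down to select a department

toy_net
```

Using `value` rather than a text label keeps the edges readable when the network is dense; a label can still be added if the exact count matters.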
Qns: Provide the sketch of your alternative design.
Qns: Using appropriate visNetwork functions, plot the alternative design
# Clean node labels and turn the job title into a hover tooltip
GAStech_nodes <- GAStech_nodes %>%
  mutate(label = str_replace(label, "[[:punct:]]", " ")) %>%
  rename(title = Title) %>%
  mutate(title = paste("Title: ", title))
# Show the email count as a label on each edge
GAStech_edges_aggregated <- GAStech_edges_aggregated %>%
  mutate(label = paste(weight))
visNetwork(GAStech_nodes, GAStech_edges_aggregated, main = "GAStech Email Network Graph") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visEdges(selectionWidth = 7, arrows = "to") %>%
  visOptions(highlightNearest = list(enabled = TRUE, labelOnly = FALSE),
             nodesIdSelection = TRUE, selectedBy = "group",
             width = '100%',
             height = '100%') %>%
  visInteraction(tooltipDelay = 0,
                 tooltipStay = 60,
                 tooltipStyle = 'position: fixed;visibility:hidden;padding: 1px;font-size:12px;background-color: white;') %>%
  visNodes(font = list(size = 30), shape = 'ellipse') %>%
  visLegend(main = "Department", position = 'right', width = 0.15)