Author: Goh Jia Xian (Jarrett)
Date: 22-Nov-2019

Setting Up the Environment

Preparing R Packages for Both Tasks

packages = c('igraph', 'tidygraph', 'ggraph', 'visNetwork', 'lubridate', 'tidyverse')

for(p in packages){
  # Install the package only if it is not already available
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

Importing Network Data

GAStech_nodes <- read_csv("data/GAStech_email_node.csv")
GAStech_edges <- read_csv("data/GAStech_email_edge-v2.csv")

Processing the Data

# Wrangling Time: parse SentDate (day-month-year) and derive the full weekday name
GAStech_edges$SentDate <- dmy(GAStech_edges$SentDate)
GAStech_edges$Weekday <- wday(GAStech_edges$SentDate, label = TRUE, abbr = FALSE)

# Wrangling Attributes
GAStech_edges_aggregated <- GAStech_edges %>%
  filter(MainSubject == "Work related") %>%
  group_by(source, target, Weekday) %>%
  summarise(Weight = n()) %>%
  filter(source!=target) %>%
  filter(Weight > 1) %>%
  ungroup()
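
To see what the aggregation step keeps and drops, the same pipeline can be run on a small made-up edge list. This is only a sketch: the toy_edges table and its values are hypothetical and are not taken from the GAStech data, and .groups = "drop" (assuming dplyr 1.0 or later) stands in for the trailing ungroup().

```r
library(dplyr)

# Hypothetical toy edge list (illustration only, not the GAStech data)
toy_edges <- tibble(
  source      = c(1, 1, 1, 2, 3, 3),
  target      = c(2, 2, 2, 1, 3, 3),
  Weekday     = c("Monday", "Monday", "Tuesday", "Monday", "Friday", "Friday"),
  MainSubject = c("Work related", "Work related", "Work related",
                  "Non-work related", "Work related", "Work related")
)

toy_aggregated <- toy_edges %>%
  filter(MainSubject == "Work related") %>%       # keep work emails only
  group_by(source, target, Weekday) %>%
  summarise(Weight = n(), .groups = "drop") %>%   # emails per pair per weekday
  filter(source != target, Weight > 1)            # drop self-emails and one-off emails

toy_aggregated
```

Only the Monday emails from node 1 to node 2 survive: the non-work email is filtered out first, the self-addressed emails fail source != target, and the single Tuesday email fails Weight > 1.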

Task 1

Task 1 Part 1: Improve the Original Code

Qns: Improve the code chunk used to create the organisation network graph by using the latest functions provided in ggraph 2.0.

Before: Section 6.1 of Hands-on Exercise 10

GAStech_graph <- tbl_graph(nodes = GAStech_nodes, edges = GAStech_edges_aggregated, directed = TRUE)

g <- GAStech_graph %>%
  mutate(betweenness_centrality = centrality_betweenness()) %>%
  mutate(closeness_centrality = centrality_closeness()) %>%
  ggraph(layout = "nicely") + 
  geom_edge_link(aes()) +
  geom_node_point(aes(colour = closeness_centrality, 
                      size=betweenness_centrality))

g + theme_graph()

After: Improvement Made

Changes made:
1. Creating a tbl_graph object explicitly is not required because the internals of ggraph 2.0 are based on tidygraph. Any supported input is automatically converted into a tbl_graph object.

2. The mutate() calls are not necessary because the centrality functions can be called directly inside aes() for 'colour' and 'size'.

ggraph(GAStech_edges_aggregated, layout = 'nicely') +
  geom_edge_link() +  
  geom_node_point(aes(colour = centrality_closeness(), 
                      size = centrality_betweenness())) +
  theme_graph()
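
The automatic conversion mentioned in change 1 can also be carried out explicitly with tidygraph's as_tbl_graph(), which is a convenient way to inspect the object that ggraph works with. The sketch below uses a hypothetical three-row edge list, not the GAStech data. Note that when only an edge table is supplied, node attributes such as Department are not available to the plot.

```r
library(dplyr)
library(tidygraph)

# Hypothetical edge list with 'from' and 'to' columns (illustration only)
toy_edges <- data.frame(from   = c(1, 1, 2),
                        to     = c(2, 3, 3),
                        Weight = c(5, 2, 3))

# Explicit version of the conversion that ggraph applies to its input
toy_graph <- as_tbl_graph(toy_edges, directed = TRUE)

# Centrality measures can then be computed in the node context
toy_graph <- toy_graph %>%
  activate(nodes) %>%
  mutate(betweenness = centrality_betweenness())

toy_graph
```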

Task 1 Part 2: Three Aspects of Improvement

Qns: Identify three aspects of the graph visualisation in Section 6.1 that can be improved.

Based on the graph plotted in part 1, it is difficult to retrieve any meaningful insights due to poor design in the following aspects:

1. Network Layout

  Problem: The graph looks disorganised and unnecessarily complicated, which makes it difficult
  for readers to study the links between the nodes.
  
  Solution: Use a different layout to reduce the crossing and overlapping of edges.

2. Nodes

  Problem: It is not possible to identify what each node represents as there are no labels
  indicating either name or group. Also, some of the nodes cannot be seen as they share the
  same colour as the edges.
  
  Solution: Label the nodes that have high betweenness centrality or closeness centrality.
  The nodes should be coloured according to their department.

3. Edges

  Problem: The frequency of emails sent in the network cannot be derived as all edges have the
  same width.

  Solution: Set the width of the edges according to the frequency of emails sent between nodes.

Task 1 Part 3: Alternative Design

Qns: Provide the sketch of your alternative design.

Task 1 Part 4: Plot Alternate Design

Qns: Using appropriate ggraph functions, plot the alternative design

Preparing Graph

GAStech_graph <- tbl_graph(nodes = GAStech_nodes, edges = GAStech_edges_aggregated, directed = TRUE)

GAStech_graph %>%
  activate(edges) %>%
  arrange(desc(Weight))

Plotting the Graph

GAStech_graph <- GAStech_graph %>%
  mutate(BetweennessCentrality = centrality_betweenness()) %>%
  mutate(ClosenessCentrality = centrality_closeness()) %>%
  # Bin closeness centrality into two classes so that it can be mapped to node shape
  mutate(ClosenessCentrality = ifelse(ClosenessCentrality >= 0.015, 'High (>= 0.015)', 'Low (< 0.015)'))

# Plotting Graph
ggraph(GAStech_graph, layout = 'linear') + 
    # Edge width encodes the number of emails between two nodes
    geom_edge_arc(aes(width=Weight), 
                  alpha=0.15, 
                  strength = 0.5) +
    scale_edge_width(range = c(0.3, 5)) +
    geom_node_point(aes(colour = Department, 
                        size = BetweennessCentrality, 
                        shape = ClosenessCentrality, 
                        fill = Department)) +
    scale_shape_manual(values=c(23, 21)) +
    # Label only the nodes with high betweenness or high closeness centrality
    geom_node_label(aes(label=ifelse(BetweennessCentrality > 300 | ClosenessCentrality == 'High (>= 0.015)',
                                     label, 
                                     NA)), 
                    repel = TRUE, 
                    alpha  = 0.5, 
                    size = 4) +
    theme_graph()

Task 2

Data preparation for Interactive Graph

GAStech_edges_aggregated <- GAStech_edges %>%
  left_join(GAStech_nodes, 
            by = c("sourceLabel" = "label")) %>%
  rename(from = id) %>%
  left_join(GAStech_nodes, 
            by = c("targetLabel" = "label")) %>%
  rename(to = id) %>%
  filter(MainSubject == "Work related") %>%
  group_by(from, to) %>%
  summarise(weight = n()) %>%
  filter(from!=to) %>%
  filter(weight > 1) %>%
  ungroup()

GAStech_nodes <- GAStech_nodes %>%
  rename(group = Department)
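
The two joins above translate sender and recipient names into the node ids that visNetwork matches against the from and to columns. A minimal sketch on made-up data (the names and ids below are hypothetical) shows the idea. It selects only id and label before joining so that no duplicated .x/.y columns appear, and it uses count() as shorthand for group_by() followed by summarise(n()).

```r
library(dplyr)

# Hypothetical node and email tables (illustration only)
toy_nodes <- tibble(id         = 1:3,
                    label      = c("Ann.Lee", "Bob.Tan", "Cy.Ng"),
                    Department = c("IT", "IT", "HR"))

toy_emails <- tibble(sourceLabel = c("Ann.Lee", "Ann.Lee", "Bob.Tan"),
                     targetLabel = c("Bob.Tan", "Bob.Tan", "Cy.Ng"))

id_lookup <- select(toy_nodes, id, label)

toy_links <- toy_emails %>%
  left_join(id_lookup, by = c("sourceLabel" = "label")) %>%
  rename(from = id) %>%                      # sender id
  left_join(id_lookup, by = c("targetLabel" = "label")) %>%
  rename(to = id) %>%                        # recipient id
  count(from, to, name = "weight")           # emails per sender-recipient pair

toy_links
```

Renaming Department to group, as done above for GAStech_nodes, lets visNetwork colour the nodes by department and build the legend from the same column.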

Task 2 Part 1: Improve the Graph

Qns: Improve the design of the graph by incorporating the following interactivity:

1. When a name is selected from the drop-down list, the corresponding node will be both highlighted
and labelled. All nodes linked to the selected node will be labelled as well.

2. When a node of the interactive graph is selected, the node will be both highlighted and
labelled. All nodes linked to the selected node will be labelled as well.

Before: Section 7.4 of Hands-on Exercise 10

visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = list(enabled = TRUE, 
                                     labelOnly=TRUE), 
             nodesIdSelection=TRUE)

After: Interactive Graph Showing Highlighted Labels Only

visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = list(enabled = TRUE, 
                                     labelOnly=FALSE), 
             nodesIdSelection=TRUE) 

Task 2 Part 2: Three Aspects of Improvement

Qns: Identify three aspects of the graph visualisation in Section 7.4 that can be improved.

Based on the graph plotted in part 1, the following aspects should be improved:

1. Edges

  Problem 1: While a node is highlighted, the edges that are not highlighted remain on the chart
  and mask the highlighted edges. This makes it difficult to study the chart.
  
  Solution: Emphasise the highlighted edges or grey out the edges that are not highlighted.
  
  
  Problem 2: The frequency of emails sent between nodes cannot be derived as all edges share the
  same width.
  
  Solution: Scale the width of the edges by frequency or show the frequency as a label on each edge.
  
  
  Problem 3: The direction of the flow of emails cannot be determined.
  
  Solution: Add arrows to the edges to show the direction of email flow between two nodes.
  

2. Nodes

  Problem 4: It is difficult to read the node labels as they are masked by the edges.
  
  Solution: Change the shape of the nodes and place the names inside them.
  
  
  Problem 5: The job title of each node cannot be seen.
  
  Solution: Add a tooltip that shows the job title on hover.
  

3. General

  Problem 6: It is not possible to gather information on the frequency of correspondence
  between departments.

  Solution: Allow users to view all edges of a department.

Task 2 Part 3: Sketch Alternate Graph

Qns: Provide the sketch of your alternative design.

Task 2 Part 4: Plot Alternate Graph

Qns: Using appropriate visNetwork functions, plot the alternative design

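GAStech_nodes <- GAStech_nodes %>%
  # Replace the first punctuation mark in each label with a space
  mutate(label = str_replace(label,"[[:punct:]]"," ")) %>%
  # visNetwork shows the 'title' column as the tooltip on hover
  rename(title = Title) %>%
  mutate(title = paste("Title:", title))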

GAStech_edges_aggregated <- GAStech_edges_aggregated %>%
  mutate(label = paste(weight))
  
visNetwork(GAStech_nodes, GAStech_edges_aggregated, main = "GAStech Email Network Graph") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visEdges(selectionWidth=7, arrows = "to") %>%
  visOptions(highlightNearest = list(enabled = TRUE, labelOnly=FALSE), 
             nodesIdSelection=TRUE, selectedBy = "group",
             width='100%',
             height='100%') %>%
  visInteraction(tooltipDelay = 0, 
                 tooltipStay = 60,
                 tooltipStyle='position: fixed;visibility:hidden;padding: 1px;font-size:12px;background-color: white;') %>%
  visNodes(font = list(size = 30), shape='ellipse') %>%
  visLegend(main = "Department", position='right', width=0.15)
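
Problem 2 also mentions scaling edge width as an alternative to printing the frequency on each edge. In visNetwork this can be sketched by adding a value column to the edge table, which the underlying vis.js library uses to scale edge width. The data below is hypothetical, and the plotting call is guarded so that the block still runs when visNetwork is not installed.

```r
library(dplyr)

# Hypothetical nodes and edges (illustration only, not the GAStech data)
toy_nodes <- tibble(id = 1:3, label = c("A", "B", "C"))

toy_edges <- tibble(from = c(1, 1, 2), to = c(2, 3, 3), weight = c(2, 9, 4)) %>%
  mutate(value = weight,                     # 'value' drives the edge width
         title = paste("Emails:", weight))   # 'title' is shown as the edge tooltip

# Build the widget only when visNetwork is available
if (requireNamespace("visNetwork", quietly = TRUE)) {
  toy_widget <- visNetwork::visNetwork(toy_nodes, toy_edges) %>%
    visNetwork::visEdges(arrows = "to")
  print(class(toy_widget))
}
```

Compared with edge labels, width scaling keeps a dense graph less cluttered, at the cost of showing relative rather than exact frequencies.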