Harvey Kristanto Lauw

17 November 2019

1. Installing and Launching R Packages

packages = c('igraph', 'tidygraph', 'ggraph', 'visNetwork', 'lubridate', 'tidyverse')

# Install any package that is missing, then attach it
for(p in packages){
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

2. Data Wrangling

GAStech_nodes <- read_csv("data/GAStech_email_node.csv")
GAStech_edges <- read_csv("data/GAStech_email_edge-v2.csv")

GAStech_edges$SentDate  = dmy(GAStech_edges$SentDate)
GAStech_edges$Weekday = wday(GAStech_edges$SentDate, label = TRUE, abbr = FALSE)
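
As a minimal sketch of what the two lines above do, the snippet below parses a single day-month-year string and derives its weekday. The date string is a made-up example rather than a value taken from the GAStech data, and the weekday label returned by wday() depends on the session locale.

```r
library(lubridate)

# Hypothetical date string in day/month/year order (not from the dataset)
sent <- dmy("6/1/2014")

# Full weekday name as an ordered factor, as used for the Weekday column
weekday_label <- wday(sent, label = TRUE, abbr = FALSE)

# Numeric weekday; with lubridate's default week start (Sunday = 1)
weekday_number <- wday(sent)

print(sent)
print(weekday_label)
```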

GAStech_edges_aggregated <- GAStech_edges %>%
  filter(MainSubject == "Work related") %>%
  group_by(source, target, Weekday) %>%
    summarise(Weight = n()) %>%
  filter(source!=target) %>%
  filter(Weight > 1) %>%
  ungroup()
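
The effect of the aggregation can be checked on a small table. The sketch below applies the same filters to a hypothetical toy_edges tibble (the values are invented for illustration) and keeps only work-related, non-self-loop edges that occur more than once on a given weekday.

```r
library(dplyr)

# Hypothetical toy edge list; values are illustrative only
toy_edges <- tibble(
  source      = c(1, 1, 1, 2, 2, 3, 3, 1),
  target      = c(2, 2, 3, 2, 2, 1, 1, 2),
  Weekday     = c("Monday", "Monday", "Monday", "Monday", "Monday",
                  "Friday", "Friday", "Monday"),
  MainSubject = c(rep("Work related", 7), "Non-work related")
)

toy_aggregated <- toy_edges %>%
  filter(MainSubject == "Work related") %>%
  group_by(source, target, Weekday) %>%
  summarise(Weight = n(), .groups = "drop") %>%  # one row per source-target-weekday
  filter(source != target) %>%                   # drop self-loops (2 -> 2)
  filter(Weight > 1)                             # drop one-off emails (1 -> 3)

print(toy_aggregated)
```

Only the 1 -> 2 (Monday) and 3 -> 1 (Friday) edges should remain, each with a Weight of 2.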

GAStech_graph <- tbl_graph(nodes = GAStech_nodes, edges = GAStech_edges_aggregated, directed = TRUE)

GAStech_graph %>%
  activate(edges) %>%
  arrange(desc(Weight))
## # A tbl_graph: 54 nodes and 1456 edges
## #
## # A directed multigraph with 1 component
## #
## # Edge Data: 1,456 x 4 (active)
##    from    to Weekday Weight
##   <int> <int> <ord>    <int>
## 1    40    41 Tuesday     23
## 2    40    43 Tuesday     19
## 3    41    43 Tuesday     15
## 4    41    40 Tuesday     14
## 5    42    41 Tuesday     13
## 6    42    40 Tuesday     12
## # ... with 1,450 more rows
## #
## # Node Data: 54 x 4
##      id label           Department     Title           
##   <dbl> <chr>           <chr>          <chr>           
## 1     1 Mat.Bramar      Administration Assistant to CEO
## 2     2 Anda.Ribera     Administration Assistant to CFO
## 3     3 Rachel.Pantanal Administration Assistant to CIO
## # ... with 51 more rows

3. Task 1: Static Organisation Graph

With reference to the organisation network graph in Section 6.1 of Hands-on Exercise 10, you are required to complete the following tasks:

3.1 Initial plot from Section 6.1

g <- GAStech_graph %>%
  mutate(betweenness_centrality = centrality_betweenness()) %>%
  mutate(closeness_centrality = centrality_closeness()) %>%
  ggraph(layout = "nicely") + 
  geom_edge_link(aes()) +
  geom_node_point(aes(colour = closeness_centrality, size=betweenness_centrality))

g + theme_graph()
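
Before mapping the two centrality measures to size and colour, it can help to inspect them as a table. The following is a small sketch on a hypothetical five-node star graph built with create_star(), not on GAStech_graph; the column names btw and clo are chosen here for illustration.

```r
library(tidygraph)

# Hypothetical undirected star: node 1 is the centre, nodes 2-5 are leaves
toy <- create_star(5) %>%
  mutate(btw = centrality_betweenness(),
         clo = centrality_closeness())

# Extract the node table to look at the computed values
toy_nodes <- as_tibble(toy)
print(toy_nodes)
```

The centre lies on every shortest path between two leaves, so it has the highest betweenness and closeness, while the leaves have a betweenness of zero.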

3.2 Sketch for Task 1’s Revised plot

[Figure: sketch of the revised plot for Task 1]

3.3 Revised plot from Section 6.1

  1. Issue 1: The edges have no transparency, which makes it difficult to analyse edge weights.
    • Solution 1: The revised plot maps the edge weight to line width in geom_edge_link() and sets a low alpha so that overlapping connections remain visible.
  2. Issue 2: No time-based trend can be analysed, even though the data table contains a Weekday column.
    • Solution 2: The revised plot uses facet_edges() to reduce overplotting and to show how the connections between nodes differ across the days of the week.
  3. Issue 3: The nodes are not categorised, so users cannot relate the connections between nodes to any grouping.
    • Solution 3: The revised plot uses a circle layout in ggraph(). In addition, geom_node_voronoi() is added to show the department of each node.

set_graph_style()

g <- GAStech_graph %>%
  mutate(betweenness_centrality = centrality_betweenness()) %>%
  mutate(closeness_centrality = centrality_closeness()) %>%
  ggraph(layout = "circle") + 
  geom_node_voronoi(aes(fill = Department), max.radius = 0.5, colour = 'white') +
  geom_edge_density(aes(fill = Weight)) +
  geom_edge_link(aes(width = Weight), alpha = 0.1) + 
  geom_node_point(aes(color = closeness_centrality, size = betweenness_centrality)) 

g + theme_graph() + facet_edges(~Weekday) + theme(legend.position = "bottom", legend.direction = "vertical")

## nerror = 4 
## Increasing madj from 23 to 28 and trying again.
## nerror = 4 
## Increasing madj from 28 to 34 and trying again.

4. Task 2: Interactive Organisation Graph

With reference to the organisation network graph in Section 7.4 of Hands-on Exercise 10, you are required to complete the following tasks:

4.1 Data preparation

GAStech_edges_aggregated <- GAStech_edges %>%
  left_join(GAStech_nodes, by = c("sourceLabel" = "label")) %>%
  rename(from = id) %>%
  left_join(GAStech_nodes, by = c("targetLabel" = "label")) %>%
  rename(to = id) %>%
  filter(MainSubject == "Work related") %>%
  group_by(from, to) %>%
    summarise(weight = n()) %>%
  filter(from!=to) %>%
  filter(weight > 1) %>%
  ungroup()

GAStech_nodes <- GAStech_nodes %>%
  rename(group = Department)
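
The renaming above matters because visNetwork() reads specific column names: id (and optionally label and group) in the node table, and from and to in the edge table. A minimal sketch with hypothetical toy data frames, not the GAStech data:

```r
library(visNetwork)

# Hypothetical nodes: `group` drives the node colouring
toy_nodes <- data.frame(
  id    = 1:3,
  label = c("A", "B", "C"),
  group = c("X", "X", "Y")
)

# Hypothetical edges: `from` and `to` must match values in toy_nodes$id
toy_edges <- data.frame(
  from   = c(1, 2),
  to     = c(2, 3),
  weight = c(2, 3)
)

toy_net <- visNetwork(toy_nodes, toy_edges)
toy_net <- visLegend(toy_net)  # adds a legend with one entry per group
```

Printing toy_net in an interactive session or an R Markdown chunk renders the widget.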

4.2 Initial plot from Section 7.4

visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)

4.3 Sketch for Task 2’s Revised plot

[Figure: sketch of the revised plot for Task 2]

4.4 Revised plot from Section 7.4

  1. Issue 1: Labels are hard to read because they are placed outside the nodes and overlap with the edges between nodes.
    • Solution 1: The revised plot uses visNodes() to enlarge each node so that its label fits inside it. Users can now easily read each label when it is selected, because text and edges no longer overlap.
  2. Issue 2: The undirected network lacks organisation, which makes it hard for users to analyse the connections between departments.
    • Solution 2: The revised plot uses visEdges() to add arrows at the ends of edges, showing the direction of each connection and therefore the in-degree and out-degree of each node. The directed graph allows users to analyse the purpose of the connections between nodes, if any.
  3. Issue 3: The coloured edges and the layout make interaction disorganised.
    • Solution 3: The revised plot uses "layout_in_circle" so that all edges are concentrated in the middle of the plot. Edges change colour ("Red" in the code) only when highlighted, as set in visEdges(), which happens when a node or edge is selected. In addition, navigation buttons are enabled in visInteraction() to provide further ways to interact with the network visualisation.

visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
  visIgraphLayout(layout = "layout_in_circle") %>%
  visInteraction(dragNodes = TRUE, dragView = TRUE, zoomView = TRUE, navigationButtons = TRUE) %>%
  visNodes(shape = "box",
           shadow = list(enabled = TRUE,
                         size = 20),
           font = "16px arial black") %>%
  visEdges(arrows = "to",
           selectionWidth = 8,
           shadow = FALSE,
           color = list(highlight = "Red")) %>%
  visOptions(highlightNearest = list(enabled = TRUE,
                                     hover = TRUE,
                                     algorithm = "hierarchical"),
             nodesIdSelection = list(enabled = TRUE,
                                     values = unique(GAStech_nodes$id)))