DataViz Makeover 2

Nurul Khairina Binte Abdul Kadir
24 November 2019

1.0 Task Overview - Static Organisation Graph

With reference to the organisation network graph in Section 6.1 of Hands-on Exercise 10, you are required to complete the following tasks:

  • Improve the code chunk used to create the organisation network graph by using the latest functions provided in ggraph 2.0.
  • Identify three aspects of the graph visualisation in Section 6.1 that can be improved.
  • Provide the sketch of your alternative design.
  • Using appropriate ggraph functions, plot the alternative design.

2.0 Original Chart

Install packages and Data Wrangling

knitr::opts_chunk$set(echo = TRUE)
packages = c('igraph', 'tidygraph', 'ggraph', 'visNetwork', 'lubridate', 'tidyverse')

# Install any missing package, then load it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

GAStech_nodes <- read_csv("data/GAStech_email_node.csv")
GAStech_edges <- read_csv("data/GAStech_email_edge-v2.csv")

GAStech_edges$SentDate  = dmy(GAStech_edges$SentDate)
GAStech_edges$Weekday = wday(GAStech_edges$SentDate, label = TRUE, abbr = FALSE)

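GAStech_edges_aggregated <- GAStech_edges %>%
  filter(MainSubject == "Work related") %>%   # keep work-related emails only
  group_by(source, target, Weekday) %>%
  summarise(Weight = n()) %>%                 # emails per sender, recipient and weekday
  filter(source != target) %>%                # drop self-addressed emails
  filter(Weight > 1) %>%                      # keep pairs with more than one email on that weekday
  ungroup()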
GAStech_edges_aggregated
## # A tibble: 1,456 x 4
##    source target Weekday   Weight
##     <dbl>  <dbl> <ord>      <int>
##  1      1      2 Monday         4
##  2      1      2 Tuesday        3
##  3      1      2 Wednesday      5
##  4      1      2 Friday         8
##  5      1      3 Monday         4
##  6      1      3 Tuesday        3
##  7      1      3 Wednesday      5
##  8      1      3 Friday         8
##  9      1      4 Monday         4
## 10      1      4 Tuesday        3
## # ... with 1,446 more rows
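The aggregation rule can be checked on a tiny example. The sketch below is illustrative only: the seven emails and the abbreviated weekday strings are hypothetical stand-ins for the GAStech data, and only dplyr is assumed.

```r
library(dplyr)

# Hypothetical toy emails: sender id, recipient id, weekday and subject category
toy_emails <- data.frame(
  source      = c(1, 1, 1, 1, 2, 3, 3),
  target      = c(2, 2, 2, 2, 1, 3, 3),
  Weekday     = c("Mon", "Mon", "Mon", "Tue", "Mon", "Mon", "Mon"),
  MainSubject = c("Work related", "Work related", "Social", "Work related",
                  "Work related", "Work related", "Work related"),
  stringsAsFactors = FALSE
)

toy_aggregated <- toy_emails %>%
  filter(MainSubject == "Work related") %>%  # the "Social" email is dropped
  group_by(source, target, Weekday) %>%
  summarise(Weight = n()) %>%                # (1,2,Mon)=2, (1,2,Tue)=1, (2,1,Mon)=1, (3,3,Mon)=2
  filter(source != target) %>%               # removes the 3 -> 3 self-loop
  filter(Weight > 1) %>%                     # keeps pairs with more than one email that day
  ungroup()

print(toy_aggregated)  # a single row: source 1, target 2, Mon, Weight 2
```

Because the weekday is part of the grouping, the same sender-recipient pair can appear once per weekday, which is why the real tibble above lists pair 1 -> 2 on four different days.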

Creating network objects using tidygraph

GAStech_graph <- tbl_graph(nodes = GAStech_nodes, edges = GAStech_edges_aggregated, directed = TRUE)

GAStech_graph %>%
  activate(edges) %>%
  arrange(desc(Weight))
## # A tbl_graph: 54 nodes and 1456 edges
## #
## # A directed multigraph with 1 component
## #
## # Edge Data: 1,456 x 4 (active)
##    from    to Weekday Weight
##   <int> <int> <ord>    <int>
## 1    40    41 Tuesday     23
## 2    40    43 Tuesday     19
## 3    41    43 Tuesday     15
## 4    41    40 Tuesday     14
## 5    42    41 Tuesday     13
## 6    42    40 Tuesday     12
## # ... with 1,450 more rows
## #
## # Node Data: 54 x 4
##      id label           Department     Title           
##   <dbl> <chr>           <chr>          <chr>           
## 1     1 Mat.Bramar      Administration Assistant to CEO
## 2     2 Anda.Ribera     Administration Assistant to CFO
## 3     3 Rachel.Pantanal Administration Assistant to CIO
## # ... with 51 more rows
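`tbl_graph()` reads the edge endpoints from `from`/`to` columns or, when those names are absent, from the first two edge columns (here `source` and `target`). Numeric endpoints are treated as row positions in the node table, not as values of the node `id` column. This works for GAStech because the node ids run from 1 to 54 in row order. If the ids were not consecutive, they would first need translating, for example with `match()`. The sketch below uses three hypothetical nodes with ids 10, 20 and 30.

```r
library(tidygraph)

# Hypothetical nodes whose ids do NOT equal their row positions
toy_nodes <- data.frame(id = c(10, 20, 30), label = c("A", "B", "C"),
                        stringsAsFactors = FALSE)
toy_edges <- data.frame(source = c(10, 20), target = c(20, 30))

# Translate node ids into row positions before building the graph
toy_edges_idx <- data.frame(from = match(toy_edges$source, toy_nodes$id),
                            to   = match(toy_edges$target, toy_nodes$id))

toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges_idx, directed = TRUE)
igraph::as_edgelist(toy_graph, names = FALSE)  # rows (1, 2) and (2, 3)
```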
g <- GAStech_graph %>%
  mutate(betweenness_centrality = centrality_betweenness()) %>%
  mutate(closeness_centrality = centrality_closeness()) %>%
  ggraph(layout = "nicely") + 
  geom_edge_link(aes()) +
  geom_node_point(aes(colour = closeness_centrality, size=betweenness_centrality))

g + theme_graph()
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

2.1 Areas of Improvement

Identify 3 aspects of the graph visualisation that can be improved.

Overall, the chart is cluttered and lacks clarity, which makes it hard to analyse and difficult to draw insights from.

  1. Using size to represent betweenness centrality adds little value: the many distracting edges make node sizes hard to compare, and users have to refer to the legend to find out which range each node belongs to.

  2. The nodes overlap one another, and it is unclear what each node represents because the employee names are not shown.

  3. The direction of the edges is not indicated. Showing it would help us better understand the relationship between each pair of employees.

  4. The chart lacks important information such as the department name. Even if a node has a high closeness centrality, we cannot tell which department it belongs to, so the measure is of limited use.
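Points 1 and 4 concern the two centrality measures. tidygraph's `centrality_betweenness()` and `centrality_closeness()` wrap igraph's `betweenness()` and `closeness()`. The sketch below applies igraph directly to a small hypothetical star graph in which one hub connects four otherwise unconnected employees. The hub lies on every shortest path between the others, so it has the highest betweenness. Closeness, the inverse of the summed shortest-path distances, separates hub and leaves much less sharply.

```r
library(igraph)

# Hypothetical undirected star: vertex 1 is the hub, vertices 2-5 are leaves
star <- make_star(5, mode = "undirected")

# Betweenness: number of shortest paths between other pairs passing through a vertex.
# The hub sits on all choose(4, 2) = 6 leaf-to-leaf paths.
btw <- betweenness(star)

# Closeness: 1 / (sum of distances to all other vertices)
# hub: 1 / 4; each leaf: 1 / (1 + 2 + 2 + 2) = 1 / 7
cls <- closeness(star)

print(btw)
print(cls)
```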

2.2 Sketch of Alternative Design

2.3 Alternative Design

I have incorporated improvements #1 to #3 into the chart below.

The edges are now grey instead of black, and the nodes no longer overlap one another. The direction of the edges is also indicated, although the arrowheads are not very visible. Most importantly, each employee's name is clearly labelled, which makes the chart much easier to interpret.

GAStech_nodes <- read_csv("data/GAStech_email_node.csv")
## Parsed with column specification:
## cols(
##   id = col_double(),
##   label = col_character(),
##   Department = col_character(),
##   Title = col_character()
## )
GAStech_edges <- read_csv("data/GAStech_email_edge-v2.csv")
## Parsed with column specification:
## cols(
##   source = col_double(),
##   target = col_double(),
##   SentDate = col_character(),
##   SentTime = col_time(format = ""),
##   Subject = col_character(),
##   MainSubject = col_character(),
##   sourceLabel = col_character(),
##   targetLabel = col_character()
## )
GAStech_edges$SentDate  = dmy(GAStech_edges$SentDate)
GAStech_edges$Weekday = wday(GAStech_edges$SentDate, label = TRUE, abbr = FALSE)

GAStech_edges_aggregated <- GAStech_edges %>%
  filter(MainSubject == "Work related") %>%
  group_by(source, target, Weekday) %>%
  summarise(Weight = n()) %>%
  filter(source != target) %>%
  filter(Weight > 1) %>%
  ungroup()

GAStech_graph <- tbl_graph(nodes = GAStech_nodes, edges = GAStech_edges_aggregated, directed = TRUE)

GAStech_graph %>%
  activate(edges) %>%
  arrange(desc(Weight))
## # A tbl_graph: 54 nodes and 1456 edges
## #
## # A directed multigraph with 1 component
## #
## # Edge Data: 1,456 x 4 (active)
##    from    to Weekday Weight
##   <int> <int> <ord>    <int>
## 1    40    41 Tuesday     23
## 2    40    43 Tuesday     19
## 3    41    43 Tuesday     15
## 4    41    40 Tuesday     14
## 5    42    41 Tuesday     13
## 6    42    40 Tuesday     12
## # ... with 1,450 more rows
## #
## # Node Data: 54 x 4
##      id label           Department     Title           
##   <dbl> <chr>           <chr>          <chr>           
## 1     1 Mat.Bramar      Administration Assistant to CEO
## 2     2 Anda.Ribera     Administration Assistant to CFO
## 3     3 Rachel.Pantanal Administration Assistant to CIO
## # ... with 51 more rows
library(viridis)
## Warning: package 'viridis' was built under R version 3.5.3
## Loading required package: viridisLite
## Warning: package 'viridisLite' was built under R version 3.5.3
g <- GAStech_graph %>%
  mutate(betweenness_centrality = centrality_betweenness()) %>%
  mutate(closeness_centrality = centrality_closeness()) %>%
  ggraph(layout = "fr") +
  # Grey edges with small arrowheads showing the direction of each email
  geom_edge_link(colour = "grey", arrow = arrow(length = unit(6, "pt"))) +
  # Reuse the centrality columns computed above rather than recomputing them in aes()
  geom_node_point(aes(size = betweenness_centrality, colour = closeness_centrality)) +
  # Repelled employee names so that the labels do not overlap
  geom_node_text(aes(label = label), nudge_x = 0.03, repel = TRUE, size = 2.5, colour = "black") +
  # Single colour scale for closeness centrality (reversed viridis palette)
  scale_color_viridis(direction = -1, option = "D") +
  ggtitle("Network Visualization of GAStech") +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(colour = "Centrality Closeness", size = "Centrality Betweenness")
g
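The arrowheads above are hard to see, likely because each edge ends at the centre of its target node, where the node point is drawn on top of it. One possible tweak, not part of the design above, is ggraph's `end_cap` argument, which stops each edge short of its target so the arrowhead stays visible. The sketch uses a hypothetical three-node chain. Setting `base_family = "sans"` in `theme_graph()` is an assumption made to sidestep its default Arial Narrow font.

```r
library(ggplot2)   # arrow(), unit(), aes()
library(ggraph)
library(tidygraph)

# Hypothetical directed chain A -> B -> C
toy_graph <- tbl_graph(nodes = data.frame(name = c("A", "B", "C")),
                       edges = data.frame(from = c(1, 2), to = c(2, 3)),
                       directed = TRUE)

p <- ggraph(toy_graph, layout = "linear") +
  # end_cap trims each edge 4 mm before the target node, so the closed
  # arrowhead is not hidden underneath the node point
  geom_edge_link(edge_colour = "grey50",
                 arrow = arrow(length = unit(3, "mm"), type = "closed"),
                 end_cap = circle(4, "mm")) +
  geom_node_point(size = 6) +
  geom_node_text(aes(label = name), colour = "white") +
  theme_graph(base_family = "sans")  # avoid the default Arial Narrow font
```

Printing `p` renders the chain with each arrowhead sitting just outside its target node.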

3.0 Task Overview - Interactive Organisation Graph

With reference to the organisation network graph in Section 7.4 of Hands-on Exercise 10, you are required to complete the following tasks:

  • Improve the design of the graph by incorporating the following interactivity:

      o When a name is selected from the drop-down list, the corresponding node will not only be highlighted but will also be labelled. All the nodes linked to the selected node will be labelled as well.
      o When a node of the interactive graph is selected, the node will not only be highlighted but will also be labelled. All the nodes linked to the selected node will be labelled as well.

  • Identify three aspects of the graph visualisation in Section 7.4 that can be improved.
  • Provide the sketch of your alternative design.
  • Using appropriate visNetwork functions, plot the alternative design.

4.0 Original Chart

GAStech_edges_aggregated <- GAStech_edges %>%
  left_join(GAStech_nodes, by = c("sourceLabel" = "label")) %>%
  rename(from = id) %>%
  left_join(GAStech_nodes, by = c("targetLabel" = "label")) %>%
  rename(to = id) %>%
  filter(MainSubject == "Work related") %>%
  group_by(from, to) %>%
  summarise(weight = n()) %>%
  filter(from != to) %>%
  filter(weight > 1) %>%
  ungroup()

GAStech_nodes <- GAStech_nodes %>%
  rename(group = Department)
visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
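The two `left_join()` calls above replace each sender and recipient name with the corresponding node `id`. This gives the edge list the `from`/`to` columns that visNetwork expects. The sketch below applies the same steps to a hypothetical three-email table, using names from the node list.

```r
library(dplyr)

toy_nodes <- data.frame(id = c(1, 2, 3),
                        label = c("Mat.Bramar", "Anda.Ribera", "Rachel.Pantanal"),
                        stringsAsFactors = FALSE)
# Hypothetical emails identified by sender and recipient names only
toy_emails <- data.frame(sourceLabel = c("Mat.Bramar", "Mat.Bramar", "Anda.Ribera"),
                         targetLabel = c("Anda.Ribera", "Anda.Ribera", "Rachel.Pantanal"),
                         stringsAsFactors = FALSE)

toy_edges <- toy_emails %>%
  left_join(toy_nodes, by = c("sourceLabel" = "label")) %>%
  rename(from = id) %>%                       # sender name -> sender id
  left_join(toy_nodes, by = c("targetLabel" = "label")) %>%
  rename(to = id) %>%                         # recipient name -> recipient id
  group_by(from, to) %>%
  summarise(weight = n()) %>%                 # emails per directed pair
  ungroup()

print(toy_edges)  # 1 -> 2 with weight 2, and 2 -> 3 with weight 1
```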

4.1 Improve Chart By Incorporating Interactive Elements

When a name is selected from the drop-down list, or a node is selected directly in the graph, that node is highlighted and labelled, and all the nodes linked to it are labelled as well.

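visNetwork(GAStech_nodes, GAStech_edges_aggregated) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  # degree = 1: highlight the selected node and its direct neighbours;
  # labelOnly = FALSE: greyed-out nodes also lose their labels,
  # so only the selected node and its neighbours remain labelled
  visOptions(highlightNearest = list(enabled = TRUE, degree = 1, labelOnly = FALSE, hover = FALSE),
             nodesIdSelection = TRUE)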
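With `degree = 1`, the nodes that stay highlighted and labelled are the selected node plus its direct neighbours. The sketch below reproduces that labelled set with igraph on a hypothetical toy graph. It assumes that visNetwork's default `algorithm = "all"` ignores edge direction, so both senders and recipients count as neighbours.

```r
library(igraph)

# Hypothetical directed email graph: 1 -> 2, 2 -> 3, 4 -> 2, 5 -> 6
toy <- make_graph(c(1, 2, 2, 3, 4, 2, 5, 6), directed = TRUE)

selected <- 2
# Direct neighbours regardless of edge direction (senders and recipients)
nbrs <- as.numeric(neighbors(toy, selected, mode = "all"))
labelled <- sort(unique(c(selected, nbrs)))

print(labelled)  # 1 2 3 4; nodes 5 and 6 would be greyed out
```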

4.2 Areas For Improvement

Overall, the chart lacks clarity because of the overlapping names and edges. The areas for improvement are listed below:

  1. The names are difficult to read because they overlap one another. Even after zooming in, the edges running across the labels make it hard to tell which name belongs to which node.

  2. The user has to zoom into the chart to view the employee names.

  3. There is no legend or other indication of which department an employee belongs to, so the user cannot tell what each colour represents.

  4. There are no arrows to indicate the direction of the edges.

  5. There is no chart title.

4.3 Alternative Design Sketch

4.4 Final Design

Improvements made to the graph:

  1. The user can filter by group (department), which shows the interactions within and outside each department. ‘Select by id’ is renamed to ‘Select Name’.

  2. A title is added at the top to complement the chart and tell the user what it is about.

  3. Navigation buttons are added at the bottom to make viewing easier and to facilitate interaction.

  4. When the user selects a name in the drop-down list or clicks a node in the chart directly, the arrows indicate the direction of each edge and the node colours vary by department. Nodes and edges that are not connected to the selected node are greyed out. This increases clarity and lets the user focus on the nodes connected to that employee.

  5. The node shape is set to ‘box’ and the employee name is displayed inside the box. This increases clarity because the labels do not overlap one another.

  6. The user can view the relevant details (department, job title and name) of each employee by hovering over the node.
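Item 6 relies on visNetwork displaying a node's `title` column as a hover tooltip, which may contain HTML. The same tooltip text can be built with `sprintf()` instead of `paste0()`. The sketch below uses two rows taken from the node table printed earlier, with `Department` already renamed to `group`.

```r
# Two rows of the node table shown earlier (Department renamed to group)
toy_nodes <- data.frame(label = c("Mat.Bramar", "Anda.Ribera"),
                        group = c("Administration", "Administration"),
                        Title = c("Assistant to CEO", "Assistant to CFO"),
                        stringsAsFactors = FALSE)

# One HTML tooltip per node; visNetwork shows the `title` column on hover
toy_nodes$title <- sprintf(
  "<p><b>Department:</b> %s</p><p><b>Job Title:</b> %s</p><p><b>Name:</b> %s</p>",
  toy_nodes$group, toy_nodes$Title, toy_nodes$label
)

cat(toy_nodes$title[1])
```

Keeping the whole template in one string makes the tooltip layout easier to read than a long chain of `paste0()` fragments.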

Other Considerations:

  1. Adding a legend so that the user can easily tell which department an employee belongs to. However, I noticed a mismatch between the names in the legend and the respective department colours. This was therefore not implemented, since it would be misleading and confusing for the user.

  2. Allowing the user to hover over each node instead of using the drop-down list or clicking the node. However, I noticed that when hover was set to TRUE, only the other nodes in the same department as the employee were highlighted. The chart would therefore show only the employee's internal interactions and hide the connections to employees in other departments. This was not implemented, since it is better to show both internal and external interactions.

# Hover tooltip: department, job title and name of each employee
GAStech_nodes$title <- paste0("<p><b>Department: </b>", GAStech_nodes$group, "</p>",
                              "<p><b>Job Title:</b> ", GAStech_nodes$Title, "</p>",
                              "<p><b> Name: </b>", GAStech_nodes$label, "</p>")

visNetwork(GAStech_nodes, GAStech_edges_aggregated, main = "Improved Interactive Network Graph") %>%
  # Fixed seed so that the Fruchterman-Reingold layout is reproducible
  visIgraphLayout(layout = "layout_with_fr", randomSeed = 12) %>%
  visOptions(selectedBy = "group",
             highlightNearest = list(enabled = TRUE, degree = list(from = 1, to = 1),
                                     hover = FALSE, labelOnly = FALSE, algorithm = "hierarchical"),
             nodesIdSelection = list(enabled = TRUE, main = "Select Name")) %>%
  visInteraction(navigationButtons = TRUE, multiselect = TRUE) %>%
  visNodes(shape = "box", labelHighlightBold = TRUE) %>%
  # Arrowheads only at the target end, so each edge shows its direction
  visEdges(arrows = "to")