
Introduction


Network graphs are a great way to visualise relationships between objects and their importance within the network. This tutorial demonstrates the basics of the package ggraph.

ggraph follows the principles of Leland Wilkinson’s The Grammar of Graphics by extending the well-known data visualisation package ggplot2, which on its own can be challenging to apply to networks. Data structures from igraph, an older network package whose core is written in C, are also supported.

Install and load packages

Load the following packages once installed. We leverage igraph and tidyverse to handle some more complex data pre-processing.
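A minimal sketch of the setup step, assuming all three packages are installed from CRAN. The install line is commented out so that it only runs when needed.

```r
# Install once if needed (assumes a CRAN mirror is configured)
# install.packages(c("tidyverse", "igraph", "ggraph"))

library(tidyverse)  # data wrangling helpers and ggplot2
library(igraph)     # network data structures, layouts, community detection
library(ggraph)     # grammar-of-graphics layers for networks
```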

Example 1: Undirected Network


A network is composed of objects, referred to as nodes or vertices, and the connections between them, called edges. This example is only concerned with identifying connections where direction does not matter. Real-world examples include social networks or road maps within a city.

Adjacency matrix

The dataset is converted into a weighted N × N adjacency matrix of character connections, where N is the number of characters. We count each character’s lines per episode, then use cosine similarity to weight connections based on episode appearances and amount of dialogue.
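The tutorial's dataset is not reproduced here, so the sketch below uses a small made-up matrix of line counts to illustrate the idea. The character names `A` to `C`, the episode labels and all counts are hypothetical. Each row is treated as a vector, and the cosine similarity between two rows becomes the weight of the connection between those characters.

```r
# Hypothetical toy data: rows are characters, columns are episodes,
# values are the number of lines spoken in that episode.
lines_per_episode <- matrix(
  c(10, 0, 5,
     8, 2, 0,
     0, 7, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("ep1", "ep2", "ep3"))
)

# Cosine similarity: dot product of two rows divided by the product of their lengths.
row_norms  <- sqrt(rowSums(lines_per_episode^2))
cosine_sim <- tcrossprod(lines_per_episode) / outer(row_norms, row_norms)

# A character's similarity with itself is not a connection, so zero the diagonal.
diag(cosine_sim) <- 0

round(cosine_sim, 2)
```

Because line counts are never negative, every weight falls between 0 and 1, and the matrix is symmetric, which suits an undirected network.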

igraph network

The sheer number of nodes and connections makes it difficult to extract valuable information without further customisation. This is where a network package shines. We remove weaker connections from the adjacency matrix and introduce igraph functions for layout and community detection.
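As a rough illustration of the pruning step, the sketch below zeroes out weights under a cut-off and builds an igraph object from what remains. The matrix values, the vertex names and the `threshold` of 0.15 are all assumptions made for this example, not values from the tutorial.

```r
library(igraph)

# Hypothetical weighted adjacency matrix (symmetric, zero diagonal)
adj <- matrix(
  c(0.00, 0.80, 0.10, 0.60,
    0.80, 0.00, 0.20, 0.70,
    0.10, 0.20, 0.00, 0.05,
    0.60, 0.70, 0.05, 0.00),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D"))
)

# Assumed cut-off: connections weaker than this are dropped
threshold <- 0.15
adj_strong <- adj
adj_strong[adj_strong < threshold] <- 0

# Non-zero cells become weighted, undirected edges
g <- graph_from_adjacency_matrix(
  adj_strong,
  mode = "undirected",
  weighted = TRUE,
  diag = FALSE
)

E(g)$weight
```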

Communities are densely connected clusters within the larger network. The Louvain method used here first forms small clusters by optimising modularity locally, then treats each cluster as a single node and repeats the process on the resulting smaller network. A random layout is used for now, with other options discussed later. Also of note are the attributes of vertices and edges, which can be read and set using V() and E().
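A minimal sketch of these steps with igraph, using a made-up graph of two triangles joined by one edge rather than the tutorial's data. The attribute values chosen here (colour by community, edge width 2) are illustrative assumptions.

```r
library(igraph)

set.seed(42)  # Louvain and the random layout both involve randomness

# Hypothetical graph: two triangles connected by the edge C - D
g <- graph_from_literal(A - B - C - A, D - E - F - D, C - D)

# Louvain community detection (requires an undirected graph)
communities <- cluster_louvain(g)

# V() and E() expose vertex and edge attributes for reading and writing
V(g)$community <- as.integer(membership(communities))
V(g)$color     <- V(g)$community   # colour vertices by community
E(g)$width     <- 2                # same width for every edge

# Random layout: one row of x/y coordinates per vertex
coords <- layout_randomly(g)
plot(g, layout = coords)
```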

Implement ggraph

Given the number of labels and nodes, extensive customisation is still required for meaningful insight. Using ggraph allows adjustments to be added layer by layer. After the ggraph() function initialises the graph, layers to control network attributes include:

  • geom_edge_link(): Straight line edge connections between nodes
  • geom_node_point(): Vertex markers
  • geom_node_text(): Vertex labels

Lastly, layers affecting overall features of the graph such as theme or labels carry over from ggplot2.
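The layers listed above can be combined roughly as follows. This is a sketch on a small made-up graph, not the tutorial's network; the title text and the choice of `theme_void()` are assumptions for illustration.

```r
library(igraph)
library(ggraph)
library(ggplot2)

# Hypothetical graph: two triangles connected by the edge C - D
g <- graph_from_literal(A - B - C - A, D - E - F - D, C - D)

set.seed(42)  # the random layout changes on every run otherwise
p <- ggraph(g, layout = "randomly") +          # initialise the graph
  geom_edge_link() +                           # straight-line edges
  geom_node_point() +                          # vertex markers
  geom_node_text(aes(label = name), vjust = -1) +  # vertex labels
  theme_void() +                               # ggplot2 theme
  labs(title = "Toy undirected network")       # ggplot2 labels

print(p)
```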

Adjusting aesthetics and layout

Further adjustments can be made by setting parameters directly or by mapping them to data through the aes() aesthetic mapping function. We replace geom_edge_link() with geom_edge_fan(), which fans out parallel edges so they do not overlap, and tone down the edges so they are less visually dominant. The vertices use aes() to link marker size and label size to degree centrality. The degree() function measures the importance of each vertex by its number of connections.

The argument layout = "fr" refers to Fruchterman-Reingold, one of the most widely used layout algorithms. It is a force-directed method which attempts to keep edges from crossing or differing significantly in length.
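A sketch of these adjustments on a small made-up graph. The attribute name `deg`, the edge transparency and the edge width are assumptions chosen for the example.

```r
library(igraph)
library(ggraph)
library(ggplot2)

# Hypothetical graph: two triangles connected by the edge C - D
g <- graph_from_literal(A - B - C - A, D - E - F - D, C - D)

# Degree centrality stored as a vertex attribute (the name `deg` is arbitrary)
V(g)$deg <- igraph::degree(g)

set.seed(42)  # Fruchterman-Reingold starts from random positions
p <- ggraph(g, layout = "fr") +
  geom_edge_fan(edge_alpha = 0.3, edge_width = 0.5) +   # lighter, thinner edges
  geom_node_point(aes(size = deg)) +                    # marker size follows degree
  geom_node_text(aes(label = name, size = deg),
                 vjust = -1, show.legend = FALSE) +     # label size follows degree
  theme_void()

print(p)
```

Values set outside aes(), such as `edge_alpha = 0.3`, apply to every edge, while values inside aes() vary with the data.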

Example 2: Directed Network


A directed network makes a distinction between the source and target of a connection, e.g. an electric circuit board. For our dataset, we examine how often characters visit prominent locations of the show.
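One way to build such a network is from an edge list with a source column and a target column. The sketch below assumes a small made-up table of visit counts; the character and location names and the numbers are hypothetical.

```r
library(igraph)

# Hypothetical visit counts: each row is an edge from a character to a location
visits <- data.frame(
  from   = c("Character A", "Character A", "Character B", "Character C"),
  to     = c("Location X",  "Location Y",  "Location X",  "Location Y"),
  weight = c(12, 3, 7, 5)
)

# The first two columns are read as source and target; extra columns become edge attributes
g_dir <- graph_from_data_frame(visits, directed = TRUE)

is_directed(g_dir)

# In-degree counts incoming edges, here the number of distinct visitors per vertex
igraph::degree(g_dir, mode = "in")
```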

Implement ggraph

Converting the network to ggraph makes better use of the layout algorithm, reducing overlap and improving legibility.
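A sketch of a directed plot in ggraph, again on made-up visit counts. The arrow size, the cap radius and the width range are assumptions for illustration; `end_cap` stops each edge short of its target so the arrowhead is not hidden under the node marker.

```r
library(igraph)
library(ggraph)
library(ggplot2)

# Hypothetical visit counts: each row is an edge from a character to a location
visits <- data.frame(
  from   = c("Character A", "Character A", "Character B", "Character C"),
  to     = c("Location X",  "Location Y",  "Location X",  "Location Y"),
  weight = c(12, 3, 7, 5)
)
g_dir <- graph_from_data_frame(visits, directed = TRUE)

set.seed(42)  # force-directed layouts start from random positions
p <- ggraph(g_dir, layout = "fr") +
  geom_edge_link(
    aes(edge_width = weight),                         # thicker edge = more visits
    arrow      = grid::arrow(length = grid::unit(3, "mm")),
    end_cap    = circle(4, "mm"),                     # leave room for the arrowhead
    edge_alpha = 0.5
  ) +
  geom_node_point(size = 4) +
  geom_node_text(aes(label = name), vjust = -1.2) +
  scale_edge_width(range = c(0.3, 1.5)) +
  theme_void()

print(p)
```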

Further reading


The ggraph package offers a large number of customisations to try out, while the layout and community detection algorithms used in this tutorial are deep subjects in their own right.

Networks and Visualisations:

  1. Reference manual for ggraph: CRAN
  2. Detailed overview of networks in R: Jesse Sadler
  3. Introduction to The Grammar of Graphics: Towards Data Science

Advanced Topics:

  1. Defining Community Detection and its two broad methods: Analytics Vidhya Blog
  2. Introduction to force-directed layout algorithms: yWorks
  3. Cosine Similarity introduction for NLP: Machine Learning Plus