Introduction

This post further explores the word cooccurance network on our project on #qurananalytics. It will also serve as a simple tutorial on the igraph, tidygraph and ggraph packages in relation to the work on #qurananalytics.


Load Packages and Libraries

packages=c('dplyr', 'tidyverse', 'udpipe', 'ggplot2',
           'igraph', 'tidygraph', 'ggraph', 'knitr', 'quRan')
for (p in packages){
  if (! require (p,character.only = T)){
    install.packages(p)
  }
library(p,character.only = T)
}

Getting the Language Model Ready

# during first time model download execute the below line too
# udmodel <- udpipe_download_model(language = "english")
setwd("F:/RProjects")
# Load the model
udmodel <- udpipe_load_model(file = 'english-ewt-ud-2.5-191206.udpipe')

Start with annotating

Let’s start by annotating Surah Yusuf. The annotated data.frame can next be used for basic text analytics.

# Select the surah
Q01 <- quran_en_sahih %>% filter(surah == 12)
x <- udpipe_annotate(udmodel, x = Q01$text, doc_id = Q01$ayah_title)
x <- as.data.frame(x)

The resulting data.frame has a field called upos which is the Universal Parts of Speech tag and also a field called lemma which is the root form of each token in the text. These 2 fields give us a broad range of analytical possibilities.


Co-occurrences

Analyzing single words is a good start. Multi-word expressions should be more interesting. We can get multi-word expressions by looking either at collocations (words following one another), at word co-occurrences within each sentence or at word co-occurrences of words which are close in the neighbourhood of one another.

Co-occurrences allow to see how words are used either in the same sentence or next to each other. The udpipe package makes creating co-occurrence graphs using the relevant POS tags easy.

Nouns, adjectives, verbs, and adverbs used in same sentence

We look how many times nouns, proper nouns, adjectives, verbs, adverbs, and numbers are used in the same verse.

cooccur <- cooccurrence(x = subset(x, upos %in% c("NOUN", "PROPN", "VERB",
                                                  "ADJ", "ADV", "NUM")), 
                     term = "lemma", 
                     group = c("doc_id", "paragraph_id", "sentence_id"))
head(cooccur)

The result can be easily visualised using the igraph and ggraph R packages.

library(igraph)
library(ggraph)
library(ggplot2)
wordnetwork <- head(cooccur, 100)
wordnetwork <- graph_from_data_frame(wordnetwork)
ggraph(wordnetwork, layout = "fr") +
  geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_colour = "#ed9de9") +
  geom_node_point(aes(size = igraph::degree(wordnetwork)), shape = 1, color = "black") +
  geom_node_text(aes(label = name), col = "darkblue", size = 3) +
  labs(title = "Co-occurrences within sentence",
       subtitle = "Top 100 Nouns, Names, Adjectives, Verbs, Adverbs",
       caption = "Surah Yusuf (Saheeh International)")

The story is revealed by Allah (SWT). The main characters are Joseph, his father, his brothers, the king, and the wife of the minister (al-’Azeez). So the verb “say” dominates since it is a narrated story. Interesting to see the strong link and occurence of “know” with “Allah”.


Introduction to Networks and Graph Tools in R

This “introductory tutorial” should be useful for those new to #networkscience. We will be using the graph tools frequently in our work on #qurananalytics. Many of the examples are adapted from

  1. Work on dolphins by http://users.dimi.uniud.it/~massimo.franceschet/ns/syllabus/make/ggraph/ggraph.html
  2. Network Analysis and Visualization with R and igraph
  3. https://www.r-graph-gallery.com/network.html
  4. https://www.data-imaginist.com/2017/ggraph-introduction-layouts/

Network graph in summary

  • Network diagrams (or Graphs) show interconnections between a set of entities.
  • Each entity is represented by a Node (or vertice).
  • Connections between nodes are represented by links (or edges).
  • 3 packages are of interest in R:
    • igraph for data preparation and plotting,
    • ggraph for plotting using the grammar of graphic, and
    • tidygraph adapting igraph to the tidy format

Building simple network diagrams

  • Different input formats are available:
    • Adjacency matrix
    • Incidence matrix
    • Edge list
    • Dataframe
  • You need to know whether the network you’re trying to build is directed or undirected, and weighted or unweighted.
  • igraph package will read that kind of data and transform it into a graph object that is required to make a plot.

Adjacency matrix

  • An adjacency matrix is a square matrix where individuals in rows and columns are the same.
  • It is typically the kind of matrix you get when calculating the correlation between each pair of individuals.
  • In this example, we have a 5X5 integer matrix with random numbers. “0” means no connection between the nodes.
  • By default, we get an unweighted and oriented network.
library(igraph)
# Create data
set.seed(10)
data <- matrix(sample(0:2, 25, replace=TRUE), nrow=5)
colnames(data) = rownames(data) = LETTERS[1:5]
 
# build the graph object
g0a <- graph_from_adjacency_matrix(data)
 
# plot it
plot(g0a)

Incidence matrix

  • An incidence matrix is not square and entities provided in rows and columns are not necessarily the same.
  • By default, the graph object is directed from rows to columns.
# data
set.seed(1)
data <- matrix(sample(0:2, 15, replace=TRUE), nrow=3)
colnames(data) <- letters[1:5]
rownames(data) <- LETTERS[1:3]
 
# create the network object
g0b <- graph_from_incidence_matrix(data)
 
# plot it
plot(g0b)

Edge list

  • An edge list has 2 columns. Each row represents a connection between a source (first column) and a target (left column).
  • This kind of input often goes with another dataframe providing features for each node. (See a simple example below).
  • Possible to add information concerning each link in the edge list.
# create data:
links <- data.frame(
    source=c("A","A", "A", "A", "A","F", "B"),
    target=c("B","B", "C", "D", "F","A","E")
    )

# create the network object
g0c <- graph_from_data_frame(d=links, directed=F) 

# plot it
plot(g0c)

Literal List of connections

  • Last option that is less often used: a literal list of the connection.
    • Can be understood by igraph thanks to the graph_from_literal() function.
# create data:
g0d <- graph_from_literal( A-B-C-D, E-A-E-A, D-C-A, D-A-D-C )

# plot it
plot(g0d)

Simple Directed Network

  • Igraph only accepts Igraph objects.
  • Create networks from scratch or to import them.
  • Simple network that is directed, with four nodes, each represented by a number.
    • Arrows indicate links between nodes: 1->2, 2->3, 3->1, 4->2.
g0e <- graph(edges=c(1,2, 2,3, 3,1, 4,2), n=4, directed=T)
class(g0e) #checking the class
## [1] "igraph"
plot(g0e) # A simple plot to get started


Network chart parameters

  • This section describes the different parameters offered by the library, allowing to customize node, link and label features..
  • igraph allows to transform several types of input into a graph object that can be plotted using the plot() function as follow:
# Create data
set.seed(1)
data <- matrix(sample(0:1, 100, replace=TRUE, prob=c(0.8,0.2)), nc=10)
g0f <- graph_from_adjacency_matrix(data , mode='undirected', diag=F )

# Default network
par(mar=c(0,0,0,0))
plot(g0f)

Customize node features

  • Nodes can be modified by several arguments that all start with vertex.
    • A vertex is a node in igraph language.
    • Main options are explained in the code below:
plot(g0f,
    vertex.color = rgb(0.8,0.2,0.2,0.9),           # Node color
    vertex.frame.color = "Forestgreen",            # Node border color
    vertex.shape=c("circle","square"),             # One of “none”, “circle”, “square”, “csquare”, “rectangle” “crectangle”, “vrectangle”, “pie”, “raster”, or “sphere”
    vertex.size=c(15:24),                          # Size of the node (default is 15)
    vertex.size2=NA,                               # The second size of the node (e.g. for a rectangle)
    )

Customize label features

  • Following the same principle, options starting with vertex.label.
    • control the label features.
    • When several values are provided to an option, values are used subsequently and recycled.
plot(g0f,
    vertex.label=LETTERS[1:10],                    # Character vector used to label the nodes
    vertex.label.color=c("red","blue"),
    vertex.label.family="Times",                   # Font family of the label (e.g.“Times”, “Helvetica”)
    vertex.label.font=c(1,2,3,4),                  # Font: 1 plain, 2 bold, 3, italic, 4 bold italic, 5 symbol
    vertex.label.cex=c(0.5,1,1.5),                 # Font size (multiplication factor, device-dependent)
    vertex.label.dist=0,                           # Distance between the label and the vertex
    vertex.label.degree=0 ,                        # The position of the label in relation to the vertex (use pi)
    )

Network with node size based on edge number

  • How to build a network diagram where node size is proportional to its number of connections with other nodes.
  • Make nodes bigger when they are heavily connected with other nodes.
    • It means they have an importance in the network and thus deserves to be highlighted.
  • The degree() function of the igraph package allows us to compute the number of connections per node.
    • pass its result to the vertex.size argument of the plot() function to get the targeted result.
# Count the number of degree for each node:
deg <- degree(g0f, mode="all")
 
# Plot
plot(g0f, vertex.size=deg*6, vertex.color=rgb(0.1,0.7,0.8,0.5) )


Graph Model of Word Cooccurences in Surah Yusuf

We come back to the graph model of how many times nouns, proper nouns, adjectives, verbs, adverbs, and numbers are used in the same verse in Surah Yusuf (111 verses).

Data, format, size

  • The example data used here is not big, many of the ideas behind the visualizations we will generate apply to medium and large-scale networks.
  • When drawing very big networks we focus on identifying and visualizing communities or subsets of nodes.
  • Visualizing larger networks as giant hairballs is less helpful than providing plots that show key characteristics of the graph.

Create graph

  • Read nodes (term1 and term2) and edges
  • Convert raw data to an igraph network object.
  • Use the graph_from_data_frame() function, which takes one data frame (in our case)
    • Its first two columns are the IDs of the source and the target node for each edge.
    • The following columns are edge attributes (cooc, or number of cooccurences).
  • NODES starts with a column of node IDs.
    • Any following columns are interpreted as node attributes.
head(cooccur, 50)
wordnetwork <- head(cooccur, 50)
gm <- graph_from_data_frame(wordnetwork)
gm
## IGRAPH 489b8ed DN-- 35 50 -- 
## + attr: name (v/c), cooc (e/n)
## + edges from 489b8ed (vertex names):
##  [1] Joseph ->say     indeed ->say     say    ->when    father ->say    
##  [5] Allah  ->say     Allah  ->know    Joseph ->when    Allah  ->indeed 
##  [9] father ->indeed  brother->say     know   ->say     say    ->so     
## [13] one    ->say     indeed ->see     so     ->when    bag    ->brother
## [17] brother->Joseph  father ->Joseph  father ->when    eat    ->say    
## [21] say    ->see     eat    ->seven   say    ->seven   brother->so     
## [25] seven  ->spike   brother->when    indeed ->Joseph  Allah  ->most   
## [29] most   ->people  he     ->rely    enter  ->say     give   ->say    
## + ... omitted several edges
  • R is case sensitive, Weight/weight and Type/type … makes a great difference
  • The description of an igraph object starts with four letters:
    • D or U, for a directed or undirected graph
    • N for a named graph (where nodes have a name attribute)
    • W for a weighted graph (where edges have a weight attribute)
    • B for a bipartite (two-mode) graph (where nodes have a type attribute)
  • The two numbers that follow (35 50) refer to the number of nodes and edges in the graph. The description also lists node & edge attributes, for example:
    • (g/c) - graph-level character attribute
    • (v/c) - vertex-level character attribute
    • (e/n) - edge-level numeric attribute

Some simple commands

  • We have easy access to nodes, edges, and their attributes with:
E(gm)       # The edges of the "net" object
## + 50/50 edges from 489b8ed (vertex names):
##  [1] Joseph ->say     indeed ->say     say    ->when    father ->say    
##  [5] Allah  ->say     Allah  ->know    Joseph ->when    Allah  ->indeed 
##  [9] father ->indeed  brother->say     know   ->say     say    ->so     
## [13] one    ->say     indeed ->see     so     ->when    bag    ->brother
## [17] brother->Joseph  father ->Joseph  father ->when    eat    ->say    
## [21] say    ->see     eat    ->seven   say    ->seven   brother->so     
## [25] seven  ->spike   brother->when    indeed ->Joseph  Allah  ->most   
## [29] most   ->people  he     ->rely    enter  ->say     give   ->say    
## [33] seduce ->seek    cow    ->seven   dry    ->seven   fat    ->seven  
## [37] grain  ->seven   green  ->seven   lean   ->seven   other  ->seven  
## + ... omitted several edges
V(gm)       # The vertices of the "net" object
## + 35/35 vertices, named, from 489b8ed:
##  [1] Joseph  indeed  say     father  Allah   brother know    one     so     
## [10] bag     eat     seven   most    he      enter   give    seduce  cow    
## [19] dry     fat     grain   green   lean    other   do      good    when   
## [28] see     spike   people  rely    seek    take    knowing Lord
E(gm)$cooc  # Edge attribute
##  [1] 23 20 19 18 17 12 12 11 11 11 11 10  9  9  9  8  8  8  8  7  7  7  7  7  7
## [26]  7  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  5  5  5  5  5  5  5  5
V(gm)$name  # Vertex attribute 
##  [1] "Joseph"  "indeed"  "say"     "father"  "Allah"   "brother" "know"   
##  [8] "one"     "so"      "bag"     "eat"     "seven"   "most"    "he"     
## [15] "enter"   "give"    "seduce"  "cow"     "dry"     "fat"     "grain"  
## [22] "green"   "lean"    "other"   "do"      "good"    "when"    "see"    
## [29] "spike"   "people"  "rely"    "seek"    "take"    "knowing" "Lord"

Find nodes and edges by attribute:

  • Returns objects of type vertex sequence/edge sequence)
V(gm)[name=="Allah"]
## + 1/35 vertex, named, from 489b8ed:
## [1] Allah
E(gm)[cooc > 10]
## + 11/50 edges from 489b8ed (vertex names):
##  [1] Joseph ->say    indeed ->say    say    ->when   father ->say   
##  [5] Allah  ->say    Allah  ->know   Joseph ->when   Allah  ->indeed
##  [9] father ->indeed brother->say    know   ->say
# You can also examine the network matrix directly:
gm[1,]
##  Joseph  indeed     say  father   Allah brother    know     one      so     bag 
##       0       0       1       0       0       0       0       0       0       0 
##     eat   seven    most      he   enter    give  seduce     cow     dry     fat 
##       0       0       0       0       0       0       0       0       0       0 
##   grain   green    lean   other      do    good    when     see   spike  people 
##       0       0       0       0       0       0       1       0       0       0 
##    rely    seek    take knowing    Lord 
##       0       0       0       0       0
gm[5,7]
## [1] 1

Mappings

Using igraph plots

  • There are OK plotting functions in igraph
plot(gm) # not a pretty picture!

Plotting parameters

  • The network plots in igraph have a wide set of parameters we can set. Those include node options (starting with vertex.) and edge options (starting with edge.). A list of selected options is included below, but we can also check out ?igraph.plotting for more information.
  1. NODES
    • vertex.color Node color
    • vertex.frame.color Node border color
    • vertex.shape One of “none”, “circle”, “square”, “csquare”, “rectangle”, “crectangle”, “vrectangle”, “pie”, “raster”, or “sphere”
    • vertex.size Size of the node (default is 15)
    • vertex.size2 The second size of the node (e.g. for a rectangle)
    • vertex.label Character vector used to label the nodes
    • vertex.label.family Font family of the label (e.g.“Times”, “Helvetica”)
    • vertex.label.font Font: 1 plain, 2 bold, 3, italic, 4 bold italic, 5 symbol
    • vertex.label.cex Font size (multiplication factor, device-dependent)
    • vertex.label.dist Distance between the label and the vertex
    • vertex.label.degree The position of the label in relation to the vertex, where 0 is right, “pi” is left, “pi/2” is below, and “-pi/2” is above
  2. EDGES
    • edge.color Edge color
    • edge.width Edge width, defaults to 1
    • edge.arrow.size Arrow size, defaults to 1
    • edge.arrow.width Arrow width, defaults to 1
    • edge.lty Line type, could be 0 or “blank”, 1 or “solid”, 2 or “dashed”, 3 or “dotted”, 4 or “dotdash”, 5 or “longdash”, 6 or “twodash”
    • edge.label Character vector used to label edges
    • edge.label.family Font family of the label (e.g.“Times”, “Helvetica”)
    • edge.label.font Font: 1 plain, 2 bold, 3, italic, 4 bold italic, 5 symbol
    • edge.label.cex Font size for edge labels
    • edge.curved Edge curvature, range 0-1 (FALSE sets it to 0, TRUE to 0.5)
    • arrow.mode Vector specifying whether edges should have arrows, possible values: 0 no arrow, 1 back, 2 forward, 3 both
  3. OTHER
    • margin Empty space margins around the plot, vector with length 4
    • frame if TRUE, the plot will be framed
    • main If set, adds a title to the plot
    • sub If set, adds a subtitle to the plot
    • asp Numeric, the aspect ratio of a plot (y/x).
    • palette A color palette to use for vertex color
    • rescale Whether to rescale coordinates to [-1,1]. Default is TRUE.

Plot with curved edges (edge.curved=.1) and reduce arrow size:

  • We can set the node & edge options in two ways.
    • first one is to specify them in the plot() function, as we are doing below.
  • Note that using curved edges will allow you to see multiple links between two nodes (e.g. links going in either direction, or multiplex links)
plot(gm, edge.arrow.size=.4, edge.curved=.1)

Set edge color to red, the node color to yellow

plot(gm, edge.arrow.size=.2, edge.color="red",
      vertex.color="yellow", vertex.frame.color="#ffffff",
      vertex.label=V(gm)$Label, vertex.label.color="black")

Set edge width based on number of cooccurences:

E(gm)$cooc
##  [1] 23 20 19 18 17 12 12 11 11 11 11 10  9  9  9  8  8  8  8  7  7  7  7  7  7
## [26]  7  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  5  5  5  5  5  5  5  5
E(gm)$width
## NULL
E(gm)$width <- E(gm)$cooc
E(gm)$width
##  [1] 23 20 19 18 17 12 12 11 11 11 11 10  9  9  9  8  8  8  8  7  7  7  7  7  7
## [26]  7  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  5  5  5  5  5  5  5  5
plot(gm)

Redo colors

  • Also change arrow size and edge color
  • Set the network layout
#change arrow size and edge color:
E(gm)$arrow.size <- .2
E(gm)$edge.color <- "gray80"
graph_attr(gm, "layout") <- layout_with_fr
plot(gm) 

Sometimes, especially with semantic networks, plot only the labels of the nodes:

plot(gm, vertex.shape="none", vertex.label=V(gm)$name, 
      vertex.label.font=2, vertex.label.color="blue",
      vertex.label.cex=.7, edge.color="gray85")

Color edges of graph based on their source node color.

  • Get the starting node for each edge with the ends() igraph function.
    • returns the start and end vertex for edges listed in the es parameter.
    • names parameter controls whether the function returns edge names or IDs.
edge.start <- ends(gm, es=E(gm), names=F)[,1]
edge.col <- V(gm)$color[edge.start]
edge.start
##  [1]  1  2  3  4  5  5  1  5  4  6  7  3  8  2  9 10  6  4  4 11  3 11  3  6 12
## [26]  6  2  5 13 14 15 16 17 18 19 20 21 22 23 24  4  3  5 25 11 26 15  2  2 11
edge.col
## NULL
plot(gm, edge.color=edge.col, edge.curved=.1)  

Color edges of graph based on the number of coocurrences

E(gm)[cooc <= 5]$color <- "red"
E(gm)[cooc > 5 & cooc <= 10]$color <- "yellow"
E(gm)[cooc > 10]$color <- "green"
plot(gm)

# repeat plot
plot(gm, vertex.size=5,
     layout=layout.fruchterman.reingold,
     vertex.label=NA)

Change the graph layout according to some algorithms already implemented on Igraph.

To adjust the figure you just have to assign a new layout: * layout_with_dh * layout_with_fr * layout_with_kk * layout_with_sugiyama

l<-layout_with_dh(gm)
plot(gm, vertex.color=vertex_attr(gm)$cor,
      vertex.size=igraph::degree(gm),
      edge.width=(edge_attr(gm)$weight)/100, 
      edge.color="grey50", 
      edge.curved=0.5,
      layout=l)

Other cool layouts:

par(mfrow=c(2,2), oma=c(0,0,0,0), mar=c(1,1,1,1))#To plot four plots 
plot(gm, vertex.color=vertex_attr(gm)$cor,vertex.label=NA,
     vertex.size=igraph::degree(gm),
     edge.width=(edge_attr(gm)$weight)/100, 
     edge.color="grey50", 
     edge.curved=0.3,
     layout=layout_in_circle, main="layout_in_circle")

plot(gm, vertex.color=vertex_attr(gm)$cor,
     vertex.size=igraph::degree(gm),
     edge.width=(edge_attr(gm)$weight)/100, 
     edge.color="grey50", 
     #edge.curved=0.3,
     layout=layout_as_tree, main="layout_as_tree")

plot(gm, vertex.color=vertex_attr(gm)$cor,
     vertex.size=igraph::degree(gm),
     edge.width=3*(edge_attr(gm)$weight)/100, 
     edge.color="grey50", 
     #edge.curved=0.3,
     layout=layout_as_star, main="layout_as_star")

Additional layouts

  • Use a simple program
  • List of available layouts and then grep
layouts <- grep("^layout_", ls("package:igraph"), value=TRUE)[-1]
layouts <- layouts[!grepl("bipartite|merge|norm|sugiyama|tree", layouts)]
par(mfrow=c(3,3), mar=c(1,1,1,1))
for (layout in layouts) {
      print(layout)
      l <- do.call(layout, list(gm)) 
      plot(gm, edge.arrow.mode=0, edge.width=1, layout=l, main=layout) }
## [1] "layout_as_star"
## [1] "layout_components"
## [1] "layout_in_circle"
## [1] "layout_nicely"
## [1] "layout_on_grid"
## [1] "layout_on_sphere"
## [1] "layout_randomly"
## [1] "layout_with_dh"
## [1] "layout_with_drl"

## [1] "layout_with_fr"
## [1] "layout_with_gem"
## [1] "layout_with_graphopt"
## [1] "layout_with_kk"
## [1] "layout_with_lgl"
## [1] "layout_with_mds"

Fruchterman-Reingold

  • One of the most used force-directed layout algorithms out there.
  • Force-directed layouts try to get a nice-looking graph where edges are similar in length and cross each other as little as possible.
  • They simulate the graph as a physical system.
    • Nodes are electrically charged particles that repel each other when they get too close.
    • Edges act as springs that attract connected nodes closer together.
    • Thus nodes are evenly distributed through the chart area, and the layout is intuitive in that nodes which share more connections are closer to each other.
  • The disadvantage of these algorithms is that they are rather slow and therefore less often used in graphs larger than ~1000 nodes.
  • With force-directed layouts, you can use the niter parameter to control the number of iterations to perform.
    • Default is set at 500 iterations.
    • Lower that number for large graphs to get results faster and check if they look reasonable.
  • The layout can also interpret edge weights. You can set the “weights” parameter which increases the attraction forces among nodes connected by heavier edges.
l <- layout_with_fr(gm)
plot(gm, layout=l)

l <- layout_with_fr(gm, niter=50)
plot(gm, layout=l)

ws <-  c(1, rep(100, ecount(gm)-1))
lw <- layout_with_fr(gm, weights=ws)
plot(gm, layout=lw)

Fruchterman-Reingold Non-Deterministic

  • Fruchterman-Reingold layout is not deterministic
    • different runs will result in slightly different configurations.
    • Saving the layout in l allows us to get the exact same result multiple times, which can be helpful if you want to plot the time evolution of a graph, or different relationships – and want nodes to stay in the same place in multiple plots.
  • By default, the coordinates of the plots are rescaled to the [-1,1] interval for both x and y.
    • change that with the parameter rescale=FALSE and rescale plot manually by multiplying the coordinates by a scalar.
    • use norm_coords to normalize the plot with the boundaries you want. This way you can create more compact or spread out layout versions.
par(mfrow=c(2,2), mar=c(0,0,0,0))   # plot four figures - 2 rows, 2 columns
plot(gm, layout=layout_with_fr)
plot(gm, layout=layout_with_fr)
plot(gm, layout=l)
plot(gm, layout=l)

l <- layout_with_fr(gm)
l <- norm_coords(l, ymin=-1, ymax=1, xmin=-1, xmax=1)
# dev.off()
par(mfrow=c(2,2), mar=c(0,0,0,0))
plot(gm, rescale=F, layout=l*0.4)
plot(gm, rescale=F, layout=l*0.6)
plot(gm, rescale=F, layout=l*0.8)
plot(gm, rescale=F, layout=l*1.0)

# dev.off()

Kamada Kawai

  • Another popular force-directed algorithm that produces nice results for connected graphs.
  • Like Fruchterman Reingold, it attempts to minimize the energy in a spring system.
l <- layout_with_kk(gm)
plot(gm, layout=l)

Graphopt

  • A nice force-directed layout implemented in igraph that uses layering to help with visualizations of large networks.
  • The available graphopt parameters can be used to change the mass and electric charge of nodes, as well as the optimal spring length and the spring constant for edges.
    • The parameter names are charge (defaults to 0.001),
    • mass (defaults to 30),
    • spring.length (defaults to 0), and
    • spring.constant (defaults to 1).
  • Tweaking those can lead to considerably different graph layouts.
l <- layout_with_graphopt(gm)
plot(gm, layout=l)

# The charge parameter below changes node repulsion:
l1 <- layout_with_graphopt(gm, charge=0.02)
l2 <- layout_with_graphopt(gm, charge=0.00000001)
 
par(mfrow=c(1,2), mar=c(1,1,1,1))
plot(gm, layout=l1)
plot(gm, layout=l2)

#dev.off()

LGL algorithm

  • Meant for large, connected graphs.
  • Here you can also specify a root: a node that will be placed in the middle of the layout.
par(mfrow=c(1,2), mar=c(1,1,1,1))
plot(gm, layout=layout_with_lgl, root = 1)
plot(gm, layout=layout_with_lgl, root = 5)

#dev.off()

The MDS (multidimensional scaling) algorithm

  • Tries to place nodes based on some measure of similarity or distance between them. More similar nodes are plotted closer to each other.
  • By default, the measure used is based on the shortest paths between nodes in the network.
  • We can change that by using our own distance matrix (however defined) with the parameter dist.
  • MDS layouts are nice because positions and distances have a clear interpretation.
  • The problem with them is visual clarity: nodes often overlap, or are placed on top of each other.
plot(gm, layout=layout_with_mds)


Analysis

gm1 <- delete_vertices(gm, "seek")
gm1 <- delete_vertices(gm1, "seduce")
gmc <- cluster_spinglass(gm1)
str(gmc) #evaluating output
## List of 4
##  $ membership : chr [1:11] "Joseph" "say" "father" "brother" ...
##  $ csize      : chr [1:6] "indeed" "do" "good" "see" ...
##  $ modularity : chr [1:10] "eat" "seven" "cow" "dry" ...
##  $ temperature: chr [1:6] "Allah" "know" "most" "he" ...
##  - attr(*, "class")= chr "communities"
gmc$membership #checking to which module each node belongs to 
##  [1] 1 2 1 1 4 1 4 1 1 1 3 3 4 4 1 1 3 3 3 3 3 3 3 2 2 1 2 3 4 4 1 2 2
plot(gm1, vertex.color=vertex_attr(gm1)$cor,
     vertex.size = 3*degree(gm1),
     edge.width = edge_attr(gm1)$coor, 
     edge.color = "grey50",
     edge.curved = 0.3, 
     layout = layout_with_kk,
     mark.groups = gmc, 
     mark.border=NA)

Calculate and plot the shortest path between two nodes

  • To emphasize the path between them, we have to create a new color vector for interactions (edges) the same way we did before for the nodes/vertices:
short.path <- shortest_paths(gm, from = "Joseph", to = "seven", output = "both")
short.path
## $vpath
## $vpath[[1]]
## + 3/35 vertices, named, from 489b8ed:
## [1] Joseph say    seven 
## 
## 
## $epath
## $epath[[1]]
## + 2/50 edges from 489b8ed (vertex names):
## [1] Joseph->say   say   ->seven
## 
## 
## $predecessors
## NULL
## 
## $inbound_edges
## NULL
ecol <- rep("gray80", ecount(gm)) #creating a color vector for all the edges with the color "grey80"; the ecount fucntion tells you how many edges the network has
ecol[unlist(short.path$epath)] <- "red" # coloring in red only the path in which we are interested, that we calculed using the shortest_paths function.
l <- layout_with_kk(gm)
plot(gm, vertex.color=vertex_attr(gm)$cor,
     vertex.size=igraph::degree(gm),
     edge.width=2, 
     edge.color=ecol,
     layout=l)


Using ggraph plots

library(ggraph)
lay = create_layout(gm, layout = "kk")
ggraph(lay) + 
  geom_edge_link() + 
  geom_node_point() +
  theme_graph()

# set theme to graph
# set_graph_style()
# unset graph theme 
# unset_graph_style()

Add node names and edge type

ggraph(lay) + 
     geom_edge_link() + 
     geom_node_point() + 
     geom_node_text(aes(label = name), repel=TRUE)

ggraph(lay) + 
    geom_edge_link() + 
    geom_node_point() + 
    geom_node_text(aes(label = name), repel=TRUE) +
    geom_edge_link(aes(color = cooc))

Plot node degree as node size (and alpha)

ggraph(lay) + 
    geom_node_point() + 
    geom_node_text(aes(label = name), repel=TRUE) +
    geom_node_point(aes(size = igraph::degree(gm)), colour = "#de4e96") +
    geom_edge_link(aes(color = cooc))

# plot node degree as node size and alpha
ggraph(lay) + 
    geom_node_point() + 
    geom_node_text(aes(label = name), repel=TRUE) +
    geom_node_point(aes(size = degree(gm), alpha = degree(gm)), color = "#de4e96") +
    geom_edge_link(aes(color = cooc))

Plot edge weight as edge alpha

ggraph(lay) + 
  geom_edge_link(aes(alpha = cooc)) + 
  geom_node_point()

Layouts in ggraph

  • Some network layouts familiar from igraph plotting:
    • ‘star’, ‘circle’, ‘grid’, ‘sphere’, ‘kk’, ‘fr’, ‘mds’, ‘lgl’, etc.
  • Can use geom_edge_link() for straight edges, geom_edge_arc() for curved ones, and geom_edge_fan() when we want to make sure any overlapping multiplex edges will be fanned out.
  • Can set visual properties for the network plot by using key function parameters.
    • nodes have color, fill, shape, size, and stroke. Edges have color, width, and linetype.
    • alpha parameter controls transparency.
ggraph(gm, layout="kk") +
  geom_edge_link() +
  ggtitle("Title")  # add title to the plot

ggraph(gm, layout="kk") +
  geom_edge_fan(color="gray50", width=0.8, alpha=0.5) + 
  geom_node_point(size=2) +
  theme_void()

Different themes

  • As in ggplot2, we can add different themes to the plot.
    • For a cleaner look, you can use a minimal or empty theme with theme_minimal() or theme_void().
ggraph(gm, layout = 'linear') + 
    geom_edge_arc(color = "orange", width=0.7) +
    geom_node_point(size=3, color="gray50") +
    geom_node_text(aes(label = name), repel=TRUE, size = 3) +
    theme_void()

Mapping aesthetics

  • The ggraph package also uses the traditional ggplot2 way of mapping aesthetics: specifying which elements of the data should correspond to different visual properties of the graphic.
  • Done using the aes() function that matches visual parameters with attribute names from the data.
  • In the code below, the edge attribute type and node attribute audience.size are taken from our data as they are included in the igraph object.
ggraph(gm, layout="kk") +
  geom_edge_link(aes(color = cooc)) +           
  geom_node_point(aes(size = degree(gm))) +  
  theme_void()

Legends

  • One great thing about ggplot2 and ggraph we can see above is that they automatically generate a legend which makes plots easier to interpret.
  • We can add a layer with node labels using geom_node_text() or geom_node_label() which correspond to similar functions in ggplot2.
ggraph(gm,  layout = 'kk') +
  geom_edge_arc(color="gray", curvature=0.3) +            
  geom_node_point(color="#de4e96", aes(size = degree(gm))) +     
  geom_node_text(aes(label = name), size=3, color="darkblue", repel=T) +
  theme_void()

Use edge alpha to indicate direction, direction is from lighter to darker node

ggraph(gm) +
    geom_edge_link(aes(alpha = stat(index)), show.legend = FALSE) +
    geom_node_point()  +
    geom_node_text(aes(label = name), size=3, color="darkblue", repel=T) +
    theme_graph()

ggraph(gm) +
    geom_edge_link(aes(alpha = stat(index)), show.legend = FALSE) +
    geom_edge_link(aes(color = cooc)) +
    geom_node_point()  +
    geom_node_text(aes(label = name), size=3, color="darkblue", repel=T) +
    theme_graph()

Arc and Coord

  • ggraph provides a lot of new layout types and algorithms for our drawing pleasure
ggraph(gm, layout = 'linear') + 
     geom_edge_arc(aes(color = cooc))

ggraph(gm, layout = 'linear',  circular = TRUE) + 
     geom_edge_arc(aes(color = cooc))

ggraph(gm, layout = 'linear',  circular = TRUE) + 
     geom_edge_arc(aes(color = cooc)) +
     geom_node_point() +
     geom_node_text(aes(label = name), size=3, color="darkblue", repel=T)

Fan

  • Sometimes the graph is not simple, i.e. it has multiple edges between the same nodes.
  • Using links is a bad choice here because edges will overlap and the viewer will be unable to discover parallel edges.
  • geom_edge_fan() helps here.
    • If there are no parallel edges it behaves like geom_edge_link() and draws a straight line.
    • If parallel edges exists it will spread them out as arcs with different curvature.
    • Parallel edges will be sorted by directionality prior to plotting so edges flowing in the same direction will be plotted together.
ggraph(gm, layout = 'kk') + 
     geom_edge_fan(aes(color = cooc, width = cooc))

ggraph(gm, layout = 'kk') + 
     geom_edge_fan(aes(color = cooc, width = cooc)) +
     geom_node_point() +
     geom_node_text(aes(label = name), size=3, color="darkred", repel=T)

Density

  • Consider the case where it is of interest to see which types of edges dominates certain areas of the graph.
  • You can colour the edges, but edges can tend to get overplotted, thus reducing readability.
  • geom_edge_density() lets you add a shading to your plot based on the density of edges in a certain area:
ggraph(gm, layout = 'kk') + 
     geom_edge_density(aes(fill = cooc)) + 
     geom_edge_link(alpha = 0.25, color = "steelblue") +
     geom_node_point(color="#de4e96", aes(size = igraph::degree(gm)))

Other ways to represent a network

  • ggraph offers a number of other interesting ways to represent networks, including dendrograms, treemaps, hive plots, and circle plots.
  • a quick heatmap of the network matrix.
netm <- get.adjacency(gm, attr="cooc", sparse=F)
colnames(netm) <- V(gm)$name
rownames(netm) <- V(gm)$name
palf <- colorRampPalette(c("yellow", "red")) 
heatmap(netm[,17:1], Rowv = NA, Colv = NA, col = palf(100), 
        scale="none", margins=c(10,10) )

Plot the degree distribution for our network:

  • Depending on what properties of the network or its nodes and edges are most important to you, simple graphs can often be more informative than network maps
deg.dist <- degree_distribution(gm, cumulative=T, mode="all")
deg.dist
##  [1] 1.00000000 1.00000000 0.45714286 0.28571429 0.28571429 0.22857143
##  [7] 0.11428571 0.08571429 0.08571429 0.08571429 0.08571429 0.02857143
## [13] 0.02857143 0.02857143 0.02857143 0.02857143
plot( x=0:max(degree(gm)), y=1-deg.dist, pch=19, cex=1.2, col="orange", 
      xlab="Degree", ylab="Cumulative Frequency")

Fun With Network Graphs

wordnetwork <- head(cooccur, 100)
gm1 <- graph_from_data_frame(wordnetwork)
ggraph(gm1, layout = 'kk') + 
      geom_edge_density(aes(fill = cooc)) + 
      geom_edge_link(alpha = 0.7, color = "#57d3e6") +
      geom_node_point(aes(size = degree(gm1)), colour = "#a83268") +
      geom_node_text(aes(label = name), size = 3, repel=TRUE) +
      ggtitle("Network of Top 100 Words")

ggraph(gm1, layout = 'kk') + 
      geom_edge_density(aes(fill = cooc)) + 
      geom_edge_link(alpha = 0.7, color = "#57d3e6") +
      geom_node_point(aes(size = degree(gm1)),
                      colour = "#a83268") +
      ggtitle("Network of Top 100 Words")

ggraph(gm1, layout = 'kk') + 
      geom_edge_density(aes(fill = cooc)) + 
      geom_edge_link(alpha = 0.7, color = "#57d3e6") +
      geom_node_point(aes(size = degree(gm1)),
                      colour = "#a83268", show.legend = FALSE) +
      ggtitle("Network of Top 100 Words")

ggraph(gm1, layout = 'linear', circular = TRUE) + 
     geom_edge_arc(color = "orange", width=0.7) +
     geom_node_point(aes(size = degree(gm1)), alpha = igraph::degree(gm1), 
                     colour = "#a83268") +
     geom_node_text(aes(label = name), size = 3, repel=TRUE) +
     theme_void() +
     ggtitle("Network of Top 100 Words In Circular Layout")

ggraph(gm1, layout = 'linear', circular = TRUE) + 
     geom_edge_arc(color = "#57d3e6", width=0.7) +
     geom_node_point(aes(size = degree(gm1)), alpha = igraph::degree(gm1), 
                     colour = "#a83268") +
     geom_node_text(aes(label = name), size = 3, repel=TRUE) +
     theme_void() +
     ggtitle("Network of Top 100 Words In Circular Layout")

Working With a Somewhat Bigger Graph

wordnetwork <- head(cooccur, 2000)
jg <- graph_from_data_frame(wordnetwork)

ggraph(jg, layout = 'kk') + 
      geom_edge_density(aes(fill = cooc)) + 
      geom_edge_link(alpha = 0.7, color = "#57d3e6") +
      geom_node_point(aes(size = igraph::degree(jg)), 
                      colour = "#a83268", show.legend = FALSE) +
      ggtitle("Data Art of Top 2000 Coccurrences")

The plot shows all the 421 nodes are connected? Perhaps we can do other analysis.

is.connected(jg) # Is it connected?
## [1] FALSE
no.clusters(jg) # How many components?
## [1] 3
table(clusters(jg)$csize) # How big are these?
## 
##   2 417 
##   2   1
max(degree(jg, mode="in")) # Vertex degree
## [1] 77
max(degree(jg, mode="out"))
## [1] 120
max(degree(jg, mode="all"))
## [1] 126
# In-degree distribution
plot(degree.distribution(jg, mode="in"), log="xy")

# Out-degree distribution
plot(degree.distribution(jg, mode="out"), log="xy")

is.connected(jg) being FALSE says that there are some nodes not connected.

lay = create_layout(jg, layout = "fr")
ggraph(lay) + 
   geom_edge_link() + 
   geom_node_point() +
   geom_node_text(aes(label = name), size = 3)

The rather ugly plot shows the 4 nodes. Let us delete and repeat the cluster analysis. But instead of using the delete_vertices() function like in an earlier example, we just use the main component. First, find the components and then subset the graph based on those components. In igraph the largest component is not always the first one with id == 1.

Taking the largest component

cl <- clusters(jg)
jg1 <- induced_subgraph(jg, which(cl$membership == which.max(cl$csize)))
summary(jg1)
## IGRAPH 9126a4e DN-- 417 1998 -- 
## + attr: name (v/c), cooc (e/n)

Another approach is

jg2 <- induced_subgraph(jg, 
          V(jg)[components(jg)$membership == which.max(components(jg)$csize)])
jg2
## IGRAPH 912baa5 DN-- 417 1998 -- 
## + attr: name (v/c), cooc (e/n)
## + edges from 912baa5 (vertex names):
##  [1] Joseph ->say     indeed ->say     say    ->when    father ->say    
##  [5] Allah  ->say     Allah  ->know    Joseph ->when    Allah  ->indeed 
##  [9] father ->indeed  brother->say     know   ->say     say    ->so     
## [13] one    ->say     indeed ->see     so     ->when    bag    ->brother
## [17] brother->Joseph  father ->Joseph  father ->when    eat    ->say    
## [21] say    ->see     eat    ->seven   say    ->seven   brother->so     
## [25] seven  ->spike   brother->when    indeed ->Joseph  Allah  ->most   
## [29] most   ->people  he     ->rely    enter  ->say     give   ->say    
## + ... omitted several edges
graph.density(jg2) # Density
## [1] 0.01151771
transitivity(jg2) # Transitivity
## [1] 0.2413977
# Transitivity of a random graph of the same size
g <- erdos.renyi.game(vcount(jg2), ecount(jg2), type="gnm")
transitivity(g)
## [1] 0.02121244
# Transitivity of a random graph with the same degrees
g2 <- degree.sequence.game(degree(jg2,mode="all"), method="vl")
transitivity(g2)
## [1] 0.1108868

Community Structure Detection

fc <- fastgreedy.community(simplify(as.undirected(jg2)))
plot(jg2, vertex.color=vertex_attr(jg2)$cor,
      vertex.size = 2,
      vertex.label=NA,
      edge.width = NA, 
      edge.color = NA,
      layout = layout_with_kk,
      mark.groups = fc, 
      mark.border=NA)

igraph documentation lists the following for clusters and communities. Interested readers can explore the different functions following the example above.

  • cluster.distribution Connected components of a graph
  • clusters Connected components of a graph
  • cluster_edge_betweenness Community structure detection based on edge betweenness
  • cluster_fast_greedy Community structure via greedy optimization of modularity
  • cluster_infomap Infomap community finding
  • cluster_label_prop Finding communities based on propagating labels
  • cluster_leading_eigen Community structure detecting based on the leading eigenvector of the community matrix
  • cluster_louvain Finding community structure by multi-level optimization of modularity
  • cluster_optimal Optimal community structure
  • cluster_spinglass Finding communities in graphs based on statistical meachanics
  • cluster_walktrap Community strucure via short random walks

I published some work on statistical mechanics using the spin glass model a long time ago, hence the first choice. Let us explore cluster_edge_betweenness.

fc <- cluster_edge_betweenness(simplify(as.undirected(jg2)))
plot(jg2, vertex.color=vertex_attr(jg2)$cor,
      vertex.size = 2,
      vertex.label=NA,
      edge.width = NA, 
      edge.color = NA,
      layout = layout_with_kk,
      mark.groups = fc, 
      mark.border=NA)


Example using tidygraph

I plan to cover more examples on tidygraph in a future post.This example also introduces the use of user built functions in R. We use the gm1 network (with 100 cooccurrences ) * Plot evolution of reach from hub to neighbors + nearest + next nearest

library(tidygraph)
library(ggraph)
tbl_graph <- as_tbl_graph(simplify(as.undirected(gm1)))
hub_id <- which.max(degree(gm1))
tbl_graph
## # A tbl_graph: 57 nodes and 100 edges
## #
## # An undirected simple graph with 2 components
## #
## # Node Data: 57 x 1 (active)
##   name   
##   <chr>  
## 1 Joseph 
## 2 indeed 
## 3 say    
## 4 father 
## 5 Allah  
## 6 brother
## # ... with 51 more rows
## #
## # Edge Data: 100 x 2
##    from    to
##   <int> <int>
## 1     1     2
## 2     1     3
## 3     1     4
## # ... with 97 more rows
hub_id
## say 
##   3
tbl_graph <- tbl_graph %>%
   activate(nodes) %>%
   mutate(
     hub_dist = replace_na(bfs_dist(root = hub_id), Inf),
     degree = degree(gm1),
     cluster = as.factor(group_infomap())
   )
tbl_graph
## # A tbl_graph: 57 nodes and 100 edges
## #
## # An undirected simple graph with 2 components
## #
## # Node Data: 57 x 4 (active)
##   name    hub_dist degree cluster
##   <chr>      <dbl>  <dbl> <fct>  
## 1 Joseph         1      9 1      
## 2 indeed         1     12 1      
## 3 say            0     30 1      
## 4 father         1      9 1      
## 5 Allah          1     13 3      
## 6 brother        1      7 1      
## # ... with 51 more rows
## #
## # Edge Data: 100 x 2
##    from    to
##   <int> <int>
## 1     1     2
## 2     1     3
## 3     1     4
## # ... with 97 more rows
basic <- tbl_graph %>%
   ggraph(layout = "kk") +
   geom_edge_link(width = .1) +
   geom_node_point(aes(size = degree, color = degree)) +
   coord_fixed() +
   guides(size = FALSE)
plot(basic)

cluster <- tbl_graph %>%
   ggraph(layout = "kk") +
   geom_edge_link(width = .1) +
   geom_node_point(aes(size = degree, color = cluster)) +
   coord_fixed() +
   theme(legend.position = "none")
plot(cluster)

# Define function
reach_graph <- function(n) {
   tbl_graph %>%
     activate(nodes) %>%
     mutate(
       reach = n,
       reachable = ifelse(hub_dist <= n, "reachable", "non_reachable"),
       reachable = ifelse(hub_dist == 0, "Hub", reachable)
     )
 }
evolving_graph <- bind_graphs(reach_graph(0), reach_graph(1), reach_graph(2), reach_graph(3))
evol <- evolving_graph %>%
   ggraph(layout = "kk") +
   geom_edge_link(width = .1, alpha = .2) +
   geom_node_point(aes(size = degree*0.1, color = reachable)) +
   scale_size(range = c(1, 2)) +
   scale_color_manual(values = c("Hub" = "blue",
                                 "non_reachable" = "#00BFC4",
                                 "reachable" = "#F8766D","")) +
   coord_fixed() +
   facet_nodes(~reach, ncol = 2, nrow = 2, labeller = label_both) +
   theme(legend.position = "none") +
   labs(title = "3 Step Reach From Hub",
       subtitle = "Neighbor, Nearest Neighbor, and Next Nearest Neighbor",
       caption = "Virus Spread")
plot(evol)

# Plot 2nd degree reach separately
reach_graph(1) %>%
    ggraph(layout = "kk") +
    geom_edge_link(width = .1, alpha = .2) +
    geom_node_point(aes(size = degree, color = reachable)) +
    scale_color_manual(values = c("Hub" = "blue",
                                  "non_reachable" = "#00BFC4",
                                  "reachable" = "#F8766D","")) +
    coord_fixed() +
    theme_graph() +
    labs(title = "2 Step Reach From Hub",
       subtitle = "Neighbor, Nearest Neighbor",
       caption = "Virus Spread")

The big blue blob is the node “say” with the highest degree, as the network hub. It takes 3 steps for it to “reach” the other nodes. The hub is like a super spreader in a virus network. It will never reach the two isolated nodes. (Hence, the isolation of clusters in pandemics.)

Larger Network

tbl_graph <- as_tbl_graph(simplify(as.undirected(jg)))
hub_id <- which.max(degree(jg))
tbl_graph <- tbl_graph %>%
   activate(nodes) %>%
   mutate(
     hub_dist = replace_na(bfs_dist(root = hub_id), Inf),
     degree = degree(jg),
     cluster = as.factor(group_infomap())
   )
# Plot 2nd degree reach separately
reach_graph(1) %>%
    ggraph(layout = "kk") +
    geom_edge_link(width = .1, alpha = .2) +
    geom_node_point(aes(size = degree, color = reachable)) +
    scale_color_manual(values = c("Hub" = "blue",
                                  "non_reachable" = "#00BFC4",
                                  "reachable" = "#F8766D","")) +
    coord_fixed() +
    theme_graph() +
    labs(title = "2 Step Reach From Hub",
       subtitle = "Neighbor, Nearest Neighbor",
       caption = "Virus Spread")

Summary

Our work on #qurananalytics rely heavily on the use of #networkscience. In this post we introduce the tools available in R for creating, plotting and analyzing graph networks.

We plan to further explore the tidygraph package and also some of the theoretical analysis on networks like issues of centrality, density etc. We have seen in the final example of concepts like “hub” (most central node) and “spread” for our Surah Yusuf word occurrence network is similar to virus networks.

References

  1. https://bnosac.github.io/udpipe/docs/doc5.html https://doi.org/10.21105/joss.00774
  2. Work on dolphins by http://users.dimi.uniud.it/~massimo.franceschet/ns/syllabus/make/ggraph/ggraph.html
  3. Network Analysis and Visualization with R and igraph
  4. https://www.r-graph-gallery.com/network.html
  5. https://www.data-imaginist.com/2017/ggraph-introduction-layouts/