1 Introduction

This post continues our previous post on our #qurananalytics project. In that post's summary, we mentioned that the study opens other avenues of investigation, such as looking into other Surahs of the Quran or analyzing different aspects of the Quran's structure. So, in this post we analyze Surah Taa-Haa (Surah 20), which details the story of Prophet Musa (Moses/Mose). It will also serve as a simple tutorial on the characteristics and statistics of networks. All our posts on #qurananalytics so far have used the tools of #networkscience with the igraph, tidygraph and ggraph packages. We will also discuss centrality and other network graph characteristics while exploring some layouts provided by tidygraph and ggraph.

2 Load Packages and Libraries

packages <- c('dplyr', 'tidyverse', 'udpipe', 'ggplot2', 'graphlayouts',
              'igraph', 'tidygraph', 'ggraph', 'knitr', 'quRan')
# Install any package that is missing, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

3 Getting the Language Model Ready

# during first time model download execute the below line too
# udmodel <- udpipe_download_model(language = "english")
setwd("F:/RProjects")
# Load the model
udmodel <- udpipe_load_model(file = 'english-ewt-ud-2.5-191206.udpipe')

3.1 Start with annotating

We start by annotating Surah Taa-Haa. The annotated dataframe can then be used for basic text analytics.

# Select the surah
Q01 <- quran_en_sahih %>% filter(surah == 20)
x <- udpipe_annotate(udmodel, x = Q01$text, doc_id = Q01$ayah_title)
x <- as.data.frame(x)

4 Word Co-occurrences

Analyzing multi-word expressions should be interesting. We can get multi-word expressions by looking at collocations (words following one another), at word co-occurrences within each sentence, or at co-occurrences of words that appear close to one another.

Co-occurrences let us see how words are used either in the same sentence or next to each other. The udpipe package makes it easy to create co-occurrence graphs using the relevant POS (part-of-speech) tags.
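
To make the idea concrete, here is a minimal base-R sketch of within-sentence co-occurrence counting. The three toy sentences and the helper name count_cooc are made up for illustration; udpipe's cooccurrence() does the equivalent work on the annotated data frame, and its exact counting rules may differ from this sketch.

```r
# Toy data: one row per (sentence, lemma); invented for illustration
toy <- data.frame(
  sentence_id = c(1, 1, 1, 2, 2, 3, 3),
  lemma       = c("Lord", "say", "fear", "Lord", "say", "say", "fear"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count unordered lemma pairs that share a sentence
count_cooc <- function(df) {
  pairs <- lapply(split(df$lemma, df$sentence_id), function(w) {
    w <- sort(unique(w))
    if (length(w) < 2) return(NULL)
    m <- t(combn(w, 2))               # every pair of distinct lemmas
    data.frame(term1 = m[, 1], term2 = m[, 2], stringsAsFactors = FALSE)
  })
  pairs <- do.call(rbind, pairs)
  out <- aggregate(cooc ~ term1 + term2,
                   data = cbind(pairs, cooc = 1), FUN = sum)
  out[order(-out$cooc), ]             # most frequent pairs first
}

toy_cooc <- count_cooc(toy)
print(toy_cooc)
```

Each row of the result is an edge candidate: two terms plus a count, which is the same shape that graph_from_data_frame() expects later in this post.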

4.1 Nouns, adjectives, verbs, and adverbs used in same sentence

We look at how many times nouns, proper nouns, adjectives, verbs, adverbs, and numbers are used in the same verse.

cooccur <- cooccurrence(x = subset(x, upos %in% c("NOUN", "PROPN", "VERB",
                                                  "ADJ", "ADV", "NUM")), 
                     term = "lemma", 
                     group = c("doc_id", "paragraph_id", "sentence_id"))
head(cooccur)

The result can be easily visualized using the igraph and ggraph R packages. We chose the top 50 occurrences for the tutorial so the plots do not clutter.

library(igraph)
library(ggraph)
library(ggplot2)
wordnetwork <- head(cooccur, 50)
wordnetwork <- graph_from_data_frame(wordnetwork)
wordnetwork

## IGRAPH 0b65863 DN-- 41 50 -- 
## + attr: name (v/c), cooc (e/n)
## + edges from 0b65863 (vertex names):
##  [1] Mose    ->say        Lord    ->say        Allah   ->say       
##  [4] say     ->so         indeed  ->say        Lord    ->so        
##  [7] say     ->throw      indeed  ->Lord       say     ->when      
## [10] enduring->more       Allah   ->fear       Lord    ->Mose      
## [13] day     ->say        people  ->say        Pharaoh ->say       
## [16] say     ->see        indeed  ->so         so      ->then      
## [19] cast    ->enemy      away    ->indeed     come    ->Mose      
## [22] Merciful->most       Mose    ->Pharaoh    more    ->punishment
## + ... omitted several edges

ggraph(wordnetwork, layout = "kk") +
  geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_color = "deeppink") +
  geom_node_point(aes(size = igraph::degree(wordnetwork)), shape = 1, color = "black") +
  geom_node_text(aes(label = name), col = "darkblue", size = 3) +
  labs(title = "Co-occurrences Within Sentence",
       subtitle = "Top 50 Nouns, Names, Adjectives, Verbs, Adverbs",
       caption = "Surah Taa-Haa (Saheeh International)")

The base wordnetwork graph is a directed network with 41 nodes and 50 edges (DN-- 41 50 --). The nodes have the name attribute and the edges have the cooc attribute (+ attr: name (v/c), cooc (e/n)).
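
The same facts can be read off programmatically instead of from the printed summary. The sketch below uses a three-edge toy graph with invented counts, not the Surah data. In the summary code, D stands for directed and N for named (the vertices carry a name attribute).

```r
library(igraph)

# Toy edge list with a co-occurrence count; values invented for illustration
toy_edges <- data.frame(
  from = c("Lord", "Lord", "say"),
  to   = c("say", "fear", "fear"),
  cooc = c(2, 1, 2)
)
g <- graph_from_data_frame(toy_edges)

is_directed(g)        # TRUE: the "D" in the summary line
vcount(g)             # number of nodes (first number in the summary)
ecount(g)             # number of edges (second number in the summary)
vertex_attr_names(g)  # "name": v/c means vertex attribute, character
edge_attr_names(g)    # "cooc": e/n means edge attribute, numeric
```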

The story is revealed by Allah (SWT). The main characters are Mose, his brother Aaron, Pharaoh, and the magicians. So the verb “say” dominates since it is a narrated story. It is interesting to see the strong link and occurrence of “fear” with “Allah”.

5 Network Analysis and Characteristics

This “introductory tutorial” should be useful for those new to #networkscience.¹ We will be using the network graph tools frequently in our work on #qurananalytics, thus a tutorial on the network characteristics using an example from the Quran should be helpful.

We will be looking at various functions related to

This set of functions provides wrappers for a number of graph statistic algorithms in igraph. They are intended for use inside the tidygraph framework and some should not be called directly. Thus we will mix and match the use of the relevant igraph and/or tidygraph functions.

We will follow the structured presentation on Network Characteristics² and Centrality Measures³ for easy reference.

5.1 Network characteristics

The Appendix section shows the results of the selected igraph functions for our wordnetwork graph that we may not use in the following discussion.

Our wordnetwork is a directed network. We will also create an undirected version of the network.

gd <- wordnetwork
gu <- simplify(as.undirected(wordnetwork))
gd

## IGRAPH 0b65863 DN-- 41 50 -- 
## + attr: name (v/c), cooc (e/n)
## + edges from 0b65863 (vertex names):
##  [1] Mose    ->say        Lord    ->say        Allah   ->say       
##  [4] say     ->so         indeed  ->say        Lord    ->so        
##  [7] say     ->throw      indeed  ->Lord       say     ->when      
## [10] enduring->more       Allah   ->fear       Lord    ->Mose      
## [13] day     ->say        people  ->say        Pharaoh ->say       
## [16] say     ->see        indeed  ->so         so      ->then      
## [19] cast    ->enemy      away    ->indeed     come    ->Mose      
## [22] Merciful->most       Mose    ->Pharaoh    more    ->punishment
## + ... omitted several edges

gu

## IGRAPH 0bec5fe UN-- 41 50 -- 
## + attr: name (v/c)
## + edges from 0bec5fe (vertex names):
##  [1] Mose    --Lord    Mose    --say     Mose    --Pharaoh Mose    --come   
##  [5] Lord    --say     Lord    --indeed  Lord    --so      Allah   --say    
##  [9] Allah   --fear    say     --indeed  say     --day     say     --people 
## [13] say     --Pharaoh say     --so      say     --enemy   say     --fear   
## [17] say     --fire    say     --go      say     --hasten  say     --throw  
## [21] say     --when    say     --see     say     --then    say     --thus   
## [25] indeed  --so      indeed  --away    indeed  --come    indeed  --fire   
## [29] indeed  --do      indeed  --therein enduring--more    Pharaoh --so     
## + ... omitted several edges

The igraph notation uses -> for directed edges (Mose ->say) and -- for undirected edges (Mose --Lord).

Generally speaking, we can measure network properties at the level of nodes (centrality measures) or at the level of the network (global measures). If we are interested in the position of nodes within a network, then we are measuring something at the node-level. If we want to understand the structure of the network as a whole, we are measuring something at the network-level. Network analysis often combines both.

We will use the two networks, gu and gd, that we created in this section. For the early examples we will mainly use the igraph package and the basic plot functions. In the later sections we will show some of the same examples together with new ones using the tidygraph and ggraph packages.

5.2 Plot the network again

set.seed(10)
l = layout_with_kk(gd)
plot(gd, layout=l, vertex.label="", vertex.color="gold", edge.color="limegreen", edge.width = E(gd)$cooc)

# Using ggraph
ggraph(gd, layout = "kk") +
  geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_color = "deeppink") +
  geom_node_point(size = 3, color = "steelblue") +
  labs(title = "Co-occurrences Within Sentence",
       subtitle = "Top 50 Nouns, Names, Adjectives, Verbs, Adverbs",
       caption = "Surah Taa-Haa (Saheeh International)")

5.3 Centrality Measures (node-level measures)

Centrality relates to measures of a node’s position in the network. The main objective is to understand the position and/or importance of a node in the network. The individual characteristics of nodes can be described by⁴

Degree
Clustering
Distance to other nodes

The centrality of a node reflects its influence, power, and importance. Centrality measures fall into four broad families.

Degree - connectedness
Eigenvectors - Influence, Prestige, “not what you know, but who you know”
Betweenness - importance as an intermediary, connector
Closeness, Decay – ease of reaching other nodes

Many specific measures exist within these families, and it would be difficult to go through all of them. We will introduce just a few examples.

Degree centrality
Betweenness centrality
Closeness centrality
Eigenvector centrality
PageRank centrality

Appendix 1 shows other measures.

5.3.1 Degree and Strength

The most straightforward centrality measure is degree centrality. Degree centrality is simply the number of edges connected to a given node. In a social network, this might mean the number of friends an individual has. We will calculate the degree centrality and visualize it by making the node sizes proportional to it.

degree(gd)

##       Mose       Lord      Allah        say     indeed   enduring        day 
##          4          4          2         18          8          1          1 
##     people    Pharaoh         so       cast       away       come   Merciful 
##          1          3          8          2          1          3          1 
##       more      enemy       fear       fire         go     hasten     follow 
##          4          3          2          7          1          1          1 
##  messenger         be      bring     family       find         do      throw 
##          1          1          1          1          1          1          2 
##       when        see       then       most punishment      river     severe 
##          1          1          3          1          1          2          1 
##     surely    therein       thus      deity   guidance       here 
##          1          1          1          1          1          1

set.seed(10)
deg = igraph::degree(gd)
sort(deg, decreasing = TRUE)

##        say     indeed         so       fire       Mose       Lord       more 
##         18          8          8          7          4          4          4 
##    Pharaoh       come      enemy       then      Allah       cast       fear 
##          3          3          3          3          2          2          2 
##      throw      river   enduring        day     people       away   Merciful 
##          2          2          1          1          1          1          1 
##         go     hasten     follow  messenger         be      bring     family 
##          1          1          1          1          1          1          1 
##       find         do       when        see       most punishment     severe 
##          1          1          1          1          1          1          1 
##     surely    therein       thus      deity   guidance       here 
##          1          1          1          1          1          1

plot(gd, layout=l, vertex.label="", vertex.color="gold", edge.color="royalblue", vertex.size=deg*2, edge.width=E(gd)$cooc)

# Using ggraph
ggraph(gd, layout = "kk") +
  geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_color = "lightseagreen") +
  geom_node_point(size = deg, color = "gold3") +
  labs(title = "Word Co-occurrences Network",
       subtitle = "Node size = degree",
       caption = "Surah Taa-Haa (Saheeh International)")

In weighted networks, we can also use node strength, which is the sum of the weights of edges connected to the node. Let’s calculate node strength and plot the node sizes as proportional to these values.

set.seed(10)
st = graph.strength(gd)
sort(st, decreasing = TRUE)

##        say     indeed         so       fire       Mose       Lord       more 
##         18          8          8          7          4          4          4 
##    Pharaoh       come      enemy       then      Allah       cast       fear 
##          3          3          3          3          2          2          2 
##      throw      river   enduring        day     people       away   Merciful 
##          2          2          1          1          1          1          1 
##         go     hasten     follow  messenger         be      bring     family 
##          1          1          1          1          1          1          1 
##       find         do       when        see       most punishment     severe 
##          1          1          1          1          1          1          1 
##     surely    therein       thus      deity   guidance       here 
##          1          1          1          1          1          1

plot(gd, layout=l, vertex.label="", vertex.color="gold", edge.color="royalblue", edge.width=E(gd)$cooc, vertex.size=st)

# Using ggraph
ggraph(gd, layout = "kk") +
  geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_color = "lightseagreen") +
  geom_node_point(size = st, color = "gold3") +
  geom_node_text(aes(filter=(st >= 3), size=st*2, label=name), repel=F) +
  labs(title = "Word Co-occurrences Network",
       subtitle = "Node size = graph.strength",
       caption = "Surah Taa-Haa (Saheeh International)")

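Compare the relative node sizes when plotting by degree vs. strength. What differences do you notice? Here there are none: the top six words are the same, say (18), indeed (8), so (8), fire (7), Mose (4), Lord (4), and every strength value equals the corresponding degree. This is because graph.strength() looks for an edge attribute named weight and, when there is none, falls back to the plain degree; our counts are stored in cooc. To weight by co-occurrence count, pass weights = E(gd)$cooc.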
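
When strength and degree coincide, it is worth checking whether the graph has a weight edge attribute, since igraph's strength() uses that attribute by default and otherwise returns the plain degree. The following is a minimal sketch on a toy graph with invented counts. It shows the default call next to a call that passes the cooc values explicitly.

```r
library(igraph)

# Toy edge list; the cooc values are invented for illustration
toy_edges <- data.frame(
  from = c("Lord", "Lord", "say"),
  to   = c("say", "fear", "fear"),
  cooc = c(5, 1, 2)
)
g <- graph_from_data_frame(toy_edges)

deg        <- degree(g)                         # number of incident edges
st_default <- strength(g)                       # no `weight` attribute: same as degree
st_cooc    <- strength(g, weights = E(g)$cooc)  # sum of co-occurrence counts

rbind(deg, st_default, st_cooc)
```

In this toy graph every node has degree 2, while the weighted strength separates them (Lord 6, say 7, fear 3), which is the kind of difference the degree vs. strength comparison is meant to reveal.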

5.3.2 Degree Distribution

Degree distribution: A frequency count of the occurrence of each degree.

Average degree: Let N be the number of nodes, and L be the number of edges: $\langle k \rangle = 2L/N = 2(50)/41 \approx 2.439$ for gu.
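
The factor of 2 appears because every undirected edge contributes to the degree of both of its endpoints. A small sketch on a made-up four-node graph checks this and also shows how the degree distribution vector is indexed:

```r
library(igraph)

# Toy undirected graph: 4 nodes, 4 edges (not from the Surah)
g <- graph_from_literal(A - B, A - C, A - D, B - C)

N <- vcount(g)
L <- ecount(g)

sum(degree(g)) == 2 * L   # each edge is counted at both endpoints
mean(degree(g))           # average degree
2 * L / N                 # same value from the formula

# Proportion of nodes with degree 0, 1, 2, 3 (the first entry is degree 0)
dd <- degree_distribution(g)
dd
```

Note that the first element of the degree distribution corresponds to degree 0, so position i holds the share of nodes with degree i - 1.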

sort(degree(gu), decreasing = TRUE)

##        say     indeed         so       fire       Mose       Lord       more 
##         18          8          8          7          4          4          4 
##    Pharaoh       come      enemy       then      Allah       cast       fear 
##          3          3          3          3          2          2          2 
##      throw      river   enduring        day     people       away   Merciful 
##          2          2          1          1          1          1          1 
##         go     hasten     follow  messenger         be      bring     family 
##          1          1          1          1          1          1          1 
##       find         do       when        see       most punishment     severe 
##          1          1          1          1          1          1          1 
##     surely    therein       thus      deity   guidance       here 
##          1          1          1          1          1          1

mean(degree(gu))

## [1] 2.439024

degree(gu) %>% sum()

## [1] 100

degree.distribution(gu)

##  [1] 0.00000000 0.60975610 0.12195122 0.09756098 0.07317073 0.00000000
##  [7] 0.00000000 0.02439024 0.04878049 0.00000000 0.00000000 0.00000000
## [13] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## [19] 0.02439024

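# degree.distribution() returns proportions indexed by degree 0, 1, 2, ...,
# so we histogram the degrees themselves
hist(degree(gu))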

# Using ggplot
# Let's count the frequencies of each degree
deg = degree(gu)
d.histogram <- as.data.frame(table(deg))

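# Need to convert the first column to numbers, otherwise
# the log-log thing will not work. table() stores the degrees as
# factor levels, so go through as.character() to get the actual
# degree values rather than the internal level codes.
d.histogram[,1] <- as.numeric(as.character(d.histogram[,1]))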

# Now, plot it!
ggplot(d.histogram, aes(x = deg, y = Freq)) +
  geom_point() +
  scale_x_continuous("Degree\n(nodes with this amount of connections)",
                     breaks = c(1, 3, 10, 30, 100, 300),
                     trans = "log10") +
  scale_y_continuous("Frequency\n(how many of them)",
                     breaks = c(1, 3, 10, 30, 100, 300, 1000),
                     trans = "log10") +
  ggtitle("Degree Distribution (log-log)")

5.3.3 Degree and degree distribution for directed graph

In-degree of any node i: the number of edges ending at i.
Out-degree of any node i: the number of edges originating from i.
Every loop adds one degree to each of the in-degree and out-degree of a node.
Degree distribution: A frequency count of the occurrence of each degree.
Average degree: let N = number of nodes, and L = number of arcs:
- $\langle k_{in} \rangle = \langle k_{out} \rangle = L/N = 50/41 \approx 1.2195$ for gd.
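
The average in-degree and out-degree are equal because every arc leaves exactly one node and enters exactly one node. A minimal sketch on a made-up directed graph:

```r
library(igraph)

# Toy directed graph: 4 nodes, 4 arcs; "-+" points from left to right
g <- graph_from_literal(A -+ B, A -+ C, B -+ C, D -+ A)

k_in  <- degree(g, mode = "in")    # arcs ending at each node
k_out <- degree(g, mode = "out")   # arcs starting at each node

rbind(k_in, k_out)
c(mean_in = mean(k_in), mean_out = mean(k_out), L_over_N = ecount(g) / vcount(g))
```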

degree(gd,mode="in",loops = FALSE)

##       Mose       Lord      Allah        say     indeed   enduring        day 
##          2          1          0         12          4          0          0 
##     people    Pharaoh         so       cast       away       come   Merciful 
##          0          1          6          0          0          0          0 
##       more      enemy       fear       fire         go     hasten     follow 
##          1          1          1          3          0          0          0 
##  messenger         be      bring     family       find         do      throw 
##          0          0          0          0          0          0          2 
##       when        see       then       most punishment      river     severe 
##          1          1          3          1          1          2          1 
##     surely    therein       thus      deity   guidance       here 
##          1          1          1          1          1          1

degree(gd,mode="out",loops = FALSE)

##       Mose       Lord      Allah        say     indeed   enduring        day 
##          2          3          2          6          4          1          1 
##     people    Pharaoh         so       cast       away       come   Merciful 
##          1          2          2          2          1          3          1 
##       more      enemy       fear       fire         go     hasten     follow 
##          3          2          1          4          1          1          1 
##  messenger         be      bring     family       find         do      throw 
##          1          1          1          1          1          1          0 
##       when        see       then       most punishment      river     severe 
##          0          0          0          0          0          0          0 
##     surely    therein       thus      deity   guidance       here 
##          0          0          0          0          0          0

degree(gd,mode="in",loops = FALSE) %>% mean()

## [1] 1.219512

degree(gd,mode="out",loops = FALSE) %>% mean()

## [1] 1.219512

# Histogram the in- and out-degrees themselves rather than the
# proportions returned by degree.distribution()
hist(degree(gd, mode="in"))

hist(degree(gd, mode="out"))

5.3.4 Why do we care about degree?

Simplest, yet very illuminating centrality measure in a network:
- In a social network, the ones who have connections to many others might have more influence, more access to information, or more prestige than those who have fewer connections.
The degree is the immediate risk of a node for catching whatever is flowing through the network (such as a virus, or some information)

5.3.5 Betweenness

We now do the same for betweenness centrality. It is defined as the number of geodesic paths (shortest paths) that go through a given node. Nodes with high betweenness might be influential in a network if, for example, they capture most of the information flowing through the network because the information tends to flow through them.

Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes.
It was introduced as a measure for quantifying the control of a human on the communication between other humans in a social network.
In this conception, nodes that have a high probability to occur on a randomly chosen shortest path between two randomly chosen nodes have a high betweenness.
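
A five-node path is small enough to verify the definition by hand. In this toy sketch (not the Surah network), node C lies on the shortest paths for the pairs A-D, A-E, B-D and B-E, so its betweenness is 4; B and D each lie on three, and the end nodes on none.

```r
library(igraph)

# Five-node path A - B - C - D - E (toy graph for illustration)
g <- graph_from_literal(A - B - C - D - E)

# Raw betweenness: number of shortest paths between other pairs
# of nodes that pass through each node
b <- betweenness(g)
b
```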

betw = betweenness(gd, normalized=F)
plot(gd, layout=l, vertex.label="", vertex.color="gold", edge.color="royalblue", vertex.size=betw*0.2, edge.width=E(gd)$cooc)

# calculate the betweenness centrality
sort(betweenness(gu), decreasing = TRUE)

##        say       fire     indeed         so      enemy       Mose       then 
## 338.600000 140.000000 111.733333  67.566667  58.000000   6.366667   6.233333 
##       more       come       Lord    Pharaoh      Allah   enduring        day 
##   6.000000   3.166667   2.333333   1.000000   0.000000   0.000000   0.000000 
##     people       cast       away   Merciful       fear         go     hasten 
##   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000 
##     follow  messenger         be      bring     family       find         do 
##   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000 
##      throw       when        see       most punishment      river     severe 
##   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000 
##     surely    therein       thus      deity   guidance       here 
##   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000

# calculate the standardized betweenness centrality
betwS <- 2*betweenness(gu)/((vcount(gu) - 1)*(vcount(gu)-2))
sort(betwS, decreasing = TRUE)

##         say        fire      indeed          so       enemy        Mose 
## 0.434102564 0.179487179 0.143247863 0.086623932 0.074358974 0.008162393 
##        then        more        come        Lord     Pharaoh       Allah 
## 0.007991453 0.007692308 0.004059829 0.002991453 0.001282051 0.000000000 
##    enduring         day      people        cast        away    Merciful 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##        fear          go      hasten      follow   messenger          be 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       bring      family        find          do       throw        when 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##         see        most  punishment       river      severe      surely 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##     therein        thus       deity    guidance        here 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
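
As a sanity check on the standardization above, the sketch below applies the same formula to a toy star graph and compares it with igraph's own normalized = TRUE option for undirected graphs. The hub of a star sits on every shortest path between leaves, so its standardized betweenness should be 1.

```r
library(igraph)

# Star with one hub (node 1) and four leaves; toy graph for illustration
g <- make_star(5, mode = "undirected")
n <- vcount(g)

raw     <- betweenness(g)
manual  <- 2 * raw / ((n - 1) * (n - 2))      # formula used in this post
builtin <- betweenness(g, normalized = TRUE)  # igraph's normalization

rbind(raw, manual, builtin)
```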

# Using ggraph
ggraph(gd, layout = "kk") +
  geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_color = "lightseagreen") +
  geom_node_point(size = betw*0.2, color = "gold3") +
  geom_node_text(aes(filter=(betw >= 5), size=betw*2, label=name), repel=F) +
  labs(title = "Word Co-occurrences Network",
       subtitle = "Node size = betweenness centrality",
       caption = "Surah Taa-Haa (Saheeh International)")

We can see that there are three nodes (words = “say”, “indeed”, “fire”) that have qualitatively higher betweenness values than all other nodes in the network. One way to interpret this is that these are nodes that tend to act as “bridges” between different clusters of nodes in the network.

5.3.6 Degree centrality for undirected graph

The nodes with higher degree are more central.
Degree is simply the number of nodes at distance one.
Though simple, degree is often a highly effective measure of the influence or importance of a node:
- In many social settings people with more connections tend to have more power and are more visible.
Group-level centralization: degree, as an individual-level centrality measure, has a distribution which can be summarized by its mean and variance as is commonly practiced in data analysis.

# calculate the degree centrality for the undirected word network
deg <- degree(gu, loops = FALSE)
sort(deg, decreasing = TRUE)

##        say     indeed         so       fire       Mose       Lord       more 
##         18          8          8          7          4          4          4 
##    Pharaoh       come      enemy       then      Allah       cast       fear 
##          3          3          3          3          2          2          2 
##      throw      river   enduring        day     people       away   Merciful 
##          2          2          1          1          1          1          1 
##         go     hasten     follow  messenger         be      bring     family 
##          1          1          1          1          1          1          1 
##       find         do       when        see       most punishment     severe 
##          1          1          1          1          1          1          1 
##     surely    therein       thus      deity   guidance       here 
##          1          1          1          1          1          1

# calculate the standardized degree centrality
degS <- degree(gu, loops = FALSE)/(vcount(gu) - 1)
sort(degS, decreasing = TRUE)

##        say     indeed         so       fire       Mose       Lord       more 
##      0.450      0.200      0.200      0.175      0.100      0.100      0.100 
##    Pharaoh       come      enemy       then      Allah       cast       fear 
##      0.075      0.075      0.075      0.075      0.050      0.050      0.050 
##      throw      river   enduring        day     people       away   Merciful 
##      0.050      0.050      0.025      0.025      0.025      0.025      0.025 
##         go     hasten     follow  messenger         be      bring     family 
##      0.025      0.025      0.025      0.025      0.025      0.025      0.025 
##       find         do       when        see       most punishment     severe 
##      0.025      0.025      0.025      0.025      0.025      0.025      0.025 
##     surely    therein       thus      deity   guidance       here 
##      0.025      0.025      0.025      0.025      0.025      0.025

sort(deg, decreasing = TRUE) %>% hist()

5.3.7 Outdegree centrality and indegree prestige

The nodes with higher out-degree are more central (choices made).
The nodes with higher in-degree are more prestigious (choices received).

sort(degree(gd, mode='in'), decreasing = TRUE) %>% head(10)

##     say      so  indeed    fire    then    Mose   throw   river    Lord Pharaoh 
##      12       6       4       3       3       2       2       2       1       1

sort(degree(gd, mode='out'), decreasing = TRUE) %>% head(10)

##     say  indeed    fire    Lord    come    more    Mose   Allah Pharaoh      so 
##       6       4       4       3       3       3       2       2       2       2

plot(gd)

What does this say about the importance of these nodes? Well, that depends on the network and the questions, in particular how we might quantify ‘importance’ in our network. But clearly “say”, “indeed”, “Mose”, “Allah”, “Lord”, and “Pharaoh” are important words in the Surah. We have explained the importance of “say” in that the Surah is telling a story. “indeed” shows that the Quran is stressing the truth of some of the narrations.

Here’s a short list of some commonly-used centrality measures:

degree() - Number of edges connected to node
graph.strength() - Sum of edge weights connected to a node (aka weighted degree)
betweenness() - Number of geodesic paths that go through a given node
closeness() - Inverse of the total number of steps required to reach every other node from a given node
eigen_centrality() - Values of the first eigenvector of the graph adjacency matrix. The values are high for nodes that are connected to many other nodes that are, in turn, connected to many others, etc.
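
As a quick reference, here are these functions called on a toy star graph, where the hub should come out on top for every measure. This is only a sketch: strength() is the current name for graph.strength(), and eigen_centrality() and page_rank() (current names for the evcent() and page.rank() aliases used elsewhere in this post) return lists, with the scores in the $vector element.

```r
library(igraph)

g <- make_star(5, mode = "undirected")   # toy graph: node 1 is the hub

degree(g)         # hub 4, leaves 1
strength(g)       # same as degree here, since there is no weight attribute
betweenness(g)    # hub 6, leaves 0
closeness(g)      # hub 1/4, leaves 1/7

ec <- eigen_centrality(g)$vector   # scaled so the largest score is 1
pr <- page_rank(g)$vector          # scores sum to 1
round(rbind(ec, pr), 3)
```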

5.3.8 Closeness centrality for undirected graph

The farness/peripherality of a node v is defined as the sum of its distances to all other nodes
The closeness is defined as the inverse of the farness.
For comparison purpose, we can standardize the closeness by dividing by the maximum possible value 1/(n − 1)
If there is no (directed) path between node v and i then the total number of nodes is used in the formula instead of the path length.
The more central a node is, the lower its total distance to all other nodes.
Closeness can be regarded as a measure of how long it will take to spread information from v to all other nodes sequentially.
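
The definitions above can be checked by hand on a connected toy graph. The sketch below builds a five-node path, computes farness from the distance matrix, and compares the result with closeness(). It deliberately uses a connected graph, because how closeness() treats unreachable nodes depends on the igraph version.

```r
library(igraph)

# Path 1 - 2 - 3 - 4 - 5 (toy graph, connected)
g <- make_ring(5, circular = FALSE)
n <- vcount(g)

d        <- distances(g)    # matrix of shortest-path lengths
farness  <- rowSums(d)      # sum of distances to all other nodes
manual   <- 1 / farness     # closeness = inverse of farness
manual_s <- manual * (n - 1)  # standardized closeness

rbind(farness, manual, igraph = closeness(g), manual_s)
```

The middle node has the smallest farness (6) and therefore the largest closeness, while the two end nodes have farness 10.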

sort(closeness(gu), decreasing = TRUE) %>% head(10)

##         say      indeed        fire          so        Lord     Pharaoh 
## 0.002421308 0.002352941 0.002336449 0.002325581 0.002304147 0.002283105 
##        then        Mose       enemy       throw 
## 0.002283105 0.002277904 0.002277904 0.002272727

# calculate the standardized closeness centrality
closeS <- closeness(gu)*(vcount(gu) - 1)
sort(closeS, decreasing = TRUE) %>% head(15)

##        say     indeed       fire         so       Lord    Pharaoh       then 
## 0.09685230 0.09411765 0.09345794 0.09302326 0.09216590 0.09132420 0.09132420 
##       Mose      enemy      throw      Allah       fear        day     people 
## 0.09111617 0.09111617 0.09090909 0.09049774 0.09049774 0.09029345 0.09029345 
##         go 
## 0.09029345

The various plots in Section 5.3 show some words outside the main network cluster; they are disconnected from the main network. This is why closeness() produces warning messages about disconnected graphs.

We will calculate the Eigenvector and PageRank centrality measures in the next section as we assemble a dataframe of node-level measures.

5.3.9 Correlation analysis among centrality measures for the gu network

# calculate the degree centrality
deg <- degree(gu, loops = FALSE)
sort(deg, decreasing = TRUE) %>% head(15) # sort the nodes in decreasing order

##     say  indeed      so    fire    Mose    Lord    more Pharaoh    come   enemy 
##      18       8       8       7       4       4       4       3       3       3 
##    then   Allah    cast    fear   throw 
##       3       2       2       2       2

# calculate the standardized degree centrality
degS <- degree(gu, loops = FALSE)/(vcount(gu) - 1)
sort(degS, decreasing = TRUE) %>% head(15) # sort the nodes in decreasing order

##     say  indeed      so    fire    Mose    Lord    more Pharaoh    come   enemy 
##   0.450   0.200   0.200   0.175   0.100   0.100   0.100   0.075   0.075   0.075 
##    then   Allah    cast    fear   throw 
##   0.075   0.050   0.050   0.050   0.050

# calculate the closeness centrality
close <- closeness(gu)
sort(close, decreasing = TRUE) %>% head(15)

##         say      indeed        fire          so        Lord     Pharaoh 
## 0.002421308 0.002352941 0.002336449 0.002325581 0.002304147 0.002283105 
##        then        Mose       enemy       throw       Allah        fear 
## 0.002283105 0.002277904 0.002277904 0.002272727 0.002262443 0.002262443 
##         day      people          go 
## 0.002257336 0.002257336 0.002257336

# calculate the standardized closeness centrality
closeS <- closeness(gu)*(vcount(gu) - 1)
sort(closeS, decreasing = TRUE) %>% head(15)

##        say     indeed       fire         so       Lord    Pharaoh       then 
## 0.09685230 0.09411765 0.09345794 0.09302326 0.09216590 0.09132420 0.09132420 
##       Mose      enemy      throw      Allah       fear        day     people 
## 0.09111617 0.09111617 0.09090909 0.09049774 0.09049774 0.09029345 0.09029345 
##         go 
## 0.09029345

# calculate the Betweenness centrality
betw <- betweenness(gu)
sort(betw, decreasing = TRUE) %>% head(15)

##        say       fire     indeed         so      enemy       Mose       then 
## 338.600000 140.000000 111.733333  67.566667  58.000000   6.366667   6.233333 
##       more       come       Lord    Pharaoh      Allah   enduring        day 
##   6.000000   3.166667   2.333333   1.000000   0.000000   0.000000   0.000000 
##     people 
##   0.000000

# calculate the standardized Betweenness centrality
betwS <- 2 * betweenness(gu)/((vcount(gu) - 1) * (vcount(gu)-2))
sort(betwS, decreasing = TRUE) %>% head(15)

##         say        fire      indeed          so       enemy        Mose 
## 0.434102564 0.179487179 0.143247863 0.086623932 0.074358974 0.008162393 
##        then        more        come        Lord     Pharaoh       Allah 
## 0.007991453 0.007692308 0.004059829 0.002991453 0.001282051 0.000000000 
##    enduring         day      people 
## 0.000000000 0.000000000 0.000000000

# calculate the Eigenvector centrality
eigen <- evcent(gu)
sort(eigen[[1]], decreasing = TRUE) %>% head(15)

##       say        so    indeed      Lord      Mose   Pharaoh      fire      then 
## 1.0000000 0.6107979 0.5570388 0.4763855 0.3892785 0.3726103 0.3509806 0.3449028 
##     throw      come      fear     Allah     enemy       day    people 
## 0.3000885 0.2405517 0.2289512 0.2289512 0.2036726 0.1862980 0.1862980

# calculate the PageRank centrality
page <- page.rank(gu)
sort(page[[1]], decreasing = TRUE) %>% head(15)

##        say       fire     indeed         so       more       Lord       Mose 
## 0.14509073 0.06830736 0.06585466 0.06218651 0.05800923 0.03058829 0.03046530 
##      enemy   Merciful         be       most      deity       come       then 
## 0.02739062 0.02439024 0.02439024 0.02439024 0.02439024 0.02389786 0.02388842 
##    Pharaoh 
## 0.02359124

dfu <- data.frame(degS, closeS, betwS, eigen[[1]], page[[1]])
Pearson_correlation_matrix <- cor(dfu) # Pearson correlation matrix
Spearman_correlation_matrix <- cor(dfu, method = "spearman") # Spearman correlation matrix
cor(dfu, method = "kendall") # Kendall correlation matrix

##                 degS    closeS     betwS eigen..1..  page..1..
## degS       1.0000000 0.5692808 0.8182369 0.57440481 0.68519657
## closeS     0.5692808 1.0000000 0.5384856 0.88204673 0.10871067
## betwS      0.8182369 0.5384856 1.0000000 0.52761075 0.61998358
## eigen..1.. 0.5744048 0.8820467 0.5276108 1.00000000 0.06098661
## page..1..  0.6851966 0.1087107 0.6199836 0.06098661 1.00000000

# Basic Scatterplot Matrix
pairs(~deg + close + betw + eigen[[1]] + page[[1]],
      data=dfu,
      main="Simple Scatterplot Matrix")

5.3.10 Assembling a dataset of node-level measures for gd network

#assemble dataset
names = V(gd)$name
deg = degree(gd)
st = graph.strength(gd)
betw = betweenness(gd, normalized=F)
eigen <- evcent(gd)
page <- page.rank(gd)
dfd = data.frame(node.name=names, degree=deg, strength=st, betweenness=betw,
                 eigen = eigen[[1]], page = page[[1]]) 
head(dfd)

# plot the relationship between degree and strength
plot(strength~degree, data=dfd)

dfd %>% ggplot(aes(x = strength, y = degree)) + geom_point() +
        geom_text(label = rownames(dfd),
                size = 3, color = "darkblue",
                nudge_x = 0.25, nudge_y = 0.25, 
                check_overlap = T) +
        labs(title = "Word Co-occurrences Network",
             subtitle = "Strength vs Degree",
             caption = "Surah Taa-Haa (Saheeh International)")

Obviously, these are correlated, since strength is simply the weighted version of degree.
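To make the link between the two measures concrete, here is a minimal base-R sketch on a made-up weighted edge list (the node labels and cooc weights are hypothetical, not taken from the surah). Degree counts the edges touching a node, while strength sums their weights.

```r
# Toy weighted edge list; labels and cooc weights are invented for illustration
edges <- data.frame(from = c("a", "a", "b"),
                    to   = c("b", "c", "c"),
                    cooc = c(5, 1, 2),
                    stringsAsFactors = FALSE)

# Every edge touches both of its endpoints
ends    <- c(edges$from, edges$to)
weights <- c(edges$cooc, edges$cooc)

toy_degree   <- table(ends)                 # number of edges per node
toy_strength <- tapply(weights, ends, sum)  # sum of edge weights per node

toy_degree
toy_strength
```

In this toy case all three nodes have the same degree but different strengths, which is the kind of spread the scatterplot above makes visible.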

How about the relationship between betweenness and strength?

plot(betweenness~strength, data=dfd)

dfd %>% ggplot(aes(x = strength, y = betweenness)) + geom_point(color = "pink") +
      geom_text(label = rownames(dfd),
                size = 3, color = "darkblue",
                nudge_x = 0.25, nudge_y = 0.25, 
                check_overlap = T) +
      labs(title = "Word Co-occurrences Network",
           subtitle = "Strength vs betweenness",
           caption = "Surah Taa-Haa (Saheeh International)")

These are not well correlated, since they describe something different. Again the common words “say” and “indeed” have a dominant role in Surah Taa-Haa that narrates the true story of Prophet Mose. The common adverb “so” is often used for emphasis to stress some facts and lessons of the story.

5.4 Network-level measures

5.4.1 Size and density

Let’s start by getting some basic information for the network, such as the number of nodes and edges. There are a couple of functions to help us extract this information without having to look it up in the “object summary” (e.g., summary(gd)). Using these functions, you can store this information as separate objects, e.g., n for # nodes and m for # edges.

n = vcount(gd)
m = ecount(gd)

For gd the number of nodes n is 41 and the number of edges m is 50. For gu the number of nodes n is 41 and the number of edges m is 50.

The definition of network density is:

density = [# edges that exist] / [# edges that are possible]

In an undirected network with no loops, the number of edges that are possible is exactly the number of dyads that exist in the network. In turn, the number of dyads is n(n−1)/2 where n = number of nodes. With this information, we can calculate the density with the following:

dyads (directed) = n(n-1) = 41(41-1) = 1640
dyads (undirected) = n(n-1)/2 = 41(41-1)/2 = 820
density = m/dyads
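The arithmetic can be checked by hand. The sketch below only uses the node and edge counts reported above (n = 41, m = 50); the variable names are our own.

```r
# Node and edge counts reported above for gd and gu
n <- 41
m <- 50

dyads_directed   <- n * (n - 1)      # ordered pairs of distinct nodes
dyads_undirected <- n * (n - 1) / 2  # unordered pairs of distinct nodes

density_directed   <- m / dyads_directed
density_undirected <- m / dyads_undirected

round(density_directed, 7)    # 0.0304878
round(density_undirected, 7)  # 0.0609756
```

With the same number of edges, the undirected density is exactly twice the directed one, because the directed network has twice as many possible edges.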

There is a pre-packaged function for calculating density, edge_density():

edge_density(gd)

## [1] 0.0304878

edge_density(gu)

## [1] 0.06097561

5.4.2 Components

For ‘fully connected’ networks, we can follow edges from any given node to all other nodes in the network. Networks can also be composed of multiple components that are not connected to each other, as is obvious from the plots of our sample word network gd. We can get this information with a simple function.

components(gd)

## $membership
##       Mose       Lord      Allah        say     indeed   enduring        day 
##          1          1          1          1          1          2          1 
##     people    Pharaoh         so       cast       away       come   Merciful 
##          1          1          1          1          1          1          3 
##       more      enemy       fear       fire         go     hasten     follow 
##          2          1          1          1          1          1          1 
##  messenger         be      bring     family       find         do      throw 
##          1          4          1          1          1          1          1 
##       when        see       then       most punishment      river     severe 
##          1          1          1          3          2          1          2 
##     surely    therein       thus      deity   guidance       here 
##          2          1          1          4          1          1 
## 
## $csize
## [1] 32  5  2  2
## 
## $no
## [1] 4

plot(gd)

The output shows the node membership, component sizes, and number of components. The numbers for $no (number of components, 4) and $csize (size for each component) can be confirmed from the simple plot above.
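For readers curious about what a component search involves, here is a minimal base-R sketch using breadth-first search on a toy edge list. The node labels are invented, and this is only an illustration of the idea, not a description of how igraph implements components().

```r
# Toy undirected edge list; node labels are invented for illustration
edges <- matrix(c("a", "b",
                  "b", "c",
                  "d", "e"), ncol = 2, byrow = TRUE)
nodes <- sort(unique(as.vector(edges)))

# Label each node with the index of its component using breadth-first search
membership <- setNames(rep(NA_integer_, length(nodes)), nodes)
comp <- 0L
for (start in nodes) {
  if (!is.na(membership[[start]])) next   # already assigned to a component
  comp <- comp + 1L
  queue <- start
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    if (!is.na(membership[[v]])) next
    membership[[v]] <- comp
    # neighbours of v, ignoring edge direction
    nb <- c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1])
    queue <- c(queue, nb[is.na(membership[nb])])
  }
}

membership          # component index per node
table(membership)   # component sizes, analogous to $csize
max(membership)     # number of components, analogous to $no
```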

5.4.3 Degree Distributions

Degree distribution, the statistical distribution of node degrees in a network, is a common and often powerful way to describe a network. We can simply look at the degree distribution as a histogram of degrees:

hist(degree(gd), breaks=5, col="steelblue")

hist(degree(gu), breaks=5, col="royalblue")

However, if we wanted to compare the degree distributions of different networks, it might be more useful to plot the probability densities of each degree: i.e., what proportion of nodes has degree = 1, degree = 2, etc. We can do this by using the function degree.distribution().

pkd = degree.distribution(gd)
plot(pkd, pch=20)

pku = degree.distribution(gu)
plot(pku, pch=20)
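As a rough illustration of what such a distribution contains, the base-R sketch below turns a made-up degree sequence into proportions. The degree values are hypothetical; as with degree.distribution(), the first entry corresponds to degree 0.

```r
# Toy degree sequence; the values are invented for illustration
deg <- c(1, 1, 1, 2, 2, 4)

# Count how many nodes have degree 0, 1, ..., max(deg), then divide by the
# number of nodes to turn counts into proportions
counts <- tabulate(deg + 1, nbins = max(deg) + 1)
pk <- counts / length(deg)
names(pk) <- 0:max(deg)

pk
sum(pk)  # the proportions add up to 1
```

Because the vector starts at degree 0, position i on the x-axis of a plain plot() of such a vector corresponds to degree i - 1.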

Degree and Degree Distribution Matters

higher density changes component structure, diffusion, learning…
variance and other moments matter (hub and spoke vs regular)
degree also is an individual node’s characteristic and reflects its position

5.4.4 Average Path Length and Diameter

A network “path” is typically shorthand for “geodesic path” or “shortest path”: the smallest number of edges you have to traverse to get from one node to another.

The average path length can be considered as the average “degrees of separation” between all pairs of nodes in the network.
The diameter (maximum path length) is the maximum degree of separation that exists in the network.

We can calculate path lengths with or without the edge weights (if using edge weights, you often simply count up the weights as you go along the path). The igraph package includes a convenient function for finding the shortest paths between every dyad in a network. Make sure you specify algorithm = "unweighted".

pathd = distances(gd, algorithm="unweighted")
pathu = distances(gu, algorithm="unweighted")

This matrix (usually a large matrix, so we will not display the output) gives us the geodesic path length between each pair of nodes in the network. We can describe the network using some characteristics of the paths that exist in that network. This matrix contains a bunch of cells that are “Inf” (i.e., infinity). This is because the network is not connected, and we can’t calculate path lengths between nodes in different components.

How should we measure the average path length & diameter of a network with multiple components? There are two common solutions. First is to ignore pairs of nodes that are in different components and only measure average lengths of the paths that exist. This solution doesn’t really make sense for the diameter since the diameter of an unconnected network should be infinity. The second solution is to measure each component separately. We will do each of these in turn.

Option 1: To calculate the average path length while ignoring pairs of nodes that are in different components, we can first replace the “Inf” with “NA” in the path length matrix. Next, we want just the “upper triangle” or “lower triangle” of this matrix, which lists all the geodesic paths without duplicates.

pathd = distances(gd, algorithm="unweighted")
pathu = distances(gu, algorithm="unweighted")
pathd[pathd=="Inf"]=NA
mean(pathd[upper.tri(pathd)], na.rm=T)

## [1] 2.458661

pathu[pathu=="Inf"]=NA
mean(pathu[upper.tri(pathu)], na.rm=T)

## [1] 2.458661

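The built-in function mean_distance() does the same thing for unconnected networks, and for gu we get the same value. For gd the result differs, because mean_distance() follows edge directions by default (directed = TRUE) while distances() defaults to mode = "all" and ignores them: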

mean_distance(gd)

## [1] 2.089202

mean_distance(gu)

## [1] 2.458661

Option 2: To calculate the average path lengths and diameter separately for each component, we will first ‘decompose’ the network into a list that contains each component as separate graph objects. We can then use the lapply() function to calculate separate path length matrices, and sapply() function to calculate the mean and max for each matrix.

comps = decompose(gd)
comps # a list object consisting of each component as graph object

## [[1]]
## IGRAPH 0f5a4fa DN-- 32 44 -- 
## + attr: name (v/c), cooc (e/n)
## + edges from 0f5a4fa (vertex names):
##  [1] Mose     ->say     Lord     ->say     Allah    ->say     say      ->so     
##  [5] indeed   ->say     Lord     ->so      say      ->throw   indeed   ->Lord   
##  [9] say      ->when    Allah    ->fear    Lord     ->Mose    day      ->say    
## [13] people   ->say     Pharaoh  ->say     say      ->see     indeed   ->so     
## [17] so       ->then    cast     ->enemy   away     ->indeed  come     ->Mose   
## [21] Mose     ->Pharaoh cast     ->river   enemy    ->river   enemy    ->say    
## [25] fear     ->say     fire     ->say     go       ->say     hasten   ->say    
## [29] follow   ->so      messenger->so      Pharaoh  ->so      come     ->then   
## + ... omitted several edges
## 
## [[2]]
## IGRAPH 0f5a4fa DN-- 5 4 -- 
## + attr: name (v/c), cooc (e/n)
## + edges from 0f5a4fa (vertex names):
## [1] enduring->more       more    ->punishment more    ->severe    
## [4] more    ->surely    
## 
## [[3]]
## IGRAPH 0f5a4fa DN-- 2 1 -- 
## + attr: name (v/c), cooc (e/n)
## + edge from 0f5a4fa (vertex names):
## [1] Merciful->most
## 
## [[4]]
## IGRAPH 0f5a4fa DN-- 2 1 -- 
## + attr: name (v/c), cooc (e/n)
## + edge from 0f5a4fa (vertex names):
## [1] be->deity

path.list = lapply(comps, function(x) distances(x, algorithm="unweighted")) # list with one path length matrix per component (four here)
avg.paths=sapply(path.list, mean) # mean over each full matrix; the zero diagonal is included, so this understates the average path length
diams=sapply(path.list, max) # diameter of each component
avg.paths

## [1] 2.404297 1.280000 0.500000 0.500000

diams

## [1] 4 2 1 1
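Note that taking mean() over a whole distance matrix also averages in the zeros on the diagonal. The base-R sketch below rebuilds by hand the distance matrix of component 2 (a five-node star with “more” as the hub, read off the edge list above) and compares the two ways of averaging; the variable names are our own.

```r
# Distance matrix of a 5-node star, the shape of component 2 above:
# "more" is the hub, the other four words are leaves
words <- c("more", "enduring", "punishment", "severe", "surely")
D <- matrix(2, nrow = 5, ncol = 5, dimnames = list(words, words))  # leaf to leaf
D["more", ] <- 1   # hub to leaf
D[, "more"] <- 1   # leaf to hub
diag(D) <- 0       # every node is at distance 0 from itself

mean(D)                 # 1.28: the 5 zeros on the diagonal pull the mean down
mean(D[upper.tri(D)])   # 1.6: average over the 10 distinct pairs only
max(D)                  # 2: the diameter is unaffected by the diagonal
```

The first value reproduces the 1.28 reported for this component; restricting the mean to the upper triangle gives the average over actual pairs of distinct nodes.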

5.4.5 Path Distance Distribution

Path distribution: A frequency count of the occurrence of each path distance.

# shortest.paths(gu) # Long output
average.path.length(gu)

## [1] 2.458661

path.length.hist(gu)

## $res
## [1]  50 207 219  32
## 
## $unconnected
## [1] 312

# $res is the histogram of distances,
# $unconnected is the number of pairs for which the first node is not
# reachable from the second.

5.4.6 Path distance distribution for directed graph

# shortest.paths(gd, mode="out") # Long output
# shortest.paths(gd, mode="in") # Long output
average.path.length(gd)

## [1] 2.089202

path.length.hist (gd)

## $res
## [1]  50 106  48   6   3
## 
## $unconnected
## [1] 1427

# $res is the histogram of distances,
# $unconnected is the number of pairs for which the first node is not
# reachable from the second.
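The histogram and the average path length are two views of the same information. As a quick check, the sketch below recomputes the averages and the unconnected counts from the $res values printed above, using only base R (variable names are our own).

```r
# Counts copied from the path.length.hist() outputs above:
# position i holds the number of node pairs at distance i
res_gu <- c(50, 207, 219, 32)
res_gd <- c(50, 106, 48, 6, 3)

# Average path length = weighted mean of the distances
avg_gu <- sum(seq_along(res_gu) * res_gu) / sum(res_gu)
avg_gd <- sum(seq_along(res_gd) * res_gd) / sum(res_gd)
round(avg_gu, 6)  # 2.458661
round(avg_gd, 6)  # 2.089202

# Unconnected pairs = all possible pairs minus the connected ones
n <- 41
n * (n - 1) / 2 - sum(res_gu)  # 312 unordered pairs for gu
n * (n - 1) - sum(res_gd)      # 1427 ordered pairs for gd
```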

5.4.7 Why do we care about path?

Path means connectivity.
Path captures the indirect interactions in a network, and individual nodes benefit (or suffer) from indirect relationships because friends might provide access to favors from their friends and information might spread through the links of a network.
Path is closely related to small-world phenomenon.
Path is related to many centrality measures.

5.4.8 Clustering Coefficient (Transitivity) Distribution

There are two formal definitions of the Clustering Coefficient (or Transitivity): “global clustering coefficient” and “local clustering coefficient”. They are slightly different, but both deal with the probability of two nodes that are connected to a common node being connected themselves (e.g., the probability of two of your friends knowing each other).

Global Clustering Coefficient = “ratio of triangles to connected triples”
Local Clustering Coefficient = for each node, the proportion of their neighbors that are connected to each other
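Average Local Clustering Coefficient: If Ci is the proportion of pairs of neighbors of node i that are also connected to each other (i.e., the Local Clustering Coefficient), then Average Local Clustering Coefficient = sum(Ci)/n. In the igraph output below, the average is taken over the nodes for which Ci is defined (nodes with at least two neighbors); the remaining nodes show up as NaN.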
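These three definitions can be worked through by hand on a tiny graph. The base-R sketch below uses a made-up network (a triangle with one extra pendant node) and the adjacency matrix directly; it illustrates the definitions and is not meant to mirror igraph's implementation.

```r
# Toy undirected graph: a triangle a-b-c plus a pendant node d attached to a
nodes <- c("a", "b", "c", "d")
A <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
A["a", "b"] <- A["a", "c"] <- A["b", "c"] <- A["a", "d"] <- 1
A <- A + t(A)  # make the adjacency matrix symmetric

k <- rowSums(A)                      # degree of each node
triples <- k * (k - 1) / 2           # pairs of neighbours around each node
closed  <- diag(A %*% A %*% A) / 2   # pairs of neighbours that are themselves linked

local_cc  <- closed / triples             # NaN where a node has fewer than 2 neighbours
global_cc <- sum(closed) / sum(triples)   # closed triples over all connected triples
avg_local <- mean(local_cc, na.rm = TRUE) # average over nodes with a defined value

local_cc
global_cc
avg_local
```

Node a sits in one triangle but has three possible neighbour pairs, so its local coefficient is 1/3, while b and c score 1 and d is undefined.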

# global clustering: the ratio of the triangles and the connected triples in the graph
g.cluster = transitivity(gd, "global")
l.cluster = transitivity(gd, "local") # local clustering
av.l.cluster = transitivity(gd, "localaverage") # average clustering
g.cluster

## [1] 0.1358491

l.cluster

##  [1] 0.33333333 0.66666667 1.00000000 0.06535948 0.14285714        NaN
##  [7]        NaN        NaN 0.66666667 0.21428571 1.00000000        NaN
## [13] 0.00000000        NaN 0.00000000 0.33333333 1.00000000 0.04761905
## [19]        NaN        NaN        NaN        NaN        NaN        NaN
## [25]        NaN        NaN        NaN 1.00000000        NaN        NaN
## [31] 0.33333333        NaN        NaN 1.00000000        NaN        NaN
## [37]        NaN        NaN        NaN        NaN        NaN

av.l.cluster

## [1] 0.4877159

# undirected: transitivity() ignores edge directions, so the values match those for gd
g.cluster = transitivity(gu, "global")
l.cluster = transitivity(gu, "local")
av.l.cluster = transitivity(gu, "localaverage")
g.cluster

## [1] 0.1358491

l.cluster

##  [1] 0.33333333 0.66666667 1.00000000 0.06535948 0.14285714        NaN
##  [7]        NaN        NaN 0.66666667 0.21428571 1.00000000        NaN
## [13] 0.00000000        NaN 0.00000000 0.33333333 1.00000000 0.04761905
## [19]        NaN        NaN        NaN        NaN        NaN        NaN
## [25]        NaN        NaN        NaN 1.00000000        NaN        NaN
## [31] 0.33333333        NaN        NaN 1.00000000        NaN        NaN
## [37]        NaN        NaN        NaN        NaN        NaN

av.l.cluster

## [1] 0.4877159

5.4.9 Why do we care about clustering coefficient?

A clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. In most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterized by a relatively high density of links; this likelihood tends to be greater than the average probability of a link randomly established between two nodes.
Nodes with higher degrees have a lower local clustering coefficient on average.
Local clustering can be used as a probe for the existence of so-called structural holes in a network, which are missing links between neighbors of a node.
Structural holes can be bad when we are interested in efficient spread of information or other traffic around a network because they reduce the number of alternative routes information can take through the network.
Structural holes can be a good thing for the central node whose friends lack connections because they give power over information flow between those friends.
The local clustering coefficient measures how influential a node is in this sense, taking lower values the more structural holes there are in the network around it.
Local clustering can be regarded as a type of centrality measure, albeit one that takes small values for powerful individuals rather than large ones.

6 Community Structure and Assortment

6.1 Community Structure in Networks

Networks exhibit community structure, the presence of discrete clusters of nodes that are densely connected, which themselves are only loosely connected to other clusters. These may be clusters of individuals that form social groups. How do we detect the presence of such clusters or communities, and how can we quantify the degree of community structure?

6.1.1 Modularity and Community Detection

Modularity-based methods of community detection are not fool-proof. There is no perfect approach to community detection. There are several functions available for community detection in igraph and other packages.

edge.betweenness.community()
- One of the first in the class of “modularity optimization” algorithms. It is a “divisive” method. Cut the edge with highest edge betweenness, and recalculate. Eventually, you end up cutting the network into different groups.
fastgreedy.community()
- Hierarchical agglomerative method that is designed to run well in large networks. Creates “multigraphs” where you lump groups of nodes together in the process of agglomeration in order to save time on sparse graphs.
walktrap.community()
- Uses random walks to calculate distances, and then uses an agglomerative method to optimize modularity.
spinglass.community()
- This method uses the analogy of the lowest-energy state of a collection of magnets (a so-called spin glass model).
leading.eigenvector.community()
- This is a “spectral partitioning” method. You first define a ‘modularity matrix’, which sums to 0 when there is no community structure. The leading eigenvector of this matrix ends up being useful as a community membership vector.
cluster_louvain()
- The “Louvain” method, so-called because it was created by a group of researchers at Louvain University in Belgium.
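Since all of these methods are judged by modularity, it may help to see how that score is computed. The base-R sketch below applies the usual definition, Q = (1/2m) * sum over same-community pairs of (A_ij - k_i k_j / 2m), to a made-up graph of two triangles joined by one edge. The graph and the community assignment are hypothetical.

```r
# Toy undirected graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
A <- matrix(0, 6, 6)
A[edges] <- 1
A <- A + t(A)

# Hypothetical community assignment: one community per triangle
memb <- c(1, 1, 1, 2, 2, 2)

m <- sum(A) / 2                  # number of edges
k <- rowSums(A)                  # node degrees
same <- outer(memb, memb, "==")  # TRUE where two nodes share a community

# Observed minus expected edge weight, summed over same-community pairs
Q <- sum((A - outer(k, k) / (2 * m))[same]) / (2 * m)
Q  # 5/14, about 0.357
```

Putting all six nodes into a single community would give Q = 0, which is why a clearly positive value is read as evidence of community structure.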

6.1.2 Modularity & Community Detection: A Simple Example

Our undirected word co-occurrence network gu appears to have a clear community structure from the earlier plots.

set.seed(7)
l = layout_with_kk(gu)
plot(gu, layout=l, edge.color="cyan3")

Because the community division in this example is fairly clear, most of the community detection methods in the list above should give similar, though not necessarily identical, answers.

eb = edge.betweenness.community(gu) 
eb

## IGRAPH clustering edge betweenness, groups: 7, mod: 0.53
## + groups:
##   $`1`
##    [1] "Mose"      "Lord"      "indeed"    "Pharaoh"   "so"        "away"     
##    [7] "come"      "follow"    "messenger" "do"        "then"      "therein"  
##   
##   $`2`
##    [1] "Allah"  "say"    "day"    "people" "fear"   "go"     "hasten" "throw" 
##    [9] "when"   "see"    "thus"  
##   
##   $`3`
##   [1] "enduring"   "more"       "punishment" "severe"     "surely"    
##   + ... omitted several groups/vertices

length(eb) #number of communities

## [1] 7

modularity(eb) #modularity

## [1] 0.533

membership(eb) #assignment of nodes to communities

##       Mose       Lord      Allah        say     indeed   enduring        day 
##          1          1          2          2          1          3          2 
##     people    Pharaoh         so       cast       away       come   Merciful 
##          2          1          1          4          1          1          5 
##       more      enemy       fear       fire         go     hasten     follow 
##          3          4          2          6          2          2          1 
##  messenger         be      bring     family       find         do      throw 
##          1          7          6          6          6          1          2 
##       when        see       then       most punishment      river     severe 
##          2          2          1          5          3          4          3 
##     surely    therein       thus      deity   guidance       here 
##          3          1          2          7          6          6

The resulting object is a ‘communities object’, which includes a few pieces of information - the number of communities (groups), the modularity value based on the node assignments, and the membership of nodes to each community. We can call each of these values separately.

We can also use this communities object to show the community structure.

plot(eb, gu, layout=l)

An example with the Louvain method:

el = cluster_louvain(gu)
el

## IGRAPH clustering multi level, groups: 8, mod: 0.53
## + groups:
##   $`1`
##   [1] "indeed"  "away"    "do"      "therein"
##   
##   $`2`
##   [1] "enduring"   "more"       "punishment" "severe"     "surely"    
##   
##   $`3`
##    [1] "Allah"  "say"    "day"    "people" "fear"   "go"     "hasten" "when"  
##    [9] "see"    "thus"  
##   
##   + ... omitted several groups/vertices

length(el) #number of communities

## [1] 8

modularity(el) #modularity

## [1] 0.5332

membership(el) #assignment of nodes to communities

##       Mose       Lord      Allah        say     indeed   enduring        day 
##          5          5          3          3          1          2          3 
##     people    Pharaoh         so       cast       away       come   Merciful 
##          3          5          5          7          1          5          6 
##       more      enemy       fear       fire         go     hasten     follow 
##          2          7          3          4          3          3          5 
##  messenger         be      bring     family       find         do      throw 
##          5          8          4          4          4          1          5 
##       when        see       then       most punishment      river     severe 
##          3          3          5          6          2          7          2 
##     surely    therein       thus      deity   guidance       here 
##          2          1          3          8          4          4

Plot the network with the communities assigned.

set.seed(2)
plot(el, gu, vertex.label="", edge.color="red")

We can customize it. We use the RColorBrewer package to assign colors.

library(RColorBrewer)
colors= brewer.pal(length(el),'Accent') #make a color palette
V(gu)$color = colors[membership(el)] #assign node color based on the community assignment
set.seed(2)
plot(gu, vertex.label="")

The two different methods yield different results; one with 7 and the other 8 communities (groups).

6.1.3 Another Example of Clustering

The goal of clustering (also referred to as “community detection”) is to find cohesive subgroups within a network. We have mentioned earlier that there are various algorithms for graph clustering in igraph. It is important to note that there is no real theoretical basis for what constitutes a cluster in a network, only the vague “internally dense” versus “externally sparse” argument. As such, there is no clear argument for or against certain algorithms/methods.

No matter which algorithm is chosen, the workflow is always the same.

clu <- cluster_louvain(gu)
#membership vector
mem <- membership(clu)
head(mem)

##     Mose     Lord    Allah      say   indeed enduring 
##        5        5        3        3        1        2

#communities as list
com <- communities(clu)
com[[1]]

## [1] "indeed"  "away"    "do"      "therein"

An example of selected graph clustering algorithms applied to our gu network is shown below.

imc <- cluster_infomap(gu)
lec <- cluster_leading_eigen(gu)
loc <- cluster_louvain(gu)
# sgc <- cluster_spinglass(gu) Cannot work with unconnected graph
wtc <- cluster_walktrap(gu)
scores <- c(infomap = modularity(gu,membership(imc)),
            eigen = modularity(gu,membership(lec)),
            louvain = modularity(gu,membership(loc)),
            walk = modularity(gu,membership(wtc)))
scores

## infomap   eigen louvain    walk 
##  0.4850  0.5332  0.5332  0.5332

For the gu network, the modularity score is around 0.5 despite the different functions. In general, though, it is advisable to use cluster_louvain() since it has the best speed/performance trade-off.

6.2 Assortment (Homophily)

One major pattern common to many social networks (and other types of networks) is homophily or assortment — the tendency for nodes that share a trait to be connected. The assortment coefficient is a commonly used measure of homophily. It is similar to the modularity index used in community detection, but the assortativity coefficient is used when we know a priori the ‘type’ or ‘value’ of nodes. For example, we can use the assortment coefficient to examine whether discrete node types (e.g., gender, ethnicity, etc.) are more or less connected to each other. Assortment coefficient can also be used with “scalar attributes” (i.e. continuously varying traits).

6.2.1 The Assortment Coefficient

There are at least two easy ways to calculate the assortment coefficient. In the igraph package, there is a function for assortativity(). One benefit to this function is that it can calculate assortment in directed or undirected networks. However, the major downside is that it cannot handle weighted networks.

6.2.2 Assortment: a simple example with igraph

Let’s use the same example network to demonstrate how to calculate assortativity, and to compare the difference between modularity and assortativity.

Let’s start by assigning each node a value–let’s say nodes vary in size.

set.seed(3)
l = layout_with_kk(gu)
V(gu)$size = 2*degree(gu) #assign each node a size equal to twice its degree

plot(gu, layout=l, edge.color="black", repel=T)

assortativity(gu, V(gu)$size, directed=F)

## [1] -0.3318498

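This network exhibits negative assortativity by node size. Since size is just twice the degree, this means that high-degree words tend to be connected to low-degree words rather than to each other.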
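To see why a hub-dominated network produces a negative coefficient, consider the extreme case of a star. For a scalar trait, the assortativity coefficient can be read as the Pearson correlation between the trait values at the two ends of each edge; the base-R sketch below computes it that way on a made-up star graph. This is a sketch of the idea under that assumption, not a call to igraph's assortativity().

```r
# Toy star graph: node 1 is the hub, nodes 2 to 5 are leaves
edges <- cbind(from = rep(1, 4), to = 2:5)
deg   <- tabulate(as.vector(edges), nbins = 5)
size  <- 2 * deg   # same rule as in the text: size is twice the degree

# Pearson correlation between the values at the two ends of every edge,
# with each undirected edge counted in both directions
x <- c(size[edges[, "from"]], size[edges[, "to"]])
y <- c(size[edges[, "to"]], size[edges[, "from"]])
r <- cor(x, y)
r  # -1 for a star: the large hub only ever connects to small leaves
```

Our word network is far from a pure star, but “say” plays a similar hub role, which is consistent with the negative value reported above.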

We can also convert the size variable into a binary (i.e., discrete) trait and calculate the assortment coefficient.

V(gu)$size.discrete = (V(gu)$size > 5) + 0
#shortcut to make the values = 1 if large individual and 0 if small individual, with cutoff at size = 5
plot(gu, layout=l, edge.color="black", repel=T)

assortativity(gu, V(gu)$size.discrete, directed=F)

## [1] -0.1868132

As a comparison, we create a node attribute that varies randomly across all nodes in the network, and then measure the assortativity coefficient based on this trait. We will plot the figure with square nodes, just to make it clear that we are plotting a different trait.

set.seed(3)
V(gu)$random = rnorm(10, mean=20, sd=5) #create a random node trait; note that only 10 values are drawn for 41 nodes, rnorm(vcount(gu), ...) would give one independent draw per node
assortativity(gu, V(gu)$random, directed=F)

## [1] -0.1487094

plot(gu, layout=l, edge.color="black", vertex.size=V(gu)$random, vertex.shape="square")

We can see that there is little assortment based on this trait.

Just to be extra clear, this network still exhibits community structure, but the trait we are measuring does not exhibit assortativity.

7 Repeat Analysis Using tidygraph

Earlier we created the wordnetwork using igraph. In this section we will show similar (and different) examples using tidygraph.

Two functions of the tidygraph package can be used to create network objects:

tbl_graph() creates a network object from nodes and edges data.
as_tbl_graph() converts network data and objects to a tbl_graph network.

A central aspect of tidygraph is to directly manipulate node and edge data from the tbl_graph object by activating nodes or edges. When we first create a tbl_graph object, the nodes will be activated. We can then directly calculate node or edge measures, like centrality, using tidyverse functions.

library(tidygraph)
gt <- as_tbl_graph(wordnetwork)
class(wordnetwork)

## [1] "igraph"

class(gt)

## [1] "tbl_graph" "igraph"

gt

## # A tbl_graph: 41 nodes and 50 edges
## #
## # A directed acyclic simple graph with 4 components
## #
## # Node Data: 41 x 1 (active)
##   name    
##   <chr>   
## 1 Mose    
## 2 Lord    
## 3 Allah   
## 4 say     
## 5 indeed  
## 6 enduring
## # ... with 35 more rows
## #
## # Edge Data: 50 x 3
##    from    to  cooc
##   <int> <int> <dbl>
## 1     1     4    16
## 2     2     4    15
## 3     3     4    10
## # ... with 47 more rows

Notice how the igraph wordnetwork is converted into two separate tibbles, Node Data and Edge Data. Both wordnetwork and gt are still igraph objects; gt simply carries the additional tbl_graph class.

7.1 Direct ggraph integration

gt can directly be used with our preferred ggraph package for visualizing networks.

ggraph(gt, layout = 'fr', weights = cooc) + 
  geom_edge_link() + 
  geom_node_point()

Now it is much easier to experiment with modifications to node and edge parameters affecting layouts as it is not necessary to modify the underlying graph but only the plotting code, e.g.:

ggraph(gt, layout = 'fr', weights = log(cooc)) + 
  geom_edge_link(color = "cyan4") + 
  geom_node_point(color = "gold4", size = 3)

7.2 Use Selected Measures from tidygraph and Plot

We show below how to collate some of the node-related measures. We have already computed some of these measures with igraph; here they appear in a tidy dataframe or tibble format.

node_measures <- gt %>%
     activate(nodes) %>%
     mutate(
         degree_in = centrality_degree(mode = "in"),
         degree_out = centrality_degree(mode = "out"),
         degree = degree_in + degree_out,
         betweenness = centrality_betweenness(),
         closeness = centrality_closeness(normalized = T),
         pg_rank = centrality_pagerank(),
         eigen = centrality_eigen(),
         br_score = node_bridging_score(),
         coreness = node_coreness()
     ) %>% as_tibble()
node_measures

Now we plot the various measures from the resulting node_measures dataframe.

node_measures %>% arrange(desc(degree)) %>% 
# First sort by degree. This sorts the dataframe but NOT the factor levels
     head(30) %>% 
     mutate(name=fct_reorder(name, degree, .desc = F)) %>%
# This updates the factor levels. desc = F because of coord_flip() below
     ggplot(aes(x = name, y = degree)) +
     geom_segment(aes(xend=name, yend=0)) +
     geom_point(size=3, color="tomato") +
     coord_flip() +
     theme_bw() +
     xlab("")

# Alternative version of the same plot
node_measures %>% arrange(desc(degree)) %>% 
# First sort by degree. This sorts the dataframe but NOT the factor levels
     head(30) %>%
     ggplot(aes(reorder(name, degree, FUN = min), degree)) +
     geom_point(size=3, color="salmon") +
     labs(x = "Word", y = "Degree") +
     coord_flip()

Plot degree_in and degree_out together.

NMtall <- node_measures %>% gather(key = center, value = Value, degree_in:degree_out)
NMtall %>% arrange(desc(Value)) %>%
     head(50) %>% 
     mutate(name=fct_reorder(name, Value, .desc = F)) %>% 
     ggplot(aes(name, Value, color = center)) + 
     geom_point(size = 2) + 
     labs(x = "Word", y = "Measure") +
     coord_flip(ylim = c(0, 12)) # set the limits here; a separate coord_cartesian() would be replaced by coord_flip()

Plot degree (degree_in + degree_out) and betweenness together.

NMtall <- node_measures %>% gather(key = center, value = Value, degree:betweenness)
NMtall %>% arrange(desc(Value)) %>%
     head(50) %>% 
     mutate(name=fct_reorder(name, Value, .desc = F)) %>% 
     ggplot(aes(name, Value, fill = center)) + 
     geom_col(position = "identity") + 
     labs(x = "Word", y = "Measure") +
     coord_flip()

Plot closeness, pg_rank, eigen, br_score, coreness together.

NMtall <- node_measures %>% gather(key = measure, value = Value, closeness:coreness)
NMtall %>% arrange(desc(Value)) %>%
     head(50) %>% 
     mutate(name=fct_reorder(name, Value, .desc = F)) %>% 
     ggplot(aes(name, Value, color = measure)) + 
     geom_point(size = 2) +
     labs(x = "Word", y = "Measure") +
     # a later coord_*() replaces an earlier one, so the limits are set in coord_flip()
     coord_flip(ylim = c(0, 1))

Even with the Measure axis limited to the range 0 to 1, the values for br_score and coreness are 0 or very small. The plots consistently place “say”, “Lord”, “Allah” and “Mose” among the most influential and important words in Surah Taa-Haa.

Notice the slight difference in using the network measures with tidygraph. We can easily assemble the required measures for the nodes in a tidy dataframe. tidygraph has many functions that can give us information about nodes. We show examples of some measures that seem to measure slightly different things about the nodes.

degree: Number of direct connections
betweenness: How many shortest paths go through this node
pagerank: How many links point to this node, with links from highly linked-to nodes counting more
eigen centrality: Similar to PageRank; a node scores high when its neighbors score high
closeness: How central this node is to the rest of the network, based on its distances to the other nodes
bridge score: Average decrease in cohesiveness if each of its edges were removed from the graph
coreness: The largest k for which the node belongs to the k-core (from the k-core decomposition)
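To make these definitions concrete, here is a minimal sketch using igraph directly. The graph `toy` is a made-up five-word example (it is not the Surah Taa-Haa network), chosen only because its values are small enough to check by hand.

```r
library(igraph)

# Hypothetical toy graph: "say" is a hub, and "fear" hangs off "Allah"
toy <- graph_from_literal(say - Mose, say - Lord, say - Allah, Allah - fear)

degree(toy)                        # direct connections; "say" has 3
betweenness(toy)                   # "say" lies on 5 shortest paths, "Allah" on 3
closeness(toy, normalized = TRUE)  # larger values mean shorter distances to the rest
coreness(toy)                      # the graph is a tree, so every node is in the 1-core
```

The hub has both the highest degree and the highest betweenness here, but “Allah” also gets a non-zero betweenness because it is the only route to “fear”. This is the sense in which the measures capture slightly different things.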

7.3 Example Combining Selected Node and Edge Measures from tidygraph

The following is an interesting example in true tidyverse fashion that combines some measures that we have not covered.⁵

gtexample <- gt %>%
   mutate(n_rank_trv = node_rank_traveller(),
         neighbors = centrality_degree(),
         group = group_infomap(),
         center = node_is_center(),
         dist_to_center = node_distance_to(node_is_center()),
         keyplayer = node_is_keyplayer(k = 10)) %>%
   activate(edges) %>% 
   filter(!edge_is_multiple()) %>%
   mutate(centrality_e = centrality_edge_betweenness())

We can also convert our active node or edge table back to a tibble:

gtexample %>%
  activate(nodes) %>% # %N>%
  as_tibble()   # as.tibble() is deprecated in favor of as_tibble()

gtexample %>%
  activate(edges) %>% # %E>%
  as_tibble()

We plot the output.

ggraph(gtexample, layout="fr") + 
    geom_edge_density(aes(fill = cooc)) +
    geom_edge_link(aes(width = cooc), alpha = 0.2) + 
    geom_node_point(aes(color = factor(group)), size = 5) +
    geom_node_text(aes(label = name), size = 3, repel = TRUE) +
    scale_color_brewer(palette = "Set1") +
    theme_graph() +
    labs(title = "Surah Taa-Haa Word Co-occurrence Network",
         subtitle = "Nodes are colored by group")

For the next plot, we define our own specific colors. The center-most words are in red and the node size shows the distance to the center.

got_palette = c("red", "blue", "green", "gold", "coral", "cyan4", "maroon", "deeppink")
ggraph(gtexample, layout="fr") + 
    geom_edge_density(aes(fill = cooc)) +
    geom_edge_link(aes(width = cooc), alpha = 0.2) + 
    geom_node_point(aes(color = factor(center), size = dist_to_center)) +
    geom_node_text(aes(label = name), size = 3, repel = TRUE) +
    # nodes are mapped to color (not fill), so a color scale is needed
    scale_color_manual(values = c("TRUE" = "red", "FALSE" = "blue")) +
    theme_graph() +
    theme(legend.position = "bottom") +
    labs(title = "Surah Taa-Haa Word Co-occurrence Network",
         subtitle = "Nodes are colored by centeredness")

Clearly, “say” is the keyplayer for the main group.

8 Who is the Most Important Influencer?

In this section, we ask some questions about which is the “most important” node. We want to understand important concepts of network centrality and how to calculate those in R.

What is the most important word in this network? What does “most important” mean? It of course depends on the definition and this is where network centrality measures come into play. We will have a look at a few of those (there are many more out there…).

Degree centrality

Degree centrality tells you the most connected word: it is simply the number of nodes connected to each node. It can denote Popularity [3].

node_measures %>% arrange(desc(degree))

This is often the only measure given to identify “influencers”: how many followers do they have? So far “say” has the highest, 18 (12 in and 6 out).

Closeness centrality

Closeness centrality tells us who can propagate information quickest. One application that comes to mind is identifying so-called superspreaders of infectious diseases, like COVID-19. “say” is no longer the highest.

Betweenness centrality

Betweenness centrality tells us who is most important in maintaining connections throughout the network: it is the number of times a node is on the shortest path between any other pair of nodes. It can denote Brokerage and Bridging [3]. “say” is prominent here. As a Surah that narrates the story of Prophet Mose, that is understandable.

Eigenvector centrality

Is a word (person) connected to other “well-connected” words (people)? It can denote Connections [3]. Again, “say” dominates.

Diffusion centrality

Can a given word (person) reach many others within a short number of hops in the network? It can denote Reach [3].

As we have seen, there is more than one definition of “most important”. Which one to choose depends on the context (and the available information). Based on the previous plots, “say”, “Lord”, “Allah” and “Mose” stand out as the most influential and important words in Surah Taa-Haa. Indeed, it is about “Allah” narrating the true story of “Mose”.

8.1 Build communities and calculate measures

We’ll do the analysis using tidygraph.

set.seed(123)
network_ego1 <- gt %>% 
  mutate(community = as.factor(group_walktrap())) %>%
  mutate(degree_c = centrality_degree()) %>%
  mutate(betweenness_c = centrality_betweenness(directed = F, normalized = T)) %>%
  mutate(closeness_c = centrality_closeness(normalized = T)) %>%
  mutate(eigen = centrality_eigen(directed = F))
network_ego1

## # A tbl_graph: 41 nodes and 50 edges
## #
## # A directed acyclic simple graph with 4 components
## #
## # Node Data: 41 x 6 (active)
##   name     community degree_c betweenness_c closeness_c    eigen
##   <chr>    <fct>        <dbl>         <dbl>       <dbl>    <dbl>
## 1 Mose     1                2       0.00816      0.0302 3.89e- 1
## 2 Lord     2                3       0.00299      0.0311 4.76e- 1
## 3 Allah    3                2       0            0.0302 2.29e- 1
## 4 say      4                6       0.434        0.0286 1.00e+ 0
## 5 indeed   5                4       0.143        0.0331 5.57e- 1
## 6 enduring 6                1       0            0.0270 4.86e-17
## # ... with 35 more rows
## #
## # Edge Data: 50 x 3
##    from    to  cooc
##   <int> <int> <dbl>
## 1     1     4    16
## 2     2     4    15
## 3     3     4    10
## # ... with 47 more rows

We can easily convert it to a dataframe using as.data.frame(). We need this to identify the key player in our network.

network_ego_df <- as.data.frame(network_ego1 %>% activate(nodes))
network_ego_df

8.1.1 Identify prominent word in the network

We have converted the tbl_graph to a dataframe. The last thing we need to do is to find the top words for each centrality measure and pull out the key player.

Key player is a term for the most influential nodes in the network based on different centrality measures. Each centrality has different uses and interpretations. A node that appears in the top of most centrality measures will be considered as the key player of the whole network.

# take the 20 highest-ranked words for each centrality measure
kp_ego <- data.frame(
  network_ego_df %>% arrange(-degree_c) %>% select(name) %>% slice(1:20),
  network_ego_df %>% arrange(-betweenness_c) %>% select(name) %>% slice(1:20),
  network_ego_df %>% arrange(-closeness_c) %>% select(name) %>% slice(1:20),
  network_ego_df %>% arrange(-eigen) %>% select(name) %>% slice(1:20)
) %>% setNames(c("degree","betweenness","closeness","eigen"))
kp_ego

The table lists the top 20 words for each centrality measure. “say” tops most of the measures.
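One simple way to turn such a table into a single ranking is to count in how many of the top lists each word appears. The sketch below uses a small made-up table, `kp_demo`, with the same column layout as kp_ego; the words in it are illustrative and are not the actual results.

```r
# Hypothetical stand-in for kp_ego: one column of top words per centrality measure
kp_demo <- data.frame(
  degree      = c("say", "indeed", "Lord"),
  betweenness = c("say", "indeed", "so"),
  closeness   = c("indeed", "Lord", "say"),
  eigen       = c("say", "indeed", "Lord"),
  stringsAsFactors = FALSE
)

# Count in how many of the four lists each word appears
appearances <- sort(table(unlist(kp_demo)), decreasing = TRUE)
appearances
```

A word that appears in every column is a reasonable key-player candidate. The same two lines should work on kp_ego itself, assuming it has been built as shown above.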

8.1.2 Visualize Network

We’ll scale the nodes by degree centrality, and color them by community. We’ll only show communities 1 to 10.

network_ego1 %>%
  filter(community %in% 1:10) %>%
  top_n(100,degree_c) %>%
  mutate(node_size = ifelse(degree_c >= 1,degree_c,0)) %>%
  mutate(node_label = ifelse(closeness_c >= 0.001,name,"")) %>%
  ggraph(layout = "stress") +
  geom_edge_fan(alpha = 0.05) +
  geom_node_point(aes(color = as.factor(community), size = 1.5*node_size)) +
  geom_node_label(aes(label = node_label),repel = T,
                 show.legend = F, fontface = "bold", label.size = 0,
                 segment.color="royalblue", fill = "wheat") +
  coord_fixed() +
  theme_graph() + theme(legend.position = "none") +
  labs(title = "Word Mutual Communities",
       subtitle = "Top 10 Communities")

8.1.3 ego() function

The neighbors of a specific node can be extracted with the ego() function. Below, we look for all words that are linked with “say”, directly (order = 1) and indirectly (order > 1).

focusnode <- which(V(gd)$name == "say")
ego(gd,order = 1, nodes = focusnode, mode = "all", mindist = 1)

## [[1]]
## + 18/41 vertices, named, from 0b65863:
##  [1] Mose    Lord    Allah   indeed  day     people  Pharaoh so      enemy  
## [10] fear    fire    go      hasten  throw   when    see     then    thus

ego(gd,order = 2, nodes = focusnode, mode = "all", mindist = 1)

## [[1]]
## + 31/41 vertices, named, from 0b65863:
##  [1] Mose      Lord      Allah     indeed    day       people    Pharaoh  
##  [8] so        enemy     fear      fire      go        hasten    throw    
## [15] when      see       then      thus      come      away      do       
## [22] therein   follow    messenger cast      river     bring     family   
## [29] find      guidance  here

ego(gd,order = 3, nodes = focusnode, mode = "all", mindist = 1)

## [[1]]
## + 31/41 vertices, named, from 0b65863:
##  [1] Mose      Lord      Allah     indeed    day       people    Pharaoh  
##  [8] so        enemy     fear      fire      go        hasten    throw    
## [15] when      see       then      thus      come      away      do       
## [22] therein   follow    messenger cast      river     bring     family   
## [29] find      guidance  here

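We can use this to explore the small-world idea and “six degrees of separation”. Here “say” reaches all 31 other words in its connected component within 2 steps: increasing the order to 3 adds no new words. The remaining 9 of the 41 words belong to other components and cannot be reached at all.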
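Instead of reading the ego() output by eye, the same check can be done numerically with ego_size() and distances(). The sketch below uses a made-up graph `toy` with two components (it is not gd), so that both the saturation point and the unreachable words are visible.

```r
library(igraph)

# Hypothetical toy graph: a chain starting at "say" plus a separate pair
toy <- graph_from_literal(say - Mose, say - Lord, Lord - indeed,
                          indeed - away, enduring - more)
focus <- which(V(toy)$name == "say")

# Number of words within 1, 2, 3 and 4 steps (mindist = 1 excludes "say" itself)
reach <- sapply(1:4, function(k)
  ego_size(toy, order = k, nodes = focus, mode = "all", mindist = 1))
reach   # stops growing once the whole component has been reached

# Shortest-path distances from "say"; Inf marks words in other components
d <- distances(toy, v = focus, mode = "all")
max(d[is.finite(d)])   # steps needed to reach the farthest reachable word
sum(is.infinite(d))    # number of words that cannot be reached
```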

9 Basic Tutorial on graphlayouts Using Surah Taa-Haa Word Network

We adapt this tutorial based on a good reference.⁶

We directly jump into some code and work through it one line at a time.

deg = degree(gd)
ggraph(gd,layout = "stress") +
  geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66") +
  geom_node_point(aes(size = deg), shape = 21) +
  geom_node_text(aes(filter = deg >= 3, label = name), 
                 family = "serif", color = "darkblue") +
  theme_graph() +
  theme(legend.position = "none")

ggraph works with layers. Each layer adds a new feature to the plot and thus builds the figure step-by-step. The following sections work through each of them separately.

9.1 Layout

ggraph(gd,layout = "stress")

The first step is to calculate a layout. The layout parameter specifies the algorithm to use. The “stress” layout is part of the graphlayouts package and is always a safe choice since it is deterministic and produces nice layouts for almost any graph. Other algorithms for, e.g., concentric layouts and clustered networks are described further down in this tutorial. Here is a list of layout algorithms of igraph.

c("layout_with_dh", "layout_with_drl", "layout_with_fr", "layout_with_gem", "layout_with_graphopt", "layout_with_kk", "layout_with_lgl", "layout_with_mds", "layout_with_sugiyama", "layout_as_bipartite", "layout_as_star", "layout_as_tree")

To use them, we just need the last part of the name.

ggraph(gd,layout = "dh") + …

A good tutorial on ggraph layouts can be found here.⁷
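A layout is nothing more than a table of x and y coordinates, one row per node. As a minimal sketch, create_layout() from ggraph returns that table without drawing anything, which makes it easy to compare algorithms. The ring graph `toy` is a made-up example.

```r
library(igraph)
library(ggraph)
library(graphlayouts)

toy <- make_ring(6)   # hypothetical toy graph: six nodes in a cycle

lay_stress <- create_layout(toy, layout = "stress")  # from graphlayouts
lay_kk     <- create_layout(toy, layout = "kk")      # igraph's layout_with_kk

# Each layout is a data frame with one row per node and x/y columns
head(lay_stress[, c("x", "y")])
head(lay_kk[, c("x", "y")])
```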

9.2 Edges

geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66")

The second layer specifies how to draw the edges. Edges can be drawn in many different ways as the list below shows.

c("geom_edge_arc", "geom_edge_arc0", "geom_edge_arc2", "geom_edge_density", "geom_edge_diagonal", "geom_edge_diagonal0", "geom_edge_diagonal2", "geom_edge_elbow", "geom_edge_elbow0", "geom_edge_elbow2", "geom_edge_fan", "geom_edge_fan0", "geom_edge_fan2", "geom_edge_hive", "geom_edge_hive0", "geom_edge_hive2", "geom_edge_link", "geom_edge_link0", "geom_edge_link2", "geom_edge_loop", "geom_edge_loop0")

It is good to stick with geom_edge_link0 since it simply draws a straight line between the endpoints. Some tools draw curved edges by default. While this may add some artistic value, it reduces readability. Always go with straight lines! If your network has multiple edges between two nodes, then you can switch to geom_edge_parallel().

What does the “0” stand for? The standard geom_edge_link() draws 100 dots on each edge compared to only two dots (the endpoints) in geom_edge_link0(). This is done to allow, e.g., gradients along the edge. The drawback of using geom_edge_link() is that the time to render the plot increases and so does the size of the file if we export the plot. Typically, we do not need gradients along an edge. Hence, geom_edge_link0() should be our default to draw edges.

Within geom_edge_link0, we can specify the appearance of the edge, either by mapping edge attributes to aesthetics or setting them globally for the graph. Mapping attributes to aesthetics is done within aes(). In the example, we map the edge width to the edge attribute “cooc”. ggraph then automatically scales the edge width according to the attribute. We can control this scale. The color of all edges is globally set to “grey66”.

The following aesthetics can be used within geom_edge_link0 either within aes() or globally:

edge_color (color of the edge)
edge_width (width of the edge)
edge_linetype (linetype of the edge, defaults to “solid”)
edge_alpha (opacity; a value between 0 and 1)

ggraph does not automatically plot arrows if your graph is directed. We need to do this manually using the arrow parameter.

geom_edge_link0(aes(…), …, arrow = arrow(angle = 30, length = unit(0.15, "inches"), ends = "last", type = "closed"))

The default arrowhead type is “open”, yet “closed” usually has a nicer appearance.

A good tutorial on ggraph edges can be found here.⁸

9.3 Nodes

geom_node_point(aes(size = deg), shape = 21) +
geom_node_text(aes(filter = deg >= 3, label = name), family = "serif", color = "darkblue")

On top of the edge layer, we draw the node layer. Always draw the node layer above the edge layer. Otherwise, edges will be visible on top of nodes. There are slightly fewer geoms available for nodes.

c("geom_node_arc_bar", "geom_node_circle", "geom_node_label", "geom_node_point", "geom_node_text", "geom_node_tile", "geom_node_treemap")

The most important ones here are geom_node_point() to draw nodes as simple geometric objects (circles, squares,…) and geom_node_text() to add node labels. You can also use geom_node_label(), but this draws labels within a box.

The mapping of node attributes to aesthetics is similar to edge attributes. In the example, we map the degree of the nodes to the size aesthetic. The shape of the node is globally set to 21.

The figure below shows all possible shapes that can be used for the nodes.

Shape 21 draws a border around the nodes. If you prefer another shape, say 19, you have to be aware of several things. To change the color of shapes 0-20, you need to use the color parameter. For shapes 21-25 you need to use fill; the color parameter only controls the border for these shapes.

The following aesthetics can be used within geom_node_point() either within aes() or globally:

alpha (opacity; a value between 0 and 1)
color (color of shapes 0-20 and border color for 21-25)
fill (fill color for shape 21-25)
shape (node shape; a value between 0 and 25)
size (size of node)
stroke (size of node border)
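The difference between color and fill is easiest to see on two points drawn with plain ggplot2. This is only an illustrative sketch: the data frame `pts` and the chosen colors are made up. The same aesthetics apply to geom_node_point().

```r
library(ggplot2)

# Two points: one with a solid shape (19) and one with a fillable shape (21)
pts <- data.frame(x = 1:2, y = 1, shape_id = c(19, 21))

p <- ggplot(pts, aes(x, y)) +
  geom_point(aes(shape = shape_id), color = "darkblue", fill = "gold",
             size = 6, stroke = 1.5) +
  scale_shape_identity() +   # use the numbers in shape_id directly as shapes
  theme_bw()
p
```

Shape 19 is drawn entirely in the color value and ignores fill, while shape 21 uses fill for its interior and color for its border.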

For geom_node_text(), there are more options available, but the most important are:

label (attribute to be displayed as node label)
color (text color)
family (font to be used)
size (font size)

Note that we also used a filter within aes() of geom_node_text(). The filter parameter allows you to specify a rule for when to apply the aesthetic mappings. The most frequent use case is for node labels (but it can also be used for edges or nodes). In the example, we only display the node label if the degree is at least 3.

A good tutorial on ggraph nodes can be found here.⁹

9.4 Themes

theme_graph() + theme(legend.position = "none")

Themes control the overall look of the plot. There are many options within the theme() function. theme_graph() removes the default ggplot theme elements (e.g. axes, background, grid lines) since they are irrelevant for networks. The only theme() option we need here is legend.position, which we set to “none”, i.e. do not show the legend.

The code below gives an example for a plot with a legend.

ggraph(gd,layout = "stress") +
  geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66") +
  geom_node_point(aes(fill = deg), shape = 21, size = deg) +
  geom_node_text(aes(label = name, size = 10*deg),
                 family = "serif", repel = F)+
  scale_edge_color_brewer(palette = "Dark2")+
  theme_graph() +
  theme(legend.position = "bottom")

9.5 Concentric layouts

Concentric circles help to emphasize the position of certain nodes in the network. The graphlayouts package has two functions for concentric layouts, layout_with_focus() and layout_with_centrality().

The first one allows us to focus the network on a specific node and arranges all other nodes in concentric circles (depending on the geodesic distance) around it. Below we focus on the word “Mose”. However, the layout requires a connected graph. From previous plots, neither gd nor gu is connected. So we focus on the largest component.

9.5.1 Taking the largest component

cld <- clusters(gd)
jg1 <- induced_subgraph(gd, which(cld$membership == which.max(cld$csize)))
jg1

## IGRAPH 13af31b DN-- 32 44 -- 
## + attr: name (v/c), cooc (e/n)
## + edges from 13af31b (vertex names):
##  [1] Mose     ->say     Lord     ->say     Allah    ->say     say      ->so     
##  [5] indeed   ->say     Lord     ->so      say      ->throw   indeed   ->Lord   
##  [9] say      ->when    Allah    ->fear    Lord     ->Mose    day      ->say    
## [13] people   ->say     Pharaoh  ->say     say      ->see     indeed   ->so     
## [17] so       ->then    cast     ->enemy   away     ->indeed  come     ->Mose   
## [21] Mose     ->Pharaoh cast     ->river   enemy    ->river   enemy    ->say    
## [25] fear     ->say     fire     ->say     go       ->say     hasten   ->say    
## [29] follow   ->so      messenger->so      Pharaoh  ->so      come     ->then   
## + ... omitted several edges

V(jg1)

## + 32/32 vertices, named, from 13af31b:
##  [1] Mose      Lord      Allah     say       indeed    day       people   
##  [8] Pharaoh   so        cast      away      come      enemy     fear     
## [15] fire      go        hasten    follow    messenger bring     family   
## [22] find      do        throw     when      see       then      river    
## [29] therein   thus      guidance  here
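A quick way to confirm that the extracted subgraph is connected is is_connected(). The sketch below repeats the extraction on a made-up graph `toy` and uses components(), which is the current igraph name for clusters().

```r
library(igraph)

# Hypothetical toy graph with a four-word component and a separate pair
toy <- graph_from_literal(say - Mose, say - Lord, Lord - indeed, enduring - more)

comp <- components(toy)   # same result as clusters()
comp$no                   # number of connected components
comp$csize                # size of each component

# Keep only the nodes that belong to the largest component
largest <- induced_subgraph(toy, which(comp$membership == which.max(comp$csize)))

is_connected(toy)       # FALSE: the full graph is split
is_connected(largest)   # TRUE: safe to use with the focus layout
```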

jg2 <- simplify(as.undirected(jg1))
jg2

## IGRAPH 13aff47 UN-- 32 44 -- 
## + attr: name (v/c)
## + edges from 13aff47 (vertex names):
##  [1] Mose   --Lord    Mose   --say     Mose   --Pharaoh Mose   --come   
##  [5] Lord   --say     Lord   --indeed  Lord   --so      Allah  --say    
##  [9] Allah  --fear    say    --indeed  say    --day     say    --people 
## [13] say    --Pharaoh say    --so      say    --enemy   say    --fear   
## [17] say    --fire    say    --go      say    --hasten  say    --throw  
## [21] say    --when    say    --see     say    --then    say    --thus   
## [25] indeed --so      indeed --away    indeed --come    indeed --fire   
## [29] indeed --do      indeed --therein Pharaoh--so      so     --follow 
## + ... omitted several edges

The parameter focus in the first line is used to choose the node id of the focal node (Mose = 1). The function coord_fixed() is used to always keep the aspect ratio at one (i.e. the circles are always displayed as a circle and not an ellipse).

The function draw_circle() can be used to add the circles explicitly.

got_palette = c("red", "blue", "green", "gold", "coral", "cyan4", "maroon", "deeppink")
focusnode <- which(V(jg1)$name == "Mose")
deg = degree(jg1)
ggraph(jg1,layout = "focus", focus = focusnode) +
    draw_circle(col = "darkblue", use = "focus", max.circle = 3) +
    geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66") +
    geom_node_point(aes(size = deg), shape = 19) +
    geom_node_text(aes(filter = (name == "Mose"), size = 2*deg, label = name),
                   family = "serif") +
    scale_edge_width_continuous(range = c(0.1, 2.0)) +
    scale_size_continuous(range = c(1,5)) +
    scale_fill_manual(values = got_palette) +
    coord_fixed() +
    theme_graph() +
    theme(legend.position = "bottom")
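The circles in this layout correspond to the geodesic distance from the focal node. As a minimal sketch, layout_with_focus() from graphlayouts can be called directly to inspect those distances. The graph `toy` is a made-up example, and the radius column is only expected to be close to the distance, not exactly equal to it.

```r
library(igraph)
library(graphlayouts)

# Hypothetical connected toy graph
toy <- graph_from_literal(Mose - say, Mose - Lord, say - indeed, indeed - away)
focus_id <- which(V(toy)$name == "Mose")

# Returns the coordinates (xy) and each node's distance to the focal node
lay <- layout_with_focus(toy, v = focus_id)

data.frame(
  name     = V(toy)$name,
  distance = as.numeric(lay$distance),            # which circle the word sits on
  radius   = round(sqrt(rowSums(lay$xy^2)), 2)    # distance from the plot center
)
```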

Repeat with a change of the focus node and displaying all the words.

deg = degree(jg1)
focusnode <- which(V(jg1)$name == "say")
ggraph(jg1,layout = "focus", focus = focusnode) +
    draw_circle(col = "darkred", use = "focus", max.circle = 3) +
    geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66") +
    geom_node_point(aes(size = deg), shape = 21) +
    geom_node_text(aes(label = name, size = 3),
                 family = "serif", repel = F) +
    scale_edge_width_continuous(range = c(0.1, 2.0)) +
    scale_size_continuous(range = c(1,10)) +
    scale_fill_manual(values = got_palette) +
    coord_fixed() +
    theme_graph() +
    theme(legend.position = "bottom")

layout_with_centrality() works in a similar way. We can specify any centrality index (or numeric vector for that matter), and create a concentric layout where the most central nodes are put in the center and the most peripheral nodes in the biggest circle. The numeric attribute used for the layout is specified with the cent parameter. Here, we use the weighted degree of the words.

# the edge attribute is named cooc, not weight, so it is passed explicitly
ggraph(jg1,layout = "centrality", cent = graph.strength(jg1, weights = E(jg1)$cooc)) +
  geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66") +
  geom_node_point(aes(color = deg, size = deg), shape = 20) +
  geom_node_text(aes(label = name, size = 3*deg),
                 family = "serif", repel = TRUE) +
  scale_edge_width_continuous(range = c(0,2)) +
  scale_size_continuous(range = c(1,7)) +
  coord_fixed() +
  theme_graph(title_family = "Arial", title_size = 16) +
  labs(title = "Surah Taa-Haa Word Co-Occurrence Network",
       subtitle = "Weighted Degree Centrality Layout") +
  theme(legend.position = "bottom")
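Weighted degree (strength) sums the edge weights instead of counting edges. The sketch below uses only the first three edges printed in the Edge Data earlier (Mose, Lord and Allah pointing to “say” with cooc 16, 15 and 10); everything else about `toy` is made up. Note that igraph only picks up weights automatically from an edge attribute named weight, so cooc has to be passed explicitly.

```r
library(igraph)

# Three edges into "say", with co-occurrence counts as in the edge table above
toy <- graph_from_data_frame(
  data.frame(from = c("Mose", "Lord", "Allah"),
             to   = c("say", "say", "say"),
             cooc = c(16, 15, 10)),
  directed = TRUE
)

degree(toy, mode = "all")                               # "say" has 3 edges
strength(toy, mode = "all", weights = E(toy)$cooc)      # "say": 16 + 15 + 10 = 41
strength(toy, mode = "all")                             # no weight attribute: same as degree
```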

We repeat with betweenness centrality.

ggraph(jg1,layout = "centrality", 
       cent = betweenness(jg1, directed = F, normalized = T)) +
  geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66") +
  geom_node_point(aes(size = 2*deg), shape = 21) +
  geom_node_text(aes(label = name, size = 5*deg),
                 family = "serif", repel = T) +
  scale_edge_width_continuous(range = c(0,2)) +
  scale_size_continuous(range = c(1,7)) +
  coord_fixed() +
  theme_graph(title_family = "Arial", title_size = 16) +
  labs(title = "Surah Taa-Haa Word Co-Occurrence Network",
       subtitle = "Betweenness Centrality Layout with Degree for Size") +
  theme(legend.position = "bottom")

9.5.2 Stress Layout and Clustering

Focus again on gd and gu. Some clustering functions do not work on directed graphs. We show two different examples here.

cld <- clusters(gd)
V(gd)$clu <- as.character(cld$membership)
V(gd)$size <- graph.strength(gd)
gd

## IGRAPH 0b65863 DN-- 41 50 -- 
## + attr: name (v/c), clu (v/c), size (v/n), cooc (e/n)
## + edges from 0b65863 (vertex names):
##  [1] Mose    ->say        Lord    ->say        Allah   ->say       
##  [4] say     ->so         indeed  ->say        Lord    ->so        
##  [7] say     ->throw      indeed  ->Lord       say     ->when      
## [10] enduring->more       Allah   ->fear       Lord    ->Mose      
## [13] day     ->say        people  ->say        Pharaoh ->say       
## [16] say     ->see        indeed  ->so         so      ->then      
## [19] cast    ->enemy      away    ->indeed     come    ->Mose      
## [22] Merciful->most       Mose    ->Pharaoh    more    ->punishment
## + ... omitted several edges

ggraph(gd,layout = "stress") +
    geom_edge_link0(aes(width=cooc), edge_color="grey66") +
    geom_node_point(aes(fill=clu, size=size), shape=21, col="grey25") +
    geom_node_text(aes(size=2.5, label=name), family = "serif", repel=T) +
    scale_edge_width_continuous(range=c(0.1, 2.0)) +
    scale_size_continuous(range=c(1,10)) +
    scale_fill_manual(values=got_palette) +
    theme_graph(title_size = 16, subtitle_size = 14) +
    labs(title = "Surah Taa-Haa Word Network",
         subtitle = "Directed With Clusters")+
    theme(legend.position = "bottom")

We repeat with the undirected graph gu and cluster_louvain(), which does not work with directed graphs. gu does not have edge attributes, so we remove aes(width=cooc).

clu <- cluster_louvain(gu)
V(gu)$clu <- as.character(clu$membership)
V(gu)$size <- graph.strength(gu)
gu

## IGRAPH 0bec5fe UN-- 41 50 -- 
## + attr: name (v/c), color (v/c), size (v/n), size.discrete (v/n),
## | random (v/n), clu (v/c)
## + edges from 0bec5fe (vertex names):
##  [1] Mose  --Lord    Mose  --say     Mose  --Pharaoh Mose  --come   
##  [5] Lord  --say     Lord  --indeed  Lord  --so      Allah --say    
##  [9] Allah --fear    say   --indeed  say   --day     say   --people 
## [13] say   --Pharaoh say   --so      say   --enemy   say   --fear   
## [17] say   --fire    say   --go      say   --hasten  say   --throw  
## [21] say   --when    say   --see     say   --then    say   --thus   
## [25] indeed--so      indeed--away    indeed--come    indeed--fire   
## + ... omitted several edges

ggraph(gu,layout = "stress") +
    geom_edge_link0(edge_color="grey66") +
    geom_node_point(aes(fill=clu, size=size), shape=21, col="grey25") +
    geom_node_text(aes(size=2.5,label=name), family="serif", repel=T) +
    scale_edge_width_continuous(range=c(0.1, 2.0)) +
    scale_size_continuous(range=c(1,10)) +
    scale_fill_manual(values=got_palette) +
    theme_graph(title_size = 16, subtitle_size = 14) +
    labs(title = "Surah Taa-Haa Word Network",
         subtitle = "Undirected With Clusters") +
    theme(legend.position = "bottom")

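Interestingly, the two functions give different numbers of groups: 4 for gd and 8 for gu. They measure different things: clusters() returns the connected components of the graph, while cluster_louvain() looks for densely connected communities, which can split a single component into several groups.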
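The two notions of a “cluster” can be compared on a small made-up graph `toy`: two triangles joined by a single edge. It forms one connected component, but the two triangles are natural communities.

```r
library(igraph)

# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the bridge c-d
toy <- graph_from_literal(a - b, b - c, c - a, c - d, d - e, e - f, f - d)

components(toy)$no       # connected components (what clusters() reports)

set.seed(123)            # cluster_louvain() may process nodes in random order
comm <- cluster_louvain(toy)
length(comm)             # number of communities found
membership(comm)         # community id for each node
```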

9.5.3 Focus Layout and Clustering - Focus on Selected Words

Earlier we have shown how layout_with_focus() allows us to focus the network on a specific word and order all other nodes in concentric circles (depending on distance) around it. Here we combine it with clustering. The limitation is that it only works with a connected network, which is why we use jg1.

clu <- clusters(jg1)
V(jg1)$clu <- as.character(clu$membership)
V(jg1)$size <- graph.strength(jg1)
jg1

## IGRAPH 13af31b DN-- 32 44 -- 
## + attr: name (v/c), clu (v/c), size (v/n), cooc (e/n)
## + edges from 13af31b (vertex names):
##  [1] Mose     ->say     Lord     ->say     Allah    ->say     say      ->so     
##  [5] indeed   ->say     Lord     ->so      say      ->throw   indeed   ->Lord   
##  [9] say      ->when    Allah    ->fear    Lord     ->Mose    day      ->say    
## [13] people   ->say     Pharaoh  ->say     say      ->see     indeed   ->so     
## [17] so       ->then    cast     ->enemy   away     ->indeed  come     ->Mose   
## [21] Mose     ->Pharaoh cast     ->river   enemy    ->river   enemy    ->say    
## [25] fear     ->say     fire     ->say     go       ->say     hasten   ->say    
## [29] follow   ->so      messenger->so      Pharaoh  ->so      come     ->then   
## + ... omitted several edges

focusnode <- which(V(jg1)$name == "Mose")
ggraph(jg1, "focus", focus=focusnode) +
  draw_circle(col = "darkblue", use = "focus", max.circle = 3) +
  geom_edge_link0(aes(width=cooc), edge_color="grey66") +
  geom_node_point(aes(fill=clu, size=size), shape=21, col="grey25") +
  geom_node_text(aes(size=3,label=name), repel=T) +
  scale_edge_width_continuous(range=c(0.1,2.0)) +
  scale_size_continuous(range=c(1,10)) +
  scale_fill_manual(values=got_palette) +
  theme_graph(title_size = 16, subtitle_size = 14) +
  labs(title = "Surah Taa-Haa Directed Word Network",
       subtitle = "Focus Layout With All Words") +
    theme(legend.position = "bottom")

ggraph(jg1, "focus", focus=focusnode) +
  draw_circle(col = "darkblue", use = "focus", max.circle = 3) +
  geom_edge_link0(aes(width=cooc), edge_color="grey66") +
  geom_node_point(aes(fill=clu, size=size), shape=21, col="grey25") +
  geom_node_text(aes(filter=(name=="Mose"), size=size,label=name), repel=T) +
  scale_edge_width_continuous(range=c(0.1,2.0)) +
  scale_size_continuous(range=c(1,10)) +
  scale_fill_manual(values=got_palette) +
  theme_graph(title_size = 16, subtitle_size = 14) +
  labs(title = "Surah Taa-Haa Directed Word Network",
       subtitle = "Focus Layout With Only The Focus Word") +
    theme(legend.position = "bottom")

Based on a similar principle is layout_with_centrality(). We have shown this earlier. But here we repeat with clustering (using jg2, the undirected version of the largest component) and also look at the coreness centrality measure. Earlier we also saw that cluster_louvain() gives different results than clusters().

clu <- cluster_louvain(jg2)
V(jg2)$clu <- as.character(clu$membership)
V(jg2)$size <- graph.strength(jg2)
jg2

## IGRAPH 13aff47 UN-- 32 44 -- 
## + attr: name (v/c), clu (v/c), size (v/n)
## + edges from 13aff47 (vertex names):
##  [1] Mose   --Lord    Mose   --say     Mose   --Pharaoh Mose   --come   
##  [5] Lord   --say     Lord   --indeed  Lord   --so      Allah  --say    
##  [9] Allah  --fear    say    --indeed  say    --day     say    --people 
## [13] say    --Pharaoh say    --so      say    --enemy   say    --fear   
## [17] say    --fire    say    --go      say    --hasten  say    --throw  
## [21] say    --when    say    --see     say    --then    say    --thus   
## [25] indeed --so      indeed --away    indeed --come    indeed --fire   
## [29] indeed --do      indeed --therein Pharaoh--so      so     --follow 
## + ... omitted several edges

ggraph(jg2, "centrality", cent=graph.strength(jg2)) +
  draw_circle(col = "darkblue", use = "cent") +
  geom_edge_link0(edge_color="grey66") +
  geom_node_point(aes(fill=clu, size=size), shape=21, col="grey25") +
  geom_node_text(aes(size=2.5,label=name), repel=T) +
  scale_edge_width_continuous(range=c(0.1,2.0)) +
  scale_size_continuous(range=c(1,10)) +
  scale_fill_manual(values=got_palette) +
  theme_graph(title_size = 16, subtitle_size = 14) +
  labs(title = "Surah Taa-Haa Undirected Word Network",
       subtitle = "Centrality Layout : graph.strength with clusters") +
    theme(legend.position = "bottom")

ggraph(jg2, "centrality", cent=graph.coreness(jg2)) +
  draw_circle(col = "darkblue", use = "cent") +
  geom_edge_link0(edge_color="grey66") +
  geom_node_point(aes(fill=clu, size=size), shape=21, col="grey25") +
  geom_node_text(aes(size=2.5,label=name), repel=T) +
  scale_edge_width_continuous(range=c(0.1,2.0)) +
  scale_size_continuous(range=c(1,10)) +
  scale_fill_manual(values=got_palette) +
  theme_graph(title_size = 16, subtitle_size = 14) +
  labs(title = "Surah Taa-Haa Undirected Word Network",
       subtitle = "Centrality Layout : graph.coreness with clusters") +
    theme(legend.position = "bottom")
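In the coreness layout, words in the densest core end up in the center. As a reminder of what coreness measures, here is a minimal sketch on a made-up graph `toy`: a triangle with one extra word attached.

```r
library(igraph)

# Hypothetical toy graph: a triangle (say, Lord, indeed) plus a pendant word
toy <- graph_from_literal(say - Lord, Lord - indeed, indeed - say, say - fear)

# Triangle members keep two neighbors inside the core, so they are in the 2-core;
# "fear" has a single neighbor and only reaches the 1-core
coreness(toy)
```

Because coreness takes only a few distinct integer values, many words share the same circle in this layout, unlike the layout based on graph.strength().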

10 Summary

We covered the characteristics of networks using our example of the word co-occurrence network from Surah Taa-Haa. We showed how to use the functions from igraph and tidygraph that measure these characteristics. We also showed different ways to use ggraph and its layout formats to visualize the network and its related measures.

The main objective is to understand the position and/or importance of a node in the network. The individual characteristics of nodes can be described by

Degree
Clustering
Distance to other nodes
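
The three node characteristics listed above can be sketched with igraph as follows. The toy graph is hypothetical and only serves to keep the numbers easy to check by hand.

```r
library(igraph)

# Hypothetical toy network: a triangle (a, b, c) with a pendant node d attached to c.
toy <- graph_from_literal(a - b, b - c, c - a, c - d)

# Degree: number of edges attached to each node.
node_degree <- degree(toy)

# Local clustering: share of a node's neighbour pairs that are themselves linked.
# A node with fewer than two neighbours has no pairs, so its value is NaN.
node_clustering <- setNames(transitivity(toy, type = "local"), V(toy)$name)

# Distance: length of the shortest path between every pair of nodes.
node_distance <- distances(toy)

print(node_degree)
print(round(node_clustering, 2))
print(node_distance["a", "d"])
```

Node c has the highest degree (3) but the lowest defined clustering (1/3), because only one of its three neighbour pairs is linked.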

The centrality of a node reflects its influence, power, and importance. There are four different types of centrality measures.

Degree - connectedness
Eigenvector - influence, prestige, “not what you know, but who you know”
Betweenness - importance as an intermediary, connector
Closeness, Decay - ease of reaching other nodes
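
As a rough illustration of how the four measures can rank the same nodes differently, the sketch below puts them side by side for a hypothetical five-node graph (not the word network) using the corresponding igraph functions.

```r
library(igraph)

# Hypothetical toy network: a is a small hub, d bridges the hub to e.
toy <- graph_from_literal(a - b, a - c, a - d, d - e)

centrality_table <- data.frame(
  degree      = degree(toy),                  # connectedness
  eigenvector = eigen_centrality(toy)$vector, # influence via well-connected neighbours
  betweenness = betweenness(toy),             # role as an intermediary
  closeness   = closeness(toy)                # ease of reaching other nodes
)

print(round(centrality_table, 3))
```

Node d has a lower degree than a, but it is the only route to e, which is why it still carries a betweenness of 3.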

Simplifying the Complexity

Global patterns of networks: degree distributions
Local patterns: Clustering
Positions in networks: Centrality
Segregation Patterns: node types and homophily
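
A minimal sketch of these four views on a hypothetical toy graph is given below. The `group` attribute is invented for the example, and the share of same-group edges is only a simple stand-in for a homophily measure.

```r
library(igraph)

# Hypothetical toy network: a triangle (a, b, c) with a pendant node d attached to c.
toy <- graph_from_literal(a - b, b - c, c - a, c - d)

# Hypothetical node types, used only to illustrate homophily.
V(toy)$group <- unname(c(a = "x", b = "x", c = "y", d = "y")[V(toy)$name])

# Global pattern: relative frequency of each degree (first entry is degree 0).
degree_dist <- degree_distribution(toy)

# Local pattern: global clustering coefficient (transitivity).
global_clustering <- transitivity(toy, type = "global")

# Position: the node with the highest degree.
top_node <- names(which.max(degree(toy)))

# Segregation: share of edges that join two nodes of the same group.
edge_ends <- ends(toy, E(toy))
node_group <- setNames(V(toy)$group, V(toy)$name)
same_group_share <- mean(node_group[edge_ends[, 1]] == node_group[edge_ends[, 2]])

print(degree_dist)
print(global_clustering)
print(top_node)
print(same_group_share)
```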

We ended the tutorial with examples of using ggraph with the stress layout and with layout_with_focus() and layout_with_centrality() from the graphlayouts package. These examples will be very useful in our future work.

In conclusion, we refer to some interesting points about areas of research in networks.¹⁰

Why Study Networks?

Many economic, political, and social interactions are shaped by the local structure of relationships:
- trade of goods and services
- sharing of information, favors, risk, …
- transmission of viruses, opinions…
- access to info about jobs…
- choices of behavior, education, …
- political alliances, trade alliances…
Social networks influence behavior
- crime, employment, human capital, voting, smoking,…
- networks exhibit heterogeneity, but also have enough underlying structure to model
Pure interest in social structure
- understand social network structure

Primary Questions:

What do we know about network structure?
How do networks form? How do the efficient networks form?
How do networks influence behavior?
- Diffusion, learning, peer effects, trade, inequality, polarization …
Dynamics, feedback,

What are important areas for future research?

Three Areas for Research

Theory
- network formation, dynamics, design…
- how networks influence behavior
- coevolution?
Empirical and experimental work
- observe networks, patterns, influence
- test theory and identify regularities
Methodology
- how to measure and analyze networks

The points in bold and italics are what we can relate to our work on #qurananalytics.

11 Appendix 1 : Network Characteristic Measures

We run a selection of the network measures listed in the tidygraph help topic graph_measures ("Graph measurements"), calling the igraph function that each one wraps.

wordnetwork %>% edge_connectivity()

## [1] 0

# graph_adhesion() Gives the minimum edge connectivity. Wraps igraph::edge_connectivity()

# wordnetwork %>% assortativity(1) # graph_assortativity() Measures the propensity of similar nodes to be connected. Wraps igraph::assortativity()

wordnetwork %>% automorphisms()

## $nof_nodes
## [1] 33
## 
## $nof_leaf_nodes
## [1] 5
## 
## $nof_bad_nodes
## [1] 0
## 
## $nof_canupdates
## [1] 1
## 
## $max_level
## [1] 13
## 
## $group_size
## [1] "82944"

# graph_automorphisms: Calculate the number of automorphisms of the graph. Wraps igraph::automorphisms()

wordnetwork %>% clique_num()

## [1] 4

# graph_clique_num: Get the size of the largest clique. Wraps igraph::clique_num()

wordnetwork %>% count_max_cliques()

## [1] 36

# graph_clique_count: Get the number of maximal cliques in the graph. Wraps igraph::count_max_cliques()

wordnetwork %>% count_components()

## [1] 4

# graph_component_count: Count the number of unconnected components in the graph. Wraps igraph::count_components()
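
The component count helps explain why edge_connectivity() returns 0 for the word network: a graph that is already split into several components needs no edge removed to disconnect it. A minimal sketch on hypothetical toy graphs:

```r
library(igraph)

# Hypothetical disconnected network: a triangle plus a separate pair of nodes.
split_graph <- graph_from_literal(a - b, b - c, c - a, d - e)

n_components   <- count_components(split_graph)
adhesion_split <- edge_connectivity(split_graph)

# A single connected triangle for comparison: two edges must go to isolate a node.
triangle          <- make_ring(3)
adhesion_triangle <- edge_connectivity(triangle)

print(c(components = n_components,
        adhesion_split = adhesion_split,
        adhesion_triangle = adhesion_triangle))
```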

wordnetwork %>% count_motifs()

## [1] 241

# graph_motif_count: Count the number of motifs in a graph. Wraps igraph::count_motifs()

wordnetwork %>% diameter()

## [1] 5

# graph_diameter: Measures the length of the longest geodesic. Wraps igraph::diameter()

wordnetwork %>% girth()

## $girth
## [1] 3
## 
## $circle
## + 3/41 vertices, named, from 0b65863:
## [1] Lord Mose say

# graph_girth: Measures the length of the shortest circle in the graph. Wraps igraph::girth()

wordnetwork %>% radius() # graph_radius: Measures the smallest eccentricity in the graph. Wraps igraph::radius()

## [1] 1

wordnetwork %>% dyad_census()

## $mut
## [1] 0
## 
## $asym
## [1] 50
## 
## $null
## [1] 770

# graph_mutual_count: Counts the number of mutually connected nodes ($mut). Wraps igraph::dyad_census()

# graph_asym_count: Counts the number of asymmetrically connected nodes ($asym). Wraps igraph::dyad_census()

# graph_unconn_count: Counts the number of unconnected node pairs ($null). Wraps igraph::dyad_census()
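
The three dyad census counts always add up to the number of node pairs. In the output above, 0 + 50 + 770 = 820, which is choose(41, 2) for the 41 nodes reported by gorder(). The sketch below shows the same bookkeeping on a hypothetical three-node directed graph.

```r
library(igraph)

# Hypothetical directed network: a and b point at each other, b points at c.
toy_edges <- data.frame(from = c("a", "b", "b"),
                        to   = c("b", "a", "c"))
toy <- graph_from_data_frame(toy_edges, directed = TRUE)

census <- dyad_census(toy)

# Every pair of nodes is mutual, asymmetric, or unconnected (null).
n_pairs <- choose(gorder(toy), 2)
census_total <- census$mut + census$asym + census$null

print(unlist(census))
print(c(pairs = n_pairs, census_total = census_total))
```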

wordnetwork %>% gsize()

## [1] 50

# graph_size: Counts the number of edges in the graph. Wraps igraph::gsize()

wordnetwork %>% gorder()

## [1] 41

# graph_order: Counts the number of nodes in the graph. Wraps igraph::gorder()

wordnetwork %>% reciprocity()

## [1] 0

# graph_reciprocity: Measures the proportion of mutual connections in the graph. Wraps igraph::reciprocity()

wordnetwork %>% min_cut()

## [1] 0

# graph_min_cut: Calculates the minimum number of edges to remove in order to split the graph into two clusters. Wraps igraph::min_cut()

wordnetwork %>% mean_distance()

## [1] 2.089202

# graph_mean_dist: Calculates the mean distance between all node pairs in the graph. Wraps igraph::mean_distance()

# wordnetwork %>% graph_modularity: Calculates the modularity of the graph contingent on a provided node grouping
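
The calls in this appendix use the igraph functions directly. As a hedged sketch of the tidygraph route, the graph_* wrappers named in the comments can be evaluated with with_graph(), which supplies the graph context they need. The five-node ring below is a hypothetical stand-in for wordnetwork.

```r
library(igraph)
library(tidygraph)

# Hypothetical stand-in graph: an undirected ring of five nodes.
ring <- as_tbl_graph(make_ring(5))

# with_graph() evaluates tidygraph algorithms in the context of the given graph.
ring_measures <- with_graph(ring, c(
  order      = graph_order(),           # number of nodes
  size       = graph_size(),            # number of edges
  diameter   = graph_diameter(),        # longest geodesic
  components = graph_component_count()  # number of components
))

print(ring_measures)
```

The same wrappers can also be called inside mutate() when the value should be stored alongside the node or edge data.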

12 Appendix 2 : Network Centrality Measures

We run a selection of the measures listed in the tidygraph help topic centrality ("Calculate node and edge centrality").

The centrality of a node measures the importance of that node in the network. Because the concept of importance is ill-defined and depends on the network and the questions under consideration, many centrality measures exist. tidygraph provides a consistent set of wrappers for all the centrality measures implemented in igraph for use inside dplyr::mutate() and other relevant verbs. All the functions provided by tidygraph follow a consistent naming scheme and automatically call the underlying function on the graph, returning a vector of measures ready to be added to the node data. Further, tidygraph provides access to the netrankr engine for centrality calculations and defines a number of centrality measures based on it, as well as providing a manual mode for specifying more-or-less any centrality score.
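
A minimal sketch of that mutate() workflow is shown below. The toy graph and the column names are hypothetical; the centrality_* functions are the tidygraph wrappers described above.

```r
library(igraph)
library(tidygraph)

# Hypothetical toy network: a is a small hub, d bridges the hub to e.
toy <- as_tbl_graph(graph_from_literal(a - b, a - c, a - d, d - e))

# Each wrapper returns one value per node, so it can be added as a node column.
toy_scored <- toy %>%
  activate(nodes) %>%
  mutate(
    degree      = centrality_degree(),
    betweenness = centrality_betweenness(),
    closeness   = centrality_closeness()
  )

# Extract the node data as a regular tibble for inspection.
node_table <- as_tibble(toy_scored)
print(node_table)
```

Keeping the scores in the node table means they can be mapped straight to aesthetics such as size or fill in ggraph.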

As in Appendix 1, each call below uses the igraph function directly, and the trailing comment names the corresponding tidygraph wrapper.

wordnetwork %>% alpha_centrality() # centrality_alpha(): Wrapper for igraph::alpha_centrality()

##       Mose       Lord      Allah        say     indeed   enduring        day 
##         11          9          1         54          8          1          1 
##     people    Pharaoh         so       cast       away       come   Merciful 
##          1         12         86          1          1          1          1 
##       more      enemy       fear       fire         go     hasten     follow 
##          2          2          2          4          1          1          1 
##  messenger         be      bring     family       find         do      throw 
##          1          1          1          1          1          1        141 
##       when        see       then       most punishment      river     severe 
##         55         55        142          2          3          4          3 
##     surely    therein       thus      deity   guidance       here 
##          3          9         55          2          5          5

wordnetwork %>% authority_score() # centrality_authority: Wrapper for igraph::authority_score()

## $vector
##         Mose         Lord        Allah          say       indeed     enduring 
## 1.363766e-01 1.178793e-01 3.188827e-17 1.000000e+00 1.374766e-01 1.594413e-17 
##          day       people      Pharaoh           so         cast         away 
## 1.594413e-17 1.594413e-17 7.538405e-02 4.458380e-01 3.188827e-17 1.594413e-17 
##         come     Merciful         more        enemy         fear         fire 
## 0.000000e+00 1.594413e-17 0.000000e+00 6.184079e-03 7.538405e-02 6.377654e-17 
##           go       hasten       follow    messenger           be        bring 
## 1.594413e-17 1.594413e-17 1.594413e-17 1.594413e-17 1.594413e-17 1.594413e-17 
##       family         find           do        throw         when          see 
## 1.594413e-17 1.594413e-17 1.594413e-17 6.393580e-02 5.318703e-02 5.318703e-02 
##         then         most   punishment        river       severe       surely 
## 8.939971e-02 0.000000e+00 0.000000e+00 8.203431e-02 0.000000e+00 0.000000e+00 
##      therein         thus        deity     guidance         here 
## 1.178793e-01 5.318703e-02 0.000000e+00 9.273861e-02 9.273861e-02 
## 
## $value
## [1] 14.26541
## 
## $options
## $options$bmat
## [1] "I"
## 
## $options$n
## [1] 41
## 
## $options$which
## [1] "LM"
## 
## $options$nev
## [1] 1
## 
## $options$tol
## [1] 0
## 
## $options$ncv
## [1] 0
## 
## $options$ldv
## [1] 0
## 
## $options$ishift
## [1] 1
## 
## $options$maxiter
## [1] 1000
## 
## $options$nb
## [1] 1
## 
## $options$mode
## [1] 1
## 
## $options$start
## [1] 1
## 
## $options$sigma
## [1] 0
## 
## $options$sigmai
## [1] 0
## 
## $options$info
## [1] 0
## 
## $options$iter
## [1] 1
## 
## $options$nconv
## [1] 1
## 
## $options$numop
## [1] 20
## 
## $options$numopb
## [1] 0
## 
## $options$numreo
## [1] 18

wordnetwork %>% estimate_betweenness(cutoff = NULL) # centrality_betweenness: Wrapper for igraph::betweenness() and igraph::estimate_betweenness()

##       Mose       Lord      Allah        say     indeed   enduring        day 
##          0          0          0          0          0          0          0 
##     people    Pharaoh         so       cast       away       come   Merciful 
##          0          0          0          0          0          0          0 
##       more      enemy       fear       fire         go     hasten     follow 
##          0          0          0          0          0          0          0 
##  messenger         be      bring     family       find         do      throw 
##          0          0          0          0          0          0          0 
##       when        see       then       most punishment      river     severe 
##          0          0          0          0          0          0          0 
##     surely    therein       thus      deity   guidance       here 
##          0          0          0          0          0          0

wordnetwork %>% power_centrality() # centrality_power: Wrapper for igraph::power_centrality()

##       Mose       Lord      Allah        say     indeed   enduring        day 
## 0.79871626 1.27068496 0.68980041 0.29044228 1.77895894 0.14522114 0.32674756 
##     people    Pharaoh         so       cast       away       come   Merciful 
## 0.32674756 0.43566341 0.07261057 0.43566341 1.81526423 2.68659105 0.03630528 
##       more      enemy       fear       fire         go     hasten     follow 
## 0.10891585 0.36305285 0.32674756 2.21462236 0.32674756 0.32674756 0.10891585 
##  messenger         be      bring     family       find         do      throw 
## 0.10891585 0.03630528 2.25092764 2.25092764 2.25092764 1.81526423 0.00000000 
##       when        see       then       most punishment      river     severe 
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 
##     surely    therein       thus      deity   guidance       here 
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000

wordnetwork %>% estimate_closeness(cutoff = NULL) # centrality_closeness: Wrapper for igraph::closeness() and igraph::estimate_closeness()

##         Mose         Lord        Allah          say       indeed     enduring 
## 0.0006410256 0.0006578947 0.0006410256 0.0007142857 0.0006756757 0.0006250000 
##          day       people      Pharaoh           so         cast         away 
## 0.0006250000 0.0006250000 0.0006410256 0.0006410256 0.0006410256 0.0006250000 
##         come     Merciful         more        enemy         fear         fire 
## 0.0006578947 0.0006250000 0.0006578947 0.0006410256 0.0006250000 0.0006756757 
##           go       hasten       follow    messenger           be        bring 
## 0.0006250000 0.0006250000 0.0006250000 0.0006250000 0.0006250000 0.0006250000 
##       family         find           do        throw         when          see 
## 0.0006250000 0.0006250000 0.0006250000 0.0006097561 0.0006097561 0.0006097561 
##         then         most   punishment        river       severe       surely 
## 0.0006097561 0.0006097561 0.0006097561 0.0006097561 0.0006097561 0.0006097561 
##      therein         thus        deity     guidance         here 
## 0.0006097561 0.0006097561 0.0006097561 0.0006097561 0.0006097561

wordnetwork %>% eigen_centrality() # centrality_eigen: Wrapper for igraph::eigen_centrality()

## $vector
##       Mose       Lord      Allah        say     indeed   enduring        day 
## 0.38927855 0.47638549 0.22895116 1.00000000 0.55703876 0.00000000 0.18629801 
##     people    Pharaoh         so       cast       away       come   Merciful 
## 0.18629801 0.37261027 0.61079790 0.04663107 0.10377522 0.24055174 0.00000000 
##       more      enemy       fear       fire         go     hasten     follow 
## 0.00000000 0.20367257 0.22895116 0.35098055 0.18629801 0.18629801 0.11379044 
##  messenger         be      bring     family       find         do      throw 
## 0.11379044 0.00000000 0.06538698 0.06538698 0.06538698 0.10377522 0.30008845 
##       when        see       then       most punishment      river     severe 
## 0.18629801 0.18629801 0.34490276 0.00000000 0.00000000 0.04663107 0.00000000 
##     surely    therein       thus      deity   guidance       here 
## 0.00000000 0.10377522 0.18629801 0.00000000 0.06538698 0.06538698 
## 
## $value
## [1] 5.367744
## 
## $options
## $options$bmat
## [1] "I"
## 
## $options$n
## [1] 41
## 
## $options$which
## [1] "LA"
## 
## $options$nev
## [1] 1
## 
## $options$tol
## [1] 0
## 
## $options$ncv
## [1] 0
## 
## $options$ldv
## [1] 0
## 
## $options$ishift
## [1] 1
## 
## $options$maxiter
## [1] 1000
## 
## $options$nb
## [1] 1
## 
## $options$mode
## [1] 1
## 
## $options$start
## [1] 1
## 
## $options$sigma
## [1] 0
## 
## $options$sigmai
## [1] 0
## 
## $options$info
## [1] 0
## 
## $options$iter
## [1] 1
## 
## $options$nconv
## [1] 1
## 
## $options$numop
## [1] 20
## 
## $options$numopb
## [1] 0
## 
## $options$numreo
## [1] 13

wordnetwork %>% hub_score() # centrality_hub: Wrapper for igraph::hub_score()

## $vector
##       Mose       Lord      Allah        say     indeed   enduring        day 
## 0.63950176 0.94090016 0.63950176 0.45119892 1.00000000 0.00000000 0.59467291 
##     people    Pharaoh         so       cast       away       come   Merciful 
## 0.59467291 0.85980069 0.09118447 0.05246109 0.08175364 0.21601669 0.00000000 
##       more      enemy       fear       fire         go     hasten     follow 
## 0.00000000 0.64345649 0.59467291 0.78672482 0.59467291 0.59467291 0.26512778 
##  messenger         be      bring     family       find         do      throw 
## 0.26512778 0.00000000 0.00000000 0.00000000 0.00000000 0.08175364 0.00000000 
##       when        see       then       most punishment      river     severe 
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 
##     surely    therein       thus      deity   guidance       here 
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 
## 
## $value
## [1] 14.26541
## 
## $options
## $options$bmat
## [1] "I"
## 
## $options$n
## [1] 41
## 
## $options$which
## [1] "LM"
## 
## $options$nev
## [1] 1
## 
## $options$tol
## [1] 0
## 
## $options$ncv
## [1] 0
## 
## $options$ldv
## [1] 0
## 
## $options$ishift
## [1] 1
## 
## $options$maxiter
## [1] 1000
## 
## $options$nb
## [1] 1
## 
## $options$mode
## [1] 1
## 
## $options$start
## [1] 1
## 
## $options$sigma
## [1] 0
## 
## $options$sigmai
## [1] 0
## 
## $options$info
## [1] 0
## 
## $options$iter
## [1] 1
## 
## $options$nconv
## [1] 1
## 
## $options$numop
## [1] 20
## 
## $options$numopb
## [1] 0
## 
## $options$numreo
## [1] 16

wordnetwork %>% page_rank() # centrality_pagerank: Wrapper for igraph::page_rank()

## $vector
##       Mose       Lord      Allah        say     indeed   enduring        day 
## 0.02129679 0.02132716 0.01188631 0.12134718 0.04442754 0.01188631 0.01188631 
##     people    Pharaoh         so       cast       away       come   Merciful 
## 0.01188631 0.02093744 0.07366583 0.01188631 0.01188631 0.01188631 0.01188631 
##       more      enemy       fear       fire         go     hasten     follow 
## 0.02198966 0.01693799 0.01693799 0.04219638 0.01188631 0.01188631 0.01188631 
##  messenger         be      bring     family       find         do      throw 
## 0.01188631 0.01188631 0.01188631 0.01188631 0.01188631 0.01188631 0.06038514 
##       when        see       then       most punishment      river     severe 
## 0.02907716 0.02907716 0.06375292 0.02198966 0.01811671 0.02413663 0.01811671 
##     surely    therein       thus      deity   guidance       here 
## 0.01811671 0.02132716 0.02907716 0.02198966 0.02085304 0.02085304 
## 
## $value
## [1] 1
## 
## $options
## NULL

wordnetwork %>% subgraph_centrality() # centrality_subgraph: Wrapper for igraph::subgraph_centrality()

##       Mose       Lord      Allah        say     indeed   enduring        day 
##          0          0          2          0          0          5          1 
##     people    Pharaoh         so       cast       away       come   Merciful 
##          1          0          0          3          1          1          2 
##       more      enemy       fear       fire         go     hasten     follow 
##          0          0          0          0          1          1          1 
##  messenger         be      bring     family       find         do      throw 
##          1          2          6          6          6          1          0 
##       when        see       then       most punishment      river     severe 
##          0          0          0          0          0          0          0 
##     surely    therein       thus      deity   guidance       here 
##          0          0          0          0          0          0

wordnetwork %>% strength() # centrality_degree: Wrapper for igraph::degree() and igraph::strength()

##       Mose       Lord      Allah        say     indeed   enduring        day 
##          4          4          2         18          8          1          1 
##     people    Pharaoh         so       cast       away       come   Merciful 
##          1          3          8          2          1          3          1 
##       more      enemy       fear       fire         go     hasten     follow 
##          4          3          2          7          1          1          1 
##  messenger         be      bring     family       find         do      throw 
##          1          1          1          1          1          1          2 
##       when        see       then       most punishment      river     severe 
##          1          1          3          1          1          2          1 
##     surely    therein       thus      deity   guidance       here 
##          1          1          1          1          1          1

wordnetwork %>% edge_betweenness() # centrality_edge_betweenness: Wrapper for igraph::edge_betweenness()

##  [1]  8.833333  5.000000  7.000000 11.500000 17.333333  2.000000 17.166667
##  [8] 22.000000 20.000000  4.000000  1.000000 16.000000  7.000000  7.000000
## [15]  5.000000 20.000000  9.333333  5.500000  8.000000 12.000000  4.333333
## [22]  1.000000 10.500000  2.000000  1.000000  1.000000 14.000000  7.000000
## [29] 26.000000  7.000000  7.000000  2.000000  3.000000  3.000000  2.500000
## [36]  2.000000  1.000000 16.500000  8.000000  5.833333 20.000000  1.000000
## [43] 15.000000 15.000000 15.000000  4.000000  4.000000  6.666667 12.000000
## [50] 22.000000

There are other measures that tidygraph adapts from the netrankr package, which must be installed before they can be used. We list them here just to show the variety of centrality measures.

centrality_manual: manually specify a centrality score using the netrankr framework (netrankr)

centrality_closeness_harmonic: centrality based on inverse shortest path (netrankr)

centrality_closeness_residual: centrality based on 2-to-the-power-of negative shortest path (netrankr)

centrality_closeness_generalised: centrality based on alpha-to-the-power-of negative shortest path (netrankr)

centrality_integration: centrality based on 1 - (x - 1)/max(x) transformation of shortest path (netrankr)

centrality_communicability: centrality based on an exponential transformation of walk counts (netrankr)

centrality_communicability_odd: centrality based on an exponential transformation of odd walk counts (netrankr)

centrality_communicability_even: centrality based on an exponential transformation of even walk counts (netrankr)

centrality_subgraph_odd: subgraph centrality based on odd walk counts (netrankr)

centrality_subgraph_even: subgraph centrality based on even walk counts (netrankr)

centrality_katz: centrality based on walks penalizing distant nodes (netrankr)

centrality_betweenness_network: Betweenness centrality based on network flow (netrankr)

centrality_betweenness_current: Betweenness centrality based on current flow (netrankr)

centrality_betweenness_communicability: Betweenness centrality based on communicability (netrankr)

centrality_betweenness_rsp_simple: Betweenness centrality based on simple randomised shortest path dependencies (netrankr)

centrality_betweenness_rsp_net: Betweenness centrality based on net randomised shortest path dependencies (netrankr)

centrality_information: centrality based on inverse sum of resistance distance between nodes (netrankr)

centrality_decay: based on a power transformation of the shortest path (netrankr)

centrality_random_walk: centrality based on the inverse sum of expected random walk length between nodes (netrankr)

centrality_expected: Expected centrality ranking based on exact rank probability (netrankr)
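
To give a feel for one of these measures without installing netrankr, the sketch below computes a closeness score based on inverse shortest path lengths by hand from igraph::distances(). It illustrates the idea behind centrality_closeness_harmonic and is not the netrankr implementation. The toy graph is hypothetical and deliberately disconnected, like the word network.

```r
library(igraph)

# Hypothetical toy network with two components: a hub-and-bridge part and a pair.
toy <- graph_from_literal(a - b, a - c, a - d, d - e, f - g)

# Shortest path lengths; unreachable pairs are Inf.
path_length <- distances(toy)

# Inverse path lengths: unreachable pairs contribute 1 / Inf = 0.
inverse_length <- 1 / path_length
diag(inverse_length) <- 0  # a node does not count towards its own score

# Harmonic closeness: sum of inverse distances to all other nodes.
harmonic <- rowSums(inverse_length)

print(round(harmonic, 3))
```

Because unreachable nodes simply add zero, this score stays informative on disconnected graphs, where the plain closeness values become very small and hard to tell apart.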

Network Analysis of Word Co-occurrences in Surah Taa-Haa

Customized Tutorial on Characteristics and Statistics of Networks

Azman Hussin

2020-12-30