This post is a continuation of our previous post on our #qurananalytics project. In the summary, we mentioned that the study opens other avenues of investigation, such as looking into other Surahs of the Quran or analyzing different aspects of the Quran's structure. So, in this post we analyze Surah Taa-Haa (Surah 20), which details the story of Prophet Musa (Moses/Mose). It will also serve as a simple tutorial on the characteristics and statistics of networks. All our posts on #qurananalytics so far have used the tools of #networkscience with the igraph, tidygraph and ggraph packages. We will also discuss centrality and other network graph characteristics while exploring some layouts provided by tidygraph and ggraph.
packages=c('dplyr', 'tidyverse', 'udpipe', 'ggplot2', 'graphlayouts',
'igraph', 'tidygraph', 'ggraph', 'knitr', 'quRan')
for (p in packages){
if (! require (p,character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
# during first time model download execute the below line too
# udmodel <- udpipe_download_model(language = "english")
setwd("F:/RProjects")
# Load the model
udmodel <- udpipe_load_model(file = 'english-ewt-ud-2.5-191206.udpipe')
We start by annotating Surah Taa-Haa. The annotated dataframe can next be used for basic text analytics.
# Select the surah
Q01 <- quran_en_sahih %>% filter(surah == 20)
x <- udpipe_annotate(udmodel, x = Q01$text, doc_id = Q01$ayah_title)
x <- as.data.frame(x)
Analyzing multi-word expressions should be interesting. We can get multi-word expressions by looking at collocations (words following one another), at word co-occurrences within each sentence, or at co-occurrences of words that are close to one another.
Co-occurrences let us see how words are used either in the same sentence or next to each other. The udpipe package makes it easy to create co-occurrence graphs using the relevant POS (Parts of Speech) tags.
We look at how many times nouns, proper nouns, adjectives, verbs, adverbs, and numbers are used in the same verse.
cooccur <- cooccurrence(x = subset(x, upos %in% c("NOUN", "PROPN", "VERB",
"ADJ", "ADV", "NUM")),
term = "lemma",
group = c("doc_id", "paragraph_id", "sentence_id"))
head(cooccur)
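To make the idea of verse-level co-occurrence concrete, here is a rough base-R sketch that counts how often two lemmas share a verse. The three "verses" are invented toy data, and the names pair_list, pair_mat and toy_cooc are hypothetical; the counting logic only illustrates the concept and is not udpipe's actual implementation.

```r
# Toy data: each element holds the lemmas found in one (made-up) verse
verses <- list(
  c("Lord", "say", "Mose"),
  c("say", "Mose"),
  c("Lord", "fear")
)

# For every verse, list each unordered pair of distinct lemmas once
pair_list <- lapply(verses, function(w) {
  w <- sort(unique(w), method = "radix")  # fixed ordering so (a, b) and (b, a) coincide
  if (length(w) < 2) return(NULL)
  t(combn(w, 2))                          # one row per pair
})
pair_mat <- do.call(rbind, pair_list)

# Count in how many verses each pair appears together
toy_cooc <- aggregate(
  cooc ~ term1 + term2,
  data = data.frame(term1 = pair_mat[, 1], term2 = pair_mat[, 2], cooc = 1),
  FUN = sum
)
toy_cooc[order(-toy_cooc$cooc), ]
```

The resulting table has the same shape as the cooccur data frame (two term columns and a count), which is why it can be handed directly to graph_from_data_frame().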
The result can be easily visualized using the igraph and ggraph R packages. We chose the top 50 co-occurrences for the tutorial so that the plots are not cluttered.
library(igraph)
library(ggraph)
library(ggplot2)
wordnetwork <- head(cooccur, 50)
wordnetwork <- graph_from_data_frame(wordnetwork)
wordnetwork
## IGRAPH 0b65863 DN-- 41 50 --
## + attr: name (v/c), cooc (e/n)
## + edges from 0b65863 (vertex names):
## [1] Mose ->say Lord ->say Allah ->say
## [4] say ->so indeed ->say Lord ->so
## [7] say ->throw indeed ->Lord say ->when
## [10] enduring->more Allah ->fear Lord ->Mose
## [13] day ->say people ->say Pharaoh ->say
## [16] say ->see indeed ->so so ->then
## [19] cast ->enemy away ->indeed come ->Mose
## [22] Merciful->most Mose ->Pharaoh more ->punishment
## + ... omitted several edges
ggraph(wordnetwork, layout = "kk") +
geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_color = "deeppink") +
geom_node_point(aes(size = igraph::degree(wordnetwork)), shape = 1, color = "black") +
geom_node_text(aes(label = name), col = "darkblue", size = 3) +
labs(title = "Co-occurrences Within Sentence",
subtitle = "Top 50 Nouns, Names, Adjectives, Verbs, Adverbs",
caption = "Surah Taa-Haa (Saheeh International)")
The base wordnetwork graph is a directed network with 41 nodes and 50 edges (DN-- 41 50 --). The nodes have the name attribute and the edges have the cooc attribute (+ attr: name (v/c), cooc (e/n)).
The story is revealed by Allah (SWT). The main characters are Mose, his brother Aaron, Pharaoh, and the magicians. So the verb “say” dominates since it is a narrated story. It is interesting to see the strong link and occurrence of “fear” with “Allah”.
This “introductory tutorial” should be useful for those new to #networkscience.1 We will be using the network graph tools frequently in our work on #qurananalytics, thus a tutorial on the network characteristics using an example from the Quran should be helpful.
We will be looking at various functions related to
This set of functions provides wrappers to a number of graph statistic algorithms in igraph. They are intended for use inside the tidygraph framework, and some should not be called directly. Thus we will mix and match the relevant igraph and/or tidygraph functions.
We will follow the structured presentation on Network Characteristics2 and Centrality Measures3 for easy reference.
The Appendix section shows the results of the selected igraph functions for our wordnetwork graph that we may not use in the following discussion.
Our wordnetwork is a directed network. We will also create an undirected version of the network.
gd <- wordnetwork
gu <- simplify(as.undirected(wordnetwork))
gd
## IGRAPH 0b65863 DN-- 41 50 --
## + attr: name (v/c), cooc (e/n)
## + edges from 0b65863 (vertex names):
## [1] Mose ->say Lord ->say Allah ->say
## [4] say ->so indeed ->say Lord ->so
## [7] say ->throw indeed ->Lord say ->when
## [10] enduring->more Allah ->fear Lord ->Mose
## [13] day ->say people ->say Pharaoh ->say
## [16] say ->see indeed ->so so ->then
## [19] cast ->enemy away ->indeed come ->Mose
## [22] Merciful->most Mose ->Pharaoh more ->punishment
## + ... omitted several edges
gu
## IGRAPH 0bec5fe UN-- 41 50 --
## + attr: name (v/c)
## + edges from 0bec5fe (vertex names):
## [1] Mose --Lord Mose --say Mose --Pharaoh Mose --come
## [5] Lord --say Lord --indeed Lord --so Allah --say
## [9] Allah --fear say --indeed say --day say --people
## [13] say --Pharaoh say --so say --enemy say --fear
## [17] say --fire say --go say --hasten say --throw
## [21] say --when say --see say --then say --thus
## [25] indeed --so indeed --away indeed --come indeed --fire
## [29] indeed --do indeed --therein enduring--more Pharaoh --so
## + ... omitted several edges
The igraph notation uses -> for directed edges (Mose ->say) and -- for undirected edges (Mose --Lord).
Generally speaking, we can measure network properties at the level of nodes (centrality measures) or at the level of the network (global measures). If we are interested in the position of nodes within a network, then we are measuring something at the node-level. If we want to understand the structure of the network as a whole, we are measuring something at the network-level. Network analysis often combines both.
We will use the two networks, gu and gd, that we created in this section. For the early examples we will mainly use the igraph package and the basic plot functions. In the later sections we will show some of the same examples together with new ones using the tidygraph and ggraph packages.
set.seed(10)
l = layout_with_kk(gd)
plot(gd, layout=l, vertex.label="", vertex.color="gold", edge.color="limegreen", edge.width = E(gd)$cooc)
# Using ggraph
ggraph(gd, layout = "kk") +
geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_color = "deeppink") +
geom_node_point(size = 3, color = "steelblue") +
labs(title = "Co-occurrences Within Sentence",
subtitle = "Top 50 Nouns, Names, Adjectives, Verbs, Adverbs",
caption = "Surah Taa-Haa (Saheeh International)")
Centrality relates to measures of a node’s position in the network. The main objective is to understand the position and/or importance of a node in the network. The individual characteristics of nodes can be described by4
The centrality of a node reflects its influence, power, and importance. There are four different types of centrality measures.
There are many such centrality measures. It can be difficult to go through all of the available centrality measures. We will introduce just a few examples.
Appendix 1 shows other measures.
The most straight-forward centrality measure is degree centrality. Degree centrality is simply the number of edges connected to a given node. In a social network, this might mean the number of friends an individual has. We will calculate and visualize the degree centrality by varying the node sizes proportional to degree centrality.
degree(gd)
## Mose Lord Allah say indeed enduring day
## 4 4 2 18 8 1 1
## people Pharaoh so cast away come Merciful
## 1 3 8 2 1 3 1
## more enemy fear fire go hasten follow
## 4 3 2 7 1 1 1
## messenger be bring family find do throw
## 1 1 1 1 1 1 2
## when see then most punishment river severe
## 1 1 3 1 1 2 1
## surely therein thus deity guidance here
## 1 1 1 1 1 1
set.seed(10)
deg = igraph::degree(gd)
sort(deg, decreasing = TRUE)
## say indeed so fire Mose Lord more
## 18 8 8 7 4 4 4
## Pharaoh come enemy then Allah cast fear
## 3 3 3 3 2 2 2
## throw river enduring day people away Merciful
## 2 2 1 1 1 1 1
## go hasten follow messenger be bring family
## 1 1 1 1 1 1 1
## find do when see most punishment severe
## 1 1 1 1 1 1 1
## surely therein thus deity guidance here
## 1 1 1 1 1 1
plot(gd, layout=l, vertex.label="", vertex.color="gold", edge.color="royalblue", vertex.size=deg*2, edge.width=E(gd)$cooc)
# Using ggraph
ggraph(gd, layout = "kk") +
geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_color = "lightseagreen") +
geom_node_point(size = deg, color = "gold3") +
labs(title = "Word Co-occurrences Network",
subtitle = "Node size = degree",
caption = "Surah Taa-Haa (Saheeh International)")
In weighted networks, we can also use node strength, which is the sum of the weights of edges connected to the node. Let’s calculate node strength and plot the node sizes as proportional to these values.
set.seed(10)
st = graph.strength(gd)
sort(st, decreasing = TRUE)
## say indeed so fire Mose Lord more
## 18 8 8 7 4 4 4
## Pharaoh come enemy then Allah cast fear
## 3 3 3 3 2 2 2
## throw river enduring day people away Merciful
## 2 2 1 1 1 1 1
## go hasten follow messenger be bring family
## 1 1 1 1 1 1 1
## find do when see most punishment severe
## 1 1 1 1 1 1 1
## surely therein thus deity guidance here
## 1 1 1 1 1 1
plot(gd, layout=l, vertex.label="", vertex.color="gold", edge.color="royalblue", edge.width=E(gd)$cooc, vertex.size=st)
# Using ggraph
ggraph(gd, layout = "kk") +
geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_color = "lightseagreen") +
geom_node_point(size = st, color = "gold3") +
geom_node_text(aes(filter=(st >= 3), size=st*2, label=name), repel=F) +
labs(title = "Word Co-occurrences Network",
subtitle = "Node size = graph.strength",
caption = "Surah Taa-Haa (Saheeh International)")
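Compare the relative node sizes when plotting by degree vs. strength. What differences do you notice? Here there are none: the top six words are the same in both cases, say (18), indeed (8), so (8), fire (7), Mose (4), Lord (4), and in fact every value is identical. The reason is that the graph stores its co-occurrence counts in the cooc edge attribute rather than in an attribute named weight, so graph.strength() falls back to the plain degree. Passing weights = E(gd)$cooc would give the co-occurrence-weighted strength instead.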
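The difference between the default and the weighted strength can be sketched on a small toy graph. The edge list and its cooc values below are made up for illustration, and g_toy, st_default and st_weighted are hypothetical names; only strength(), degree(), E() and graph_from_data_frame() are igraph functions.

```r
library(igraph)

# Toy directed graph; the cooc values are invented for illustration
toy_edges <- data.frame(
  from = c("Mose", "Lord", "Allah"),
  to   = c("say",  "say",  "fear"),
  cooc = c(30, 20, 5)
)
g_toy <- graph_from_data_frame(toy_edges)

# No 'weight' edge attribute, so strength() falls back to the plain degree
st_default <- strength(g_toy)

# Pass the co-occurrence counts explicitly to get a weighted strength
st_weighted <- strength(g_toy, weights = E(g_toy)$cooc)

st_default
st_weighted
```

Applied to the word network, the analogous call would be strength(gd, weights = E(gd)$cooc); its output is not shown here because it was not part of the original run.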
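Degree distribution: A frequency count of the occurrence of each degree.
Average degree: Let N be the number of nodes and L the number of edges. In an undirected network each edge contributes to the degree of two nodes, so the average degree is 2L/N. In a directed network the average in-degree and the average out-degree are both L/N.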
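As a quick check of the average-degree relation, the arithmetic can be done directly from N = 41 and L = 50, the node and edge counts reported for both gu and gd. The variable names are hypothetical; the results should agree with the mean(degree(...)) values printed in this section.

```r
N <- 41  # number of nodes reported for gu and gd
L <- 50  # number of edges reported for gu and gd

# Undirected: every edge adds 1 to the degree of two nodes
avg_degree_undirected <- 2 * L / N

# Directed: every edge adds exactly one in-degree and one out-degree
avg_in_or_out_directed <- L / N

avg_degree_undirected   # compare with mean(degree(gu))
avg_in_or_out_directed  # compare with the mean in- and out-degree of gd
```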
sort(degree(gu), decreasing = TRUE)
## say indeed so fire Mose Lord more
## 18 8 8 7 4 4 4
## Pharaoh come enemy then Allah cast fear
## 3 3 3 3 2 2 2
## throw river enduring day people away Merciful
## 2 2 1 1 1 1 1
## go hasten follow messenger be bring family
## 1 1 1 1 1 1 1
## find do when see most punishment severe
## 1 1 1 1 1 1 1
## surely therein thus deity guidance here
## 1 1 1 1 1 1
mean(degree(gu))
## [1] 2.439024
degree(gu) %>% sum()
## [1] 100
degree.distribution(gu)
## [1] 0.00000000 0.60975610 0.12195122 0.09756098 0.07317073 0.00000000
## [7] 0.00000000 0.02439024 0.04878049 0.00000000 0.00000000 0.00000000
## [13] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## [19] 0.02439024
hist(degree(gu)) # histogram of the node degrees themselves
# Using ggplot
# Let's count the frequencies of each degree
deg = degree(gu)
d.histogram <- as.data.frame(table(deg))
# Convert the first column from a factor to the actual degree values;
# as.numeric() on a factor alone would return level codes, not degrees
d.histogram[,1] <- as.numeric(as.character(d.histogram[,1]))
# Now, plot it!
ggplot(d.histogram, aes(x = deg, y = Freq)) +
geom_point() +
scale_x_continuous("Degree\n(nodes with this amount of connections)",
breaks = c(1, 3, 10, 30, 100, 300),
trans = "log10") +
scale_y_continuous("Frequency\n(how many of them)",
breaks = c(1, 3, 10, 30, 100, 300, 1000),
trans = "log10") +
ggtitle("Degree Distribution (log-log)")
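A small base-R sketch shows why the degree column of a table() result has to be converted with care. The degree values here are made up; the point is that table() stores them as factor levels, so as.numeric() alone returns the level codes rather than the degrees.

```r
# Made-up degree values with gaps, similar in spirit to the word network
deg_toy <- c(1, 1, 2, 7, 18)
tab <- as.data.frame(table(deg_toy))

codes  <- as.numeric(tab$deg_toy)                # level codes: 1 2 3 4
values <- as.numeric(as.character(tab$deg_toy))  # actual degrees: 1 2 7 18

codes
values
```

With the level codes on the x-axis, a node of degree 18 would be plotted at position 4, which distorts a log-log degree plot.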
degree(gd,mode="in",loops = FALSE)
## Mose Lord Allah say indeed enduring day
## 2 1 0 12 4 0 0
## people Pharaoh so cast away come Merciful
## 0 1 6 0 0 0 0
## more enemy fear fire go hasten follow
## 1 1 1 3 0 0 0
## messenger be bring family find do throw
## 0 0 0 0 0 0 2
## when see then most punishment river severe
## 1 1 3 1 1 2 1
## surely therein thus deity guidance here
## 1 1 1 1 1 1
degree(gd,mode="out",loops = FALSE)
## Mose Lord Allah say indeed enduring day
## 2 3 2 6 4 1 1
## people Pharaoh so cast away come Merciful
## 1 2 2 2 1 3 1
## more enemy fear fire go hasten follow
## 3 2 1 4 1 1 1
## messenger be bring family find do throw
## 1 1 1 1 1 1 0
## when see then most punishment river severe
## 0 0 0 0 0 0 0
## surely therein thus deity guidance here
## 0 0 0 0 0 0
degree(gd,mode="in",loops = FALSE) %>% mean()
## [1] 1.219512
degree(gd,mode="out",loops = FALSE) %>% mean()
## [1] 1.219512
hist(degree(gd, mode="in"))  # histogram of in-degrees
hist(degree(gd, mode="out")) # histogram of out-degrees
We now do the same for betweenness centrality, which is defined as the number of geodesic paths (shortest paths) that go through a given node. Nodes with high betweenness might be influential in a network if, for example, most of the information flowing through the network tends to pass through them.
betw = betweenness(gd, normalized=F)
plot(gd, layout=l, vertex.label="", vertex.color="gold", edge.color="royalblue", vertex.size=betw*0.2, edge.width=E(gd)$cooc)
# calculate the betweenness centrality
sort(betweenness(gu), decreasing = TRUE)
## say fire indeed so enemy Mose then
## 338.600000 140.000000 111.733333 67.566667 58.000000 6.366667 6.233333
## more come Lord Pharaoh Allah enduring day
## 6.000000 3.166667 2.333333 1.000000 0.000000 0.000000 0.000000
## people cast away Merciful fear go hasten
## 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## follow messenger be bring family find do
## 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## throw when see most punishment river severe
## 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## surely therein thus deity guidance here
## 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# calculate the standardized betweenness centrality
betwS <- 2*betweenness(gu)/((vcount(gu) - 1)*(vcount(gu)-2))
sort(betwS, decreasing = TRUE)
## say fire indeed so enemy Mose
## 0.434102564 0.179487179 0.143247863 0.086623932 0.074358974 0.008162393
## then more come Lord Pharaoh Allah
## 0.007991453 0.007692308 0.004059829 0.002991453 0.001282051 0.000000000
## enduring day people cast away Merciful
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
## fear go hasten follow messenger be
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
## bring family find do throw when
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
## see most punishment river severe surely
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
## therein thus deity guidance here
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
# Using ggraph
ggraph(gd, layout = "kk") +
geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_color = "lightseagreen") +
geom_node_point(size = betw*0.2, color = "gold3") +
geom_node_text(aes(filter=(betw >= 5), size=betw*2, label=name), repel=F) +
labs(title = "Word Co-occurrences Network",
subtitle = "Node size = betweenness centrality",
caption = "Surah Taa-Haa (Saheeh International)")
We can see that three nodes (the words “say”, “fire”, and “indeed”) have markedly higher betweenness values than all other nodes in the network. One way to interpret this is that these nodes tend to act as “bridges” between different clusters of nodes in the network.
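The "bridge" interpretation can be illustrated on a hypothetical toy graph of two triangles joined by a single edge. The graph and the names g_bridge, b_raw and b_std are assumptions for this sketch; the standardization uses the same 2B/((n-1)(n-2)) formula as the code above.

```r
library(igraph)

# Two triangles (a, b, c) and (d, e, f) joined by the single edge c - d
g_bridge <- graph_from_literal(a - b, b - c, a - c, c - d, d - e, e - f, d - f)

# Raw betweenness: number of shortest paths passing through each node
b_raw <- betweenness(g_bridge)

# Standardized betweenness for an undirected graph
n <- vcount(g_bridge)
b_std <- 2 * b_raw / ((n - 1) * (n - 2))

b_raw
b_std
```

Only c and d carry shortest paths between the two triangles, so they are the only nodes with non-zero betweenness, much as “say”, “fire”, and “indeed” stand out in the word network.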
# calculate the degree centrality for the undirected word network
deg <- degree(gu, loops = FALSE)
sort(deg, decreasing = TRUE)
## say indeed so fire Mose Lord more
## 18 8 8 7 4 4 4
## Pharaoh come enemy then Allah cast fear
## 3 3 3 3 2 2 2
## throw river enduring day people away Merciful
## 2 2 1 1 1 1 1
## go hasten follow messenger be bring family
## 1 1 1 1 1 1 1
## find do when see most punishment severe
## 1 1 1 1 1 1 1
## surely therein thus deity guidance here
## 1 1 1 1 1 1
# calculate the standardized degree centrality
degS <- degree(gu, loops = FALSE)/(vcount(gu) - 1)
sort(degS, decreasing = TRUE)
## say indeed so fire Mose Lord more
## 0.450 0.200 0.200 0.175 0.100 0.100 0.100
## Pharaoh come enemy then Allah cast fear
## 0.075 0.075 0.075 0.075 0.050 0.050 0.050
## throw river enduring day people away Merciful
## 0.050 0.050 0.025 0.025 0.025 0.025 0.025
## go hasten follow messenger be bring family
## 0.025 0.025 0.025 0.025 0.025 0.025 0.025
## find do when see most punishment severe
## 0.025 0.025 0.025 0.025 0.025 0.025 0.025
## surely therein thus deity guidance here
## 0.025 0.025 0.025 0.025 0.025 0.025
sort(deg, decreasing = TRUE) %>% hist()
sort(degree(gd, mode='in'), decreasing = TRUE) %>% head(10)
## say so indeed fire then Mose throw river Lord Pharaoh
## 12 6 4 3 3 2 2 2 1 1
sort(degree(gd, mode='out'), decreasing = TRUE) %>% head(10)
## say indeed fire Lord come more Mose Allah Pharaoh so
## 6 4 4 3 3 3 2 2 2 2
plot(gd)
What does this say about the importance of these nodes? Well, that depends on the network and the questions, in particular how we might quantify ‘importance’ in our network. But clearly “say”, “indeed”, “Mose”, “Allah”, “Lord”, and “Pharaoh” are important words in the Surah. We have explained the importance of “say” in that the Surah is telling a story. “indeed” shows that the Quran is stressing the truth of some of the narrations.
Here’s a short list of some commonly-used centrality measures:
sort(closeness(gu), decreasing = TRUE) %>% head(10)
## say indeed fire so Lord Pharaoh
## 0.002421308 0.002352941 0.002336449 0.002325581 0.002304147 0.002283105
## then Mose enemy throw
## 0.002283105 0.002277904 0.002277904 0.002272727
# calculate the standardized closeness centrality
closeS <- closeness(gu)*(vcount(gu) - 1)
sort(closeS, decreasing = TRUE) %>% head(15)
## say indeed fire so Lord Pharaoh then
## 0.09685230 0.09411765 0.09345794 0.09302326 0.09216590 0.09132420 0.09132420
## Mose enemy throw Allah fear day people
## 0.09111617 0.09111617 0.09090909 0.09049774 0.09049774 0.09029345 0.09029345
## go
## 0.09029345
As the various plots in Section 6.3 show, some words lie outside the main network cluster and are disconnected from it. Closeness centrality is not well defined for a disconnected graph, which is why igraph issues warning messages for these calculations.
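For reference, closeness is the reciprocal of the sum of a node's geodesic distances to all other nodes, which is why it only has a clean interpretation on a connected graph. The sketch below checks this by hand on a hypothetical four-node path graph (g_path and by_hand are illustrative names).

```r
library(igraph)

# A connected toy graph: a - b - c - d
g_path <- graph_from_literal(a - b, b - c, c - d)

# Closeness by hand: 1 / (sum of shortest-path distances to all other nodes)
by_hand <- 1 / rowSums(distances(g_path))

by_hand
closeness(g_path)

# Standardized closeness, as used above: multiply by (n - 1)
closeness(g_path) * (vcount(g_path) - 1)
```

One possible way to avoid the warnings on the word network would be to compute closeness separately on each connected component (for example via decompose()), at the cost of values that are no longer comparable across components.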
We will calculate the Eigenvector and PageRank centrality measures in the next section as we assemble a dataframe of node-level measures.
# calculate the degree centrality
deg <- degree(gu, loops = FALSE)
sort(deg, decreasing = TRUE) %>% head(15) # sort the nodes in decreasing order
## say indeed so fire Mose Lord more Pharaoh come enemy
## 18 8 8 7 4 4 4 3 3 3
## then Allah cast fear throw
## 3 2 2 2 2
# calculate the standardized degree centrality
degS <- degree(gu, loops = FALSE)/(vcount(gu) - 1)
sort(degS, decreasing = TRUE) %>% head(15) # sort the nodes in decreasing order
## say indeed so fire Mose Lord more Pharaoh come enemy
## 0.450 0.200 0.200 0.175 0.100 0.100 0.100 0.075 0.075 0.075
## then Allah cast fear throw
## 0.075 0.050 0.050 0.050 0.050
# calculate the closeness centrality
close <- closeness(gu)
sort(close, decreasing = TRUE) %>% head(15)
## say indeed fire so Lord Pharaoh
## 0.002421308 0.002352941 0.002336449 0.002325581 0.002304147 0.002283105
## then Mose enemy throw Allah fear
## 0.002283105 0.002277904 0.002277904 0.002272727 0.002262443 0.002262443
## day people go
## 0.002257336 0.002257336 0.002257336
# calculate the standardized closeness centrality
closeS <- closeness(gu)*(vcount(gu) - 1)
sort(closeS, decreasing = TRUE) %>% head(15)
## say indeed fire so Lord Pharaoh then
## 0.09685230 0.09411765 0.09345794 0.09302326 0.09216590 0.09132420 0.09132420
## Mose enemy throw Allah fear day people
## 0.09111617 0.09111617 0.09090909 0.09049774 0.09049774 0.09029345 0.09029345
## go
## 0.09029345
# calculate the Betweenness centrality
betw <- betweenness(gu)
sort(betw, decreasing = TRUE) %>% head(15)
## say fire indeed so enemy Mose then
## 338.600000 140.000000 111.733333 67.566667 58.000000 6.366667 6.233333
## more come Lord Pharaoh Allah enduring day
## 6.000000 3.166667 2.333333 1.000000 0.000000 0.000000 0.000000
## people
## 0.000000
# calculate the standardized Betweenness centrality
betwS <- 2 * betweenness(gu)/((vcount(gu) - 1) * (vcount(gu)-2))
sort(betwS, decreasing = TRUE) %>% head(15)
## say fire indeed so enemy Mose
## 0.434102564 0.179487179 0.143247863 0.086623932 0.074358974 0.008162393
## then more come Lord Pharaoh Allah
## 0.007991453 0.007692308 0.004059829 0.002991453 0.001282051 0.000000000
## enduring day people
## 0.000000000 0.000000000 0.000000000
# calculate the Eigenvector centrality
eigen <- evcent(gu)
sort(eigen[[1]], decreasing = TRUE) %>% head(15)
## say so indeed Lord Mose Pharaoh fire then
## 1.0000000 0.6107979 0.5570388 0.4763855 0.3892785 0.3726103 0.3509806 0.3449028
## throw come fear Allah enemy day people
## 0.3000885 0.2405517 0.2289512 0.2289512 0.2036726 0.1862980 0.1862980
# calculate the PageRank centrality
page <- page.rank(gu)
sort(page[[1]], decreasing = TRUE) %>% head(15)
## say fire indeed so more Lord Mose
## 0.14509073 0.06830736 0.06585466 0.06218651 0.05800923 0.03058829 0.03046530
## enemy Merciful be most deity come then
## 0.02739062 0.02439024 0.02439024 0.02439024 0.02439024 0.02389786 0.02388842
## Pharaoh
## 0.02359124
dfu <- data.frame(degS, closeS, betwS, eigen[[1]], page[[1]])
Pearson_correlation_matrix <- cor(dfu) # Pearson correlation matrix
Spearman_correlation_matrix <- cor(dfu, method = "spearman") # Spearman correlation matrix
cor(dfu, method = "kendall") # Kendall correlation matrix
## degS closeS betwS eigen..1.. page..1..
## degS 1.0000000 0.5692808 0.8182369 0.57440481 0.68519657
## closeS 0.5692808 1.0000000 0.5384856 0.88204673 0.10871067
## betwS 0.8182369 0.5384856 1.0000000 0.52761075 0.61998358
## eigen..1.. 0.5744048 0.8820467 0.5276108 1.00000000 0.06098661
## page..1.. 0.6851966 0.1087107 0.6199836 0.06098661 1.00000000
# Basic Scatterplot Matrix
pairs(dfu, # plot the columns of dfu assembled above
main="Simple Scatterplot Matrix")
#assemble dataset
names = V(gd)$name
deg = degree(gd)
st = graph.strength(gd)
betw = betweenness(gd, normalized=F)
eigen <- evcent(gd)
page <- page.rank(gd)
dfd = data.frame(node.name=names, degree=deg, strength=st, betweenness=betw,
eigen = eigen[[1]], page = page[[1]])
head(dfd)
# plot the relationship between degree and strength
plot(strength~degree, data=dfd)
dfd %>% ggplot(aes(x = strength, y = degree)) + geom_point() +
geom_text(label = rownames(dfd),
size = 3, color = "darkblue",
nudge_x = 0.25, nudge_y = 0.25,
check_overlap = T) +
labs(title = "Word Co-occurrences Network",
subtitle = "Strength vs Degree",
caption = "Surah Taa-Haa (Saheeh International)")
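These are correlated by construction, since strength is simply the weighted version of degree. In this case the points fall exactly on a straight line, because graph.strength() was called on a graph without a weight edge attribute and therefore returns the degree.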
How about the relationship between betweenness and strength?
plot(betweenness~strength, data=dfd)
dfd %>% ggplot(aes(x = strength, y = betweenness)) + geom_point(color = "pink") +
geom_text(label = rownames(dfd),
size = 3, color = "darkblue",
nudge_x = 0.25, nudge_y = 0.25,
check_overlap = T) +
labs(title = "Word Co-occurrences Network",
subtitle = "Strength vs betweenness",
caption = "Surah Taa-Haa (Saheeh International)")
These are not well correlated, since they describe different things. Again the common words “say” and “indeed” have a dominant role in Surah Taa-Haa, which narrates the true story of Prophet Mose. The common adverb “so” is often used for emphasis to stress some facts and lessons of the story.
Let’s start by getting some basic information for the network, such as the number of nodes and edges. There are a couple of functions to help us extract this information without having to look it up in the “object summary” (e.g., summary(gd)). Using these functions, you can store this information as separate objects, e.g., n for # nodes and m for # edges.
n = vcount(gd)
m = ecount(gd)
For gd the number of nodes n is 41 and the number of edges m is 50. For gu the number of nodes n is 41 and the number of edges m is 50.
The definition of network density is:
density = [# edges that exist] / [# edges that are possible]
In an undirected network with no loops, the number of edges that are possible is exactly the number of dyads that exist in the network. In turn, the number of dyads is n(n−1)/2 where n = number of nodes. With this information, we can calculate the density with the following:
dyads (directed) = n(n-1) = 41(41-1) = 1640
dyads (undirected) = n(n-1)/2 = 41(41-1)/2 = 820
density = m/dyads
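The density arithmetic can be reproduced in a few lines of base R, using n = 41 and m = 50 as reported for gd and gu. The variable names are hypothetical; the two results should match the edge_density() values for the directed and undirected networks.

```r
n <- 41  # number of nodes
m <- 50  # number of edges

dyads_directed   <- n * (n - 1)      # ordered pairs of distinct nodes
dyads_undirected <- n * (n - 1) / 2  # unordered pairs of distinct nodes

density_directed   <- m / dyads_directed
density_undirected <- m / dyads_undirected

density_directed    # compare with edge_density(gd)
density_undirected  # compare with edge_density(gu)
```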
There is a pre-packaged function for calculating density, edge_density():
edge_density(gd)
## [1] 0.0304878
edge_density(gu)
## [1] 0.06097561
For ‘fully connected’ networks, we can follow edges from any given node to all other nodes in the network. Networks can also be composed of multiple components that are not connected to each other, as is obvious from the plots of our sample word network gd. We can get this information with a simple function.
components(gd)
## $membership
## Mose Lord Allah say indeed enduring day
## 1 1 1 1 1 2 1
## people Pharaoh so cast away come Merciful
## 1 1 1 1 1 1 3
## more enemy fear fire go hasten follow
## 2 1 1 1 1 1 1
## messenger be bring family find do throw
## 1 4 1 1 1 1 1
## when see then most punishment river severe
## 1 1 1 3 2 1 2
## surely therein thus deity guidance here
## 2 1 1 4 1 1
##
## $csize
## [1] 32 5 2 2
##
## $no
## [1] 4
plot(gd)
The output shows the node membership, component sizes, and number of components. The numbers for $no (number of components, 4) and $csize (size for each component) can be confirmed from the simple plot above.
Degree distribution, the statistical distribution of node degrees in a network, is a common and often powerful way to describe a network. We can simply look at the degree distribution as a histogram of degrees:
hist(degree(gd), breaks=5, col="steelblue")
hist(degree(gu), breaks=5, col="royalblue")
However, if we wanted to compare the degree distributions of different networks, it might be more useful to plot the probability densities of each degree: i.e., what proportion of nodes has degree = 1, degree = 2, etc. We can do this by using the function degree.distribution().
pkd = degree.distribution(gd)
plot(pkd, pch=20)
pku = degree.distribution(gu)
plot(pku, pch=20)
Degree and Degree Distribution Matters
The network “path” is typically shorthand for “geodesic path” or “shortest path”: the smallest number of edges you would have to traverse to get from one node to another.
We can calculate path lengths with or without the edge weights (if using edge weights, you often simply count up the weights as you go along the path). The igraph package includes a convenient function for finding the shortest paths between every dyad in a network. Make sure you specify algorithm = “unweighted”.
This matrix (usually a large matrix, so we will not display the output) gives us the geodesic path length between each pair of nodes in the network. We can describe the network using some characteristics of the paths that exist in that network. This matrix contains a bunch of cells that are “Inf” (i.e., infinity). This is because the network is not connected, and we can’t calculate path lengths between nodes in different components.
How should we measure the average path length & diameter of a network with multiple components? There are two common solutions. First is to ignore pairs of nodes that are in different components and only measure average lengths of the paths that exist. This solution doesn’t really make sense for the diameter since the diameter of an unconnected network should be infinity. The second solution is to measure each component separately. We will do each of these in turn.
Option 1: To calculate the average path length while ignoring pairs of nodes that are in different components, we can first replace the “Inf” with “NA” in the path length matrix. Next, we want just the “upper triangle” or “lower triangle” of this matrix, which lists all the geodesic paths without duplicates.
pathd = distances(gd, algorithm="unweighted")
pathu = distances(gu, algorithm="unweighted")
pathd[is.infinite(pathd)]=NA # unreachable pairs become NA
mean(pathd[upper.tri(pathd)], na.rm=T)
## [1] 2.458661
pathu[is.infinite(pathu)]=NA # unreachable pairs become NA
mean(pathu[upper.tri(pathu)], na.rm=T)
## [1] 2.458661
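For the undirected network gu, this is what the canned function mean_distance() does for unconnected networks, and we get the same value. For gd the value differs, because mean_distance() follows edge directions by default (directed = TRUE), whereas distances() with its default mode = "all" ignores them: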
mean_distance(gd)
## [1] 2.089202
mean_distance(gu)
## [1] 2.458661
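The effect of edge direction on the average path length can be seen on a hypothetical three-node graph in which a and c both point to b. This is only a sketch with illustrative names (g_dir, d_all, d_out); it relies on the mode argument of distances() and the directed argument of mean_distance().

```r
library(igraph)

# Toy directed graph: a -> b and c -> b
g_dir <- graph_from_literal(a -+ b, c -+ b)

# Ignoring direction (the default mode = "all"): a and c are two steps apart
d_all <- distances(g_dir, mode = "all")
mean_ignoring_direction <- mean(d_all[upper.tri(d_all)])

# Following direction: only a -> b and c -> b exist, everything else is Inf
d_out <- distances(g_dir, mode = "out")
mean_following_direction <- mean(d_out[is.finite(d_out) & d_out > 0])

mean_ignoring_direction
mean_following_direction
mean_distance(g_dir, directed = FALSE)
mean_distance(g_dir, directed = TRUE)
```

To reproduce mean_distance(gd) by hand, the same idea would apply: call distances(gd, mode = "out") and average over the finite, non-zero entries of the whole matrix rather than only the upper triangle.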
Option 2: To calculate the average path lengths and diameter separately for each component, we will first ‘decompose’ the network into a list that contains each component as separate graph objects. We can then use the lapply() function to calculate separate path length matrices, and sapply() function to calculate the mean and max for each matrix.
comps = decompose(gd)
comps # a list object consisting of each component as graph object
## [[1]]
## IGRAPH 0f5a4fa DN-- 32 44 --
## + attr: name (v/c), cooc (e/n)
## + edges from 0f5a4fa (vertex names):
## [1] Mose ->say Lord ->say Allah ->say say ->so
## [5] indeed ->say Lord ->so say ->throw indeed ->Lord
## [9] say ->when Allah ->fear Lord ->Mose day ->say
## [13] people ->say Pharaoh ->say say ->see indeed ->so
## [17] so ->then cast ->enemy away ->indeed come ->Mose
## [21] Mose ->Pharaoh cast ->river enemy ->river enemy ->say
## [25] fear ->say fire ->say go ->say hasten ->say
## [29] follow ->so messenger->so Pharaoh ->so come ->then
## + ... omitted several edges
##
## [[2]]
## IGRAPH 0f5a4fa DN-- 5 4 --
## + attr: name (v/c), cooc (e/n)
## + edges from 0f5a4fa (vertex names):
## [1] enduring->more more ->punishment more ->severe
## [4] more ->surely
##
## [[3]]
## IGRAPH 0f5a4fa DN-- 2 1 --
## + attr: name (v/c), cooc (e/n)
## + edge from 0f5a4fa (vertex names):
## [1] Merciful->most
##
## [[4]]
## IGRAPH 0f5a4fa DN-- 2 1 --
## + attr: name (v/c), cooc (e/n)
## + edge from 0f5a4fa (vertex names):
## [1] be->deity
path.list = lapply(comps, function(x) distances(x, algorithm="unweighted")) # list with one path length matrix per component
avg.paths=sapply(path.list, mean) # mean over each full matrix (the zero diagonal is included, so this understates the average path length)
diams=sapply(path.list, max) # diameter of each component
avg.paths
## [1] 2.404297 1.280000 0.500000 0.500000
diams
## [1] 4 2 1 1
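Taking mean() over a full distance matrix also averages in the zeros on the diagonal, which is why a two-node component reports 0.5 rather than 1. A minimal sketch of the alternative, averaging only over distinct pairs, is shown below on a hypothetical two-component toy graph (g_two, parts and mean_pairs are illustrative names).

```r
library(igraph)

# Toy undirected graph with two components: a - b - c and d - e
g_two <- graph_from_literal(a - b, b - c, d - e)
parts <- decompose(g_two)

# Average path length over distinct node pairs only (upper triangle)
mean_pairs <- function(g) {
  d <- distances(g)
  mean(d[upper.tri(d)])
}

avg_pairs <- sapply(parts, mean_pairs)                      # excludes the diagonal
avg_full  <- sapply(parts, function(g) mean(distances(g)))  # includes the diagonal

avg_pairs
avg_full
```

The same mean_pairs() helper could be applied to comps from the word network with sapply(comps, mean_pairs); the diameters are unaffected, since max() ignores the diagonal zeros anyway.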
Path distribution: A frequency count of the occurrence of each path distance.
# shortest.paths(gu) # Long output
average.path.length(gu)
## [1] 2.458661
path.length.hist(gu)
## $res
## [1] 50 207 219 32
##
## $unconnected
## [1] 312
# $res is the histogram of distances,
# $unconnected is the number of pairs for which the first node is not
# reachable from the second.
# shortest.paths(gd, mode="out") # Long output
# shortest.paths(gd, mode="in") # Long output
average.path.length(gd)
## [1] 2.089202
path.length.hist (gd)
## $res
## [1] 50 106 48 6 3
##
## $unconnected
## [1] 1427
# $res is the histogram of distances,
# $unconnected is the number of pairs for which the first node is not
# reachable from the second.
There are two formal definitions of the Clustering Coefficient (or Transitivity): “global clustering coefficient” and “local clustering coefficient”. They are slightly different, but both deal with the probability of two nodes that are connected to a common node being connected themselves (e.g., the probability of two of your friends knowing each other).
# global clustering: the ratio of the triangles and the connected triples in the graph
g.cluster = transitivity(gd, "global")
l.cluster = transitivity(gd, "local") # local clustering
av.l.cluster = transitivity(gd, "localaverage") # average clustering
g.cluster
## [1] 0.1358491
l.cluster
## [1] 0.33333333 0.66666667 1.00000000 0.06535948 0.14285714 NaN
## [7] NaN NaN 0.66666667 0.21428571 1.00000000 NaN
## [13] 0.00000000 NaN 0.00000000 0.33333333 1.00000000 0.04761905
## [19] NaN NaN NaN NaN NaN NaN
## [25] NaN NaN NaN 1.00000000 NaN NaN
## [31] 0.33333333 NaN NaN 1.00000000 NaN NaN
## [37] NaN NaN NaN NaN NaN
av.l.cluster
## [1] 0.4877159
# undirected (transitivity() ignores edge direction, so the results match those for gd)
g.cluster = transitivity(gu, "global")
l.cluster = transitivity(gu, "local")
av.l.cluster = transitivity(gu, "localaverage")
g.cluster
## [1] 0.1358491
l.cluster
## [1] 0.33333333 0.66666667 1.00000000 0.06535948 0.14285714 NaN
## [7] NaN NaN 0.66666667 0.21428571 1.00000000 NaN
## [13] 0.00000000 NaN 0.00000000 0.33333333 1.00000000 0.04761905
## [19] NaN NaN NaN NaN NaN NaN
## [25] NaN NaN NaN 1.00000000 NaN NaN
## [31] 0.33333333 NaN NaN 1.00000000 NaN NaN
## [37] NaN NaN NaN NaN NaN
av.l.cluster
## [1] 0.4877159
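To see where the NaN entries and the different averages come from, here is a minimal sketch on a hypothetical four-node graph: a triangle a-b-c with one extra word d attached to c. The graph and the variable names are assumptions made for illustration only.

```r
library(igraph)

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c
toy <- graph_from_literal(a - b, b - c, a - c, c - d)

# Global: closed triples / connected triples = 3 / 5
global_cc <- transitivity(toy, type = "global")

# Local: share of a node's neighbor pairs that are themselves linked.
# d has only one neighbor, so its value is undefined (NaN).
local_cc <- transitivity(toy, type = "local")

# Average of the local values, leaving out the NaN entries
avg_cc <- transitivity(toy, type = "localaverage")

global_cc
setNames(local_cc, V(toy)$name)
avg_cc
```

Nodes with fewer than two neighbors have no neighbor pairs to close, which is why so many words in our network show NaN for the local coefficient.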
Many networks exhibit community structure: the presence of discrete clusters of nodes that are densely connected internally but only loosely connected to other clusters. These may be clusters of individuals that form social groups. How do we detect the presence of such clusters or communities, and how can we quantify the degree of community structure?
Modularity-based methods of community detection are not fool-proof. There is no perfect approach to community detection. There are several functions available for community detection in igraph and other packages.
Our undirected word co-occurrence network gu appears to have a clear community structure from the earlier plots.
set.seed(7)
l = layout_with_kk(gu)
plot(gu, layout=l, edge.color="cyan3")
Because the community division in this example is fairly clear, most community detection methods should give similar, though not necessarily identical, answers.
eb = edge.betweenness.community(gu)
eb
## IGRAPH clustering edge betweenness, groups: 7, mod: 0.53
## + groups:
## $`1`
## [1] "Mose" "Lord" "indeed" "Pharaoh" "so" "away"
## [7] "come" "follow" "messenger" "do" "then" "therein"
##
## $`2`
## [1] "Allah" "say" "day" "people" "fear" "go" "hasten" "throw"
## [9] "when" "see" "thus"
##
## $`3`
## [1] "enduring" "more" "punishment" "severe" "surely"
## + ... omitted several groups/vertices
length(eb) #number of communities
## [1] 7
modularity(eb) #modularity
## [1] 0.533
membership(eb) #assignment of nodes to communities
## Mose Lord Allah say indeed enduring day
## 1 1 2 2 1 3 2
## people Pharaoh so cast away come Merciful
## 2 1 1 4 1 1 5
## more enemy fear fire go hasten follow
## 3 4 2 6 2 2 1
## messenger be bring family find do throw
## 1 7 6 6 6 1 2
## when see then most punishment river severe
## 2 2 1 5 3 4 3
## surely therein thus deity guidance here
## 3 1 2 7 6 6
The resulting object is a ‘communities object’, which includes a few pieces of information - the number of communities (groups), the modularity value based on the node assignments, and the membership of nodes to each community. We can call each of these values separately.
We can also use this communities object to show the community structure.
plot(eb, gu, layout=l)
An example with the Louvain method:
el = cluster_louvain(gu)
el
## IGRAPH clustering multi level, groups: 8, mod: 0.53
## + groups:
## $`1`
## [1] "indeed" "away" "do" "therein"
##
## $`2`
## [1] "enduring" "more" "punishment" "severe" "surely"
##
## $`3`
## [1] "Allah" "say" "day" "people" "fear" "go" "hasten" "when"
## [9] "see" "thus"
##
## + ... omitted several groups/vertices
length(el) #number of communities
## [1] 8
modularity(el) #modularity
## [1] 0.5332
membership(el) #assignment of nodes to communities
## Mose Lord Allah say indeed enduring day
## 5 5 3 3 1 2 3
## people Pharaoh so cast away come Merciful
## 3 5 5 7 1 5 6
## more enemy fear fire go hasten follow
## 2 7 3 4 3 3 5
## messenger be bring family find do throw
## 5 8 4 4 4 1 5
## when see then most punishment river severe
## 3 3 5 6 2 7 2
## surely therein thus deity guidance here
## 2 1 3 8 4 4
Plot the network with the communities assigned.
set.seed(2)
plot(el, gu, vertex.label="", edge.color="red")
We can customize it. We use the RColorBrewer package to assign colors.
library(RColorBrewer)
colors= brewer.pal(length(el),'Accent') #make a color palette
V(gu)$color = colors[membership(el)] #assign node color based on the community assignment
set.seed(2)
plot(gu, vertex.label="")
The two methods yield different results: one finds 7 communities (groups) and the other finds 8.
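To check how two methods compare when the answer is known in advance, we can run them on a hypothetical toy graph: two groups of four fully connected nodes joined by a single bridge. This is only a sketch; the graph and the names toy, eb_toy and lv_toy are assumptions for illustration.

```r
library(igraph)

# Hypothetical toy graph: two 4-cliques (nodes 1-4 and 5-8) joined by one bridge
toy <- disjoint_union(make_full_graph(4), make_full_graph(4))
toy <- add_edges(toy, c(4, 5))

set.seed(1)  # cluster_louvain() visits nodes in random order
eb_toy <- cluster_edge_betweenness(toy)
lv_toy <- cluster_louvain(toy)

length(eb_toy)      # number of communities found by edge betweenness
length(lv_toy)      # number of communities found by Louvain
modularity(eb_toy)  # modularity of the edge-betweenness partition

# Normalized mutual information: 1 means the two partitions are identical
compare(eb_toy, lv_toy, method = "nmi")
```

compare() gives a quick way to quantify how far two partitions of the same graph differ, which is more informative than counting groups alone.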
The goal of clustering (also referred to as “community detection”) is to find cohesive subgroups within a network. We have mentioned earlier that there are various algorithms for graph clustering in igraph. It is important to note that there is no real theoretical basis for what constitutes a cluster in a network, only the vague “internally dense” versus “externally sparse” argument. As such, there is no clear argument for or against certain algorithms/methods.
No matter which algorithm is chosen, the workflow is always the same.
clu <- cluster_louvain(gu)
#membership vector
mem <- membership(clu)
head(mem)
## Mose Lord Allah say indeed enduring
## 5 5 3 3 1 2
#communities as list
com <- communities(clu)
com[[1]]
## [1] "indeed" "away" "do" "therein"
An example of selected graph clustering algorithms applied to our gu network is shown below.
imc <- cluster_infomap(gu)
lec <- cluster_leading_eigen(gu)
loc <- cluster_louvain(gu)
# sgc <- cluster_spinglass(gu) Cannot work with unconnected graph
wtc <- cluster_walktrap(gu)
scores <- c(infomap = modularity(gu,membership(imc)),
eigen = modularity(gu,membership(lec)),
louvain = modularity(gu,membership(loc)),
walk = modularity(gu,membership(wtc)))
scores
## infomap eigen louvain walk
## 0.4850 0.5332 0.5332 0.5332
For the gu network, the modularity score is around 0.5 regardless of the function used (0.49 for Infomap and 0.53 for the other three). In general, though, it is advisable to use cluster_louvain() since it has the best speed/performance trade-off.
One major pattern common to many social networks (and other types of networks) is homophily or assortment — the tendency for nodes that share a trait to be connected. The assortment coefficient is a commonly used measure of homophily. It is similar to the modularity index used in community detection, but the assortativity coefficient is used when we know a priori the ‘type’ or ‘value’ of nodes. For example, we can use the assortment coefficient to examine whether discrete node types (e.g., gender, ethnicity, etc.) are more or less connected to each other. Assortment coefficient can also be used with “scalar attributes” (i.e. continuously varying traits).
There are at least two easy ways to calculate the assortment coefficient. Here we use the assortativity() function in the igraph package. One benefit of this function is that it can calculate assortment in directed or undirected networks. However, the major downside is that it cannot handle weighted networks.
Let’s use the same example network to demonstrate how to calculate assortativity, and to compare the difference between modularity and assortativity.
Let’s start by assigning each node a value–let’s say nodes vary in size.
set.seed(3)
l = layout_with_kk(gu)
V(gu)$size = 2*degree(gu) #assign each node a size equal to twice its degree
plot(gu, layout=l, edge.color="black") #repel is a ggraph argument, not an igraph plot parameter
assortativity(gu, V(gu)$size, directed=F)
## [1] -0.3318498
This network exhibits negative assortativity by node size: since size here is twice the degree, high-degree words tend to be linked to low-degree words.
We can also convert the size variable into a binary (i.e., discrete) trait and calculate the assortment coefficient.
V(gu)$size.discrete = (V(gu)$size > 5) + 0
#shortcut to make the values = 1 for large nodes and 0 for small nodes, with cutoff at size = 5
plot(gu, layout=l, edge.color="black") #repel is a ggraph argument, not an igraph plot parameter
assortativity(gu, V(gu)$size.discrete, directed=F)
## [1] -0.1868132
As a comparison, we create a node attribute that varies randomly across all nodes in the network, and then measure the assortativity coefficient based on this trait. We will plot the figure with square nodes, just to make it clear that we are plotting a different trait.
set.seed(3)
V(gu)$random = rnorm(10, mean=20, sd=5) #create a random node trait; only 10 values are drawn for 41 nodes, so use rnorm(vcount(gu), ...) to draw one value per node
assortativity(gu, V(gu)$random, directed=F)
## [1] -0.1487094
plot(gu, layout=l, edge.color="black", vertex.size=V(gu)$random, vertex.shape="square")
We can see that there is little assortment based on this trait.
Just to be extra clear, this network still exhibits community structure, but the trait we are measuring does not exhibit assortativity.
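Two hypothetical toy graphs make the sign of the coefficient easier to read. This is a sketch under our own assumptions: a star, where the hub links only to leaves, and two 4-cliques joined by a bridge, with a discrete trait that matches the cliques. All names below are ours.

```r
library(igraph)

# Star: every edge joins the high-degree hub to a degree-1 leaf,
# so assortativity by degree is as negative as it can be
star <- make_star(5, mode = "undirected")
r_degree <- assortativity(star, degree(star), directed = FALSE)

# Two 4-cliques joined by one bridge; the trait equals the clique id
cliques <- disjoint_union(make_full_graph(4), make_full_graph(4))
cliques <- add_edges(cliques, c(4, 5))
trait <- rep(1:2, each = 4)
r_nominal <- assortativity_nominal(cliques, trait, directed = FALSE)

r_degree   # -1: perfectly disassortative
r_nominal  # 11/13, about 0.85: strongly assortative
```

In the second graph, 12 of the 13 edges join nodes with the same trait, so the coefficient is close to +1. The single bridge is what keeps it below 1.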
Earlier we created the wordnetwork using igraph. In this section we will show similar (and different) examples using tidygraph.
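Two functions of the tidygraph package can be used to create network objects: tbl_graph(), which builds a network object from node and edge data, and as_tbl_graph(), which converts existing network data and objects (such as an igraph object) to a tbl_graph.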
A central aspect of tidygraph is to directly manipulate node and edge data from the tbl_graph object by activating nodes or edges. When we first create a tbl_graph object, the nodes will be activated. We can then directly calculate node or edge measures, like centrality, using tidyverse functions.
library(tidygraph)
gt <- as_tbl_graph(wordnetwork)
class(wordnetwork)
## [1] "igraph"
class(gt)
## [1] "tbl_graph" "igraph"
gt
## # A tbl_graph: 41 nodes and 50 edges
## #
## # A directed acyclic simple graph with 4 components
## #
## # Node Data: 41 x 1 (active)
## name
## <chr>
## 1 Mose
## 2 Lord
## 3 Allah
## 4 say
## 5 indeed
## 6 enduring
## # ... with 35 more rows
## #
## # Edge Data: 50 x 3
## from to cooc
## <int> <int> <dbl>
## 1 1 4 16
## 2 2 4 15
## 3 3 4 10
## # ... with 47 more rows
Notice how the igraph wordnetwork is converted into two separate tibbles, Node Data and Edge Data. gt carries the additional class tbl_graph but still inherits from igraph, so igraph functions continue to work on it.
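The activate-then-manipulate pattern can be tried on a very small graph first. The sketch below builds a tbl_graph directly from hypothetical node and edge tables (the cooc values are illustrative, not taken from the Surah), then works on the nodes and the edges in turn.

```r
library(tidygraph)
library(dplyr)

# Hypothetical node and edge tables
nodes <- tibble(name = c("say", "Lord", "Mose", "fear"))
edges <- tibble(from = c("Lord", "Mose", "fear"),
                to   = c("say", "say", "say"),
                cooc = c(15, 16, 4))

# Character from/to columns are matched against the name column of the nodes
toy <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

# Node table is active: add a node measure, then extract the tibble
toy_nodes <- toy %>%
  activate(nodes) %>%
  mutate(degree_in = centrality_degree(mode = "in")) %>%
  as_tibble()

# Edge table is active: keep only the stronger co-occurrences
toy_edges <- toy %>%
  activate(edges) %>%
  filter(cooc >= 10) %>%
  as_tibble()

toy_nodes
toy_edges
```

The same dplyr verbs apply to whichever table is active, which is what keeps tidygraph code close to ordinary data frame code.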
gt can directly be used with our preferred ggraph package for visualizing networks.
ggraph(gt, layout = 'fr', weights = cooc) +
geom_edge_link() +
geom_node_point()
Now it is much easier to experiment with modifications to node and edge parameters affecting layouts as it is not necessary to modify the underlying graph but only the plotting code, e.g.:
ggraph(gt, layout = 'fr', weights = log(cooc)) +
geom_edge_link(color = "cyan4") +
geom_node_point(color = "gold4", size = 3)
We show below how to collate some of the node-related measures. We have shown some of these measures earlier using igraph; now they appear in a tidy dataframe or tibble format.
node_measures <- gt %>%
activate(nodes) %>%
mutate(
degree_in = centrality_degree(mode = "in"),
degree_out = centrality_degree(mode = "out"),
degree = degree_in + degree_out,
betweenness = centrality_betweenness(),
closeness = centrality_closeness(normalized = T),
pg_rank = centrality_pagerank(),
eigen = centrality_eigen(),
br_score = node_bridging_score(),
coreness = node_coreness()
) %>% as_tibble()
node_measures
Now we plot the various measures from the resulting node_measures dataframe.
node_measures %>% arrange(desc(degree)) %>%
# First sort by degree. This sorts the dataframe but NOT the factor levels
head(30) %>%
mutate(name=fct_reorder(name, degree, .desc = F)) %>%
# This updates the factor levels. desc = F because of coord_flip() below
ggplot(aes(x = name, y = degree)) +
geom_segment(aes(xend=name, yend=0)) +
geom_point(size=3, color="tomato") +
coord_flip() +
theme_bw() +
xlab("")
# An alternative version of the same plot
node_measures %>% arrange(desc(degree)) %>%
# First sort by degree. This sorts the dataframe but NOT the factor levels
head(30) %>%
ggplot(aes(reorder(name, degree, FUN = min), degree)) +
geom_point(size=3, color="salmon") +
labs(x = "Word", y = "Degree") +
coord_flip()
Plot degree_in and degree_out together.
NMtall <- node_measures %>% gather(key = center, value = Value, degree_in:degree_out)
NMtall %>% arrange(desc(Value)) %>%
head(50) %>%
mutate(name=fct_reorder(name, Value, .desc = F)) %>%
ggplot(aes(name, Value, color = center)) +
geom_point(size = 2) +
labs(x = "Word", y = "Measure") +
coord_flip(ylim = c(0, 12)) # set the limits here: a separate coord_cartesian() would be replaced by coord_flip()
Plot degree (degree_in + degree_out) and betweenness together.
NMtall <- node_measures %>% gather(key = center, value = Value, degree:betweenness)
NMtall %>% arrange(desc(Value)) %>%
head(50) %>%
mutate(name=fct_reorder(name, Value, .desc = F)) %>%
ggplot(aes(name, Value, fill = center)) +
geom_col(position = "identity") +
labs(x = "Word", y = "Measure") +
coord_flip()
Plot closeness, pg_rank, eigen, br_score, coreness together.
NMtall <- node_measures %>% gather(key = measure, value = Value, closeness:coreness)
NMtall %>% arrange(desc(Value)) %>%
head(50) %>%
mutate(name=fct_reorder(name, Value, .desc = F)) %>%
ggplot(aes(name, Value, color = measure)) +
geom_point(size = 2) +
labs(x = "Word", y = "Measure") +
coord_flip(ylim = c(0, 1)) # set the limits here: a separate coord_cartesian() would be replaced by coord_flip()
Despite using coord_flip(ylim = c(0, 1)) to restrict the Measure axis, the values for br_score and coreness are 0 or very small. Without any doubt “say”, “Lord” together with “Allah” and “Mose” are the most influential and important words in Surah Taa-Haa.
Notice the slight difference in using the network measures with tidygraph. We can easily assemble the required measures for the nodes in a tidy dataframe. tidygraph has many functions that can give us information about nodes. We show examples of some measures that seem to measure slightly different things about the nodes.
The following is an interesting example in true tidyverse fashion that combines some measures that we have not covered.5
gtexample <- gt %>%
mutate(n_rank_trv = node_rank_traveller(),
neighbors = centrality_degree(),
group = group_infomap(),
center = node_is_center(),
dist_to_center = node_distance_to(node_is_center()),
keyplayer = node_is_keyplayer(k = 10)) %>%
activate(edges) %>%
filter(!edge_is_multiple()) %>%
mutate(centrality_e = centrality_edge_betweenness())
We can also convert our active node or edge table back to a tibble:
gtexample %>%
activate(nodes) %>% # %N>%
as_tibble() # as.tibble() is deprecated
gtexample %>%
activate(edges) %>% # %E>%
as_tibble() # as.tibble() is deprecated
We plot the output.
ggraph(gtexample, layout="fr") +
geom_edge_density(aes(fill = cooc)) +
geom_edge_link(aes(width = cooc), alpha = 0.2) +
geom_node_point(aes(color = factor(group)), size = 5) +
geom_node_text(aes(label = name), size = 3, repel = TRUE) +
scale_color_brewer(palette = "Set1") +
theme_graph() +
labs(title = "Surah Taa-Haa Word Co-occurrence Network",
subtitle = "Nodes are colored by group")
For the next plot, we define our own specific colors. The center-most words are in red and the distance to center is the node size.
got_palette = c("red", "blue", "green", "gold", "coral", "cyan4", "maroon", "deeppink")
ggraph(gtexample, layout="fr") +
geom_edge_density(aes(fill = cooc)) +
geom_edge_link(aes(width = cooc), alpha = 0.2) +
geom_node_point(aes(color = factor(center), size = dist_to_center)) +
geom_node_text(aes(label = name), size = 3, repel = TRUE) +
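# the nodes map center to color (not fill), so a color scale is needed to get red centers
scale_color_manual(values = c("TRUE" = "red", "FALSE" = "blue")) +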
theme_graph() +
theme(legend.position = "bottom") +
labs(title = "Surah Taa-Haa Word Co-occurrence Network",
subtitle = "Nodes are colored by centeredness")
Clearly, “say” is the keyplayer for the main group.
In this section, we ask some questions about which is the “most important” node. We want to understand important concepts of network centrality and how to calculate those in R.
What is the most important word in this network? What does “most important” mean? It of course depends on the definition and this is where network centrality measures come into play. We will have a look at a few of those (there are many more out there…).
Degree centrality: tells you the most connected word. It is simply the number of nodes connected to each node, and it can denote popularity [3].
node_measures %>% arrange(desc(degree))
This is often the only measure given to identify “influencers”: how many followers do they have? So far, “say” has the highest degree, 18 (12 in and 6 out).
Closeness centrality: tells us who can propagate information quickest. One application that comes to mind is identifying so-called superspreaders of infectious diseases, like COVID-19. “say” no longer ranks highest on this measure.
Betweenness centrality: tells us who is most important in maintaining connections throughout the network. It is the number of times a node lies on the shortest path between any other pair of nodes, and it can denote brokerage and bridging [3]. “say” is prominent here. As a Surah that narrates the story of Prophet Mose, that is understandable.
Eigenvector centrality: is a word (person) connected to other “well-connected” words (people)? It can denote connections [3]. Again, “say” dominates.
Diffusion centrality: can a given word (person) reach many others within a small number of hops in the network? It can denote reach [3].
As we have seen, there is more than one definition of “most important”. It will depend on the context (and the available information) which one to choose. Based on the previous plots, without any doubt “say”, “Lord” together with “Allah” and “Mose” are the most influential and important words in Surah Taa-Haa. Indeed, it is about “Allah” narrating the true story of “Mose”.
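A small sketch shows how the measures can disagree. The toy graph below is hypothetical (two triangles joined through a single bridge word d) and the names are ours. The bridge has the lowest degree in the graph, yet every shortest path between the two triangles runs through it.

```r
library(igraph)

# Hypothetical toy graph: triangles a-b-c and e-f-g, connected through d
toy <- graph_from_literal(a - b, b - c, a - c,
                          c - d, d - e,
                          e - f, f - g, e - g)

measures <- data.frame(
  name        = V(toy)$name,
  degree      = degree(toy),
  betweenness = betweenness(toy),
  closeness   = closeness(toy, normalized = TRUE),
  eigen       = eigen_centrality(toy)$vector
)

# Rank by betweenness: d comes first although its degree is only 2
measures[order(-measures$betweenness), ]
```

Here degree favors c and e (three neighbors each), while betweenness favors d, so the answer to “most important” changes with the definition, as discussed above.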
We’ll do the analysis using tidygraph.
set.seed(123)
network_ego1 <- gt %>%
mutate(community = as.factor(group_walktrap())) %>%
mutate(degree_c = centrality_degree()) %>%
mutate(betweenness_c = centrality_betweenness(directed = F, normalized = T)) %>%
mutate(closeness_c = centrality_closeness(normalized = T)) %>%
mutate(eigen = centrality_eigen(directed = F))
network_ego1
## # A tbl_graph: 41 nodes and 50 edges
## #
## # A directed acyclic simple graph with 4 components
## #
## # Node Data: 41 x 6 (active)
## name community degree_c betweenness_c closeness_c eigen
## <chr> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Mose 1 2 0.00816 0.0302 3.89e- 1
## 2 Lord 2 3 0.00299 0.0311 4.76e- 1
## 3 Allah 3 2 0 0.0302 2.29e- 1
## 4 say 4 6 0.434 0.0286 1.00e+ 0
## 5 indeed 5 4 0.143 0.0331 5.57e- 1
## 6 enduring 6 1 0 0.0270 4.86e-17
## # ... with 35 more rows
## #
## # Edge Data: 50 x 3
## from to cooc
## <int> <int> <dbl>
## 1 1 4 16
## 2 2 4 15
## 3 3 4 10
## # ... with 47 more rows
We can easily convert it to a dataframe using as.data.frame(). We need to do this to identify the key player in our ego network.
network_ego_df <- as.data.frame(network_ego1 %>% activate(nodes))
network_ego_df
We have converted the tbl_graph to a dataframe. The last thing we need to do is to find the top words for each centrality measure and pull out the key player.
Key player is a term for the most influential nodes in the network based on different centrality measures. Each centrality has different uses and interpretations. A node that appears at the top of most centrality measures will be considered the key player of the whole network.
# take the 20 highest-ranked words for each centrality measure
kp_ego <- data.frame(
network_ego_df %>% arrange(-degree_c) %>% select(name) %>% slice(1:20),
network_ego_df %>% arrange(-betweenness_c) %>% select(name) %>% slice(1:20),
network_ego_df %>% arrange(-closeness_c) %>% select(name) %>% slice(1:20),
network_ego_df %>% arrange(-eigen) %>% select(name) %>% slice(1:20)
) %>% setNames(c("degree","betweenness","closeness","eigen"))
kp_ego
Top 20 words for each centrality measure. From the table above, “say” tops most centrality measures.
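Reading the key player off a wide table by eye gets tedious, so one option is to count how often each word appears among the top k of every measure. The sketch below uses a hypothetical centrality table with made-up values and placeholder word names; top_k() is our own helper, not a package function.

```r
# Hypothetical centrality table (illustrative values, placeholder names)
toy_df <- data.frame(
  name        = c("a", "b", "c", "d"),
  degree      = c(6, 3, 2, 4),
  betweenness = c(0.43, 0.003, 0.008, 0.10),
  closeness   = c(0.029, 0.031, 0.030, 0.033)
)

# Names of the k highest-ranked words for one measure
top_k <- function(df, measure, k) {
  df$name[order(df[[measure]], decreasing = TRUE)][seq_len(k)]
}

measure_names <- c("degree", "betweenness", "closeness")
top_lists <- lapply(measure_names, function(m) top_k(toy_df, m, k = 2))

# How many top-k lists does each word appear in?
appearances <- sort(table(unlist(top_lists)), decreasing = TRUE)
appearances
```

In this toy table, d appears in all three top-2 lists and would be the key player. The same helper could be applied to network_ego_df with k = 20.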
We’ll scale the nodes by degree centrality and color them by community. We’ll show only communities 1 to 10.
network_ego1 %>%
filter(community %in% 1:10) %>%
top_n(100,degree_c) %>%
mutate(node_size = ifelse(degree_c >= 1,degree_c,0)) %>%
mutate(node_label = ifelse(closeness_c >= 0.001,name,"")) %>%
ggraph(layout = "stress") +
geom_edge_fan(alpha = 0.05) +
geom_node_point(aes(color = as.factor(community), size = 1.5*node_size)) +
geom_node_label(aes(label = node_label),repel = T,
show.legend = F, fontface = "bold", label.size = 0,
segment.color="royalblue", fill = "wheat") +
coord_fixed() +
theme_graph() + theme(legend.position = "none") +
labs(title = "Word Mutual Communities",
subtitle = "Top 10 Communities")
The neighbors of a specific node can be extracted with the ego() function. Below, we are looking for all words that are linked with “say”, directly (order = 1) and indirectly (order > 1).
focusnode <- which(V(gd)$name == "say")
ego(gd,order = 1, nodes = focusnode, mode = "all", mindist = 1)
## [[1]]
## + 18/41 vertices, named, from 0b65863:
## [1] Mose Lord Allah indeed day people Pharaoh so enemy
## [10] fear fire go hasten throw when see then thus
ego(gd,order = 2, nodes = focusnode, mode = "all", mindist = 1)
## [[1]]
## + 31/41 vertices, named, from 0b65863:
## [1] Mose Lord Allah indeed day people Pharaoh
## [8] so enemy fear fire go hasten throw
## [15] when see then thus come away do
## [22] therein follow messenger cast river bring family
## [29] find guidance here
ego(gd,order = 3, nodes = focusnode, mode = "all", mindist = 1)
## [[1]]
## + 31/41 vertices, named, from 0b65863:
## [1] Mose Lord Allah indeed day people Pharaoh
## [8] so enemy fear fire go hasten throw
## [15] when see then thus come away do
## [22] therein follow messenger cast river bring family
## [29] find guidance here
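We can use this to probe the small-world idea and the “six degrees” notion. Here “say” reaches all 31 other words in its component within 2 steps; the order-3 neighborhood adds nothing new, because the remaining 9 of the 41 words belong to other components.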
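The way the neighborhood stops growing once a component is exhausted can be reproduced on a hypothetical toy graph. In this sketch (graph and names are ours), a path a-b-c-d sits next to a separate pair e-f, and ego_size() counts the nodes within k steps of a.

```r
library(igraph)

# Hypothetical toy graph: a path a-b-c-d and a separate pair e-f
toy <- graph_from_literal(a - b - c - d, e - f)

# Number of nodes within k steps of "a" (mindist = 1 leaves out "a" itself)
reach <- sapply(1:4, function(k) {
  ego_size(toy, order = k, nodes = "a", mindist = 1)
})
names(reach) <- paste0("order_", 1:4)
reach
```

The count grows by one per step until it reaches 3 and then stays there: e and f are in a different component, so no order is large enough to reach them.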
We adapt this tutorial based on a good reference.6
We directly jump into some code and work through it one line at a time.
deg = degree(gd)
ggraph(gd,layout = "stress") +
geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66") +
geom_node_point(aes(size = deg), shape = 21) +
geom_node_text(aes(filter = deg >= 3, label = name),
family = "serif", color = "darkblue") +
theme_graph() +
theme(legend.position = "none")
ggraph works with layers. Each layer adds a new feature to the plot and thus builds the figure step-by-step. The following sections work through each of them separately.
ggraph(gd,layout = "stress")
The first step is to calculate a layout. The layout parameter specifies the algorithm to use. The “stress” layout is part of the graphlayouts package and is always a safe choice since it is deterministic and produces nice layouts for almost any graph. Other algorithms for, e.g., concentric layouts and clustered networks are described further down in this tutorial. Here is a list of layout algorithms of igraph.
c("layout_with_dh", "layout_with_drl", "layout_with_fr", "layout_with_gem", "layout_with_graphopt", "layout_with_kk", "layout_with_lgl", "layout_with_mds", "layout_with_sugiyama", "layout_as_bipartite", "layout_as_star", "layout_as_tree")
To use them, we just need the last part of the name.
ggraph(gd,layout = "dh") + …
A good tutorial on ggraph layouts can be found here.7
geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66")
The second layer specifies how to draw the edges. Edges can be drawn in many different ways as the list below shows.
c("geom_edge_arc", "geom_edge_arc0", "geom_edge_arc2", "geom_edge_density", "geom_edge_diagonal", "geom_edge_diagonal0", "geom_edge_diagonal2", "geom_edge_elbow", "geom_edge_elbow0", "geom_edge_elbow2", "geom_edge_fan", "geom_edge_fan0", "geom_edge_fan2", "geom_edge_hive", "geom_edge_hive0", "geom_edge_hive2", "geom_edge_link", "geom_edge_link0", "geom_edge_link2", "geom_edge_loop", "geom_edge_loop0")
It is good to stick with geom_edge_link0 since it simply draws a straight line between the endpoints. Some tools draw curved edges by default. While this may add some artistic value, it reduces readability. Always go with straight lines! If your network has multiple edges between two nodes, then you can switch to geom_edge_parallel().
What does the “0” stand for? The standard geom_edge_link() draws 100 dots on each edge compared to only two dots (the endpoints) in geom_edge_link0(). This is done to allow, e.g., gradients along the edge. The drawback of using geom_edge_link() is that the time to render the plot increases and so does the size of the file if we export the plot. Typically, we do not need gradients along an edge. Hence, geom_edge_link0() should be our default to draw edges.
Within geom_edge_link0, we can specify the appearance of the edge, either by mapping edge attributes to aesthetics or setting them globally for the graph. Mapping attributes to aesthetics is done within aes(). In the example, we map the edge width to the edge attribute “cooc”. ggraph then automatically scales the edge width according to the attribute. We can control this scale. The color of all edges is globally set to “grey66”.
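The following aesthetics can be used within geom_edge_link0 either within aes() or globally: edge_colour (colour of the edge), edge_width (width of the edge), edge_linetype (line type of the edge, “solid” by default), and edge_alpha (opacity, a value between 0 and 1).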
ggraph does not automatically plot arrows if your graph is directed. We need to do this manually using the arrow parameter.
geom_edge_link0(aes(…),…, arrow = arrow(angle = 30, length = unit(0.15, "inches"), ends = "last", type = "closed"))
The default arrowhead type is “open”, yet “closed” usually has a nicer appearance.
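A complete, minimal version of a plot with arrows might look as follows. This is a sketch on a hypothetical three-edge directed graph. We use geom_edge_link() rather than geom_edge_link0() here because it also accepts end_cap, which stops the edge short of the node so that the arrowhead is not hidden underneath it.

```r
library(igraph)
library(ggraph)

# Hypothetical directed toy graph (-+ means "points to")
toy <- graph_from_literal(Lord -+ say, Mose -+ say, say -+ so)

p <- ggraph(toy, layout = "stress") +
  geom_edge_link(
    arrow = arrow(angle = 30, length = unit(0.15, "inches"),
                  ends = "last", type = "closed"),
    end_cap = circle(3, "mm"),   # leave a gap between arrowhead and node
    edge_colour = "grey66"
  ) +
  geom_node_point(size = 3) +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_graph()

p
```

The cap size of 3 mm is an arbitrary choice for this toy plot. It usually needs adjusting to the node size.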
A good tutorial on ggraph edges can be found here.8
geom_node_point(aes(size = deg), shape = 21) +
geom_node_text(aes(filter = deg >= 3, label = name), family = "serif", color = "darkblue")
On top of the edge layer, we draw the node layer. Always draw the node layer above the edge layer. Otherwise, edges will be visible on top of nodes. There are slightly fewer geoms available for nodes.
c("geom_node_arc_bar", "geom_node_circle", "geom_node_label", "geom_node_point", "geom_node_text", "geom_node_tile", "geom_node_treemap")
The most important ones here are geom_node_point() to draw nodes as simple geometric objects (circles, squares,…) and geom_node_text() to add node labels. You can also use geom_node_label(), but this draws labels within a box.
The mapping of node attributes to aesthetics is similar to edge attributes. In the example, we map the degree of the nodes to the size aesthetic. The shape of the node is globally set to 21.
The figure below shows all possible shapes that can be used for the nodes.
“21” draws a border around the nodes. If you prefer another shape, say “19”, you have to be aware of several things. To change the color of shapes 1-20, you need to use the color parameter. For shapes 21-25 you need to use fill. The color parameter only controls the border for these shapes.
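The following aesthetics can be used within geom_node_point() either within aes() or globally: alpha (opacity, a value between 0 and 1), colour (colour of shapes 0-20 and border colour of shapes 21-25), fill (fill colour of shapes 21-25), shape (node shape, a value between 0 and 25), size (size of the node), and stroke (width of the node border).
For geom_node_text(), there are more options available, but the most important are: label (the attribute to display as the node label), colour (text colour), family (font family), and size (font size).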
Note that we also used a filter within aes() of geom_node_text(). The filter parameter allows you to specify a rule for when to apply the aesthetic mappings. The most frequent use case is for node labels (but can also be used for edges or nodes). In the example, we only display the node label if the degree is at least 3.
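The effect of filter is easiest to see on a graph where only one node qualifies. The sketch below uses a hypothetical star graph with made-up node names. The degree is stored as a vertex attribute so that it can be referenced inside aes().

```r
library(igraph)
library(ggraph)

# Hypothetical star: one hub connected to five leaves
toy <- make_star(6, mode = "undirected")
V(toy)$name <- c("hub", paste0("leaf", 1:5))
V(toy)$deg  <- degree(toy)

p <- ggraph(toy, layout = "stress") +
  geom_edge_link0(edge_colour = "grey66") +
  geom_node_point(aes(size = deg), shape = 21, fill = "white") +
  # only nodes with degree >= 3 receive a label; here that is the hub alone
  geom_node_text(aes(filter = deg >= 3, label = name)) +
  theme_graph()

p
```

All six nodes are drawn, but only the hub is labeled, which keeps dense plots readable without removing any nodes from the graph.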
A good tutorial on ggraph nodes can be found here.9
theme_graph() + theme(legend.position = "none")
Themes control the overall look of the plot. There are many options within the theme() function. theme_graph() is used to erase all the default ggplot theme elements (e.g. axes, background, grids, etc.) since they are irrelevant for networks. The only option worthwhile in theme() is legend.position, which we set to “none”, i.e. don’t show the legend.
The code below gives an example for a plot with a legend.
ggraph(gd,layout = "stress") +
geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66") +
geom_node_point(aes(fill = deg), shape = 21, size = deg) +
geom_node_text(aes(label = name, size = 10*deg),
family = "serif", repel = F)+
scale_edge_color_brewer(palette = "Dark2")+
theme_graph() +
theme(legend.position = "bottom")
Concentric circles help to emphasize the position of certain nodes in the network. The graphlayouts package has two functions for concentric layouts, layout_with_focus() and layout_with_centrality().
The first one allows us to focus the network on a specific node and arrange all other nodes in concentric circles (depending on the geodesic distance) around it. Below we focus on the word Mose. However, the layout requires a connected graph. From previous plots, neither gd nor gu is fully connected, so we work with the largest component.
cld <- clusters(gd)
jg1 <- induced_subgraph(gd, which(cld$membership == which.max(cld$csize)))
jg1
## IGRAPH 13af31b DN-- 32 44 --
## + attr: name (v/c), cooc (e/n)
## + edges from 13af31b (vertex names):
## [1] Mose ->say Lord ->say Allah ->say say ->so
## [5] indeed ->say Lord ->so say ->throw indeed ->Lord
## [9] say ->when Allah ->fear Lord ->Mose day ->say
## [13] people ->say Pharaoh ->say say ->see indeed ->so
## [17] so ->then cast ->enemy away ->indeed come ->Mose
## [21] Mose ->Pharaoh cast ->river enemy ->river enemy ->say
## [25] fear ->say fire ->say go ->say hasten ->say
## [29] follow ->so messenger->so Pharaoh ->so come ->then
## + ... omitted several edges
V(jg1)
## + 32/32 vertices, named, from 13af31b:
## [1] Mose Lord Allah say indeed day people
## [8] Pharaoh so cast away come enemy fear
## [15] fire go hasten follow messenger bring family
## [22] find do throw when see then river
## [29] therein thus guidance here
jg2 <- simplify(as.undirected(jg1))
jg2
## IGRAPH 13aff47 UN-- 32 44 --
## + attr: name (v/c)
## + edges from 13aff47 (vertex names):
## [1] Mose --Lord Mose --say Mose --Pharaoh Mose --come
## [5] Lord --say Lord --indeed Lord --so Allah --say
## [9] Allah --fear say --indeed say --day say --people
## [13] say --Pharaoh say --so say --enemy say --fear
## [17] say --fire say --go say --hasten say --throw
## [21] say --when say --see say --then say --thus
## [25] indeed --so indeed --away indeed --come indeed --fire
## [29] indeed --do indeed --therein Pharaoh--so so --follow
## + ... omitted several edges
The parameter focus in the first line is used to choose the node id of the focal node (Mose = 1). The function coord_fixed() is used to always keep the aspect ratio at one (i.e. the circles are always displayed as a circle and not an ellipse).
The function draw_circle() can be used to add the circles explicitly.
got_palette = c("red", "blue", "green", "gold", "coral", "cyan4", "maroon", "deeppink")
focusnode <- which(V(jg1)$name == "Mose")
deg = degree(jg1)
ggraph(jg1,layout = "focus", focus = focusnode) +
draw_circle(col = "darkblue", use = "focus", max.circle = 3) +
geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66") +
geom_node_point(aes(size = deg), shape = 19) +
geom_node_text(aes(filter = (name == "Mose"), size = 2*deg, label = name),
family = "serif") +
scale_edge_width_continuous(range = c(0.1, 2.0)) +
scale_size_continuous(range = c(1,5)) +
scale_fill_manual(values = got_palette) +
coord_fixed() +
theme_graph() +
theme(legend.position = "bottom")
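The layout can also be computed directly with graphlayouts, which helps to check what the circles represent. The sketch below is built on a hypothetical toy graph with names of our choosing. We assume, as described in the graphlayouts documentation, that layout_with_focus() returns a list holding the coordinates (xy) and the distances to the focus node (distance).

```r
library(igraph)
library(graphlayouts)

# Hypothetical toy graph with two components
toy <- graph_from_literal(a - b, a - c, c - d, e - f)

# Keep the largest component, since the focus layout needs a connected graph
comp <- components(toy)
core <- induced_subgraph(toy, which(comp$membership == which.max(comp$csize)))

focus_id <- which(V(core)$name == "a")
lay <- layout_with_focus(core, v = focus_id)

# Geodesic distance of every node from the focus node, computed with igraph
geodesic <- distances(core, v = focus_id)[1, ]

head(lay$xy)   # one row of x/y coordinates per node
lay$distance   # assumed element name, see the note above
geodesic
```

Each circle drawn by draw_circle() corresponds to one value of this distance, so words on the second circle are two steps away from the focus word.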
Repeat with a change of the focus node and displaying all the words.
deg = degree(jg1)
focusnode <- which(V(jg1)$name == "say")
ggraph(jg1,layout = "focus", focus = focusnode) +
draw_circle(col = "darkred", use = "focus", max.circle = 3) +
geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66") +
geom_node_point(aes(size = deg), shape = 21) +
geom_node_text(aes(label = name, size = 3),
family = "serif", repel = F) +
scale_edge_width_continuous(range = c(0.1, 2.0)) +
scale_size_continuous(range = c(1,10)) +
scale_fill_manual(values = got_palette) +
coord_fixed() +
theme_graph() +
theme(legend.position = "bottom")
layout_with_centrality() works in a similar way. We can specify any centrality index (or numeric vector for that matter), and create a concentric layout where the most central nodes are put in the center and the most peripheral nodes in the biggest circle. The numeric attribute used for the layout is specified with the cent parameter. Here, we use the weighted degree of the words.
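# pass cooc explicitly: without an edge attribute named weight, graph.strength() returns the plain degree
ggraph(jg1,layout = "centrality", cent = graph.strength(jg1, weights = E(jg1)$cooc)) +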
geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66") +
geom_node_point(aes(color = deg, size = deg), shape = 20) +
geom_node_text(aes(label = name, size = 3*deg),
family = "serif", repel = TRUE) +
scale_edge_width_continuous(range = c(0,2)) +
scale_size_continuous(range = c(1,7)) +
coord_fixed() +
theme_graph(title_family = "Arial", title_size = 16) +
labs(title = "Surah Taa-Haa Word Co-Occurrence Network",
subtitle = "Weighted Degree Centrality Layout") +
theme(legend.position = "bottom")
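One detail is worth checking when a weighted degree is intended: igraph's strength() picks up edge weights automatically only from an edge attribute named weight. The graph printouts in this post show the co-occurrence counts stored as cooc, so, assuming no weight attribute exists, strength() without a weights argument returns the plain degree. The sketch below illustrates the difference on a hypothetical toy graph.

```r
library(igraph)

# Hypothetical toy edges with co-occurrence counts (illustration only)
edges <- data.frame(
  from = c("Mose", "Lord", "say"),
  to   = c("say",  "say",  "so"),
  cooc = c(5, 3, 2)
)
toy <- graph_from_data_frame(edges, directed = FALSE)

# No edge attribute called "weight", so this equals the plain degree
# (igraph may emit a warning about the missing weights)
unweighted <- suppressWarnings(strength(toy))

# Passing the counts explicitly gives the weighted degree
weighted <- strength(toy, weights = E(toy)$cooc)

print(rbind(degree = degree(toy), unweighted, weighted))
```

For the tutorial's graph this would correspond to something like cent = strength(jg1, weights = E(jg1)$cooc), assuming jg1 carries the cooc attribute shown in its printout.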
We repeat the centrality layout, this time with betweenness centrality.
ggraph(jg1,layout = "centrality",
cent = betweenness(jg1, directed = F, normalized = T)) +
geom_edge_link0(aes(edge_width = cooc), edge_color = "grey66") +
geom_node_point(aes(size = 2*deg), shape = 21) +
geom_node_text(aes(label = name, size = 5*deg),
family = "serif", repel = T) +
scale_edge_width_continuous(range = c(0,2)) +
scale_size_continuous(range = c(1,7)) +
coord_fixed() +
theme_graph(title_family = "Arial", title_size = 16) +
labs(title = "Surah Taa-Haa Word Co-Occurrence Network",
subtitle = "Betweenness Centrality Layout with Degree for Size") +
theme(legend.position = "bottom")
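As a reminder of what the betweenness index used above measures, here is a small sketch on a hypothetical five-node path graph. The middle node lies on the most shortest paths, so a betweenness-based layout would place it at the center.

```r
library(igraph)

# Hypothetical path graph a - b - c - d - e (illustration only)
toy <- graph_from_literal(a - b - c - d - e)

# Raw betweenness: for each node, the (fractional) count of shortest paths
# between other node pairs that pass through it
raw <- betweenness(toy, directed = FALSE)

# Normalized version, as passed to the layout in the tutorial code
norm <- betweenness(toy, directed = FALSE, normalized = TRUE)

print(round(rbind(raw, norm), 3))
```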
We now return to the directed graph gd and the undirected graph gu. Some clustering functions do not work on directed graphs, so we show two different examples here.
cld <- clusters(gd)
V(gd)$clu <- as.character(cld$membership)
V(gd)$size <- graph.strength(gd)
gd
## IGRAPH 0b65863 DN-- 41 50 --
## + attr: name (v/c), clu (v/c), size (v/n), cooc (e/n)
## + edges from 0b65863 (vertex names):
## [1] Mose ->say Lord ->say Allah ->say
## [4] say ->so indeed ->say Lord ->so
## [7] say ->throw indeed ->Lord say ->when
## [10] enduring->more Allah ->fear Lord ->Mose
## [13] day ->say people ->say Pharaoh ->say
## [16] say ->see indeed ->so so ->then
## [19] cast ->enemy away ->indeed come ->Mose
## [22] Merciful->most Mose ->Pharaoh more ->punishment
## + ... omitted several edges
ggraph(gd,layout = "stress") +
geom_edge_link0(aes(width=cooc), edge_color="grey66") +
geom_node_point(aes(fill=clu, size=size), shape=21, col="grey25") +
geom_node_text(aes(size=2.5, label=name), family = "serif", repel=T) +
scale_edge_width_continuous(range=c(0.1, 2.0)) +
scale_size_continuous(range=c(1,10)) +
scale_fill_manual(values=got_palette) +
theme_graph(title_size = 16, subtitle_size = 14) +
labs(title = "Surah Taa-Haa Word Network",
subtitle = "Directed With Clusters")+
theme(legend.position = "bottom")
We repeat with the undirected graph gu and cluster_louvain(), which does not work with directed graphs. gu has no edge attributes, so we drop aes(width=cooc).
clu <- cluster_louvain(gu)
V(gu)$clu <- as.character(clu$membership)
V(gu)$size <- graph.strength(gu)
gu
## IGRAPH 0bec5fe UN-- 41 50 --
## + attr: name (v/c), color (v/c), size (v/n), size.discrete (v/n),
## | random (v/n), clu (v/c)
## + edges from 0bec5fe (vertex names):
## [1] Mose --Lord Mose --say Mose --Pharaoh Mose --come
## [5] Lord --say Lord --indeed Lord --so Allah --say
## [9] Allah --fear say --indeed say --day say --people
## [13] say --Pharaoh say --so say --enemy say --fear
## [17] say --fire say --go say --hasten say --throw
## [21] say --when say --see say --then say --thus
## [25] indeed--so indeed--away indeed--come indeed--fire
## + ... omitted several edges
ggraph(gu,layout = "stress") +
geom_edge_link0(edge_color="grey66") +
geom_node_point(aes(fill=clu, size=size), shape=21, col="grey25") +
geom_node_text(aes(size=2.5,label=name), family="serif", repel=T) +
scale_edge_width_continuous(range=c(0.1, 2.0)) +
scale_size_continuous(range=c(1,10)) +
scale_fill_manual(values=got_palette) +
theme_graph(title_size = 16, subtitle_size = 14) +
labs(title = "Surah Taa-Haa Word Network",
subtitle = "Undirected With Clusters") +
theme(legend.position = "bottom")
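Interestingly, the two functions give different counts: 4 clusters for gd and 8 for gu. This is expected, because clusters() returns the connected components of the graph, whereas cluster_louvain() searches for communities by optimizing modularity and can split a single component into several groups.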
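The difference between the two counts can be reproduced on a small scale. The sketch below uses a hypothetical toy graph (not the Surah network) with two connected components, one of which contains two tightly knit groups joined by a single bridge.

```r
library(igraph)

# Hypothetical toy graph: two triangles joined by one bridge, plus a separate pair
toy <- graph_from_literal(a - b - c - a, c - d, d - e - f - d, x - y)

# Connected components: everything reachable by some path is one group
comp <- components(toy)   # clusters() is the older name for the same function
print(comp$no)

# Louvain communities: groups chosen to maximize modularity,
# so one component may be split into several communities
lv <- cluster_louvain(toy)
print(membership(lv))
```

A Louvain community never spans two components, so the number of communities is at least the number of components and is typically larger.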
Earlier we showed how layout_with_focus() lets us focus the network on a specific word and arrange all other nodes in concentric circles around it according to their distance. Here we combine it with clustering. The limitation is that it only works on a connected network, that is, one in which every node can be reached from every other node.
clu <- clusters(jg1)
V(jg1)$clu <- as.character(clu$membership)
V(jg1)$size <- graph.strength(jg1)
jg1
## IGRAPH 13af31b DN-- 32 44 --
## + attr: name (v/c), clu (v/c), size (v/n), cooc (e/n)
## + edges from 13af31b (vertex names):
## [1] Mose ->say Lord ->say Allah ->say say ->so
## [5] indeed ->say Lord ->so say ->throw indeed ->Lord
## [9] say ->when Allah ->fear Lord ->Mose day ->say
## [13] people ->say Pharaoh ->say say ->see indeed ->so
## [17] so ->then cast ->enemy away ->indeed come ->Mose
## [21] Mose ->Pharaoh cast ->river enemy ->river enemy ->say
## [25] fear ->say fire ->say go ->say hasten ->say
## [29] follow ->so messenger->so Pharaoh ->so come ->then
## + ... omitted several edges
focusnode <- which(V(jg1)$name == "Mose")
ggraph(jg1, "focus", focus=focusnode) +
draw_circle(col = "darkblue", use = "focus", max.circle = 3) +
geom_edge_link0(aes(width=cooc), edge_color="grey66") +
geom_node_point(aes(fill=clu, size=size), shape=21, col="grey25") +
geom_node_text(aes(size=3,label=name), repel=T) +
scale_edge_width_continuous(range=c(0.1,2.0)) +
scale_size_continuous(range=c(1,10)) +
scale_fill_manual(values=got_palette) +
theme_graph(title_size = 16, subtitle_size = 14) +
labs(title = "Surah Taa-Haa Directed Word Network",
subtitle = "Focus Layout With All Words") +
theme(legend.position = "bottom")
ggraph(jg1, "focus", focus=focusnode) +
draw_circle(col = "darkblue", use = "focus", max.circle = 3) +
geom_edge_link0(aes(width=cooc), edge_color="grey66") +
geom_node_point(aes(fill=clu, size=size), shape=21, col="grey25") +
geom_node_text(aes(filter=(name=="Mose"), size=size,label=name), repel=T) +
scale_edge_width_continuous(range=c(0.1,2.0)) +
scale_size_continuous(range=c(1,10)) +
scale_fill_manual(values=got_palette) +
theme_graph(title_size = 16, subtitle_size = 14) +
labs(title = "Surah Taa-Haa Directed Word Network",
subtitle = "Focus Layout With Only The Focus Word") +
theme(legend.position = "bottom")
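Since the focus layout needs a connected graph, a disconnected network has to be reduced first. One possible way, sketched here on a hypothetical toy graph and not necessarily how jg1 was built, is to keep only the largest connected component.

```r
library(igraph)

# Hypothetical toy graph with a 4-node component and a separate pair
toy <- graph_from_literal(Mose - say - Lord, say - so, x - y)

is_connected(toy)   # FALSE here, so a focus layout would not apply directly

# Keep only the nodes in the largest connected component
comp  <- components(toy)
keep  <- which(comp$membership == which.max(comp$csize))
giant <- induced_subgraph(toy, keep)

print(V(giant)$name)
```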
layout_with_centrality(), shown earlier, is based on a similar principle. Here we repeat it with clustering (using the undirected graph jg2) and also look at the coreness centrality measure. As we saw above, cluster_louvain() gives different results than clusters().
clu <- cluster_louvain(jg2)
V(jg2)$clu <- as.character(clu$membership)
V(jg2)$size <- graph.strength(jg2)
jg2
## IGRAPH 13aff47 UN-- 32 44 --
## + attr: name (v/c), clu (v/c), size (v/n)
## + edges from 13aff47 (vertex names):
## [1] Mose --Lord Mose --say Mose --Pharaoh Mose --come
## [5] Lord --say Lord --indeed Lord --so Allah --say
## [9] Allah --fear say --indeed say --day say --people
## [13] say --Pharaoh say --so say --enemy say --fear
## [17] say --fire say --go say --hasten say --throw
## [21] say --when say --see say --then say --thus
## [25] indeed --so indeed --away indeed --come indeed --fire
## [29] indeed --do indeed --therein Pharaoh--so so --follow
## + ... omitted several edges
ggraph(jg2, "centrality", cent=graph.strength(jg2)) +
draw_circle(col = "darkblue", use = "cent") +
geom_edge_link0(edge_color="grey66") +
geom_node_point(aes(fill=clu, size=size), shape=21, col="grey25") +
geom_node_text(aes(size=2.5,label=name), repel=T) +
scale_edge_width_continuous(range=c(0.1,2.0)) +
scale_size_continuous(range=c(1,10)) +
scale_fill_manual(values=got_palette) +
theme_graph(title_size = 16, subtitle_size = 14) +
labs(title = "Surah Taa-Haa Undirected Word Network",
subtitle = "Centrality Layout : graph.strength with clusters") +
theme(legend.position = "bottom")
ggraph(jg2, "centrality", cent=graph.coreness(jg2)) +
draw_circle(col = "darkblue", use = "cent") +
geom_edge_link0(edge_color="grey66") +
geom_node_point(aes(fill=clu, size=size), shape=21, col="grey25") +
geom_node_text(aes(size=2.5,label=name), repel=T) +
scale_edge_width_continuous(range=c(0.1,2.0)) +
scale_size_continuous(range=c(1,10)) +
scale_fill_manual(values=got_palette) +
theme_graph(title_size = 16, subtitle_size = 14) +
labs(title = "Surah Taa-Haa Undirected Word Network",
subtitle = "Centrality Layout : graph.coreness with clusters") +
theme(legend.position = "bottom")
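For readers unfamiliar with coreness, the following minimal sketch shows what the measure returns on a hypothetical toy graph: nodes embedded in a dense group get a higher core number than nodes hanging off the edge of the network.

```r
library(igraph)

# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d
toy <- graph_from_literal(a - b - c - a, c - d)

# k-core number of each node: the largest k such that the node belongs to a
# subgraph in which every node has at least k neighbours
core <- coreness(toy)
print(core)
```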
We covered the characteristics of networks using our example of the word co-occurrence network from Surah Taa-Haa. We showed how to use the functions from igraph and tidygraph that measure these characteristics. We also showed different ways to use ggraph and its layout formats to visualize the network and its related measures.
The main objective is to understand the position and/or importance of a node in the network. The individual characteristics of nodes can be described by
The centrality of a node reflects its influence, power, and importance. There are four different types of centrality measures.
Simplifying the Complexity
We ended the tutorial with examples of using ggraph with the stress, focus, and centrality layouts from the graphlayouts package. These examples will be very useful in our future work.
In conclusion, we refer to some interesting points about areas of research in networks [10].
Why Study Networks?
Primary Questions:
What are important areas for future research? Three Areas for Research
The points in bold and italics are what we can relate to our work on #qurananalytics.
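We run selected network measures from the graph_measures family (graph measurements) in tidygraph. Each call below uses the underlying igraph function, and the comment names the corresponding tidygraph wrapper.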
wordnetwork %>% edge_connectivity()
## [1] 0
# graph_adhesion() Gives the minimum edge connectivity. Wraps igraph::edge_connectivity()
# wordnetwork %>% assortativity(1) # graph_assortativity() Measures the propensity of similar nodes to be connected. Wraps igraph::assortativity()
wordnetwork %>% automorphisms()
## $nof_nodes
## [1] 33
##
## $nof_leaf_nodes
## [1] 5
##
## $nof_bad_nodes
## [1] 0
##
## $nof_canupdates
## [1] 1
##
## $max_level
## [1] 13
##
## $group_size
## [1] "82944"
# graph_automorphisms: Calculate the number of automorphisms of the graph. Wraps igraph::automorphisms()
wordnetwork %>% clique_num()
## [1] 4
# graph_clique_num: Get the size of the largest clique. Wraps igraph::clique_num()
wordnetwork %>% count_max_cliques()
## [1] 36
# graph_clique_count: Get the number of maximal cliques in the graph. Wraps igraph::count_max_cliques()
wordnetwork %>% count_components()
## [1] 4
# graph_component_count: Count the number of unconnected components in the graph. Wraps igraph::count_components()
wordnetwork %>% count_motifs()
## [1] 241
# graph_motif_count: Count the number of motifs in a graph. Wraps igraph::count_motifs()
wordnetwork %>% diameter()
## [1] 5
# graph_diameter: Measures the length of the longest geodesic. Wraps igraph::diameter()
wordnetwork %>% girth()
## $girth
## [1] 3
##
## $circle
## + 3/41 vertices, named, from 0b65863:
## [1] Lord Mose say
# graph_girth: Measures the length of the shortest circle in the graph. Wraps igraph::girth()
wordnetwork %>% radius() # graph_radius: Measures the smallest eccentricity in the graph. Wraps igraph::radius()
## [1] 1
wordnetwork %>% dyad_census()
## $mut
## [1] 0
##
## $asym
## [1] 50
##
## $null
## [1] 770
# graph_mutual_count: Counts the number of mutually connected nodes. Wraps igraph::dyad_census()
wordnetwork %>% dyad_census()
## $mut
## [1] 0
##
## $asym
## [1] 50
##
## $null
## [1] 770
# graph_asym_count: Counts the number of asymmetrically connected nodes. Wraps igraph::dyad_census()
wordnetwork %>% dyad_census()
## $mut
## [1] 0
##
## $asym
## [1] 50
##
## $null
## [1] 770
# graph_unconn_count: Counts the number of unconnected node pairs. Wraps igraph::dyad_census()
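The three dyad counts partition all node pairs, which gives a quick sanity check on the output. A minimal sketch, using a hypothetical three-node directed graph and then the numbers reported above:

```r
library(igraph)

# Hypothetical directed toy graph: a <-> b is mutual, b -> c is asymmetric,
# and a, c are not connected at all
edges <- data.frame(from = c("a", "b", "b"), to = c("b", "a", "c"))
toy <- graph_from_data_frame(edges, directed = TRUE)

dc <- dyad_census(toy)
total_pairs <- choose(vcount(toy), 2)

# The three categories add up to the number of node pairs
print(c(mut = dc$mut, asym = dc$asym, null = dc$null, pairs = total_pairs))

# The same identity for the output shown above: 41 nodes, 0 + 50 + 770 pairs
print(choose(41, 2) == 0 + 50 + 770)
```

A mutual count of zero also means that no pair of words is linked in both directions.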
wordnetwork %>% gsize()
## [1] 50
# graph_size: Counts the number of edges in the graph. Wraps igraph::gsize()
wordnetwork %>% gorder()
## [1] 41
# graph_order: Counts the number of nodes in the graph. Wraps igraph::gorder()
wordnetwork %>% reciprocity()
## [1] 0
# graph_reciprocity: Measures the proportion of mutual connections in the graph. Wraps igraph::reciprocity()
wordnetwork %>% min_cut()
## [1] 0
# graph_min_cut: Calculates the minimum number of edges to remove in order to split the graph into two clusters. Wraps igraph::min_cut()
wordnetwork %>% mean_distance()
## [1] 2.089202
# graph_mean_dist: Calculates the mean distance between all node pairs in the graph. Wraps igraph::mean_distance()
# wordnetwork %>% graph_modularity: Calculates the modularity of the graph contingent on a provided node grouping
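The tidygraph wrappers named in the comments above are meant to be called inside a verb such as mutate(). The following is a minimal sketch of that usage on a hypothetical toy graph; the column names n_nodes, n_edges and n_components are illustrative choices.

```r
library(igraph)
library(tidygraph)
library(dplyr)

# Hypothetical toy graph: a triangle plus a separate pair (illustration only)
toy <- as_tbl_graph(graph_from_literal(a - b - c - a, x - y))

# Graph-level measures are evaluated on the graph being manipulated and
# recycled to every row of the node table
measured <- toy %>%
  activate(nodes) %>%
  mutate(
    n_nodes      = graph_order(),
    n_edges      = graph_size(),
    n_components = graph_component_count()
  ) %>%
  as_tibble()

print(measured)
```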
We run selected network measures available in centrality: Calculate node and edge centrality.
The centrality of a node measures the importance of the node in the network. As the concept of importance is ill-defined and depends on the network and the questions under consideration, many centrality measures exist. tidygraph provides a consistent set of wrappers for all the centrality measures implemented in igraph, for use inside dplyr::mutate() and other relevant verbs. All functions provided by tidygraph follow a consistent naming scheme and are automatically called on the graph, returning a vector of measures ready to be added to the node data. In addition, tidygraph provides access to the netrankr engine for centrality calculations, defines a number of centrality measures based on it, and provides a manual mode for specifying more or less any centrality score.
As with the graph measures, each call below uses the underlying igraph function, and the comment names the corresponding tidygraph wrapper.
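To illustrate the mutate() workflow described above, here is a minimal sketch with three of the tidygraph wrappers. The toy graph and the column names deg, btw and pr are hypothetical and chosen only for illustration.

```r
library(igraph)
library(tidygraph)
library(dplyr)

# Hypothetical toy word graph (illustration only, not the Surah data)
toy <- as_tbl_graph(
  graph_from_literal(say - Mose, say - Lord, say - so, so - then)
)

# Each centrality_*() call is evaluated on the graph being mutated and
# returns one value per node, ready to be stored as a node column
ranked <- toy %>%
  activate(nodes) %>%
  mutate(
    deg = centrality_degree(),
    btw = centrality_betweenness(),
    pr  = centrality_pagerank()
  ) %>%
  arrange(desc(deg)) %>%
  as_tibble()

print(ranked)
```

Keeping the scores as node columns means they can be mapped directly to size or colour in a later ggraph() call.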
wordnetwork %>% alpha_centrality() # centrality_alpha(): Wrapper for igraph::alpha_centrality()
## Mose Lord Allah say indeed enduring day
## 11 9 1 54 8 1 1
## people Pharaoh so cast away come Merciful
## 1 12 86 1 1 1 1
## more enemy fear fire go hasten follow
## 2 2 2 4 1 1 1
## messenger be bring family find do throw
## 1 1 1 1 1 1 141
## when see then most punishment river severe
## 55 55 142 2 3 4 3
## surely therein thus deity guidance here
## 3 9 55 2 5 5
wordnetwork %>% authority_score() # centrality_authority: Wrapper for igraph::authority_score()
## $vector
## Mose Lord Allah say indeed enduring
## 1.363766e-01 1.178793e-01 3.188827e-17 1.000000e+00 1.374766e-01 1.594413e-17
## day people Pharaoh so cast away
## 1.594413e-17 1.594413e-17 7.538405e-02 4.458380e-01 3.188827e-17 1.594413e-17
## come Merciful more enemy fear fire
## 0.000000e+00 1.594413e-17 0.000000e+00 6.184079e-03 7.538405e-02 6.377654e-17
## go hasten follow messenger be bring
## 1.594413e-17 1.594413e-17 1.594413e-17 1.594413e-17 1.594413e-17 1.594413e-17
## family find do throw when see
## 1.594413e-17 1.594413e-17 1.594413e-17 6.393580e-02 5.318703e-02 5.318703e-02
## then most punishment river severe surely
## 8.939971e-02 0.000000e+00 0.000000e+00 8.203431e-02 0.000000e+00 0.000000e+00
## therein thus deity guidance here
## 1.178793e-01 5.318703e-02 0.000000e+00 9.273861e-02 9.273861e-02
##
## $value
## [1] 14.26541
##
## $options
## $options$bmat
## [1] "I"
##
## $options$n
## [1] 41
##
## $options$which
## [1] "LM"
##
## $options$nev
## [1] 1
##
## $options$tol
## [1] 0
##
## $options$ncv
## [1] 0
##
## $options$ldv
## [1] 0
##
## $options$ishift
## [1] 1
##
## $options$maxiter
## [1] 1000
##
## $options$nb
## [1] 1
##
## $options$mode
## [1] 1
##
## $options$start
## [1] 1
##
## $options$sigma
## [1] 0
##
## $options$sigmai
## [1] 0
##
## $options$info
## [1] 0
##
## $options$iter
## [1] 1
##
## $options$nconv
## [1] 1
##
## $options$numop
## [1] 20
##
## $options$numopb
## [1] 0
##
## $options$numreo
## [1] 18
wordnetwork %>% estimate_betweenness(cutoff = NULL) # centrality_betweenness: Wrapper for igraph::betweenness() and igraph::estimate_betweenness()
## Mose Lord Allah say indeed enduring day
## 0 0 0 0 0 0 0
## people Pharaoh so cast away come Merciful
## 0 0 0 0 0 0 0
## more enemy fear fire go hasten follow
## 0 0 0 0 0 0 0
## messenger be bring family find do throw
## 0 0 0 0 0 0 0
## when see then most punishment river severe
## 0 0 0 0 0 0 0
## surely therein thus deity guidance here
## 0 0 0 0 0 0
wordnetwork %>% power_centrality() # centrality_power: Wrapper for igraph::power_centrality()
## Mose Lord Allah say indeed enduring day
## 0.79871626 1.27068496 0.68980041 0.29044228 1.77895894 0.14522114 0.32674756
## people Pharaoh so cast away come Merciful
## 0.32674756 0.43566341 0.07261057 0.43566341 1.81526423 2.68659105 0.03630528
## more enemy fear fire go hasten follow
## 0.10891585 0.36305285 0.32674756 2.21462236 0.32674756 0.32674756 0.10891585
## messenger be bring family find do throw
## 0.10891585 0.03630528 2.25092764 2.25092764 2.25092764 1.81526423 0.00000000
## when see then most punishment river severe
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## surely therein thus deity guidance here
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
wordnetwork %>% estimate_closeness(cutoff = NULL) # centrality_closeness: Wrapper for igraph::closeness() and igraph::estimate_closeness()
## Mose Lord Allah say indeed enduring
## 0.0006410256 0.0006578947 0.0006410256 0.0007142857 0.0006756757 0.0006250000
## day people Pharaoh so cast away
## 0.0006250000 0.0006250000 0.0006410256 0.0006410256 0.0006410256 0.0006250000
## come Merciful more enemy fear fire
## 0.0006578947 0.0006250000 0.0006578947 0.0006410256 0.0006250000 0.0006756757
## go hasten follow messenger be bring
## 0.0006250000 0.0006250000 0.0006250000 0.0006250000 0.0006250000 0.0006250000
## family find do throw when see
## 0.0006250000 0.0006250000 0.0006250000 0.0006097561 0.0006097561 0.0006097561
## then most punishment river severe surely
## 0.0006097561 0.0006097561 0.0006097561 0.0006097561 0.0006097561 0.0006097561
## therein thus deity guidance here
## 0.0006097561 0.0006097561 0.0006097561 0.0006097561 0.0006097561
wordnetwork %>% eigen_centrality() # centrality_eigen: Wrapper for igraph::eigen_centrality()
## $vector
## Mose Lord Allah say indeed enduring day
## 0.38927855 0.47638549 0.22895116 1.00000000 0.55703876 0.00000000 0.18629801
## people Pharaoh so cast away come Merciful
## 0.18629801 0.37261027 0.61079790 0.04663107 0.10377522 0.24055174 0.00000000
## more enemy fear fire go hasten follow
## 0.00000000 0.20367257 0.22895116 0.35098055 0.18629801 0.18629801 0.11379044
## messenger be bring family find do throw
## 0.11379044 0.00000000 0.06538698 0.06538698 0.06538698 0.10377522 0.30008845
## when see then most punishment river severe
## 0.18629801 0.18629801 0.34490276 0.00000000 0.00000000 0.04663107 0.00000000
## surely therein thus deity guidance here
## 0.00000000 0.10377522 0.18629801 0.00000000 0.06538698 0.06538698
##
## $value
## [1] 5.367744
##
## $options
## $options$bmat
## [1] "I"
##
## $options$n
## [1] 41
##
## $options$which
## [1] "LA"
##
## $options$nev
## [1] 1
##
## $options$tol
## [1] 0
##
## $options$ncv
## [1] 0
##
## $options$ldv
## [1] 0
##
## $options$ishift
## [1] 1
##
## $options$maxiter
## [1] 1000
##
## $options$nb
## [1] 1
##
## $options$mode
## [1] 1
##
## $options$start
## [1] 1
##
## $options$sigma
## [1] 0
##
## $options$sigmai
## [1] 0
##
## $options$info
## [1] 0
##
## $options$iter
## [1] 1
##
## $options$nconv
## [1] 1
##
## $options$numop
## [1] 20
##
## $options$numopb
## [1] 0
##
## $options$numreo
## [1] 13
wordnetwork %>% hub_score() # centrality_hub: Wrapper for igraph::hub_score()
## $vector
## Mose Lord Allah say indeed enduring day
## 0.63950176 0.94090016 0.63950176 0.45119892 1.00000000 0.00000000 0.59467291
## people Pharaoh so cast away come Merciful
## 0.59467291 0.85980069 0.09118447 0.05246109 0.08175364 0.21601669 0.00000000
## more enemy fear fire go hasten follow
## 0.00000000 0.64345649 0.59467291 0.78672482 0.59467291 0.59467291 0.26512778
## messenger be bring family find do throw
## 0.26512778 0.00000000 0.00000000 0.00000000 0.00000000 0.08175364 0.00000000
## when see then most punishment river severe
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## surely therein thus deity guidance here
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
##
## $value
## [1] 14.26541
##
## $options
## $options$bmat
## [1] "I"
##
## $options$n
## [1] 41
##
## $options$which
## [1] "LM"
##
## $options$nev
## [1] 1
##
## $options$tol
## [1] 0
##
## $options$ncv
## [1] 0
##
## $options$ldv
## [1] 0
##
## $options$ishift
## [1] 1
##
## $options$maxiter
## [1] 1000
##
## $options$nb
## [1] 1
##
## $options$mode
## [1] 1
##
## $options$start
## [1] 1
##
## $options$sigma
## [1] 0
##
## $options$sigmai
## [1] 0
##
## $options$info
## [1] 0
##
## $options$iter
## [1] 1
##
## $options$nconv
## [1] 1
##
## $options$numop
## [1] 20
##
## $options$numopb
## [1] 0
##
## $options$numreo
## [1] 16
wordnetwork %>% page_rank() # centrality_pagerank: Wrapper for igraph::page_rank()
## $vector
## Mose Lord Allah say indeed enduring day
## 0.02129679 0.02132716 0.01188631 0.12134718 0.04442754 0.01188631 0.01188631
## people Pharaoh so cast away come Merciful
## 0.01188631 0.02093744 0.07366583 0.01188631 0.01188631 0.01188631 0.01188631
## more enemy fear fire go hasten follow
## 0.02198966 0.01693799 0.01693799 0.04219638 0.01188631 0.01188631 0.01188631
## messenger be bring family find do throw
## 0.01188631 0.01188631 0.01188631 0.01188631 0.01188631 0.01188631 0.06038514
## when see then most punishment river severe
## 0.02907716 0.02907716 0.06375292 0.02198966 0.01811671 0.02413663 0.01811671
## surely therein thus deity guidance here
## 0.01811671 0.02132716 0.02907716 0.02198966 0.02085304 0.02085304
##
## $value
## [1] 1
##
## $options
## NULL
wordnetwork %>% subgraph_centrality() # centrality_subgraph: Wrapper for igraph::subgraph_centrality()
## Mose Lord Allah say indeed enduring day
## 0 0 2 0 0 5 1
## people Pharaoh so cast away come Merciful
## 1 0 0 3 1 1 2
## more enemy fear fire go hasten follow
## 0 0 0 0 1 1 1
## messenger be bring family find do throw
## 1 2 6 6 6 1 0
## when see then most punishment river severe
## 0 0 0 0 0 0 0
## surely therein thus deity guidance here
## 0 0 0 0 0 0
wordnetwork %>% strength() # centrality_degree: Wrapper for igraph::degree() and igraph::strength()
## Mose Lord Allah say indeed enduring day
## 4 4 2 18 8 1 1
## people Pharaoh so cast away come Merciful
## 1 3 8 2 1 3 1
## more enemy fear fire go hasten follow
## 4 3 2 7 1 1 1
## messenger be bring family find do throw
## 1 1 1 1 1 1 2
## when see then most punishment river severe
## 1 1 3 1 1 2 1
## surely therein thus deity guidance here
## 1 1 1 1 1 1
wordnetwork %>% edge_betweenness() # centrality_edge_betweenness: Wrapper for igraph::edge_betweenness()
## [1] 8.833333 5.000000 7.000000 11.500000 17.333333 2.000000 17.166667
## [8] 22.000000 20.000000 4.000000 1.000000 16.000000 7.000000 7.000000
## [15] 5.000000 20.000000 9.333333 5.500000 8.000000 12.000000 4.333333
## [22] 1.000000 10.500000 2.000000 1.000000 1.000000 14.000000 7.000000
## [29] 26.000000 7.000000 7.000000 2.000000 3.000000 3.000000 2.500000
## [36] 2.000000 1.000000 16.500000 8.000000 5.833333 20.000000 1.000000
## [43] 15.000000 15.000000 15.000000 4.000000 4.000000 6.666667 12.000000
## [50] 22.000000
There are others that tidygraph adapts from the netrankr package; these require netrankr to be installed. We list them here just to show the variety of centrality measures.
centrality_manual: manually specify a centrality score using the netrankr framework (netrankr)
centrality_closeness_harmonic: centrality based on inverse shortest path (netrankr)
centrality_closeness_residual: centrality based on 2-to-the-power-of negative shortest path (netrankr)
centrality_closeness_generalised: centrality based on alpha-to-the-power-of negative shortest path (netrankr)
centrality_integration: centrality based on 1 - (x - 1)/max(x) transformation of shortest path (netrankr)
centrality_communicability: centrality based on an exponential transformation of walk counts (netrankr)
centrality_communicability_odd: centrality based on an exponential transformation of odd walk counts (netrankr)
centrality_communicability_even: centrality based on an exponential transformation of even walk counts (netrankr)
centrality_subgraph_odd: subgraph centrality based on odd walk counts (netrankr)
centrality_subgraph_even: subgraph centrality based on even walk counts (netrankr)
centrality_katz: centrality based on walks penalizing distant nodes (netrankr)
centrality_betweenness_network: Betweenness centrality based on network flow (netrankr)
centrality_betweenness_current: Betweenness centrality based on current flow (netrankr)
centrality_betweenness_communicability: Betweenness centrality based on communicability (netrankr)
centrality_betweenness_rsp_simple: Betweenness centrality based on simple randomised shortest path dependencies (netrankr)
centrality_betweenness_rsp_net: Betweenness centrality based on net randomised shortest path dependencies (netrankr)
centrality_information: centrality based on inverse sum of resistance distance between nodes (netrankr)
centrality_decay: based on a power transformation of the shortest path (netrankr)
centrality_random_walk: centrality based on the inverse sum of expected random walk length between nodes (netrankr)
centrality_expected: Expected centrality ranking based on exact rank probability (netrankr)
https://dshizuka.github.io/networkanalysis/04_measuring.html
http://www2.unb.ca/~ddu/6634/Lecture_notes/Lecture_4_centrality_measure.pdf
http://web.stanford.edu/~jacksonm/Jackson-IntroConcepts.pdf
https://www.data-imaginist.com/2017/ggraph-introduction-layouts/
https://www.data-imaginist.com/2017/ggraph-introduction-edges/
https://www.data-imaginist.com/2017/ggraph-introduction-nodes/