This post further explores the word cooccurance network on our project on #qurananalytics. It will also serve as a simple tutorial on the igraph, tidygraph and ggraph packages in relation to the work on #qurananalytics.
packages=c('dplyr', 'tidyverse', 'udpipe', 'ggplot2',
'igraph', 'tidygraph', 'ggraph', 'knitr', 'quRan')
for (p in packages){
if (! require (p,character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
# during first time model download execute the below line too
# udmodel <- udpipe_download_model(language = "english")
setwd("F:/RProjects")
# Load the model
udmodel <- udpipe_load_model(file = 'english-ewt-ud-2.5-191206.udpipe')
Let’s start by annotating Surah Yusuf. The annotated data.frame can next be used for basic text analytics.
# Select the surah
Q01 <- quran_en_sahih %>% filter(surah == 12)
x <- udpipe_annotate(udmodel, x = Q01$text, doc_id = Q01$ayah_title)
x <- as.data.frame(x)
The resulting data.frame has a field called upos which is the Universal Parts of Speech tag and also a field called lemma which is the root form of each token in the text. These 2 fields give us a broad range of analytical possibilities.
Analyzing single words is a good start. Multi-word expressions should be more interesting. We can get multi-word expressions by looking either at collocations (words following one another), at word co-occurrences within each sentence or at word co-occurrences of words which are close in the neighbourhood of one another.
Co-occurrences allow to see how words are used either in the same sentence or next to each other. The udpipe package makes creating co-occurrence graphs using the relevant POS tags easy.
We look how many times nouns, proper nouns, adjectives, verbs, adverbs, and numbers are used in the same verse.
cooccur <- cooccurrence(x = subset(x, upos %in% c("NOUN", "PROPN", "VERB",
"ADJ", "ADV", "NUM")),
term = "lemma",
group = c("doc_id", "paragraph_id", "sentence_id"))
head(cooccur)
The result can be easily visualised using the igraph and ggraph R packages.
library(igraph)
library(ggraph)
library(ggplot2)
wordnetwork <- head(cooccur, 100)
wordnetwork <- graph_from_data_frame(wordnetwork)
ggraph(wordnetwork, layout = "fr") +
geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_colour = "#ed9de9") +
geom_node_point(aes(size = igraph::degree(wordnetwork)), shape = 1, color = "black") +
geom_node_text(aes(label = name), col = "darkblue", size = 3) +
labs(title = "Co-occurrences within sentence",
subtitle = "Top 100 Nouns, Names, Adjectives, Verbs, Adverbs",
caption = "Surah Yusuf (Saheeh International)")
The story is revealed by Allah (SWT). The main characters are Joseph, his father, his brothers, the king, and the wife of the minister (al-’Azeez). So the verb “say” dominates since it is a narrated story. Interesting to see the strong link and occurence of “know” with “Allah”.
This “introductory tutorial” should be useful for those new to #networkscience. We will be using the graph tools frequently in our work on #qurananalytics. Many of the examples are adapted from
library(igraph)
# Create data
set.seed(10)
data <- matrix(sample(0:2, 25, replace=TRUE), nrow=5)
colnames(data) = rownames(data) = LETTERS[1:5]
# build the graph object
g0a <- graph_from_adjacency_matrix(data)
# plot it
plot(g0a)
# data
set.seed(1)
data <- matrix(sample(0:2, 15, replace=TRUE), nrow=3)
colnames(data) <- letters[1:5]
rownames(data) <- LETTERS[1:3]
# create the network object
g0b <- graph_from_incidence_matrix(data)
# plot it
plot(g0b)
# create data:
links <- data.frame(
source=c("A","A", "A", "A", "A","F", "B"),
target=c("B","B", "C", "D", "F","A","E")
)
# create the network object
g0c <- graph_from_data_frame(d=links, directed=F)
# plot it
plot(g0c)
# create data:
g0d <- graph_from_literal( A-B-C-D, E-A-E-A, D-C-A, D-A-D-C )
# plot it
plot(g0d)
g0e <- graph(edges=c(1,2, 2,3, 3,1, 4,2), n=4, directed=T)
class(g0e) #checking the class
## [1] "igraph"
plot(g0e) # A simple plot to get started
# Create data
set.seed(1)
data <- matrix(sample(0:1, 100, replace=TRUE, prob=c(0.8,0.2)), nc=10)
g0f <- graph_from_adjacency_matrix(data , mode='undirected', diag=F )
# Default network
par(mar=c(0,0,0,0))
plot(g0f)
plot(g0f,
vertex.color = rgb(0.8,0.2,0.2,0.9), # Node color
vertex.frame.color = "Forestgreen", # Node border color
vertex.shape=c("circle","square"), # One of “none”, “circle”, “square”, “csquare”, “rectangle” “crectangle”, “vrectangle”, “pie”, “raster”, or “sphere”
vertex.size=c(15:24), # Size of the node (default is 15)
vertex.size2=NA, # The second size of the node (e.g. for a rectangle)
)
plot(g0f,
vertex.label=LETTERS[1:10], # Character vector used to label the nodes
vertex.label.color=c("red","blue"),
vertex.label.family="Times", # Font family of the label (e.g.“Times”, “Helvetica”)
vertex.label.font=c(1,2,3,4), # Font: 1 plain, 2 bold, 3, italic, 4 bold italic, 5 symbol
vertex.label.cex=c(0.5,1,1.5), # Font size (multiplication factor, device-dependent)
vertex.label.dist=0, # Distance between the label and the vertex
vertex.label.degree=0 , # The position of the label in relation to the vertex (use pi)
)
plot(g0f,
edge.color=rep(c("red","pink"),5), # Edge color
edge.width=seq(1,10), # Edge width, defaults to 1
edge.arrow.size=1, # Arrow size, defaults to 1
edge.arrow.width=1, # Arrow width, defaults to 1
edge.lty=c("solid") # Line type, could be 0 or “blank”, 1 or “solid”, 2 or “dashed”, 3 or “dotted”, 4 or “dotdash”, 5 or “longdash”, 6 or “twodash”
#edge.curved=c(rep(0,5), rep(1,5)) # Edge curvature, range 0-1 (FALSE sets it to 0, TRUE to 0.5)
)
# Count the number of degree for each node:
deg <- degree(g0f, mode="all")
# Plot
plot(g0f, vertex.size=deg*6, vertex.color=rgb(0.1,0.7,0.8,0.5) )
We come back to the graph model of how many times nouns, proper nouns, adjectives, verbs, adverbs, and numbers are used in the same verse in Surah Yusuf (111 verses).
head(cooccur, 50)
wordnetwork <- head(cooccur, 50)
gm <- graph_from_data_frame(wordnetwork)
gm
## IGRAPH 489b8ed DN-- 35 50 --
## + attr: name (v/c), cooc (e/n)
## + edges from 489b8ed (vertex names):
## [1] Joseph ->say indeed ->say say ->when father ->say
## [5] Allah ->say Allah ->know Joseph ->when Allah ->indeed
## [9] father ->indeed brother->say know ->say say ->so
## [13] one ->say indeed ->see so ->when bag ->brother
## [17] brother->Joseph father ->Joseph father ->when eat ->say
## [21] say ->see eat ->seven say ->seven brother->so
## [25] seven ->spike brother->when indeed ->Joseph Allah ->most
## [29] most ->people he ->rely enter ->say give ->say
## + ... omitted several edges
E(gm) # The edges of the "net" object
## + 50/50 edges from 489b8ed (vertex names):
## [1] Joseph ->say indeed ->say say ->when father ->say
## [5] Allah ->say Allah ->know Joseph ->when Allah ->indeed
## [9] father ->indeed brother->say know ->say say ->so
## [13] one ->say indeed ->see so ->when bag ->brother
## [17] brother->Joseph father ->Joseph father ->when eat ->say
## [21] say ->see eat ->seven say ->seven brother->so
## [25] seven ->spike brother->when indeed ->Joseph Allah ->most
## [29] most ->people he ->rely enter ->say give ->say
## [33] seduce ->seek cow ->seven dry ->seven fat ->seven
## [37] grain ->seven green ->seven lean ->seven other ->seven
## + ... omitted several edges
V(gm) # The vertices of the "net" object
## + 35/35 vertices, named, from 489b8ed:
## [1] Joseph indeed say father Allah brother know one so
## [10] bag eat seven most he enter give seduce cow
## [19] dry fat grain green lean other do good when
## [28] see spike people rely seek take knowing Lord
E(gm)$cooc # Edge attribute
## [1] 23 20 19 18 17 12 12 11 11 11 11 10 9 9 9 8 8 8 8 7 7 7 7 7 7
## [26] 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5
V(gm)$name # Vertex attribute
## [1] "Joseph" "indeed" "say" "father" "Allah" "brother" "know"
## [8] "one" "so" "bag" "eat" "seven" "most" "he"
## [15] "enter" "give" "seduce" "cow" "dry" "fat" "grain"
## [22] "green" "lean" "other" "do" "good" "when" "see"
## [29] "spike" "people" "rely" "seek" "take" "knowing" "Lord"
V(gm)[name=="Allah"]
## + 1/35 vertex, named, from 489b8ed:
## [1] Allah
E(gm)[cooc > 10]
## + 11/50 edges from 489b8ed (vertex names):
## [1] Joseph ->say indeed ->say say ->when father ->say
## [5] Allah ->say Allah ->know Joseph ->when Allah ->indeed
## [9] father ->indeed brother->say know ->say
# You can also examine the network matrix directly:
gm[1,]
## Joseph indeed say father Allah brother know one so bag
## 0 0 1 0 0 0 0 0 0 0
## eat seven most he enter give seduce cow dry fat
## 0 0 0 0 0 0 0 0 0 0
## grain green lean other do good when see spike people
## 0 0 0 0 0 0 1 0 0 0
## rely seek take knowing Lord
## 0 0 0 0 0
gm[5,7]
## [1] 1
plot(gm) # not a pretty picture!
plot(gm, edge.arrow.size=.4, edge.curved=.1)
plot(gm, edge.arrow.size=.2, edge.color="red",
vertex.color="yellow", vertex.frame.color="#ffffff",
vertex.label=V(gm)$Label, vertex.label.color="black")
deg <- degree(gm, mode="all")
V(gm)$size <- deg
V(gm)$size
## [1] 6 10 15 5 5 5 2 1 4 1 4 10 2 2 2 1 1 1 1 1 1 1 1 2 1
## [26] 1 5 2 1 1 1 1 1 1 1
plot(gm, vertex.label=V(gm)$Label)
# same command for degree but no label
plot(gm, vertex.size=igraph::degree(gm), vertex.label=NA)
E(gm)$cooc
## [1] 23 20 19 18 17 12 12 11 11 11 11 10 9 9 9 8 8 8 8 7 7 7 7 7 7
## [26] 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5
E(gm)$width
## NULL
E(gm)$width <- E(gm)$cooc
E(gm)$width
## [1] 23 20 19 18 17 12 12 11 11 11 11 10 9 9 9 8 8 8 8 7 7 7 7 7 7
## [26] 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5
plot(gm)
#change arrow size and edge color:
E(gm)$arrow.size <- .2
E(gm)$edge.color <- "gray80"
graph_attr(gm, "layout") <- layout_with_fr
plot(gm)
plot(gm, vertex.shape="none", vertex.label=V(gm)$name,
vertex.label.font=2, vertex.label.color="blue",
vertex.label.cex=.7, edge.color="gray85")
edge.start <- ends(gm, es=E(gm), names=F)[,1]
edge.col <- V(gm)$color[edge.start]
edge.start
## [1] 1 2 3 4 5 5 1 5 4 6 7 3 8 2 9 10 6 4 4 11 3 11 3 6 12
## [26] 6 2 5 13 14 15 16 17 18 19 20 21 22 23 24 4 3 5 25 11 26 15 2 2 11
edge.col
## NULL
plot(gm, edge.color=edge.col, edge.curved=.1)
E(gm)[cooc <= 5]$color <- "red"
E(gm)[cooc > 5 & cooc <= 10]$color <- "yellow"
E(gm)[cooc > 10]$color <- "green"
plot(gm)
# repeat plot
plot(gm, vertex.size=5,
layout=layout.fruchterman.reingold,
vertex.label=NA)
To adjust the figure you just have to assign a new layout: * layout_with_dh * layout_with_fr * layout_with_kk * layout_with_sugiyama
l<-layout_with_dh(gm)
plot(gm, vertex.color=vertex_attr(gm)$cor,
vertex.size=igraph::degree(gm),
edge.width=(edge_attr(gm)$weight)/100,
edge.color="grey50",
edge.curved=0.5,
layout=l)
par(mfrow=c(2,2), oma=c(0,0,0,0), mar=c(1,1,1,1))#To plot four plots
plot(gm, vertex.color=vertex_attr(gm)$cor,vertex.label=NA,
vertex.size=igraph::degree(gm),
edge.width=(edge_attr(gm)$weight)/100,
edge.color="grey50",
edge.curved=0.3,
layout=layout_in_circle, main="layout_in_circle")
plot(gm, vertex.color=vertex_attr(gm)$cor,
vertex.size=igraph::degree(gm),
edge.width=(edge_attr(gm)$weight)/100,
edge.color="grey50",
#edge.curved=0.3,
layout=layout_as_tree, main="layout_as_tree")
plot(gm, vertex.color=vertex_attr(gm)$cor,
vertex.size=igraph::degree(gm),
edge.width=3*(edge_attr(gm)$weight)/100,
edge.color="grey50",
#edge.curved=0.3,
layout=layout_as_star, main="layout_as_star")
layouts <- grep("^layout_", ls("package:igraph"), value=TRUE)[-1]
layouts <- layouts[!grepl("bipartite|merge|norm|sugiyama|tree", layouts)]
par(mfrow=c(3,3), mar=c(1,1,1,1))
for (layout in layouts) {
print(layout)
l <- do.call(layout, list(gm))
plot(gm, edge.arrow.mode=0, edge.width=1, layout=l, main=layout) }
## [1] "layout_as_star"
## [1] "layout_components"
## [1] "layout_in_circle"
## [1] "layout_nicely"
## [1] "layout_on_grid"
## [1] "layout_on_sphere"
## [1] "layout_randomly"
## [1] "layout_with_dh"
## [1] "layout_with_drl"
## [1] "layout_with_fr"
## [1] "layout_with_gem"
## [1] "layout_with_graphopt"
## [1] "layout_with_kk"
## [1] "layout_with_lgl"
## [1] "layout_with_mds"
l <- layout_with_fr(gm)
plot(gm, layout=l)
l <- layout_with_fr(gm, niter=50)
plot(gm, layout=l)
ws <- c(1, rep(100, ecount(gm)-1))
lw <- layout_with_fr(gm, weights=ws)
plot(gm, layout=lw)
par(mfrow=c(2,2), mar=c(0,0,0,0)) # plot four figures - 2 rows, 2 columns
plot(gm, layout=layout_with_fr)
plot(gm, layout=layout_with_fr)
plot(gm, layout=l)
plot(gm, layout=l)
l <- layout_with_fr(gm)
l <- norm_coords(l, ymin=-1, ymax=1, xmin=-1, xmax=1)
# dev.off()
par(mfrow=c(2,2), mar=c(0,0,0,0))
plot(gm, rescale=F, layout=l*0.4)
plot(gm, rescale=F, layout=l*0.6)
plot(gm, rescale=F, layout=l*0.8)
plot(gm, rescale=F, layout=l*1.0)
# dev.off()
l <- layout_with_kk(gm)
plot(gm, layout=l)
l <- layout_with_graphopt(gm)
plot(gm, layout=l)
# The charge parameter below changes node repulsion:
l1 <- layout_with_graphopt(gm, charge=0.02)
l2 <- layout_with_graphopt(gm, charge=0.00000001)
par(mfrow=c(1,2), mar=c(1,1,1,1))
plot(gm, layout=l1)
plot(gm, layout=l2)
#dev.off()
par(mfrow=c(1,2), mar=c(1,1,1,1))
plot(gm, layout=layout_with_lgl, root = 1)
plot(gm, layout=layout_with_lgl, root = 5)
#dev.off()
plot(gm, layout=layout_with_mds)
gm1 <- delete_vertices(gm, "seek")
gm1 <- delete_vertices(gm1, "seduce")
gmc <- cluster_spinglass(gm1)
str(gmc) #evaluating output
## List of 4
## $ membership : chr [1:11] "Joseph" "say" "father" "brother" ...
## $ csize : chr [1:6] "indeed" "do" "good" "see" ...
## $ modularity : chr [1:10] "eat" "seven" "cow" "dry" ...
## $ temperature: chr [1:6] "Allah" "know" "most" "he" ...
## - attr(*, "class")= chr "communities"
gmc$membership #checking to which module each node belongs to
## [1] 1 2 1 1 4 1 4 1 1 1 3 3 4 4 1 1 3 3 3 3 3 3 3 2 2 1 2 3 4 4 1 2 2
plot(gm1, vertex.color=vertex_attr(gm1)$cor,
vertex.size = 3*degree(gm1),
edge.width = edge_attr(gm1)$coor,
edge.color = "grey50",
edge.curved = 0.3,
layout = layout_with_kk,
mark.groups = gmc,
mark.border=NA)
short.path <- shortest_paths(gm, from = "Joseph", to = "seven", output = "both")
short.path
## $vpath
## $vpath[[1]]
## + 3/35 vertices, named, from 489b8ed:
## [1] Joseph say seven
##
##
## $epath
## $epath[[1]]
## + 2/50 edges from 489b8ed (vertex names):
## [1] Joseph->say say ->seven
##
##
## $predecessors
## NULL
##
## $inbound_edges
## NULL
ecol <- rep("gray80", ecount(gm)) #creating a color vector for all the edges with the color "grey80"; the ecount fucntion tells you how many edges the network has
ecol[unlist(short.path$epath)] <- "red" # coloring in red only the path in which we are interested, that we calculed using the shortest_paths function.
l <- layout_with_kk(gm)
plot(gm, vertex.color=vertex_attr(gm)$cor,
vertex.size=igraph::degree(gm),
edge.width=2,
edge.color=ecol,
layout=l)
library(ggraph)
lay = create_layout(gm, layout = "kk")
ggraph(lay) +
geom_edge_link() +
geom_node_point() +
theme_graph()
# set theme to graph
# set_graph_style()
# unset graph theme
# unset_graph_style()
ggraph(lay) +
geom_edge_link() +
geom_node_point() +
geom_node_text(aes(label = name), repel=TRUE)
ggraph(lay) +
geom_edge_link() +
geom_node_point() +
geom_node_text(aes(label = name), repel=TRUE) +
geom_edge_link(aes(color = cooc))
ggraph(lay) +
geom_node_point() +
geom_node_text(aes(label = name), repel=TRUE) +
geom_node_point(aes(size = igraph::degree(gm)), colour = "#de4e96") +
geom_edge_link(aes(color = cooc))
# plot node degree as node size and alpha
ggraph(lay) +
geom_node_point() +
geom_node_text(aes(label = name), repel=TRUE) +
geom_node_point(aes(size = degree(gm), alpha = degree(gm)), color = "#de4e96") +
geom_edge_link(aes(color = cooc))
ggraph(lay) +
geom_edge_link(aes(alpha = cooc)) +
geom_node_point()
ggraph(gm, layout="kk") +
geom_edge_link() +
ggtitle("Title") # add title to the plot
ggraph(gm, layout="kk") +
geom_edge_fan(color="gray50", width=0.8, alpha=0.5) +
geom_node_point(size=2) +
theme_void()
ggraph(gm, layout = 'linear') +
geom_edge_arc(color = "orange", width=0.7) +
geom_node_point(size=3, color="gray50") +
geom_node_text(aes(label = name), repel=TRUE, size = 3) +
theme_void()
ggraph(gm, layout="kk") +
geom_edge_link(aes(color = cooc)) +
geom_node_point(aes(size = degree(gm))) +
theme_void()
ggraph(gm, layout = 'kk') +
geom_edge_arc(color="gray", curvature=0.3) +
geom_node_point(color="#de4e96", aes(size = degree(gm))) +
geom_node_text(aes(label = name), size=3, color="darkblue", repel=T) +
theme_void()
ggraph(gm) +
geom_edge_link(aes(alpha = stat(index)), show.legend = FALSE) +
geom_node_point() +
geom_node_text(aes(label = name), size=3, color="darkblue", repel=T) +
theme_graph()
ggraph(gm) +
geom_edge_link(aes(alpha = stat(index)), show.legend = FALSE) +
geom_edge_link(aes(color = cooc)) +
geom_node_point() +
geom_node_text(aes(label = name), size=3, color="darkblue", repel=T) +
theme_graph()
ggraph(gm, layout = 'linear') +
geom_edge_arc(aes(color = cooc))
ggraph(gm, layout = 'linear', circular = TRUE) +
geom_edge_arc(aes(color = cooc))
ggraph(gm, layout = 'linear', circular = TRUE) +
geom_edge_arc(aes(color = cooc)) +
geom_node_point() +
geom_node_text(aes(label = name), size=3, color="darkblue", repel=T)
ggraph(gm, layout = 'kk') +
geom_edge_fan(aes(color = cooc, width = cooc))
ggraph(gm, layout = 'kk') +
geom_edge_fan(aes(color = cooc, width = cooc)) +
geom_node_point() +
geom_node_text(aes(label = name), size=3, color="darkred", repel=T)
ggraph(gm, layout = 'kk') +
geom_edge_density(aes(fill = cooc)) +
geom_edge_link(alpha = 0.25, color = "steelblue") +
geom_node_point(color="#de4e96", aes(size = igraph::degree(gm)))
netm <- get.adjacency(gm, attr="cooc", sparse=F)
colnames(netm) <- V(gm)$name
rownames(netm) <- V(gm)$name
palf <- colorRampPalette(c("yellow", "red"))
heatmap(netm[,17:1], Rowv = NA, Colv = NA, col = palf(100),
scale="none", margins=c(10,10) )
deg.dist <- degree_distribution(gm, cumulative=T, mode="all")
deg.dist
## [1] 1.00000000 1.00000000 0.45714286 0.28571429 0.28571429 0.22857143
## [7] 0.11428571 0.08571429 0.08571429 0.08571429 0.08571429 0.02857143
## [13] 0.02857143 0.02857143 0.02857143 0.02857143
plot( x=0:max(degree(gm)), y=1-deg.dist, pch=19, cex=1.2, col="orange",
xlab="Degree", ylab="Cumulative Frequency")
wordnetwork <- head(cooccur, 100)
gm1 <- graph_from_data_frame(wordnetwork)
ggraph(gm1, layout = 'kk') +
geom_edge_density(aes(fill = cooc)) +
geom_edge_link(alpha = 0.7, color = "#57d3e6") +
geom_node_point(aes(size = degree(gm1)), colour = "#a83268") +
geom_node_text(aes(label = name), size = 3, repel=TRUE) +
ggtitle("Network of Top 100 Words")
ggraph(gm1, layout = 'kk') +
geom_edge_density(aes(fill = cooc)) +
geom_edge_link(alpha = 0.7, color = "#57d3e6") +
geom_node_point(aes(size = degree(gm1)),
colour = "#a83268") +
ggtitle("Network of Top 100 Words")
ggraph(gm1, layout = 'kk') +
geom_edge_density(aes(fill = cooc)) +
geom_edge_link(alpha = 0.7, color = "#57d3e6") +
geom_node_point(aes(size = degree(gm1)),
colour = "#a83268", show.legend = FALSE) +
ggtitle("Network of Top 100 Words")
ggraph(gm1, layout = 'linear', circular = TRUE) +
geom_edge_arc(color = "orange", width=0.7) +
geom_node_point(aes(size = degree(gm1)), alpha = igraph::degree(gm1),
colour = "#a83268") +
geom_node_text(aes(label = name), size = 3, repel=TRUE) +
theme_void() +
ggtitle("Network of Top 100 Words In Circular Layout")
ggraph(gm1, layout = 'linear', circular = TRUE) +
geom_edge_arc(color = "#57d3e6", width=0.7) +
geom_node_point(aes(size = degree(gm1)), alpha = igraph::degree(gm1),
colour = "#a83268") +
geom_node_text(aes(label = name), size = 3, repel=TRUE) +
theme_void() +
ggtitle("Network of Top 100 Words In Circular Layout")
wordnetwork <- head(cooccur, 2000)
jg <- graph_from_data_frame(wordnetwork)
ggraph(jg, layout = 'kk') +
geom_edge_density(aes(fill = cooc)) +
geom_edge_link(alpha = 0.7, color = "#57d3e6") +
geom_node_point(aes(size = igraph::degree(jg)),
colour = "#a83268", show.legend = FALSE) +
ggtitle("Data Art of Top 2000 Coccurrences")
The plot shows all the 421 nodes are connected? Perhaps we can do other analysis.
is.connected(jg) # Is it connected?
## [1] FALSE
no.clusters(jg) # How many components?
## [1] 3
table(clusters(jg)$csize) # How big are these?
##
## 2 417
## 2 1
max(degree(jg, mode="in")) # Vertex degree
## [1] 77
max(degree(jg, mode="out"))
## [1] 120
max(degree(jg, mode="all"))
## [1] 126
# In-degree distribution
plot(degree.distribution(jg, mode="in"), log="xy")
# Out-degree distribution
plot(degree.distribution(jg, mode="out"), log="xy")
is.connected(jg) being FALSE says that there are some nodes not connected.
lay = create_layout(jg, layout = "fr")
ggraph(lay) +
geom_edge_link() +
geom_node_point() +
geom_node_text(aes(label = name), size = 3)
The rather ugly plot shows the 4 nodes. Let us delete and repeat the cluster analysis. But instead of using the delete_vertices() function like in an earlier example, we just use the main component. First, find the components and then subset the graph based on those components. In igraph the largest component is not always the first one with id == 1.
cl <- clusters(jg)
jg1 <- induced_subgraph(jg, which(cl$membership == which.max(cl$csize)))
summary(jg1)
## IGRAPH 9126a4e DN-- 417 1998 --
## + attr: name (v/c), cooc (e/n)
Another approach is
jg2 <- induced_subgraph(jg,
V(jg)[components(jg)$membership == which.max(components(jg)$csize)])
jg2
## IGRAPH 912baa5 DN-- 417 1998 --
## + attr: name (v/c), cooc (e/n)
## + edges from 912baa5 (vertex names):
## [1] Joseph ->say indeed ->say say ->when father ->say
## [5] Allah ->say Allah ->know Joseph ->when Allah ->indeed
## [9] father ->indeed brother->say know ->say say ->so
## [13] one ->say indeed ->see so ->when bag ->brother
## [17] brother->Joseph father ->Joseph father ->when eat ->say
## [21] say ->see eat ->seven say ->seven brother->so
## [25] seven ->spike brother->when indeed ->Joseph Allah ->most
## [29] most ->people he ->rely enter ->say give ->say
## + ... omitted several edges
graph.density(jg2) # Density
## [1] 0.01151771
transitivity(jg2) # Transitivity
## [1] 0.2413977
# Transitivity of a random graph of the same size
g <- erdos.renyi.game(vcount(jg2), ecount(jg2), type="gnm")
transitivity(g)
## [1] 0.02121244
# Transitivity of a random graph with the same degrees
g2 <- degree.sequence.game(degree(jg2,mode="all"), method="vl")
transitivity(g2)
## [1] 0.1108868
fc <- fastgreedy.community(simplify(as.undirected(jg2)))
plot(jg2, vertex.color=vertex_attr(jg2)$cor,
vertex.size = 2,
vertex.label=NA,
edge.width = NA,
edge.color = NA,
layout = layout_with_kk,
mark.groups = fc,
mark.border=NA)
igraph documentation lists the following for clusters and communities. Interested readers can explore the different functions following the example above.
I published some work on statistical mechanics using the spin glass model a long time ago, hence the first choice. Let us explore cluster_edge_betweenness.
fc <- cluster_edge_betweenness(simplify(as.undirected(jg2)))
plot(jg2, vertex.color=vertex_attr(jg2)$cor,
vertex.size = 2,
vertex.label=NA,
edge.width = NA,
edge.color = NA,
layout = layout_with_kk,
mark.groups = fc,
mark.border=NA)
I plan to cover more examples on tidygraph in a future post.This example also introduces the use of user built functions in R. We use the gm1 network (with 100 cooccurrences ) * Plot evolution of reach from hub to neighbors + nearest + next nearest
library(tidygraph)
library(ggraph)
tbl_graph <- as_tbl_graph(simplify(as.undirected(gm1)))
hub_id <- which.max(degree(gm1))
tbl_graph
## # A tbl_graph: 57 nodes and 100 edges
## #
## # An undirected simple graph with 2 components
## #
## # Node Data: 57 x 1 (active)
## name
## <chr>
## 1 Joseph
## 2 indeed
## 3 say
## 4 father
## 5 Allah
## 6 brother
## # ... with 51 more rows
## #
## # Edge Data: 100 x 2
## from to
## <int> <int>
## 1 1 2
## 2 1 3
## 3 1 4
## # ... with 97 more rows
hub_id
## say
## 3
tbl_graph <- tbl_graph %>%
activate(nodes) %>%
mutate(
hub_dist = replace_na(bfs_dist(root = hub_id), Inf),
degree = degree(gm1),
cluster = as.factor(group_infomap())
)
tbl_graph
## # A tbl_graph: 57 nodes and 100 edges
## #
## # An undirected simple graph with 2 components
## #
## # Node Data: 57 x 4 (active)
## name hub_dist degree cluster
## <chr> <dbl> <dbl> <fct>
## 1 Joseph 1 9 1
## 2 indeed 1 12 1
## 3 say 0 30 1
## 4 father 1 9 1
## 5 Allah 1 13 3
## 6 brother 1 7 1
## # ... with 51 more rows
## #
## # Edge Data: 100 x 2
## from to
## <int> <int>
## 1 1 2
## 2 1 3
## 3 1 4
## # ... with 97 more rows
basic <- tbl_graph %>%
ggraph(layout = "kk") +
geom_edge_link(width = .1) +
geom_node_point(aes(size = degree, color = degree)) +
coord_fixed() +
guides(size = FALSE)
plot(basic)
cluster <- tbl_graph %>%
ggraph(layout = "kk") +
geom_edge_link(width = .1) +
geom_node_point(aes(size = degree, color = cluster)) +
coord_fixed() +
theme(legend.position = "none")
plot(cluster)
# Define function
reach_graph <- function(n) {
tbl_graph %>%
activate(nodes) %>%
mutate(
reach = n,
reachable = ifelse(hub_dist <= n, "reachable", "non_reachable"),
reachable = ifelse(hub_dist == 0, "Hub", reachable)
)
}
evolving_graph <- bind_graphs(reach_graph(0), reach_graph(1), reach_graph(2), reach_graph(3))
evol <- evolving_graph %>%
ggraph(layout = "kk") +
geom_edge_link(width = .1, alpha = .2) +
geom_node_point(aes(size = degree*0.1, color = reachable)) +
scale_size(range = c(1, 2)) +
scale_color_manual(values = c("Hub" = "blue",
"non_reachable" = "#00BFC4",
"reachable" = "#F8766D","")) +
coord_fixed() +
facet_nodes(~reach, ncol = 2, nrow = 2, labeller = label_both) +
theme(legend.position = "none") +
labs(title = "3 Step Reach From Hub",
subtitle = "Neighbor, Nearest Neighbor, and Next Nearest Neighbor",
caption = "Virus Spread")
plot(evol)
# Plot 2nd degree reach separately
reach_graph(1) %>%
ggraph(layout = "kk") +
geom_edge_link(width = .1, alpha = .2) +
geom_node_point(aes(size = degree, color = reachable)) +
scale_color_manual(values = c("Hub" = "blue",
"non_reachable" = "#00BFC4",
"reachable" = "#F8766D","")) +
coord_fixed() +
theme_graph() +
labs(title = "2 Step Reach From Hub",
subtitle = "Neighbor, Nearest Neighbor",
caption = "Virus Spread")
The big blue blob is the node “say” with the highest degree, as the network hub. It takes 3 steps for it to “reach” the other nodes. The hub is like a super spreader in a virus network. It will never reach the two isolated nodes. (Hence, the isolation of clusters in pandemics.)
tbl_graph <- as_tbl_graph(simplify(as.undirected(jg)))
hub_id <- which.max(degree(jg))
tbl_graph <- tbl_graph %>%
activate(nodes) %>%
mutate(
hub_dist = replace_na(bfs_dist(root = hub_id), Inf),
degree = degree(jg),
cluster = as.factor(group_infomap())
)
# Plot 2nd degree reach separately
reach_graph(1) %>%
ggraph(layout = "kk") +
geom_edge_link(width = .1, alpha = .2) +
geom_node_point(aes(size = degree, color = reachable)) +
scale_color_manual(values = c("Hub" = "blue",
"non_reachable" = "#00BFC4",
"reachable" = "#F8766D","")) +
coord_fixed() +
theme_graph() +
labs(title = "2 Step Reach From Hub",
subtitle = "Neighbor, Nearest Neighbor",
caption = "Virus Spread")
Our work on #qurananalytics rely heavily on the use of #networkscience. In this post we introduce the tools available in R for creating, plotting and analyzing graph networks.
We plan to further explore the tidygraph package and also some of the theoretical analysis on networks like issues of centrality, density etc. We have seen in the final example of concepts like “hub” (most central node) and “spread” for our Surah Yusuf word occurrence network is similar to virus networks.