1 Introduction

Prophet (SAW) said, “The best of you are those who learn the Quran and teach it.” [Bukhari]

We formally started our #qurananalytics project, which accompanies our #learnbydoing approach to #networkscience, by defining some objectives and methods.[1] The main trigger was a well-prepared dataset of the Holy Quran that had been published quite recently (2019-01-17). The obvious next step was to apply current text analysis tools and methods to the Quran datasets. In research, different methods applied to the same data often yield different results, so using the new methods and tools of #datascience, #textanalytics, and #networkscience to “learn the Quran” should be interesting.

The learning and encouraging feedback we got from the blog postings led us to explore applying these tools to the classical works on the interpretation of the Quran. We had some ideas about how #networkscience tools could help us visualize the classical method of Tafseer al-Quran bi al-Quran (تفسير القرآن بالقرآن), Interpretation of the Quran with the Quran, which is considered the best method to interpret the verses of the Quran.

This article focuses on preliminary findings from network analysis applied to the best of the classical works of Tafseer, Tafseer Ibn Katheer.[2] We have also started applying knowledge graph techniques to the Quran data; these will be covered in future posts.


2 Preliminaries

3 Focus on Selected Quran Version and Variables

The quRan package has 4 versions of the Quran. Each one gives the full text of the Quran as a data frame with one row per verse.

  1. quran_ar (Quran in Arabic with vowels)
  2. quran_ar_min (Quran in Arabic without vowels)
  3. quran_en_sahih (Quran in English, Saheeh International translation)
  4. quran_en_yusufali (Quran in English, Yusuf Ali translation)

We will use selected variables (columns) from quran_en_sahih.

This covers all 6236 verses of the Quran. The difference between ayah and ayah_id is that ayah restarts at 1 with each new surah, while ayah_id keeps counting up to the last verse of the Quran, ayah_id = 6236. We name the data frame kvertices because it supplies the vertices (nodes) of the network graph we will build. We expect that not all verses of the Quran are interpreted by this method (تفسير القرآن بالقرآن).
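The relationship between ayah and ayah_id can be made concrete with a short Python sketch. It is for illustration only and is not the quRan package's code. Assuming verses are numbered in the standard order, ayah_id is the total number of verses in all earlier Surahs plus ayah. SURAH_LENGTHS is a hypothetical, partial helper table that covers only Surahs 1 to 20. The results for 7:126 and 20:74 agree with the ayah_id column printed in the tidygraph section of this post.

```python
from itertools import accumulate

# Number of verses in Surahs 1-20 only (a partial table for illustration;
# the full table has 114 entries that sum to 6236).
SURAH_LENGTHS = [7, 286, 200, 176, 120, 165, 206, 75, 129, 109,
                 123, 111, 43, 52, 99, 128, 111, 110, 98, 135]

# OFFSETS[s - 1] is the number of verses that come before Surah s.
OFFSETS = [0] + list(accumulate(SURAH_LENGTHS))


def ayah_id(surah: int, ayah: int) -> int:
    """Running verse number (1..6236) for a surah:ayah reference."""
    if not 1 <= surah <= len(SURAH_LENGTHS):
        raise ValueError(f"Surah {surah} is not in this partial table")
    if not 1 <= ayah <= SURAH_LENGTHS[surah - 1]:
        raise ValueError(f"Surah {surah} has no ayah {ayah}")
    return OFFSETS[surah - 1] + ayah


print(ayah_id(7, 126))  # 1080
print(ayah_id(20, 74))  # 2422
```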


4 Import and Prepare Data Frame

Download the original dataset[3] and place it in the working directory.

Next we create the from and to columns. The to verse(s) are used to explain or interpret (Tafseer) the from verse, following the method of Tafseer al-Quran bi al-Quran (تفسير القرآن بالقرآن) described in the introduction. The format of the from and to columns follows exactly the ayah_title column in the kvertices data frame.

Next we define the edges from the kathir dataframe. We add the SourceSura, SourceAyah, TargetSura, TargetAyah as edge attributes for selection purposes.
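Here is a minimal Python sketch of this step. It assumes each row of the downloaded file supplies SourceSura, SourceAyah, TargetSura and TargetAyah as text. The class Edge and the function edges_from_rows are hypothetical names and not the authors' R code. The from and to labels are built in the same surah:ayah format as ayah_title, and the four numeric columns are kept as edge attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One Tafseer link: the `to` verse is used to explain the `from` verse."""
    source_sura: int
    source_ayah: int
    target_sura: int
    target_ayah: int

    @property
    def from_(self) -> str:
        # Same "surah:ayah" format as the ayah_title column, e.g. "87:13".
        return f"{self.source_sura}:{self.source_ayah}"

    @property
    def to(self) -> str:
        return f"{self.target_sura}:{self.target_ayah}"


def edges_from_rows(rows):
    """Build Edge objects from (SourceSura, SourceAyah, TargetSura, TargetAyah) rows."""
    return [Edge(*(int(v) for v in row)) for row in rows]


# Two links recorded for verse 87:13 (both appear in the Surah 87 graph below).
rows = [("87", "13", "35", "36"), ("87", "13", "43", "77")]
for e in edges_from_rows(rows):
    print(e.from_, "->", e.to)
```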


5 Create Network Graph

We now create the network graph. Some of the vertices (verses) will have no connections since there is no record in the original work of the verse being explained by another verse.

## IGRAPH 257a368 DNW- 6236 7679 -- 
## + attr: name (v/c), surah_id (v/n), ayah (v/n), ayah_id (v/n),
## | surah_title_en (v/c), revelation_type (v/c), text (v/c), SourceSura
## | (e/n), SourceAyah (e/n), TargetSura (e/n), TargetAyah (e/n),
## | common_roots (e/n), weight (e/n)
## + edges from 257a368 (vertex names):
##  [1] 1:1->1:2    1:1->1:3    1:1->1:4    1:1->3:1    1:1->3:2    1:1->4:78  
##  [7] 1:1->6:3    1:1->6:14   1:1->7:180  1:1->9:128  1:1->11:41  1:1->13:28 
## [13] 1:1->14:1   1:1->14:2   1:1->16:53  1:1->17:110 1:1->18:38  1:1->19:65 
## [19] 1:1->20:5   1:1->23:88  1:1->25:59  1:1->25:60  1:1->33:43  1:1->43:45 
## [25] 1:1->43:84  1:1->55:78  1:1->56:74  1:1->59:22  1:1->59:23  1:1->59:24 
## + ... omitted several edges

We have the correct number of vertices, 6236, that equals the number of verses in quran_en_sahih (Quran in English, Saheeh International translation).

5.1 Description of Network:

Let us start by looking at some characteristics of the graph kg. It is a fairly large graph, so many interesting measures and statistics could be studied; we only cover the basics here.

In Ibn Katheer's classic, a selected verse is interpreted by other verses. We lay this out as a directed graph. The igraph summary code DNW- 6236 7679 means the graph is Directed, Named and Weighted, with 6236 vertices and 7679 edges.

Network summary:

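  1. There are 2445 source nodes (verses interpreted by at least one other verse)
  2. There are 3279 target nodes (verses used to interpret at least one other verse)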
  3. There are 7679 directed Edges
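Many verses are both explained by other verses and used to explain others, so the source and target sets overlap. As a result, 2445 + 3279 is larger than the number of verses that carry at least one link. The counting can be sketched in Python on a small, real excerpt: the 24 edges of the second-order ego network of verse 87:13 (kg8713_2), printed in Section 7. The function summarize is a hypothetical helper. Passing the full vertex list (all 6236 verses for kg) also reveals the isolated verses.

```python
# The 24 edges of kg8713_2 (from -> to), copied from the igraph printout in Section 7.
EDGES = [
    ("7:126", "20:74"), ("9:82", "43:77"), ("14:17", "35:36"), ("16:28", "35:36"),
    ("17:97", "78:30"), ("20:74", "35:36"), ("20:74", "43:77"), ("20:74", "87:11"),
    ("20:74", "87:12"), ("20:74", "87:13"), ("35:32", "35:36"), ("35:36", "17:97"),
    ("35:36", "20:74"), ("35:36", "43:74"), ("35:36", "43:75"), ("35:36", "43:77"),
    ("35:36", "78:30"), ("35:37", "43:77"), ("43:77", "35:36"), ("43:77", "87:11"),
    ("43:77", "87:12"), ("43:77", "87:13"), ("87:13", "35:36"), ("87:13", "43:77"),
]


def summarize(edges, all_vertices):
    """Count distinct source, target, linked and isolated verses."""
    sources = {s for s, _ in edges}
    targets = {t for _, t in edges}
    linked = sources | targets  # overlap is counted once here
    return {"sources": len(sources), "targets": len(targets),
            "linked": len(linked), "edges": len(edges),
            "isolated": len(set(all_vertices) - linked)}


# 87:14 has no link inside this excerpt, so it counts as isolated here.
vertices = {v for e in EDGES for v in e} | {"87:14"}
print(summarize(EDGES, vertices))
# {'sources': 11, 'targets': 10, 'linked': 16, 'edges': 24, 'isolated': 1}
```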

5.1.1 General statistics of the network properties:

Property                                             Value
Average degree                                       1.847
Diameter                                             35
Average path length                                  12.034
Number of communities (modularity, resolution 1.0)   115
Average clustering coefficient                       0.064

Notes:

  1. On average, a verse is connected to (interpreted by) 1.847 verses
  2. The diameter is 35: the longest shortest path between two connected verses spans 35 links
  3. However, the average path length is 12.034, which means that the shortest chain of interpretation linking two connected verses is, on average, about 12 links long
  4. With the resolution set to 1 (the standard resolution), the modularity algorithm detects 115 communities: groups of verses that are more densely linked to each other than to the rest of the network
  5. The average clustering coefficient is 0.064: on average, only about 6.4% of the possible links among a verse's neighbors are actually present
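Diameter and average path length are both built from shortest paths. The Python sketch below assumes directed paths and skips ordered pairs with no path between them. igraph's mean_distance() does the same by default, but Gephi's exact conventions may differ. The chain a -> b -> c -> d is a toy example. Its six reachable pairs have distances 1, 2, 3, 1, 2 and 1, so the diameter is 3 and the average path length is 10/6 ≈ 1.67.

```python
from collections import deque


def bfs_distances(adj, start):
    """Directed shortest-path distances (in edges) from start to reachable vertices."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_stats(edges):
    """Return (diameter, average path length) over reachable ordered pairs."""
    adj, nodes = {}, set()
    for s, t in edges:
        adj.setdefault(s, []).append(t)
        nodes.update((s, t))
    lengths = [d for u in nodes
               for v, d in bfs_distances(adj, u).items() if v != u]
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)


diameter, avg_len = path_stats([("a", "b"), ("b", "c"), ("c", "d")])
print(diameter, round(avg_len, 3))  # 3 1.667
```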

5.1.2 Visuals Using Gephi

As part of the #qurananalytics project we have created a tutorial on network graphing tools[4] and another on network characteristics and statistics[5]. We will use some of these R-based tools here. However, Gephi[6] combines these nicely, and we plan to do a Gephi tutorial soon.

  1. Fruchterman-Reingold layout visualization
Ibn Katheer Net Fruchterman Reingold

Note:

  1. Coloring is based on “modularity class” (115 communities are detected at resolution 1.0).
  2. Many other verses are used to explain a single verse, and, visually speaking, the layers of information are extensive.

The largest modularity class, which contains 5.65% of the verses (nodes), is pictured below:

Largest community (modularity class)

  2. Add Force Atlas 2 layout to existing graph
Force Atlas 2 after Fruchterman Reingold

Note: when we apply another layout on top of an existing one and add coloring, we can see the groupings and layout from another dimension.

NOTE: We can use many “graph layouts” to extract different dimensions of information. We have tried only two here, and the possibilities are extensive.

5.2 Statistics of Node Attributes

We have computed various attributes for the nodes using Gephi (for faster calculations and visualizations), and saved them as a CSV file.

Now we perform some visualizations using ggplot. The following figure summarizes some of the centrality measures.

The column names for the data.frame for reference are:

##  [1] "Id"                         "Label"                     
##  [3] "timeset"                    "indegree"                  
##  [5] "outdegree"                  "Degree"                    
##  [7] "weighted indegree"          "weighted outdegree"        
##  [9] "Weighted Degree"            "Eccentricity"              
## [11] "closnesscentrality"         "harmonicclosnesscentrality"
## [13] "betweenesscentrality"       "pageranks"                 
## [15] "componentnumber"            "strongcompnum"             
## [17] "eigencentrality"

We can plot and study various properties, for example:

This is really cool to visualize. Verse 1_1 is explained by 32 other verses (32 outgoing links), and 2_255, Ayat Al-Kursi, has 15 outgoing and 16 incoming links. The network graph model is beginning to show some real value.
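The in and out link counts quoted here are simply in-degree and out-degree, the Gephi columns indegree and outdegree. Below is a Python sketch on the 24 edges of kg8713_2 from Section 7; degree_table is a hypothetical helper. Ranking by total degree puts 35:36, 43:77 and 20:74 first, which are the same verses singled out in the tidygraph section.

```python
from collections import Counter

# The 24 edges of kg8713_2 (from -> to), copied from the igraph printout in Section 7.
EDGES = [
    ("7:126", "20:74"), ("9:82", "43:77"), ("14:17", "35:36"), ("16:28", "35:36"),
    ("17:97", "78:30"), ("20:74", "35:36"), ("20:74", "43:77"), ("20:74", "87:11"),
    ("20:74", "87:12"), ("20:74", "87:13"), ("35:32", "35:36"), ("35:36", "17:97"),
    ("35:36", "20:74"), ("35:36", "43:74"), ("35:36", "43:75"), ("35:36", "43:77"),
    ("35:36", "78:30"), ("35:37", "43:77"), ("43:77", "35:36"), ("43:77", "87:11"),
    ("43:77", "87:12"), ("43:77", "87:13"), ("87:13", "35:36"), ("87:13", "43:77"),
]


def degree_table(edges):
    """Map each vertex to (in-degree, out-degree)."""
    out_deg = Counter(s for s, _ in edges)  # links to verses that explain it
    in_deg = Counter(t for _, t in edges)   # verses it is used to explain
    return {v: (in_deg[v], out_deg[v]) for v in set(out_deg) | set(in_deg)}


table = degree_table(EDGES)
# Sort by total degree (descending), then by name for a stable order.
ranked = sorted(table.items(), key=lambda kv: (-sum(kv[1]), kv[0]))
for verse, (d_in, d_out) in ranked[:3]:
    print(verse, "in:", d_in, "out:", d_out)
# 35:36 in: 6 out: 6
# 43:77 in: 5 out: 4
# 20:74 in: 2 out: 5
```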

There are other plot combinations that can show the importance of the nodes (or verses) depending on the context. We have shown some examples here. We have a customized tutorial on the characteristics and statistics of networks based upon Word Co-occurrences in Surah Taa-Haa[5].


6 Surah 87 (Al-A’laa)

We now focus on one selected short Surah, Surah 87 (Al-A’laa). Its smaller number of verses makes the plots more legible. This Surah has well-known virtues.[7]

We summarize the steps in selecting the nodes and edges for the relevant Surah or group of verses.

Now we build the graph for Surah 87. It only covers the first-degree links from the verses of Surah 87 (sources) to the verses that explain them (targets).

## IGRAPH 28bfba8 DNW- 20 15 -- 
## + attr: name (v/c), surah_id (v/n), ayah (v/n), ayah_id (v/n),
## | surah_title_en (v/c), revelation_type (v/c), text (v/c), SourceSura
## | (e/n), SourceAyah (e/n), TargetSura (e/n), TargetAyah (e/n),
## | common_roots (e/n), weight (e/n)
## + edges from 28bfba8 (vertex names):
##  [1] 87:1 ->56:74 87:3 ->20:50 87:13->35:36 87:13->43:77 87:18->87:14
##  [6] 87:18->87:15 87:18->87:16 87:18->87:17 87:19->53:36 87:19->53:37
## [11] 87:19->53:38 87:19->53:39 87:19->53:40 87:19->53:41 87:19->53:42
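Selecting the Surah 87 part of the edge list amounts to filtering on the Surah number of the source verse. Filtering on the target instead answers the question raised below: which verses in other Surahs does Ibn Katheer explain with Surah 87? In the Python sketch below, surah_links is a hypothetical helper. SURAH87_OUT holds the 15 edges printed above, and OTHER holds three edges copied from kg8713_2 in Section 7. Counting the distinct verses in the 15 out-links reproduces the 20 vertices in the summary above.

```python
def surah_of(verse: str) -> int:
    """Surah number of a 'surah:ayah' label."""
    return int(verse.split(":")[0])


def surah_links(edges, surah):
    """Out-links (a verse of `surah` explained by another verse) and in-links
    (a verse of `surah` used to explain a verse from another Surah)."""
    out_links = [(s, t) for s, t in edges if surah_of(s) == surah]
    in_links = [(s, t) for s, t in edges
                if surah_of(t) == surah and surah_of(s) != surah]
    return out_links, in_links


# The 15 edges of the Surah 87 graph printed above.
SURAH87_OUT = ([("87:1", "56:74"), ("87:3", "20:50"),
                ("87:13", "35:36"), ("87:13", "43:77")]
               + [("87:18", f"87:{a}") for a in range(14, 18)]
               + [("87:19", f"53:{a}") for a in range(36, 43)])
# Three edges from kg8713_2 whose source verse is outside Surah 87.
OTHER = [("20:74", "87:11"), ("43:77", "87:13"), ("35:36", "17:97")]

out_links, in_links = surah_links(SURAH87_OUT + OTHER, 87)
vertices = {v for e in out_links for v in e}
print(len(vertices), len(out_links))  # 20 15
print(in_links)  # [('20:74', '87:11'), ('43:77', '87:13')]
```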

This is a similar plot but with the actual texts of the verses.

This shows the first-degree “out” links from the various verses in Surah 87; Ibn Katheer interpreted only some verses in Surah 87 with other verses. Did Ibn Katheer use verses in Surah 87 to interpret verses in other Surahs? Those would be the first-degree “in” links to the verses of Surah 87, and without a network model one has to read the whole work to find out.

4 volumes of the abridged version of Ibn Katheer

This, however, is easily done with our network graph models kg and kg87. We can also explore in and out links beyond the first degree with our network graph model, as we do in the following sections.


7 Neighborhood of Graph Nodes

The neighbors of a specific node can be extracted with the ego() function. Especially in social network analysis (SNA), parts of networks are often analyzed in isolation.[8] An ego network contains a distinguished node (the ego), its neighbors and all edges among them, including edges between the neighbors. An ego network is a localized structural summary of the ego node. Ego networks are useful in various settings:

An ego network is embedded into a larger network, although the larger network is sometimes partially or fully unknown. igraph can extract ego networks via the make_ego_graph() function. It extracts one or more ego networks, and note that it always returns a list of graphs, for consistency.

The ego related functions will be fundamental in our approach for creating value using a network graph model to analyze the classical work of Ibn Katheer. As such, we reproduce the help page of the related ego() functions.

Description

These functions find the vertices not farther than a given limit from another fixed vertex; these are called the neighborhood of the vertex.

Usage

ego_size(graph, order = 1, nodes = V(graph), mode = c("all", "out", "in"), mindist = 0)

ego(graph, order = 1, nodes = V(graph), mode = c("all", "out", "in"), mindist = 0)

make_ego_graph(graph, order = 1, nodes = V(graph), mode = c("all", "out", "in"), mindist = 0)

Arguments

Argument   Description
graph      Input graph.
order      Integer giving the order of the neighborhood.
nodes      Vertices for which the calculation is performed.
mindist    Minimum distance to include the vertex in the result.
mode       "in": all vertices from which the source vertex is reachable; "out": only the outgoing edges are followed; "all": the direction of the edges is ignored.

Details

The neighborhood of a given order o of a vertex v includes all vertices whose distance from v is at most o: order 0 is v itself, order 1 adds its immediate neighbors, and so on.

ego_size calculates the size of the neighborhoods for the given vertices with the given order.

ego calculates the neighborhoods of the given vertices with the given order parameter.

make_ego_graph creates (sub)graphs from all neighborhoods of the given vertices with the given order parameter. This function preserves the vertex, edge and graph attributes.

connect creates a new graph by connecting each vertex to all other vertices in its neighborhood.

Value

Below, we are looking for all verses that “are connected” with verse “87:13”. The three outputs are for mode = “in”, mode = “out” and mode = “all”, in that order.

## [[1]]
## + 2/6236 vertices, named, from 257a368:
## [1] 20:74 43:77
## [[1]]
## + 2/6236 vertices, named, from 257a368:
## [1] 35:36 43:77
## [[1]]
## + 3/6236 vertices, named, from 257a368:
## [1] 20:74 35:36 43:77

The original text only shows that verse “87:13” is interpreted by verses [35:36, 43:77], which is confirmed by the output from mode = “out”. With our network graph model we can also show that verse “87:13” is used to interpret verses [20:74, 43:77]. Without this model, we would have to go through the commentary on verses “20:74” and “43:77” manually; mode = “in” in the ego function gives this in a simple way. Thus we now have a more complete view of verse “87:13” than the manual classical work of Ibn Katheer provides.

What if we look at the next order? This is very difficult to do manually.

## [[1]]
## + 6/6236 vertices, named, from 257a368:
## [1] 20:74 43:77 7:126 35:36 9:82  35:37
## [[1]]
## + 9/6236 vertices, named, from 257a368:
## [1] 35:36 43:77 17:97 20:74 43:74 43:75 78:30 87:11 87:12
## [[1]]
## + 15/6236 vertices, named, from 257a368:
##  [1] 20:74 35:36 43:77 7:126 87:11 87:12 14:17 16:28 17:97 35:32 43:74 43:75
## [13] 78:30 9:82  35:37
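The neighborhood search behind ego() is a breadth-first search that stops at the given order, with mode deciding which edge directions may be followed. The Python sketch below is an illustration, not igraph's implementation. It uses the 24 edges of kg8713_2, printed further down in this section, which contain every edge needed for orders 1 and 2 around 87:13. The listings above leave out 87:13 itself, so the example calls use mindist = 1; igraph's default of 0 includes the ego.

```python
from collections import deque

# The 24 edges of kg8713_2 (from -> to), copied from the igraph printout below.
EDGES = [
    ("7:126", "20:74"), ("9:82", "43:77"), ("14:17", "35:36"), ("16:28", "35:36"),
    ("17:97", "78:30"), ("20:74", "35:36"), ("20:74", "43:77"), ("20:74", "87:11"),
    ("20:74", "87:12"), ("20:74", "87:13"), ("35:32", "35:36"), ("35:36", "17:97"),
    ("35:36", "20:74"), ("35:36", "43:74"), ("35:36", "43:75"), ("35:36", "43:77"),
    ("35:36", "78:30"), ("35:37", "43:77"), ("43:77", "35:36"), ("43:77", "87:11"),
    ("43:77", "87:12"), ("43:77", "87:13"), ("87:13", "35:36"), ("87:13", "43:77"),
]


def ego(edges, node, order=1, mode="all", mindist=0):
    """Vertices within `order` steps of `node`, following edges as `mode` allows."""
    nbrs = {}
    for s, t in edges:
        if mode in ("out", "all"):   # follow s -> t
            nbrs.setdefault(s, set()).add(t)
        if mode in ("in", "all"):    # follow t -> s, i.e. against the arrow
            nbrs.setdefault(t, set()).add(s)
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        if dist[u] == order:
            continue  # do not expand beyond the requested order
        for v in nbrs.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v, d in dist.items() if d >= mindist}


for mode in ("in", "out", "all"):
    print(mode, sorted(ego(EDGES, "87:13", order=1, mode=mode, mindist=1)))
# in ['20:74', '43:77']
# out ['35:36', '43:77']
# all ['20:74', '35:36', '43:77']
```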

Let us do some simple plots. make_ego_graph returns a list of ego networks; to plot the first one, simply call plot(kg8713a[[1]]).

## [[1]]
## IGRAPH 29503a7 DNW- 9 14 -- 
## + attr: name (v/c), surah_id (v/n), ayah (v/n), ayah_id (v/n),
## | surah_title_en (v/c), revelation_type (v/c), text (v/c), SourceSura
## | (e/n), SourceAyah (e/n), TargetSura (e/n), TargetAyah (e/n),
## | common_roots (e/n), weight (e/n)
## + edges from 29503a7 (vertex names):
##  [1] 17:97->78:30 20:74->35:36 20:74->43:77 20:74->87:11 20:74->87:12
##  [6] 35:36->17:97 35:36->20:74 35:36->43:74 35:36->43:75 35:36->43:77
## [11] 35:36->78:30 43:77->35:36 43:77->87:11 43:77->87:12

We want to create a graph that starts with a verse (or selected verses) and then look at:

  1. first order connections, all verses directly related to it
  2. second order connections, all verses directly related to (1)
  3. third order connections, all verses directly related to (2)

make_ego_graph returns a graph for the neighbourhood for each of the vertices in the list nodes.

## [[1]]
## IGRAPH 295fcad DNW- 4 9 -- 
## + attr: name (v/c), surah_id (v/n), ayah (v/n), ayah_id (v/n),
## | surah_title_en (v/c), revelation_type (v/c), text (v/c), SourceSura
## | (e/n), SourceAyah (e/n), TargetSura (e/n), TargetAyah (e/n),
## | common_roots (e/n), weight (e/n)
## + edges from 295fcad (vertex names):
## [1] 20:74->35:36 20:74->43:77 20:74->87:13 35:36->20:74 35:36->43:77
## [6] 43:77->35:36 43:77->87:13 87:13->35:36 87:13->43:77

## [1] TRUE

Since the graph is small, we will verify these connections.
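make_ego_graph() differs from ego() in that it also keeps every edge whose two end points both lie in the neighborhood, that is, the induced subgraph. This is why the order-1 graph above contains links such as 35:36->20:74 that do not touch 87:13. A Python sketch of that check follows; induced_subgraph is a hypothetical helper, and the kg8713_2 edges are the ones printed below. It recovers the 4 vertices and 9 edges reported above, 5 of which run between the neighbors.

```python
# The 24 edges of kg8713_2 (from -> to), copied from the igraph printout below.
EDGES = [
    ("7:126", "20:74"), ("9:82", "43:77"), ("14:17", "35:36"), ("16:28", "35:36"),
    ("17:97", "78:30"), ("20:74", "35:36"), ("20:74", "43:77"), ("20:74", "87:11"),
    ("20:74", "87:12"), ("20:74", "87:13"), ("35:32", "35:36"), ("35:36", "17:97"),
    ("35:36", "20:74"), ("35:36", "43:74"), ("35:36", "43:75"), ("35:36", "43:77"),
    ("35:36", "78:30"), ("35:37", "43:77"), ("43:77", "35:36"), ("43:77", "87:11"),
    ("43:77", "87:12"), ("43:77", "87:13"), ("87:13", "35:36"), ("87:13", "43:77"),
]


def induced_subgraph(edges, vertices):
    """Keep the edges whose two end points both lie in `vertices`."""
    keep = set(vertices)
    return [(s, t) for s, t in edges if s in keep and t in keep]


# Order-1 neighborhood of 87:13 with mode = "all", including the ego itself.
hood = {"87:13", "20:74", "35:36", "43:77"}
sub = induced_subgraph(EDGES, hood)
print(len(hood), len(sub))  # 4 9
between_neighbours = [e for e in sub if "87:13" not in e]
print(len(between_neighbours))  # 5
```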

The other connections should appear at the higher orders (order = 2 or 3). We repeat the code above for order = 2 and order = 3, and store the results.

## [[1]]
## IGRAPH 29785bf DNW- 16 24 -- 
## + attr: name (v/c), surah_id (v/n), ayah (v/n), ayah_id (v/n),
## | surah_title_en (v/c), revelation_type (v/c), text (v/c), SourceSura
## | (e/n), SourceAyah (e/n), TargetSura (e/n), TargetAyah (e/n),
## | common_roots (e/n), weight (e/n)
## + edges from 29785bf (vertex names):
##  [1] 7:126->20:74 9:82 ->43:77 14:17->35:36 16:28->35:36 17:97->78:30
##  [6] 20:74->35:36 20:74->43:77 20:74->87:11 20:74->87:12 20:74->87:13
## [11] 35:32->35:36 35:36->17:97 35:36->20:74 35:36->43:74 35:36->43:75
## [16] 35:36->43:77 35:36->78:30 35:37->43:77 43:77->35:36 43:77->87:11
## [21] 43:77->87:12 43:77->87:13 87:13->35:36 87:13->43:77

## [1] TRUE
## [[1]]
## IGRAPH 29909ed DNW- 74 106 -- 
## + attr: name (v/c), surah_id (v/n), ayah (v/n), ayah_id (v/n),
## | surah_title_en (v/c), revelation_type (v/c), text (v/c), SourceSura
## | (e/n), SourceAyah (e/n), TargetSura (e/n), TargetAyah (e/n),
## | common_roots (e/n), weight (e/n)
## + edges from 29909ed (vertex names):
##  [1] 2:24  ->17:97  5:66  ->35:32  6:23  ->58:18  7:126 ->20:72  7:126 ->20:73 
##  [6] 7:126 ->20:74  7:126 ->20:75  9:82  ->43:77  10:72 ->7:126  10:72 ->12:101
## [11] 11:21 ->17:97  12:101->7:126  14:17 ->22:21  14:17 ->35:36  14:17 ->37:64 
## [16] 14:17 ->37:65  14:17 ->37:66  14:17 ->37:67  14:17 ->37:68  14:17 ->38:55 
## [21] 14:17 ->38:56  14:17 ->38:57  14:17 ->38:58  14:17 ->41:46  14:17 ->44:43 
## + ... omitted several edges

## [1] TRUE

The basic plots need significant improvement. This is what we will cover in the next section.

7.1 Concentric layouts

Concentric circles help to emphasize the position of certain nodes in the network. The graphlayouts package has two functions for concentric layouts, layout_with_focus() and layout_with_centrality().

The first one focuses the network on a specific node and arranges all other nodes in concentric circles around it, according to their geodesic distance from that node. Below we focus on our selected verse “87:13”. However, the graph must be connected; in the previous section we confirmed that is_connected(kg8713_3) is TRUE. The details of using these layouts for our #qurananalytics related work are explained in a previous post.

We show two options: with coord_fixed(), which gives a circular format, and without it, which gives an elliptical format. In the next three plots we show the verse labels for each degree.
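The idea behind layout_with_focus() can be sketched without graphlayouts. Put the focus verse at the center and each other verse on a circle whose radius is its undirected distance from the focus. The Python below is a simplified illustration that spaces the verses on each circle evenly. graphlayouts itself places nodes with a stress-based optimization, which this sketch does not attempt. On the smaller kg8713_2 edges around 87:13, it produces rings of 1, 3 and 12 verses.

```python
import math
from collections import deque

# The 24 edges of kg8713_2 (from -> to), copied from the igraph printout in Section 7.
EDGES = [
    ("7:126", "20:74"), ("9:82", "43:77"), ("14:17", "35:36"), ("16:28", "35:36"),
    ("17:97", "78:30"), ("20:74", "35:36"), ("20:74", "43:77"), ("20:74", "87:11"),
    ("20:74", "87:12"), ("20:74", "87:13"), ("35:32", "35:36"), ("35:36", "17:97"),
    ("35:36", "20:74"), ("35:36", "43:74"), ("35:36", "43:75"), ("35:36", "43:77"),
    ("35:36", "78:30"), ("35:37", "43:77"), ("43:77", "35:36"), ("43:77", "87:11"),
    ("43:77", "87:12"), ("43:77", "87:13"), ("87:13", "35:36"), ("87:13", "43:77"),
]


def focus_layout(edges, focus):
    """Concentric layout: radius = undirected distance from `focus`."""
    nbrs = {}
    for s, t in edges:
        nbrs.setdefault(s, set()).add(t)
        nbrs.setdefault(t, set()).add(s)
    dist = {focus: 0}
    queue = deque([focus])
    while queue:
        u = queue.popleft()
        for v in nbrs.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) != len(nbrs):
        raise ValueError("the graph must be connected")
    rings = {}
    for v, d in dist.items():
        rings.setdefault(d, []).append(v)
    coords = {}
    for radius, members in rings.items():
        for i, v in enumerate(sorted(members)):
            angle = 2 * math.pi * i / len(members)  # spread evenly round the circle
            coords[v] = (radius * math.cos(angle), radius * math.sin(angle))
    return coords, rings


coords, rings = focus_layout(EDGES, "87:13")
print({r: len(m) for r, m in sorted(rings.items())})  # {0: 1, 1: 3, 2: 12}
```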

7.2 Clustering

Clustering aims to find cohesive subgroups within a network; it is also referred to as community detection. Various algorithms for graph clustering are implemented in igraph, but no matter which algorithm is chosen, the workflow is always the same.

## IGRAPH clustering edge betweenness, groups: 11, mod: 0.61
## + groups:
##   $`1`
##    [1] "2:24"   "11:21"  "17:97"  "20:74"  "20:124" "32:14"  "35:36"  "37:23" 
##    [9] "43:74"  "43:75"  "43:77"  "78:30"  "87:11"  "87:12"  "87:13" 
##   
##   $`2`
##   [1] "5:66"
##   
##   $`3`
##   [1] "6:23"  "16:28" "40:46" "58:18" "75:15"
##   
##   + ... omitted several groups/vertices
##  2:24  5:66  6:23 7:126  9:82 10:72 
##     1     2     3     4     5     6
##  [1] "2:24"   "11:21"  "17:97"  "20:74"  "20:124" "32:14"  "35:36"  "37:23" 
##  [9] "43:74"  "43:75"  "43:77"  "78:30"  "87:11"  "87:12"  "87:13"
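The mod: 0.61 in the first line of the output above is the modularity Q of the partition. For an undirected graph with m edges, Q = Σ_c [L_c/m - (d_c/(2m))²]. Here L_c is the number of edges inside community c and d_c is the total degree of its vertices. In words, Q compares the share of edges that fall inside communities with the share expected by chance. The Python sketch below implements that formula. It is not igraph's code, which also handles weights and edge directions. On a hypothetical toy graph of two triangles joined by one edge, it gives Q = 5/14 ≈ 0.357 for the two-triangle split, and 0 when every vertex is in a single community.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    inside = defaultdict(int)      # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of vertices in c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: triangle a-b-c, triangle d-e-f, bridge c-d.
toy = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one = {v: 1 for v in two}
print(round(modularity(toy, two), 3), modularity(toy, one))  # 0.357 0.0
```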

7.3 Focus Layout and Clustering

Focus again on kg8713_3. Earlier we showed how layout_with_focus() focuses the network on a specific verse and arranges all other nodes in concentric circles (depending on distance) around it. Here we combine it with clustering. The limitation is that it only works with a connected network. Also, some clustering functions do not work on directed graphs.

## $membership
##   2:24   5:66   6:23  7:126   9:82  10:72  11:21 12:101  14:17  14:44  16:28 
##      1      1      1      1      1      1      1      1      1      1      1 
##  17:15  17:97  18:17  18:29  20:72  20:73  20:74  20:75 20:104 20:124  22:21 
##      1      1      1      1      1      1      1      1      1      1      1 
## 23:100  29:13  31:32  32:14  35:32  35:34  35:35  35:36  35:37  37:23  37:64 
##      1      1      1      1      1      1      1      1      1      1      1 
##  37:65  37:66  37:67  37:68  38:55  38:56  38:57  38:58  39:71  40:11  40:12 
##      1      1      1      1      1      1      1      1      1      1      1 
##  40:46  41:46  43:74  43:75  43:77  43:78  44:43  44:44  44:45  44:46  44:47 
##      1      1      1      1      1      1      1      1      1      1      1 
##  44:48  44:49  44:50  53:56  55:43  55:44   56:7  56:41  56:42  56:43  56:44 
##      1      1      1      1      1      1      1      1      1      1      1 
##  58:18   67:8   67:9  75:15  78:30  87:11  87:12  87:13 
##      1      1      1      1      1      1      1      1 
## 
## $csize
## [1] 74
## 
## $no
## [1] 1
## IGRAPH 29930fe DNW- 74 106 -- 
## + attr: surah_id (v/n), ayah (v/n), ayah_id (v/n), surah_title_en
## | (v/c), revelation_type (v/c), text (v/c), name (v/c), clu (v/c), size
## | (v/n), SourceSura (e/n), SourceAyah (e/n), TargetSura (e/n),
## | TargetAyah (e/n), common_roots (e/n), weight (e/n)
## + edges from 29930fe (vertex names):
##  [1] 87:13->43:77 87:13->35:36 78:30->38:58 75:15->58:18 75:15->16:28
##  [6] 75:15->6:23  67:9 ->39:71 67:9 ->17:15 58:18->6:23  56:7 ->56:41
## [11] 56:7 ->35:32 43:77->87:13 43:77->87:12 43:77->87:11 43:77->35:36
## [16] 40:11->35:37 39:71->67:9  39:71->67:8  39:71->17:97 37:68->55:44
## [21] 37:23->17:97 35:37->67:9  35:37->67:8  35:37->53:56 35:37->43:78
## + ... omitted several edges

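There is only 1 cluster: clusters() returns connected components (ignoring edge direction by default), and kg8713_3 forms a single one. Although “87:13” is the ego or center node, other nodes are drawn larger [V(kg8713_3)$size <- graph.strength(kg8713_3)]. This is because graph.strength() sums the weights of the edges attached to each verse, giving a weighted count of the verses directly linked with it.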
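clusters() (called components() in newer igraph releases) returns connected components and by default ignores edge direction. That is why a single ego network always forms one cluster. The union-find sketch below uses count_components, a hypothetical helper. It confirms that the 24 edges of kg8713_2 form one component, matching the "1 component" in the tidygraph printout later in this post, while a toy graph with two separate edges has two.

```python
# The 24 edges of kg8713_2 (from -> to), copied from the igraph printout in Section 7.
EDGES = [
    ("7:126", "20:74"), ("9:82", "43:77"), ("14:17", "35:36"), ("16:28", "35:36"),
    ("17:97", "78:30"), ("20:74", "35:36"), ("20:74", "43:77"), ("20:74", "87:11"),
    ("20:74", "87:12"), ("20:74", "87:13"), ("35:32", "35:36"), ("35:36", "17:97"),
    ("35:36", "20:74"), ("35:36", "43:74"), ("35:36", "43:75"), ("35:36", "43:77"),
    ("35:36", "78:30"), ("35:37", "43:77"), ("43:77", "35:36"), ("43:77", "87:11"),
    ("43:77", "87:12"), ("43:77", "87:13"), ("87:13", "35:36"), ("87:13", "43:77"),
]


def count_components(edges):
    """Number of weakly connected components, via union-find with path halving."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for s, t in edges:
        root_s, root_t = find(s), find(t)
        if root_s != root_t:
            parent[root_s] = root_t  # edge direction is ignored: weak connectivity
    return len({find(v) for v in list(parent)})


print(count_components(EDGES))  # 1
print(count_components([("a", "b"), ("c", "d")]))  # 2
```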

7.3.1 Centrality Layout and Clustering

Based on a similar principle is layout_with_centrality()[5]. We create the undirected ego network kgu, since cluster_louvain() only works on undirected graphs. We repeat with clustering (using kgu) and also look at the coreness centrality measure. We expect cluster_louvain() to give different results from clusters().

## IGRAPH 300acb1 UNW- 74 97 -- 
## + attr: surah_id (v/n), ayah (v/n), ayah_id (v/n), surah_title_en
## | (v/c), revelation_type (v/c), text (v/c), name (v/c), clu (v/c), size
## | (v/n), weight (e/n)
## + edges from 300acb1 (vertex names):
##  [1] 2:24 --17:97  5:66 --35:32  6:23 --16:28  6:23 --58:18  6:23 --75:15 
##  [6] 7:126--10:72  7:126--12:101 7:126--20:72  7:126--20:73  7:126--20:74 
## [11] 7:126--20:75  9:82 --43:77  10:72--12:101 11:21--17:97  14:17--18:29 
## [16] 14:17--22:21  14:17--23:100 14:17--35:36  14:17--37:64  14:17--37:65 
## [21] 14:17--37:66  14:17--37:67  14:17--37:68  14:17--38:55  14:17--38:56 
## [26] 14:17--38:57  14:17--38:58  14:17--41:46  14:17--44:43  14:17--44:44 
## + ... omitted several edges

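Interestingly, the cluster functions give 1 cluster for kg8713_3 and 14 for kgu. The difference is expected: clusters() only identifies connected components, whereas community detection methods such as cluster_louvain() split a connected network into densely linked groups.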

8 Combining Graphs (Union)

In this section we combine two separate ego networks based on two different verses. This is like reading the interpretations of two separate verses side by side (reading pages that are quite far apart). We choose verses “2:255” (Ayat al-Kursi) and “16:90”: “2:255” is well known for its virtues[9], and “16:90” is the command to be fair and just[10].

## [[1]]
## IGRAPH 32feddc DNW- 27 44 -- 
## + attr: name (v/c), surah_id (v/n), ayah (v/n), ayah_id (v/n),
## | surah_title_en (v/c), revelation_type (v/c), text (v/c), SourceSura
## | (e/n), SourceAyah (e/n), TargetSura (e/n), TargetAyah (e/n),
## | common_roots (e/n), weight (e/n)
## + edges from 32feddc (vertex names):
##  [1] 2:255 ->3:1    2:255 ->3:2    2:255 ->13:9   2:255 ->19:64  2:255 ->19:93 
##  [6] 2:255 ->19:94  2:255 ->19:95  2:255 ->20:110 2:255 ->20:111 2:255 ->21:28 
## [11] 2:255 ->30:25  2:255 ->34:23  2:255 ->40:1   2:255 ->40:3   2:255 ->53:26 
## [16] 2:286 ->2:255  3:2   ->2:255  4:166 ->2:255  4:166 ->20:110 10:3  ->2:255 
## [21] 10:3  ->34:23  10:3  ->53:26  10:17 ->2:255  20:109->2:255  20:109->21:28 
## + ... omitted several edges
## 
## [[2]]
## IGRAPH 32feddc DNW- 7 10 -- 
## + attr: name (v/c), surah_id (v/n), ayah (v/n), ayah_id (v/n),
## | surah_title_en (v/c), revelation_type (v/c), text (v/c), SourceSura
## | (e/n), SourceAyah (e/n), TargetSura (e/n), TargetAyah (e/n),
## | common_roots (e/n), weight (e/n)
## + edges from 32feddc (vertex names):
##  [1] 16:90 ->5:45   16:90 ->7:33   16:90 ->16:126 16:90 ->17:26  16:90 ->42:40 
##  [6] 16:126->5:45   16:126->42:40  39:53 ->16:90  42:40 ->5:45   42:40 ->16:126

kge1 is a list of graphs, as expected: the two ego graphs of verses “2:255” and “16:90” with their first-degree links. We display the two ego graphs separately.

## IGRAPH 32feddc DNW- 27 44 -- 
## + attr: name (v/c), surah_id (v/n), ayah (v/n), ayah_id (v/n),
## | surah_title_en (v/c), revelation_type (v/c), text (v/c), SourceSura
## | (e/n), SourceAyah (e/n), TargetSura (e/n), TargetAyah (e/n),
## | common_roots (e/n), weight (e/n)
## + edges from 32feddc (vertex names):
##  [1] 2:255 ->3:1    2:255 ->3:2    2:255 ->13:9   2:255 ->19:64  2:255 ->19:93 
##  [6] 2:255 ->19:94  2:255 ->19:95  2:255 ->20:110 2:255 ->20:111 2:255 ->21:28 
## [11] 2:255 ->30:25  2:255 ->34:23  2:255 ->40:1   2:255 ->40:3   2:255 ->53:26 
## [16] 2:286 ->2:255  3:2   ->2:255  4:166 ->2:255  4:166 ->20:110 10:3  ->2:255 
## [21] 10:3  ->34:23  10:3  ->53:26  10:17 ->2:255  20:109->2:255  20:109->21:28 
## + ... omitted several edges

## IGRAPH 32feddc DNW- 7 10 -- 
## + attr: name (v/c), surah_id (v/n), ayah (v/n), ayah_id (v/n),
## | surah_title_en (v/c), revelation_type (v/c), text (v/c), SourceSura
## | (e/n), SourceAyah (e/n), TargetSura (e/n), TargetAyah (e/n),
## | common_roots (e/n), weight (e/n)
## + edges from 32feddc (vertex names):
##  [1] 16:90 ->5:45   16:90 ->7:33   16:90 ->16:126 16:90 ->17:26  16:90 ->42:40 
##  [6] 16:126->5:45   16:126->42:40  39:53 ->16:90  42:40 ->5:45   42:40 ->16:126

We will combine these 2 ego graphs.

## IGRAPH 33b387c DN-- 33 54 -- 
## + attr: surah_id_1 (v/n), surah_id_2 (v/n), ayah_1 (v/n), ayah_2 (v/n),
## | ayah_id_1 (v/n), ayah_id_2 (v/n), surah_title_en_1 (v/c),
## | surah_title_en_2 (v/c), revelation_type_1 (v/c), revelation_type_2
## | (v/c), text_1 (v/c), text_2 (v/c), name (v/c), SourceSura_1 (e/n),
## | SourceSura_2 (e/n), SourceAyah_1 (e/n), SourceAyah_2 (e/n),
## | TargetSura_1 (e/n), TargetSura_2 (e/n), TargetAyah_1 (e/n),
## | TargetAyah_2 (e/n), common_roots_1 (e/n), common_roots_2 (e/n),
## | weight_1 (e/n), weight_2 (e/n)
## + edges from 33b387c (vertex names):
## [1] 42:40 ->16:126 42:40 ->5:45   16:126->42:40  16:126->5:45   16:90 ->42:40 
## + ... omitted several edges
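The counts in the union summary follow from how union() matches vertices by name. It has 27 + 7 - 1 = 33 vertices and 44 + 10 = 54 edges, so the two ego graphs share exactly one vertex (39:53, the bridge discussed below) and no edge. Attributes present in both inputs get the _1 and _2 suffixes seen above. The Python sketch below shows the name-based union and how to find the shared (bridge) vertices. It assumes each graph is given as an edge list with no isolated vertices. union_graphs and shared_vertices are hypothetical helpers, the 16:90 edges are copied from the printout above, and the p/q/r/s graph is a toy example.

```python
# Ego graph of 16:90 (first-degree links), copied from the igraph printout above.
EGO_16_90 = [("16:90", "5:45"), ("16:90", "7:33"), ("16:90", "16:126"),
             ("16:90", "17:26"), ("16:90", "42:40"), ("16:126", "5:45"),
             ("16:126", "42:40"), ("39:53", "16:90"), ("42:40", "5:45"),
             ("42:40", "16:126")]


def vertex_set(edges):
    return {v for e in edges for v in e}


def union_graphs(edges_a, edges_b):
    """Name-based union: shared vertices and duplicate edges appear once."""
    edges = list(dict.fromkeys(edges_a + edges_b))  # drop duplicates, keep order
    return vertex_set(edges), edges


def shared_vertices(edges_a, edges_b):
    """Vertices present in both graphs, i.e. candidate bridges between them."""
    return vertex_set(edges_a) & vertex_set(edges_b)


vs, es = union_graphs(EGO_16_90, EGO_16_90)
print(len(vs), len(es))  # 7 10 (a graph united with itself is unchanged)
print(shared_vertices([("p", "q"), ("q", "r")], [("r", "s")]))  # {'r'}
```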

We now display two versions of the resulting graph. The first one has “2:255” as the focus node.

The second version has “16:90” as the focus node.

Even at the first degree level, we see the key role of verse “39:53” as the bridge between the two ego networks. Let us just display the text for these verses.

## [1] "Allah - there is no deity except Him, the Ever-Living, the Sustainer of [all] existence. Neither drowsiness overtakes Him nor sleep. To Him belongs whatever is in the heavens and whatever is on the earth. Who is it that can intercede with Him except by His permission? He knows what is [presently] before them and what will be after them, and they encompass not a thing of His knowledge except for what He wills. His Kursi extends over the heavens and the earth, and their preservation tires Him not. And He is the Most High, the Most Great."
## [1] "Indeed, Allah orders justice and good conduct and giving to relatives and forbids immorality and bad conduct and oppression. He admonishes you that perhaps you will be reminded."
## [1] "Say, \"O My servants who have transgressed against themselves [by sinning], do not despair of the mercy of Allah. Indeed, Allah forgives all sins. Indeed, it is He who is the Forgiving, the Merciful.\""

The following plot looks at some centrality measures of the union kge1.

10 Repeat Analysis Using tidygraph

Earlier we created the kg8713_3 and kge1 networks using igraph. In this section we will show examples using tidygraph. A central aspect of tidygraph is to directly manipulate node and edge data from the tbl_graph object by activating nodes or edges. When we first create a tbl_graph object, the nodes will be activated. We can then directly calculate node or edge measures, like centrality, using tidyverse functions.

## [1] "igraph"
## [1] "tbl_graph" "igraph"
## # A tbl_graph: 16 nodes and 24 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 16 x 7 (active)
##   surah_id  ayah ayah_id surah_title_en revelation_type text               name 
##      <int> <int>   <int> <chr>          <chr>           <chr>              <chr>
## 1        7   126    1080 Al-A'raaf      Meccan          "And you do not r~ 7:126
## 2        9    82    1317 At-Tawba       Medinan         "So let them laug~ 9:82 
## 3       14    17    1767 Ibrahim        Meccan          "He will gulp it ~ 14:17
## 4       16    28    1929 An-Nahl        Meccan          "The ones whom th~ 16:28
## 5       17    97    2126 Al-Israa       Meccan          "And whoever Alla~ 17:97
## 6       20    74    2422 Taa-Haa        Meccan          "Indeed, whoever ~ 20:74
## # ... with 10 more rows
## #
## # Edge Data: 24 x 8
##    from    to SourceSura SourceAyah TargetSura TargetAyah common_roots weight
##   <int> <int>      <int>      <int>      <int>      <int>        <int>  <int>
## 1    16    12         87         13         43         77            0      2
## 2    16     8         87         13         35         36            1      2
## 3    12    16         43         77         87         13            0      2
## # ... with 21 more rows

Notice how the igraph object kg8713_2 is converted into two separate tibbles, Node Data and Edge Data. Nevertheless, gt still inherits the igraph class (its class is "tbl_graph" "igraph"), so igraph functions work on both kg8713_2 and gt. We purposely chose kg8713_2 because it is smaller, which makes the plots more legible.
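The two tibbles mirror how tidygraph stores a graph: one row per node, and an edge table whose from and to columns hold node row numbers rather than verse names. The Python sketch below performs that split; to_tables is a hypothetical helper. For this particular graph, sorting the verses numerically by Surah and ayah reproduces the row numbers shown above, for example from = 16, to = 12 for 87:13 -> 43:77. That ordering is an observation about this graph, not a tidygraph rule.

```python
# The 24 edges of kg8713_2 (from -> to), copied from the igraph printout in Section 7.
EDGES = [
    ("7:126", "20:74"), ("9:82", "43:77"), ("14:17", "35:36"), ("16:28", "35:36"),
    ("17:97", "78:30"), ("20:74", "35:36"), ("20:74", "43:77"), ("20:74", "87:11"),
    ("20:74", "87:12"), ("20:74", "87:13"), ("35:32", "35:36"), ("35:36", "17:97"),
    ("35:36", "20:74"), ("35:36", "43:74"), ("35:36", "43:75"), ("35:36", "43:77"),
    ("35:36", "78:30"), ("35:37", "43:77"), ("43:77", "35:36"), ("43:77", "87:11"),
    ("43:77", "87:12"), ("43:77", "87:13"), ("87:13", "35:36"), ("87:13", "43:77"),
]


def verse_key(verse):
    """Sort key: numeric (surah, ayah) for a 'surah:ayah' label."""
    surah, ayah = verse.split(":")
    return int(surah), int(ayah)


def to_tables(edges):
    """Split a named edge list into a node table and a row-indexed edge table."""
    names = sorted({v for e in edges for v in e}, key=verse_key)
    row = {name: i for i, name in enumerate(names, start=1)}  # 1-based, as in R
    nodes = [{"name": n} for n in names]
    edge_rows = [{"from": row[s], "to": row[t]} for s, t in edges]
    return nodes, edge_rows


nodes, edge_rows = to_tables(EDGES)
print(len(nodes), len(edge_rows))  # 16 24
print(edge_rows[EDGES.index(("87:13", "43:77"))])  # {'from': 16, 'to': 12}
```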

10.1 Direct ggraph integration

gt can directly be used with our preferred ggraph package for visualizing networks. Now it is much easier to experiment with modifications to node and edge parameters affecting layouts as it is not necessary to modify the underlying graph but only the plotting code, e.g.:

10.2 Use Selected Measures from tidygraph and Plot

We show below how to collate some of the node-related measures. The result is a tidy data frame (tibble).

Now we plot the various measures from the resulting node_measures dataframe.

Plot degree_in and degree_out together.

Plot degree (degree_in + degree_out) and betweenness together.

Plot closeness, pg_rank, eigen, br_score, coreness together.

Plot pg_rank, eigen together.

The values for br_score and coreness are 0 or very small. Clearly, “35:36” and “43:77”, together with “20:74” and “87:13”, are the most influential and important verses in this ego network.

Notice the slight difference in using the network measures with tidygraph. We can easily assemble the required measures for the nodes in a tidy dataframe. tidygraph has many functions that can give us information about nodes. We show examples of some measures that seem to measure slightly different things about the nodes.

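  • degree: Number of direct connections
  • betweenness: How many shortest paths go through this node
  • pagerank: A node scores highly when it is pointed to by many nodes that are themselves highly pointed to
  • eigen centrality: Like PageRank, a node is important if its neighbors are important, but without PageRank's random-jump (damping) term
  • closeness: How close this node is to all other nodes (the inverse of the sum of its shortest-path distances to them)
  • bridge score: Average decrease in cohesiveness if each of its edges were removed from the graph
  • coreness: The largest k such that the node belongs to the k-core of the network (K-core decomposition)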
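To make the PageRank bullet concrete, here is a power-iteration sketch in Python on the kg8713_2 edges. It assumes the usual damping factor of 0.85, which is also igraph's default, and spreads the score of verses with no outgoing links evenly over all verses. igraph's page_rank() uses a different solver, so its numbers may differ slightly. The scores sum to 1, and a verse with no incoming links, such as 7:126, receives only the minimum baseline share.

```python
# The 24 edges of kg8713_2 (from -> to), copied from the igraph printout in Section 7.
EDGES = [
    ("7:126", "20:74"), ("9:82", "43:77"), ("14:17", "35:36"), ("16:28", "35:36"),
    ("17:97", "78:30"), ("20:74", "35:36"), ("20:74", "43:77"), ("20:74", "87:11"),
    ("20:74", "87:12"), ("20:74", "87:13"), ("35:32", "35:36"), ("35:36", "17:97"),
    ("35:36", "20:74"), ("35:36", "43:74"), ("35:36", "43:75"), ("35:36", "43:77"),
    ("35:36", "78:30"), ("35:37", "43:77"), ("43:77", "35:36"), ("43:77", "87:11"),
    ("43:77", "87:12"), ("43:77", "87:13"), ("87:13", "35:36"), ("87:13", "43:77"),
]


def pagerank(edges, damping=0.85, iterations=100):
    """PageRank by power iteration; dangling vertices spread their score evenly."""
    nodes = sorted({v for e in edges for v in e})
    out_links = {v: [] for v in nodes}
    for s, t in edges:
        out_links[s].append(t)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1 - damping) / n + damping * dangling / n  # random jump + dangling share
        new_rank = {v: base for v in nodes}
        for s in nodes:
            for t in out_links[s]:
                new_rank[t] += damping * rank[s] / len(out_links[s])
        rank = new_rank
    return rank


scores = pagerank(EDGES)
for verse, score in sorted(scores.items(), key=lambda kv: -kv[1])[:4]:
    print(verse, round(score, 3))
```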

11 Simple Text Analysis

With kg8713_2 in tidy form through gt, we now apply some simple text analysis based on our earlier work, Quran English Word and Document Frequency With Tidytext.

To work with this as a tidy dataset, we need to restructure it in the one-token-per-row format with the unnest_tokens() function.

Now that the data is in one-word-per-row format, we can manipulate it with tidy tools like dplyr. Often in text analysis, we will want to remove stop words; stop words are words that are not useful for an analysis, typically extremely common words such as “the”, “of”, “to”, and so forth in English. We can remove stop words (kept in the tidytext dataset stop_words) with an anti_join().

The stop_words dataset in the tidytext package contains stop words from three lexicons. We can use them all together, as we have here, or filter() to only use one set of stop words if that is more appropriate for a certain analysis.

11.1 Count and Plot

We can also use dplyr’s count() to find the most common words. Our word counts are stored in a tidy data frame. This allows us to pipe this directly to the ggplot2 package to create a visualization of the most common words in kg8713_2.

The plot above shows some common words in the verses that make up the second-degree ego network of verse “87:13”, whose text is “Neither dying therein nor living.” The common words plotted above all relate to the nature of punishment in Hell, as alluded to in verse “87:13”.
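The tokenize, remove stop words, count pipeline can be sketched in plain Python. re.findall stands in for unnest_tokens(), which also lowercases and strips punctuation. A membership filter stands in for anti_join(stop_words), and collections.Counter stands in for count(). STOP_WORDS is a tiny hypothetical list, far smaller than tidytext's stop_words. The kg8713_2 verse texts are not printed in this post, so the two texts used here are the Saheeh International verses 16:90 and 39:53 quoted in Section 8.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (tidytext's stop_words is much larger).
STOP_WORDS = {"all", "and", "be", "by", "do", "have", "he", "is", "it", "my",
              "not", "o", "of", "say", "that", "the", "to", "who", "will", "you"}

TEXTS = [
    "Indeed, Allah orders justice and good conduct and giving to relatives and "
    "forbids immorality and bad conduct and oppression. He admonishes you that "
    "perhaps you will be reminded.",
    'Say, "O My servants who have transgressed against themselves [by sinning], '
    "do not despair of the mercy of Allah. Indeed, Allah forgives all sins. "
    'Indeed, it is He who is the Forgiving, the Merciful."',
]


def word_counts(texts, stop_words=STOP_WORDS):
    """One token per word (lowercased, punctuation dropped), minus stop words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


counts = word_counts(TEXTS)
print(counts["allah"], counts["indeed"], counts["conduct"])  # 3 3 2
```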


12 Summary

This post showed how to create a network graph of Tafseer Ibn Katheer limited to the classical method of Tafseer al-Quran bi al-Quran (تفسير القرآن بالقرآن), Interpretation of the Quran with the Quran. It is considered the best method to interpret the verses of the Quran.

We showed some features of network graphs that enable us to visualize aspects of the Tafseer that are almost impossible to see manually and difficult to see without a network graph model.

For a selected Surah, in this case Surah 87, we can see the value of the network model in showing the in-degrees and out-degrees that link the verses. The original text only explicitly describes the out-degrees, i.e. the other verses used to explain a particular verse. Once the graph is created, however, visualizing and analyzing both the in and out links becomes very easy.

For a single verse, we have again demonstrated the value of the ego network related functions that can show the links that

For a combination of selected verses, we can also show how the network graph model identifies which verse(s) act(s) as a “bridge” linking them, and at which degree the “bridge” appears.

With the “text” attribute of the nodes (verses), we can then use text analysis tools like tidytext. Here we only showed simple word counts. Even so, in the simple plot based on the second-degree verses related to the selected verse “87:13”, the keywords beautifully enhance its interpretation.

The obvious to-do list includes


13 References


  1. Introduction to Quran Analytics, https://www.linkedin.com/pulse/introduction-quran-analytics-azman-hussin

  2. Tafseer Ibn Katheer, https://en.wikipedia.org/wiki/Ibn_Kathir

  3. http://textminingthequran.com/

  4. https://rpubs.com/azmanH/696047

  5. https://rpubs.com/azmanH/708667

  6. https://gephi.org/

  7. http://www.recitequran.com/tafsir/en.ibn-kathir/87:14

  8. https://sites.fas.harvard.edu/~airoldi/pub/books/BookDraft-CsardiNepuszAiroldi2016.pdf

  9. http://www.recitequran.com/tafsir/en.ibn-kathir/2:255

  10. http://www.recitequran.com/tafsir/en.ibn-kathir/16:90