A Statistical Study of Al Bukhari’s Hadith Collection

Part 1
Generated On: Monday, 04-Apr-2016

About

This is a study of Bukhari’s collection of ahadith in Sahih Al Bukhari by statistical means, using descriptive statistics, text mining, and social network analysis. Being constrained by time, I cannot explore the data from every interesting perspective at once. As such, this document will likely go through iterative updates as and when I have free time.

The data for Sahih Al Bukhari was collected from Qaala Rasulallah, which is an initiative by Arees Institute. Anybody visiting the website would laud the effort that has gone into this project - a comprehensive collection of ahadith, narrators, their biographies (including their tribes, dates of birth and death, etc.) awaits a user in a user-friendly interface.

Although we extracted data only from the aforesaid website, its ahadith are themselves sourced from another website (www.sunnah.com). This means that any inaccuracies present in the source may also be present in the copy.
We checked some of the data on www.sunnah.com before settling on the present website, and found some inaccuracies there, for example in the recording of narrator names and hadith numbers. However, the website we eventually used did not contain similar errors; we assume that such minor errors were fixed by its developers.

Nevertheless, since the website in question is still under development, certain data are approximate, such as hadith numbers/indexes and the total number of ahadith in each collection of the Kutub Sitta. These will not have a considerable impact on this study, however.

Summary Statistics

In the extracted data, 98 books were found in Bukhari’s collection. This is one book more than can be found on www.sunnah.com; the additional book, absent from www.sunnah.com, is titled ‘Punishment of Disbelievers at War with Allah and His Apostle’.

Ahadith Per Book

We can look at the number of ahadith recorded in each book of Sahih Bukhari. You may hover over the points to obtain the book name and number of ahadith recorded in it.

As seen in the graph, the following five books comprise the highest total number of ahadith:

  • Prophetic Commentary
  • Military expeditions led by the Prophet (s)
  • Fighting for the cause of Allah (Jihad)
  • Good manners and form (Al-Adab)
  • Pilgrimage (Hajj)
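The per-book counts behind such a ranking can be sketched in a few lines. This is only an illustration, not the code used for this study: the record layout (one `(book, hadith_number)` pair per hadith) and the toy values are assumptions, since the extracted dataset itself is not shown here.

```python
from collections import Counter

# Hypothetical records: one (book, hadith_number) pair per hadith.
# The layout and the values are illustrative assumptions only.
toy_records = [
    ("Pilgrimage (Hajj)", 1),
    ("Pilgrimage (Hajj)", 2),
    ("Pilgrimage (Hajj)", 3),
    ("Dress", 4),
    ("Dress", 5),
    ("Adhaan", 6),
]


def ahadith_per_book(rows):
    """Count how many ahadith are recorded in each book."""
    return Counter(book for book, _ in rows)


book_counts = ahadith_per_book(toy_records)

# most_common(n) returns the n books with the highest counts, largest first.
top_two = book_counts.most_common(2)
print(top_two)
```

With the real extract, `most_common(5)` would give the five largest books in the same way.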

The following graphs add some more information to this by showing the proportion of ahadith found in each book.

As expected, a handful of books contribute 50% of the ahadith in Sahih Bukhari:

  1. Tafseer of the Prophet (s)
  2. Military expeditions led by the Prophet (s)
  3. Fighting for the cause of Allah (Jihad)
  4. Adhaan
  5. Al-Adab
  6. Hajj
  7. Dress
  8. Tauheed
  9. Nikah
  10. Ar-Riqaq
  11. Sales and Trade
  12. Merits of the Ansar
  13. Salaah
  14. Prophets
  15. Virtues and Merits of the Prophet (s) and his companions
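One way to arrive at such a list is to rank the books by size and accumulate their counts until the running total reaches half of the collection. The sketch below assumes a plain mapping from book name to hadith count; the function name `books_reaching_share` and the toy counts are hypothetical and do not reproduce the real figures.

```python
def books_reaching_share(counts, share=0.5):
    """Return the largest books whose combined count first reaches `share`
    of the total, together with the share they actually cover."""
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    # Rank books from the largest count to the smallest.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0
    for book, n in ranked:
        if running >= share * total:
            break
        selected.append(book)
        running += n
    return selected, running / total


# Toy counts, for illustration only (not the real figures).
toy_counts = {"Hajj": 40, "Al-Adab": 30, "Dress": 20, "Nikah": 10}
selected_books, covered = books_reaching_share(toy_counts, share=0.5)
print(selected_books, covered)
```

The same function can be applied unchanged to per-narrator counts to see how few narrators account for a given share of the collection.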

Narrators Per Book

It may be interesting to see how many different narrators’ narrations were recorded in each book; perhaps some books contain narrations from only a few narrators. This could indicate that Bukhari found such narrators preferable to others in relation to certain topics.

Everything seems normal here — there aren’t any books that have a large number of ahadith recorded from a small number of narrators.
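The check described above amounts to comparing, for every book, the number of ahadith with the number of distinct narrators. A minimal sketch follows, assuming one `(book, narrator)` pair per hadith, where "narrator" stands for whichever narrator the source attributes the hadith to; the pairs below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (book, narrator) pairs, one per hadith. Illustrative only.
toy_pairs = [
    ("Hajj", "Abu Hurairah"),
    ("Hajj", "Anas bin Malik"),
    ("Hajj", "Abu Hurairah"),
    ("Hajj", "Abdullah bin 'Umar"),
    ("Dress", "Anas bin Malik"),
    ("Dress", "Anas bin Malik"),
]


def narrators_per_book(rows):
    """Map each book to (number of ahadith, number of distinct narrators)."""
    hadith_count = defaultdict(int)
    narrators = defaultdict(set)
    for book, narrator in rows:
        hadith_count[book] += 1
        narrators[book].add(narrator)
    return {book: (hadith_count[book], len(narrators[book]))
            for book in hadith_count}


summary = narrators_per_book(toy_pairs)

# A high ahadith-per-narrator ratio would flag a book that draws on few narrators.
ratios = {book: n_hadith / n_narrators
          for book, (n_hadith, n_narrators) in summary.items()}
print(summary, ratios)
```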

Ahadith Per Narrator

This section shifts the focus from books to narrators: how many ahadith each narrator has narrated, how many books each narrator’s ahadith appear in, and so on.

We identified 473 different narrators (this number may be slightly off because of the way narrators’ names are recorded in the source data), of whom the top 30 are shown below.

We can, as we did earlier, present the same data in another way to show how much each of the top narrators’ hadith contribute cumulatively to the collection of Bukhari.

So, we can see that these 30 narrators contribute about 75% of ahadith recorded in Sahih Bukhari. We can also notice that of these, there are 5 narrators that contribute 50% of all ahadith, and these are:

  1. Abu Hurairah
  2. Anas bin Malik
  3. ’Aisha bint Abi Bakar
  4. Abdullah bin ’Umar
  5. Abdullah bin ’Abbas

It is interesting to take a more granular view of this data and see the books in which narrations from these 30 narrators were recorded. Some of these narrators’ narrations may have been recorded in a large number of books, so we shall only display the top 5 books by number of narrations for each narrator.

But before that, the chart below shows the number of books in which narrations have been recorded by Bukhari from each of the 30 narrators.

Books Per Narrator

The following chart shows Top 10 books to which narrations from Top 10 narrators were recorded in Sahih Bukhari, where the Top 9 were identified based on the total number of ahadith they narrated.
Additionally, the books in which narrations from Ameer al-Momineen and Hussain ibn Ali were recorded are also presented below.

The following chart shows all books to which narrations from Ameer Al-Momineen were recorded.

The charts generated so far show that there are specific narrators from whom Bukhari has primarily included ahadith in his collection. To gain further understanding, we can turn our attention to the narrators themselves to see:

  1. How many times a narrator was narrated from
  2. Which narrators narrated from only specific people
  3. What relationships exist between narrators of different categories (Companions, Tabe’i, Taba Tabe’i, etc.)
  4. Whether narrators from certain tribes narrated from only select tribes

These questions will be covered in the next section.

Narrators’ Analysis

To obtain information about narrators, I used data from Muslim Scholars, another site maintained by people from Arees Institute.
The site stores data of Muslim scholars or famous people from the era of the Prophet (s) until the 15th Century. It contains valuable information such as the lineage, birth and death dates, interests, and supplements these with notes — these include miscellaneous information, such as the tribe name of a person, whether a person is a Shaykh or a Mufassir, and other biographical information. There are instances in which such data for scholars are not available.

Networks of Narrators

To begin, we can look at networks of narrators, which can outline extraordinary links between different narrators (if these exist), or show groups of narrators within networks.
We look at these networks by splitting our dataset for the following categories of narrators:

  1. Prophet’s family (s) — coloured Strong lime green
  2. Companions — coloured Vivid red
  3. Tabe’i — coloured Dark blue
  4. Taba’ Tabe’i — coloured Light pink
  5. 3rd Century Narrators — coloured Pure Yellow
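Splitting the dataset by category can be sketched as building a weighted edge list from the chains of narration and keeping only the edges whose endpoints both fall in the chosen categories. Everything below is an assumption made for illustration: the chain layout (earliest narrator first), the placeholder names "Narrator T1", "Narrator T2", "Narrator TT1" and "Narrator TT2", and their category labels. It is not the code that produced the networks in this post.

```python
from collections import Counter

# Hypothetical category labels; the "Narrator ..." names are placeholders.
toy_category = {
    "Abu Hurairah": "Companion",
    "Anas bin Malik": "Companion",
    "Narrator T1": "Tabe'i",
    "Narrator T2": "Tabe'i",
    "Narrator TT1": "Taba' Tabe'i",
    "Narrator TT2": "Taba' Tabe'i",
}

# Hypothetical chains, each ordered from the earliest narrator to the latest.
toy_chains = [
    ["Abu Hurairah", "Narrator T1", "Narrator TT1"],
    ["Anas bin Malik", "Narrator T1", "Narrator TT2"],
    ["Abu Hurairah", "Narrator T2"],
]


def edges_from_chains(chains):
    """Count directed (source, receiver) links between consecutive narrators."""
    edges = Counter()
    for chain in chains:
        for source, receiver in zip(chain, chain[1:]):
            edges[(source, receiver)] += 1
    return edges


def subnetwork(edges, category, allowed):
    """Keep only the edges whose two endpoints both belong to `allowed`."""
    return {edge: weight for edge, weight in edges.items()
            if category[edge[0]] in allowed and category[edge[1]] in allowed}


def times_narrated_from(edges):
    """Total number of times each narrator appears as the source of a link."""
    totals = Counter()
    for (source, _), weight in edges.items():
        totals[source] += weight
    return totals


all_edges = edges_from_chains(toy_chains)
companions_and_tabiun = subnetwork(all_edges, toy_category,
                                   {"Companion", "Tabe'i"})
narrated_from = times_narrated_from(all_edges)
print(companions_and_tabiun)
print(narrated_from.most_common())
```

An edge list of this kind could then be handed to a graph-drawing library, with node colours looked up from the category labels.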

Network of Prophet’s Family and his Companions

Network of Prophet’s Companions and the Tabi’un

Network of Tabi’un and Taba’ Tabi’un

Network of Taba’ Tabi’un and 3rd Century Scholars

Network of Narrators narrating from or to Prophet’s Family

Network of All Categories

The above gives only a (very) high-level view of the links in our network of narrators, and even then it has limited legibility when a large number of narrators are visualised.
In this post, we shall endeavour to analyse the networks visualised above to obtain some insights into the network structure.