This is a study of Bukhari’s collection of ahadith, Sahih Al Bukhari, which I am carrying out in my free time. I am still exploring the collection, but I am constrained by how little free time I have. This hinders the true spirit of exploration, and I am unable to craft as many questions as I would like.
Given the above, this document will most likely be updated iteratively, as and when I have free time, or if I decide to collect more data from the Kutub al-Sittah.
The data for Sahih Al Bukhari was collected from www.Sultan.org. Although we extracted data only from this website, we also tested extracting the same data from other publicly available websites. In the process, we found that all the inspected websites store the data in exactly the same format; that is, the source of data for all these websites is the same. This, however, means that any problems in the data are common to all of these websites.
More about this will be discussed later.
The data was scraped from the HTML source using R and the rvest package. Subsequent analyses were also carried out in R, using several other packages. The entire code is available here on my GitHub repository.
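To give a rough idea of what such a scraping step involves, here is a minimal sketch. It is written in Python with only the standard library, not in R with rvest as in the actual study, and it is not the code from the repository. The HTML snippet, the `hadith` class name, and the sample text are all invented for illustration; the real pages are structured differently.

```python
from html.parser import HTMLParser


class HadithParser(HTMLParser):
    """Collect the text of every <div class="hadith"> element.

    The class name "hadith" is a hypothetical placeholder, not the
    markup used by any real website.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0      # greater than 0 while inside a hadith <div>
        self.current = []   # text fragments of the hadith being read
        self.ahadith = []   # finished hadith texts

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1          # nested <div> inside a hadith
        elif ("class", "hadith") in attrs:
            self.depth = 1           # entering a hadith <div>

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join fragments and collapse runs of whitespace.
                text = " ".join("".join(self.current).split())
                self.ahadith.append(text)
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


# Invented sample page, for illustration only.
sample_html = """
<div class="hadith"><b>Narrated X:</b> First sample text.</div>
<div class="other">Navigation</div>
<div class="hadith">Narrated Y: Second
   sample text.</div>
"""

parser = HadithParser()
parser.feed(sample_html)
for text in parser.ahadith:
    print(text)
```

In practice the page would be downloaded first and the selector adapted to the real markup; the point here is only the pattern of picking out one kind of element and cleaning its text.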
The following graph shows the total number of ahadith found per book, labelled on the y-axis. The total number of books in Sahih Al Bukhari, as extracted from Sultan.org, is 93.
As seen in the graph, the following five books contain the highest number of ahadith:
The following graphs add to this by showing the proportion of ahadith found in each book.
The following graph displays the same information, except that it shows only the books that together contain almost 80% of the ahadith in Sahih Al Bukhari when the books are arranged in descending order of the number of ahadith each contains.
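The selection described here (rank the books by size, then keep adding books until their combined share reaches a given proportion) can be sketched as follows. This is a Python illustration under my own assumptions, not the R code used for the graphs; the function name `books_covering` and the counts are hypothetical and are not the real figures.

```python
from itertools import accumulate


def books_covering(counts, threshold):
    """Return the largest books whose combined share first reaches `threshold`.

    `counts` maps a book name to its number of ahadith;
    `threshold` is a proportion between 0 and 1.
    """
    # Largest books first.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(counts.values())
    running = accumulate(n for _, n in ranked)  # cumulative counts

    selected = []
    for (book, _), cumulative in zip(ranked, running):
        selected.append(book)
        if cumulative / total >= threshold:
            break
    return selected


# Hypothetical counts, for illustration only.
example_counts = {"Book A": 40, "Book B": 25, "Book C": 20, "Book D": 15}

print(books_covering(example_counts, 0.5))
print(books_covering(example_counts, 0.8))
```

The same function answers both the "almost 80%" and the "50%" questions; only the threshold changes.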
From the above graph, we can say that 50% of the ahadith in Sahih Al Bukhari are contained in 16 books, specifically:
This part of the analysis required tidying up and standardising the names of narrators, many of which were spelled inconsistently (for example, ‘ibn’ interchanged with ‘bin’, or ‘Abi’ with ‘Abu’). Additionally, some narrators’ names contained typing errors, whilst others were incomplete (‘Jabir’, for example, may refer to the prominent companion ‘Jabir ibn Abdullah AlAnsari’ or to any other companion whose first name also happens to be ‘Jabir’).
Standardising the names was made more difficult because narrators’ names had not been entered in a consistent fashion: often, multiple narrators were named for a single hadith; at other times, part of the hadith text appeared alongside the narrator’s name.
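A first pass at the spelling variants mentioned above could look like the sketch below. It is a Python illustration, not the R code actually used; the `VARIANTS` table and the function name `normalise_name` are my own assumptions, and a real table would have to be built by inspecting the data. It also assumes names are transliterated into plain ASCII letters.

```python
import re

# Assumed spelling variants, mapped to one canonical form.
# Treating 'abi' as 'abu' ignores Arabic grammar; it is only a matching aid.
VARIANTS = {"bin": "ibn", "abi": "abu"}


def normalise_name(raw):
    """Lower-case a name, drop punctuation, collapse spaces, unify variants."""
    # Replace anything that is not an ASCII letter or whitespace with a space.
    cleaned = re.sub(r"[^a-z\s]", " ", raw.lower())
    words = [VARIANTS.get(word, word) for word in cleaned.split()]
    return " ".join(words)


print(normalise_name("Jabir bin Abdullah"))
print(normalise_name("  Abi   Hurairah. "))
```

This handles only consistent, known variants. It does nothing for incomplete names such as ‘Jabir’, nor for fields that hold several narrators or a fragment of the hadith text.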
This is what we referred to in the About section earlier: the problems mentioned above were present on all the other websites we inspected as well. This prevented us from obtaining a clean data set.
Still, an attempt has been made to standardise the names of narrators in order to calculate how many ahadith each narrator contributed. We used string matching algorithms to programmatically identify names that refer to the same narrator but are spelled differently. For example, Aishaa and Aisha would be identified as the same name, as would Abu Bakr and Abi Bakra. Of course, false positives remain possible: Abu Bakr and Abu Bakra can be different companions, and as long as only short names are provided in the data, string matching can wrongly merge them.
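To make the idea and its pitfall concrete, here is a small sketch of similarity-based matching. It uses Python's `difflib.SequenceMatcher` as a stand-in; the study itself was done in R, and I am not claiming this is the same algorithm. The threshold of 0.8 and the function names are assumptions chosen for the illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity of two names between 0 and 1, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_narrator(a, b, threshold=0.8):
    """Treat two spellings as one narrator if they are similar enough.

    The threshold is an assumed value; a different data set may need another.
    """
    return similarity(a, b) >= threshold


pairs = [
    ("Aishaa", "Aisha"),                # spelling variant: merged
    ("Abu Bakr", "Abi Bakra"),          # merged, as in the example above
    ("Abu Bakr", "Abu Bakra"),          # also merged: a possible false positive
    ("Anas ibn Malik", "Abu Hurairah")  # clearly different: not merged
]

for first, second in pairs:
    print(first, "|", second, "|", same_narrator(first, second))
```

Note that Abu Bakr and Abu Bakra score even higher than Abu Bakr and Abi Bakra, so no threshold can merge the latter pair while keeping the former apart. Short names simply do not carry enough information.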
The graphs presented below show:
The graph above shows that a few narrators have far more narrations than the rest. These, in order of the number of narrations contributed, are the following:
To see how much each narrator contributes to the total number of narrations, the following graph is presented.
The following is another rendition of the same information, focusing only on those narrators who together contribute almost 60% of the ahadith in Sahih Al Bukhari.
As it appears, almost 50% of ahadith in Sahih Al Bukhari come from the following seven chief narrators:
In addition, we can calculate how many different books in Sahih Al Bukhari each narrator contributed to. This information is presented in the graph below.
The ranking stays almost the same for the highest contributors — Aisha bint Abu Bakr, Abu Hurairah, Anas ibn Malik, Abdullah ibn Umar, and Abdullah ibn Abbas.
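Counting the number of different books per narrator amounts to collecting, for each narrator, the set of books in which they appear. A minimal Python sketch follows (again, not the R code behind the graph); the function name and the records are hypothetical placeholders.

```python
from collections import defaultdict


def books_per_narrator(records):
    """Map each narrator to the number of distinct books they appear in.

    `records` is an iterable of (narrator, book) pairs, one per hadith.
    """
    seen = defaultdict(set)
    for narrator, book in records:
        seen[narrator].add(book)  # a set ignores repeated books
    return {narrator: len(books) for narrator, books in seen.items()}


# Hypothetical records, for illustration only.
records = [
    ("Narrator A", "Book 1"),
    ("Narrator A", "Book 1"),
    ("Narrator A", "Book 2"),
    ("Narrator B", "Book 2"),
]

counts = books_per_narrator(records)
ranking = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(ranking)
```

Because a set is used, a narrator with many ahadith in a single book still counts that book only once, which is why this ranking can differ from the ranking by total narrations.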