A brief introduction to bibliometrix

#install.packages("dplyr")
#install.packages("Matrix")
#install.packages("stringr")
#install.packages("igraph")
#install.packages("bibliometrix", dependencies=TRUE)
library("dplyr")
library("Matrix")
library("stringr")
library("igraph")
library("FactoMineR")
library("factoextra")
library("ggplot2")
library("bibliometrix") ### load bibliometrix package
D <- readFiles("ccspcautocorrelated.bib")
#D
# base ISI WoK
# M <- convert2df(D, dbsource = "isi", format = "bibtex")
# base SCOPUS
M <- convert2df(D, dbsource = "scopus", format = "bibtex")
Articles extracted   100 
Articles extracted   124 
#M

The first step is to perform a descriptive analysis of the bibliographic data frame.

results <- biblioAnalysis(M, sep = ";")

Functions summary and plot

S=summary(object = results, k = 10, pause = FALSE)


Main Information about data

 Articles                              124 
 Sources (Journals, Books, etc.)       60 
 Keywords Plus (ID)                    589 
 Author's Keywords (DE)                278 
 Period                                1992 - 2017 
 Average citations per article         19.56 

 Authors                               226 
 Author Appearances                    303 
 Authors of single authored articles   9 
 Authors of multi authored articles    217 

 Articles per Author                   0.549 
 Authors per Article                   1.82 
 Co-Authors per Articles               2.44 
 Collaboration Index                   2.07 
 

Annual Scientific Production

 Year    Articles
    1992        1
    1994        2
    1995        3
    1996        5
    1997        9
    1998        2
    1999        2
    2000        5
    2001        2
    2002        8
    2003        8
    2004        3
    2005        3
    2006        3
    2007        6
    2008        7
    2009        6
    2010        7
    2011        7
    2012        4
    2013        8
    2014        6
    2015        7
    2016        9
    2017        1

Annual Percentage Growth Rate 0 


Most Productive Authors

   Authors        Articles Authors        Articles Fractionalized
1      TSUNG,F           6     ZHANG,NF                      3.00
2      ADAMS,BM          4     TSUNG,F                       2.50
3      APLEY,DW          4     RUNGER,GC                     2.33
4      RUNGER,GC         4     WOODALL,WH                    2.33
5      TESTIK,MC         4     APLEY,DW                      2.00
6      WOODALL,WH        4     SUN,J                         2.00
7      AMIRI,A           3     PERRY,MB                      1.83
8      ANG,BW            3     WEI,CH                        1.75
9      ATIENZA,OO        3     ADAMS,BM                      1.50
10     CONERLY,MD        3     LU,C-W                        1.50


Top manuscripts per citations

                                                         Paper           TC TCperYear
1  WARDELL DG ;MOSKOWITZ H ;PLANTE RD,(1994),TECHNOMETRICS              220      9.57
2  LU C-W ;REYNOLDSJR MR,(1999),J QUAL TECHNOL                          172      9.56
3  LU C-W ;REYNOLDSJR MR,(1999),J QUAL TECHNOL                          130      7.22
4  ZHANG NF,(1998),TECHNOMETRICS                                        117      6.16
5  JIANG W ;WOODALL WH ;TSUI K-L,(2000),TECHNOMETRICS                    94      5.53
6  LU C-W ;REYNOLDSJR MR,(2001),J QUAL TECHNOL                           76      4.75
7  MARAGAH HD,(1992),J. STAT. COMPUT. SIMUL.                             74      2.96
8  RUNGER GC ;WILLEMAIN TR ;PRABHU S,(1995),COMMUN STAT THEORY METHODS   63      2.86
9  MONTGOMERY DOUGLASC ;WOODALL WILLIAMH,(1997),J QUAL TECHNOL           62      3.10
10 NOOROSSANA R ;AMIRI A ;SOLEIMANI P,(2008),COMMUN STAT THEORY METHODS  53      5.89


Most Productive Countries

        Country   Articles   Freq
1  USA                  51 0.4113
2  CHINA                 9 0.0726
3  TAIWAN                9 0.0726
4  TURKEY                6 0.0484
5  BRAZIL                5 0.0403
6  INDIA                 5 0.0403
7  IRAN                  5 0.0403
8  GERMANY               4 0.0323
9  HONG KONG             4 0.0323
10 UNITED KINGDOM        4 0.0323


Total Citations per Country

     Country      Total Citations Average Article Citations
1  USA                       1331                     26.10
2  TAIWAN                     461                     51.22
3  HONG KONG                  150                     37.50
4  IRAN                        78                     15.60
5  UNITED KINGDOM              78                     19.50
6  GERMANY                     64                     16.00
7  INDIA                       47                      9.40
8  TUNISIA                     34                     11.33
9  CHINA                       32                      3.56
10 SINGAPORE                   30                     30.00


Most Relevant Sources

                                             Sources        Articles
1  JOURNAL OF QUALITY TECHNOLOGY                                  19
2  QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL              11
3  JOURNAL OF APPLIED STATISTICS                                   7
4  IIE TRANSACTIONS (INSTITUTE OF INDUSTRIAL ENGINEERS)            6
5  COMMUNICATIONS IN STATISTICS: SIMULATION AND COMPUTATION        5
6  JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION               5
7  INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH                    4
8  QUALITY ENGINEERING                                             4
9  TECHNOMETRICS                                                   4
10 QINGHUA DAXUE XUEBAO/JOURNAL OF TSINGHUA UNIVERSITY             3


Most Relevant Keywords

     Author Keywords (DE)      Articles      Keywords-Plus (ID)     Articles
1  STATISTICAL PROCESS CONTROL       69 CONTROL CHARTS                    28
2  AUTOCORRELATION                   35 FLOWCHARTING                      27
3  CONTROL CHARTS                    21 QUALITY CONTROL                   22
4  AVERAGE RUN LENGTH                18 STATISTICAL PROCESS CONTROL       22
5  SPC                               13 CORRELATION METHODS               19
6  AUTOCORRELATED DATA                9 MATHEMATICAL MODELS               19
7  AUTOCORRELATED PROCESSES           9 AUTOCORRELATION                   17
8  TIME SERIES ANALYSIS               8 COMPUTER SIMULATION               15
9  AUTOCORRELATED PROCESS             6 PARAMETER ESTIMATION              11
10 CONTROL CHART                      6 REGRESSION ANALYSIS               11
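
The per-author ratios and the growth rate in the summary above can be checked by hand. The sketch below recomputes them from the printed counts. The variable names are illustrative, and the compound-growth formula is an assumption about how the Annual Percentage Growth Rate is derived; the output itself does not state it.

```r
# Counts copied from the "Main Information about data" block
articles           <- 124
authors            <- 226
author_appearances <- 303

articles_per_author   <- round(articles / authors, 3)            # 0.549
authors_per_article   <- round(authors / articles, 2)            # 1.82
coauthors_per_article <- round(author_appearances / articles, 2) # 2.44

# Assumed formula: compound annual growth rate between the first and last year
first_year_articles <- 1   # 1992
last_year_articles  <- 1   # 2017
n_periods   <- 2017 - 1992
growth_rate <- ((last_year_articles / first_year_articles)^(1 / n_periods) - 1) * 100
growth_rate
```

Under this assumption the rate is 0 simply because the first and last years of the period each contain one article, regardless of what happens in between.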

Some basic plots can be drawn using the generic function plot:

plot(x = results, k = 10, pause = FALSE)

Analysis of Cited References

To obtain the most frequently cited manuscripts:

#M$CR[1]
CR <- citations(M, field = "article", sep = ";")
CR$Cited[1:10]
CR
    MONTGOMERY, DC, MASTRANGELO, CM, SOME STATISTICAL PROCESS CONTROL METHODS FOR AUTOCORRELATED DATA (1991) JOURNAL OF QUALITY TECHNOLOGY, 23, PP 179-193 
                                                                                                                                                        15 
    HARRIS, TJ, ROSS, WH, STATISTICAL PROCESS CONTROL PROCEDURES FOR CORRELATED OBSERVATIONS (1991) CANADIAN JOURNAL OF CHEMICAL ENGINEERING, 69, PP 48-57 
                                                                                                                                                        14 
             LUCAS, JM, SACCUCCI, MS, EXPONENTIALLY WEIGHTED MOVING AVERAGE CONTROL SCHEMES: PROPERTIES AND ENHANCEMENTS (1990) TECHNOMETRICS, 32, PP 1-12 
                                                                                                                                                        13 
                     WARDELL, DG, MOSKOWITZ, H, PLANTE, RD, CONTROL CHARTS IN THE PRESENCE OF DATA CORRELATION (1992) MANAGEMENT SCIENCE, 38, PP 1084-1105 
                                                                                                                                                        12 
                                                     ZHANG, NF, A STATISTICAL CONTROL CHART FOR STATIONARY PROCESS DATA (1998) TECHNOMETRICS, 40, PP 24-38 
                                                                                                                                                        12 
                          JOHNSON, RA, BAGSHAW, M, THE EFFECT OF SERIAL CORRELATION ON THE PERFORMANCE OF CUSUM TESTS (1974) TECHNOMETRICS, 16, PP 103-112 
                                                                                                                                                        10 
              ALWAN, LC, ROBERTS, HV, TIME-SERIES MODELING FOR STATISTICAL PROCESS CONTROL (1988) JOURNAL OF BUSINESS AND ECONOMIC STATISTICS, 6, PP 87-95 
                                                                                                                                                         9 
                ALWAN, LC, ROBERTS, HV, TIME-SERIES MODELING FOR STATISTICAL PROCESS CONTROL (1988) JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 6, PP 87-95 
                                                                                                                                                         8 
                         BAGSHAW, M, JOHNSON, RA, THE EFFECT OF SERIAL CORRELATION ON THE PERFORMANCE OF CUSUM TESTS II (1975) TECHNOMETRICS, 17, PP 73-80 
                                                                                                                                                         8 
HARRIS, TJ, ROSS, WH, STATISTICAL PROCESS CONTROL PROCEDURES FOR CORRELATED OBSERVATIONS (1991) THE CANADIAN JOURNAL OF CHEMICAL ENGINEERING, 69, PP 48-57 
                                                                                                                                                         8 

To obtain the most frequently cited first authors:

CR <- citations(M, field = "author", sep = ";")
CR$Cited[1:10]
CR
MONTGOMERY, DC    WOODALL, WH      ALWAN, LC       BOX, GEP     RUNGER, GC 
           164            142             96             87             76 
      TSUNG, F    ROBERTS, HV   MOSKOWITZ, H    JENKINS, GM     PLANTE, RD 
            70             59             58             57             57 

To obtain the most frequently cited local authors:

CR <- localCitations(M, results, sep = ";")
CR[1:10]
CR
      TSUNG,F   MOSKOWITZ,H       JIANG,W      SCHMID,W  NOOROSSANA,R      PRABHU,S 
           70            58            35            32            22            20 
        PAN,X CASTAGLIOLA,P        WANG,Z       AMIRI,A 
           19            16            16            14 

The function dominance calculates the authors’ dominance ranking as proposed by Kumar & Kumar (2008). Its arguments are: results (an object of class bibliometrix obtained from biblioAnalysis) and k (the number of authors to consider in the analysis).

The Dominance Factor is a ratio indicating the fraction of an author's multi-authored articles in which that author appears as first author.

DF <- dominance(results, k = 10)
DF
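
To make the definition of the Dominance Factor concrete, here is a minimal base-R sketch on a toy author field. The data and the helper `dominance_factor` are hypothetical, and the sketch follows the verbal definition above rather than the internals of `dominance`.

```r
# Hypothetical toy collection: one author string per article, first author listed first
au <- c("TSUNG F;APLEY DW",
        "APLEY DW;TSUNG F;RUNGER GC",
        "TSUNG F",
        "TSUNG F;RUNGER GC")

dominance_factor <- function(author, au, sep = ";") {
  lists <- strsplit(au, sep, fixed = TRUE)
  multi <- lists[lengths(lists) > 1]   # keep multi-authored articles only
  with_author <- multi[vapply(multi, function(x) author %in% x, logical(1))]
  if (length(with_author) == 0) return(NA_real_)
  first <- vapply(with_author, function(x) x[1] == author, logical(1))
  sum(first) / length(with_author)     # share of those articles led by the author
}

dominance_factor("TSUNG F", au)    # first author in 2 of 3 multi-authored articles
dominance_factor("RUNGER GC", au)  # never first author
```

The single-authored article is ignored by construction, which is why it does not raise the first author's score.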

Authors’ h-index

The h-index is based on the set of a scientist’s most cited papers and the number of citations those papers have received in other publications.

indices <- Hindex(M, authors=c("WOODALL WH"), sep = ";")
# Bornmann's impact indices:
indices$H
indices$CitationList
[[1]]
NA

To calculate the h-index of the first 15 most productive authors (in this collection):

authors=gsub(","," ",names(results$Authors)[1:15])
indices <- Hindex(M, authors, sep = ";")
indices$H
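
For reference, the usual definition of the h-index (the largest h such that h papers have at least h citations each) can be written in a few lines of base R. The helper `h_index` and the citation counts are illustrative and independent of the `Hindex` function used above.

```r
# h-index of a vector of per-paper citation counts
h_index <- function(citations) {
  ranked <- sort(citations, decreasing = TRUE)
  # ranked is non-increasing, so the comparison is TRUE for exactly the first h papers
  sum(ranked >= seq_along(ranked))
}

h_index(c(25, 8, 5, 3, 3))  # 3
```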

Lotka’s Law coefficient estimation

The function lotka estimates Lotka’s law coefficients for scientific productivity (Lotka A.J., 1926). Lotka’s law describes the frequency of publication by authors in any given field as an inverse square law, where the number of authors publishing a certain number of articles is a fixed ratio to the number of authors publishing a single article. This assumption implies that the theoretical beta coefficient of Lotka’s law is equal to 2. Using the lotka function, it is possible to estimate the beta coefficient of our bibliographic collection and to assess, through a statistical test, how similar this empirical distribution is to the theoretical one.

L <- lotka(results)
# Author Productivity. Empirical Distribution
L$AuthorProd
# Beta coefficient estimate
L$Beta
[1] 2.792959
# Constant
L$C
[1] 0.9175485
# Goodness of fit
L$R2
[1] 0.9802325
# P-value of K-S two sample test
L$p.value
[1] 0.8186212
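
Lotka’s law can be written as \(f(n) = C / n^{\beta}\), so that \(\log_{10} f(n) = \log_{10} C - \beta \log_{10} n\) is linear in \(\log_{10} n\). The sketch below estimates beta and C by ordinary least squares on a hypothetical productivity table built to follow an exact inverse square law. Whether lotka uses exactly this fitting procedure is an assumption.

```r
# Hypothetical productivity table: number of authors who wrote n articles
n_articles <- 1:5
n_authors  <- c(3600, 900, 400, 225, 144)    # exactly proportional to 1 / n^2

freq <- n_authors / sum(n_authors)           # observed share of authors
fit  <- lm(log10(freq) ~ log10(n_articles))  # log10(f) = log10(C) - beta * log10(n)

beta_hat <- -unname(coef(fit)[2])            # slope with the sign flipped
C_hat    <- 10^unname(coef(fit)[1])          # back-transformed intercept
beta_hat
```

On this constructed table the estimate recovers the theoretical value of 2 up to floating-point error; on real collections, such as the one analysed here, the estimate will generally differ.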

You can compare the two distributions using the plot function:

# Observed distribution
Observed <- L$AuthorProd[, 3]
# Theoretical distribution with Beta = 2
Theoretical <- 10^(log10(L$C) - 2 * log10(L$AuthorProd[, 1]))
plot(L$AuthorProd[, 1], Theoretical, type = "l", col = "red", ylim = c(0, 1),
     xlab = "Articles", ylab = "Freq. of Authors", main = "Scientific Productivity")
lines(L$AuthorProd[, 1], Observed, col = "blue")
legend(x = "topright", c("Theoretical (B=2)", "Observed"), col = c("red", "blue"),
       lty = c(1, 1), cex = 0.6, bty = "n")


Bibliometric network matrices

Manuscript attributes are connected to each other through the manuscript itself: author(s) to journal, keywords to publication date, etc. These connections between different attributes generate bipartite networks that can be represented as rectangular matrices (Manuscripts x Attributes). Furthermore, scientific publications regularly contain references to other scientific works. This generates a further network, namely a co-citation or coupling network. These networks are analysed in order to capture meaningful properties of the underlying research system, and in particular to determine the influence of bibliometric units such as scholars and journals.

Bipartite networks

cocMatrix is a general function to compute a bipartite network selecting one of the metadata attributes. For example, to create a network Manuscript x Publication Source you have to use the field tag “SO”:

For a complete list of field tags see https://images.webofknowledge.com/WOKRS410B4/help/WOS/h_fieldtags.html

A <- cocMatrix(M, Field = "SO", sep = ";")

A is a rectangular binary matrix representing a bipartite network whose rows and columns are manuscripts and sources respectively. The generic element \(a_{ij}\) is 1 if manuscript \(i\) has been published in source \(j\), and 0 otherwise. The \(j\)-th column sum is the number of manuscripts published in source \(j\). Sorting the column sums of A in decreasing order shows the most relevant publication sources:

sort(Matrix::colSums(A), decreasing = TRUE)[1:15]
                                         JOURNAL OF QUALITY TECHNOLOGY 
                                                                    19 
                     QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL 
                                                                    11 
                                         JOURNAL OF APPLIED STATISTICS 
                                                                     7 
                  IIE TRANSACTIONS (INSTITUTE OF INDUSTRIAL ENGINEERS) 
                                                                     6 
                     JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION 
                                                                     5 
              COMMUNICATIONS IN STATISTICS: SIMULATION AND COMPUTATION 
                                                                     5 
                          INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH 
                                                                     4 
                                                   QUALITY ENGINEERING 
                                                                     4 
                                                         TECHNOMETRICS 
                                                                     4 
                   QINGHUA DAXUE XUEBAO/JOURNAL OF TSINGHUA UNIVERSITY 
                                                                     3 
                                  COMPUTERS AND INDUSTRIAL ENGINEERING 
                                                                     2 
                              EUROPEAN JOURNAL OF OPERATIONAL RESEARCH 
                                                                     2 
           INTERNATIONAL JOURNAL OF INDUSTRIAL AND SYSTEMS ENGINEERING 
                                                                     2 
INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING, INFORMATION AND CONTROL 
                                                                     2 
                       CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS 
                                                                     2 
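
The structure of such a matrix is easy to reproduce on toy data. The following base-R sketch builds a Manuscripts x Attributes binary matrix from a separator-delimited field and ranks the attributes by column sum. The helper `toy_bipartite` and the input strings are hypothetical and are not part of bibliometrix; a multi-valued field is used to show the general case.

```r
# Hypothetical "AU"-style field: one string per manuscript, values separated by ";"
field <- c("TSUNG F;APLEY DW", "APLEY DW", "TSUNG F;RUNGER GC;APLEY DW")

toy_bipartite <- function(field, sep = ";") {
  items  <- lapply(strsplit(field, sep, fixed = TRUE), trimws)
  labels <- sort(unique(unlist(items)))
  A <- matrix(0L, nrow = length(field), ncol = length(labels),
              dimnames = list(paste0("doc", seq_along(field)), labels))
  # Mark the attributes present in manuscript i
  for (i in seq_along(items)) A[i, items[[i]]] <- 1L
  A
}

A_toy <- toy_bipartite(field)
sort(colSums(A_toy), decreasing = TRUE)
```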

Following this approach, you can compute several bipartite networks:

Citation network

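# Note: sep = ". " is the reference separator used for ISI/WoS records; in this Scopus
# export the cited references appear to be ";"-separated (as in the citations() calls
# above), which may be why the labels below look truncated.
A <- cocMatrix(M, Field = "CR", sep = ". ")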
sort(Matrix::colSums(A), decreasing = TRUE)[1:15]
 TECHNOMETRICS  AUTOCORRELATE    ENGINEERING    OBSERVATION AUTOCORRELATIO 
            97             87             63             62             56 
   MULTIVARIAT  COMMUNICATION    INTRODUCTIO   TRANSACTIONS   EXPONENTIALL 
            55             55             55             51             51 
   MASTRANGELO   DISTRIBUTION  INTERNATIONAL   INTERNATIONA   MANUFACTURIN 
            48             47             46             41             39 

Author network

A <- cocMatrix(M, Field = "AU", sep = ";")
sort(Matrix::colSums(A), decreasing = TRUE)[1:15]
            APLEY DW                TSUNG F              DYER JN  
                    4                     4                     3 
           RUNGER GC            ATIENZA OO               TANG LC  
                    3                     3                     3 
               ANG BW               LU C-W          REYNOLDSJR MR 
                    3                     3                     3 
             ZHANG NF TRIANTAFYLLOPOULOS K               TSUNG F  
                    3                     2                     2 
           TESTIK MC             CAPIZZI G            MASAROTTO G 
                    2                     2                     2 

Country network

Authors’ Countries is not a standard attribute of the bibliographic data frame. You need to extract this information from the affiliation attribute using the function metaTagExtraction.

M <- metaTagExtraction(M, Field = "AU_CO", sep = ";")
A <- cocMatrix(M, Field = "AU_CO", sep = ";")
sort(Matrix::colSums(A), decreasing = TRUE)[1:15]
 UNITED STATES          CHINA         TAIWAN      HONG KONG         TURKEY 
            57              9              9              7              7 
       GERMANY          KOREA          INDIA           IRAN         BRAZIL 
             6              5              5              5              5 
UNITED KINGDOM        TUNISIA      SINGAPORE          ITALY       PORTUGAL 
             4              4              4              3              3 

metaTagExtraction allows you to extract the following additional field tags: Authors’ countries (Field = “AU_CO”); First author of each cited reference (Field = “CR_AU”); Publication source of each cited reference (Field = “CR_SO”); and Authors’ affiliations (Field = “AU_UN”).

Author keyword network

A <- cocMatrix(M, Field = "DE", sep = ";")
sort(Matrix::colSums(A), decreasing = TRUE)[1:15]
             STATISTICAL PROCESS CONTROL                          AUTOCORRELATION 
                                      69                                       35 
                          CONTROL CHARTS                       AVERAGE RUN LENGTH 
                                      21                                       18 
                                     SPC                      AUTOCORRELATED DATA 
                                      13                                        9 
                AUTOCORRELATED PROCESSES                     TIME SERIES ANALYSIS 
                                       9                                        8 
                         QUALITY CONTROL                            CONTROL CHART 
                                       6                                        6 
                  AUTOCORRELATED PROCESS MULTIVARIATE STATISTICAL PROCESS CONTROL 
                                       6                                        6 
       STATISTICAL PROCESS CONTROL (SPC)                                     EWMA 
                                       5                                        4 
                                   CUSUM 
                                       4 

Keyword Plus network

A <- cocMatrix(M, Field = "ID", sep = ";")
sort(Matrix::colSums(A), decreasing = TRUE)[1:15]
             CONTROL CHARTS                FLOWCHARTING STATISTICAL PROCESS CONTROL 
                         28                          26                          22 
            QUALITY CONTROL         MATHEMATICAL MODELS         CORRELATION METHODS 
                         21                          19                          19 
            AUTOCORRELATION         COMPUTER SIMULATION        PARAMETER ESTIMATION 
                         15                          15                          11 
     AUTOCORRELATED PROCESS         REGRESSION ANALYSIS                  MONITORING 
                         10                          10                           8 
            PROCESS CONTROL         MONTE CARLO METHODS         AVERAGE RUN LENGTHS 
                          7                           7                           7 

Bibliographic coupling

Two articles are said to be bibliographically coupled if at least one cited source appears in the bibliographies or reference lists of both articles (Kessler, 1963). A coupling network can be obtained using the general formulation \(B = A \times A^T\), where A is a bipartite network.
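
As a small numeric illustration of the coupling product, the sketch below multiplies a toy bipartite matrix by its transpose. It assumes A is a Manuscripts x Cited references matrix (the text only says "a bipartite network"), and all names and values are hypothetical.

```r
# Hypothetical Manuscripts x Cited references matrix (1 = manuscript cites the reference)
A_refs <- matrix(c(1, 1, 0, 0,
                   1, 1, 1, 0,
                   0, 0, 1, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2", "doc3"),
                                 c("ref1", "ref2", "ref3", "ref4")))

# Coupling matrix: entry (i, j) counts the references shared by manuscripts i and j
B_toy <- A_refs %*% t(A_refs)   # equivalent to tcrossprod(A_refs)
B_toy
```

Here doc1 and doc2 share two references, doc2 and doc3 share one, and doc1 and doc3 are not coupled at all; the diagonal holds the length of each reference list.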

The function biblioNetwork calculates, starting from a bibliographic data frame, the most frequently used coupling networks: Authors, Sources, and Countries.

biblioNetwork uses two arguments to define the network to compute:

  1. analysis argument can be “co-citation”, “coupling”, “collaboration”, or “co-occurrences”.
  2. network argument can be “authors”, “references”, “sources”, “countries”, “universities”, “keywords”, “author_keywords”, “titles” and “abstracts”.

NetMatrix <- biblioNetwork(M, analysis = "coupling", network = "references", sep = ". ")

If coupling strength is measured simply by the number of references that two articles have in common, articles with only a few references will tend to be more weakly coupled. This suggests that it might be more practicable to switch to a relative measure of bibliographic coupling. The couplingSimilarity function calculates the Jaccard or Salton similarity coefficient among the vertices of a coupling network.

NetMatrix <- biblioNetwork(M, analysis = "coupling", network = "authors", sep = ";")
# plot authors' similarity (first 20 authors)
net=networkPlot(NetMatrix, n = 20, Title = "Authors' Coupling", type = "fruchterman", size=FALSE, remove.multiple=TRUE, vos.path="c:/Users/DE/Desktop/bibliometRics/VOSviewer")

# calculate jaccard similarity coefficient
S <- couplingSimilarity(NetMatrix, type="jaccard")
# plot authors' similarity (first 20 authors)
net=networkPlot(S, n = 20, Title = "Authors' Coupling", remove.multiple=TRUE, vos.path="c:/Users/DE/Desktop/bibliometRics/VOSviewer")
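
To show what a relative coupling measure looks like, the sketch below applies the textbook Jaccard and Salton (cosine) formulas to a toy coupling matrix. The matrix is hypothetical, and it is an assumption that couplingSimilarity uses exactly these definitions.

```r
# Hypothetical coupling matrix: diagonal = reference-list sizes,
# off-diagonal = number of shared references
B_toy <- matrix(c(2, 2, 0,
                  2, 3, 1,
                  0, 1, 2),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("doc1", "doc2", "doc3"),
                                c("doc1", "doc2", "doc3")))

d <- diag(B_toy)
jaccard <- B_toy / (outer(d, d, "+") - B_toy)  # shared / size of the union
salton  <- B_toy / sqrt(outer(d, d))           # shared / geometric mean of list sizes
round(jaccard, 2)
```

Both coefficients lie between 0 and 1, so a pair of short reference lists that overlap heavily is no longer penalised relative to long lists with the same raw overlap.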

Bibliographic co-citation

We talk about co-citation of two articles when both are cited in a third article. Thus, co-citation can be seen as the counterpart of bibliographic coupling. A co-citation network can be obtained using the general formulation \(C = A^T \times A\), where A is a bipartite network.

NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "references", sep = ". ")
net=networkPlot(NetMatrix, n = 20, Title = "Bibliographic cocitation", type = "fruchterman", size=FALSE, remove.multiple=TRUE, vos.path="c:/Users/DE/Desktop/bibliometRics/VOSviewer")
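
The co-citation product can be illustrated in the same way. The sketch below assumes A is a Manuscripts x Cited references matrix and uses hypothetical values; entry (i, j) of the result counts the manuscripts that cite both reference i and reference j.

```r
# Hypothetical Manuscripts x Cited references matrix (1 = manuscript cites the reference)
A_refs <- matrix(c(1, 1, 0, 0,
                   1, 1, 1, 0,
                   0, 0, 1, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2", "doc3"),
                                 c("ref1", "ref2", "ref3", "ref4")))

# Co-citation matrix: crossprod(A) computes t(A) %*% A
C_toy <- crossprod(A_refs)
C_toy
```

The diagonal gives the number of times each reference is cited within the toy collection, while ref1 and ref2 are co-cited twice because both doc1 and doc2 cite them together.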

Bibliographic collaboration

A scientific collaboration network is a network where nodes are authors and links are co-authorships, as the latter is one of the most well-documented forms of scientific collaboration (Glanzel, 2004). An author collaboration network can be obtained using the general formulation \(AC = A^T \times A\), where A is a bipartite network Manuscripts x Authors.

NetMatrix <- biblioNetwork(M, analysis = "collaboration", network = "authors", sep = ";")
net=networkPlot(NetMatrix, n = 20, Title = "Bibliographic collaboration", type = "fruchterman", size=FALSE, remove.multiple=TRUE, vos.path="c:/Users/DE/Desktop/bibliometRics/VOSviewer")
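
A collaboration matrix can also be read as a weighted edge list, which is often easier to inspect than a plot. The sketch below builds one from a hypothetical Manuscripts x Authors matrix; the object names are illustrative and the edge-list step is an addition, not something biblioNetwork returns.

```r
# Hypothetical Manuscripts x Authors matrix (1 = author signed the manuscript)
A_au <- matrix(c(1, 1, 0,
                 0, 1, 0,
                 1, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("doc1", "doc2", "doc3"),
                               c("TSUNG F", "APLEY DW", "RUNGER GC")))

AC <- crossprod(A_au)   # Authors x Authors; diagonal = articles per author

# Keep each co-authorship once (upper triangle) and drop pairs that never collaborated
idx   <- which(upper.tri(AC) & AC > 0, arr.ind = TRUE)
edges <- data.frame(from   = rownames(AC)[idx[, "row"]],
                    to     = colnames(AC)[idx[, "col"]],
                    weight = AC[idx])
edges
```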

Visualizing bibliographic networks

Country Scientific Collaboration

# Create a country collaboration network
M <- metaTagExtraction(M, Field = "AU_CO", sep = ";")
NetMatrix <- biblioNetwork(M, analysis = "collaboration", network = "countries", sep = ";")
# Plot the network
net=networkPlot(NetMatrix, n = 20, Title = "Country Collaboration", type = "circle", size=TRUE,remove.multiple=FALSE)

net=networkPlot(NetMatrix, n = 20, Title = "Country Collaboration", type = "fruchterman", size=FALSE, remove.multiple=TRUE, vos.path="c:/Users/DE/Desktop/bibliometRics/VOSviewer")

Co-Citation Network

# Create a co-citation network
NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "references", sep = ". ")
# Plot the network
net=networkPlot(NetMatrix, n = 5, Title = "Co-Citation Network", type = "fruchterman", size=T, remove.multiple=FALSE)

Keyword co-occurrences

# Create a keyword co-occurrences network
NetMatrix <- biblioNetwork(M, analysis = "co-occurrences", network = "keywords", sep = ";")
# Plot the network
net=networkPlot(NetMatrix, n = 20, Title = "Keyword Co-occurrences", type = "kamada", size=T)

Co-Word Analysis: Conceptual structure of a field

The aim of co-word analysis is to map the conceptual structure of a framework using the word co-occurrences in a bibliographic collection. The analysis can be performed through dimensionality reduction techniques such as Multidimensional Scaling (MDS) or Multiple Correspondence Analysis (MCA). Here, we show an example using the function conceptualStructure, which performs an MCA to draw a conceptual structure of the field and K-means clustering to identify clusters of documents that express common concepts. Results are plotted on a two-dimensional map. conceptualStructure includes natural language processing (NLP) routines (see the function termExtraction) to extract terms from titles and abstracts. In addition, it implements Porter’s stemming algorithm to reduce inflected (or sometimes derived) words to their word stem, base or root form.

# Conceptual Structure using keywords
CS <- conceptualStructure(M,field="ID", minDegree=4, k.max=5, stemming=FALSE)
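
The idea of reducing dimensions and then clustering can be sketched with base R alone. The example below swaps in classical MDS (stats::cmdscale) for the MCA that conceptualStructure performs, and clusters keywords rather than documents; the data are hypothetical and the result is only meant to show the mechanics on a two-dimensional map.

```r
# Hypothetical Documents x Keywords matrix with two groups of co-occurring keywords
A_kw <- matrix(c(1, 1, 0,  0, 0, 0,
                 1, 1, 1,  0, 0, 0,
                 1, 0, 1,  0, 0, 0,
                 0, 0, 0,  1, 1, 0,
                 0, 0, 0,  1, 1, 1,
                 0, 0, 0,  1, 0, 1),
               nrow = 6, byrow = TRUE,
               dimnames = list(paste0("doc", 1:6),
                               c("CONTROL CHARTS", "AVERAGE RUN LENGTH", "EWMA",
                                 "TIME SERIES", "ARMA", "FORECASTING")))

# Jaccard distance between keywords, based on the documents they appear in
d_kw <- dist(t(A_kw), method = "binary")

# Classical MDS to two dimensions (swapped in for MCA), then K-means on the map
coords <- cmdscale(d_kw, k = 2)
set.seed(1)
km <- kmeans(coords, centers = 2, nstart = 20)

split(rownames(coords), km$cluster)
```

Keywords that appear in the same documents end up close together on the map, so the two clusters correspond to the two keyword groups built into the toy matrix.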

Historical Co-Citation Network

The historiographic map is a graph proposed by E. Garfield to represent a chronological network map of the most relevant co-citations resulting from a bibliographic collection. The function histNetwork generates a chronological co-citation network matrix which can be plotted using histPlot:

# Create a historical co-citation network
histResults <- histNetwork(M, n = 10, sep = ". ")
Error in `$<-.data.frame`(`*tmp*`, "LCS", value = numeric(0)) : 
  replacement has 0 rows, data has 10
