Background:

Co-authorship across disciplines and organizations is essential in research. Social network analysis provides tools to measure and visualize the cohesion between research groups.

The Yale Institute for Global Health (YIGH) was established in July 2019 with the aim of speeding the translation of new scientific discoveries into better health for all. YIGH fosters interdisciplinary research collaboration among Yale faculty and with international partners.

We will use co-authorship data for YIGH members to measure the cohesion of the YIGH network and compare it with the last 3 years of the previous global program.

Load packages.

We will use the “igraph” package to calculate the measurements.

library(igraph)
## Warning: package 'igraph' was built under R version 3.6.2
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:igraph':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readxl)
## Warning: package 'readxl' was built under R version 3.6.2

Import data

e <- read_excel("C:/Users/12035/OneDrive/Desktop/YIGH edgelists by year/YIGHedgelist2.xlsx")
n <- read_excel("C:/Users/12035/OneDrive/Desktop/YIGH edgelists by year/YIGHnodeslist.xlsx")

The edgelist consists of the source (author1), the target (author2), the weight of the co-authorship and the year of the co-authorship.

head(e)
## # A tibble: 6 x 4
##    Year source target weight
##   <dbl>  <dbl>  <dbl>  <dbl>
## 1  2010      1     45      1
## 2  2011      1     27      2
## 3  2013      1     27      2
## 4  2015      1     27      1
## 5  2016      1     27      1
## 6  2019      1     27      1

The nodelist consists of the author id, author name, total number of documents, status (new or old YIGH member), department and color (the color assigned to the department).

head(n)
## # A tibble: 6 x 6
##      id name                 Documentation status Department color 
##   <dbl> <chr>                        <dbl> <chr>  <chr>      <chr> 
## 1     1 Aksoy, S.                      201 new    Public     gold  
## 2     2 Alfaro-Murillo, J.A.             9 new    Public     gold  
## 3     3 Altice, F.L.                   353 old    Public     gold  
## 4     4 Annamalai, A.                   11 old    Psychiatry violet
## 5     5 Arnold, L.                       9 old    Pediatrics red   
## 6     6 Bell, M.L.                      28 new    Public     gold

Cohesion measurements analysis

1- Density:

The simplest measure of group cohesion is the total number of ties between pairs of people (dyads) in the group. We can then compare that total with the totals for other groups of similar size to get a sense of the relative amount of cohesion in each.

Alternatively, we can normalize this total by dividing it by the maximum possible number of ties, which makes graphs of different sizes comparable. The maximum possible number of ties in an undirected graph is n*(n-1)/2, where n is the number of nodes in the graph. This measure is called density.
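To make the formula concrete, here is a minimal base R sketch that computes density by hand. The edgelist `toy_edges` is a hypothetical example, not YIGH data, and the sketch assumes there are no duplicate ties or self-loops.

```r
# Hypothetical toy edgelist (not YIGH data): 4 authors, 4 undirected ties
toy_edges <- data.frame(source = c(1, 1, 2, 3),
                        target = c(2, 3, 3, 4))

n_nodes  <- length(unique(c(toy_edges$source, toy_edges$target)))  # nodes that appear in a tie
n_ties   <- nrow(toy_edges)                                        # observed ties
max_ties <- n_nodes * (n_nodes - 1) / 2                            # maximum possible ties

toy_density <- n_ties / max_ties
toy_density
```

With 4 ties out of a possible 6, the toy density is 4/6. Note that only authors who appear in at least one tie are counted as nodes here.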

Now we will measure network density for 2016, 2017, 2018 and 2019 (the year YIGH was established).

#subset 2016, 2017, 2018, 2019 data from the edgelist

e2019 <- subset(e, Year == 2019)
e2018 <- subset(e, Year == 2018)
e2017 <- subset(e, Year == 2017)
e2016 <- subset(e, Year == 2016)

#select source, target and weight columns

e2019 <- e2019[, c("source", "target", "weight")]
e2018 <- e2018[, c("source", "target", "weight")]
e2017 <- e2017[, c("source", "target", "weight")]
e2016 <- e2016[, c("source", "target", "weight")]

#Create graph objects from edgelists

g2019 <- graph_from_data_frame(e2019, directed = FALSE)
g2018 <- graph_from_data_frame(e2018, directed = FALSE)
g2017 <- graph_from_data_frame(e2017, directed = FALSE)
g2016 <- graph_from_data_frame(e2016, directed = FALSE)
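
The per-year steps above can also be written once and applied to every year. The sketch below is one possible way to do it, using a hypothetical edgelist `toy_e` with the same column names as `e` (it is not YIGH data); the names `by_year` and `graphs` are our own.

```r
library(igraph)

# Hypothetical toy edgelist with the same columns as e (not YIGH data)
toy_e <- data.frame(Year   = c(2018, 2018, 2019),
                    source = c(1, 2, 1),
                    target = c(2, 3, 3),
                    weight = c(1, 2, 1))

# Split by year, then build one undirected graph per year
by_year <- split(toy_e[, c("source", "target", "weight")], toy_e$Year)
graphs  <- lapply(by_year, graph_from_data_frame, directed = FALSE)

names(graphs)            # one entry per year
sapply(graphs, ecount)   # number of ties per year
```

Each graph can then be accessed by year, for example `graphs[["2019"]]`.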

Let’s calculate density:

#2019 density
graph.density(g2019, loops = FALSE)
## [1] 0.03188406
#2018 density
graph.density(g2018, loops = FALSE)
## [1] 0.03413462
#2017 density
graph.density(g2017, loops = FALSE)
## [1] 0.03116769
#2016 density
graph.density(g2016, loops = FALSE)
## [1] 0.03367434

We can see that the density of the network in 2019 (after YIGH establishment) is lower than in 2018 and 2016, but higher than in 2017.

2- Average node degree:

Average node degree is the average number of ties per node in the network. Average degree = density * (n-1), where n is the number of nodes.
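
As a quick check of this identity, the base R sketch below computes the average degree in two ways on a hypothetical toy edgelist (not YIGH data). The last two lines apply the same identity to the 2019 values reported in this document to recover the approximate number of nodes in that year's graph.

```r
# Hypothetical toy edgelist (not YIGH data): 4 authors, 4 undirected ties
toy_edges <- data.frame(source = c(1, 1, 2, 3),
                        target = c(2, 3, 3, 4))

# Degree of each node = number of times it appears at either end of a tie
toy_degree  <- as.vector(table(c(toy_edges$source, toy_edges$target)))
n_nodes     <- length(toy_degree)
toy_density <- nrow(toy_edges) / (n_nodes * (n_nodes - 1) / 2)

mean(toy_degree)              # average degree computed directly
toy_density * (n_nodes - 1)   # same value via density * (n - 1)

# The same identity applied to the 2019 values reported in this document
n_2019 <- 2.2 / 0.03188406 + 1   # average degree / density + 1
round(n_2019)                    # implied number of nodes in g2019
```

Both toy calculations give an average degree of 2. The implied size of the 2019 graph is about 70 nodes, which suggests that the yearly graphs contain only the authors who have at least one tie in that year.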

#2019
mean(degree(g2019))
## [1] 2.2
#2018
mean(degree(g2018))
## [1] 2.184615
#2017
mean(degree(g2017))
## [1] 2.088235
#2016
mean(degree(g2016))
## [1] 2.289855

3- Diameter:

Diameter is the longest shortest path between any two nodes in the network.
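
One detail is worth checking before reading diameter values. As far as we know, igraph's `diameter()` uses an edge attribute called `weight` as the edge length by default when it exists, and `weights = NA` switches that off. Since the yearly graphs in this document are built from edgelists with a `weight` column, the reported diameters may be weighted path lengths and not step counts. The sketch below shows the difference on a hypothetical toy chain (not YIGH data).

```r
library(igraph)

# Hypothetical toy chain 1 - 2 - 3 - 4 with co-authorship weights (not YIGH data)
toy_edges <- data.frame(source = c(1, 2, 3),
                        target = c(2, 3, 4),
                        weight = c(2, 1, 3))
toy_g <- graph_from_data_frame(toy_edges, directed = FALSE)

diameter(toy_g)                 # weight attribute is treated as edge length
diameter(toy_g, weights = NA)   # ignores weights and counts steps
```

In the toy chain the weighted diameter is 6 (2 + 1 + 3), while the unweighted diameter is 3 steps. For co-authorship data, where a larger weight means a stronger tie, the unweighted version is probably the easier one to interpret.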

#2019

diameter(g2019)
## [1] 12
#2018

diameter(g2018)
## [1] 16
#2017

diameter(g2017)
## [1] 24
#2016

diameter(g2016)
## [1] 12

As we can see, the diameter in 2019 (12) is shorter than in 2018 (16) and 2017 (24), and equal to that of 2016 (12). By this measure, cohesion in 2019 is higher than in the two preceding years.

4- Average path length:

Average path length is the average number of steps along the shortest paths for all possible pairs of network nodes. It measures the efficiency of information transfer between actors.
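
To illustrate the definition, the sketch below computes the average path length by hand from the matrix of shortest paths and compares it with igraph's `mean_distance()`. The toy chain is hypothetical (not YIGH data) and is assumed to be connected and unweighted; in a disconnected graph `distances()` returns `Inf` for unreachable pairs, so the by-hand mean would need extra care.

```r
library(igraph)

# Hypothetical toy chain 1 - 2 - 3 - 4 without weights (not YIGH data)
toy_edges <- data.frame(source = c(1, 2, 3),
                        target = c(2, 3, 4))
toy_g <- graph_from_data_frame(toy_edges, directed = FALSE)

d <- distances(toy_g)              # matrix of shortest path lengths
pair_lengths <- d[upper.tri(d)]    # each unordered pair once
mean(pair_lengths)                 # average path length by hand
mean_distance(toy_g)               # igraph's built-in equivalent
```

The six pairs have path lengths 1, 2, 3, 1, 2 and 1, so both calls give 10/6.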

#2019
average.path.length(g2019)
## [1] 3.91366
#2018
average.path.length(g2018)
## [1] 4.938924
#2017
average.path.length(g2017)
## [1] 6.957049
#2016
average.path.length(g2016)
## [1] 3.272269

The average path length in 2019 (3.91) is lower than in 2018 (4.94) and 2017 (6.96), although it is higher than in 2016 (3.27). The lower the average path length, the higher the cohesion.

5- Components ratio:

The components ratio measures cohesion based on the number of components in the graph. Components are the disconnected parts of the network. Because a network can be dense overall while its components remain isolated from each other, it is useful to look at the components ratio (CR). CR = (c-1)/(n-1), where c is the number of components and n is the number of nodes. The lower the components ratio, the higher the cohesion.
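
As a small worked example, the sketch below computes the components ratio for a hypothetical toy network (not YIGH data). It takes n from the graph itself with `vcount()`, which is one possible choice; the variable names are our own.

```r
library(igraph)

# Hypothetical toy network (not YIGH data): ties 1-2, 2-3 and a separate pair 4-5
toy_edges <- data.frame(source = c(1, 2, 4),
                        target = c(2, 3, 5))
toy_g <- graph_from_data_frame(toy_edges, directed = FALSE)

n_comp  <- components(toy_g)$no   # number of components
n_nodes <- vcount(toy_g)          # number of nodes in this graph
toy_cr  <- (n_comp - 1) / (n_nodes - 1)
toy_cr
```

The toy network has 2 components and 5 nodes, so CR = (2-1)/(5-1) = 0.25.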

#2019
c <- components(g2019)$no #get components number
cr <- (c-1)/ (nrow(n) - 1) #get components ratio
cr
## [1] 0.0625
#2018
c <- components(g2018)$no
cr <- (c-1)/ (nrow(n) - 1)
cr
## [1] 0.0625
#2017
c <- components(g2017)$no
cr <- (c-1)/ (nrow(n) - 1)
cr
## [1] 0.0625
#2016
c <- components(g2016)$no
cr <- (c-1)/ (nrow(n) - 1)
cr
## [1] 0.0625

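The components ratio is the same (0.0625) for all four years, which corresponds to 8 components in each year ((8-1)/(113-1)). Note that the denominator uses nrow(n), the 113 authors in the full nodelist, whereas each yearly graph contains only the authors who appear in that year's edgelist. Using the number of nodes in each yearly graph as n would give larger ratios that may differ slightly between years.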

Now we will identify cohesive groups (entities) in the network using the K core method.

K core:

A K core is a maximal group of entities, each of which is connected to at least k other entities in the group.
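
To see what a coreness score means, consider the hypothetical toy network below (not YIGH data): three authors who have all written with each other, plus a fourth author tied to only one of them.

```r
library(igraph)

# Hypothetical toy network (not YIGH data): a triangle 1-2-3 plus author 4 tied only to 3
toy_edges <- data.frame(source = c(1, 1, 2, 3),
                        target = c(2, 3, 3, 4))
toy_g <- graph_from_data_frame(toy_edges, directed = FALSE)

toy_core <- coreness(toy_g)   # named vector: one coreness score per author
toy_core
```

Authors 1, 2 and 3 each have at least 2 ties inside the triangle, so they belong to the 2-core and get a coreness of 2. Author 4 has a single tie and gets a coreness of 1.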

First we will create a graph object from the whole edgelist.

e <- e[, c(2,3,4)]
g <- graph.data.frame(e,vertices = n ,directed = FALSE)

Now we will get coreness scores for each author. We have 113 authors.

coreness(g)
##            Aksoy, S. Alfaro-Murillo, J.A.         Altice, F.L. 
##                   12                    7                   13 
##        Annamalai, A.           Arnold, L.           Bell, M.L. 
##                    2                    2                    6 
##         Brault, M.A.           Bucala, R.        Canarie, M.F. 
##                    5                   10                    1 
##        Canavan, M.E.         Cappello, M.        Chawarski, M. 
##                    7                    7                   12 
##             Chen, K.             Chen, L.             Chen, X. 
##                    4                    4                   12 
##         Childs, J.E.         Cleary, P.D.            Cohen, T. 
##                   12                    7                    8 
##       Cudahy, P.G.T.          Curry, L.A.          Davis, J.L. 
##                    3                    7                    2 
##          Desai, M.M.           Dubrow, R.           Fikrig, E. 
##                   11                    3                   10 
##             Fish, D.      Forsyth, B.W.C.      Friedland, G.H. 
##                   10                    2                   13 
##        Galvani, A.P.    Godri-Pollitt, K.      Gonsalves, G.S. 
##                   12                    2                    7 
##       Gonzalez, A.L.             Grey, M.       Grubaugh, N.D. 
##                    3                   12                    2 
##         Hawley, N.L.           Heimer, R.        Holford, T.R. 
##                    5                   13                   17 
##            Hsieh, E.        Humphries, D.        Iennaco, J.D. 
##                    2                    7                    2 
##        Iheanacho, T.         Inhorn, M.C.           Jordan, A. 
##                    4                    1                    3 
##        Kennedy, H.P.          Kershaw, T.        Khoshnood, K. 
##                   10                   13                    7 
##             Ko, A.I.            Kurth, A.       Leaderer, B.P. 
##                   12                    3                   17 
##        Leckman, J.F.           Levy, B.R.         Lipska, K.J. 
##                    6                    6                    4 
##               Ma, S.        Magriples, U.           Makuch, R. 
##                   16                   13                    2 
##         Mamoun, C.B.    McMahon-Pratt, D.       McNamara, R.L. 
##                    1                    2                    1 
##         Miller, A.M.           Mowafi, H.      Münstermann, L. 
##                    1                    2                    1 
##              Nam, S.       Nelson, L.R.E.         Ngaruiya, C. 
##                   11                    1                    2 
##       Niccolai, L.M.      Nunez-Smith, M.          Ogbuagu, O. 
##                   13                    7                    5 
##           Omer, S.B.      Pachankis, J.E.         Paintsil, E. 
##                    3                    7                    8 
##        Paltiel, A.D.     Panter-Brick, C.           Parikh, S. 
##                    8                    3                    3 
##  Perez-Escamilla, R.      Pettigrew, M.M.         Pitzer, V.E. 
##                    7                    5                    8 
##        Ponguta, L.A.         Portillo, C.          Rabin, T.L. 
##                    3                    3                    6 
##         Rastegar, A.          Risch, H.A.      Rohrbaugh, R.M. 
##                    5                    3                    4 
##      Rosenheck, R.A.      Ryan-Krause, P.         Sadler, L.S. 
##                    4                    1                   10 
##       Saltzman, W.M.      Schlesinger, M.   Schottenfeld, R.S. 
##                    2                    1                   12 
##   Schulman-Green, D.       Schwartz, J.I.           Shenoi, S. 
##                    6                    6                   13 
##         Shepherd, J.          Sheth, S.S.       Sindelar, J.L. 
##                    2                    1                    1 
##       Skonieczny, M.       Spiegelman, D.          Spudich, S. 
##                    2                    3                    1 
##   Talbert-Slagle, K.          Tschudi, C.         Vasiliou, V. 
##                    7                    1                    6 
##          Vázquez, M.        Vermund, S.H.         Vinetz, J.M. 
##                    2                   10                    8 
##           Vlahov, D.     Weinberger, D.M.          Weiss, B.L. 
##                    6                    8                   12 
##          White, M.A.       Whittemore, R.     Wickersham, J.A. 
##                    2                   12                   11 
##         Wunder, E.A.         Yaesoubi, R.            Zhang, H. 
##                   12                    8                   12 
##            Zhang, Y.             Zhao, H. 
##                   17                   12

Let’s get a summary of coreness scores.

summary(coreness(g))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   6.000   6.434  10.000  17.000

The lowest core is 1 and the highest is 17.

Let’s visualize the network to see which departments belong to the same entity. We will color the nodes by department and label them with their K core number. Department colors: Public Health = yellow (gold). Psychiatry = violet. Pediatrics = red. Internal Medicine (IM) = blue. Nursing = grey. Anthropology = brown. Obs&gyn = green. Emergency Medicine (EM) = khaki.

kcore <- coreness(g)    # Extract k-cores as a data object.
V(g)$core <- kcore      # Add the cores as a vertex attribute
V(g)$label <- V(g)$Department
plot.igraph(g, vertex.color=V(g)$color, vertex.label = V(g)$core,vertex.label.cex = 0.6,vertex.label.font = 2, vertex.size = 10)

As we can see, Public Health (yellow) is present in all entities (connected or core, and isolated or periphery). Emergency Medicine (khaki) is in the periphery with a K core of 2. Psychiatry (violet) is in the semiperiphery and core, with core scores of 2, 3, 4 and 12. IM (blue) is distributed across the periphery, semiperiphery and core, as is Nursing (grey). Pediatrics (red) is distributed across the periphery and semiperiphery. Obs&gyn (green) has one member in the periphery (core 1) and one in the core (core 13). Lastly, Anthropology (brown) is in the periphery with core scores of 1 and 3.

We can conclude that Public Health, IM and Nursing are connected to all entities, while EM and Anthropology are in the isolated communities. The other specialties fall in between.