DataCamp audition

Minoo Ashtiani, jafarilab.com, Pasteur Institute of Iran

January 23, 2018

Centrality Analysis

  • What is centrality analysis?
  • Which node is the most important and why?
  • Which centrality measure best fits the topology of your complex network?

Centrality measures

  • A centrality measure is a function that assigns a numerical value to each vertex of a network.
  • Formally, let \(G = (V, E)\) be a directed or undirected graph. A function \(C: V \rightarrow \mathbb{R}\) is called a centrality.
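As a concrete instance of the definition, the two measures computed later in this deck can be written as functions on \(V\). The formulas below are the common textbook forms for an undirected, connected graph, where \(d(v, u)\) denotes the shortest-path distance. Normalization conventions vary between references and software, so treat this as a sketch rather than the exact form any particular package uses.

```latex
\[
C_D(v) = \bigl|\{\, u \in V : \{u, v\} \in E \,\}\bigr|,
\qquad
C_C(v) = \frac{1}{\sum_{u \in V \setminus \{v\}} d(v, u)}
\]
```

Here \(C_D\) is degree centrality (the number of neighbours of \(v\)) and \(C_C\) is closeness centrality (the reciprocal of the total distance from \(v\) to every other vertex).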

Centrality measure types

Generate a simple random graph:

library(igraph)
# n = number of nodes, m = number of edges
erdos.gr <- sample_gnm(n = 100, m = 200)
erdos.gr
## IGRAPH f8c91a6 U--- 100 200 -- Erdos renyi (gnm) graph
## + attr: name (g/c), type (g/c), loops (g/l), m (g/n)
## + edges from f8c91a6:
##  [1]  1-- 7  3--11  2--12  8--12  8--15  7--16 14--20 17--21 14--22  6--23
## [11]  8--23  4--24  5--24  3--26 23--28  8--29 11--30 18--30  3--31 18--31
## [21] 16--32 11--33 15--34 13--35 12--36 11--37  4--38 17--38 20--39  4--40
## [31] 14--40 24--40 36--40  5--41 28--41  4--42 21--42 32--42  1--43 29--45
## [41] 38--45 17--47 19--47 14--48 28--48  4--49  9--49 19--49  3--50 10--50
## [51] 12--50 15--51 18--51 19--51  6--52 28--52  8--53 27--53 40--53 31--54
## [61] 35--55 47--55  4--56 16--56 46--56  8--57 34--57 21--58 31--58 45--58
## [71] 23--59 34--59 17--60 34--60  5--61 26--61 30--61 37--62 54--62 55--62
## + ... omitted several edges
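Because the graph is random, the edge list above will differ from run to run. A minimal sketch of how one might make the draw repeatable and sanity-check it follows; the seed value 42 and the variable name `g` are arbitrary choices for illustration, not part of the original slides.

```r
library(igraph)

# Fix the RNG seed (42 is an arbitrary choice) so the same graph is drawn each run.
set.seed(42)
g <- sample_gnm(n = 100, m = 200)

vcount(g)  # number of vertices requested via n
ecount(g)  # number of edges requested via m

# Every edge adds one to the degree of each of its two endpoints,
# so the mean degree is 2 * m / n = 2 * 200 / 100 = 4.
mean(degree(g))
```

The mean-degree check holds for any graph with these `n` and `m`, whichever edges happen to be sampled.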

Local centrality measures

  • Degree centrality
degree.cent <- degree(erdos.gr, mode = "all")
degree.cent
##   [1] 2 1 4 7 6 4 4 8 3 4 5 7 3 5 5 5 4 7 3 4 3 2 5 7 0 5 2 6 4 4 6 4 2 5 4
##  [36] 5 2 5 1 5 4 4 1 3 6 3 3 5 3 9 5 4 4 2 3 4 3 5 2 5 3 7 5 1 3 0 5 3 7 3
##  [71] 5 5 4 0 3 1 1 2 3 8 7 4 8 3 3 6 4 1 5 2 6 3 3 7 3 5 2 4 4 5
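To see what `degree()` counts, it can help to reproduce it by hand on a graph small enough to check by eye. The sketch below uses a five-vertex star (an illustrative choice, not from the slides) and compares igraph's result with the row sums of the adjacency matrix.

```r
library(igraph)

# A star with one hub (vertex 1) and four leaves.
star <- make_star(5, mode = "undirected")

# Degree as reported by igraph.
deg_igraph <- degree(star, mode = "all")

# Degree by hand: each row of the adjacency matrix marks a vertex's neighbours.
adj <- as_adjacency_matrix(star, sparse = FALSE)
deg_manual <- rowSums(adj)

deg_igraph  # hub has degree 4, each leaf has degree 1
```

Degree is called a local measure because it depends only on a vertex's immediate neighbours, which is why a single row of the adjacency matrix is enough to compute it.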

Global centrality measures

  • Closeness centrality
closeness.cent <- closeness(erdos.gr, mode = "all")
closeness.cent
##   [1] 0.0013888889 0.0014925373 0.0016420361 0.0017123288 0.0016638935
##   [6] 0.0015503876 0.0015948963 0.0017543860 0.0015479876 0.0016339869
##  [11] 0.0016103060 0.0017391304 0.0015698587 0.0016694491 0.0016806723
##  [16] 0.0016835017 0.0016000000 0.0017361111 0.0015337423 0.0015576324
##  [21] 0.0015455951 0.0015197568 0.0017064846 0.0017211704 0.0001010101
##  [26] 0.0016891892 0.0014903130 0.0017094017 0.0016778523 0.0016501650
##  [31] 0.0017211704 0.0016286645 0.0014771049 0.0016366612 0.0015649452
##  [36] 0.0017123288 0.0014771049 0.0016694491 0.0013568521 0.0016977929
##  [41] 0.0016233766 0.0016077170 0.0012269939 0.0015455951 0.0017152659
##  [46] 0.0015772871 0.0014970060 0.0017543860 0.0015337423 0.0017543860
##  [51] 0.0016977929 0.0015898251 0.0016286645 0.0015432099 0.0015360983
##  [56] 0.0016420361 0.0015576324 0.0016611296 0.0015455951 0.0016977929
##  [61] 0.0015873016 0.0016863406 0.0016891892 0.0014705882 0.0016077170
##  [66] 0.0001010101 0.0016583748 0.0015576324 0.0017271157 0.0015923567
##  [71] 0.0016447368 0.0016260163 0.0016051364 0.0001010101 0.0016077170
##  [76] 0.0013568521 0.0015037594 0.0015243902 0.0015360983 0.0016835017
##  [81] 0.0017094017 0.0015873016 0.0017699115 0.0016233766 0.0016286645
##  [86] 0.0017605634 0.0016447368 0.0013513514 0.0016366612 0.0015600624
##  [91] 0.0016750419 0.0016447368 0.0015847861 0.0017667845 0.0015600624
##  [96] 0.0016129032 0.0015600624 0.0016420361 0.0016339869 0.0016666667
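In the output above, entries 25, 66 and 74 are far smaller than the rest. These are the three vertices with degree 0 in the degree output, so they cannot reach any other vertex. How unreachable vertices are scored may depend on the igraph version, so one common workaround is to compute closeness on the largest connected component only. The sketch below first checks closeness by hand on a small path graph and then extracts the largest component. The toy graphs and variable names are illustrative assumptions.

```r
library(igraph)

# A path 1 - 2 - 3 - 4 - 5 (a ring with the closing edge left out).
path <- make_ring(5, circular = FALSE)

# Closeness by hand: reciprocal of the summed shortest-path distances.
clo_manual <- 1 / rowSums(distances(path))
clo_igraph <- closeness(path, mode = "all")

which.max(clo_igraph)  # the middle vertex (3) is closest to all others

# Add two isolated vertices, then keep only the largest connected component.
g <- add_vertices(path, 2)
comp <- components(g)
giant <- induced_subgraph(g, which(comp$membership == which.max(comp$csize)))
vcount(giant)  # the five path vertices remain
```

Closeness is a global measure: unlike degree, it needs the shortest-path distances from a vertex to every other vertex in the graph.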

Which centrality measure is appropriate?

  • CINNA (Central Informative Nodes in Network Analysis) is an R package for computing, analyzing, and comparing centrality measures.

How does it work?

library(CINNA)
data("zachary")
zachary
## IGRAPH 455c916 U--- 34 78 -- 
## + attr: id (v/n)
## + edges from 455c916:
##  [1]  1-- 2  1-- 3  2-- 3  1-- 4  2-- 4  3-- 4  1-- 5  1-- 6  1-- 7  5-- 7
## [11]  6-- 7  1-- 8  2-- 8  3-- 8  4-- 8  1-- 9  3-- 9  3--10  1--11  5--11
## [21]  6--11  1--12  1--13  4--13  1--14  2--14  3--14  4--14  6--17  7--17
## [31]  1--18  2--18  1--20  2--20  1--22  2--22 24--26 25--26  3--28 24--28
## [41] 25--28  3--29 24--30 27--30  2--31  9--31  1--32 25--32 26--32 29--32
## [51]  3--33  9--33 15--33 16--33 19--33 21--33 23--33 24--33 30--33 31--33
## [61] 32--33  9--34 10--34 14--34 15--34 16--34 19--34 20--34 21--34 23--34
## [71] 24--34 27--34 28--34 29--34 30--34 31--34 32--34 33--34

Proper centrality measures

pr_cent <- proper_centralities(zachary)
##  [1] "subgraph centrality scores"                      
##  [2] "Topological Coefficient"                         
##  [3] "Average Distance"                                
##  [4] "Barycenter Centrality"                           
##  [5] "BottleNeck Centrality"                           
##  [6] "Centroid value"                                  
##  [7] "Closeness Centrality (Freeman)"                  
##  [8] "ClusterRank"                                     
##  [9] "Decay Centrality"                                
## [10] "Degree Centrality"                               
## [11] "Diffusion Degree"                                
## [12] "DMNC - Density of Maximum Neighborhood Component"
## [13] "Eccentricity Centrality"                         
## [14] "eigenvector centralities"                        
## [15] "K-core Decomposition"                            
## [16] "Geodesic K-Path Centrality"                      
## [17] "Katz Centrality (Katz Status Index)"             
## [18] "Kleinberg's authority centrality scores"         
## [19] "Kleinberg's hub centrality scores"               
## [20] "clustering coefficient"                          
## [21] "Lin Centrality"                                  
## [22] "Lobby Index (Centrality)"                        
## [23] "Markov Centrality"                               
## [24] "Radiality Centrality"                            
## [25] "Shortest-Paths Betweenness Centrality"           
## [26] "Current-Flow Closeness Centrality"               
## [27] "Closeness centrality (Latora)"                   
## [28] "Communicability Betweenness Centrality"          
## [29] "Community Centrality"                            
## [30] "Cross-Clique Connectivity"                       
## [31] "Entropy Centrality"                              
## [32] "EPC - Edge Percolated Component"                 
## [33] "Laplacian Centrality"                            
## [34] "Leverage Centrality"                             
## [35] "MNC - Maximum Neighborhood Component"            
## [36] "Hubbell Index"                                   
## [37] "Semi Local Centrality"                           
## [38] "Closeness Vitality"                              
## [39] "Residual Closeness Centrality"                   
## [40] "Stress Centrality"                               
## [41] "Load Centrality"                                 
## [42] "Flow Betweenness Centrality"                     
## [43] "Information Centrality"

Calculating centralities and identifying the most informative measures

library(magrittr)  # provides %>%, in case it is not already attached
calculate_centralities(zachary, include = pr_cent[1:5]) %>%
  pca_centralities(scale.unit = TRUE)
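The idea behind ranking centrality measures with PCA can be illustrated without CINNA. The sketch below is not CINNA's implementation: it swaps in base R's `prcomp()` and four igraph measures chosen for illustration, and it assumes that the Zachary network bundled with igraph (`make_graph("Zachary")`) matches the one loaded above. Measures that contribute most to the first principal component account for most of the variation across vertices.

```r
library(igraph)

# Zachary's karate club as shipped with igraph
# (assumed here to match the CINNA data set).
karate <- make_graph("Zachary")

# One row per vertex, one column per centrality measure.
cent <- cbind(
  degree      = degree(karate),
  closeness   = closeness(karate),
  betweenness = betweenness(karate),
  eigenvector = eigen_centrality(karate)$vector
)

# PCA on standardized columns, loosely mirroring scale.unit = TRUE.
pca <- prcomp(cent, center = TRUE, scale. = TRUE)

# Percentage contribution of each measure to the first principal component:
# the squared loadings of a component sum to 1.
contrib_pc1 <- 100 * pca$rotation[, "PC1"]^2
sort(contrib_pc1, decreasing = TRUE)
```

Standardizing the columns matters because the measures live on very different scales; without it, betweenness would dominate simply because its values are numerically larger.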

…