This week we are exploring properties of the entire network. I am continuing the analysis of Wikipedia page-link frequency (citations) compared to the collaboration distance and Erdős number for a small sample of authors. A few more authors have been added to the list, but this is still a very small-world network. It is also probably skewed, because not all the authors had entries in the database I used to collect collaboration distance and Erdős number. I am still looking for an API or REST service to retrieve this data, so if you know of any R or Python library or REST service, let me know.
A sample of sixteen network scientists was chosen at random from a list in Wikipedia[1]. An independent sample was collected for each author, using the name as the seed value and a maximum depth of three for each iteration. The total number of links is the citation frequency of the author.
Erdős number[2][3][4].
The resulting dataset contains an originating node (the author’s name), the Erdős number (associated with the originating node) and the total citations (page-links) for the author.
Collaboration distance[5].
Two types of reciprocity are calculated. First the correlation:
## [1] 0.1121911
The correlation measure may be interpreted as the net tendency for edges of similar relative value (with respect to the mean edge value) to occur within the same dyads. If all edge values are identical, then the correlation reciprocity should be 1 by definition. The reciprocity correlation is very low, which I suspect could be due to the wide range of the edge values:
Maximum edge value
## [1] 480
Minimum edge value
## [1] 0
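To make the correlation measure concrete, here is a minimal base R sketch on a made-up 3×3 valued matrix. It follows the description above: each edge value is paired with its reverse, and both are centered on the mean over all off-diagonal values. The function name `correlation_reciprocity`, the toy matrix `y`, and the exact normalization are assumptions for illustration; `sna::grecip` may differ in detail.

```r
# Toy valued, directed adjacency matrix (hypothetical data, not the post's matrices).
y <- matrix(c(0, 5, 1,
              4, 0, 9,
              2, 8, 0), nrow = 3, byrow = TRUE)

# Hypothetical helper: correlation between y[i, j] and y[j, i] across dyads,
# using one mean and one sum of squares pooled over all off-diagonal values.
correlation_reciprocity <- function(m) {
  off_diag <- row(m) != col(m)
  mu <- mean(m[off_diag])            # mean over all off-diagonal edge values
  ss <- sum((m[off_diag] - mu)^2)    # pooled sum of squared deviations
  if (ss == 0) return(1)             # identical edge values: 1 by convention
  upper <- upper.tri(m)
  # m[upper] holds y[i, j] for i < j; t(m)[upper] holds the matching y[j, i]
  2 * sum((m[upper] - mu) * (t(m)[upper] - mu)) / ss
}

r <- correlation_reciprocity(y)
print(r)
```

In this toy matrix each edge value is close to its reverse, so the result is near 1. A matrix where large values sit opposite small ones, as the 0 to 480 range here allows, would push the value toward 0 or below.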
Testing the dyadic reciprocity:
# Reciprocity
grecip(citation_erdos_mtrx, citation_coll_mtrx, measure ="dyadic")
## Mut
## 0.7416667
The dyadic reciprocity of the graph is the proportion of dyads that are symmetric. Since this is the basis of the graph, it should be at least 50%. The observed value is much higher, most likely because the samples are drawn from a small pool within a closely related set of data.
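As a rough illustration of "proportion of dyads which are symmetric", the sketch below counts symmetric dyads by hand on a made-up binary matrix. It assumes that a dyad counts as symmetric when both directions agree (both present or both absent); the helper `dyadic_reciprocity` and the matrix `adj` are hypothetical, and how `sna::grecip` treats valued edges is not covered here.

```r
# Toy directed, binary adjacency matrix (hypothetical data).
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 0, 0,
                0, 0, 0, 1,
                0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Hypothetical helper: share of dyads {i, j} whose two directions agree.
dyadic_reciprocity <- function(m) {
  upper <- upper.tri(m)
  # m[upper] is the i -> j tie, t(m)[upper] is the j -> i tie for the same dyad
  mean(m[upper] == t(m)[upper])
}

recip <- dyadic_reciprocity(adj)
print(recip)
```

Of the six dyads in this toy graph, only the pair (1, 3) is asymmetric, so five of six are symmetric. Null dyads count as symmetric under this assumption, which is one reason a sparse or tightly clustered sample can score high.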
Transitivity
## [1] 0.8395311
Centrality
The key player was identified using centrality measures, both eigenvector and degree centrality. Both yielded the same key player, who, surprisingly, is Olaf Sporns. Visually, node #71 looks much more important than the others. Olaf had a lower citation count and collaborated less than Stephen P. Borgatti or Albert-László Barabási in this sample. There are some other, as yet undiscovered, factors at play.
# Eigenvector centrality
sna::evcent(citation_erdos_mtrx, citation_coll_mtrx, auth_cit_erdos$erdos, g=1)
# Compute the scores once, then find the index of the maximum
ev <- sna::evcent(citation_erdos_mtrx, citation_coll_mtrx, auth_cit_erdos$erdos)
indx <- which(ev == max(ev))
auth_cit_erdos$author[[indx]]
## [1] "Olaf_Sporns"
# Degree centrality
# Compute the scores once, then find the index of the maximum
deg <- sna::degree(citation_erdos_mtrx, citation_coll_mtrx, auth_cit_erdos$erdos)
deg_indx <- which(deg == max(deg))
auth_cit_erdos$author[[deg_indx]]
## [1] "Olaf_Sporns"
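For readers who want to see what the two centrality measures compute, here is a small base R sketch on a made-up five-node undirected graph. It is not the analysis above: the node names, edge list, and variable names are all hypothetical, and eigenvector centrality is taken here simply as the leading eigenvector of the adjacency matrix.

```r
# Toy undirected graph (hypothetical nodes and edges).
nodes <- c("A", "B", "C", "D", "E")
adj <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
edges <- rbind(c("A", "B"), c("A", "C"), c("A", "D"),
               c("B", "C"), c("D", "E"))
for (k in seq_len(nrow(edges))) {
  adj[edges[k, 1], edges[k, 2]] <- 1
  adj[edges[k, 2], edges[k, 1]] <- 1   # undirected: mirror each edge
}

# Degree centrality: number of neighbours of each node
degree_centrality <- rowSums(adj)

# Eigenvector centrality: leading eigenvector of the adjacency matrix.
# eigen() sorts eigenvalues in decreasing order, so column 1 is the leading one;
# abs() removes the arbitrary sign of the eigenvector.
eig <- eigen(adj, symmetric = TRUE)
ev_centrality <- abs(eig$vectors[, 1])
names(ev_centrality) <- nodes

key_by_degree <- names(which.max(degree_centrality))
key_by_eigen  <- names(which.max(ev_centrality))
print(c(degree = key_by_degree, eigenvector = key_by_eigen))
```

In this toy graph both measures pick the same node, but they need not agree in general: degree only counts neighbours, while eigenvector centrality also rewards being connected to well-connected nodes.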
# auth_cit_erdos
# actor_collab[]
g <- igraph::graph.data.frame(d = c(auth_cit_erdos, cit_collab, cit_erdos), directed = FALSE, vertices = NULL)
plot(g)
Removing Olaf:
For this, I chose reciprocity as the invariant. It seems that it should be highly dependent upon Olaf. However, since this is not a highly disconnected graph, removing a key player doesn’t have a great impact: the reciprocity values below are identical to those reported for the full graph.
# Reciprocity
grecip(citation_erdos_mtrx, citation_coll_mtrx, measure ="correlation")
## [1] 0.1121911
# Reciprocity
grecip(citation_erdos_mtrx, citation_coll_mtrx, measure ="dyadic")
## Mut
## 0.7416667
g <- igraph::graph.data.frame(d = c(new_auth_cit_erdos, cit_collab[c(2:13, 15, 16), c(2:13, 15, 16)], cit_erdos[c(2:13, 15, 16), c(2:13, 15, 16)]), directed = FALSE, vertices = NULL)
plot(g)
ANOVA, in lieu of t-test
TOOLS>STATISTICS>ANOVA
--------------------------------------------------------------------------------
Dependent variable: "C:\Users\dev1\MyData\data\Lab08\collab.##h" Col 1
Independent variable: "C:\Users\dev1\MyData\data\Lab08\erd.##h" Col 1
# of permutations: 5000
Random seed: 17568
ANALYSIS OF VARIANCE
Source          DF              SSQ             F-Statistic     Significance
==============  ==============  ==============  ==============  ==============
Treatment       4               195.00          0.5216          0.7229
Error           11              1028.00
Total           15              1223.00
R-Square/Eta-Square: 0.159
----------------------------------------
Running time: 00:00:01
Output generated: 21 Oct 15 22:58:43
UCINET 6.587 Copyright (c) 1992-2015 Analytic Technologies
Reciprocity doesn’t appear to be a strong indicator for this analysis, because almost all actors (authors) are in dyadic relationships in a small-world network. Additionally, the network is mainly composed of authors who are, for the most part, working in a related field. As such, this is a co-citation matrix with a proportionality constant based on the collaboration number and the Erdős number. Therefore, a high reciprocity should be expected for the dyadic similarity. The low result for the correlation reciprocity is most likely attributable to the wide range of edge values shown in the maximum and minimum above.
There are a couple of triads, but the transitivity is very sensitive to NA values. If these are altered, the transitivity becomes valid; otherwise it’s NA. A larger sample size would probably exhibit more resiliency.
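The NA sensitivity can be reproduced in a few lines of base R. The sketch below computes a simple "weak" transitivity (of all two-step paths i to j to k, the share closed by a direct i to k tie) on a made-up matrix with one missing entry. The helper `transitivity_weak`, its `na_as_zero` argument, and the toy matrix are hypothetical; this is a naive calculation to show how one NA propagates, not a description of how `sna` handles missing data.

```r
# Toy directed adjacency matrix with one missing entry (hypothetical data).
adj_na <- matrix(c(0,  1, 1,
                   0,  0, 1,
                   NA, 0, 0), nrow = 3, byrow = TRUE)

# Hypothetical helper: share of two-paths i -> j -> k that are closed by i -> k.
transitivity_weak <- function(m, na_as_zero = FALSE) {
  if (na_as_zero) m[is.na(m)] <- 0   # one possible way to "alter" the NAs
  diag(m) <- 0                       # ignore self-ties
  m <- (m > 0) * 1                   # dichotomise edge values
  paths2 <- m %*% m                  # paths2[i, k] = number of j with i -> j -> k
  diag(paths2) <- 0                  # exclude paths that return to the start
  sum(paths2 * m) / sum(paths2)
}

with_na    <- transitivity_weak(adj_na)                     # NA propagates
without_na <- transitivity_weak(adj_na, na_as_zero = TRUE)  # valid number
print(c(with_na = with_na, without_na = without_na))
```

Treating a missing tie as absent is itself an assumption about the data, so the "valid" number should be read with that in mind.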
When trying the t-test, I kept getting an error, and it was suggested that I use the ANOVA instead. I’ve included it here to fulfill that requirement. With a significance of 0.7229, it does not indicate a significant difference between groups, so the observed differences are consistent with chance.
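The UCINET output above reports a permutation-based significance (5000 permutations). The general idea can be sketched in base R as follows, on made-up data: compute the one-way ANOVA F statistic, then recompute it many times after shuffling the dependent variable, and report how often the shuffled F is at least as large as the observed one. The data, group labels, and function name `f_stat` are hypothetical, and UCINET's exact procedure may differ.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable

# Hypothetical data: 12 observations in 3 groups.
group <- factor(rep(c("g1", "g2", "g3"), each = 4))
value <- c(3, 5, 4, 6,  8, 7, 9, 6,  2, 4, 3, 5)

# One-way ANOVA F statistic computed by hand.
f_stat <- function(y, g) {
  grand      <- mean(y)
  grp_means  <- tapply(y, g, mean)
  grp_sizes  <- tapply(y, g, length)
  ss_between <- sum(grp_sizes * (grp_means - grand)^2)
  ss_within  <- sum((y - ave(y, g))^2)   # ave() gives each row its group mean
  df_between <- nlevels(g) - 1
  df_within  <- length(y) - nlevels(g)
  (ss_between / df_between) / (ss_within / df_within)
}

observed <- f_stat(value, group)

# Null distribution: shuffle the values across groups and recompute F.
perm_f  <- replicate(5000, f_stat(sample(value), group))
p_value <- mean(perm_f >= observed)

print(c(F = observed, p = p_value))
```

A large permutation p-value, like the 0.7229 in the output above, means that shuffled data produce an F statistic at least as large most of the time.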
Most of the measurements, although interesting, are not very pertinent to the citation/collaboration network analysis, with the exception of the key player. That was an interesting result, since I wouldn’t have expected the outcome based on a visual inspection, though that may be due to the unlabeled nodes. I suspect that it has more to do with the small size of the network.
Removing a key player in a highly disconnected graph doesn’t have as much of an impact as it would in a more highly connected graph.
An Erdős number describes a person’s degree of separation from Erdős himself, based on their collaboration with him.
Erdős alone was assigned the Erdős number of 0 (for being himself), while his immediate collaborators could claim an Erdős number of 1, their collaborators have Erdős number at most 2, and so on.
Retrieved from http://www.ams.org/mathscinet/collaborationDistance.html
Retrieved from http://www.ams.org/mathscinet/collaborationDistance.html