This analysis was done just out of curiosity and is not intended to be taken seriously.
This document presents an analysis of publicly available data on coauthor relationships among faculty members at the University of Maryland Center for Environmental Science (UMCES). The UMCES departments are the Appalachian Laboratory (AL), Chesapeake Biological Laboratory (CBL), Horn Point Laboratory (HPL), and Institute of Marine and Environmental Technology (IMET). Full code for these results is available on GitHub, but the data folders with the faculty lists are kept offline.
Below is an interactive graph of UMCES faculty coauthorship, with details further below.
The UMCES faculty list is taken as of 2020-02-26. If you are one of the faculty and wish your name to be removed from the analysis, just let me know.
The coauthorship network is based on data extracted from Google Scholar on 2020-10-12. Nodes of the network are UMCES faculty members; an edge between faculty members A and B exists if A is listed (at least once) among authors of publications in B’s profile or vice versa.
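The edge rule above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the `profiles` structure (a faculty name mapped to a list of author lists) and the exact-match comparison of names are assumptions, and real Google Scholar author strings would need name normalization first.

```python
def build_coauthor_edges(profiles):
    """Return the set of undirected edges between faculty members.

    `profiles` (hypothetical structure) maps a faculty name to a list of
    publications; each publication is a list of author names.
    """
    faculty = set(profiles)
    edges = set()
    for owner, publications in profiles.items():
        for authors in publications:
            for name in authors:
                # Edge A-B exists if B appears at least once among the
                # authors of a publication in A's profile (or vice versa).
                if name in faculty and name != owner:
                    edges.add(frozenset((owner, name)))
    return edges


# Toy data: X, Y, Z are not faculty, so they never become nodes.
profiles = {
    "A": [["A", "B", "X"], ["A", "Y"]],
    "B": [["B", "Z"]],
    "C": [["C", "A"]],
    "D": [],  # a faculty member with no retrieved publications
}
edges = build_coauthor_edges(profiles)
```

Storing each edge as a `frozenset` makes the relation symmetric, so the link found in C's profile connects A and C even though A's own profile does not list C.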
Why Google Scholar (advantages):
Possible problems (disadvantages):
The plots below show “within-UMCES-collaborativeness” by counting how many coauthors from UMCES each faculty member has (that is, node degree in the coauthorship network). Red points represent faculty members without a Google Scholar account (see data description above).
Each faculty member has had a different chance to establish collaborations within UMCES. For example, junior faculty members are likely to have fewer collaborations, as the next plot shows.
Node degree is one of many measures of node centrality (roughly, importance in a network context). Another common measure is betweenness centrality, based on the number of shortest paths in a network that pass through a specific node (in other words, how often that node appears in a brokerage position between other nodes).
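To make the difference between the two measures concrete, here is a minimal sketch that computes degree and (unnormalized) betweenness for a small undirected, unweighted graph using Brandes' algorithm. The toy graph and the function names are made up for illustration; the analysis itself may have used a library routine with different normalization.

```python
from collections import deque


def degree(adj):
    """Number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness (Brandes' algorithm, unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Two triangles (A-B-C and E-F-G) linked through a single node D.
edge_list = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
adj = {}
for u, v in edge_list:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

deg = degree(adj)
btw = betweenness(adj)
```

In this toy graph D has only two neighbours, yet every shortest path between the two triangles passes through it, so it has the highest betweenness. A node can therefore rank low by degree and high by betweenness.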
The same comparison is repeated below with grouping by faculty rank.
Below is the number of publications retrieved from Google Scholar for each faculty member.
Here we investigate whether network clusters match the formal affiliation of faculty to different departments. The network clusters represent communities (colored rectangles on the clustering dendrogram below) densely connected by the coauthorship links.
Of several readily available algorithms, the fast greedy algorithm was used; it identified 9 communities.
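For readers unfamiliar with this family of methods, the sketch below shows the idea of greedy modularity optimization in plain Python: start with every node in its own community and repeatedly merge the pair of communities that increases modularity the most. It is a deliberately naive illustration under stated assumptions (unweighted graph, recomputing candidate merges on every pass, stopping at the first pass with no positive gain), not the optimized routine used for the results above, and the function name is hypothetical.

```python
def greedy_modularity(edge_list):
    """Naive greedy agglomeration by modularity gain (illustrative only)."""
    m = len(edge_list)
    comm = {}  # node -> community label
    deg = {}   # community label -> total degree of its members
    for u, v in edge_list:
        for node in (u, v):
            comm.setdefault(node, node)
            deg[node] = deg.get(node, 0) + 1
    while True:
        # Count edges running between each pair of distinct communities.
        between = {}
        for u, v in edge_list:
            a, b = comm[u], comm[v]
            if a != b:
                key = tuple(sorted((a, b)))
                between[key] = between.get(key, 0) + 1
        # Modularity gain of merging a and b:
        #   e_ab / m - deg_a * deg_b / (2 * m^2)
        best_gain, best_pair = 0.0, None
        for (a, b), e_ab in between.items():
            gain = e_ab / m - deg[a] * deg[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge improves modularity
        a, b = best_pair
        for node, label in comm.items():
            if label == b:
                comm[node] = a
        deg[a] += deg.pop(b)
    groups = {}
    for node, label in comm.items():
        groups.setdefault(label, set()).add(node)
    return list(groups.values())


# Two triangles joined by a single bridge edge C-D.
edge_list = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
             ("D", "E"), ("D", "F"), ("E", "F")]
communities = greedy_modularity(edge_list)
```

On this toy graph the procedure recovers the two triangles as separate communities, because merging them across the single bridge edge would lower modularity.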
Knowing the actual affiliations of the faculty, the clustering can be checked using several evaluation criteria, one of which is purity (Chapter 16 in Manning, Raghavan, and Schutze 2008):
\[Purity(\Omega,C) = \frac{1}{N}\sum_{k}\max_{j}|\omega_k\cap c_j|,\]
where \(\Omega=\{\omega_1,\ldots,\omega_K \}\) is the set of identified clusters, \(C=\{c_1,\ldots,c_J\}\) is the set of classes, and \(N\) is the sample size. That is, each cluster \(\omega_k\), \(k=1,\ldots,K\), is assigned to the class that is most frequent in it; the sizes of these \(K\) largest overlaps are then summed and divided by \(N\).
When classes represent the laboratory affiliation (which was not used in clustering) and clusters are the communities obtained by tracking coauthorship links, \(Purity =\) 0.68.
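A small worked example may help in reading the purity value. The sketch below computes purity from two parallel label lists; the labels are invented for illustration and are not the UMCES data.

```python
from collections import Counter


def purity(clusters, classes):
    """Purity of a clustering given true class labels (one pair per item)."""
    if len(clusters) != len(classes) or not clusters:
        raise ValueError("need two non-empty label lists of equal length")
    # For each cluster, count how many of its members fall in each class.
    overlap = {}
    for k, j in zip(clusters, classes):
        overlap.setdefault(k, Counter())[j] += 1
    # Sum the largest class count of every cluster and divide by N.
    return sum(max(c.values()) for c in overlap.values()) / len(clusters)


# Hypothetical example: 8 people, 3 detected communities, 3 labs.
clusters = [1, 1, 1, 2, 2, 2, 3, 3]
classes = ["AL", "AL", "HPL", "CBL", "CBL", "HPL", "HPL", "HPL"]
value = purity(clusters, classes)
```

Here the majority class covers 2 of 3 members in clusters 1 and 2 and both members of cluster 3, so purity is (2 + 2 + 2) / 8 = 0.75. Note that purity tends to grow with the number of clusters and equals 1 when every item is its own cluster, so it is best read alongside the number of communities found.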
Below is a matrix showing the percentage distribution of within-UMCES collaborations (rows are home labs, columns are the collaborators' labs). It answers the question: among a faculty member's collaborators from UMCES, what proportion comes from the home lab and from the other labs?
| | AL | CBL | HPL | IMET | Total |
|---|---|---|---|---|---|
| AL | 72.3 | 2.3 | 25.4 | 0.0 | 100 |
| CBL | 0.5 | 81.2 | 14.4 | 3.9 | 100 |
| HPL | 7.0 | 19.9 | 68.8 | 4.2 | 100 |
| IMET | 0.0 | 14.2 | 11.0 | 74.8 | 100 |
Example inference from the table above: of all UMCES collaborators of AL authors, 72.3% are from AL, 2.3% are from CBL, 25.4% are from HPL, and 0% are from IMET.
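The row-wise normalization behind such a percentage matrix can be sketched as follows. The counts are hypothetical and the function name is invented; how exactly collaborators were counted in the analysis is not shown here, so treat this only as an illustration of the arithmetic.

```python
def row_percentages(counts):
    """Convert each row of a nested count table into percentages of its total."""
    shares = {}
    for home, row in counts.items():
        total = sum(row.values())
        shares[home] = {
            lab: (100 * n / total if total else 0.0) for lab, n in row.items()
        }
    return shares


# Hypothetical collaborator counts: counts[home_lab][collaborator_lab].
counts = {
    "AL": {"AL": 6, "CBL": 1, "HPL": 3},
    "CBL": {"AL": 1, "CBL": 8, "HPL": 1},
    "HPL": {"AL": 3, "CBL": 1, "HPL": 4},
}
shares = row_percentages(counts)
```

Because every row is divided by its own total, the percentage matrix is generally not symmetric even when the underlying counts are: in this toy example AL-HPL links make up 30% of AL's row but 37.5% of HPL's row.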
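The next matrix shows the average number of within-UMCES collaborators per 100 publications of the home lab (rows are home labs, columns are the collaborators' labs).

| | AL | CBL | HPL | IMET |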
|---|---|---|---|---|
| AL | 13.3 | 0.4 | 4.7 | 0.0 |
| CBL | 0.2 | 35.5 | 6.3 | 1.7 |
| HPL | 2.6 | 7.4 | 25.5 | 1.6 |
| IMET | 0.0 | 3.3 | 2.6 | 17.6 |
Example inference from the table above: for 100 publications from AL, there are on average 13.3 collaborators from AL, 0.4 collaborators from CBL, 4.7 collaborators from HPL, and 0 collaborators from IMET.
Please keep in mind the data limitations, such as using only about the first 6 authors of multi-author publications and the absence of Google Scholar accounts for some faculty members.
It’s been fun.
Manning, C. D., P. Raghavan, and H. Schutze. 2008. Introduction to Information Retrieval. New York: Cambridge University Press.