Trying to Create My Network

Documentation of my efforts to try and create an author network from my TaD articles.

Lissie Bates-Haus, Ph.D. https://github.com/lbateshaus (U Mass Amherst DACSS MS Student) https://www.umass.edu/sbs/data-analytics-and-computational-social-science-program/ms
2022-04-17

My process of collecting articles has been documented on my RPubs.

I now have 122 articles, stored as PDFs, from six journals related to experimental research in the field of Political Science. These articles are also documented in an EndNote library.

What I really do not want to do is code the whole thing by hand.

These are the decisions I’ve made so far about network structure.

Nodes will be authors. Two authors share an edge if they have co-authored an article together, and edges will be undirected. My reasoning is that for some articles the “First Author” position is meaningful; for some it’s “Last Author”; and for some, author order is a coin toss or alphabetical and not important.

Edge Attributes I am considering:

I am not planning to include author affiliation, because I am more interested in who is publishing together and where they are publishing than in where these authors are coming from.

First attempt:

I exported all of my citations as a txt file in the Author-Date format and imported it into Excel (over multiple trials) to figure out how to get it delimited in some way. It’s imperfect, to say the least.

Okay, my first realization is that this is the wrong format, because multiple authors are listed in different configurations (Last, F. vs. F. Last). So now I need to explore formats to see which one works best.
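To make the mixed-format problem concrete, here is a minimal R sketch that rewrites “F. Last” names into “Last, F.” form. The helper name `normalize_author` is hypothetical, and it assumes the surname is always the last word, so multi-word surnames such as “Parks Van Houweling, R.” written in “F. Last” order would still need to be fixed by hand.

```r
# Hypothetical helper: rewrite "F. Last" names as "Last, F." so every author
# is stored the same way. Names that already contain a comma are assumed to be
# in "Last, F." form and are returned unchanged.
# Assumes the surname is the final word, so multi-word surnames need hand fixes.
normalize_author <- function(name) {
  name <- trimws(name)
  if (grepl(",", name, fixed = TRUE)) {
    return(name)
  }
  parts <- strsplit(name, "\\s+")[[1]]
  surname <- parts[length(parts)]
  initials <- parts[-length(parts)]
  paste0(surname, ", ", paste(initials, collapse = " "))
}

raw_names <- c("Green, D. P.", "D. P. Green", "A. Coppock")
vapply(raw_names, normalize_author, character(1), USE.NAMES = FALSE)
# "Green, D. P." "Green, D. P." "Coppock, A."
```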

Right now, I’m going to work with APA. I’ve exported all my citations as a txt file and opened it in Excel using a space as the delimiter, figuring it might be easier to combine columns than to separate them.

Next up: Googling “how to merge columns in excel without losing data” pointed me to CONCAT.
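For comparison, the same column merge can be done in base R. This is a sketch with hypothetical placeholder columns (`V1` to `V4`) standing in for a title that a space-delimited import split apart; it is not how the spreadsheet was actually cleaned.

```r
# Hypothetical example: two titles that a space-delimited import split
# across columns, leaving empty cells where a title had fewer words
pieces <- data.frame(
  V1 = c("Title", "Another"),
  V2 = c("words", "title"),
  V3 = c("split", ""),
  V4 = c("apart", ""),
  stringsAsFactors = FALSE
)

# Base R analogue of Excel's CONCAT with a space separator: paste the columns
# row by row, then collapse repeated spaces and trim the ends
merged <- do.call(paste, c(pieces, sep = " "))
merged <- trimws(gsub("\\s+", " ", merged))
merged
# "Title words split apart" "Another title"
```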

I spent several hours working on cleaning up the data. I consolidated author names in LastName, F.I. M.I. format, fixed any missing years of publication, concatenated article titles and publication names.

I’m going to experiment with different formats. First, I’m going to try to create an adjacency matrix. I copied all author names into a new worksheet and pared the list down so that each name appears only once, sorted alphabetically by LastName. If I had questions, I went back to the original EndNote entries and, if needed, pulled up the original article. This left me with a total of 229 unique authors.
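The de-duplication step can be sketched in R as well, assuming the names have already been normalized to “LastName, F.” form (the input vector here is hypothetical). Manual checking against EndNote is still needed, because two spellings of the same person count as two unique names.

```r
# Hypothetical input: one author name per row, taken from every article,
# so anyone who co-authored several articles appears more than once
all_authors <- c("Green, D. P.", "Coppock, A.", "Green, D. P.", "Adida, C.",
                 "Coppock, A.")

# Keep one copy of each name and sort alphabetically by last name
# (names are in "LastName, F." form, so a plain string sort works)
unique_authors <- sort(unique(trimws(all_authors)))
unique_authors
length(unique_authors)
```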

Now I’m wondering if the identifier should be the article title? I think I need to talk to Meredith to figure out how to structure this!

4/17/2022

I decided to try to build a weighted adjacency matrix from authors/co-authors (a lower triangular matrix, since this is not a directed network). I enlisted my husband to help me with this very tedious project. About 10 minutes into it, he noted it would be very easy to build this matrix in Python and agreed to do it for me in a Python Colab notebook in my Ethics repo.

It wasn’t quite as easy as he thought it would be, but we did end up with an upper triangular matrix in which co-authors are connected by the number of times they have authored together (weighted edges). Pairs of authors who have never co-authored have a zero.
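The matrix was built in Python, but the same idea can be sketched in base R. This is not the notebook's code; it assumes a hypothetical list `articles` holding one vector of author names per article, and it counts every unordered pair of co-authors into the upper triangle.

```r
# Hypothetical input: one character vector of authors per article
articles <- list(
  c("Author, A.", "Author, B.", "Author, C."),
  c("Author, A.", "Author, B."),
  c("Author, D.")                      # single-author article: no ties
)

authors <- sort(unique(unlist(articles)))
coauthor <- matrix(0, nrow = length(authors), ncol = length(authors),
                   dimnames = list(authors, authors))

for (a in articles) {
  if (length(a) < 2) next              # nothing to pair
  pairs <- combn(sort(a), 2)           # every unordered pair of co-authors
  for (k in seq_len(ncol(pairs))) {
    i <- pairs[1, k]
    j <- pairs[2, k]
    # Names are sorted the same way as `authors`, so i precedes j and the
    # count always lands in the upper triangle
    coauthor[i, j] <- coauthor[i, j] + 1
  }
}
coauthor
```

An author who only ever published alone (like the hypothetical “Author, D.”) keeps a row of zeros and becomes an isolate once the matrix is turned into a graph.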

Read in CSV as a matrix:

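setwd("~/DACSS/697E Network Analysis/Final Project")
library(readr)

# Co-author matrix exported from the Python notebook: the first column holds
# row labels, the remaining 230 columns hold the co-authorship counts
authors_df <- read_csv("coauthorMatrix.csv")
coauthorMatrix <- as.matrix(authors_df)
ethics_authors <- read_csv("ethics_authors.csv")

# Drop the label column so the adjacency matrix is square (230 x 230)
authors_adj <- coauthorMatrix[ , -1]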

dim(coauthorMatrix)
[1] 230 231
dim(authors_adj)
[1] 230 230
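One thing worth checking here: if that first column holds author names (an assumption), `as.matrix()` on the whole data frame returns a character matrix, and dropping the column afterwards keeps it character. A sketch that keeps the counts numeric and moves the labels into the row names, using a hypothetical three-author stand-in for the CSV:

```r
# Hypothetical stand-in for coauthorMatrix.csv: first column holds names,
# the remaining columns hold upper-triangular co-authorship counts
authors_df <- data.frame(
  author = c("Author, A.", "Author, B.", "Author, C."),
  `Author, A.` = c(0, 0, 0),
  `Author, B.` = c(2, 0, 0),
  `Author, C.` = c(1, 1, 0),
  check.names = FALSE
)

typeof(as.matrix(authors_df))        # "character": the name column wins

# Convert only the count columns, then attach the names as row names
authors_adj <- as.matrix(authors_df[ , -1])
rownames(authors_adj) <- authors_df[[1]]
storage.mode(authors_adj) <- "numeric"   # guard in case a column was read as text
typeof(authors_adj)                  # "double"
```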

Create an igraph object from the matrix (this is a weighted, undirected, upper triangular matrix):

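library(igraph)
# statnet loads sna, which masks some igraph functions (e.g., components()),
# hence the igraph:: prefixes below
library(statnet)

# mode = "upper" reads only the upper triangle; weighted = TRUE stores the
# non-zero cell values as the "weight" edge attribute
authors.ig <- graph_from_adjacency_matrix(authors_adj, mode = "upper", weighted = TRUE)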
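To see what `mode = "upper"` and `weighted = TRUE` do, here is a toy three-author matrix (hypothetical authors A, B, C). Only non-zero cells become edges, so the zeros for never-co-authored pairs simply produce no tie.

```r
library(igraph)

# Toy upper triangular weighted matrix: A and B co-authored twice,
# B and C once, A and C never
m <- matrix(c(0, 2, 0,
              0, 0, 1,
              0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Each non-zero cell in the upper triangle becomes one undirected edge
g <- graph_from_adjacency_matrix(m, mode = "upper", weighted = TRUE)
ecount(g)        # 2
E(g)$weight      # the co-authorship counts, 2 and 1
```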

Get some basic descriptives and check to see if it read in as weighted and undirected:

vcount(authors.ig)
[1] 230
ecount(authors.ig)
[1] 274
is_weighted(authors.ig)
[1] TRUE
is_directed(authors.ig)
[1] FALSE
igraph::is_connected(authors.ig)
[1] FALSE
vertex_attr_names(authors.ig)
[1] "name"
edge_attr_names(authors.ig)
[1] "weight"

[Note: I’ve been trying to figure out how to create a network object for statnet as well, but it keeps failing.]
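For the statnet side, a minimal sketch using the `network` package's `as.network()` on a numeric matrix is below. It is a guess at a working route, not a diagnosis of the failure; one possible culprit is a character-valued `authors_adj`. The toy matrix is hypothetical, and mirroring it with `m + t(m)` is a precaution so the result does not depend on which triangle is read for an undirected network.

```r
library(network)

# Hypothetical numeric, upper triangular co-author matrix
m <- matrix(c(0, 2, 0,
              0, 0, 1,
              0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Mirror the upper triangle into the lower one
m_sym <- m + t(m)

# ignore.eval = FALSE keeps the cell values; names.eval stores them
# as an edge attribute called "weight"
authors.net <- as.network(m_sym, matrix.type = "adjacency", directed = FALSE,
                          ignore.eval = FALSE, names.eval = "weight")
network.size(authors.net)        # 3 vertices
network.edgecount(authors.net)   # 2 edges
authors.net %e% "weight"         # edge weights
```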

plot(authors.ig)

Initial plot is terrible! Let’s look at some network structure:

igraph::components(authors.ig)
$membership
              Adida, C.                Aida, M. 
                      1                       2 
         Alvarez, R. M.         Andersen, S. C. 
                      3                       4 
        Anderson, S. E.           Andersson, K. 
                      5                       6 
              Annan, J.           Arceneaux, K. 
                      7                       8 
              Arias, E.              Atwell, P. 
                      9                      10 
           Avdeenko, A.         Badrinathan, S. 
                     11                      12 
             Bailey, K.         Baldassarri, D. 
                     13                      14 
            Baldwin, K.               Balán, P. 
                     15                       9 
               Barr, A.             Basedau, M. 
                     16                      17 
              Beath, A.            Bhandari, A. 
                     18                       9 
         Biggers, D. R.              Binder, M. 
                     19                      13 
              Blair, G.            Blair, R. A. 
                     20                      21 
           Blattman, C.          Bloeser, A. J. 
                      7                      22 
            Boas, T. C.              Bolsen, T. 
                     23                      24 
           Bradburn, N.              Brader, T. 
                     25                      26 
           Brancati, D.            Brierley, S. 
                     27                       1 
       Broockman, D. E.         Buntaine, M. T. 
                     26                       5 
          Butler, D. M.            Chang, H. I. 
                     28                      29 
    Cheeseman, N. I. C.                Chen, J. 
                     30                      31 
            Choi, D. D.               Chong, A. 
                     32                      33 
           Christia, F.              Condon, M. 
                     18                      13 
            Coppock, A.           Corazzini, L. 
                     13                      34 
               Dahl, M.             Daniels, B. 
                     35                       5 
        De Vries, C. E.               Demel, S. 
                     28                      16 
            Diamond, L.          Dinesen, P. T. 
                     25                      35 
            Doherty, D.          Dowling, C. M. 
                     19                      19 
           Driscoll, J.            Dynes, A. M. 
                     23                      28 
        Einstein, K. L.               Ekman, J. 
                     36                       6 
             Endres, K.          Enikolopov, R. 
                     13                      18 
            Enos, R. D.           Faller, J. K. 
                     37                      10 
            Fang, A. H.              Farrer, B. 
                     19                      13 
          Fearon, J. D.          Ferraro, P. J. 
                     38                      24 
         Findley, M. G.             Fishkin, J. 
                     39                      25 
               Foos, F.              Fowler, A. 
                     40                      37 
             Freire, D.            Fujiwara, T. 
                     41                      42 
            Gaikwad, N.             Galasso, V. 
                     43                      44 
            Galdino, M.         Gell-Redman, M. 
                     41                      45 
          Gerber, A. S.               Giger, N. 
                     19                      46 
            Gilardi, F.         Gilligan, M. J. 
                     40                      11 
           Glick, D. M.              Gobien, S. 
                     36                      17 
              Goist, M.               Gooch, A. 
                     47                      19 
           Gottlieb, J.            Green, D. P. 
                      1                      13 
           Grose, C. R.            Grossman, G. 
                     48                      14 
              Hager, A.                Haim, D. 
                     49                      50 
                Han, H.       Hassell, H. J. G. 
                     51                      28 
          Henderson, G.           Hendry, D. J. 
                     51                      19 
             Hensel, L.              Hermle, J. 
                     49                      49 
             Hicken, A.          Hidalgo, F. D. 
                     50                      23 
            Hill, S. J.           Hjortskov, M. 
                     19                       4 
           Hoffmann, L.          Holbein, J. B. 
                     17                      52 
           Huber, G. A.           Hughes, D. A. 
                     19                      45 
          Humphreys, M.                Imai, K. 
                     38                      20 
           Jakobsen, M.           Jensen, N. M. 
                      4                      39 
               John, P.            Kalla, J. L. 
                     40                      26 
           Karim, S. M.        Karpowitz, C. F. 
                     21                      53 
            Kern, F. G.               Kogan, V. 
                     47                      13 
         Kostadinov, L.             Kousser, T. 
                     40                      13 
             Kramon, E.           Krasno, J. S. 
                      1                      13 
               Kube, S.   Kyhse-Andersen, J. H. 
                     34                      54 
               Lanz, S.          Larimer, C. W. 
                     46                      13 
           Larreguy, H.           Larson, J. M. 
                      9                      55 
             Leiras, M.          Levendusky, M. 
                      3                      48 
              Levin, I.            Lewis, J. I. 
                      3                      55 
      León-Ciliotta, G.                 Liu, M. 
                     33                       5 
             Lloren, A.           Loewen, P. J. 
                     56                      42 
             Lundin, S.               Lyall, J. 
                      6                      20 
    López-Moctezuma, G.        MacKenzie, M. K. 
                     42                      42 
              Magni, G.             Malesky, E. 
                     57                      58 
           Malhotra, N.             Mann, C. B. 
                     48                       8 
           Margalit, Y.             Marinov, N. 
                     48                      40 
           Marshall, J.         Maréchal, M. A. 
                      9                      34 
         Matland, R. E.        McAndrews, J. R. 
                     59                      42 
       McClendon, G. H.           McConnell, C. 
                      1                      48 
           McCurley, C.             Melo, M. A. 
                     22                      23 
           Meredith, M.          Michelitch, K. 
                     19                      14 
        Mignozzetti, U.            Miles, M. R. 
                     41                      28 
             Miller, L.          Miranda, J. J. 
                     16                      24 
          Mondak, J. J.           Monson, J. Q. 
                     22                      53 
           Morse, B. S.            Moynihan, D. 
                     21                      54 
            Mummolo, J.           Murray, G. R. 
                     60                      59 
          Nannicini, T.           Nathan, N. L. 
                     44                      10 
             Nellis, G.        Nickerson, D. W. 
                     43                       8 
             Nicolò, A.          Nielson, D. L. 
                     34                      39 
            Nunnari, S.               Nyhan, B. 
                     44                      61 
           Ofosu, G. K.            Olsen, A. L. 
                      1                      54 
                Pan, J.        Panagopoulos, C. 
                     31                      13 
Parks Van Houweling, R.             Pe Lero, C. 
                     48                      42 
            Peiffer, C.           Peisakhin, L. 
                     30                      29 
            Persson, M.            Peterson, E. 
                      6                      60 
           Poertner, M.             Pomares, J. 
                     32                       3 
      Ponce de Leon, Z.              Porter, E. 
                     57                      13 
           Prediger, S.           Preece, J. R. 
                     17                      53 
           Querubín, P.           Ravanilla, N. 
                      9                      50 
         Reardon, C. E.             Reifler, J. 
                     19                      61 
          Rhinehart, S.           Rivera, M. U. 
                     62                      45 
        Robinson, A. L.              Rogers, T. 
                     63                       2 
               Roth, C.                Roza, V. 
                     49                      33 
            Rozenas, A.            Rubenson, D. 
                     29                      42 
            Ryan, T. J.            Sambanis, N. 
                     26                      32 
           Sanovich, S.              Scacco, A. 
                     29                      64 
     Schimmelfennig, F.            SchiØLer, M. 
                     40                      35 
            Schuler, P.        Schwam-Baird, M. 
                     58                      13 
               Seim, B.          Sharman, J. C. 
                     63                      39 
              Shayo, M.            Singh, S. P. 
                     48                      65 
                Siu, A.               Solaz, H. 
                     25                      28 
        Stoddard, O. B.             Taussig, M. 
                     53                      58 
           Townsley, J.          Tran, A. N. H. 
                     66                      58 
              Ubeda, P.            Valdivia, M. 
                     16                      33 
               Vega, G.          Wantchekon, L. 
                     33                      42 
          Warren, S. S.        Weinstein, J. M. 
                     64                      38 
          Werfel, S. H.            White, A. R. 
                     67                      10 
            Wood, A. K.                  Xu, Y. 
                     48                      31 
           Young, L. E.             Zelizer, A. 
                     68                      13 
         Zetterberg, P.               Zhang, B. 
                      6                       5 
        de Rooij, E. A.            de Vries, C. 
                     40                      46 

$csize
 [1]  6  2  4  3  5  5  2  3  6  4  2  1 15  3  1  4  4  3 11  3  3  3
[23]  4  3  4  4  1  6  4  2  3  3  5  4  3  2  2  3  4  7  3  8  2  3
[45]  3  3  2  8  4  3  2  1  4  3  2  1  2  4  2  2  2  1  2  2  1  1
[67]  1  1

$no
[1] 68
igraph::components(authors.ig)$csize
 [1]  6  2  4  3  5  5  2  3  6  4  2  1 15  3  1  4  4  3 11  3  3  3
[23]  4  3  4  4  1  6  4  2  3  3  5  4  3  2  2  3  4  7  3  8  2  3
[45]  3  3  2  8  4  3  2  1  4  3  2  1  2  4  2  2  2  1  2  2  1  1
[67]  1  1
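The membership list is hard to read at this size. A sketch of a more compact summary on a hypothetical toy graph follows: a size table, an isolate count, and the largest component pulled out on its own. The `igraph::` prefixes matter once statnet is loaded, because sna also defines `components()` and `degree()`.

```r
library(igraph)

# Hypothetical toy graph: a triangle, a pair, and two isolates
g <- graph_from_literal(A-B, B-C, C-A, D-E, F, G)

comp <- igraph::components(g)
table(comp$csize)               # how many components of each size
sum(igraph::degree(g) == 0)     # isolates: authors with no co-authors in the set

# Pull out the largest component for a closer look
biggest <- which.max(comp$csize)
core <- induced_subgraph(g, which(comp$membership == biggest))
vcount(core)
```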

Before going further, I need to think a bit about how this network is structured. This is a weighted, undirected network with 230 nodes and 274 edges. It splits into 68 components, the largest containing 15 authors, including 10 isolates (authors with no co-authors in this set), and it should have no loops. Edges are weighted by the number of times two authors have co-published.
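The “should have no loops” assumption and the weights can be checked directly. A sketch on a hypothetical toy graph; on the real data the same calls would take `authors.ig`.

```r
library(igraph)

# Hypothetical toy graph: A-B co-authored twice, B-C once, D alone
g <- graph_from_literal(A-B, B-C, D)
E(g)$weight <- c(2, 1)

any(igraph::which_loop(g))       # FALSE: no author tied to themselves
any(igraph::which_multiple(g))   # FALSE: each pair appears only once
table(E(g)$weight)               # distribution of co-authorship counts
igraph::strength(g)              # weighted degree: total co-authorships per author
```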

authors.ig
IGRAPH f4ef0ff UNW- 230 274 -- 
+ attr: name (v/c), weight (e/n)
+ edges from f4ef0ff (vertex names):
 [1] Adida, C.      --Gottlieb, J.    
 [2] Adida, C.      --Kramon, E.      
 [3] Adida, C.      --McClendon, G. H.
 [4] Aida, M.       --Rogers, T.      
 [5] Alvarez, R. M. --Leiras, M.      
 [6] Alvarez, R. M. --Levin, I.       
 [7] Alvarez, R. M. --Pomares, J.     
 [8] Andersen, S. C.--Hjortskov, M.   
+ ... omitted several edges

Try some different plots to see if I can make it more palatable:

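# Compute a Fruchterman-Reingold layout once and reuse it
l <- layout_with_fr(authors.ig)
plot(authors.ig, layout = l)

# Try Kamada-Kawai instead (passed as a layout function)
plot(authors.ig, layout = layout_with_kk, frame.width = 3)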
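With 230 labeled nodes, most of the clutter comes from the labels. A sketch of a more readable plot, shown on a hypothetical toy graph: hide the labels, size nodes by weighted degree, draw edge width by co-authorship count, and fix the random seed so the layout is reproducible. The size and width scalings are arbitrary choices, not recommendations.

```r
library(igraph)

# Hypothetical toy graph with co-authorship counts as weights
g <- graph_from_literal(A-B, B-C, C-A, D-E, F)
E(g)$weight <- c(3, 1, 1, 2)

set.seed(697)                    # force-directed layouts are random
l <- layout_with_fr(g, weights = E(g)$weight)

plot(g,
     layout = l,
     vertex.size = 5 + 3 * igraph::strength(g),  # bigger = more co-authorships
     vertex.label = NA,                          # hide overlapping labels
     edge.width = E(g)$weight)                   # thicker = repeat co-authors
```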

Okay, I really don’t know how to do this, so I’m going to pause and talk to Meredith!