Grateful Research: Assignment 4

Status & Eigenvector Centrality

Kristina Becvar
March 25, 2022

I am continuing to use the Grateful Dead song writing data set that I used in assignment 3 to examine co-writing links and centrality.

The data set, which I compiled, consists of the links between co-writers of songs played by the Grateful Dead over their 30-year touring career.

There are 26 songwriters who contributed to the songs played over the course of the Grateful Dead’s history, resulting in 26 nodes in the dataset.

There are a total of 183 (updated and still under review!) unique songs played, and the various co-writing combinations are now represented in a binary affiliation matrix.

I have not weighted this version of the data; I am trying to build it from a binary affiliation matrix first, and hope to later add the number of times a given song was played live as a weight.

Loading the dataset and creating the network to begin this assignment:

library(igraph)  #packages assumed from the function calls used below
library(statnet)
library(dplyr)
gd_vertices <- read.csv("data/gd_nodes.csv", header=T, stringsAsFactors=F)
gd_affiliation <- read.csv("data/gd_affiliation_matrix.csv", row.names = 1, header = TRUE, check.names = FALSE)
gd_matrix <- as.matrix(gd_affiliation)

Inspecting the first 10 rows and 4 columns of the data structure in the affiliation matrix format:

dim(gd_matrix)
[1]  26 183
gd_matrix[1:10, 1:4]
               Alabama Getaway Alice D Millionaire Alligator Althea
Eric Andersen                0                   0         0      0
John Barlow                  0                   0         0      0
Bob Bralove                  0                   0         0      0
Andrew Charles               0                   0         0      0
John Dawson                  0                   0         0      0
Willie Dixon                 0                   0         0      0
Jerry Garcia                 1                   1         0      1
Donna Godchaux               0                   0         0      0
Keith Godchaux               0                   0         0      0
Gerrit Graham                0                   0         0      0

Now I can create the single mode network and examine the bipartite projection. After converting the matrix to a square adjacency matrix, I can look at the full matrix.

I can also look up the co-writing count for particular pairs of songwriters in the adjacency matrix, such as writing partners Jerry Garcia and Robert Hunter, and John Barlow and Bob Weir.

gd_projection <- gd_matrix%*%t(gd_matrix)
dim(gd_projection)
[1] 26 26
gd_projection[1:10, 1:4]
               Eric Andersen John Barlow Bob Bralove Andrew Charles
Eric Andersen              1           0           0              0
John Barlow                0          26           1              0
Bob Bralove                0           1           3              0
Andrew Charles             0           0           0              1
John Dawson                0           0           0              0
Willie Dixon               0           0           0              0
Jerry Garcia               0           0           0              0
Donna Godchaux             0           0           0              0
Keith Godchaux             0           0           0              0
Gerrit Graham              0           0           0              0
gd_projection["Jerry Garcia", "Robert Hunter"]
[1] 78
gd_projection["John Barlow", "Bob Weir"]
[1] 21
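To make the projection step concrete, here is a minimal sketch on a made-up affiliation matrix with three writers and four songs. The names `toy_aff` and `toy_proj` and the data are hypothetical and not part of my data set. Multiplying the matrix by its transpose gives a writer-by-writer matrix whose diagonal holds each writer's song count and whose off-diagonal cells hold the number of songs a pair wrote together.

```r
# Hypothetical toy affiliation matrix: rows are writers, columns are songs
toy_aff <- matrix(
  c(1, 1, 1, 0,   # writer A is credited on songs 1-3
    1, 1, 0, 0,   # writer B is credited on songs 1-2
    0, 0, 0, 1),  # writer C is credited on song 4 only
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), paste0("song", 1:4))
)

# One-mode projection: the same operation as gd_matrix %*% t(gd_matrix)
toy_proj <- toy_aff %*% t(toy_aff)

toy_proj["A", "B"]  # songs A and B co-wrote: 2
diag(toy_proj)      # songs per writer: A = 3, B = 2, C = 1
```

Read this way, the 26 in John Barlow's own cell above is his song count, not a tie to another writer.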

Now I will use this adjacency matrix to create both igraph and statnet network objects and take a look at their resulting features.

This is a non-directed, unweighted dataset.

#Create Igraph and Statnet Objects

gd_network_ig <- graph.adjacency(gd_projection,mode="undirected") #igraph object
gd_network_stat <- network(gd_projection, directed=F, matrix.type="adjacency") #statnet object

#Inspect New Objects

print(gd_network_stat)
 Network attributes:
  vertices = 26 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 65 
    missing edges= 0 
    non-missing edges= 65 

 Vertex attribute names: 
    vertex.names 

No edge attributes
igraph::vertex_attr_names(gd_network_ig)
[1] "name"
igraph::edge_attr_names(gd_network_ig)
character(0)
head(V(gd_network_ig)$name)
[1] "Eric Andersen"  "John Barlow"    "Bob Bralove"   
[4] "Andrew Charles" "John Dawson"    "Willie Dixon"  
is_directed(gd_network_ig)
[1] FALSE
is_weighted(gd_network_ig)
[1] FALSE
is_bipartite(gd_network_ig)
[1] FALSE
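One thing worth checking at this point, offered as a hedged aside: as far as I understand igraph's documentation, when an adjacency matrix holding counts is passed without a `weighted` argument, each count becomes that many parallel edges, and non-zero diagonal cells become self-loops. The sketch below uses a hypothetical toy matrix (`toy_proj`, `toy_simple` and the graph names are my own illustration) to compare that behaviour with a binarized, zero-diagonal version, which is closer to what statnet reported above (loops = FALSE, multiple = FALSE).

```r
library(igraph)

# Hypothetical projection: diagonal = songs per writer, off-diagonal = co-written songs
toy_proj <- matrix(c(3, 2, 0,
                     2, 2, 0,
                     0, 0, 1),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Binarized copy: drop the diagonal and keep only "any co-writing at all"
toy_simple <- toy_proj
diag(toy_simple) <- 0
toy_simple[toy_simple > 0] <- 1

g_counts <- graph_from_adjacency_matrix(toy_proj, mode = "undirected")
g_simple <- graph_from_adjacency_matrix(toy_simple, mode = "undirected")

is_simple(g_counts)            # FALSE: parallel edges and loops are present
is_simple(g_simple)            # TRUE
any(which_multiple(g_counts))  # are any edges repeats of another edge?
any(which_loop(g_counts))      # are any edges self-loops?
ecount(g_simple)               # one edge per co-writing pair
```

If the same pattern holds for my network, running `is_simple(gd_network_ig)` would show it directly, and `igraph::simplify()` would be another route to a simple graph.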

Looking at the dyad/triad census info in igraph and statnet:

igraph::dyad.census(gd_network_ig)
$mut
[1] 738

$asym
[1] 0

$null
[1] -413
igraph::triad.census(gd_network_ig)
 [1] 1788    0  488    0    0    0    0    0    0    0  237    0    0
[14]    0    0   87
sna::dyad.census(gd_network_stat)
     Mut Asym Null
[1,]  65    0  260
sna::triad.census(gd_network_stat)
      003 012 102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U
[1,] 1451   0 825    0    0    0    0    0    0    0 237    0    0
     120C 210 300
[1,]    0   0  87

Knowing this network has 26 vertices, I can check that the triad census is working correctly by comparing the number of possible triads with the sum of the census counts. The two match, so I can confirm it is!

#possible triads in network
26*25*24/6
[1] 2600
sum(igraph::triad.census(gd_network_ig))
[1] 2600

Looking next at the global v. average local transitivity of the network in igraph and confirming global transitivity in statnet:

#network transitivity: statnet
gtrans(gd_network_stat)
[1] 0.5240964
#global clustering coefficient: igraph
transitivity(gd_network_ig, type="global")
[1] 0.5240964
#average local clustering coefficient: igraph
transitivity(gd_network_ig, type="average")
[1] 0.7755587

The average local transitivity (0.78) is considerably higher than the global transitivity (0.52). From my still naive network knowledge, this indicates that the overall network is fairly loose but contains a more tightly connected sub-network.
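To see how the two measures can diverge, here is a small base-R sketch on a hypothetical "bowtie" graph (two triangles sharing one node), not my songwriter network. Global transitivity is the share of connected triples that are closed, while the average local coefficient averages each node's own clustering, so a hub that bridges otherwise separate clusters pulls the global figure down more than the average.

```r
# Hypothetical bowtie graph: triangles 1-2-3 and 1-4-5 share node 1
A <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(1, 4), c(1, 5), c(4, 5))
A[edges] <- 1
A[edges[, 2:1]] <- 1                 # mirror to keep the matrix symmetric

deg <- rowSums(A)
A2 <- A %*% A
A3 <- A2 %*% A

# Global: closed two-paths divided by all two-paths between distinct endpoints
global_trans <- sum(diag(A3)) / (sum(A2) - sum(diag(A2)))

# Local: for each node, links among its neighbours / possible such links
local_trans <- diag(A3) / (deg * (deg - 1))

global_trans        # 0.6
mean(local_trans)   # 0.8666667
```

Node 1 has a local score of 1/3 while the other four nodes score 1, which is the same direction of gap as in my results.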

The average geodesic distance is just over 2, so on average any two connected nodes are about two “stops” apart along the shortest path between them.

average.path.length(gd_network_ig,directed=F)
[1] 2.01
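As a rough illustration of what this number summarizes, the base-R sketch below computes all shortest path lengths on a hypothetical four-node chain and averages them over the pairs that can reach each other. It uses the Floyd-Warshall update, not igraph's own routine, and the graph and names are invented for the example. As I read the igraph documentation, its default also leaves out pairs with no connecting path.

```r
# Hypothetical chain 1 - 2 - 3 - 4
n <- 4
A <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  A[i, i + 1] <- 1
  A[i + 1, i] <- 1
}

# Start from direct links; unreachable pairs stay at Inf
D <- ifelse(A == 1, 1, Inf)
diag(D) <- 0

# Floyd-Warshall: allow each node k in turn as an intermediate stop
for (k in 1:n) {
  D <- pmin(D, outer(D[, k], D[k, ], "+"))
}

pair_dist <- D[upper.tri(D)]
mean(pair_dist[is.finite(pair_dist)])   # 10 / 6 = 1.666667
```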

Getting a look at the components of the network confirms that there are 2 components in the network: 25 of the 26 nodes make up the giant component, and there is 1 isolate.

names(igraph::components(gd_network_ig))
[1] "membership" "csize"      "no"        
igraph::components(gd_network_ig)$no 
[1] 2
igraph::components(gd_network_ig)$csize
[1] 25  1

Next I can get to looking at the network density, centrality, and centralization.

The network density measure: I call “graph.density” with “loops=TRUE”. As I understand it, igraph’s default assumes that loops are not included but does not remove them, which we corrected by adding “loops=TRUE” per the course tutorials when comparing output to statnet. In this case, the statnet output is still far different, so I am not sure what is happening with this aspect of the network.

graph.density(gd_network_ig, loops=TRUE)
[1] 2.102564
network.density(gd_network_stat)
[1] 0.2
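The two density figures can be reproduced by hand, which at least shows where each comes from. Density is the number of edges divided by the number of possible edges. The sketch reuses counts printed earlier: 65 edges from the statnet object and, as an assumption on my part, 738 edges in the igraph object (the "mut" figure from igraph's dyad census). With loops allowed there are n(n+1)/2 possible edges, so a density above 1 is what I would expect if the igraph object carries repeated edges and loops instead of a single tie per pair.

```r
n_nodes <- 26

# statnet: 65 edges over the n(n-1)/2 possible pairs
dens_statnet <- 65 / choose(n_nodes, 2)
dens_statnet                                          # 0.2

# igraph with loops = TRUE: possible edges include one loop per node
possible_with_loops <- n_nodes * (n_nodes + 1) / 2    # 351
dens_igraph <- 738 / possible_with_loops              # assumes 738 edges, see above
dens_igraph                                           # 2.102564
```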

The network degree measure: This gives me a clear output showing the degree of each particular node (songwriter). It is not surprising, knowing my subject matter, that Jerry Garcia is the highest degree node in the igraph output, as the practical and figurative head of the band. The other band members’ degree measures are not necessarily what I expected, though. I did not anticipate that his songwriting partner, Robert Hunter, would have a lower degree than band members Phil Lesh and Bob Weir, as he does in the statnet output. Further, I did not anticipate that the degree measure of band member ‘Pigpen’ would be so high given his early death in the first years of the band’s touring life.

igraph::degree(gd_network_ig)
  Eric Andersen     John Barlow     Bob Bralove  Andrew Charles 
              3              81              14               3 
    John Dawson    Willie Dixon    Jerry Garcia  Donna Godchaux 
              4               4             328              12 
 Keith Godchaux   Gerrit Graham     Frank Guida     Mickey Hart 
             16               3               4              36 
  Bruce Hornsby   Robert Hunter Bill Kreutzmann       Ned Lagin 
              4             313             100               3 
      Phil Lesh      Peter Monk   Brent Mydland     Dave Parker 
            149               3              41               7 
Robert Petersen          Pigpen     Joe Royster   Rob Wasserman 
             13              95               4              10 
       Bob Weir   Vince Welnick 
            213              13 
sna::degree(gd_network_stat)
 [1]  2  6 10  2  4  4 20 12 14  2  4 14  0 22 18  2 28  2  8 10  4 16
[23]  4 10 34  8

To look further I will create a dataframe in igraph first, then statnet:

ig_nodes<-data.frame(name=V(gd_network_ig)$name, degree=igraph::degree(gd_network_ig))

ig_nodes
                           name degree
Eric Andersen     Eric Andersen      3
John Barlow         John Barlow     81
Bob Bralove         Bob Bralove     14
Andrew Charles   Andrew Charles      3
John Dawson         John Dawson      4
Willie Dixon       Willie Dixon      4
Jerry Garcia       Jerry Garcia    328
Donna Godchaux   Donna Godchaux     12
Keith Godchaux   Keith Godchaux     16
Gerrit Graham     Gerrit Graham      3
Frank Guida         Frank Guida      4
Mickey Hart         Mickey Hart     36
Bruce Hornsby     Bruce Hornsby      4
Robert Hunter     Robert Hunter    313
Bill Kreutzmann Bill Kreutzmann    100
Ned Lagin             Ned Lagin      3
Phil Lesh             Phil Lesh    149
Peter Monk           Peter Monk      3
Brent Mydland     Brent Mydland     41
Dave Parker         Dave Parker      7
Robert Petersen Robert Petersen     13
Pigpen                   Pigpen     95
Joe Royster         Joe Royster      4
Rob Wasserman     Rob Wasserman     10
Bob Weir               Bob Weir    213
Vince Welnick     Vince Welnick     13
stat_nodes<-data.frame(name=gd_network_stat%v%"vertex.names", degree=sna::degree(gd_network_stat))

stat_nodes
              name degree
1    Eric Andersen      2
2      John Barlow      6
3      Bob Bralove     10
4   Andrew Charles      2
5      John Dawson      4
6     Willie Dixon      4
7     Jerry Garcia     20
8   Donna Godchaux     12
9   Keith Godchaux     14
10   Gerrit Graham      2
11     Frank Guida      4
12     Mickey Hart     14
13   Bruce Hornsby      0
14   Robert Hunter     22
15 Bill Kreutzmann     18
16       Ned Lagin      2
17       Phil Lesh     28
18      Peter Monk      2
19   Brent Mydland      8
20     Dave Parker     10
21 Robert Petersen      4
22          Pigpen     16
23     Joe Royster      4
24   Rob Wasserman     10
25        Bob Weir     34
26   Vince Welnick      8

The igraph and statnet dataframes give very different results.
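My best guess at why the two columns differ, offered as a sketch and not a diagnosis: the igraph numbers look like they count every co-written song (plus each writer's own songs as loops, counted twice), while the statnet numbers look like twice the number of distinct co-writers. For example, Bruce Hornsby has igraph degree 4 but statnet degree 0, and the components output shows one isolate. As far as I know, `sna::degree()` treats its input as directed unless `gmode = "graph"` is supplied, which would account for the doubling. The toy matrix and names below are hypothetical.

```r
# Hypothetical projection: diagonal = songs per writer, off-diagonal = co-written songs
toy_proj <- matrix(c(3, 2, 0,
                     2, 2, 0,
                     0, 0, 1),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Count-based degree: every shared song is a tie, each own song is a loop with two ends
count_degree <- rowSums(toy_proj) + diag(toy_proj)

# Simple-graph degree: number of distinct co-writers
toy_simple <- toy_proj
diag(toy_simple) <- 0
toy_simple[toy_simple > 0] <- 1
simple_degree <- rowSums(toy_simple)

count_degree        # A = 8, B = 6, C = 2
simple_degree       # A = 1, B = 1, C = 0
2 * simple_degree   # the doubled pattern seen in the statnet column
```

Writer C has no co-writers yet still gets a count-based degree of 2, which mirrors the Bruce Hornsby row.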

A quick look at the summary statistics confirms for me the minimum, maximum, median, and mean node degree data using each package.

summary(ig_nodes)
     name               degree      
 Length:26          Min.   :  3.00  
 Class :character   1st Qu.:  4.00  
 Mode  :character   Median : 12.50  
                    Mean   : 56.77  
                    3rd Qu.: 71.00  
                    Max.   :328.00  
summary(stat_nodes)
     name               degree  
 Length:26          Min.   : 0  
 Class :character   1st Qu.: 4  
 Mode  :character   Median : 8  
                    Mean   :10  
                    3rd Qu.:14  
                    Max.   :34  

Next I’m taking a look at a dataframe of the total, in-, and out-degree of each node. Since this is not a directed network, the in- and out-degrees are not meaningful for it, but it is still interesting to look at how igraph and statnet handle these datasets differently.

Statnet

#create a dataframe of the total, in and out-degree of nodes in the stat network
gd_stat_nodes <- data.frame(name=gd_network_stat%v%"vertex.names",
    totdegree=sna::degree(gd_network_stat),
    indegree=sna::degree(gd_network_stat, cmode="indegree"),
    outdegree=sna::degree(gd_network_stat, cmode="outdegree"))

#sort the top total degree of nodes in the stat network
arrange(gd_stat_nodes, desc(totdegree))%>%slice(1:5)
             name totdegree indegree outdegree
1        Bob Weir        34       17        17
2       Phil Lesh        28       14        14
3   Robert Hunter        22       11        11
4    Jerry Garcia        20       10        10
5 Bill Kreutzmann        18        9         9

Igraph

#create a dataframe of the total, in and out-degree of nodes in the igraph network
#(mode is an argument of igraph::degree(), so it goes inside the call;
#for an undirected graph igraph ignores mode, so the three columns match)
gd_ig_nodes<-data.frame(name=V(gd_network_ig)$name,
                     totdegree=igraph::degree(gd_network_ig, mode="total"),
                     indegree=igraph::degree(gd_network_ig, mode="in"),
                     outdegree=igraph::degree(gd_network_ig, mode="out"))

#sort the top total degree of nodes in the igraph network
arrange(gd_ig_nodes, desc(totdegree))%>%slice(1:5)
                           name totdegree indegree outdegree
Jerry Garcia       Jerry Garcia       328      328       328
Robert Hunter     Robert Hunter       313      313       313
Bob Weir               Bob Weir       213      213       213
Phil Lesh             Phil Lesh       149      149       149
Bill Kreutzmann Bill Kreutzmann       100      100       100

Overall Eigenvector Score

The Eigenvector centrality score for each node is stored in the “vector” element of the result, and I can examine the top eigenvector scores in the igraph network:

#Eigenvector centrality, top 10 in igraph network

eigen_ig <- eigen_centrality(gd_network_ig)
eigen_gd_ig <- data.frame(eigen_ig)
arrange(eigen_gd_ig[1], desc(vector))%>%slice(1:10)
                    vector
Robert Hunter   1.00000000
Jerry Garcia    0.96094165
Bob Weir        0.18725953
Phil Lesh       0.15133380
Bill Kreutzmann 0.09223647
Pigpen          0.07985305
Mickey Hart     0.02523896
John Barlow     0.01773746
Keith Godchaux  0.01382256
Vince Welnick   0.01192303
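For intuition, eigenvector centrality is the leading eigenvector of the adjacency matrix, scaled so the top score is 1. The base-R sketch below (a hypothetical three-writer example, not igraph's implementation) compares the scores from a matrix of co-writing counts with those from its binarized version. When one pair shares many songs, that pair dominates the count-based scores, which resembles the Hunter and Garcia pattern above; whether that is what drives my results is an assumption I have not verified.

```r
# Leading eigenvector of a symmetric non-negative matrix, scaled to a maximum of 1
eigen_scores <- function(m) {
  v <- abs(eigen(m, symmetric = TRUE)$vectors[, 1])
  setNames(v / max(v), rownames(m))
}

# Hypothetical counts: A and B share 10 songs, C shares 1 song with each
counts <- matrix(c( 0, 10, 1,
                   10,  0, 1,
                    1,  1, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
binary <- (counts > 0) * 1   # same ties, every tie counted once

eigen_scores(counts)   # A and B score 1, C roughly 0.2
eigen_scores(binary)   # all three score 1
```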

Bonacich Power

The Bonacich power centrality score for each node can be computed first using the defaults, including the default exponent of 1; then, I can “rescale” so that all of the scores sum to 1.

To display my results, I have to run the calculations and save the results as a dataframe to read back in, because the function “bonpow()” has the same name in igraph and statnet, which causes trouble when running and then knitting this file.

I need to understand more of the nuance of the Bonacich power measure in order to fully understand what these two measures say about my specific network.

#Compute Bonpow scores

#bp_ig1 <- bonpow(gd_network_ig) #with a default index of "1"
#bonpow_gd_ig1 <- data.frame(bp_ig1)
#write.csv(bonpow_gd_ig1, file = "bonpow_gd_ig1.csv")

#Rescaled so that they sum to "1"

#bp_ig2 <- bonpow(gd_network_ig, rescale = TRUE) #with a default index of "1"
#bonpow_gd_ig2 <- data.frame(bp_ig2)
#write.csv(bonpow_gd_ig2, file = "bonpow_gd_ig2.csv")
#Read in dataframe from previous chunk

bon1 <- read.csv("bonpow_gd_ig1.csv")
bon2 <- read.csv("bonpow_gd_ig2.csv")

totalbonpow <- merge(bon1,bon2)

totalbonpow
                 X      bp_ig1      bp_ig2
1   Andrew Charles  0.08220268  0.01522717
2  Bill Kreutzmann -0.70115475 -0.12988143
3      Bob Bralove -0.22064550 -0.04087222
4         Bob Weir -0.54308358 -0.10060043
5    Brent Mydland  0.52651322  0.09753095
6    Bruce Hornsby  0.00000000  0.00000000
7      Dave Parker -0.89144078 -0.16512988
8   Donna Godchaux  1.23038839  0.22791631
9    Eric Andersen -0.28021530 -0.05190689
10     Frank Guida  3.07056607  0.56878957
11   Gerrit Graham -0.28021530 -0.05190689
12    Jerry Garcia -0.25514168 -0.04726227
13     Joe Royster  3.07056607  0.56878957
14     John Barlow -0.31662818 -0.05865199
15     John Dawson  0.09708065  0.01798315
16  Keith Godchaux  1.17992241  0.21856802
17     Mickey Hart  0.15330194  0.02839755
18       Ned Lagin  0.08220268  0.01522717
19      Peter Monk  0.08220268  0.01522717
20       Phil Lesh -0.18066559 -0.03346637
21          Pigpen -0.52573655 -0.09738708
22   Rob Wasserman -0.41469644 -0.07681809
23   Robert Hunter -0.17351422 -0.03214166
24 Robert Petersen  1.11819222  0.20713317
25   Vince Welnick -0.07953575 -0.01473315
26    Willie Dixon -0.43204347 -0.08003144
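As a starting point for that nuance, here is my reading of the formula behind the measure, sketched in base R on a hypothetical three-node chain. Bonacich power is commonly written as c(alpha, beta) = alpha (I - beta A)^-1 A 1, where beta is the exponent and alpha is a scaling constant. I am assuming the default scaling makes the squared scores sum to the number of nodes, and that `rescale = TRUE` makes the scores sum to 1 instead. The function name `bonacich_sketch` is made up, and this is not the igraph or sna code.

```r
# Sketch of Bonacich power: alpha * (I - beta * A)^-1 %*% A %*% 1
bonacich_sketch <- function(A, beta, rescale = FALSE) {
  n <- nrow(A)
  raw <- solve(diag(n) - beta * A, A %*% rep(1, n))[, 1]
  if (rescale) {
    raw / sum(raw)                 # scores sum to 1
  } else {
    raw * sqrt(n / sum(raw^2))     # squared scores sum to n
  }
}

# Hypothetical chain 1 - 2 - 3; node 2 sits in the middle
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)

bp <- bonacich_sketch(A, beta = 0.2)
bp                                                    # middle node scores highest
sum(bp^2)                                             # 3
sum(bonacich_sketch(A, beta = 0.2, rescale = TRUE))   # 1
```

The reading of the score as a sum over longer and longer paths only holds when the magnitude of beta is below the reciprocal of the largest eigenvalue of A. With many co-written songs between a few writers that eigenvalue may be large, so an exponent of 1 could fall outside that range, which might be worth checking before interpreting the negative scores above.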

Creating a dataframe summarizing all of this information and doing basic visualization on a couple of them:

gd_adjacency <- as.matrix(as_adjacency_matrix(gd_network_ig))
gd_adjacency_2 <- gd_adjacency %*% gd_adjacency

#calculate portion of reflected centrality
gd_reflective <- diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_reflective <- ifelse(is.nan(gd_reflective),0,gd_reflective)

#calculate derived centrality
gd_derived <- 1-diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_derived <- ifelse(is.nan(gd_derived),1,gd_derived)


centrality_gd <-data.frame(id=1:vcount(gd_network_ig),
                        name=V(gd_network_ig)$name,
                        degree_all=igraph::degree(gd_network_ig),
                        BC_power=power_centrality(gd_network_ig),
                        degree_norm=igraph::degree(gd_network_ig,normalized=T),
                        EV_cent=centr_eigen(gd_network_ig,directed = F)$vector,
                        reflect_EV=gd_reflective*centr_eigen(gd_network_ig,directed = F)$vector,
                        derive_EV=gd_derived*centr_eigen(gd_network_ig,directed = F)$vector)

row.names(centrality_gd)<-NULL
centrality_gd
   id            name degree_all    BC_power degree_norm      EV_cent
1   1   Eric Andersen          3 -0.28021530        0.12 6.852805e-04
2   2     John Barlow         81 -0.31662818        3.24 1.773746e-02
3   3     Bob Bralove         14 -0.22064550        0.56 8.992246e-03
4   4  Andrew Charles          3  0.08220268        0.12 5.538095e-04
5   5     John Dawson          4  0.09708065        0.16 7.176110e-03
6   6    Willie Dixon          4 -0.43204347        0.16 7.041156e-04
7   7    Jerry Garcia        328 -0.25514168       13.12 9.609417e-01
8   8  Donna Godchaux         12  1.23038839        0.48 5.313952e-03
9   9  Keith Godchaux         16  1.17992241        0.64 1.382256e-02
10 10   Gerrit Graham          3 -0.28021530        0.12 6.852805e-04
11 11     Frank Guida          4  3.07056607        0.16 2.932974e-04
12 12     Mickey Hart         36  0.15330194        1.44 2.523896e-02
13 13   Bruce Hornsby          4  0.00000000        0.16 2.574501e-17
14 14   Robert Hunter        313 -0.17351422       12.52 1.000000e+00
15 15 Bill Kreutzmann        100 -0.70115475        4.00 9.223647e-02
16 16       Ned Lagin          3  0.08220268        0.12 5.538095e-04
17 17       Phil Lesh        149 -0.18066559        5.96 1.513338e-01
18 18      Peter Monk          3  0.08220268        0.12 5.538095e-04
19 19   Brent Mydland         41  0.52651322        1.64 2.659589e-03
20 20     Dave Parker          7 -0.89144078        0.28 5.385443e-03
21 21 Robert Petersen         13  1.11819222        0.52 2.274921e-03
22 22          Pigpen         95 -0.52573655        3.80 7.985305e-02
23 23     Joe Royster          4  3.07056607        0.16 2.932974e-04
24 24   Rob Wasserman         10 -0.41469644        0.40 5.146870e-03
25 25        Bob Weir        213 -0.54308358        8.52 1.872595e-01
26 26   Vince Welnick         13 -0.07953575        0.52 1.192303e-02
     reflect_EV    derive_EV
1  8.512801e-06 0.0006767677
2  4.171627e-03 0.0135658315
3  2.393769e-04 0.0087528693
4  9.466828e-06 0.0005443426
5  4.752391e-05 0.0071285863
6  1.242557e-05 0.0006916900
7  3.326255e-01 0.6283162014
8  1.213231e-04 0.0051926286
9  2.487863e-04 0.0135737779
10 8.512801e-06 0.0006767677
11 1.113787e-05 0.0002821595
12 1.390973e-03 0.0238479844
13 1.593739e-17 0.0000000000
14 3.713275e-01 0.6286724511
15 9.710558e-03 0.0825259133
16 9.466828e-06 0.0005443426
17 2.214058e-02 0.1291932241
18 9.466828e-06 0.0005443426
19 6.119022e-04 0.0020476869
20 4.829994e-05 0.0053371431
21 1.438169e-04 0.0021311045
22 9.031643e-03 0.0708214079
23 1.113787e-05 0.0002821595
24 1.077879e-04 0.0050390825
25 4.070942e-02 0.1465501114
26 3.311952e-04 0.0115918330
breaks<-round(vcount(gd_network_ig))
hist(centrality_gd$degree_all,breaks=breaks,
     main="Distribution of Total Degree Scores in GD Songwriters",
     xlab="Total Degree Score")
hist(centrality_gd$EV_cent,breaks=breaks,
     main="Distribution of Eigenvector Centrality Scores in GD Songwriters",
     xlab="Eigenvector Centrality Score")
hist(centrality_gd$BC_power,breaks=breaks,
     main="Distribution of Bonacich Power Scores in GD Songwriters",
     xlab="Bonacich Power Score")

I can now look at the correlations between all of the scores. Using prompts from this week’s tutorial, it looks like all of the variables except Bonacich power are strongly correlated, so I think I’ll want to begin subsetting my network to get more meaningful interpretations.

names(centrality_gd) #Find the columns we want to run the correlation on
[1] "id"          "name"        "degree_all"  "BC_power"   
[5] "degree_norm" "EV_cent"     "reflect_EV"  "derive_EV"  
cols<-c(3:8) #All except the id and name in this instance
corMat<-cor(centrality_gd[,cols],use="complete.obs") #Specify those in the bracket
corMat #Let's look at it, which variables are most strongly correlated?
            degree_all   BC_power degree_norm    EV_cent reflect_EV
degree_all   1.0000000 -0.2782755   1.0000000  0.9131592  0.8729045
BC_power    -0.2782755  1.0000000  -0.2782755 -0.1782509 -0.1481903
degree_norm  1.0000000 -0.2782755   1.0000000  0.9131592  0.8729045
EV_cent      0.9131592 -0.1782509   0.9131592  1.0000000  0.9946549
reflect_EV   0.8729045 -0.1481903   0.8729045  0.9946549  1.0000000
derive_EV    0.9314936 -0.1943027   0.9314936  0.9983162  0.9869907
             derive_EV
degree_all   0.9314936
BC_power    -0.1943027
degree_norm  0.9314936
EV_cent      0.9983162
reflect_EV   0.9869907
derive_EV    1.0000000

However, I will also make a pretty visualization of the correlation matrix, because this is a lot of work, and fun should be had as well!
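A minimal sketch of one way to draw such a picture with base graphics (an assumption on my part; packages such as corrplot offer richer options). The small data frame reuses five rows of scores from the table above so the example runs on its own; with the full data, `corMat` from the previous step could be passed to `heatmap()` directly.

```r
# Five rows copied from centrality_gd above, so the example is self-contained
toy_scores <- data.frame(
  degree_all = c(328, 313, 213, 149, 4),
  BC_power   = c(-0.25514168, -0.17351422, -0.54308358, -0.18066559, 3.07056607),
  EV_cent    = c(0.9609417, 1.0000000, 0.1872595, 0.1513338, 0.0002932974),
  row.names  = c("Jerry Garcia", "Robert Hunter", "Bob Weir", "Phil Lesh", "Frank Guida")
)

toy_cor <- cor(toy_scores, use = "complete.obs")

# Blue for negative, white near zero, red for positive correlations
pal <- colorRampPalette(c("blue", "white", "red"))(21)

heatmap(toy_cor, Rowv = NA, Colv = NA, scale = "none",
        col = pal, zlim = c(-1, 1), margins = c(8, 8),
        main = "Correlation of centrality scores")
```

Setting `Rowv = NA` and `Colv = NA` keeps the variables in their original order instead of clustering them, and `scale = "none"` stops `heatmap()` from rescaling the correlations.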

Citations:

Allan, Alex; Grateful Dead Lyric & Song Finder: https://whitegum.com/~acsa/intro.htm

ASCAP. 18 March 2022.

Dodd, David; The Annotated Grateful Dead Lyrics: http://artsites.ucsc.edu/gdead/agdl/

Schofield, Matt; The Grateful Dead Family Discography: http://www.deaddisc.com/

Photo by Grateful Dead Productions

This information is intended for private research only, and not for any commercial use. Original Grateful Dead songs are ©copyright Ice Nine Music