I am using a data set I compiled of the co-writers of songs played by the Grateful Dead over their 30-year touring career.

In this example, I use a node list in which each unique numeric ID corresponds to the name of a songwriter.

The edgelist is in a separate spreadsheet. For each edge, the first two columns hold the IDs of the source and target nodes (songwriters), regardless of whether the network is directed. Each row records one connection between writers on a given song. Because many songs had several co-writers, a single song ID can appear in multiple rows, one per combination of writers. If a song had only one writer, that songwriter’s ID appears in both the source and target columns for that song.

The remaining columns are edge attributes. In my edgelist, the two songwriters in the co-writing relationship are in columns 1 and 2, the song ID is in column 3, the song name is in column 4, and the number of times the song was played live is in column 5.
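To make the row structure concrete, here is a minimal sketch of how one song’s writer credits could expand into edgelist rows, assuming each pair of co-writers gets its own row. The helper `song_to_edges()` and the toy songs and IDs are hypothetical, for illustration only; they are not part of my data or workflow.

```r
# Hypothetical helper: expand one song's writer IDs into edgelist rows
song_to_edges <- function(song_id, song_name, times_played, writer_ids) {
  if (length(writer_ids) == 1) {
    # A single writer becomes a self-loop: same ID in source and target
    pairs <- matrix(writer_ids, nrow = 1, ncol = 2)
  } else {
    # Every pair of co-writers gets its own row
    pairs <- t(combn(writer_ids, 2))
  }
  data.frame(
    source       = pairs[, 1],
    target       = pairs[, 2],
    song.id      = song_id,
    song.name    = song_name,
    times.played = times_played
  )
}

# Made-up input: one solo song and one song with three writers
toy_edges <- rbind(
  song_to_edges(1, "Song A", 10, c(3)),
  song_to_edges(2, "Song B", 5, c(1, 2, 4))
)
toy_edges
```

The explicit single-writer branch matters because `combn()` given a single number `n` would enumerate combinations of `1:n` instead. Under this scheme, a song with k writers produces choose(k, 2) rows.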

I have NOT yet used the number of times a song was played live as a network weight. This edgelist format is also not ideal, but it is the first step in a process of working through the data with different methods. In the next post, I will use the data in the form of an affiliation matrix.

library(igraph)

# Load the node list (vertices) and the edgelist
gd_vertices <- read.csv("_data/gd_nodes.csv")
gd_edgelist <- read.csv("_data/gd_clean_data.csv")

Next, I convert the network data into an igraph object using the “graph_from_data_frame” function (formerly “graph.data.frame”), which takes two data frames: “d” describes the edges of the network and “vertices” describes the nodes.
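A small sketch of what `graph_from_data_frame()` does with these two data frames, using hypothetical writer names and songs rather than my actual files. The first two columns of `d` become the edge endpoints and every other column becomes an edge attribute. The first column of `vertices` becomes the vertex `name`.

```r
library(igraph)

# Hypothetical toy data shaped like the node list and edgelist
toy_vertices <- data.frame(name = c("Writer A", "Writer B", "Writer C"))
toy_d <- data.frame(
  from         = c("Writer A", "Writer A", "Writer C"),
  to           = c("Writer B", "Writer B", "Writer C"),
  song.id      = c(1, 2, 3),
  song.name    = c("Song X", "Song Y", "Song Z"),
  times.played = c(12, 3, 7)
)

toy_g <- graph_from_data_frame(d = toy_d, vertices = toy_vertices, directed = FALSE)

edge_attr_names(toy_g)  # "song.id" "song.name" "times.played"
ecount(toy_g)           # 3: the repeated A-B pair and the C-C loop are both kept
```

Note that the function does not merge repeated pairs or drop self-loops, so the resulting graph is a multigraph.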

set.seed(1234)
grateful_data <- graph_from_data_frame(d = gd_edgelist, vertices = gd_vertices, directed = FALSE)

Now I check the vertices and edges in the graph I’ve created to make sure they represent the data accurately and that all of the attributes have been carried over properly:

head(V(grateful_data)$name)
## [1] "Eric Andersen"  "John Barlow"    "Bob Bralove"    "Andrew Charles"
## [5] "John Dawson"    "Willie Dixon"
head(E(grateful_data)$song.id)
## [1] 1 2 2 2 2 2
head(E(grateful_data)$song.name)
## [1] "Alabama Getaway"     "Alice D Millionaire" "Alice D Millionaire"
## [4] "Alice D Millionaire" "Alice D Millionaire" "Alice D Millionaire"
head(E(grateful_data)$weight)
## NULL
is_directed(grateful_data)
## [1] FALSE
is_weighted(grateful_data)
## [1] FALSE
is_bipartite(grateful_data)
## [1] FALSE
igraph::vertex_attr_names(grateful_data)
## [1] "name"
igraph::edge_attr_names(grateful_data)
## [1] "song.id"      "song.name"    "times.played"
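`is_weighted()` returns FALSE because igraph treats a graph as weighted only when it has an edge attribute named `weight`. If the play counts were used as weights later (which I have deliberately not done yet), a minimal sketch on a toy graph with made-up values would look like this:

```r
library(igraph)

# Toy multigraph with a times.played edge attribute (made-up values)
toy_g <- make_graph(c(1, 2, 1, 2, 2, 3), directed = FALSE)
E(toy_g)$times.played <- c(12, 3, 7)

is_weighted(toy_g)  # FALSE: no "weight" attribute yet

# Copy the play counts into the attribute igraph looks for
E(toy_g)$weight <- E(toy_g)$times.played
is_weighted(toy_g)  # TRUE

# Weighted degree (strength): vertex 2 sums 12 + 3 + 7
strength(toy_g)
```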

Next I want to take a first look at the network:

plot(grateful_data)

The plot broadly shows what I want it to illustrate, though it will take a lot more work to make the graph represent anything meaningful!

To finish looking at the basic network information, I run the dyad and triad censuses:

igraph::dyad.census(grateful_data)
## $mut
## [1] 558
## 
## $asym
## [1] 0
## 
## $null
## [1] -233
igraph::triad.census(grateful_data)
##  [1] 2043    0  233    0    0    0    0    0    0    0  237    0    0    0    0
## [16]   87

Since this network has 26 vertices, I can check that the triad census is working correctly: the number of possible triads (26*25*24/6) should equal the sum of the census counts, and it does:

#possible triads in network
26*25*24/6
## [1] 2600
sum(igraph::triad.census(grateful_data))
## [1] 2600
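The dyad census above reports a null count of -233, which cannot happen in a simple graph. igraph computes null as the number of possible pairs minus the mutual and asymmetric counts. Here, 325 possible pairs (choose(26, 2)) minus 558 mutual dyads gives -233. The 558 matches the total number of edge rows, which is inflated by the repeated pair rows for each song and by the self-loops. Here is a minimal sketch on a toy multigraph (not the Grateful Dead data), assuming that collapsing repeated edges with `simplify()` is acceptable for a census:

```r
library(igraph)

# Toy multigraph: the 1-2 pair appears three times and vertex 4 has a self-loop
toy_g <- make_graph(c(1, 2, 1, 2, 1, 2, 2, 3, 4, 4), directed = FALSE)
n <- vcount(toy_g)

# Depending on the igraph version, parallel edges may be counted here
dyad_census(toy_g)

# Collapse parallel edges and drop loops: one edge per connected pair
simple_g <- simplify(toy_g)
dc <- dyad_census(simple_g)
dc$mut                            # 2 connected pairs: 1-2 and 2-3
dc$null == choose(n, 2) - dc$mut  # every other pair is unconnected
```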

Next, I compare the global transitivity with the average local transitivity of the network:

#get global clustering coefficient: igraph
transitivity(grateful_data, type="global")
## [1] 0.5240964
#get average local clustering coefficient: igraph
transitivity(grateful_data, type="average")
## [1] 0.7755587

The average local clustering coefficient (0.78) is considerably higher than the global clustering coefficient (0.52). With my still naive network knowledge, I read this as a sign that the network as a whole is fairly loose but contains a more tightly connected sub-network.
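A toy hub-and-spoke graph shows one way the average local coefficient can exceed the global one. The graph is hypothetical, not the band data. The global coefficient pools all connected triples, so high-degree hubs whose neighbors are rarely linked to each other dominate it. The average local coefficient weights every node equally, including low-degree nodes that sit in tight triangles.

```r
library(igraph)

# Hub H linked to A, B, C, D; the edges A-B and C-D close two triangles
toy_g <- graph_from_literal(H-A:B:C:D, A-B, C-D)

# Global: 3 * triangles / connected triples = 3 * 2 / 10
transitivity(toy_g, type = "global")   # 0.6

# Local: A-D each score 1, H scores 2/6; their mean is 13/15
transitivity(toy_g, type = "local")
transitivity(toy_g, type = "average")  # about 0.867
```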

The average geodesic distance, i.e. the mean shortest-path length between connected pairs of nodes, is just over 2:

mean_distance(grateful_data, directed = FALSE)
## [1] 2.01
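`mean_distance()` (the current name for `average.path.length()`) averages shortest-path lengths only over pairs that can reach each other, because its `unconnected` argument defaults to TRUE. This matters here because the network has a single-node component. A toy check on a hypothetical graph:

```r
library(igraph)

# Toy graph: a 4-node path plus one unconnected vertex
toy_g <- add_vertices(make_graph(c(1, 2, 2, 3, 3, 4), directed = FALSE), 1)

# Shortest-path lengths; the extra vertex is Inf from every other vertex
d <- distances(toy_g)
pair_lengths <- d[upper.tri(d)]

# Average over reachable pairs only: (1 + 2 + 3 + 1 + 2 + 1) / 6
manual <- mean(pair_lengths[is.finite(pair_lengths)])
manual
mean_distance(toy_g, directed = FALSE)  # same value, about 1.667
```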

The component analysis shows that the network has 2 components: 25 of the 26 nodes form the giant component, and the remaining node forms a component on its own.

names(igraph::components(grateful_data))
## [1] "membership" "csize"      "no"
igraph::components(grateful_data)$no 
## [1] 2
igraph::components(grateful_data)$csize
## [1] 25  1
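No songwriter in this data has degree 0, so the single-node component must be a writer whose only edges are self-loops from songs they wrote alone. igraph’s `degree()` counts a loop twice. The `membership` and `csize` elements can also pull out the giant component. A toy sketch, using a hypothetical graph:

```r
library(igraph)

# Toy graph: a triangle (1-2-3) plus vertex 4 with only a self-loop
toy_g <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 4), directed = FALSE)

comp <- components(toy_g)
comp$csize        # 3 1
degree(toy_g)[4]  # 2: the loop counts at both of its ends

# Keep only the vertices in the largest component
giant <- induced_subgraph(toy_g, which(comp$membership == which.max(comp$csize)))
vcount(giant)     # 3
```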

This is a great start. Now I can look at the network’s density, centrality, and centralization.

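The network density measure: first with just the call “graph.density” and then with “loops=TRUE” added. By default, igraph’s density calculation assumes the graph has no loops but does not remove any that are present. Per the course tutorials, adding “loops=TRUE” corrects for this when comparing output to statnet, so the loop-adjusted value of about 1.59 is the more appropriate of the two. Density cannot exceed 1 in a simple graph, however. Both values are above 1 because this graph keeps one edge per song for each pair of co-writers, plus a self-loop for each single-writer song. As a result, there are more edges (558) than possible pairs of nodes (325). A density in the usual 0 to 1 range would require collapsing the repeated edges first.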

graph.density(grateful_data)
## [1] 1.716923
graph.density(grateful_data, loops=TRUE)
## [1] 1.589744
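Both values can be reproduced by hand from 558 edges and 26 nodes. The 558 is the mutual count from the dyad census and half of the 1,116 total degree. For an undirected graph, igraph divides by the n(n - 1)/2 possible pairs by default and by n(n + 1)/2 when `loops = TRUE`. The hypothetical toy graph shows how collapsing parallel edges with `simplify()` brings the value back into the 0 to 1 range.

```r
library(igraph)

# Reproduce the two density values from the edge and node counts
n <- 26
m <- 558
m / choose(n, 2)       # 1.716923, the default (no loops) denominator
m / (n * (n + 1) / 2)  # 1.589744, the loops = TRUE denominator

# Toy multigraph: the 1-2 pair is repeated three times
toy_g <- make_graph(c(1, 2, 1, 2, 1, 2, 2, 3), directed = FALSE)
edge_density(toy_g)            # 4/3: parallel edges push it above 1
edge_density(simplify(toy_g))  # 2/3: one edge per connected pair
```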

The network degree measure gives a clear output of the degree of each node (songwriter). Knowing my subject matter, it is not surprising that Jerry Garcia, the practical and figurative head of the band, is the highest-degree node in this network. The other band members’ degree measures are not necessarily what I expected, though. I did not anticipate that Garcia’s songwriting partner, Robert Hunter, would have a lower degree than band members Phil Lesh and Bob Weir. Nor did I anticipate that the degree of band member ‘Pigpen’ would be so high, given his early death in the first years of the band’s touring life.

igraph::degree(grateful_data)
##   Eric Andersen     John Barlow     Bob Bralove  Andrew Charles     John Dawson 
##               1              30              12               1               2 
##    Willie Dixon    Jerry Garcia  Donna Godchaux  Keith Godchaux   Gerrit Graham 
##               2             215              16              19               1 
##     Frank Guida     Mickey Hart   Bruce Hornsby   Robert Hunter Bill Kreutzmann 
##               2              25               4             136             121 
##       Ned Lagin       Phil Lesh      Peter Monk   Brent Mydland     Dave Parker 
##               1             158               1              24              10 
## Robert Petersen          Pigpen     Joe Royster   Rob Wasserman        Bob Weir 
##               5             119               2              10             188 
##   Vince Welnick 
##              11
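Each song adds its own edge rows, so these degree values count song-level collaborations, plus 2 for every single-writer self-loop, rather than distinct co-writers. One possible explanation for the Hunter, Lesh, and Weir ranking is that degree rewards many credits on multi-writer songs, though that is a hypothesis, not something checked here. A toy sketch (hypothetical graph) comparing the counts:

```r
library(igraph)

# Toy data: writers 1 and 2 share three songs, writer 1 co-wrote one song
# with writer 3, and writer 3 also has a solo song (a self-loop)
toy_g <- make_graph(c(1, 2, 1, 2, 1, 2, 1, 3, 3, 3), directed = FALSE)

degree(toy_g)                 # 4 3 3: every song-level edge, loops twice
degree(toy_g, loops = FALSE)  # 4 3 1: ignores the solo-song loop
degree(simplify(toy_g))       # 2 1 1: distinct co-writers only
```

On the real graph, `igraph::degree(simplify(grateful_data))` should give the number of distinct co-writers per songwriter.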

To look further, I create a data frame of node degrees for easier review going forward.

grateful_nodes <- data.frame(name = V(grateful_data)$name, degree = igraph::degree(grateful_data))

grateful_nodes
##                            name degree
## Eric Andersen     Eric Andersen      1
## John Barlow         John Barlow     30
## Bob Bralove         Bob Bralove     12
## Andrew Charles   Andrew Charles      1
## John Dawson         John Dawson      2
## Willie Dixon       Willie Dixon      2
## Jerry Garcia       Jerry Garcia    215
## Donna Godchaux   Donna Godchaux     16
## Keith Godchaux   Keith Godchaux     19
## Gerrit Graham     Gerrit Graham      1
## Frank Guida         Frank Guida      2
## Mickey Hart         Mickey Hart     25
## Bruce Hornsby     Bruce Hornsby      4
## Robert Hunter     Robert Hunter    136
## Bill Kreutzmann Bill Kreutzmann    121
## Ned Lagin             Ned Lagin      1
## Phil Lesh             Phil Lesh    158
## Peter Monk           Peter Monk      1
## Brent Mydland     Brent Mydland     24
## Dave Parker         Dave Parker     10
## Robert Petersen Robert Petersen      5
## Pigpen                   Pigpen    119
## Joe Royster         Joe Royster      2
## Rob Wasserman     Rob Wasserman     10
## Bob Weir               Bob Weir    188
## Vince Welnick     Vince Welnick     11
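`igraph::degree()` returns a named vector, so the row names above repeat the name column. A small variation, sketched here on hypothetical toy data, drops the row names and sorts writers from highest to lowest degree:

```r
library(igraph)

toy_g <- make_graph(c(1, 2, 1, 2, 1, 3, 3, 4), directed = FALSE)
V(toy_g)$name <- c("Writer A", "Writer B", "Writer C", "Writer D")

toy_nodes <- data.frame(
  name      = V(toy_g)$name,
  degree    = degree(toy_g),
  row.names = NULL  # use 1, 2, ... instead of the duplicated names
)

# Highest-degree writers first
toy_nodes[order(toy_nodes$degree, decreasing = TRUE), ]
```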

A quick look at the summary statistics gives the minimum, maximum, median, and mean node degree:

summary(grateful_nodes)
##      name               degree      
##  Length:26          Min.   :  1.00  
##  Class :character   1st Qu.:  2.00  
##  Mode  :character   Median : 10.50  
##                     Mean   : 42.92  
##                     3rd Qu.: 28.75  
##                     Max.   :215.00
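The mean degree is tied directly to the edge count. Every edge adds 2 to the total degree (a self-loop adds both to the same node), so the mean degree equals 2m/n. With the 558 edges and 26 nodes implied by the output above, 2 × 558 / 26 ≈ 42.92, which matches the summary. A quick check, plus the same identity on a hypothetical toy multigraph:

```r
library(igraph)

# 2 * edges / nodes, using the counts implied by the output above
2 * 558 / 26  # 42.92308

# The same identity on a toy multigraph with a self-loop
toy_g <- make_graph(c(1, 2, 1, 2, 2, 3, 3, 3), directed = FALSE)
mean(degree(toy_g))
2 * ecount(toy_g) / vcount(toy_g)
```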

Now I want to take a step back and try to visually represent this data better.

# Community detection algorithm (Louvain)
community <- cluster_louvain(grateful_data) 

# Attach community membership to the vertices
V(grateful_data)$color <- community$membership 

# Random graph layout
layout <- layout_randomly(grateful_data) 

# igraph plot 
plot(grateful_data, layout = layout)

Better, but not quite.
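`cluster_louvain()` returns a communities object whose `membership` vector holds one community number per vertex. That vector is what gets copied into `V(grateful_data)$color`. Here is a toy sketch with two triangles joined by a single bridge (a hypothetical graph), where Louvain should place each triangle in its own community:

```r
library(igraph)

# Two triangles (1-2-3 and 4-5-6) joined by the edge 3-4
toy_g <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5, 5, 6, 6, 4, 3, 4), directed = FALSE)

set.seed(1234)  # Louvain visits vertices in a random order
comm <- cluster_louvain(toy_g)

comm$membership   # one community ID per vertex
sizes(comm)       # expected: two communities of 3
modularity(comm)  # 2 * (3/7 - (7/14)^2) = 5/14 for the two-triangle split

# Use the community IDs as a color attribute, as in the code above
V(toy_g)$color <- comm$membership
```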

library(ggraph)

ggraph(grateful_data, layout = "fr") +
  geom_edge_link() + 
  geom_node_point(aes(color = factor(color))) + 
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void() +
  theme(legend.position = "none") 

That is starting to look more meaningful!

# Set node size to degree centrality
V(grateful_data)$size <- igraph::degree(grateful_data)

# Additional customisation for better legibility 
ggraph(grateful_data, layout = "fr") +
  geom_edge_arc(strength = 0.2, width = 0.5, alpha = 0.15) + 
  geom_node_point(aes(size = size, color = factor(color))) + 
  geom_node_text(aes(label = name, size = size), repel = TRUE) +
  theme_void() +
  theme(legend.position = "none") 

There is a lot more to do, but this is a great start.

Citations:

Allan, Alex; Grateful Dead Lyric & Song Finder: https://whitegum.com/~acsa/intro.htm

ASCAP. 18 March 2022.

Dodd, David; The Annotated Grateful Dead Lyrics: http://artsites.ucsc.edu/gdead/agdl/

Schofield, Matt; The Grateful Dead Family Discography: http://www.deaddisc.com/

This information is intended for private research only, and not for any commercial use. Original Grateful Dead songs are copyright © Ice Nine Music.