I am using a dataset I compiled of the co-writers of songs played by the Grateful Dead over their 30-year touring career.
In this example, I use a node list in which each songwriter is assigned a unique numeric ID.
The edgelist is in a separate spreadsheet. For each edge, the first two columns hold the IDs of the source and target nodes (songwriter IDs), regardless of whether the network is directed. Each row records one co-writing connection between writers on a given song, and since many songs had several writers, a single song ID can appear in multiple rows, one for each writer combination. If a song had only one writer, that songwriter's ID appears in both the source and target columns, which creates a self-loop.
The remaining columns are edge attributes. In my edgelist, columns "1" and "2" hold the two songwriters in the co-writing relationship, column "3" the song ID, column "4" the song name, and column "5" the number of times the song was played live.
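To make the layout concrete, here is a minimal sketch using hypothetical toy data (the IDs, writer names, and song values are made up, not taken from my dataset) in the same shape: a node list keyed by numeric ID, and an edgelist whose first two columns are endpoint IDs followed by the edge attributes. It shows how a single-writer song becomes a self-loop and how a repeated writer pair becomes parallel edges once igraph reads the data.

```r
library(igraph)

# Hypothetical toy node list: numeric ID first, then the writer's name
toy_nodes <- data.frame(
  id     = c(1, 2, 3),
  writer = c("Writer A", "Writer B", "Writer C")
)

# Hypothetical toy edgelist: source ID, target ID, then edge attributes
toy_edges <- data.frame(
  source       = c(1, 1, 3),
  target       = c(2, 2, 3),  # the 3 -> 3 row is a single-writer song
  song.id      = c(10, 11, 12),
  song.name    = c("Song X", "Song Y", "Song Z"),
  times.played = c(100, 5, 40)
)

toy <- graph_from_data_frame(d = toy_edges, vertices = toy_nodes, directed = FALSE)

edge_attr_names(toy)      # columns 3 to 5 became edge attributes
sum(which_loop(toy))      # single-writer songs stored as self-loops
sum(which_multiple(toy))  # extra edges between an already-connected pair
```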
I have NOT yet used the number of times a song was played live as an edge weight. This edgelist format is also not ideal, but it is the first step in working through the data with different methods. In the next post, I will use the data in the form of an affiliation matrix.
# Load packages, then the node (vertex) list and the edgelist
library(igraph)
library(ggraph)
gd_vertices <- read.csv("_data/gd_nodes.csv")
gd_edgelist <- read.csv("_data/gd_clean_data.csv")
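Next, I convert the network data into an igraph object with the “graph_from_data_frame” function (formerly “graph.data.frame”), which takes two data frames: “d”, which describes the edges of the network, and “vertices”, which describes the nodes. The first two columns of “d” are read as the edge endpoints, and any further columns become edge attributes.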
set.seed(1234)
grateful_data <- graph_from_data_frame(d = gd_edgelist, vertices = gd_vertices, directed = FALSE)
Now I check the vertices and edges of the graph I’ve created to make sure they represent the data accurately, and confirm that all of the attributes came through properly:
head(V(grateful_data)$name)
## [1] "Eric Andersen" "John Barlow" "Bob Bralove" "Andrew Charles"
## [5] "John Dawson" "Willie Dixon"
head(E(grateful_data)$song.id)
## [1] 1 2 2 2 2 2
head(E(grateful_data)$song.name)
## [1] "Alabama Getaway" "Alice D Millionaire" "Alice D Millionaire"
## [4] "Alice D Millionaire" "Alice D Millionaire" "Alice D Millionaire"
head(E(grateful_data)$weight)
## NULL
is_directed(grateful_data)
## [1] FALSE
is_weighted(grateful_data)
## [1] FALSE
is_bipartite(grateful_data)
## [1] FALSE
igraph::vertex_attr_names(grateful_data)
## [1] "name"
igraph::edge_attr_names(grateful_data)
## [1] "song.id" "song.name" "times.played"
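The “is_weighted” check returns FALSE because igraph only looks for an edge attribute literally called “weight”; “times.played” is stored on every edge but is not used as a weight. If I later decide to weight edges by live plays, a minimal sketch (on hypothetical toy data, not my dataset) would be to copy that attribute into “weight”:

```r
library(igraph)

# Hypothetical toy graph: two edges between A and B, one self-loop on C
toy <- graph_from_data_frame(
  data.frame(from = c("A", "A", "C"), to = c("B", "B", "C"),
             times.played = c(100, 5, 40)),
  directed = FALSE
)

is_weighted(toy)                      # FALSE: no attribute named "weight"
E(toy)$weight <- E(toy)$times.played
is_weighted(toy)                      # TRUE once "weight" exists

# Weighted degree (strength) sums the plays across each writer's edges
strength(toy)
```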
Next I want to take a first look at the network:
plot(grateful_data)
This roughly plots what I want it to illustrate, though I will need to do a lot more work to make the graph represent anything meaningful!
To finish the look at the basic network information, here are the dyad and triad censuses:
igraph::dyad.census(grateful_data)
## $mut
## [1] 558
##
## $asym
## [1] 0
##
## $null
## [1] -233
igraph::triad.census(grateful_data)
## [1] 2043 0 233 0 0 0 0 0 0 0 237 0 0 0 0
## [16] 87
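The negative null count in the dyad census is a warning sign rather than a real result. With 26 nodes there are only 26 × 25 / 2 = 325 possible pairs of songwriters, but all 558 edges (one per song collaboration, including parallel edges and self-loops) are counted as mutual dyads, so the census reports 325 - 558 = -233 null dyads. One possible fix, assuming I want the census over distinct writer pairs, is to collapse parallel edges and drop self-loops with “simplify” first. A sketch on a hypothetical toy multigraph rather than my data:

```r
library(igraph)

# Hypothetical toy multigraph: A-B recorded 4 times, A-C twice, plus a C self-loop;
# D never co-wrote with anyone
toy <- graph_from_data_frame(
  data.frame(from = c("A", "A", "A", "A", "A", "A", "C"),
             to   = c("B", "B", "B", "B", "C", "C", "C")),
  directed = FALSE,
  vertices = data.frame(name = c("A", "B", "C", "D"))
)

# More edge rows than possible pairs: this is how "null" can go negative
ecount(toy)             # 7
choose(vcount(toy), 2)  # 6

# Keep one edge per distinct pair and drop self-loops before the census
toy_simple <- simplify(toy, remove.multiple = TRUE, remove.loops = TRUE)
dyad_census(toy_simple) # mut = 2 (A-B, A-C), asym = 0, null = 4
```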
Since this network has 26 vertices, I can check the triad census by comparing the number of possible triads (26 choose 3) with the sum of the census counts. They match:
#possible triads in network
26*25*24/6
## [1] 2600
sum(igraph::triad.census(grateful_data))
## [1] 2600
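In an undirected graph every tie counts as mutual, so only four of the sixteen triad types can appear: 003 (no ties), 102 (one tie), 201 (two ties), and 300 (all three tied). Those are exactly the four non-zero entries in the census above (2043, 233, 237, and 87, which sum to 2600). A small sketch that labels the census with the standard type codes in igraph's output order, checked on a hypothetical toy graph:

```r
library(igraph)

# Triad type codes in the order igraph's triad_census() returns them
triad_types <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U",
                 "030T", "030C", "201", "120D", "120U", "120C", "210", "300")

labeled_triad_census <- function(g) {
  # igraph warns when the census runs on an undirected graph; ties count as mutual
  counts <- suppressWarnings(triad_census(g))
  setNames(counts, triad_types)
}

# Hypothetical toy graph: triangle A-B-C plus a pendant D attached to C
toy <- graph_from_literal(A - B, B - C, C - A, C - D)
census <- labeled_triad_census(toy)

census[census > 0]                     # 102 = 1, 201 = 2, 300 = 1
sum(census) == choose(vcount(toy), 3)  # every triad counted once
```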
Next, I compare the global transitivity of the network with the average local transitivity:
#get global clustering coefficient: igraph
transitivity(grateful_data, type="global")
## [1] 0.5240964
#get average local clustering coefficient: igraph
transitivity(grateful_data, type="average")
## [1] 0.7755587
The average local clustering coefficient (0.78) is noticeably higher than the global coefficient (0.52). From my still-naive network knowledge, this suggests that the network as a whole is fairly loose, while many individual songwriters sit in tightly connected neighborhoods, pointing to a more connected sub-network.
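To see why the two coefficients can differ, here is a small sketch on a hypothetical toy graph (a triangle with one pendant writer attached). The global coefficient pools all connected triples, so high-degree nodes whose neighbors are not linked to each other pull it down; the average local coefficient gives every node with degree of at least 2 an equal vote, and with igraph's default leaves out nodes with degree below 2. The hand calculation uses the standard definition: three times the number of triangles divided by the number of connected triples.

```r
library(igraph)

# Hypothetical toy graph: triangle A-B-C plus a pendant D attached to C
toy <- graph_from_literal(A - B, B - C, C - A, C - D)

# count_triangles() counts each triangle once at each of its 3 corners,
# so its sum equals 3 x (number of triangles); triples = sum of choose(degree, 2)
closed  <- sum(count_triangles(toy))
triples <- sum(choose(degree(toy), 2))
closed / triples                      # 3 / 5 = 0.6
transitivity(toy, type = "global")    # same value

# Average local: A and B score 1, C scores 1/3, D (degree 1) is NaN and left out
local_cc <- transitivity(toy, type = "local")
mean(local_cc, na.rm = TRUE)          # 7/9, about 0.778
transitivity(toy, type = "average")   # same value
```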
The average geodesic distance (shortest path length) between connected pairs of songwriters is just over 2.
average.path.length(grateful_data, directed = FALSE)
## [1] 2.01
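Because the network has a one-node component, some pairs of songwriters have no path between them. By default, “average.path.length” (also available as “mean_distance”) averages only over pairs that are connected, instead of treating missing paths as infinitely long. A minimal sketch of that calculation by hand, on a hypothetical toy graph:

```r
library(igraph)

# Hypothetical toy graph: path A-B-C plus an unconnected writer D
toy <- add_vertices(graph_from_literal(A - B, B - C), 1, name = "D")

# Shortest-path lengths between all pairs; unreachable pairs are Inf
d <- distances(toy)
pair_lengths <- d[upper.tri(d)]

# Average over connected pairs only: (1 + 1 + 2) / 3
mean(pair_lengths[is.finite(pair_lengths)])
mean_distance(toy, directed = FALSE)  # unconnected = TRUE is the default
```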
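Looking at the components shows that the network has 2 components: 25 of the 26 nodes make up the giant component, and 1 node forms a component of its own. Since every songwriter has a degree of at least 1, that node is not a true isolate; its only edges must be self-loops from songs written alone.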
names(igraph::components(grateful_data))
## [1] "membership" "csize" "no"
igraph::components(grateful_data)$no
## [1] 2
igraph::components(grateful_data)$csize
## [1] 25 1
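A sketch of pulling out the names of writers who fall outside the giant component, on hypothetical toy data. It also shows that igraph counts each self-loop twice in “degree”, and that passing “loops = FALSE” reveals the writer has no co-writers at all:

```r
library(igraph)

# Hypothetical toy graph: A, B, C co-wrote; D only has a solo song (a self-loop)
toy <- graph_from_data_frame(
  data.frame(from = c("A", "B", "D"), to = c("B", "C", "D")),
  directed = FALSE
)

comp <- components(toy)
comp$csize                                   # sizes of each component

# Names of writers outside the giant component
giant <- which.max(comp$csize)
V(toy)$name[comp$membership != giant]        # "D"

degree(toy)[["D"]]                 # 2: the self-loop counts twice
degree(toy, loops = FALSE)[["D"]]  # 0: no co-writers once loops are ignored
```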
This is a great start; now I can look at the network density, centrality, and centralization.
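The network density measure: first with the plain “graph.density” call, and then with “loops=TRUE” added. Per the course tutorials comparing igraph output to statnet, igraph’s default assumes that loops are not included but does not remove them, and adding “loops=TRUE” corrects for that; this matters here because my single-writer songs are stored as self-loops. Both values are above 1, though, which is impossible for a simple graph: because my edgelist keeps a separate edge for every song collaboration, many pairs of writers are joined by several parallel edges, so neither number is yet a meaningful density.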
graph.density(grateful_data)
## [1] 1.716923
graph.density(grateful_data, loops=TRUE)
## [1] 1.589744
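Density is the number of edges divided by the number of possible edges: n(n - 1)/2 for an undirected graph, or n(n + 1)/2 when self-loops are allowed. Parallel edges push that ratio past 1, which is what happens with my 558 edges: 558 / 325 ≈ 1.717 and 558 / 351 ≈ 1.590. A minimal sketch, on a hypothetical toy multigraph, that checks the formula and then computes density over distinct writer pairs instead:

```r
library(igraph)

# Hypothetical toy multigraph: 4 writers, A-B recorded 4 times, A-C twice, one C self-loop
toy <- graph_from_data_frame(
  data.frame(from = c("A", "A", "A", "A", "A", "A", "C"),
             to   = c("B", "B", "B", "B", "C", "C", "C")),
  directed = FALSE,
  vertices = data.frame(name = c("A", "B", "C", "D"))
)
n <- vcount(toy)

edge_density(toy)                # 7 / (4 * 3 / 2) = 7/6, above 1
edge_density(toy, loops = TRUE)  # 7 / (4 * 5 / 2) = 0.7
ecount(toy) / (n * (n - 1) / 2)  # same as the first call

# Density over distinct writer pairs: 2 of 6 possible pairs
edge_density(simplify(toy))      # 1/3
```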
The network degree measure: this gives me a clear output showing the degree of each node (songwriter). Knowing my subject matter, I am not surprised that Jerry Garcia, the practical and figurative head of the band, is the highest-degree node in this network. The other band members’ degree measures are not necessarily what I expected, though. I did not anticipate that his songwriting partner, Robert Hunter, would have a lower degree than band members Phil Lesh and Bob Weir. Nor did I anticipate that the degree of band member ‘Pigpen’ would be so high, given his early death in the first years of the band’s touring life.
igraph::degree(grateful_data)
## Eric Andersen John Barlow Bob Bralove Andrew Charles John Dawson
## 1 30 12 1 2
## Willie Dixon Jerry Garcia Donna Godchaux Keith Godchaux Gerrit Graham
## 2 215 16 19 1
## Frank Guida Mickey Hart Bruce Hornsby Robert Hunter Bill Kreutzmann
## 2 25 4 136 121
## Ned Lagin Phil Lesh Peter Monk Brent Mydland Dave Parker
## 1 158 1 24 10
## Robert Petersen Pigpen Joe Royster Rob Wasserman Bob Weir
## 5 119 2 10 188
## Vince Welnick
## 11
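Because each co-writing credit is stored as its own edge, and igraph counts a self-loop twice, these degree values measure collaboration credits rather than distinct co-writers, which may account for some of the surprises above. A sketch, on a hypothetical toy graph, comparing the default degree with versions that ignore loops or collapse repeated pairs:

```r
library(igraph)

# Hypothetical toy graph: A and B share 3 songs, A also co-wrote with C, A has 1 solo song
toy <- graph_from_data_frame(
  data.frame(from = c("A", "A", "A", "A", "A"),
             to   = c("B", "B", "B", "C", "A")),
  directed = FALSE
)

degree(toy)                 # A = 6: 3 + 1 + 2 (the self-loop counts twice)
degree(toy, loops = FALSE)  # A = 4: collaboration credits without solo songs
degree(simplify(toy))       # A = 2: distinct co-writers (B and C)
```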
To look further, I create a data frame of each node's degree for easier review going forward.
grateful_nodes<-data.frame(name=V(grateful_data)$name, degree=igraph::degree(grateful_data))
grateful_nodes
## name degree
## Eric Andersen Eric Andersen 1
## John Barlow John Barlow 30
## Bob Bralove Bob Bralove 12
## Andrew Charles Andrew Charles 1
## John Dawson John Dawson 2
## Willie Dixon Willie Dixon 2
## Jerry Garcia Jerry Garcia 215
## Donna Godchaux Donna Godchaux 16
## Keith Godchaux Keith Godchaux 19
## Gerrit Graham Gerrit Graham 1
## Frank Guida Frank Guida 2
## Mickey Hart Mickey Hart 25
## Bruce Hornsby Bruce Hornsby 4
## Robert Hunter Robert Hunter 136
## Bill Kreutzmann Bill Kreutzmann 121
## Ned Lagin Ned Lagin 1
## Phil Lesh Phil Lesh 158
## Peter Monk Peter Monk 1
## Brent Mydland Brent Mydland 24
## Dave Parker Dave Parker 10
## Robert Petersen Robert Petersen 5
## Pigpen Pigpen 119
## Joe Royster Joe Royster 2
## Rob Wasserman Rob Wasserman 10
## Bob Weir Bob Weir 188
## Vince Welnick Vince Welnick 11
A quick look at the summary statistics shows the minimum, quartiles, median, mean, and maximum node degree.
summary(grateful_nodes)
## name degree
## Length:26 Min. : 1.00
## Class :character 1st Qu.: 2.00
## Mode :character Median : 10.50
## Mean : 42.92
## 3rd Qu.: 28.75
## Max. :215.00
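The mean degree can also be checked against the edge count: in any undirected graph the degrees sum to twice the number of edges (self-loops included, since each adds 2), so the mean degree is 2m/n. Here that is 2 × 558 / 26 ≈ 42.92, matching the summary. A quick sketch of the same check on a hypothetical toy graph:

```r
library(igraph)

# Hypothetical toy graph with a repeated pair (A-B twice) and a self-loop on C
toy <- graph_from_data_frame(
  data.frame(from = c("A", "A", "B", "C"), to = c("B", "B", "C", "C")),
  directed = FALSE
)

sum(degree(toy)) == 2 * ecount(toy)                          # TRUE: 8 == 2 * 4
all.equal(mean(degree(toy)), 2 * ecount(toy) / vcount(toy))  # TRUE
```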
Now I want to take a step back and try to visually represent this data better.
# Community detection algorithm
community <- cluster_louvain(grateful_data)
# Attach communities to relevant vertices
V(grateful_data)$color <- community$membership
# Graph layout
layout <- layout.random(grateful_data)
# igraph plot
plot(grateful_data, layout = layout)
Better, but not quite.
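Before refining the plot, it can help to check what the Louvain step actually found. A sketch on a hypothetical toy graph (two triangles joined by a single edge) using igraph's “sizes”, “membership”, and “modularity” accessors; setting a seed keeps the run repeatable:

```r
library(igraph)

set.seed(1234)  # keeps the community run repeatable

# Hypothetical toy graph: two triangles joined by the single edge C-D
toy <- graph_from_literal(A - B, B - C, C - A, C - D, D - E, E - F, F - D)
comm <- cluster_louvain(toy)

sizes(comm)       # number of writers in each community
membership(comm)  # community ID per writer (what the post stores in V()$color)
modularity(comm)  # higher = more edges fall inside communities than chance predicts
```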
ggraph(grateful_data, layout = "fr") +
geom_edge_link() +
geom_node_point(aes(color = factor(color))) +
geom_node_text(aes(label = name), repel = TRUE) +
theme_void() +
theme(legend.position = "none")
That is starting to look more meaningful!
# Set size to degree centrality
V(grateful_data)$size <- igraph::degree(grateful_data)
# Additional customization for better legibility
ggraph(grateful_data, layout = "fr") +
geom_edge_arc(strength = 0.2, width = 0.5, alpha = 0.15) +
geom_node_point(aes(size = size, color = factor(color))) +
geom_node_text(aes(label = name, size = size), repel = TRUE) +
theme_void() +
theme(legend.position = "none")
There is a lot more to do, but this is a great start.
Citations:
Allan, Alex; Grateful Dead Lyric & Song Finder: https://whitegum.com/~acsa/intro.htm
ASCAP. 18 March 2022.
Dodd, David; The Annotated Grateful Dead Lyrics: http://artsites.ucsc.edu/gdead/agdl/
Schofield, Matt; The Grateful Dead Family Discography: http://www.deaddisc.com/
This information is intended for private research only, and not for any commercial use. Original Grateful Dead songs are copyright © Ice Nine Music.