Networks either explicitly, or implicitly represent the flow of resources, information, influence or similar factors through a set of entities. It naturally follows that an analyst may wish to track the potential flow of those factors through the network. It can be particularly helpful to identify the paths or entities that hold the potential to facilitate, impede, or otherwise broker the flow of such factors.
We will be using data from Padgett and Ansell’s work on the Medici family. To download the data, go to the Network Data link on the course website. You will find the edgelist data (Padgett.csv) and the attribute data (party.csv) there.
Create a folder labeled “Brokerage” someplace on your computer, such as your Desktop or wherever you will be able to easily find it again. Then, set your working directory to that folder in RStudio:
Start by loading igraph.
library(igraph)
Next, load the data.
padgett <- read.csv(file.choose(), header=FALSE)
g <- graph.data.frame(padgett, directed=FALSE)
Then add in the political party attribute for each of the Florentine families.
att <- read.csv(file.choose(), header=TRUE) # Read in "party.csv"
# That is the attribute file
V(g)$party <- as.character(att$party[match(V(g)$name,att$family)]) # Match the attribute by family
V(g)$color <- V(g)$party # We will color nodes by party
V(g)$color <- gsub("Medici","gold",V(g)$color) # Medicis are gold
V(g)$color <- gsub("Oligarch","green",V(g)$color) # Oligarchs are green
V(g)$color <- gsub("Split","gray70",V(g)$color) # Split affiliation is gray
plot.igraph(g, layout=layout.fruchterman.reingold,
vertex.label.cex=0.75)
The process of finding and plotting cutpoints in a network is a multi-step process.
The code below runs through the series of interdependent steps using a “pipe” (%>%) operator. Brendan is fond of the efficiency that these provide. So, rather than disturb the elegance of the code below, we’ll just explain what you are seeing, how to use it, and how you can learn more about it.
All the pipe really does is imply a chain. Think of the %>% symbol as piping an object forward into a function. The result can then be piped further. If you want to read it aloud, substitute %>% with and then apply it to…“. So, you would read the code below as saying:”Take the igraph network g and then apply it to the function ‘articulation points’, and then apply the result to the function ‘as.list’, and then apply that result to the function ‘names’, … you get the point.
g %>%
articulation_points() %>%
as.list() %>%
names() %>%
as.data.frame() %>%
`colnames<-`("Cut Points")
## Cut Points
## 1 GUADAGNI
## 2 ALBIZZI
## 3 MEDICI
## 4 SALVIATI
The code above works succinctly. But, if you are interested in disaggregating it a little, the following code does the same thing:
ap <- articulation_points(g)
cp <- as.list(ap)
Cut.Points <- names(cp)
cuts <- as.data.frame(Cut.Points)
Either method is fine. It is a matter of personal preference.
To learn more about piping, and how it can be used, see the help section ?'%>%'
As above, here, we are using the %in% operator, which lets you work with collections of objects. For more information on this operator, see “Checking Collections of Objects with %in%”, in the webpage “Logical Functions in R”. You can find that page under “Resources for SNA in R”, on the left-side navigation panel of the course website.
V(g)$color = ifelse(V(g) %in%
articulation_points(g),
"salmon", "lightblue")
plot(g, vertex.label.cex=.75, edge.width = 5)
Bicomponents are the portions of a network that cannot be disconnected by the removal of a single edge.
bc <- biconnected_components(g)
summary(bc)
## Length Class Mode
## no 1 -none- numeric
## tree_edges 6 -none- list
## component_edges 6 -none- list
## components 6 -none- list
## articulation_points 4 igraph.vs numeric
The output tells us that there is one bicomponent (no) in this network. There are six edges that attach the pendants and other subcomponents to the bicomponent (tree_edges). Component-Edges are sets of edgelists that describe each of the six subcomponents in this network. The components are actually lists of the nodes that are in each of the components - thus, there are six vertex sets. The articulation_points are the nodes that would disconnect the network if they were removed. There are four articulation points.
The output above is just a count of how many of each type of item are present. To access each, try bc$comonent_edges to actually see - in this case - the edgelists, or bc$components[3] to see just the third edgelist in the list.
Given that we have just one bicomponent, we can plot it pretty easily.
biconnected_components(g)$components
## [[1]]
## + 2/16 vertices, named, from ef71bd1:
## [1] LAMBERTES GUADAGNI
##
## [[2]]
## + 2/16 vertices, named, from ef71bd1:
## [1] GINORI ALBIZZI
##
## [[3]]
## + 10/16 vertices, named, from ef71bd1:
## [1] TORNABUON RIDOLFI STROZZI CASTELLAN BARBADORI PERUZZI BISCHERI
## [8] GUADAGNI ALBIZZI MEDICI
##
## [[4]]
## + 2/16 vertices, named, from ef71bd1:
## [1] SALVIATI PAZZI
##
## [[5]]
## + 2/16 vertices, named, from ef71bd1:
## [1] SALVIATI MEDICI
##
## [[6]]
## + 2/16 vertices, named, from ef71bd1:
## [1] MEDICI ACCIAIUOL
largest_component <- lapply(biconnected_components(g)$components,
length) %>% which.max()
# largest_component
V(g)$color <- ifelse(V(g) %in%
biconnected_components(g)$components[[largest_component]],
"salmon","lightblue")
plot(g)
Think of this as how influential each node can be. We can operationalize this as the neighborhood around the node two, or three steps out. That gives us a rough idea of the node’s neighborhood. But, if you would like to get a direct approximation of a node’s weak ties, subtract degree from reach.
You have already had experience with reach centrality. However, the version you see below is slightly modified. The version you used earlier provided a normalized version of the reach function. Below, you will generate a count of the number of neighbors at two steps out.
# Function for 2-step reach
reach2<-function(x){
r=vector(length=vcount(x))
for (i in 1:vcount(x)){
n=neighborhood(x,2,nodes=i)
ni=unlist(n)
l=length(ni)
r[i]=(l)}
r}
# Function for 3-step reach
reach3<-function(x){
r=vector(length=vcount(x))
for (i in 1:vcount(x)){
n=neighborhood(x,3,nodes=i)
ni=unlist(n)
l=length(ni)
r[i]=(l)}
r}
#
# # Now, run the calculations.
Reach_2 <- reach2(g) # Note the differences between the object
Reach_3 <- reach3(g) # names and the function names!
To see how many weak ties each node has, we first need to calculate how many nodes are in each node’s neighborhood at two steps out (Reach2). Then, we need only subtract the number of nodes that are ajacent to the node (degree).
Reach_2 - degree(g)
## ACCIAIUOL ALBIZZI BARBADORI BISCHERI CASTELLAN GUADAGNI MEDICI PAZZI
## 6 8 8 6 4 6 6 2
## PERUZZI RIDOLFI PUCCI GINORI STROZZI LAMBERTES TORNABUON SALVIATI
## 4 9 -1 3 5 4 8 6
To plot the betweenness of an edge, use the following script.
E(g)$width <- edge_betweenness(g)
g <- graph.data.frame(padgett, directed=FALSE)
plot.igraph(g,
edge.width = igraph::edge.betweenness(g)+1, # The "+1" was added to make edgewidths non-zero.
edge.color = heat.colors(igraph::edge.betweenness(g)+1), # The "+1" was added to make edgewidths non-zero.
vertex.shape="sphere", # Here, we are using sphere because it looks cool.
vertex.size=20,
vertex.label.font=2, # Here, we are using bold font.
vertex.color="lightgreen")
g
## IGRAPH 3996ce2 UN-- 16 21 --
## + attr: name (v/c)
## + edges from 3996ce2 (vertex names):
## [1] ACCIAIUOL--MEDICI ALBIZZI --GINORI ALBIZZI --GUADAGNI
## [4] ALBIZZI --MEDICI BARBADORI--CASTELLAN BARBADORI--MEDICI
## [7] BISCHERI --GUADAGNI BISCHERI --PERUZZI BISCHERI --STROZZI
## [10] CASTELLAN--PERUZZI CASTELLAN--STROZZI GUADAGNI --LAMBERTES
## [13] GUADAGNI --TORNABUON MEDICI --RIDOLFI MEDICI --SALVIATI
## [16] MEDICI --TORNABUON PAZZI --SALVIATI PERUZZI --STROZZI
## [19] RIDOLFI --STROZZI RIDOLFI --TORNABUON PUCCI --PUCCI
For a list of the edge betweenness values:
Edge.Betweenness <- as.data.frame(edge_betweenness(g, directed = FALSE))
el <- get.edgelist(g)
el <- as.data.frame(el)
el$EB <- Edge.Betweenness
el
## V1 V2 edge_betweenness(g, directed = FALSE)
## 1 ACCIAIUOL MEDICI 14.000000
## 2 ALBIZZI GINORI 14.000000
## 3 ALBIZZI GUADAGNI 16.333333
## 4 ALBIZZI MEDICI 22.333333
## 5 BARBADORI CASTELLAN 12.500000
## 6 BARBADORI MEDICI 18.500000
## 7 BISCHERI GUADAGNI 17.166667
## 8 BISCHERI PERUZZI 7.500000
## 9 BISCHERI STROZZI 8.333333
## 10 CASTELLAN PERUZZI 6.000000
## 11 CASTELLAN STROZZI 5.500000
## 12 GUADAGNI LAMBERTES 14.000000
## 13 GUADAGNI TORNABUON 12.833333
## 14 MEDICI RIDOLFI 15.333333
## 15 MEDICI SALVIATI 26.000000
## 16 MEDICI TORNABUON 12.833333
## 17 PAZZI SALVIATI 14.000000
## 18 PERUZZI STROZZI 4.500000
## 19 RIDOLFI STROZZI 14.333333
## 20 RIDOLFI TORNABUON 5.000000
## 21 PUCCI PUCCI 0.000000
Constraint is based on each node’s egonetwork and it describes the limitations in brokerage potential that are inherent to having few or redundant ties. If a node only has access to a few alters, or if the ties that a node maintains are shared with others in their egonetwork, then it follows that they will have very little ability to access resources or information that others in their network don’t already have.
Keeping in mind that constraint varies between zero (0) and 1.125, we need only subtract a node’s constraint value from the maximum possible value of constraint to reflect their freedom to broker information or resources that are not already shared by others in their egonetwork.
const <- constraint(g)
invConstraint <- 1.125 - const # (Inverse constraint = brokerage potential)
const
## ACCIAIUOL ALBIZZI BARBADORI BISCHERI CASTELLAN GUADAGNI MEDICI PAZZI
## 1.0000000 0.3333333 0.5000000 0.4822531 0.4822531 0.2500000 0.2098765 1.0000000
## PERUZZI RIDOLFI PUCCI GINORI STROZZI LAMBERTES TORNABUON SALVIATI
## 0.6558642 0.4598765 0.0000000 1.0000000 0.4583333 1.0000000 0.4598765 0.5000000
invConstraint # (Brokerage Potential)
## ACCIAIUOL ALBIZZI BARBADORI BISCHERI CASTELLAN GUADAGNI MEDICI PAZZI
## 0.1250000 0.7916667 0.6250000 0.6427469 0.6427469 0.8750000 0.9151235 0.1250000
## PERUZZI RIDOLFI PUCCI GINORI STROZZI LAMBERTES TORNABUON SALVIATI
## 0.4691358 0.6651235 1.1250000 0.1250000 0.6666667 0.1250000 0.6651235 0.6250000
Note the Pucci family. They have no ties at all. Isolates like the Pucci family in this network have an undefined constraint value and the constraint() function returns a value of zero (0). For that reason, when we convert constraint to inverse constraint (brokerage potential), isolates will appear to have the maximum possible brokerage potential. It is, therefore, recommended to remove isolates before running this analysis when isolates are present in a network.
Igraph does not include a “brokerage roles” function. You will, therefore have to make use of the sna package in statnet.
As you may realize at this point, igraph and statnet do not always play well together. Each has a different style of data storage for saving networks. On the bright side, you are not the first to have this problem. The intergraph package was created to allow you to convert igraph networks into networks that work with statnet, and visa versa.
If you haven’t done so already, install intergraph to convert the network.
install.packages("intergraph", dependencies = TRUE)
Now, you can load the intergraph package and convert the igraph network that you have been using up to now.
library(intergraph)
We can convert an igraph network with an attribute to an statnet object. But first, add the “Party” attribute.
Now, convert the network into a statnet object.
net <- asNetwork(g) # Convert igraph network into an sna object
Once the network has been converted, if you have not done so already, start statnet and run brokerage roles.
library(statnet)
The brokerage function in sna/statnet produces a lot of information. The only part of that information that we want in this case is the table that lists the number of times each node fulfills a particular type of brokerage role. For whatever reason, that table is called “raw.nli”.
The brokerage function in sna requires just two inputs: the name of the network (formatted for the sna package); and information about where to find the attribute (denoted as cl).
In the code below, we extract the attribute using get.vertex.attribute(net, "party") and use it for the cl argument. On the outside of the argument parentheses, we include $raw.nli to identify just the part of the output that we want.
brokerage(net, cl=get.vertex.attribute(net, "party"))$raw.nli
## w_I w_O b_IO b_OI b_O t
## ACCIAIUOL 0 0 0 0 0 0
## ALBIZZI 0 2 0 0 4 6
## BARBADORI 0 0 0 0 2 2
## BISCHERI 4 0 0 0 0 4
## CASTELLAN 0 0 2 2 0 4
## GUADAGNI 2 0 4 4 2 12
## MEDICI 4 6 9 9 0 28
## PAZZI 0 0 0 0 0 0
## PERUZZI 2 0 0 0 0 2
## RIDOLFI 0 0 2 2 0 4
## PUCCI 0 0 0 0 0 0
## GINORI 0 0 0 0 0 0
## STROZZI 2 0 3 3 0 8
## LAMBERTES 0 0 0 0 0 0
## TORNABUON 0 0 2 2 0 4
## SALVIATI 0 2 0 0 0 2
What we see above is the author’s shorthand for the various brokerage roles. For an overview of these roles, check out the help section for this function (?brokerage).
Briefly, you may interpret these roles as:
If you run the function without $raw.nli appended to the end, you will see that it produces fourteen different forms of output. Because there are so many output options in the brokerage algorithm - that you are free to explore as your needs change - it is worth mentioning that you can also produce a normalized score that will give the magnitude of the differences between nodes, rather than the raw number of times. Use this approach if you prefer to simplify the table by displaying how the nodes differ by order of magnitude.
The brokerage function does this by providing normalized output that is scaled on the z distribution, referred to “z scores.” Z scores are calculated by comparing each number to the average for the distribution and dividing by the standard deviation. \[ \begin{equation} z=\frac{x - \bar{x}} s \end{equation} \]
What this produces is a scale, where the center (or what is typical of the data) is set to zero (0), and the standard deviation is one (1). You will therefore, see both negative and positive values. To get an idea of what this gives you, consider that anything greater than 1.96 or less than -1.96 is significantly different from the “typical” at the p=0.05 level of significance. So, at a glance, you can see which nodes stand out as being either significantly greater or significantly less than the average for the network.
To produce the normalized scores, add z.nli to the function. It is also a good idea to use the round() function to reduce the number of digits and make the output more legible. An example of each is listed, but not run, below.
# normalized scores
brokerage(net, cl=get.vertex.attribute(net, "party"))$z.nli
# Not run in this page
# Normalized, rounded to 2 digits
round(brokerage(net, cl=get.vertex.attribute(net, "party"))$z.nli, 2)
## w_I w_O b_IO b_OI b_O t
## ACCIAIUOL -0.62 -0.70 -0.72 -0.72 -0.70 -1.12
## ALBIZZI -0.22 0.14 -0.50 -0.50 0.92 0.20
## BARBADORI -0.22 -0.92 -0.50 -0.50 -0.02 -0.68
## BISCHERI 3.90 -0.76 -0.70 -0.70 -0.73 -0.24
## CASTELLAN -0.54 -0.76 0.45 0.45 -0.73 -0.24
## GUADAGNI 1.68 -0.76 1.60 1.60 0.71 1.53
## MEDICI 2.78 4.10 4.25 4.25 -0.70 5.06
## PAZZI -0.62 -0.70 -0.72 -0.72 -0.70 -1.12
## PERUZZI 1.68 -0.76 -0.70 -0.70 -0.73 -0.68
## RIDOLFI -0.62 -0.70 0.38 0.38 -0.70 -0.24
## PUCCI -0.62 -0.70 -0.72 -0.72 -0.70 -1.12
## GINORI -0.62 -0.70 -0.72 -0.72 -0.70 -1.12
## STROZZI 1.68 -0.76 1.03 1.03 -0.73 0.64
## LAMBERTES -0.54 -0.76 -0.70 -0.70 -0.73 -1.12
## TORNABUON -0.62 -0.70 0.38 0.38 -0.70 -0.24
## SALVIATI -0.22 0.14 -0.50 -0.50 -0.95 -0.68
Note that the Medicis stand out as significantly different from all others in the network in almost all of the various roles.