Analyzing Brokerage in Networks Using R

Practicum Setup

Networks either explicitly, or implicitly represent the flow of resources, information, influence or similar factors through a set of entities. It naturally follows that an analyst may wish to track the potential flow of those factors through the network. It can be particularly helpful to identify the paths or entities that hold the potential to facilitate, impede, or otherwise broker the flow of such factors.

We will be using data from Padgett and Ansell’s work on the Medici family. To download the data, go to the Network Data link on the course website. You will find the edgelist data (Padgett.csv) and the attribute data (party.csv) there.

This part is important. Please be sure to do this part again

Create a folder labeled “Brokerage” someplace on your computer, such as your Desktop or wherever you will be able to easily find it again. Then, set your working directory to that folder in RStudio:

Session
- Set Working Directory
- Choose Directory…

Getting Started

Start by loading igraph.

library(igraph)

Next, load the data.

padgett <- read.csv(file.choose(), header=FALSE)
g <- graph.data.frame(padgett, directed=FALSE)

Then add in the political party attribute for each of the Florentine families.

att <- read.csv(file.choose(), header=TRUE) # Read in "party.csv"
                                            # That is the attribute file
V(g)$party <- as.character(att$party[match(V(g)$name,att$family)]) # Match the attribute by family

V(g)$color <- V(g)$party     # We will color nodes by party
V(g)$color <- gsub("Medici","gold",V(g)$color) # Medicis are gold
V(g)$color <- gsub("Oligarch","green",V(g)$color) # Oligarchs are green
V(g)$color <- gsub("Split","gray70",V(g)$color) # Split affiliation is gray

plot.igraph(g, layout=layout.fruchterman.reingold, 
            vertex.label.cex=0.75)

Cutpoints

Determining Cut Points

The process of finding and plotting cutpoints in a network is a multi-step process.

The code below runs through the series of interdependent steps using a “pipe” (%>%) operator. Brendan is fond of the efficiency that these provide. So, rather than disturb the elegance of the code below, we’ll just explain what you are seeing, how to use it, and how you can learn more about it.

All the pipe really does is imply a chain. Think of the %>% symbol as piping an object forward into a function. The result can then be piped further. If you want to read it aloud, substitute %>% with and then apply it to…“. So, you would read the code below as saying:”Take the igraph network g and then apply it to the function ‘articulation points’, and then apply the result to the function ‘as.list’, and then apply that result to the function ‘names’, … you get the point.

g %>%
  articulation_points() %>%
  as.list() %>%
  names() %>%
  as.data.frame() %>%
  `colnames<-`("Cut Points")

##   Cut Points
## 1   GUADAGNI
## 2    ALBIZZI
## 3     MEDICI
## 4   SALVIATI

The code above works succinctly. But, if you are interested in disaggregating it a little, the following code does the same thing:

ap <- articulation_points(g)
cp <- as.list(ap)
Cut.Points <- names(cp)
cuts <- as.data.frame(Cut.Points)

Either method is fine. It is a matter of personal preference.

To learn more about piping, and how it can be used, see the help section ?'%>%'

Plotting Cut Points

As above, here, we are using the %in% operator, which lets you work with collections of objects. For more information on this operator, see “Checking Collections of Objects with %in%”, in the webpage “Logical Functions in R”. You can find that page under “Resources for SNA in R”, on the left-side navigation panel of the course website.

V(g)$color = ifelse(V(g) %in%
                    articulation_points(g),   
                    "salmon", "lightblue")

plot(g, vertex.label.cex=.75, edge.width = 5)

Bicomponents

Bicomponents are the portions of a network that cannot be disconnected by the removal of a single edge.

bc <- biconnected_components(g)
summary(bc)

##                     Length Class     Mode   
## no                  1      -none-    numeric
## tree_edges          6      -none-    list   
## component_edges     6      -none-    list   
## components          6      -none-    list   
## articulation_points 4      igraph.vs numeric

The output tells us that there is one bicomponent (no) in this network. There are six edges that attach the pendants and other subcomponents to the bicomponent (tree_edges). Component-Edges are sets of edgelists that describe each of the six subcomponents in this network. The components are actually lists of the nodes that are in each of the components - thus, there are six vertex sets. The articulation_points are the nodes that would disconnect the network if they were removed. There are four articulation points.

The output above is just a count of how many of each type of item are present. To access each, try bc$comonent_edges to actually see - in this case - the edgelists, or bc$components[3] to see just the third edgelist in the list.

Given that we have just one bicomponent, we can plot it pretty easily.

Plotting Largest Component (the Bicomponent)

biconnected_components(g)$components

## [[1]]
## + 2/16 vertices, named, from ef71bd1:
## [1] LAMBERTES GUADAGNI 
## 
## [[2]]
## + 2/16 vertices, named, from ef71bd1:
## [1] GINORI  ALBIZZI
## 
## [[3]]
## + 10/16 vertices, named, from ef71bd1:
##  [1] TORNABUON RIDOLFI   STROZZI   CASTELLAN BARBADORI PERUZZI   BISCHERI 
##  [8] GUADAGNI  ALBIZZI   MEDICI   
## 
## [[4]]
## + 2/16 vertices, named, from ef71bd1:
## [1] SALVIATI PAZZI   
## 
## [[5]]
## + 2/16 vertices, named, from ef71bd1:
## [1] SALVIATI MEDICI  
## 
## [[6]]
## + 2/16 vertices, named, from ef71bd1:
## [1] MEDICI    ACCIAIUOL

largest_component <- lapply(biconnected_components(g)$components, 
                            length) %>% which.max()
# largest_component

V(g)$color <- ifelse(V(g) %in%
                    biconnected_components(g)$components[[largest_component]],
                    "salmon","lightblue")

plot(g)

Reach Centrality

Think of this as how influential each node can be. We can operationalize this as the neighborhood around the node two, or three steps out. That gives us a rough idea of the node’s neighborhood. But, if you would like to get a direct approximation of a node’s weak ties, subtract degree from reach.

You have already had experience with reach centrality. However, the version you see below is slightly modified. The version you used earlier provided a normalized version of the reach function. Below, you will generate a count of the number of neighbors at two steps out.

# Function for 2-step reach


reach2<-function(x){
    r=vector(length=vcount(x))
    for (i in 1:vcount(x)){
    n=neighborhood(x,2,nodes=i)
    ni=unlist(n)
    l=length(ni)
    r[i]=(l)}
    r}

# Function for 3-step reach
reach3<-function(x){
    r=vector(length=vcount(x))
    for (i in 1:vcount(x)){
    n=neighborhood(x,3,nodes=i)
    ni=unlist(n)
    l=length(ni)
    r[i]=(l)}
    r}
# 
# # Now, run the calculations.
Reach_2 <- reach2(g)    # Note the differences between the object
Reach_3 <- reach3(g)    #   names and the function names!

Weak Ties

To see how many weak ties each node has, we first need to calculate how many nodes are in each node’s neighborhood at two steps out (Reach2). Then, we need only subtract the number of nodes that are ajacent to the node (degree).

Reach_2 - degree(g)

## ACCIAIUOL   ALBIZZI BARBADORI  BISCHERI CASTELLAN  GUADAGNI    MEDICI     PAZZI 
##         6         8         8         6         4         6         6         2 
##   PERUZZI   RIDOLFI     PUCCI    GINORI   STROZZI LAMBERTES TORNABUON  SALVIATI 
##         4         9        -1         3         5         4         8         6

Edge Betweenness

To plot the betweenness of an edge, use the following script.

E(g)$width <- edge_betweenness(g)


g <- graph.data.frame(padgett, directed=FALSE)

plot.igraph(g,
            edge.width = igraph::edge.betweenness(g)+1,              # The "+1" was added to make edgewidths non-zero.
            edge.color = heat.colors(igraph::edge.betweenness(g)+1), # The "+1" was added to make edgewidths non-zero.
            vertex.shape="sphere",  # Here, we are using sphere because it looks cool.
            vertex.size=20,
            vertex.label.font=2,    # Here, we are using bold font.
            vertex.color="lightgreen")

## IGRAPH 3996ce2 UN-- 16 21 -- 
## + attr: name (v/c)
## + edges from 3996ce2 (vertex names):
##  [1] ACCIAIUOL--MEDICI    ALBIZZI  --GINORI    ALBIZZI  --GUADAGNI 
##  [4] ALBIZZI  --MEDICI    BARBADORI--CASTELLAN BARBADORI--MEDICI   
##  [7] BISCHERI --GUADAGNI  BISCHERI --PERUZZI   BISCHERI --STROZZI  
## [10] CASTELLAN--PERUZZI   CASTELLAN--STROZZI   GUADAGNI --LAMBERTES
## [13] GUADAGNI --TORNABUON MEDICI   --RIDOLFI   MEDICI   --SALVIATI 
## [16] MEDICI   --TORNABUON PAZZI    --SALVIATI  PERUZZI  --STROZZI  
## [19] RIDOLFI  --STROZZI   RIDOLFI  --TORNABUON PUCCI    --PUCCI

For a list of the edge betweenness values:

Edge.Betweenness <- as.data.frame(edge_betweenness(g, directed = FALSE))
el <- get.edgelist(g)
el <- as.data.frame(el)
el$EB <- Edge.Betweenness
el

##           V1        V2 edge_betweenness(g, directed = FALSE)
## 1  ACCIAIUOL    MEDICI                             14.000000
## 2    ALBIZZI    GINORI                             14.000000
## 3    ALBIZZI  GUADAGNI                             16.333333
## 4    ALBIZZI    MEDICI                             22.333333
## 5  BARBADORI CASTELLAN                             12.500000
## 6  BARBADORI    MEDICI                             18.500000
## 7   BISCHERI  GUADAGNI                             17.166667
## 8   BISCHERI   PERUZZI                              7.500000
## 9   BISCHERI   STROZZI                              8.333333
## 10 CASTELLAN   PERUZZI                              6.000000
## 11 CASTELLAN   STROZZI                              5.500000
## 12  GUADAGNI LAMBERTES                             14.000000
## 13  GUADAGNI TORNABUON                             12.833333
## 14    MEDICI   RIDOLFI                             15.333333
## 15    MEDICI  SALVIATI                             26.000000
## 16    MEDICI TORNABUON                             12.833333
## 17     PAZZI  SALVIATI                             14.000000
## 18   PERUZZI   STROZZI                              4.500000
## 19   RIDOLFI   STROZZI                             14.333333
## 20   RIDOLFI TORNABUON                              5.000000
## 21     PUCCI     PUCCI                              0.000000

Burt’s Constraint

Constraint is based on each node’s egonetwork and it describes the limitations in brokerage potential that are inherent to having few or redundant ties. If a node only has access to a few alters, or if the ties that a node maintains are shared with others in their egonetwork, then it follows that they will have very little ability to access resources or information that others in their network don’t already have.

Keeping in mind that constraint varies between zero (0) and 1.125, we need only subtract a node’s constraint value from the maximum possible value of constraint to reflect their freedom to broker information or resources that are not already shared by others in their egonetwork.

const <- constraint(g)
invConstraint <- 1.125 - const  # (Inverse constraint = brokerage potential)
const

## ACCIAIUOL   ALBIZZI BARBADORI  BISCHERI CASTELLAN  GUADAGNI    MEDICI     PAZZI 
## 1.0000000 0.3333333 0.5000000 0.4822531 0.4822531 0.2500000 0.2098765 1.0000000 
##   PERUZZI   RIDOLFI     PUCCI    GINORI   STROZZI LAMBERTES TORNABUON  SALVIATI 
## 0.6558642 0.4598765 0.0000000 1.0000000 0.4583333 1.0000000 0.4598765 0.5000000

invConstraint # (Brokerage Potential)

## ACCIAIUOL   ALBIZZI BARBADORI  BISCHERI CASTELLAN  GUADAGNI    MEDICI     PAZZI 
## 0.1250000 0.7916667 0.6250000 0.6427469 0.6427469 0.8750000 0.9151235 0.1250000 
##   PERUZZI   RIDOLFI     PUCCI    GINORI   STROZZI LAMBERTES TORNABUON  SALVIATI 
## 0.4691358 0.6651235 1.1250000 0.1250000 0.6666667 0.1250000 0.6651235 0.6250000

Note the Pucci family. They have no ties at all. Isolates like the Pucci family in this network have an undefined constraint value and the constraint() function returns a value of zero (0). For that reason, when we convert constraint to inverse constraint (brokerage potential), isolates will appear to have the maximum possible brokerage potential. It is, therefore, recommended to remove isolates before running this analysis when isolates are present in a network.

Brokerage Roles

Igraph does not include a “brokerage roles” function. You will, therefore have to make use of the sna package in statnet.

As you may realize at this point, igraph and statnet do not always play well together. Each has a different style of data storage for saving networks. On the bright side, you are not the first to have this problem. The intergraph package was created to allow you to convert igraph networks into networks that work with statnet, and visa versa.

If you haven’t done so already, install intergraph to convert the network.

install.packages("intergraph", dependencies = TRUE)

Now, you can load the intergraph package and convert the igraph network that you have been using up to now.

library(intergraph)

We can convert an igraph network with an attribute to an statnet object. But first, add the “Party” attribute.

Now, convert the network into a statnet object.

net <- asNetwork(g)  # Convert igraph network into an sna object

Once the network has been converted, if you have not done so already, start statnet and run brokerage roles.

library(statnet)

The brokerage function in sna/statnet produces a lot of information. The only part of that information that we want in this case is the table that lists the number of times each node fulfills a particular type of brokerage role. For whatever reason, that table is called “raw.nli”.

The brokerage function in sna requires just two inputs: the name of the network (formatted for the sna package); and information about where to find the attribute (denoted as cl).

In the code below, we extract the attribute using get.vertex.attribute(net, "party") and use it for the cl argument. On the outside of the argument parentheses, we include $raw.nli to identify just the part of the output that we want.

brokerage(net, cl=get.vertex.attribute(net, "party"))$raw.nli

##           w_I w_O b_IO b_OI b_O  t
## ACCIAIUOL   0   0    0    0   0  0
## ALBIZZI     0   2    0    0   4  6
## BARBADORI   0   0    0    0   2  2
## BISCHERI    4   0    0    0   0  4
## CASTELLAN   0   0    2    2   0  4
## GUADAGNI    2   0    4    4   2 12
## MEDICI      4   6    9    9   0 28
## PAZZI       0   0    0    0   0  0
## PERUZZI     2   0    0    0   0  2
## RIDOLFI     0   0    2    2   0  4
## PUCCI       0   0    0    0   0  0
## GINORI      0   0    0    0   0  0
## STROZZI     2   0    3    3   0  8
## LAMBERTES   0   0    0    0   0  0
## TORNABUON   0   0    2    2   0  4
## SALVIATI    0   2    0    0   0  2

What we see above is the author’s shorthand for the various brokerage roles. For an overview of these roles, check out the help section for this function (?brokerage).

Briefly, you may interpret these roles as:

w_I is a coordinator role (A -> A -> A)
w_O is an “itinerant broker” (or a Consultant) (A -> B -> A)
b_{IO} is a representative (A -> B -> B)
b_{OI} is a gatekeeper (A -> A -> B)
b_O is a liaison (A -> B -> C)
t is simply the total number of times a node fills any of the above roles.

If you run the function without $raw.nli appended to the end, you will see that it produces fourteen different forms of output. Because there are so many output options in the brokerage algorithm - that you are free to explore as your needs change - it is worth mentioning that you can also produce a normalized score that will give the magnitude of the differences between nodes, rather than the raw number of times. Use this approach if you prefer to simplify the table by displaying how the nodes differ by order of magnitude.

The brokerage function does this by providing normalized output that is scaled on the z distribution, referred to “z scores.” Z scores are calculated by comparing each number to the average for the distribution and dividing by the standard deviation. \[ \begin{equation} z=\frac{x - \bar{x}} s \end{equation} \]

What this produces is a scale, where the center (or what is typical of the data) is set to zero (0), and the standard deviation is one (1). You will therefore, see both negative and positive values. To get an idea of what this gives you, consider that anything greater than 1.96 or less than -1.96 is significantly different from the “typical” at the p=0.05 level of significance. So, at a glance, you can see which nodes stand out as being either significantly greater or significantly less than the average for the network.

To produce the normalized scores, add z.nli to the function. It is also a good idea to use the round() function to reduce the number of digits and make the output more legible. An example of each is listed, but not run, below.

# normalized scores
brokerage(net, cl=get.vertex.attribute(net, "party"))$z.nli   
        # Not run in this page

# Normalized, rounded to 2 digits
round(brokerage(net, cl=get.vertex.attribute(net, "party"))$z.nli, 2)

##             w_I   w_O  b_IO  b_OI   b_O     t
## ACCIAIUOL -0.62 -0.70 -0.72 -0.72 -0.70 -1.12
## ALBIZZI   -0.22  0.14 -0.50 -0.50  0.92  0.20
## BARBADORI -0.22 -0.92 -0.50 -0.50 -0.02 -0.68
## BISCHERI   3.90 -0.76 -0.70 -0.70 -0.73 -0.24
## CASTELLAN -0.54 -0.76  0.45  0.45 -0.73 -0.24
## GUADAGNI   1.68 -0.76  1.60  1.60  0.71  1.53
## MEDICI     2.78  4.10  4.25  4.25 -0.70  5.06
## PAZZI     -0.62 -0.70 -0.72 -0.72 -0.70 -1.12
## PERUZZI    1.68 -0.76 -0.70 -0.70 -0.73 -0.68
## RIDOLFI   -0.62 -0.70  0.38  0.38 -0.70 -0.24
## PUCCI     -0.62 -0.70 -0.72 -0.72 -0.70 -1.12
## GINORI    -0.62 -0.70 -0.72 -0.72 -0.70 -1.12
## STROZZI    1.68 -0.76  1.03  1.03 -0.73  0.64
## LAMBERTES -0.54 -0.76 -0.70 -0.70 -0.73 -1.12
## TORNABUON -0.62 -0.70  0.38  0.38 -0.70 -0.24
## SALVIATI  -0.22  0.14 -0.50 -0.50 -0.95 -0.68

Note that the Medicis stand out as significantly different from all others in the network in almost all of the various roles.