Practicum 8 - Structural Equavelence

Practicum Setup

Networks either explicitly, or implicitly represent the flow of resources, information, influence or similar factors through a set of entities. It naturally follows that an analyst may wish to track the potential flow of those factors through the network. It can be particularly helpful to identify the paths or entities that hold the potential to facilitate, impede, or otherwise broker the flow of such factors.

This part is important. Please be sure to do this part again

Create a folder labeled “Practicum 8” someplace on your computer, such as your Desktop or wherever you will be able to easily find it again. Then, set your working directory to that folder in RStudio:

Session
- Set Working Directory
- Choose Directory…

Getting Started

This time, we will be using two new packages - blockmodeling and concoR - in addition to statnet. Although igraph does not offer a great deal of analyses for structrual equivalence, it is exceptionally easy to use, and will plot networks faster and more easily than statnet.

We’ll, therefore, begin by loading igraph and statnet, and then go on to working with the other packages.

library(statnet)
library(igraph)

Next, load the data.

For the purpose of example, we will be using data from Padgett and Ansell’s work on the Medici family. To download the data, go to the Network Data link on the course website: https://goo.gl/3rkUK4. You will find the edgelist data (Padgett.csv) and the attribute data (mediciNames.csv) in at least one of the folders there.

For your own purposes, we strongly suggest that you try out any of the other networks that we have tried so far this semester. We are using Padgett’s Medici network here for two reasons: (1) you know it well at this point; and (2) is is a small enough network to be able visualize it well and clearly on a website.

el <- read.csv(file.choose(), header=FALSE)
g <- graph.data.frame(el, directed=FALSE) # load igraph data
net<-network(el, directed=FALSE, matrix.type="edgelist") # load statnet data

For now, our goals are modest. The topics to be covered in this module are:

Structural Equivalence
- Euclidian Distance
- CONCOR
- Blockmodel Optimization
Automorphic Equivalence (Both of the following options are the same as the ones above.)
- Blockmodel Optimization
- Euclidian Distance
Regular Equivalence
- REGE
- CATREGE
- Regular Equivalence Optimization

Structural Equivalence Methods

The first thing to keep in mind about structural equivalence is that your espectations of finding perfectly equivalent vertices should be low. Perfectly substitutable vertices are rare. So, we tend to mitigate our expectations and seek more roughly equivalent vertices.

CONCOR!

I am excited about this, since I did not realize that the program had been implemented in R. Sadly, however, it took me a few hours to get it working well enough to pass this on to you. Still, concoR is a pretty direct method for calculating structural equivalence.

The original CONCOR algorithm was developed by Ron Breiger, Scott Boorman, and Phipps Arabie. If you are interested, you can check out their original (1975) paper: “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling. Journal of Mathematical Psychology, 12: 328–383. The original version was written in Fortran. Since then, Adam Slez has rewritten the program in R. You can find out more by checking his website: Bad Hessian.

Although it was developed with structural equivalence in mind, we tend to use CONCOR for equivalence in general, since we rarely expect to see true structrual equivalence in a network.

Because Adam Slez has not committed his concoR package to CRAN, we will have to install it from his github site. You will only have to do this once.

First, however, you will have to install devtools is you have not done so already.

install.packages("devtools", dependencies = TRUE)

# Load devtools
library(devtools)

# Install concoR
devtools::install_github("aslez/concoR")

You are now ready to use concoR.

We will be using concoR with both, statnet, and igraph. We will use statnet to draw the block diagram, and we’ll use igraph to draw the network (because it is easier in igraph). So, pay careful attention to where we are using g (for igraph network data) and net (for statnet network data). It matters.

CONCOR requires a matrix, or stack of matrices to make its calculations. So, start by loading concoR and extracting a matrix from an igraph network.

# load CONCOR
library(concoR)

# Convert the network to a matrix
mat <- as.matrix(get.adjacency(g))

If you like, take a look at how the various nodes are initially correlated. (Though, you can skip this step if you prefer. It won’t change the outcome in any way.)

m0 <- cor(mat) # Calculate a correlation matrix
round(m0, 2)   # Round the output to 2 significant digits

##           ACCIAIUOL ALBIZZI BARBADORI BISCHERI CASTELLAN GUADAGNI MEDICI
## ACCIAIUOL      1.00    0.54      0.68    -0.12     -0.12    -0.15  -0.20
## ALBIZZI        0.54    1.00      0.30     0.18     -0.23    -0.28  -0.37
## BARBADORI      0.68    0.30      1.00    -0.18     -0.18    -0.22  -0.29
## BISCHERI      -0.12    0.18     -0.18     1.00      0.59    -0.28  -0.37
## CASTELLAN     -0.12   -0.23     -0.18     0.59      1.00    -0.28  -0.04
## GUADAGNI      -0.15   -0.28     -0.22    -0.28     -0.28     1.00   0.15
## MEDICI        -0.20   -0.37     -0.29    -0.37     -0.04     0.15   1.00
## PAZZI         -0.07   -0.12     -0.10    -0.12     -0.12    -0.15   0.33
## PERUZZI       -0.12   -0.23      0.30     0.18      0.18     0.09  -0.37
## RIDOLFI        0.54    0.18      0.30     0.18      0.18     0.09  -0.04
## PUCCI         -0.07   -0.12     -0.10    -0.12     -0.12    -0.15  -0.20
## GINORI        -0.07   -0.12     -0.10    -0.12     -0.12     0.45   0.33
## STROZZI       -0.15   -0.28      0.22     0.09      0.09     0.00  -0.15
## LAMBERTES     -0.07    0.54     -0.10     0.54     -0.12    -0.15  -0.20
## TORNABUON      0.54    0.59      0.30     0.18     -0.23    -0.28  -0.04
## SALVIATI       0.68    0.30      0.43    -0.18     -0.18    -0.22  -0.29
##           PAZZI PERUZZI RIDOLFI PUCCI GINORI STROZZI LAMBERTES TORNABUON
## ACCIAIUOL -0.07   -0.12    0.54 -0.07  -0.07   -0.15     -0.07      0.54
## ALBIZZI   -0.12   -0.23    0.18 -0.12  -0.12   -0.28      0.54      0.59
## BARBADORI -0.10    0.30    0.30 -0.10  -0.10    0.22     -0.10      0.30
## BISCHERI  -0.12    0.18    0.18 -0.12  -0.12    0.09      0.54      0.18
## CASTELLAN -0.12    0.18    0.18 -0.12  -0.12    0.09     -0.12     -0.23
## GUADAGNI  -0.15    0.09    0.09 -0.15   0.45    0.00     -0.15     -0.28
## MEDICI     0.33   -0.37   -0.04 -0.20   0.33   -0.15     -0.20     -0.04
## PAZZI      1.00   -0.12   -0.12 -0.07  -0.07   -0.15     -0.07     -0.12
## PERUZZI   -0.12    1.00    0.18 -0.12  -0.12    0.46     -0.12     -0.23
## RIDOLFI   -0.12    0.18    1.00 -0.12  -0.12   -0.28     -0.12      0.18
## PUCCI     -0.07   -0.12   -0.12  1.00  -0.07   -0.15     -0.07     -0.12
## GINORI    -0.07   -0.12   -0.12 -0.07   1.00   -0.15     -0.07     -0.12
## STROZZI   -0.15    0.46   -0.28 -0.15  -0.15    1.00     -0.15      0.09
## LAMBERTES -0.07   -0.12   -0.12 -0.07  -0.07   -0.15      1.00      0.54
## TORNABUON -0.12   -0.23    0.18 -0.12  -0.12    0.09      0.54      1.00
## SALVIATI  -0.10   -0.18    0.30 -0.10  -0.10   -0.22     -0.10      0.30
##           SALVIATI
## ACCIAIUOL     0.68
## ALBIZZI       0.30
## BARBADORI     0.43
## BISCHERI     -0.18
## CASTELLAN    -0.18
## GUADAGNI     -0.22
## MEDICI       -0.29
## PAZZI        -0.10
## PERUZZI      -0.18
## RIDOLFI       0.30
## PUCCI        -0.10
## GINORI       -0.10
## STROZZI      -0.22
## LAMBERTES    -0.10
## TORNABUON     0.30
## SALVIATI      1.00

The more highly correlated nodes should already be somewhat apparent. But, you will have to sort through a lot of correlations to work out which nodes belong in which equivalence classes if you stop at this point.

Next, run the concoR algorithm to identify the various structrual equivalence “blocks.”

Functional note: The list() command in concor_hca is necessary if you are using only one matrix. The program was developed to work with arrays (lists of matrices), so it doesn’t play well with single matrices without this command.

blks <- concor_hca(list(mat), p = 2) 
blks

##    block    vertex
## 1      1 ACCIAIUOL
## 5      2   ALBIZZI
## 2      1 BARBADORI
## 6      2  BISCHERI
## 9      3 CASTELLAN
## 13     4  GUADAGNI
## 14     4    MEDICI
## 15     4     PAZZI
## 10     3   PERUZZI
## 3      1   RIDOLFI
## 11     3     PUCCI
## 16     4    GINORI
## 12     3   STROZZI
## 7      2 LAMBERTES
## 8      2 TORNABUON
## 4      1  SALVIATI

The output gives the vertex names, as well as the “blocks” or classes that each vertex was classified into. We can visualize this information in one of two ways: as a block matrix, or as a network visualization. We’ll do each below.

First, we can plot the network in statnet, using the blockmodel function. Note: We are using network data that was formatted for statnet here. (net)

# Visualize using "blockmodel" function in statnet
#    NOTE: use statnet "net" data
blk_mod <- blockmodel(net, blks$block) 
blk_mod

## 
## Network Blockmodel:
## 
## Block membership:
## 
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 
##  1  2  1  2  3  4  4  4  3  1  3  4  3  2  2  1 
## 
## Reduced form blockmodel:
## 
##   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 
##         Block 1   Block 2   Block 3   Block 4
## Block 1  0.0000 0.0625000 0.3125000 0.0625000
## Block 2  0.0625 0.1666667 0.3750000 0.1875000
## Block 3  0.3125 0.3750000 0.3333333 0.0000000
## Block 4  0.0625 0.1875000 0.0000000 0.1666667

The “reduced form blockmodel” presents the density of each block in the matrix. Here, you can see that most of the relatively dense blocks are off the diagonal. These are peripheral roles (blocks 1 & 2) - linked to the core (block 3). Block 4 is a core without much of a periphery.

Plot the blockmodel representation for a better idea of how the vertices are ordered. We cannot fit the vertex names onto the block matrix without them running together, so we’ll use vertex numbers instead for visual clarity.

plot(blk_mod)

Next, we can visualize using igraph, since you are more used to visualizing in igraph at this point.

# Add the block designations to igraph data
V(g)$blocks <- blks$block

plot.igraph(g, vertex.color=V(g)$blocks) # plot in igraph

CONCOR tends to be - literally - categorical in its assignments, even when such a thing is not necessarily advisable. After all, the applicaiton of structural equivalence to a real world network is more of an exercise in mitigating ambiguity. We, therefore, have other ways of applying the concept to a network. One such tool is the use of blockmodels, which produce output like the block diagram above.

Optimization

Aleš Žiberna, a student of Vlado Batagelj, has written the blockmodeling package on R. You will need to install the package, but it is well worth the time (and cost!).

install.packages("blockmodeling", dependencies=TRUE)

The “optimization” approach is where you assign some number of random partitions and require the algorithm to re-sort the network to a point where the various blocks contain a best fit to the network. There are a number of possible options for soting the network under the approach command. Here, we use the sum of squares methods. Also, the blocks command asks the algorithm to find “complete” blocks (all 1s, and no 0s) if possible. Try it with and without this.

For more - and much better - information on your options with this algorithm, and generalized blockmodeling in general, check out his publicaiton on the topic: ŽIBERNA, Aleš (2007): Generalized Blockmodeling of Valued Networks. Social Networks, Jan. 2007, vol. 29, no. 1, 105-126.

library(blockmodeling)

mat <- as.matrix(get.adjacency(g)) # Extract the matrix in igraph (if you haven't already)

    # Try a two block partition.
class2 <- opt.random.par(M=mat, k=2, rep=10, approach="ss", blocks="com")
    # Tru a four block partition
class4 <- opt.random.par(M=mat, k=4, rep=10, approach="ss", blocks="com")

We can plot these side-by-side to compare them.

par(mfrow=c(1,2)) # set the plot window for one row and two columns
plot(class2, main="")
  title("Two Block Partition")
plot(class4, main="") 
  title("Four Block Partition")
par(mfrow=c(1,1)) # reset the plot window back to one row and one column

Explore the output a little.

# See what is packed into the outpute object
ls(class4)

# You will see:
# [1] "best"    "call"    "checked.par"   "err"   "initial.param" "M"   "nIter"     
#
class4$best  #  Look inside "best"

# The best partition can therefore be found in: 

class4$best$best1$clu

We can use the partition the same way we did in concoR. Add the partition to the network and plot it in igraph.

# Add the block designations to igraph data
V(g)$opt.blocks <- class4$best$best1$clu


plot.igraph(g, vertex.color=V(g)$opt.blocks) # plot in igraph

Euclidean Distance

Euclidean distance is a method of calculating similarities by comparing common distances from some nodes to other nodes. It may be used for either structural equivalence or automorphic equivalence.

Below is a modification of what Carter Butts wrote into sna (statnet).

## IN STATNET ## 
#Cluster based on structural equivalence

# BEWARE: The blockmodeling package interferes
#         with this function.
#         Detach blockmodeling before you begin. 
detach("package:blockmodeling", unload=TRUE)

eq<-equiv.clust(net,
                method="euclidean",
                mode="graph")

# Form a blockmodel 
b<-blockmodel(net, eq, h=4, mode = "graph") # h = 4 is the highest you can go for Medici
plot(b)      # Plot it

Once again, you can take the equivalency clusters (ec) that you generated above and use them in a network vsiualization.

Regular Equivalence

REGE

REGE is actually a set of algorithms that compute similarities - or dissimilarities - between vertices that equate to regular equivalence.

Check ?REGE for your options.

library(blockmodeling) # Reload the blockmodeling package

mat <- as.matrix(get.adjacency(g)) # Extract the matrix in igraph (if you haven't already)

D<-REGE(M=mat)$E 
plot.mat(mat, clu=cutree(hclust(d=as.dist(1-D),method="ward.D"),
   k=4))  #REGE returns similarities, which have to be converted to disimilarities

# Try another variation of REGE
D2<-REGE.ownm.for(M=mat)$E 
plot.mat(mat, clu=cutree(hclust(d=as.dist(1-D2),method="ward.D"),
k=4))

That is all for now. There is no specific deliverable for the practicum. But, be sure to try these with any of the networks that we have used so far in the term. Given the computationally intensive nature of these algorithms, however, you are advised to start with the smaller networks.

Good luck!